C++23 “Pandemic Edition” is complete (Trip report: Winter ISO C++ standards meeting, Issaquah, WA, USA)

On Saturday, the ISO C++ committee completed technical work on C++23 in Issaquah, WA, USA! We resolved the remaining international comments on the C++23 draft, and are now producing the final document to be sent out for its international approval ballot (Draft International Standard, or DIS) and final editorial work, to be published later in 2023.

Our hosts, the Standard C++ Foundation, WorldQuant, and Edison Design Group, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We had about 160 attendees, more than half in-person and the others remote via Zoom. We had 19 nations formally represented, 9 in-person and 10 via Zoom. Also, at each meeting we regularly have new attendees who have never attended before, and this time there were 25 new first-time attendees in-person or on Zoom; to all of them, once again welcome!

The C++ committee currently has 26 active subgroups, 13 of which met in parallel tracks throughout the week. Some groups ran all week, and others ran for a few days or a part of a day and/or evening, depending on their workloads. You can find a brief summary of ISO procedures here.

From Prague, through the pandemic, to an on-time C++23 “Pandemic Edition”

The previous standard, C++20, was completed in Prague in February 2020, a month before the pandemic lockdowns began. At that same meeting, we adopted and published our C++23 schedule… without realizing that the world was about to turn upside down in just a few weeks. Incredibly, thanks to the effort and resilience of scores of subgroup chairs and hundreds of committee members, we still did it: Despite a global pandemic, C++23 has shipped exactly on time and at high quality.

The first pandemic-cancelled in-person meeting would have been the first meeting of the three-year C++23 cycle. This meant that nearly all of the C++23 release cycle, and the entire “development” phase of the cycle, was done virtually via Zoom with many hundreds of telecons from 2020 through 2022. Last week’s meeting was only our second in-person meeting since February 2020, and our second-ever hybrid meeting with remote Zoom participation. Both had a high-quality hybrid Zoom experience for remote attendees around the world, and I want to repeat my thanks from November to the many volunteers who worked hard and carried hardware to Kona and Issaquah to make this possible. I want to again especially thank Jens Maurer and Dietmar Kühl for leading that group, and everyone who helped plan, equip, and run the meetings. Thank you very much to all those volunteers and helpers!

The current plan is that we’ve now returned to our normal cadence of having full-week meetings three times a year, as we did before the pandemic, but now those will be not only in-person but also have remote participation via Zoom. Most subgroups will additionally still continue to meet regularly via Zoom.

This week’s meeting

Per our published C++23 schedule, this was our final meeting to finish technical work on C++23. No features were added or removed; we just handled fit-and-finish issues, primarily focusing on finishing the resolution of the 137 national body comments we received in the summer’s international comment ballot (Committee Draft, or CD). You can find a list of C++23 features here, many of them already implemented in major compilers and libraries. C++23’s main theme was “completing C++20,” and some of the highlights include module “std”, “if consteval,” explicit “this” parameters, still more constexpr, still more CTAD, “[[assume]]”, simplifying implicit move, multidimensional and static “operator[]”, a bunch of Unicode improvements, and Nicolai Josuttis’ personal favorite: fixing the lifetime of temporaries in range-for loops (some would add, “finally!”… thanks again for the persistence, Nico).
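
To make a couple of the highlights above concrete, here is a small illustrative sketch of my own (not from the trip report; it assumes a C++23 compiler) showing explicit “this” parameters and multidimensional operator[]:

// Illustrative sketch of two C++23 features mentioned above

#include <cstddef>
#include <utility>

struct counter {
    int n = 0;
    // Explicit object parameter ("deducing this"): one template serves
    // const, non-const, lvalue, and rvalue objects.
    template <typename Self>
    auto&& value(this Self&& self) { return std::forward<Self>(self).n; }
};

struct grid {
    int data[4][4] = {};
    // C++23: operator[] can take more than one subscript.
    int& operator[](std::size_t r, std::size_t c) { return data[r][c]; }
};

int main() {
    counter cnt;
    cnt.value() = 3;           // non-const lvalue overload deduced
    grid g;
    g[1, 2] = 7;               // multidimensional subscript
    return cnt.value() + g[1, 2] - 10;   // 0
}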

In addition to C++23 work, we also had time to make progress on a number of post-C++23 proposals, including continued work on contracts, SIMD execution, and more. We also decided to send the second Concurrency TS for international comment ballot, which includes hazard pointers, read-copy-update (RCU) data structures… and as of this week we also added Anthony Williams’ P0290 “synchronized_value” type.

The contracts subgroup made further progress on refining contract semantics targeting C++26.

The concurrency and parallelism subgroup is still on track to move forward with “std::execution” and SIMD parallelism for C++26, which in the words of the subgroup chair will make C++26 a huge release for the concurrency and parallelism group.

Again, when you see “C++26” above, that doesn’t mean “three long years away”… we just closed the C++23 branch, and the C++26 branch is opening immediately and we will start approving features for C++26 at our next meeting in June, less than four months from now. Implementers interested in specific features often don’t wait for the final standard to start shipping implementations; note that C++23, which was just finished, already has many features shipping today in major implementations.

The newly created SG23 Safety and Security subgroup met on Thursday for a well-attended session on hitting the ground running toward targeted improvements in C++ safety and security, and it approved the first two safety papers to progress to review by the full language evolution group at the next meeting.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next two meetings will be in Varna, Bulgaria in June and in Kona, HI, USA in November. At those two meetings we will start work on adding features into the new C++26 working draft.

Wrapping up

Thank you again to the approximately 160 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies!

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in less than four months from now we’ll be meeting again in Bulgaria to start adding features to C++26. I look forward to seeing many of you there. Thank you again to everyone reading this for your interest and support for C++ and its standardization.

Cpp2 and cppfront: Year-end mini-update

As we close out 2022, I thought I’d write a short update on what’s been happening in Cpp2 and cppfront. If you don’t know what this personal project is, please see the CppCon 2022 talk on YouTube.

Most of this post is about improvements I’ve been making and merging over the year-end holidays, and an increasing number of contributions from others via pull requests to the cppfront repo and in companion projects. Thanks again to so many of you who expressed your interest and support for this personal experiment, including the over 3,000 comments on Reddit and YouTube and the over 200 issues and PRs on the cppfront repo!

10 design notes

On the cppfront wiki, I’ve written more design notes about specific parts of the Cpp2 language design that answer common questions. They include:

  • Broad strategic topics, such as addressing ABI and versioning, “unsafe” code, and aiming to eliminate the preprocessor with reflection.
  • Specific language feature design topics, such as unified function call syntax (UFCS), const, and namespaces.
  • Syntactic choices, such as postfix operators and capture syntax.
  • Implementation topics, such as parsing strategies and grammar details.

117 issues (3 open), 74 pull requests (9 open), 6 related projects, and new collaborators

I started cppfront with “just a blank text editor and the C++ standard library.” Cppfront continues to have no dependencies on other libraries, but since I open-sourced the project in September I’ve found that people have started contributing working code — thank you! Authors of merged pull requests include:

  • The prolific Filip Sajdak contributed a number of improvements, probably the most important being generalizing my UFCS implementation, implementing more of is and as as described in P2392, and providing Apple-Clang regression test results. Thanks, Filip!
  • Gabriel Gerlero contributed refinements in the Cpp2 language support library, cpp2util.h.
  • Jarosław Głowacki contributed test improvements and ensured that all the code compiles cleanly at high warning levels on all major compilers.
  • Konstantin Akimov contributed command-line usability improvements and more test improvements.
  • Fernando Pelliccioni contributed improvements to the Cpp2 language support library.
  • Jessy De Lannoit contributed improvements to the documentation.

Thanks also to the six related projects that you can find listed on the wiki.

Thanks again to Matt Godbolt for hosting cppfront on Godbolt Compiler Explorer and giving feedback.

Thanks also to over 100 other people who reported bugs and made suggestions via the Issues. See below for some more details about these features and more.

Compiler/language improvements

Here are some highlights of things added to the cppfront compiler since I gave the first Cpp2 and cppfront talk in September. Most of these were implemented by me, but some were implemented by the PR authors I mentioned above.

Roughly in commit order (you can find the whole commit history here), and like everything else in cppfront some of these continue to be experimental:

  • Lots of bug fixes and diagnostic improvements.
  • Everything compiles cleanly under MSVC -W4 and GCC/Clang -Wall -Wextra.
  • Enabled implicit move-from-last-use for all local variables, as I had already done for copy parameters.
  • After repeated user requests, I turned -n and -s (null/subscript dynamic checking) on by default. Yes, you can always still opt out to disable them and get zero cost; Cpp2 will always stay a “zero-overhead don’t-pay-for-what-you-don’t-use” true-C++ environment. All I did was change the default to enable them.
  • Support explicit forward of members/subobjects of composite types. For a parameter declared forward x: SomeType, the default continues to be that the last use of x is automatically forwarded for you; for example, if the last use is call_something( x ); then cppfront automatically emits that call as call_something( std::forward<decltype(x)>(x) ); and you never have to write out that incantation. But now you also have the option to separately forward parts of a composite variable; for example, for a forward x: pair<string, string> parameter you can write things like do_this( forward x.first ) and do_that( 1, 2, 3, forward x.second ).
  • Support is template-name and is ValueOrPredicate: is now supports asking whether this is an instantiation of a template (e.g., x is std::vector), and it supports comparing values (e.g., x is 14) and using predicates (e.g., x is (less_than(20)) invoking a lambda) including for values inside a std::variant, std::any, and std::optional (e.g., x is 42 where x is a variant<int,string> or an any).
  • Regression test results for all major compilers: MSVC, GCC, Clang, and Apple-Clang. All are now checked in and can be conveniently compared before each commit.
  • Finished support for >> and >>= expressions. In today’s syntax, C++ currently max-munches the >> and >>= tokens and then situationally breaks off individual > tokens, so that we can write things like vector<vector<int>> without putting a space between the two closing angle brackets. In Cpp2 I took the opposite choice, which was to not parse >> or >>= as a token (so max munch is not an issue), and just merge closing angles where a >> or >>= can grammatically go. I’ve now finished the latter, and this should be done.
  • Generalized support for UFCS. In September, I had only implemented UFCS for a single call of the form x.f(y), where x could not be a qualified name or have template arguments. Thanks to Filip Sajdak for generalizing this to qualified names, templated names, and chaining multiple UFCS calls! That was a lot of work, and as far as I can tell UFCS should now be generally complete.
  • Support declaring multi-level pointers/const.
  • Zero-cost implementation of UFCS. The implementation of UFCS is now force-inlined on all compilers. In the tests I’ve looked at, even when calling a nonmember function f(x,y), using Cpp2’s x.f(y) unified function call syntax (which tries a member function first if there is one, else falls back to a nonmember function), the generated object code at all optimization levels is now identical, or occasionally better, compared to calling the nonmember function directly. Thanks to Pierre Renaux for pointing this out! (See the simplified C++ sketch after this list for the general member-first dispatch idea.)
  • Support today’s C++ (Cpp1) multi-token fundamental types (e.g., signed long long int). I added these mainly for compatibility because 100% seamless interoperability with today’s C++ is a core goal of Cpp2, but note that in Cpp2 these work without any of the grammar and parsing quirks they have in today’s syntax. That’s because I decided to represent such multi-word names as a single Cpp2 token, which happens to internally contain whitespace. Seems to work pretty elegantly so far.
  • Support fixed-width integer type aliases (i32, u64, etc.), including optional _fast and _small (e.g., i32_fast).
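
Since several of the items above involve UFCS, here is a simplified, hypothetical C++ illustration of the dispatch a call like x.f(y) can lower to. It is not cppfront’s actual generated code, but it shows the same “member wins, else nonmember” selection being made at compile time, which is what lets the optimizer produce identical object code:

// Hypothetical sketch only (C++20 for the requires-expression):
// prefer a member f if one exists, otherwise call a nonmember f

#include <utility>

struct widget { int f(int y) const { return y + 1; } };   // has a member f
int f(int x, int y) { return x + y; }                     // nonmember f

template <typename X, typename Y>
constexpr decltype(auto) ufcs_f(X&& x, Y&& y) {
    if constexpr (requires { std::forward<X>(x).f(std::forward<Y>(y)); })
        return std::forward<X>(x).f(std::forward<Y>(y));  // member exists
    else
        return f(std::forward<X>(x), std::forward<Y>(y)); // fall back
}

int main() {
    widget w;
    int a = ufcs_f(w, 1);     // calls widget::f, yielding 2
    int b = ufcs_f(10, 32);   // no member f on int: calls ::f, yielding 42
    return a + b - 44;        // 0
}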

I think this completes the basic implementation of the initial Cpp2 subset I showed in my talk in September; in particular, support for multi-level pointers and for the multi-word C/C++ fundamental type names should complete the ability to invoke any existing C and C++ code seamlessly.

Which brings us to…

What’s next

Next, as I said in the talk, I’ll be adding support for user-defined types (classes)… I’ll post an update about that when there’s more that’s ready to see.

Again, thanks to everyone who expressed interest and support for this personal experiment, and may you all have a happy and safe 2023.

Trip report: Autumn ISO C++ standards meeting (Kona)

A few minutes ago, the ISO C++ committee completed its second-to-last meeting of C++23 in Kona, HI, USA. Our host, the Standard C++ Foundation, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We currently have 26 active subgroups, nine of which met in six parallel tracks throughout the week; some groups ran all week, and others ran for a few days or a part of a day, depending on their workloads. We had over 160 attendees, approximately two-thirds in-person and one-third remote via Zoom.

This was our first in-person meeting since Prague in February 2020 just a few weeks before the lockdowns began. It was also our first-ever hybrid meeting with remote Zoom participation for all subgroups that met.

You can find a brief summary of ISO procedures here.

From Prague, through the pandemic, to Kona

During the pandemic, the committee’s subgroups began regularly meeting virtually, and over the past nearly three years there have been hundreds of virtual subgroup meetings and thrice-a-year virtual plenary sessions to continue approving features for C++23.

This week, we resumed in-person meetings with remote Zoom support. In the months before Kona, a group of volunteers did a lot of planning and testing: We did a trial run of a hybrid meeting with the subgroup SG14 at CppCon in September, using some of the equipment we planned to use in Kona. That initial September test was a pretty rough experience for many of the remote attendees, but it led to valuable lessons, and though we entered Kona with some trepidation, the hybrid meetings went amazingly smoothly with very few hiccups, and we got a lot of good work done in this second-to-last meeting to finalize C++23, including with remote presentations and comments.

This was only possible because of a huge amount of work by many volunteers, and I want to especially thank Jens Maurer and Dietmar Kühl for leading that group. But it was a true team effort, and so many people helped with the planning, with bringing equipment, and with running the meetings. Thank you very much to all those volunteers and helpers! We received many appreciative comments of thanks on the committee mailing lists, and from national bodies on Saturday, from experts participating remotely who wanted to thank the volunteers for how smoothly they were able to take part.

Now that we have resumed in-person meetings, the current intent is to return to our pre-pandemic cadence of three full-week meetings per year, now with remote participation via Zoom in addition to in-person attendance, and for most subgroups to also continue meeting regularly via Zoom in between.

This week’s meeting

Per our published C++23 schedule, this was our second-to-last meeting to finish technical work on C++23. No features were added or removed; we just handled fit-and-finish issues and primarily focused on addressing the 137 national body comments we received in the summer’s international comment ballot (Committee Draft, or CD).

Today, the committee approved final resolutions for 92 (67%) of the 137 national comments. That leaves 45 comments, some of which have already been partly worked on, still to be completed between now and early February at our last meeting for completing C++23.

An example of a comment we just approved is adopting the proposal from Nicolai Josuttis et al. to extend the lifetime of all temporaries (not just the last one) for the for-range-initializer of the range-for loop (see also the more detailed earlier paper). This closes a lifetime safety hole in C++. Here’s one of the many examples that will now work correctly:

std::vector<std::string> createStrings();

...

for (std::string s : createStrings()) ... // OK

for (char c : createStrings().at(0)) ...
    // use-after-free in C++20
    // OK, safe in C++23

In addition to C++23 work, we also had time to make progress on a number of post-C++23 proposals, including continued work on contracts, executors (std::execution), pattern matching, and more. We also decided to ship the third Library Fundamentals TS, which includes support for a number of additional experimental library features such as propagate_const, scope_exit and related scope guards, observer_ptr, resource_adaptor, a helper to make getting random numbers easier, and more. These can then be considered for C++26.

The contracts subgroup adopted a roadmap and timeline to try to get contracts into C++26. The group also had initial discussion of Gabriel Dos Reis’ proposal to control side effects in contracts, with the plan to follow up with a telecon between now and the next in-person meeting in February.

The concurrency and parallelism subgroup agreed to move forward with std::execution and SIMD parallelism for C++26, which in the words of the subgroup chair will make C++26 a huge release for the concurrency and parallelism group… and recall that C++26 is not just something distant that’s three years away, but we will start approving features for C++26 starting this June, and when specific features are early and stable in the working draft the vendors often don’t wait for the final standard to start shipping implementations.

The language evolution group considered national body comments and C++26 proposals, and approved nine papers for C++26, including progressing Jean-Heyd Meneide’s proposal for #embed.

The language evolution group also held a well-attended evening session (so that experts from all subgroups could participate) to start discussion of the long-term future of C++, with over 100 experts attending (75% on-site, 25% on-line). Nearly all of the discussion was focused on improving safety (mostly) and simplicity (secondarily), including discussion about going beyond our business-as-usual evolution to help C++ programmers with these issues. We expect this discussion to continue and lead to further concrete papers for C++ evolution.

The library evolution group addressed all its national body comments and papers, forwarded several papers for C++26 including std::execution, and for the first time in a while does not have a backlog to catch up on, which was happy news for LEWG.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next meeting will be in Issaquah, WA, USA in February. At that meeting we will finish C++23 by resolving the remaining national body comments on the C++23 draft, and producing the final document to be sent out for its international approval ballot (Draft International Standard, or DIS) and be published later in 2023.

Wrapping up

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in less than three months from now we’ll be meeting again in Issaquah, WA, USA for the final meeting of C++23 to finish and ship the C++23 international standard. I look forward to seeing many of you there. Thank you again to the over 160 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies! And thank you also to everyone reading this for your interest and support for C++ and its standardization.

My CppCon 2021 talk video is online

Whew — I’m now back from CppCon, after remembering how to travel.

My talk video is now online. If you haven’t already seen this via JetBrains’ CppCon 2021 video page or the Reddit post, here’s a link:

Please direct technical comments to the Reddit thread and I’ll watch for them there and respond to as many comments as I can. Thanks!

Thanks again to everyone who attended in person for supporting our requirements for meeting together safely. Interestingly, this was the largest CppCon ever (and the largest C++-specific conference ever as far as I know) in terms of total attendance, though most were attending online. It was good to see and e-see you all! With any luck, by CppCon 2022 our lives will be much closer to normal everywhere in the world… here’s hoping. Thanks again, and stay safe.

Trip report: Summer 2021 ISO C++ standards meeting (virtual)

On Monday, the ISO C++ committee held its third full-committee (plenary) meeting of the pandemic and adopted a few more features and improvements for draft C++23.

We had representatives from 17 voting nations at this meeting: Austria, Bulgaria, Canada, Czech Republic, Finland, France, Germany, Israel, Italy, Netherlands, Poland, Russia, Slovakia, Spain, Switzerland, United Kingdom, and United States. Slovakia is our newest national body to officially join international C++ work. Welcome!

We continue to have the same priorities and the same schedule we originally adopted for C++23, but online via Zoom during the pandemic.

This week: A few more C++23 features adopted

This week we formally adopted a third round of small features for C++23, as well as a number of bug fixes. Below, I’ll list some of the more user-noticeable changes and credit all those paper authors, but note that this is far from an exhaustive list of important contributors… even for these papers, nothing gets done without help from a lot of people and unsung heroes, so thank you first to all of the people not named here who helped the authors move their proposals forward! And thank you to everyone who worked on the adopted issue resolutions and smaller papers I didn’t include in this list.

P1938 by Barry Revzin, Richard Smith, Andrew Sutton, and Daveed Vandevoorde adds the if consteval feature to C++23. If you know about C++17 if constexpr and C++20 std::is_constant_evaluated, then you might think we already have this feature under the spelling if constexpr (std::is_constant_evaluated())… and that’s one of the reasons to add this feature, because that code actually doesn’t do what one might think. See the paper for details, and why we really want if consteval in the language.
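
Here is a small illustration of my own (not taken from P1938) of the trap the paper describes, and how if consteval avoids it:

// The condition of an if constexpr is itself constant-evaluated, so
// is_constant_evaluated() is always true there and the else branch is
// dead code, even for ordinary runtime calls.

#include <type_traits>

constexpr int twice_buggy(int x) {
    if constexpr (std::is_constant_evaluated())
        return 2 * x;       // always chosen, even at runtime
    else
        return x + x;       // never taken
}

constexpr int twice(int x) {
    if consteval {          // C++23: true only during constant evaluation
        return 2 * x;
    } else {
        return x + x;       // genuinely taken for runtime calls
    }
}

int main() {
    constexpr int a = twice(3);   // compile-time path
    int n = 4;
    return twice(n) - a - 2;      // runtime path; result is 0
}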

P1401 by Andrzej Krzemieński enables testing integers as booleans in static_assert and if constexpr without having to cast the result to bool first (or test against zero). This is a small-but-nice example of removing redundant ceremony to help make C++ code that much cleaner and more readable.
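
For example, here is the kind of code (a hedged example of my own, not from the paper) that C++23 now accepts directly:

// C++23: integers can be used directly where a bool is needed in
// static_assert and if constexpr; before P1401 these were formally
// narrowing errors (though compilers varied in how strictly they
// enforced that).

constexpr unsigned f_write = 1, f_append = 4;

template <unsigned Mode>
int open_mode() {
    if constexpr (Mode & f_append)     // no "... != 0" needed
        return 1;
    else
        return 0;
}

static_assert(sizeof(int));            // no "... != 0" needed

int main() { return open_mode<f_write | f_append>() - 1; }   // 0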

P1132 by Jean-Heyd Meneide, Todor Buyukliev, and Isabella Muerte adds out_ptr and inout_ptr abstractions to help with potential pointer ownership transfer when passing a smart pointer to a function that is declared with a T** “out” parameter. In a nutshell, if you’ve ever wanted to call a C API by writing something like some_c_function( &my_unique_ptr ); then these types will likely help you. The idea is that a call site can use one of these types to wrap a smart pointer argument, and then when the helper type is destroyed it automatically updates the pointer it wraps (using a reset call or semantically equivalent behavior).
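
Here is a short sketch of the intended usage pattern; the C-style API below is made up for illustration, but std::out_ptr itself is the real C++23 facility adopted from P1132:

// A hypothetical C-style factory that returns ownership via a T** out-param

#include <cstdlib>
#include <memory>

int make_buffer(int** out) {                 // hypothetical C API
    *out = static_cast<int*>(std::malloc(sizeof(int)));
    if (!*out) return -1;
    **out = 42;
    return 0;
}

struct free_deleter { void operator()(int* p) const { std::free(p); } };

int main() {
    std::unique_ptr<int, free_deleter> buf;
    if (make_buffer(std::out_ptr(buf)) != 0)   // buf adopts the pointer the
        return 1;                              // C API wrote, via reset()
    return *buf == 42 ? 0 : 2;
}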

P1659 by Christopher DiBella generalizes the C++20 starts_with and ends_with on string and string_view by adding the general forms ranges::starts_with and ranges::ends_with to C++23. These can work on arbitrary ranges, and also answer questions such as “are the starting elements of r1 less than the elements of r2?” and “are the final elements of r1 greater than the elements of r2?”.
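
A short example of my own showing the kinds of questions these generalized algorithms can answer:

// ranges::starts_with / ends_with on arbitrary ranges (C++23)

#include <algorithm>
#include <functional>
#include <vector>

int main() {
    std::vector<int> r1{1, 2, 3, 4};
    std::vector<int> r2{1, 2};

    bool a = std::ranges::starts_with(r1, r2);                      // true
    bool b = std::ranges::ends_with(r1, std::vector<int>{3, 4});    // true

    // "Are the starting elements of r1 less than the elements of r2?"
    bool c = std::ranges::starts_with(r1, std::vector<int>{9, 9},
                                      std::ranges::less{});         // true

    return (a && b && c) ? 0 : 1;
}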

P2166 by Yuriy Chernyshov helps reduce a commonly-taught pitfall with std::string. You know how since forever (C++98) you can construct a string from a string literal, like std::string("xyzzy")? But that you’d better watch out (and you’d better not cry or pout) not to pass a null pointer, like std::string(nullptr), because that’s undefined behavior where implementations aren’t required to check the pointer for null and can do whatever they like, including crash? That’s still the case if you pass a pointer variable whose value is null (sorry!), but with this paper, as of C++23 at least now we have overloads that reject attempts to construct or assign a std::string from nullptr specifically, as a compile-time “d’oh! don’t do that.”
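
In code (an illustration of my own of what the new deleted overloads do and do not catch):

// C++23 rejects the literal-nullptr cases at compile time

#include <string>

int main() {
    // std::string s(nullptr);       // C++23: ill-formed (deleted overload)
    // std::string t;  t = nullptr;  // C++23: ill-formed (deleted overload)

    const char* p = nullptr;
    // std::string u(p);             // still compiles, and is still undefined
                                     // behavior if p is null at runtime
    return 0;
}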

We also adopted a number of other issue resolutions and small papers that made additional improvements, including a number that will be backported retroactively to C++20. Quite a few were of the “oh, you didn’t know that rare case didn’t work? now it does” variety.

Other progress

We also approved work on a second Concurrency TS. Recall that a “TS” or “Technical Specification” is like doing work in a feature branch, which can later be merged into the C++ standard (trunk).

Two related pieces of work were approved to go into the Concurrency TS: P1121 and P1122 by Paul McKenney, Maged M. Michael, Michael Wong, Geoffrey Romer, Andrew Hunter, Arthur O’Dwyer, Daisy Hollman, JF Bastien, Hans Boehm, David Goldblatt, Frank Birbacher, Erik Rigtorp, Tomasz Kamiński, and Jens Maurer add support for hazard pointers and read-copy-update (RCU) which are useful in highly concurrent applications.

What’s next

We’re going to keep meeting virtually in subgroups, and then have at least one more virtual plenary session to adopt features into the C++23 working draft in October.

The next tentatively planned ISO C++ face-to-face meeting is February 2022 in Portland, OR, USA. (Per our C++23 schedule, this is the “feature freeze” deadline for design-approving new features targeting the C++23 standard, whether the meeting is physical or virtual.) Meeting in person next February continues to look promising – barring unexpected surprises, it’s possible that by that time most ISO C++ participating nations will have been able to resume local sports/theatre/concert events with normal audiences, and removed travel restrictions among each other, so that people from most nations will be able to participate at an in-person meeting. But we still have to wait and see… we likely won’t know for sure until well into the autumn, and so we’re still calling this one “tentative” for now. You can find a list of our meeting plans on the Upcoming Meetings page.

Thank you again to the hundreds of people who are working tirelessly on C++, even in our current altered world. Your flexibility and willingness to adjust are much appreciated by all of us in the committee and by all the C++ communities! Thank you, and see you on Zoom.

GotW #102 Solution: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

1. Briefly, what is the difference among:

(a) undefined behavior

Undefined behavior is what happens when your program tries to do something whose meaning is not defined at all in the C++ standard language or library (illegal code and/or data). A compiler is allowed to generate an executable that does anything at all, from data corruption (objects not meeting the requirements of their types) to injecting new code to reformat your hard drive if the program is run on a Tuesday, even if there’s nothing in your source code that could possibly reformat anything. Note that undefined behavior is a global property — it always applies not only to the undefined operation, but to the whole program. [1]

(b) unspecified behavior

Unspecified behavior is what happens when your program does something for which the C++ standard doesn’t document the results. You’ll get some valid result, but you won’t know what the result is until your code looks at it. A compiler is not allowed to give you a corrupted object or to inject new code to reformat your hard drive, not even on Tuesdays.

(c) implementation-defined behavior

Implementation-defined behavior is like unspecified behavior, where the implementation additionally is required to document what the actual result will be on this particular implementation. You can’t rely on a particular answer in portable code because another implementation could choose to do something different, but you can rely on what it will be on this compiler and platform.
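
For instance (a small illustration of my own, not from the original GotW), these results vary across implementations, but each implementation must document its answer:

// Implementation-defined results: documented per platform, not portable

#include <climits>
#include <iostream>

int main() {
    std::cout << sizeof(long) << '\n';     // e.g., 4 on 64-bit Windows,
                                           //       8 on 64-bit Linux
    std::cout << (CHAR_MIN < 0) << '\n';   // whether plain char is signed
}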

2. For each of the following, write a short function … where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

Easy peasy! Let’s dereference a null pointer:

// Example 2(a): If assert is violated, always undefined behavior

void deref_and_set( int* p ) {
    assert( p );
    *p = 42;
}

The function asserts that p is not null, and then on the next line unconditionally dereferences p and scribbles over the location it points to. If p is null and the assertion checking is off so that we can get to the next line, the compiler is allowed to make running the whole program format our hard drive.

(b) possibly results in undefined behavior

A general way to describe this class of program is that the call site has two bugs: first, it violates a precondition (so the callee’s results are always at least unspecified), and then it additionally uses the unspecified result without checking it and/or in a dangerous way.

To make up an example, let’s bisect a numeric range:

// Example 2(b): If assert is violated, might lead to undefined behavior

int midpoint( int low, int high ) {
    assert( low <= high );
    return low + (high-low)/2;
        // less overflow-prone than “(low+high)/2”
        // more accurate than “low/2 + high/2”
}

The author of midpoint could have made the function more robust to take the values in either order, and thus eliminated the assertion, but assume they had a reason not to, as alluded to in the comments.

Violating the assertion does not result in undefined behavior directly. The function just doesn’t specify (ahem!) its results if call sites call it in a way that violates the precondition the assertion is testing. If the precondition is violated, then the function can add a negative number to low. But just calculating and returning some other int is not (yet) undefined behavior.

For many call sites, a bad call to midpoint won’t lead to later undefined behavior.

However, it’s possible that some call site might go on to use the unspecified result in a way that does end up being real undefined behavior, such as using it as an array index that performs an out-of-bounds access:

auto m = midpoint( low_index(arr1), high_index(arr2) );   // unspecified
   // here we expect m >= low_index(arr1) ...
stats[m-low_index(arr1)]++;                 // --> potentially undefined

This call site code has a typo, and accidentally mixes the low and high indexes of unrelated containers, which can violate the precondition and result in an index that is less than the “low” value. Then in the next line it tries to use it as an offset index into an instrumentation statistics array, which is undefined behavior for a negative number.

GUIDELINE: Remember that an unspecified result is not in itself undefined behavior, but a call site can run with it and end up with real undefined behavior later. This happens particularly when the calculated value is a pointer, or an integer used as an array index (which, remember, is basically the same thing; a pointer value is just an index into all available memory viewed as an array). If a program relies on unspecified behavior to avoid performing undefined behavior, then it has a path to undefined behavior, and so unspecified behavior is a Crouching Tiger, if you will… still dangerous, and can be turned into the full dragon.

GUIDELINE: Don’t specify your function’s behavior (output postconditions) for invalid inputs (precondition violations), except for defense in depth (see Example 2(c)). By definition, if a function’s preconditions are violated, then the results are not specified. If you specify the outputs for precondition violations, then (a) callers will depend on the outputs, and (b) those “preconditions” aren’t really preconditions at all.

While we’re at it, here’s a second example: Let’s compare pointers in a way the C++ standard says is unspecified. This program attempts to use pointer comparisons to see whether a pointer points into the contiguous data stored in a vector, but this technique doesn’t work because today’s C++ standard only specifies the results of raw pointer comparison when the pointers point at (into, or one-past-the-end of) the same allocation, and so when ptr is not pointing into v’s buffer it’s unspecified whether either pointer comparison in this test evaluates to false:

// Example 2(b)(ii): If assert is violated, might lead to undefined behavior

// std::vector<int> v = ...;
assert(&v[0] <= ptr && ptr < (&v[0])+v.size());           // unspecified
*ptr = 42;                                  // --> potentially undefined

(c) is never undefined or unspecified behavior

An assertion violation is never undefined behavior if the function specifies what happens in every case even when the assertion is violated. Here’s an example mentioned in my paper P2064, distilled from real-world code:

// Example 2(c): If assert is violated, never undefined behavior
//               (function documents its result when x!=0)

some_result_value DoSomething( int x ) {
    assert( x != 0 );
    if    ( x == 0 ) { return error_value; }
    return sensible_result(x);
}

The function asserts that the parameter is not zero, to express that the call site shouldn’t do that, in a way the call site can check and test… but then it also immediately turns around and checks for the errant value and takes a well-defined fallback path anyway even if it does happen. Why? This is an example of “defense in depth,” and can be a useful technique for writing robust software. This means that even though the assertion may be violated, we are always still in a well-defined state and so this violation does not lead to undefined behavior.

GUIDELINE: Remember that violating an assertion does not necessarily lead to undefined behavior.

GUIDELINE: Function authors, always document your function’s requirements on inputs (preconditions). The caller needs to know what inputs are and aren’t valid. The requirements that are reasonably checkable should be written as code so that the caller can perform the checks when testing their code.

GUIDELINE: Always satisfy the requirements of a function you call. Otherwise, you are feeding “garbage in,” and the best you can hope for is “garbage out.” Make sure your code’s tests include verifying all the reasonably checkable preconditions of the functions it calls.

Writing the above pattern has two problems: First, it repeats the condition, which invites copy/paste errors. Second, it makes life harder for static analysis tools, which often trust assertions to be true in order to reduce false positive results, but then will think the fallback path is unreachable and so won’t properly analyze that path. So it’s better to use a helper to express the “either assert this or check it and do a fallback operation” in one shot, which always avoids repeating the condition, and could in principle help static analysis tools that are aware of this macro (yes, it would be nicer to do it without resorting to a macro, but it’s annoyingly difficult to write the early return without a macro, because a return statement inside a lambda doesn’t mean the same thing):

// Using a helper that asserts the condition or performs the fallback

#define ASSERT_OR_FALLBACK(B, ACTION) { \
    bool b = B;                         \
    assert(b);                          \
    if(!b) ACTION;                      \
}

some_result_value DoSomething( int x ) {
    ASSERT_OR_FALLBACK( x != 0, return error_value; );
    return sensible_result(x);
}

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

In Example 2(a), violating the assertion leads to undefined behavior, 1(a).

In Example 2(b), violating the assertion leads to unspecified behavior, 1(b). At buggy call sites, this could subsequently lead to undefined behavior.

In Example 2(c), violating the assertion leads to implementation-defined behavior, 1(c), which never in itself leads to undefined behavior.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.

There are many. Here is just one example, which happens to be nice because it is perfectly accurate.

Let’s say we have all the code examples in question 2, written using C assert today (or even with those assertions missing!), and then at some future time we get a version of standard C++ that can express them as preconditions. Then only in Example 2(a), where we can see that the function body (and possibly transitively its further callees with the help of inlining) exercises undefined behavior, a tool can infer the precondition annotation and add it mechanically, and get the benefit of diagnosing existing bugs at call sites:

// What a precondition-aware tool could generate for Example 2(a)

auto f( int* p ) 
    [[pre( p )]]  // can add this automatically: because a violation
                  // leads to undefined behavior, this precondition
                  // is guaranteed to never cause a false positive
{
    assert( p );
    *p = 42;
}

For example, after some future C++2x ships with contracts, a vendor could write an automated tool that goes through every open source C++ project on GitHub and mechanically generates a pull request to insert preconditions for functions like Example 2(a) – but not (b) or (c) – whether or not the assertion already exists, just by noticing the undefined behavior. And it can inject those contract preconditions with complete confidence that none of them will ever cause a false positive, that they will purely expose existing bugs at call sites when that call site is built with contract checking enabled. I would expect such a tool to identify a good number of (at least latent if not actual) bugs, and be a boon for C++ users, and it’s possible only for functions in the category of 2(a).

“Automated adoption” of at least part of a new C++ feature, combined with “automatically identifies existing bugs” in today’s code, is a pretty good value proposition.

Acknowledgments

Thank you to the following for their comments on this material: Joshua Berne, Gabriel Dos Reis, Gábor Horváth, Andrzej Krzemieński, Ville Voutilainen.

Notes

[1] In the standard, there are two flavors of undefined behavior. The basic “undefined behavior” is allowed to enter your program only once you actually try to execute the undefined part. But some code is so extremely ill-formed (with magical names like “IFNDR,” ill-formed no diagnostic required) that its very existence in the program makes the entire program invalid, whether you try to execute it or not.
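
A classic illustration of my own (not from the original note) is an ODR violation across translation units, which is ill-formed with no diagnostic required even if the function is never called:

// a.cpp
inline int answer() { return 42; }

// b.cpp
inline int answer() { return 43; }   // different definition of the same
                                     // inline function: IFNDR, so the whole
                                     // program is invalid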

GotW #102: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

JG Question

1. Briefly, what is the difference among:

(a) undefined behavior

(b) unspecified behavior

(c) implementation-defined behavior

Guru Questions

2. For each of the following, write a short function of the form:

/*...function name and signature...*/
{
    assert( /*...some condition about the parameters...*/ );
    /*...do something with parameters...*/;
}

where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

(b) possibly results in undefined behavior

(c) is never undefined or unspecified behavior

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.

GotW #101 Solution: Preconditions, Part 2 (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. We covered some basics of preconditions in GotW #100. This time, let’s see how we can use preconditions in some practical examples…

1. Consider these functions, expanded from an article by Andrzej Krzemieński: [1] … How many ways could a caller of each function get the arguments wrong, but that would silently compile without error? Name as many different ways as you can.

There are several ways to break this down. I’ll use three major categories of possible mistakes, the first two of which overlap:

  • wrong order: passing an argument in the wrong position
  • wrong value: passing an argument with a valid but wrong value (e.g., index out of range)
  • invalid value: passing an argument that is already invalid (e.g., an invalid iterator)

Let’s see how these play out with our three examples, starting with (a).

(a) is_in_values (int val, int min, int max)

// Example 1: Adapted from [1]

auto is_in_values (int val, int min, int max)
  -> bool;  // true iff val is in the values [min, max]

Oh my, three identically typed integer parameters… what could be confusing about that?!

Wrong order (5 ways): First, there are five ways to pass these in the wrong order, because there are 3! = 6 permutations, all of which compile but only the first of which is correct:

is_in_values( v,  lo, hi );    // correct

is_in_values( v,  hi, lo );    // all these are wrong, but compile :(
is_in_values( lo, v,  hi ); 
is_in_values( lo, hi, v  ); 
is_in_values( hi, v,  lo ); 
is_in_values( hi, lo, v  );

Some of these argument orders may seem strange, but some are the orders that similar APIs in other libraries use, which makes confusion easier; we all make mistakes… and the type system isn’t helping us at all.

Wrong value (1 way): Second, there is an implicit precondition that min <= max, so passing arguments where min > max would be wrong, but would silently compile. Some of these are exercised by the “wrong order” permutations above, but even call sites that remember the right argument order can make mistakes about the actual values.

Invalid value (0 ways): Finally, all possible values of an int are valid — some may be suspiciously big or small, but int doesn’t have the concept of “not a number” (NaN) as we have with floats, or the concept of “invalidated” like we have with iterators.

(b) is_in_container (int val, int idx_min, int idx_max)

It sure doesn’t help that the next function has the same signature as is_in_values, but with a very different meaning:

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool; // true iff container[i]==val for some i in [idx_min, idx_max]

Wrong order (5 ways): As in (a), we again have five ways to pass these in the wrong order, all of which compile but only the first is correct:

is_in_container( v,  lo, hi );    // correct

is_in_container( v,  hi, lo );    // all these are wrong, but compile :(
is_in_container( lo, v,  hi ); 
is_in_container( lo, hi, v  ); 
is_in_container( hi, v,  lo ); 
is_in_container( hi, lo, v  );

Wrong value (3 ways): Again as in (a), we have the implicit precondition that idx_min <= idx_max, so passing idx_min > idx_max would be wrong, but would silently compile. But this time there are two additional ways to go wrong, because idx_min and idx_max must both be valid subscripts into container, so if either is outside the range [0, container.size()) it is a valid integer but an out of bounds value for this use.

Invalid value (0 ways): Again as in (a), all possible values of an int are valid — though some may be wrong values if they’re out of bounds as we noted above, they’re still valid integers.

(c) is_in_range (T val, Iter first, Iter last)

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool; // true iff *i==val for some i in [first,last)

Wrong order (1 way): This time there’s only one way to pass the parameters in the wrong order (ignoring pathological cases where the same argument might convert to both T and Iter):

is_in_range( v, istart, iend );    // correct

is_in_range( v, iend, istart );    // wrong, but compiles :(

Wrong value (2 ways): We could pass a first and last that are not a valid range in two ways:

  • they point into the same container, but first doesn’t precede last
  • they point into different containers

Invalid value (2 ways): And finally, either of first or last could actually be an invalidated iterator (e.g., dangling). For example, the container they point into may be destroyed so that both are invalid; or one of the two iterators might have been calculated before a more recent operation like vector::push_back that could have invalidated it.

But if the sight of these function signatures has had you pulling your hair and shouting “use the type system, Luke!” at your screen, you’re not alone… now let’s make things better.

2. Show how can you improve the function declarations in Question 1 by …

(a) just grouping parameters, using a struct with public variables

Interestingly, we actually get a lot of benefit simply by grouping ‘parameters that go together’ into an aggregate or “grouping” helper struct.[3] For example:

// Example 2(a)(i): Improving Example 1 with aggregate types

struct min_max { int min, max; };

auto is_in_values (int val, min_max minmax) -> bool;
auto is_in_container (int val, min_max rng) -> bool;

template <typename Iter> struct two_iters { Iter first, last; };

template <typename T, typename Iter>
auto is_in_range (T val, two_iters<Iter> rng) -> bool;

Even the venerable anonymous std::pair is better than no grouping:

// Example 2(a)(ii): Improving Example 1 with aggregate types

auto is_in_values (int val, std::pair<int,int> minmax) -> bool;
auto is_in_container (int val, std::pair<int,int> rng) -> bool;

template <typename T, typename Iter>
auto is_in_range (T val, std::pair<Iter,Iter> rng) -> bool;

With either of the above, there’s only one way for callers to get the argument order wrong. And it requires only two extra characters at call sites, because we can use { } to group the arguments without creating actual named objects of the helper struct:

is_in_values( v, {lo, hi} );	// correct
is_in_values( v, {hi, lo} );	// wrong, but compiles

is_in_container( v, {lo, hi} );	// correct
is_in_container( v, {hi, lo} );	// wrong, but compiles

is_in_range( v, {i1, i2} );		// correct
is_in_range( v, {i2, i1} ); 	// wrong, but compiles

So just grouping parameters using a struct eliminates some errors. But really using the type system is even better…

(b) just using an encapsulated class, using a class with private variables (an abstraction with its own invariant)

Clearly all three functions are crying out for a “range”-like abstraction for their pairs of parameters, in the first two cases a range of values and in the third a range of iterators. How do we know? Because the same pair-of-parameters precondition keeps showing up, function after function.

Here’s one way we can apply class types we can find in the standard library or Boost today:

// Example 2(b): Improving Example 1 with encapsulated class types

auto is_in_values (int val, boost::integer_range<int> rng) -> bool;

auto is_in_container (int val, boost::integer_range<int> rng) -> bool;

template <typename T, std::ranges::input_range Range>
auto is_in_range (T val, Range&& rng) -> bool;

This gives us all the mistake-reduction goodness we got in (a), plus more.

First, as in (a), absent pathological conversions, it’s very difficult to get arguments in the wrong order simply because of being forced to group the parameters:

auto myvec = std::vector<int>();

auto minmax = boost::irange(10, 100);
is_in_values( 42, minmax );

auto minmax2 = boost::irange(0, ssize(myvec)-1);
is_in_container( 42, minmax2 );

is_in_range( 42, myvec );

But, unlike our helper structs in (a), we now get additional safety because the types can express constructor preconditions that move some of those mistakes (such as (hi,lo) misordering) to constructors of class abstractions that can then preserve them as invariants [4] – so the mistake can still be made but in fewer places, at the point where we construct or modify the abstracted object (e.g., range), rather than every time we use separate, un-abstracted values (e.g., a couple of iterator objects we have lying around and whose relationship we have to maintain by hand over time). This is why we sometimes say “types are predicates,” because a type encapsulates a predicate, namely its invariant.

GUIDELINE: When multiple functions state the same precondition, it’s a telltale sign there’s a missing class that should turn it into an invariant. A repeated precondition is nearly always a “naked invariant” that should be encapsulated up inside a type. This is more obvious when the precondition involves multiple parameters (or ordinary variables for that matter); a poster child is the STL’s pervasive use of iterator pairs, which have long been crying out to be encapsulated using a range abstraction, and fortunately we now have that in C++20. Consider using a class instead.

GUIDELINE: Remember that a key reason why encapsulated classes are powerful is that they wrap up preconditions and turn them into invariants. Hiding data members is good dependency management because it limits the code that can depend on the details of the data and is responsible for maintaining the correct relationship among the data members.

(c) just using post-C++20 contract preconditions (not yet valid C++, but something like the syntax in [2])

Preconditions test values, so they can let us eliminate the “wrong values” kinds of mistakes. Consider this code:

// Example 2(c): Improving Example 1 with boolean preconditions

auto is_in_values (int val, int min, int max)
  -> bool // true iff val is in the values [min, max]
     [[pre (min <= max)]]
;

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool // true iff container[i]==val for i in [idx_min, idx_max]
     [[pre (0       <= idx_min
         && idx_min <= idx_max
         && idx_max <  container.size())]]          // see note [5]
;

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool // true iff *i==val for some i in [first,last)
     [[pre (/*... is_reachable? is_not_dangling? hmm ...*/)]]
;

For the first two functions, we can write clear preconditions that can check the “wrong value” bugs.

In these particular examples, the best place to write the preconditions is right on the constructors of the class types we saw in (b), and if we write them there then we don’t have to repeat them as explicit contracts on every function.

But is (b) always better than (c), in other examples? This brings us to our last question, which is all about “can” versus “should”…

3. Consider these three examples, where each shows expressing a boolean condition either as a function precondition or as an encapsulated invariant inside a new type… In each of these cases, which way is better? Explain your answer.

In Question 2, writing a type was often the best choice, but it isn’t always.

The benefits to writing a type include:

  • Encapsulation. We limit the code that is responsible for maintaining the boolean condition.
  • Language support. We get the help of the type system to statically enforce requirements.

But there are costs and limitations too:

  • What’s the abstraction? There may not be a suitable one. We can’t write a good type unless we can discover a useful abstraction that the type’s interface should support. A good type represents a useful reusable domain abstraction that programmers can understand and that makes their code clearer by elevating the vocabulary of the code. There won’t always be a practical and reusable abstraction; when there isn’t, we won’t be able to write a useful and reusable type. — Even when there is, we have to design that all ahead of time, which requires a lot more advance knowledge and engineering than just writing ad-hoc boolean conditions on individual functions.
  • What’s the cost? It may not be feasible to maintain the invariant. We have to do any extra work it takes to maintain the invariant, and it has to be practical to do. When it isn’t, we can’t maintain the invariant without help from outside code, and so we won’t be able to really encapsulate it properly.
  • Does it make sense as an independent abstraction? Will the user be carrying around objects of this type, or are we just jamming a precondition common to a few functions (or only one) into a type and calling it useful? Occam’s Razor: Don’t multiply entities beyond necessity.
  • What’s the type the caller is using? This is where a real usable abstraction shines, because many callers will be using it independently of calling our function. But if the caller isn’t using this type, then there typically has to be an implicit or explicit conversion (because inheritance from all argument types our callers might already have usually isn’t an option), and that conversion would need to be usable and sufficiently cheap.

GUIDELINE: Remember that types and contracts are “better together.” Use both. They are complementary, neither is a substitute for the other. All we are trying to accomplish with contracts is to augment the language’s static type checking with runtime checking where that is more appropriate because we can’t design a practical abstraction. And this is why we want contracts on functions (preconditions, postconditions) even though we already have types, and why we also want contracts on types (invariants).

Let’s consider the three examples.

(a) A vector that is sorted

template <typename T>
void f( vector<T> const& v ) [[pre( is_sorted(v) )]] ;

template <typename T>
void f( sorted<vector<T>> const& v );

If this looks familiar, it’s because is_sorted is one of the classic examples we saw in GotW #98 of conditions that are often impractical to check and enforce as an assertion, in this case a precondition.

Can we do better by making it a type, perhaps a sorted wrapper around a container like vector that maintains the guarantee that it’s always sorted? Well, we have to answer some questions about a sorted<T>:

  • What’s the abstraction it provides? It can’t easily fulfill the requirements of a sequence container like vector itself; for example, push_back doesn’t make much sense because letting the caller insert an arbitrary value at a specific location would easily cause the container to be unsorted. Instead, it would naturally want a more general insert function, and the interface would be more like set’s. This part could be workable.
  • What’s the cost? This is where it starts to break down: Keeping a vector sorted all the time means that every insertion costs O(N) work. Which leads into…
  • Does it make sense as an independent abstraction? … that it’s very common for code to maintain an “almost-sorted” vector, such as by inserting new elements at the end, which is fast (and, hmm, affects our abstraction design, because then it would make sense to have push_back after all, wouldn’t it? hmm) but leaves a suffix of unsorted elements in the container, and then periodically sorting the whole container so that the sorting cost is amortized. But an almost-sorted vector doesn’t satisfy the always-sorted invariant, and so doesn’t fit the bill. We don’t have empirical evidence of such types in general use.
  • What’s the type the caller is using? And now we’re busted all the way, because we want this interface to be usable by anyone who has a vector<T>, which would require a conversion to sorted<vector<T>>. If we do a deep copy, that’s prohibitively expensive. Even if the conversion is lightweight by avoiding a deep copy, such as by just wrapping an existing vector object, it wouldn’t be very useful unless it did O(N) work every time, unconditionally, to verify the invariant. And even then the abstraction design is affected and compromised: If the user can still see and modify the original vector, then that’s still part of the accessible interface to the data, so the user can make the container unsorted and we’re unable to really encapsulate and maintain our intended invariant.

So is_sorted is much better as a function precondition.
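
As an aside, here is the amortized “almost-sorted” pattern from the third bullet above as a minimal sketch (the class name and interface are made up for illustration); it is exactly the kind of usage that a strict sorted<vector<T>> invariant would forbid:

#include <algorithm>
#include <vector>

// Hypothetical helper: append cheaply now, sort once just before the
// sorted property is actually needed, so the O(N log N) cost is paid rarely
class almost_sorted_ints {
    std::vector<int> data_;
public:
    void push_back( int value ) { data_.push_back( value ); }  // O(1) amortized

    std::vector<int> const& sorted() {   // pay the sorting cost here, once
        std::sort( data_.begin(), data_.end() );
        return data_;
    }
};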

(b) A vector that is not empty

template <typename T>
void f( vector<T> const& v ) [[pre( !v.empty() )]] ;

template <typename T>
void f( not_empty<vector<T>> const& v );

This one is more feasible as a type, but still not ideal:

  • What’s the abstraction it provides? It’s a vector, and we can make the interface identical to vector with just extra preconditions on pop and erase functions to not remove the last element in the container.
  • What’s the cost? Emptiness is cheap enough to check and maintain.
  • Does it make sense as an independent abstraction? This is where it starts to get questionable… the answer is at best “maybe.” It’s not clear to me that a “nonempty vector” is a generally useful abstraction.
  • What’s the type the caller is using? This is where I think we break down again. Again, we want this interface to be usable by anyone who has a vector<T>, and that means a conversion to not_empty<vector<T>>. If we do a deep copy, that’s prohibitively expensive. This time, if we just wrap an existing vector object to avoid the deep copy, the check is cheap. But then we still have the problem that the abstraction design is affected and compromised so that it can’t maintain its invariant: if the user can still see and modify the original vector, they can remove the last element on us (see the sketch below).

So not_empty seems better as a function precondition.
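
To make the last bullet concrete, here is a minimal sketch (all names are made up) of a wrapper that avoids the deep copy by referring to the caller’s vector; it can check the invariant once at construction, but it cannot maintain it:

#include <cassert>
#include <vector>

// Hypothetical non-owning wrapper: cheap to construct, but it cannot
// encapsulate the "not empty" invariant, because the caller still has
// direct access to the wrapped vector
template <typename T>
class not_empty_view {
    std::vector<T> const& v_;
public:
    explicit not_empty_view( std::vector<T> const& v ) : v_( v ) {
        assert( !v_.empty() );            // checked here, once...
    }
    std::vector<T> const& get() const { return v_; }
};

void f( not_empty_view<int> v ) { /* relies on !v.get().empty() */ }

int main() {
    std::vector<int> data{ 1, 2, 3 };
    not_empty_view<int> view{ data };     // invariant holds now
    data.clear();                         // ...but the caller can still break it
    f( view );                            // f's assumption no longer holds
}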

(c) A pointer that is not null

void f( int* p ) [[pre( p != nullptr )]] ;

void f( not_null<int*> p );

This time we can do better:

  • What’s the abstraction it provides? This one’s easy to state: It’s a not-null pointer. That’s a far simpler interface than a container, because we just need operator* and operator->, construction, destruction, and copying. Even so it’s not totally without subtlety, because not_null should not have move operations that modify the source object. This means that a not_null<unique_ptr<T>> is legal but there’s not much you can do with it besides dereference it and destroy it: It can’t be copyable because unique_ptr isn’t copyable, and it must not be movable because moving a unique_ptr leaves the source null.
  • What’s the cost? Nullness is cheap enough to check and maintain.
  • Does it make sense as an independent abstraction? Definitely. A “non-null pointer” has been widely rediscovered and reinvented as a generally useful abstraction.
  • What’s the type the caller is using? A not_null<int*> is a useful object in its own right in the calling code, independently of calling this particular function. And if our function is invoked by someone who has only an ordinary int*, doing a full copy of the pointer is cheap, and applying the nullness check as a precondition on that converting constructor is exactly equivalent to writing the precondition by hand, but is automated.

So not_null seems better as a type, primarily because it is independently useful. This is why it has been reinvented a number of times, including as gsl::not_null. [6]
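
Here is a minimal sketch of what such a type can look like (for illustration only; it is not gsl::not_null itself, see [6] for the real guideline and type):

#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical minimal not_null: construction checks the invariant once,
// and the interface is just dereference plus copying
template <typename P>
class not_null {
    P p_;
public:
    not_null( P p ) : p_( std::move(p) ) {
        assert( p_ != nullptr );              // the invariant, checked here
    }
    not_null( std::nullptr_t ) = delete;      // obvious misuse won't compile

    not_null( not_null const& )            = default;
    not_null& operator=( not_null const& ) = default;
    not_null( not_null&& )                 = delete;  // no moves that could
    not_null& operator=( not_null&& )      = delete;  //  null out the source

    decltype(auto) operator*()  const { return *p_; }
    P              operator->() const { return  p_; }
};

void f( not_null<int*> p ) { *p = 42; }

int main() {
    int x = 0;
    f( &x );          // implicit conversion; nullness is checked in the constructor
    // f( nullptr );  // rejected at compile time
}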

GUIDELINE: Wherever practical, design interfaces so that incorrect call sites are illegal (won’t compile, using the type system) or loud (won’t pass unit tests, using preconditions). This is a key part of achieving the goal to “make interfaces easy to use correctly, and hard to use incorrectly.” Preconditions directly help with that by letting us catch entire groups of errors at test time, and are a complement to the type system which makes incorrect uses “not fit” through the compiler and also carries extra preconditions around for us in the form of invariants.

GUIDELINE: Remember that the type system is a hammer, and not every precondition is a nail. The type system is a powerful tool, but not every precondition is naturally (part of) an invariant of a useful type that provides a good reusable abstraction that’s generally useful independently of this function.

Notes

[1] A. Krzemieński. “Contracts, preconditions and invariants” (Andrzej’s C++ blog, December 2020).

[2] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ), and to name the return value _return_ for postconditions. That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

[3] For 2(a) and 2(b), on platform ABIs that do not pass small structs/classes in registers, turning individual parameters into a struct/class could cause them to be passed in stack memory instead of in registers.

[4] Upcoming GotWs will cover invariants and violation handling.

[5] If C++ gets chained comparisons as proposed in P0515 and P0893 we could write this much more clearly, and with fewer opportunities for mistakes, as:

[[pre( 0 <= idx_min <= idx_max < container.size() )]]

[6] B. Stroustrup and H. Sutter (eds.) “I.12 Declare a pointer that must not be null as not_null” (C++ Core Guidelines.) If the not_null<T> type we are using is implicitly convertible from T, which is the intent of I.12 to provide a drop-in replacement for pointer parameters, then the usability is the same as with the precondition. Otherwise, the caller has to provide a not_null argument at the call site, either by doing an explicit conversion or by just using a not_null local variable in their own body.

Acknowledgments

Thank you to the following for their feedback on this material: Joshua Berne, Gabriel Dos Reis, J. Daniel Garcia, Gábor Horváth, Andrzej Krzemieński, Bjarne Stroustrup, Andrew Sutton, Ville Voutilainen.

GotW #101: Preconditions, Part 2 (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. We covered some basics of preconditions in GotW #100. This time, let’s see how we can use preconditions in some practical examples…

JG Question

1. Consider these functions, expanded from an article by Andrzej Krzemieński: [1]

// Adapted from [1]

auto is_in_values (int val, int min, int max)
  -> bool; // true iff val is in the values [min, max]

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool; // true iff container[i]==val for some i in [idx_min, idx_max]

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool; // true iff *i==val for some i in [first,last)

How many ways could a caller of each function get the arguments wrong, yet have the call still compile silently without error? Name as many different ways as you can.

Guru Questions

2. Show how you can improve the function declarations in Question 1 by:

(a) just grouping parameters, using a struct with public variables

(b) just using an encapsulated class, using a class with private variables (an abstraction with its own invariant)

(c) just using post-C++20 contract preconditions (not yet valid C++, but something like the syntax in [2])

In each case, how many of the possible kinds of mistakes for each function can the approach prevent?

3. Consider these three examples, where each shows expressing a boolean condition either as a function precondition or as an encapsulated invariant inside a new type:

// (a) A vector that is sorted

template <typename T>
void f( vector<T> const& v ) [[pre( is_sorted(v) )]] ;

template <typename T>
void f( sorted<vector<T>> const& v );


// (b) A vector that is not empty

template <typename T>
void f( vector<T> const& v ) [[pre( !v.empty() )]] ;

template <typename T>
void f( not_empty<vector<T>> const& v );


// (c) A pointer that is not null

void f( int* p ) [[pre( p != nullptr )]] ;

void f( not_null<int*> p );

In each of these cases, which way is better? Explain your answer.

Notes

[1] A. Krzemieński. “Contracts, preconditions and invariants” (Andrzej’s C++ blog, December 2020).

[2] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ). That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

GotW #100 Solution: Preconditions, Part 1 (Difficulty: 8/10)

This special Guru of the Week series focuses on contracts. We’ve seen how postconditions are directly related to assertions (see GotWs #97 and #99). So are preconditions, but in one important way they are fundamentally different. What is that way? And why would having language support benefit us even more for writing preconditions than for the other two?

1. What is a precondition, and how is it related to an assertion?

A precondition is a “call site prerequisite on the inputs”: a condition that must be true at each call site before the caller can invoke this function. In math terms, it’s about expressing the domain of recognized inputs for the function. If preconditions don’t hold, the function can’t possibly do its work (achieve its postconditions), because the caller hasn’t given it a starting point it understands.

A precondition IS-AN assertion in every way described in GotW #97, with the special addition that whereas a general assertion is always checked where it is written, a precondition is written on the function and conceptually checked at every call site. (In less-ideal implementations, including if we write it as a library today, the precondition check might be in the function body; see Question 2.)

Explain your answer using the following example, which uses a variation of a proposed post-C++20 syntax for preconditions. [1]

// Example 1(a): A precondition along the lines proposed in [1]

void f( int min, int max )
    [[pre( min <= max )]]
{
    // ...
}

The above would be roughly equivalent to writing the test before the call at every call site instead. For example, for a call site that performs f(x, y), we want to check the precondition at this specific call site at least when it is being tested (and possibly earlier and/or later, see GotW #97 Question 4):

// Example 1(b): What a compiler might generate at a call site
//               “f(x, y)” for the precondition in Example 1(a)

assert( x <= y ); // implicitly injected assertion at this call site,
                  // checked (at least) when this call site is tested
f(x, y);

And, as we’ll see in Question 4, language support for preconditions should apply this rewrite recursively for subexpressions that are themselves function calls with preconditions.

GUIDELINE: Use a precondition to write “this is what a bug is” as code the caller can check. A precondition states in code the circumstances under which this function’s behavior is not documented.

2. Rewrite the example in Question 1 to show how to approximate the same effect using assertions in today’s C++.

Here’s one way we can do it, which extends the MY_POST technique from GotW #99 Example 2 to also support preconditions. Again, instead of MY_ you’d use your company’s preferred unique macro prefix: [2]

// Eliminate forward-boilerplate with a macro (written only once)
#define MY_PRE_POST(preconditions, postconditions)         \
    assert( preconditions );                               \
    auto post = [&](auto&& _return_) -> auto&& {           \
        assert( postconditions );                          \
        return std::forward<decltype(_return_)>(_return_); \
    };

And then the programmer can just write:

// Example 2: Sample precondition

void f( int min, int max )
{   MY_PRE_POST_V( min <= max, true ); // true == no postconditions here
    // ...
}
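
For a value-returning function, usage of the two-argument MY_PRE_POST macro defined above might look like this (a made-up example; as in GotW #99, every return statement goes through post so that the postcondition actually gets checked):

// Hypothetical example: the postcondition names the return value _return_,
// and each return is written as "return post( ... );"
int clamp_to( int value, int lo, int hi )
{   MY_PRE_POST( lo <= hi, lo <= _return_ && _return_ <= hi );
    return post( value < lo ? lo : value > hi ? hi : value );
}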

This has the big benefit that it works using today’s C++. It has the same advantages as MY_POST in GotW #99, including that it’s future-friendly… if we use the macro as shown above, then if in the future C++ has language support for preconditions and postconditions with a syntax like [1], migrating your code to that could be as simple as search-and-replace:

{ MY_PRE_POST( **, * )   →   [[pre: ** ]] [[post _return_: * ]] {

return post( * )   →   return *
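
For example, applying those two replacements to the clamp_to sketch above would produce something like the following (not valid C++ today; a hypothetical rendering of the syntax in [1]):

int clamp_to( int value, int lo, int hi )
    [[pre: lo <= hi ]] [[post _return_: lo <= _return_ && _return_ <= hi ]]
{
    return value < lo ? lo : value > hi ? hi : value;
}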

GUIDELINE (extended from GotW #99): If you don’t already use a way to write preconditions and postconditions as code, consider trying something like MY_PRE_POST until language support is available. It’s legal C++ today, it’s not terrible, and it’s future-friendly to adopting future C++ language contracts.

But even if macros don’t trigger your fight-or-flight response, it’s still a far cry from language support …

Are there any drawbacks to your solution compared to having language support for preconditions?

Yes:

  • Callee-body checking only. This method can run the check only inside the function’s body. First, this means we can’t easily perform the check at each call site, which would be ideal, including so that we can turn the check on for one call site but not another when we’re testing a specific caller. Second, for constructors it can’t run at the very beginning of construction, because member initialization happens before we enter the constructor body.
  • Doesn’t directly handle nested preconditions, meaning preconditions of functions invoked as part of the precondition itself. We’ll come to this in Question 4.

3. If a precondition fails, what does that indicate, and who is responsible for fixing the failure?

Each call site is responsible for making sure it meets all of a function’s preconditions before calling that function. If a precondition is false, it’s a bug in the calling code, and it’s the calling code author who is responsible for fixing it.

Explain how this makes a precondition fundamentally different from every other kind of contract.

A precondition is the only kind of contract you can write that someone else has to fulfill, and so if it’s ever false then it’s someone else’s fault — it’s the caller’s bug that they need to go fix.

GUIDELINE: Remember the fundamental way preconditions are unique… if they’re false, then it’s someone else’s fault (the calling code author). When you write any of the other contracts (assertions, function postconditions, class invariants), you state something that must be true about your own function or class, and if prior contracts were written and well tested then likely it’s your function or class that created the first unexpected state.

4. Consider this example, expanded from a suggestion by Gábor Horváth:

// Example 4(a): What are the implicit preconditions?

auto calc( std::vector<int> const&  x ,
           std::floating_point auto y ) -> double
    [[pre( x[0] <= std::sqrt(y) )]] ;

Note that std::floating_point is a C++20 concept.

a) What kinds of preconditions must a caller of calc satisfy that can’t generally be written as testable boolean expressions?

The language requires the number and types of arguments to match the parameter list. Here, calc must be called with two arguments. The first must be a std::vector<int> or something convertible to that. The second one’s type has to satisfy the floating_point concept (it must be float, double, or long double).

It’s worth remembering that these language-enforced rules are conceptually part of the function’s precondition, in the sense that they are requirements on call sites. Even though we generally can’t write testable boolean predicates for these to check that we didn’t write a bug, we also never need to do that because if we write a bug the code just won’t compile. [3] Code that is “correct by construction” doesn’t need to add assertions to find potential bugs.

GUIDELINE: Remember that a static type is a (non-boolean) precondition. It’s just enforced by language semantics with always-static checking (edit or compile time), and never needs to be tested using a boolean predicate whose test could be delayed until dynamic checking (test or run time).

COROLLARY: A function’s number, order, and types of parameters are all (non-boolean) parts of its precondition. This falls out of the “static type” statement because the function’s own static type includes those things. For example, the language won’t let us invoke this function with the argument lists () or (1,2,3,4,5) or (3.14, myvector). We’ll delve into this more deeply in GotW #101.
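
For example, given the calc declaration from Example 4(a) (and assuming a definition of it exists somewhere so the program links), the language enforces this part of the precondition at compile time; the commented-out calls simply don’t compile:

std::vector<int> myvector{ 1, 2, 3 };

calc( myvector, 3.14 );    // OK: two arguments of the right types
// calc();                 // error: too few arguments
// calc( 1, 2, 3, 4, 5 );  // error: too many arguments
// calc( 3.14, myvector ); // error: argument types don't match the parameters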

COROLLARY: All functions have preconditions. Even void f() { }, which takes no inputs at all (it doesn’t even read global state), has the precondition that it must be passed zero arguments. The only counterexample I can think of is pathological: void f(...) { } can be invoked with any number of arguments but ignores them all.

b) What kinds of boolean-testable preconditions are implicit within the explicitly written declaration of calc?

There are three possible kinds of implied boolean preconditions. All three are present in this example.

(1) Type invariants

Each object must meet the invariants of its type. This is subtly different from “the object’s type matches” (a static property) that we saw in 4(a), because this means additionally “the object’s value is not corrupt” (a dynamic property).

Here, this means x must obey the invariant of vector<int>, even though that invariant isn’t expressed in code in today’s C++. [4] For y this is fairly easy because all bit patterns are valid floating point values (more about NaNs in just a moment).

(2) Subexpression preconditions

The subexpression x[0] calls x.operator[] which has its own precondition, namely that the subscript be non-negative and less than x.size(). For 0, that’s true if x.size() > 0 is true, or equivalently !x.empty(), so that becomes an implicit part of our whole precondition.

(3) Subexpressions that make the whole precondition false

The subexpression std::sqrt(y) invokes C’s sqrt. The C standard says that unless y >= 0, the result of sqrt(y) is NaN (“not a number”), which means our precondition amounts to something <= NaN, which is always false. Therefore, y >= 0 is effectively part of calc’s precondition too. [5]
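
Here is a tiny standalone demonstration of that “always false” behavior, assuming an IEEE-754 platform:

#include <cmath>
#include <iostream>

int main() {
    double nan = std::sqrt( -1.0 );    // NaN on IEEE-754 platforms
    std::cout << std::boolalpha
              << (3.0 <= nan) << '\n'  // false: ordered comparisons with NaN are false
              << (nan <= nan) << '\n'  // false
              << (nan == nan) << '\n'; // false: NaN doesn't even compare equal to itself
}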

Putting it all together

If we were to write this all out, the full precondition would be something like this — and note that the order is important! Here we’ll ignore the parts that are enforced by the language, such as parameter arity, and focus on the parts that can be written as boolean expressions:

// Example 4(b): Trying to write the precondition out more explicitly
//               (NOT all are recommended, this is for exposition)

auto calc( std::vector<int> const&  x ,
           std::floating_point auto y ) -> double
    [[pre(

//  1. parameter type invariants:
           /* x is a valid object, but we can’t spell that, so: */ true

//  2. subexpression preconditions:
        && x.size() > 0  // so checking x[0] won’t be undefined (!)

//  3. subexpression values that make our precondition false:
        && y >= 0        // redundant with the expression below

// finally, our explicit precondition itself:
        && x[0] <= std::sqrt(y) 
    )]] ;

GUIDELINE: Remember that your function’s full effective precondition is the precondition you write plus all its implicit prerequisites. Those are: (1) each parameter’s type invariants, (2) any preconditions of other function calls within the precondition, and (3) any defined results of function calls within the precondition that would make the precondition false.

c) Should any of these boolean-testable implicit preconditions also be written explicitly here in this precondition code? Explain.

For #1 and #3, we generally shouldn’t be repeating them as in 4(b):

  • We can skip repeating #1 because it’s enforced by the type system, plus if there is a bug it’s likely in the type itself rather than in our code or our caller’s code and will be checked when we check the type’s invariants.
  • We can skip repeating #3 because it’ll just make the whole condition be false and so is already covered.

But #2 is the problematic case: If x is actually empty, then evaluating the subexpression x[0] violates its own precondition, which makes our whole precondition undefined to evaluate! “Undefined” is a very bad answer if we ever check this precondition, because if in our checking the full precondition is ever violated then we absolutely want that check to do something well-defined: we want it to evaluate to false and fail.

If a subexpression of our precondition itself has a real precondition, then we do want to check that first, otherwise we cannot check our full precondition without undefined behavior if that subexpression’s precondition was not met:

// Example 4(c): Today, we should repeat our subexpressions’ real
//               preconditions, so we can check our precondition
//               without undefined behavior

auto calc( std::vector<int> const&  x ,
           std::floating_point auto y ) -> double
    [[pre( x.size() > 0 && x[0] <= std::sqrt(y) )]] ;

With today’s library-based preconditions, such as the one shown in Question 2, we need to repeat subexpressions’ preconditions if we want to check our precondition without undefined behavior. One of the potential advantages of a language-supported contracts system is that it can “flatten” the preconditions to automatically test category #2, so that nested preconditions like this one don’t need to be repeated (assuming that the types and functions you use, here std::vector and its member functions, have written their preconditions and invariants)… and then we could still debate whether or not to explicitly repeat subexpression preconditions in our preconditions, but it would be just a water-cooler stylistic debate, not a “can this even be checked at all without invoking undefined behavior” correctness debate.

Here’s a subtle variation suggested by Andrzej Krzemieński. For the sake of discussion, suppose we have a nested precondition that is not used in the function body (which I think is terribly unlikely, but let’s just consider it):

void display( /*...*/ )
    [[pre( globalData->helloMessageHasBeenPrinted() )]]
{
    // assume for sake of discussion that globalData is not
    // dereferenced directly or indirectly by this function body
}

Here, someone could argue: “If globalData is null, only actually checking the precondition would be undefined behavior, but executing the function body would not be undefined behavior.”

Question: Is globalData != nullptr an implicit precondition of display, since it applies only to the precondition, and is not actually used in the function body? Think about it for a moment before continuing…

… okay, here’s my answer: Yes, it’s absolutely part of the precondition of display, because by definition a precondition is something the caller is required to ensure is true before calling display, and a condition that is undefined to evaluate at all cannot be true.

GUIDELINE: If your checked precondition has a subexpression with its own preconditions, make sure those are checked first. Otherwise, you might find your precondition check doesn’t fire even when it’s violated. In the future, language support for preconditions might automate this for you; until then, be careful to write out the subexpression precondition by hand and put it first.

Notes

[1] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ), and to name the return value _return_ for postconditions. That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

[2] Again, as in GotW #99 Note 4, in a real system we’d want a few more variations, such as:

// A separate _V version for functions that don’t return
// a value, because 'void' isn’t regular
#define MY_PRE_POST_V(preconditions, postconditions) \
    assert( preconditions );                         \
    auto post = [&]{ assert( postconditions ); };

// Parallel _DECL forms to work on forward declarations,
// for people who want to repeat the postcondition there
#define MY_PRE_POST_DECL(preconditions, postconditions)
#define MY_PRE_POST_V_DECL(preconditions, postconditions)

And see GotW #99 Note 5 for how to guarantee the programmer didn’t forget to write “return post” at each return.

[3] Sure, there are things like is_invocable, but the point is we can’t always write those expressions, and we don’t have to here.

[4] Upcoming GotWs will cover invariants and violation handling. For type invariants, today’s C++ doesn’t yet provide a way to write those as checkable assertions to help us find bugs where we got it wrong and corrupted an object. The language just flatly assumes that every object meets the invariants of its type during the object’s lifetime, which is from the end of its construction to the beginning of its destruction.

[5] There’s more nuance to the details of what the C standard says, but it ends up that we should expect the result of passing a negative or NaN value to sqrt will be NaN. Although C calls negative and NaN inputs “domain errors,” which hints at a precondition, it still defines the results for all inputs and so strictly speaking doesn’t have a precondition.

Acknowledgments

Thank you to the following for their feedback on this material: Joshua Berne, Gabriel Dos Reis, J. Daniel Garcia, Gábor Horváth, Andrzej Krzemieński, Jean-Heyd Meneide, Bjarne Stroustrup, Andrew Sutton, Jim Thomas, Ville Voutilainen.