My CppCon 2021 talk video is online

Whew — I’m now back from CppCon, after remembering how to travel.

My talk video is now online. If you haven’t already seen this via JetBrains’ CppCon 2021 video page or the Reddit post, here’s a link:

Please direct technical comments to the Reddit thread and I’ll watch for them there and respond to as many comments as I can. Thanks!

Thanks again to everyone who attended in person for supporting our requirements for meeting together safely. Interestingly, this was the largest CppCon ever (and the largest C++-specific conference ever as far as I know) in terms of total attendance, though most were attending online. It was good to see and e-see you all! With any luck, by CppCon 2022 our lives will be much closer to normal everywhere in the world… here’s hoping. Thanks again, and stay safe.

Trip report: Summer 2021 ISO C++ standards meeting (virtual)

On Monday, the ISO C++ committee held its third full-committee (plenary) meeting of the pandemic and adopted a few more features and improvements for draft C++23.

We had representatives from 17 voting nations at this meeting: Austria, Bulgaria, Canada, Czech Republic, Finland, France, Germany, Israel, Italy, Netherlands, Poland, Russia, Slovakia, Spain, Switzerland, United Kingdom, and United States. Slovakia is our newest national body to officially join international C++ work. Welcome!

We continue to have the same priorities and the same schedule we originally adopted for C++23, but online via Zoom during the pandemic.

This week: A few more C++23 features adopted

This week we formally adopted a third round of small features for C++23, as well as a number of bug fixes. Below, I’ll list some of the more user-noticeable changes and credit all those paper authors, but note that this is far from an exhaustive list of important contributors… even for these papers, nothing gets done without help from a lot of people and unsung heroes, so thank you first to all of the people not named here who helped the authors move their proposals forward! And thank you to everyone who worked on the adopted issue resolutions and smaller papers I didn’t include in this list.

P1938 by Barry Revzin, Richard Smith, Andrew Sutton, and Daveed Vandevoorde adds the if consteval feature to C++23. If you know about C++17 if constexpr and C++20 std::is_constant_evaluated, then you might think we already have this feature under the spelling if constexpr (std::is_constant_evaluated())… and that’s one of the reasons to add this feature, because that code actually doesn’t do what one might think. See the paper for details, and why we really want if consteval in the language.

P1401 by Andrzej Krzemieński enables testing integers as booleans in static_assert and if constexpr without having to cast the result to bool first (or test against zero). This is a small-but-nice example of removing redundant ceremony to help make C++ code that much cleaner and more readable.

P1132 by Jean-Heyd Meneide, Todor Buyukliev, and Isabella Muerte adds out_ptr and inout_ptr abstractions to help with potential pointer ownership transfer when passing a smart pointer to a function that is declared with a T** “out” parameter. In a nutshell, if you’ve ever wanted to call a C API by writing something like some_c_function( &my_unique_ptr ); then these types will likely help you. The idea is that a call site can use one of these types to wrap a smart pointer argument, and then when the helper type is destroyed it automatically updates the pointer it wraps (using a reset call or semantically equivalent behavior).
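To make the “update on destruction” idea concrete, here is a minimal C++17 sketch. It is not the real std::out_ptr interface: out_ptr_sketch, make_answer, and call_make_answer are hypothetical names invented for illustration, and the sketch assumes the C-style function allocates with new so that unique_ptr's default deleter matches.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style API: returns ownership through a T** "out" parameter.
inline void make_answer(int** out) { *out = new int(42); }

// Simplified stand-in for the idea behind out_ptr (assumption: not the
// standard's actual interface, just the core mechanism).
template <typename T>
class out_ptr_sketch {
    std::unique_ptr<T>& owner_;   // the smart pointer to update
    T* raw_ = nullptr;            // temporary slot the C API writes into
public:
    explicit out_ptr_sketch(std::unique_ptr<T>& owner) : owner_(owner) {}
    out_ptr_sketch(const out_ptr_sketch&) = delete;
    out_ptr_sketch& operator=(const out_ptr_sketch&) = delete;

    // On destruction, hand whatever the callee stored to the smart pointer.
    ~out_ptr_sketch() { owner_.reset(raw_); }

    // Lets the wrapper be passed where a T** is expected.
    operator T**() { return &raw_; }
};

inline int call_make_answer() {
    std::unique_ptr<int> p;
    // The temporary wrapper lives until the end of the full expression,
    // so p is updated right after make_answer returns.
    make_answer(out_ptr_sketch<int>(p));
    return *p;
}
```

The call site never touches a raw owning pointer: the wrapper exists only for the duration of the call, and the unique_ptr owns the result afterwards.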

P1659 by Christopher DiBella generalizes the C++20 starts_with and ends_with on string and string_view by adding the general forms ranges::starts_with and ranges::ends_with to C++23. These can work on arbitrary ranges, and also answer questions such as “are the starting elements of r1 less than the elements of r2?” and “are the final elements of r1 greater than the elements of r2?”.
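As a rough illustration of what the generalized form computes, here is a C++17 sketch built on std::mismatch. The name starts_with_sketch is hypothetical and this is not the C++23 ranges::starts_with interface (no projections, no sentinels); it only shows the underlying idea that r1 starts with r2 when the predicate holds pairwise until r2 is exhausted.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical helper: true if pred(e1, e2) holds for each element of r2
// paired with the corresponding leading element of r1.
template <typename R1, typename R2, typename Pred = std::equal_to<>>
bool starts_with_sketch(const R1& r1, const R2& r2, Pred pred = {}) {
    // std::mismatch stops at the first pair that fails pred, or when either
    // range ends; r1 "starts with" r2 only if all of r2 was consumed.
    return std::mismatch(std::begin(r1), std::end(r1),
                         std::begin(r2), std::end(r2), pred).second
           == std::end(r2);
}
```

With the default predicate this is the familiar prefix test; passing std::less<>{} instead asks whether the starting elements of r1 are each less than the corresponding elements of r2.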

P2166 by Yuriy Chernyshov helps reduce a commonly-taught pitfall with std::string. You know how since forever (C++98) you can construct a string from a string literal, like std::string("xyzzy")? But that you’d better watch out (and you’d better not cry or pout) not to pass a null pointer, like std::string(nullptr), because that’s undefined behavior where implementations aren’t required to check the pointer for null and can do just whatever they like, including crash? That’s still the case if you pass a pointer variable whose value is null (sorry!), but with this paper, as of C++23 at least now we have overloads that reject attempts to construct or assign a std::string from nullptr specifically, as a compile-time “d’oh! don’t do that.”
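The same technique is available to your own types today. Here is a small C++17 sketch using a hypothetical class named label (it is not part of any library) that deletes the std::nullptr_t overload, so that passing the literal nullptr is rejected at compile time:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical wrapper type, used only to illustrate the deleted overload.
class label {
    std::string text_;
public:
    label(const char* s) : text_(s) {}   // still undefined if s is a null pointer variable
    label(std::nullptr_t) = delete;      // label(nullptr) no longer compiles
    const std::string& text() const { return text_; }
};

// The literal nullptr is rejected at compile time; a const char* is accepted.
static_assert(std::is_constructible_v<label, const char*>);
static_assert(!std::is_constructible_v<label, std::nullptr_t>);
```

As with the std::string change, this catches only the literal nullptr; a const char* variable that happens to be null still compiles and still needs a runtime check.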

We also adopted a number of other issue resolutions and small papers that made additional improvements, including a number that will be backported retroactively to C++20. Quite a few were of the “oh, you didn’t know that rare case didn’t work? now it does” variety.

Other progress

We also approved work on a second Concurrency TS. Recall that a “TS” or “Technical Specification” is like doing work in a feature branch, which can later be merged into the C++ standards (trunk).

Two related pieces of work were approved to go into the Concurrency TS: P1121 and P1122 by Paul McKenney, Maged M. Michael, Michael Wong, Geoffrey Romer, Andrew Hunter, Arthur O’Dwyer, Daisy Hollman, JF Bastien, Hans Boehm, David Goldblatt, Frank Birbacher, Erik Rigtorp, Tomasz Kamiński, and Jens Maurer add support for hazard pointers and read-copy-update (RCU) which are useful in highly concurrent applications.

What’s next

We’re going to keep meeting virtually in subgroups, and then have at least one more virtual plenary session to adopt features into the C++23 working draft in October.

The next tentatively planned ISO C++ face-to-face meeting is February 2022 in Portland, OR, USA. (Per our C++23 schedule, this is the “feature freeze” deadline for design-approving new features targeting the C++23 standard, whether the meeting is physical or virtual.) Meeting in person next February continues to look promising – barring unexpected surprises, it’s possible that by that time most ISO C++ participating nations will have been able to resume local sports/theatre/concert events with normal audiences, and removed travel restrictions among each other, so that people from most nations will be able to participate at an in-person meeting. But we still have to wait and see… we likely won’t know for sure until well into the autumn, and so we’re still calling this one “tentative” for now. You can find a list of our meeting plans on the Upcoming Meetings page.

Thank you again to the hundreds of people who are working tirelessly on C++, even in our current altered world. Your flexibility and willingness to adjust are much appreciated by all of us in the committee and by all the C++ communities! Thank you, and see you on Zoom.

GotW #102 Solution: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

1. Briefly, what is the difference among:

(a) undefined behavior

Undefined behavior is what happens when your program tries to do something whose meaning is not defined at all in the C++ standard language or library (illegal code and/or data). A compiler is allowed to generate an executable that does anything at all, from data corruption (objects not meeting the requirements of their types) to injecting new code to reformat your hard drive if the program is run on a Tuesday, even if there’s nothing in your source code that could possibly reformat anything. Note that undefined behavior is a global property — it always applies not only to the undefined operation, but to the whole program. [1]

(b) unspecified behavior

Unspecified behavior is what happens when your program does something for which the C++ standard doesn’t document the results. You’ll get some valid result, but you won’t know what the result is until your code looks at it. A compiler is not allowed to give you a corrupted object or to inject new code to reformat your hard drive, not even on Tuesdays.

(c) implementation-defined behavior

Implementation-defined behavior is like unspecified behavior, where the implementation additionally is required to document what the actual result will be on this particular implementation. You can’t rely on a particular answer in portable code because another implementation could choose to do something different, but you can rely on what it will be on this compiler and platform.
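Here is a small sketch that puts the three categories side by side. The function names are made up for illustration, and the undefined case appears only in a comment, because executing it would make the whole example meaningless:

```cpp
#include <cassert>
#include <climits>

// Implementation-defined: the width of int varies, but every implementation
// documents its choice, so you can rely on it for that compiler and platform.
inline int bits_in_int() { return static_cast<int>(sizeof(int)) * CHAR_BIT; }

// Unspecified: the two calls to next_id() in unspecified_difference() may be
// evaluated in either order. Both outcomes are valid ints; we just cannot
// know which one we get without looking.
inline int next_id() { static int n = 0; return ++n; }
inline int difference(int a, int b) { return a - b; }
inline int unspecified_difference() { return difference(next_id(), next_id()); }

// Undefined: something like INT_MAX + 1 (signed overflow) has no defined
// meaning at all, so it is deliberately not executed here.
```

Portable code can rely on unspecified_difference() returning either 1 or -1, but not on which of the two.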

2. For each of the following, write a short function … where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

Easy peasy! Let’s dereference a null pointer:

// Example 2(a): If assert is violated, always undefined behavior

void deref_and_set( int* p ) {
    assert( p );
    *p = 42;
}

The function asserts that p is not null, and then on the next line unconditionally dereferences p and scribbles over the location it points to. If p is null and the assertion checking is off so that we can get to the next line, the compiler is allowed to make running the whole program format our hard drive.

(b) possibly results in undefined behavior

A general way to describe this class of program is that the call site has two bugs: first, it violates a precondition (so the callee’s results are always at least unspecified), and then it additionally uses the unspecified result without checking it and/or in a dangerous way.

To make up an example, let’s bisect a numeric range:

// Example 2(b): If assert is violated, might lead to undefined behavior

int midpoint( int low, int high ) {
    assert( low <= high );
    return low + (high-low)/2;
        // less overflow-prone than “(low+high)/2”
        // more accurate than “low/2 + high/2”
}

The author of midpoint could have made the function more robust to take the values in either order, and thus eliminated the assertion, but assume they had a reason not to, as alluded to in the comments.

Violating the assertion does not result in undefined behavior directly. The function just doesn’t specify (ahem!) its results if call sites call it in a way that violates the precondition the assertion is testing. If the precondition is violated, then the function can add a negative number to low. But just calculating and returning some other int is not (yet) undefined behavior.

For many call sites, a bad call to midpoint won’t lead to later undefined behavior.

However, it’s possible that some call site might go on to use the unspecified result in a way that does end up being real undefined behavior, such as using it as an array index that performs an out-of-bounds access:

auto m = midpoint( low_index(arr1), high_index(arr2) );   // unspecified
   // here we expect m >= low_index(arr1) ...
stats[m-low_index(arr1)]++;                 // --> potentially undefined

This call site code has a typo, and accidentally mixes the low and high indexes of unrelated containers, which can violate the precondition and result in an index that is less than the “low” value. Then in the next line it tries to use it as an offset index into an instrumentation statistics array, which is undefined behavior for a negative number.
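To see the numbers involved, here is a sketch that repeats midpoint’s arithmetic without the assertion, modeling a build where assertion checking is off. The names midpoint_unchecked and is_valid_offset are hypothetical, and the guard is just one way a call site could protect itself before indexing:

```cpp
#include <cassert>
#include <cstddef>

// Same arithmetic as midpoint in Example 2(b), minus the assert, to model
// what happens when assertion checking is disabled.
inline int midpoint_unchecked(int low, int high) {
    return low + (high - low) / 2;
}

// Hypothetical call-site guard: is (m - low) a usable offset into an array
// of `size` elements?
inline bool is_valid_offset(int m, int low, std::size_t size) {
    return m >= low && static_cast<std::size_t>(m - low) < size;
}
```

With low = 10 and high = 0, the function returns 10 + (-10)/2 = 5, a perfectly ordinary int. The trouble starts only when the call site computes m - low = -5 and uses it as a subscript.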

GUIDELINE: Remember that an unspecified result is not in itself undefined behavior, but a call site can run with it and end up with real undefined behavior later. This happens particularly when the calculated value is a pointer, or an integer used as an array index (which, remember, is basically the same thing; a pointer value is just an index into all available memory viewed as an array). If a program relies on unspecified behavior to avoid performing undefined behavior, then it has a path to undefined behavior, and so unspecified behavior is a Crouching Tiger, if you will… still dangerous, and can be turned into the full dragon.

GUIDELINE: Don’t specify your function’s behavior (output postconditions) for invalid inputs (precondition violations), except for defense in depth (see Example 2(c)). By definition, if a function’s preconditions are violated, then the results are not specified. If you specify the outputs for precondition violations, then (a) callers will depend on the outputs, and (b) those “preconditions” aren’t really preconditions at all.

While we’re at it, here’s a second example: Let’s compare pointers in a way the C++ standard says is unspecified. This program attempts to use pointer comparisons to see whether a pointer points into the contiguous data stored in a vector, but this technique doesn’t work because today’s C++ standard only specifies the results of raw pointer comparison when the pointers point at (into, or one-past-the-end of) the same allocation, and so when ptr is not pointing into v’s buffer it’s unspecified whether either pointer comparison in this test evaluates to false:

// Example 2(b)(ii): If assert is violated, might lead to undefined behavior

// std::vector<int> v = ...;
assert(&v[0] <= ptr && ptr < (&v[0])+v.size());           // unspecified
*ptr = 42;                                  // --> potentially undefined
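If you really need this test, one option that avoids relational comparisons entirely is to compare for equality against each element’s address. This is only a sketch under stated assumptions: points_into is a hypothetical helper, it is linear in the size of the vector, and it assumes ptr is either null or points to an actual int object.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if ptr is the address of one of v's elements.
// Uses only == on pointers, never < across unrelated allocations.
inline bool points_into(const std::vector<int>& v, const int* ptr) {
    return std::any_of(v.begin(), v.end(),
                       [ptr](const int& element) { return &element == ptr; });
}
```

The cost is O(n) instead of two comparisons, so this is a trade of speed for a result the standard actually specifies.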

(c) is never undefined or unspecified behavior

An assertion violation is never undefined behavior if the function specifies what happens in every case even when the assertion is violated. Here’s an example mentioned in my paper P2064, distilled from real-world code:

// Example 2(c): If assert is violated, never undefined behavior
//               (function documents its result when x==0)

some_result_value DoSomething( int x ) {
    assert( x != 0 );
    if    ( x == 0 ) { return error_value; }
    return sensible_result(x);
}

The function asserts that the parameter is not zero, to express that the call site shouldn’t do that, in a way the call site can check and test… but then it also immediately turns around and checks for the errant value and takes a well-defined fallback path anyway even if it does happen. Why? This is an example of “defense in depth,” and can be a useful technique for writing robust software. This means that even though the assertion may be violated, we are always still in a well-defined state and so this violation does not lead to undefined behavior.

GUIDELINE: Remember that violating an assertion does not necessarily lead to undefined behavior.

GUIDELINE: Function authors, always document your function’s requirements on inputs (preconditions). The caller needs to know what inputs are and aren’t valid. The requirements that are reasonably checkable should be written as code so that the caller can perform the checks when testing their code.

GUIDELINE: Always satisfy the requirements of a function you call. Otherwise, you are feeding “garbage in,” and the best you can hope for is “garbage out.” Make sure your code’s tests include verifying all the reasonably checkable preconditions of functions that it calls.

Writing the above pattern has two problems: First, it repeats the condition, which invites copy/paste errors. Second, it makes life harder for static analysis tools, which often trust assertions to be true in order to reduce false positive results, but then will think the fallback path is unreachable and so won’t properly analyze that path. So it’s better to use a helper to express the “either assert this or check it and do a fallback operation” in one shot, which always avoids repeating the condition, and could in principle help static analysis tools that are aware of this macro. Yes, it would be nicer to do it without resorting to a macro, but it’s annoyingly difficult to write the early return without one, because a return statement inside a lambda returns from the lambda, not from the enclosing function. Here is the helper:

// Using a helper that asserts the condition or performs the fallback

#define ASSERT_OR_FALLBACK(B, ACTION) { \
    bool b = (B);                       \
    assert(b);                          \
    if(!b) ACTION;                      \
}

some_result_value DoSomething( int x ) {
    ASSERT_OR_FALLBACK( x != 0, return error_value; );
    return sensible_result(x);
}

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

In Example 2(a), violating the assertion leads to undefined behavior, 1(a).

In Example 2(b), violating the assertion leads to unspecified behavior, 1(b). At buggy call sites, this could subsequently lead to undefined behavior.

In Example 2(c), violating the assertion leads to implementation-defined behavior, 1(c), which never in itself leads to undefined behavior.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.

There are many. Here is just one example, that happens to be nice because it is perfectly accurate.

Let’s say we have all the code examples in question 2, written using C assert today (or even with those assertions missing!), and then at some future time we get a version of standard C++ that can express them as preconditions. Then only in Example 2(a), where we can see that the function body (and possibly transitively its further callees with the help of inlining) exercises undefined behavior, a tool can infer the precondition annotation and add it mechanically, and get the benefit of diagnosing existing bugs at call sites:

// What a precondition-aware tool could generate for Example 2(a)

auto f( int* p ) 
    [[pre( p )]]  // can add this automatically: because a violation
                  // leads to undefined behavior, this precondition
                  // is guaranteed to never cause a false positive
{
    assert( p );
    *p = 42;
}

For example, after some future C++2x ships with contracts, a vendor could write an automated tool that goes through every open source C++ project on GitHub and mechanically generates a pull request to insert preconditions for functions like Example 2(a) – but not (b) or (c) – whether or not the assertion already exists, just by noticing the undefined behavior. And it can inject those contract preconditions with complete confidence that none of them will ever cause a false positive, that they will purely expose existing bugs at call sites when that call site is built with contract checking enabled. I would expect such a tool to identify a good number of (at least latent if not actual) bugs, and be a boon for C++ users, and it’s possible only for functions in the category of 2(a).

“Automated adoption” of at least part of a new C++ feature, combined with “automatically identifies existing bugs” in today’s code, is a pretty good value proposition.

Acknowledgments

Thank you to the following for their comments on this material: Joshua Berne, Gabriel Dos Reis, Gábor Horváth, Andrzej Krzemieński, Ville Voutilainen.

Notes

[1] In the standard, there are two flavors of undefined behavior. The basic “undefined behavior” is allowed to enter your program only once you actually try to execute the undefined part. But some code is so extremely ill-formed (with magical names like “IF-NDR”) that its very existence in the program makes the entire program invalid, whether you try to execute it or not.

GotW #102: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

JG Question

1. Briefly, what is the difference among:

(a) undefined behavior

(b) unspecified behavior

(c) implementation-defined behavior

Guru Questions

2. For each of the following, write a short function of the form:

/*...function name and signature...*/
{
    assert( /*...some condition about the parameters...*/ );
    /*...do something with parameters...*/;
}

where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

(b) possibly results in undefined behavior

(c) is never undefined or unspecified behavior

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.

GotW #101 Solution: Preconditions, Part 2 (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. We covered some basics of preconditions in GotW #100. This time, let’s see how we can use preconditions in some practical examples…

1. Consider these functions, expanded from an article by Andrzej Krzemieński: [1] … How many ways could a caller of each function get the arguments wrong, but that would silently compile without error? Name as many different ways as you can.

There are several ways to break this down. I’ll use three major categories of possible mistakes, the first two of which overlap:

  • wrong order: passing an argument in the wrong position
  • wrong value: passing an argument with a valid but wrong value (e.g., index out of range)
  • invalid value: passing an argument that is already invalid (e.g., an invalid iterator)

Let’s see how these play out with our three examples, starting with (a).

(a) is_in_values (int val, int min, int max)

// Example 1: Adapted from [1]

auto is_in_values (int val, int min, int max)
  -> bool;  // true iff val is in the values [min, max]

Oh my, three identically typed integer parameters… what could be confusing about that?!

Wrong order (5 ways): First, there are five ways to pass these in the wrong order, because there are 3! = 6 permutations, all of which compile but only the first of which is correct:

is_in_values( v,  lo, hi );    // correct

is_in_values( v,  hi, lo );    // all these are wrong, but compile :(
is_in_values( lo, v,  hi ); 
is_in_values( lo, hi, v  ); 
is_in_values( hi, v,  lo ); 
is_in_values( hi, lo, v  );

Some of these argument orders may seem strange, but some are orders that other libraries’ similar APIs might use, which makes confusion easier. We all make mistakes… and the type system isn’t helping us at all.

Wrong value (1 way): Second, there is an implicit precondition that min <= max, so passing arguments where min > max would be wrong, but would silently compile. Some of these are exercised by the “wrong order” permutations above, but even call sites that remember the right argument order can make mistakes about the actual values.

Invalid value (0 ways): Finally, all possible values of an int are valid — some may be suspiciously big or small, but int doesn’t have the concept of “not a number” (NaN) as we have with floats, or the concept of “invalidated” like we have with iterators.

(b) is_in_container (int val, int idx_min, int idx_max)

It sure doesn’t help that the next function has the same signature as is_in_values, but with a very different meaning:

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool; // true iff container[i]==val for some i in [idx_min, idx_max]

Wrong order (5 ways): As in (a), we again have five ways to pass these in the wrong order, all of which compile but only the first is correct:

is_in_container( v,  lo, hi );    // correct

is_in_container( v,  hi, lo );    // all these are wrong, but compile :(
is_in_container( lo, v,  hi ); 
is_in_container( lo, hi, v  ); 
is_in_container( hi, v,  lo ); 
is_in_container( hi, lo, v  );

Wrong value (3 ways): Again as in (a), we have the implicit precondition that idx_min <= idx_max, so passing idx_min > idx_max would be wrong, but would silently compile. But this time there are two additional ways to go wrong, because idx_min and idx_max must both be valid subscripts into container, so if either is outside the range [0, container.size()) it is a valid integer but an out of bounds value for this use.

Invalid value (0 ways): Again as in (a), all possible values of an int are valid — though some may be wrong values if they’re out of bounds as we noted above, they’re still valid integers.

(c) is_in_range (T val, Iter first, Iter last)

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool; // true iff *i==val for some i in [first,last)

Wrong order (1 way): This time there’s only one way to pass the parameters in the wrong order (ignoring pathological cases where the same argument might convert to both T and Iter):

is_in_range( v, istart, iend );    // correct

is_in_range( v, iend, istart );    // wrong, but compiles :(

Wrong value (2 ways): We could pass a first and last that are not a valid range in two ways:

  • they point into the same container, but first doesn’t precede last
  • they point into different containers

Invalid value (2 ways): And finally, either of first or last could actually be an invalidated iterator (e.g., dangling). For example, the container they point into may be destroyed so that both are invalid; or one of the two iterators might have been calculated before a more recent operation like vector::push_back that could have invalidated it.

But if the sight of these function signatures has had you pulling your hair and shouting “use the type system, Luke!” at your screen, you’re not alone… now let’s make things better.

2. Show how can you improve the function declarations in Question 1 by …

(a) just grouping parameters, using a struct with public variables

Interestingly, we actually get a lot of benefit simply by grouping ‘parameters that go together,’ by creating an aggregate or “grouping” helper struct.[3] For example:

// Example 2(a)(i): Improving Example 1 with aggregate types

struct min_max { int min, max; };

auto is_in_values (int val, min_max minmax) -> bool;
auto is_in_container (int val, min_max rng) -> bool;

template <typename Iter> struct two_iters { Iter first, last; };

template <typename T, typename Iter>
auto is_in_range (T val, two_iters<Iter> rng) -> bool;

Or even just venerable anonymous std::pair is better than no grouping:

// Example 2(a)(ii): Improving Example 1 with std::pair

auto is_in_values (int val, std::pair<int,int> minmax) -> bool;
auto is_in_container (int val, std::pair<int,int> rng) -> bool;

template <typename T, typename Iter>
auto is_in_range (T val, std::pair<Iter,Iter> rng) -> bool;

With either of the above, there’s only one way for callers to get the argument order wrong. And it requires only two extra characters at call sites, because we can use { } to group the arguments without creating actual named objects of the helper struct:

is_in_values( v, {lo, hi} );       // correct
is_in_values( v, {hi, lo} );       // wrong, but compiles

is_in_container( v, {lo, hi} );    // correct
is_in_container( v, {hi, lo} );    // wrong, but compiles

is_in_range( v, {i1, i2} );        // correct
is_in_range( v, {i2, i1} );        // wrong, but compiles

So just grouping parameters using a struct eliminates some errors. But really using the type system is even better…
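For completeness, here is a minimal runnable sketch of the grouped version of is_in_values. The function body is my own assumption (the declarations above don’t show one); it treats [min, max] as a closed interval, as in the original comment, and keeps the implicit min <= max precondition as an assertion.

```cpp
#include <cassert>

// Aggregate that groups the two bounds, as in Example 2(a)(i).
struct min_max { int min, max; };

// Assumed implementation: true iff val is in the closed interval [min, max].
inline bool is_in_values(int val, min_max minmax) {
    assert(minmax.min <= minmax.max);   // the precondition is still the caller's job
    return minmax.min <= val && val <= minmax.max;
}
```

Note that the struct does nothing to stop {hi, lo}: the grouping removes most misorderings, but the remaining one is still caught only by the assertion.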

(b) just using an encapsulated class, using a class with private variables (an abstraction with its own invariant)

Clearly all three functions are crying out for a “range”-like abstraction for its pair of parameters, in the first two cases a range of values and in the third a range of iterators. How do we know? Because:

Here’s one way we can apply class types we can find in the standard library or Boost today:

// Example 2(b): Improving Example 1 with encapsulated class types

auto is_in_values (int val, boost::integer_range<int> rng) -> bool;

auto is_in_container (int val, boost::integer_range<int> rng) -> bool;

template <typename T, std::ranges::input_range Range>
auto is_in_range (T val, Range&& rng) -> bool;

This gives us all the mistake-reduction goodness we got in (a), plus more.

First, as in (a), absent pathological conversions, it’s very difficult to get arguments in the wrong order simply because of being forced to group the parameters:

auto minmax = boost::irange(10, 100);
is_in_values( 42, minmax );

auto minmax2 = boost::irange(0, ssize(myvec)-1);
is_in_container( 42, minmax2 );

auto myvec = std::vector<int>();
is_in_range( 42, myvec );

But, unlike our helper structs in (a), we now get additional safety because the types can express constructor preconditions that move some of those mistakes (such as (hi,lo) misordering) to constructors of class abstractions that can then preserve them as invariants [4] – so the mistake can still be made but in fewer places, to where we construct or modify the abstracted object (e.g., range), rather than every time we use un-abstracted separate values (e.g., a couple of iterator objects we have lying around and whose relationship we have to maintain by hand over time). This is why we sometimes say “types are predicates,” because a type encapsulates a predicate, namely its invariant.

GUIDELINE: When multiple functions state the same precondition, it’s a telltale sign there’s a missing class that should turn it into an invariant. A repeated precondition is nearly always a “naked invariant” that should be encapsulated up inside a type. This is more obvious when the precondition involves multiple parameters (or ordinary variables for that matter); a poster child is the STL’s pervasive use of iterator pairs, which have long been crying out to be encapsulated using a range abstraction, and fortunately we now have that in C++20. Consider using a class instead.

GUIDELINE: Remember that a key reason why encapsulated classes are powerful is that they wrap up preconditions and turn them into invariants. Hiding data members is good dependency management because it limits the code that can depend on the details of the data and is responsible for maintaining the correct relationship among the data members.
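Here is a minimal sketch of that idea with a hand-written type. closed_int_range is a hypothetical class (it is not boost::integer_range), and its member names are my own; the point is only that the low <= high check is written once, in the constructor, and every function taking the type can rely on it.

```cpp
#include <cassert>

// Hypothetical encapsulated range: the constructor checks the precondition
// once, and because the members are private it then holds as an invariant.
class closed_int_range {
    int low_, high_;
public:
    closed_int_range(int low, int high) : low_(low), high_(high) {
        assert(low_ <= high_);   // constructor precondition, becomes the invariant
    }
    int low()  const { return low_; }
    int high() const { return high_; }
    bool contains(int val) const { return low_ <= val && val <= high_; }
};

// No precondition to repeat here: a closed_int_range is always well-formed.
inline bool is_in_values(int val, closed_int_range rng) {
    return rng.contains(val);
}
```

Since nothing outside the class can modify low_ or high_, no later code can break the relationship between them.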

(c) just using post-C++20 contract preconditions (not yet valid C++, but something like the syntax in [2])

Preconditions test values, so they can let us eliminate the “wrong values” kinds of mistakes. Consider this code:

// Example 2(c): Improving Example 1 with boolean preconditions

auto is_in_values (int val, int min, int max)
  -> bool // true iff val is in the values [min, max]
     [[pre (min <= max)]]
;

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool // true iff container[i]==val for i in [idx_min, idx_max]
     [[pre (0       <= idx_min
         && idx_min <= idx_max
         && idx_max <  container.size())]]          // see note [5]
;

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool // true iff *i==val for some i in [first,last)
     [[pre (/*... is_reachable? is_not_dangling? hmm ...*/)]]
;

For the first two functions, we can write clear preconditions that can check the “wrong value” bugs.
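Until contract syntax is available, the same checks can be approximated with assert in the function body. This is a sketch under assumptions: the original declaration doesn’t say where container comes from, so here it is passed as an explicit std::vector<int> parameter, and the search loop is my own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed signature: the container is an explicit parameter for illustration.
inline bool is_in_container(const std::vector<int>& container,
                            int val, int idx_min, int idx_max) {
    // The three conditions from the [[pre]] in Example 2(c), one per assert
    // so a failure message points at the specific violated condition.
    assert(0 <= idx_min);
    assert(idx_min <= idx_max);
    assert(idx_max < static_cast<int>(container.size()));

    // true iff container[i] == val for some i in [idx_min, idx_max]
    for (int i = idx_min; i <= idx_max; ++i) {
        if (container[static_cast<std::size_t>(i)] == val) return true;
    }
    return false;
}
```

Unlike a declared precondition, these checks are invisible to callers and tools that see only the declaration, which is part of the motivation for language-level contracts.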

In these particular examples, the best place to write the preconditions is right on the constructors of the class types we saw in (b), and if we write them there then we don’t have to repeat them as explicit contracts on every function.

But is (b) always better than (c), in other examples? This brings us to our last question, which is all about “can” versus “should”…

3. Consider these three examples, where each shows expressing a boolean condition either as a function precondition or as an encapsulated invariant inside a new type… In each of these cases, which way is better? Explain your answer.

In Question 2, writing a type was often the best choice, but it isn’t always.

The benefits to writing a type include:

  • Encapsulation. We limit the code that is responsible for maintaining the boolean condition.
  • Language support. We get the help of the type system to statically enforce requirements.

But there are costs and limitations too:

  • What’s the abstraction? There may not be a suitable one. We can’t write a good type unless we can discover a useful abstraction that the type’s interface should support. A good type represents a useful reusable domain abstraction that programmers can understand and that makes their code clearer by elevating the vocabulary of the code. There won’t always be a practical and reusable abstraction; when there isn’t, we won’t be able to write a useful and reusable type. — Even when there is, we have to design that all ahead of time, which requires a lot more advance knowledge and engineering than just writing ad-hoc boolean conditions on individual functions.
  • What’s the cost? It may not be feasible to maintain the invariant. We have to do any extra work it takes to maintain the invariant, and it has to be practical to do. When it isn’t, we can’t maintain the invariant without help from outside code, and so we won’t be able to really encapsulate it properly.
  • Does it make sense as an independent abstraction? Will the user be carrying around objects of this type, or are we just jamming a precondition common to a few functions (or only one) into a type and calling it useful? Occam’s Razor: Don’t multiply entities beyond necessity.
  • What’s the type the caller is using? This is where a real usable abstraction shines, because many callers will be using it independently of calling our function. But if the caller isn’t using this type, then there typically has to be an implicit or explicit conversion (because inheritance from all argument types our callers might already have usually isn’t an option), and that conversion would need to be usable and sufficiently cheap.

GUIDELINE: Remember that types and contracts are “better together.” Use both. They are complementary, neither is a substitute for the other. All we are trying to accomplish with contracts is to augment the language’s static type checking with runtime checking where that is more appropriate because we can’t design a practical abstraction. And this is why we want contracts on functions (preconditions, postconditions) even though we already have types, and why we also want contracts on types (invariants).

Let’s consider the three examples.

(a) A vector that is sorted

template <typename T>
void f( vector<T> const& v ) [[pre( is_sorted(v) )]] ;

template <typename T>
void f( sorted<vector<T>> const& v );

If this looks familiar, it’s because is_sorted is one of the classic examples we saw in GotW #98 of conditions that are often impractical to check and enforce as an assertion, in this case a precondition.

Can we do better by making it a type, perhaps a sorted wrapper around a container like vector that maintains the guarantee that it’s always sorted? Well, we have to answer some questions about a sorted<T>:

  • What’s the abstraction it provides? It can’t easily fulfill the requirements of a sequence container like vector itself; for example, push_back doesn’t make much sense because letting the caller insert an arbitrary value at a specific location would easily cause the container to be unsorted. Instead, it would naturally want a more general insert function, and the interface would be more like set. This part could be workable.
  • What’s the cost? This is where it starts to break down: Keeping a vector sorted all the time means that every insertion would cost O(N) work all the time. Which leads into…
  • Does it make sense as an independent abstraction? … that it’s very common for code to maintain an “almost-sorted” vector, such as by inserting new elements at the end which is fast (and, hmm, affects our abstraction design, because then it would make sense to have push_back after all, wouldn’t it? hmm) but leaves a suffix of unsorted elements in the container, and then periodically sorting the whole container so that the sorting cost is amortized. But an almost-sorted vector isn’t good enough, and so doesn’t fit the bill. We don’t have empirical evidence of such types in general use.
  • What’s the type the caller is using? And now we’re busted all the way, because we want this interface to be usable by anyone who has a vector<T>, which would require a conversion to sorted<vector<T>>. If we do a deep copy, that’s prohibitively expensive. Even if the conversion is lightweight by avoiding a deep copy, such as by just wrapping an existing vector object, it wouldn’t be very useful unless it did O(N) work every time unconditionally to verify the invariant. And even then the abstraction design is affected and compromised: If the user can still see and modify the original vector, then that’s still part of the accessible interface to the data, so the user can make the container be not fully sorted and we’re unable to really encapsulate and maintain our intended invariant.

So is_sorted is much better as a function precondition.
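To make the cost trade-off concrete, here is a rough sketch that approximates the precondition with assert in today's C++. The function name contains_sorted is an assumption for illustration. Note that the check is O(N) while the work it protects is only O(log N), which is one reason this kind of condition is usually checked only in test builds:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical function for illustration only.
template <typename T>
bool contains_sorted(std::vector<T> const& v, T const& val) {
    // Approximates [[pre( is_sorted(v) )]]: an O(N) check...
    assert(std::is_sorted(v.begin(), v.end()));
    // ...guarding an O(log N) search that is only meaningful on sorted input.
    return std::binary_search(v.begin(), v.end(), val);
}
```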

(b) A vector that is not empty

template <typename T>
void f( vector<T> const& v ) [[pre( !v.empty() )]] ;

template <typename T>
void f( not_empty<vector<T>> const& v );

This one is more feasible as a type, but still not ideal:

  • What’s the abstraction it provides? It’s a vector, and we can make the interface identical to vector with just extra preconditions on pop and erase functions to not remove the last element in the container.
  • What’s the cost? Emptiness is cheap enough to check and maintain.
  • Does it make sense as an independent abstraction? This is where it starts to get questionable… the answer is at best “maybe.” It’s not clear to me that a “nonempty vector” is a generally useful abstraction.
  • What’s the type the caller is using? This is where I think we break down again. Again, we want this interface to be usable by anyone who has a vector<T>, and that means a conversion to not_empty<vector<T>>. If we do a deep copy, that’s prohibitively expensive. This time if we just wrap an existing vector object to avoid the deep copy, the check is cheap. But then we still have the problem that the abstraction design is affected and compromised so that it can’t maintain its invariant, because if the user can still see and modify the original vector, they can remove the last element on us.

So not_empty seems better as a function precondition.
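Here is a small sketch of the wrapping problem described in the last bullet. The name not_empty_view and its members are hypothetical; the point is only that a wrapper around someone else's vector can check the condition at construction but cannot keep it true afterward:

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-owning wrapper for illustration only.
template <typename T>
class not_empty_view {
    std::vector<T> const& v_;   // refers to a vector the caller still owns
public:
    explicit not_empty_view(std::vector<T> const& v) : v_{v} {
        assert(!v_.empty());    // cheap check, but only at construction time
    }
    T const& front() const { return v_.front(); }

    // Exposed only so the sketch can show the invariant being broken from outside.
    bool invariant_holds() const { return !v_.empty(); }
};
```

If the caller writes std::vector<int> v{1}; not_empty_view<int> ne{v}; and then calls v.clear(), the wrapper's intended invariant is silently false, and a later ne.front() would be undefined behavior.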

(c) A pointer that is not null

void f( int* p ) [[pre( p != nullptr )]] ;

void f( not_null<int*> p );

This time we can do better:

  • What’s the abstraction it provides? This one’s easy to state: It’s a not-null pointer. That’s a far simpler interface than a container, because we just need operator* and operator->, construction, destruction, and copying. Even so it’s not totally without subtlety, because not_null should not have move operations that modify the source object. This means that a not_null<unique_ptr<T>> is legal but there’s not much you can do with it besides dereference it and destroy it: It can’t be copyable because unique_ptr isn’t copyable, and it must not be movable because moving a unique_ptr leaves the source null.
  • What’s the cost? Nullness is cheap enough to check and maintain.
  • Does it make sense as an independent abstraction? Definitely. A “non-null pointer” has been widely rediscovered and reinvented as a generally useful abstraction.
  • What’s the type the caller is using? A not_null<int*> is a useful object in its own right in the calling code, independently of calling this particular function. And if our function is invoked by someone who has only an ordinary int*, doing a full copy of the pointer is cheap, and applying the nullness check as a precondition on that converting constructor is exactly equivalent to writing the precondition by hand, but is automated.

So not_null seems better as a type, primarily because it is independently useful. This is why it has been reinvented a number of times, including as gsl::not_null. [6]
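For comparison, a minimal sketch of such a type for raw pointers might look like the following. This is not gsl::not_null; the name not_null_ptr and the function read_value are assumptions made up for this illustration, and assert stands in for a real precondition on the converting constructor:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical type for illustration only (raw pointers only).
template <typename T>
class not_null_ptr {
    T* p_;
public:
    // Implicit on purpose, so callers holding a plain T* can still call us;
    // the nullness check runs once, right here.
    not_null_ptr(T* p) : p_{p} { assert(p_ != nullptr); }

    // Passing a literal nullptr is rejected at compile time.
    not_null_ptr(std::nullptr_t) = delete;

    T& operator*()  const { return *p_; }
    T* operator->() const { return p_; }
};

static_assert(!std::is_constructible_v<not_null_ptr<int>, std::nullptr_t>);

// The parameter type now carries the precondition for us.
int read_value(not_null_ptr<int> p) { return *p; }
```

With this sketch, int x = 42; read_value(&x); converts and checks the pointer at the call boundary, which is the same test we would otherwise have written by hand as p != nullptr.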

GUIDELINE: Wherever practical, design interfaces so that incorrect call sites are illegal (won’t compile, using the type system) or loud (won’t pass unit tests, using preconditions). This is a key part of achieving the goal to “make interfaces easy to use correctly, and hard to use incorrectly.” Preconditions directly help with that by letting us catch entire groups of errors at test time, and are a complement to the type system which makes incorrect uses “not fit” through the compiler and also carries extra preconditions around for us in the form of invariants.

GUIDELINE: Remember that the type system is a hammer, and not every precondition is a nail. The type system is a powerful tool, but not every precondition is naturally (part of) an invariant of a useful type that provides a good reusable abstraction that’s generally useful independently of this function.

Notes

[1] A. Krzemieński. “Contracts, preconditions and invariants” (Andrzej’s C++ blog, December 2020).

[2] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ), and to name the return value _return_ for postconditions. That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

[3] For 2(a) and 2(b), on platform ABIs that do not pass small structs/classes in registers, turning individual parameters into a struct/class could cause them to be passed in stack memory instead of in registers.

[4] Upcoming GotWs will cover invariants and violation handling.

[5] If C++ gets chained comparisons as proposed in P0515 and P0893 we could write this much more clearly, and with fewer opportunities for mistakes, as:

[[pre( 0 <= idx_min <= idx_max < container.size() )]]

[6] B. Stroustrup and H. Sutter (eds.) “I.12 Declare a pointer that must not be null as not_null” (C++ Core Guidelines.) If the not_null<T> type we are using is implicitly convertible from T, which is the intent of I.12 to provide a drop-in replacement for pointer parameters, then the usability is the same as with the precondition. Otherwise, the caller has to provide a not_null argument at the call site, either by doing an explicit conversion or by just using a not_null local variable in their own body.

Acknowledgments

Thank you to the following for their feedback on this material: Joshua Berne, Gabriel Dos Reis, J. Daniel Garcia, Gábor Horváth, Andrzej Krzemieński, Bjarne Stroustrup, Andrew Sutton, Ville Voutilainen.

GotW #101: Preconditions, Part 2 (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. We covered some basics of preconditions in GotW #100. This time, let’s see how we can use preconditions in some practical examples…

JG Question

1. Consider these functions, expanded from an article by Andrzej Krzemieński: [1]

// Adapted from [1]

auto is_in_values (int val, int min, int max)
  -> bool; // true iff val is in the values [min, max]

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool; // true iff container[i]==val for some i in [idx_min, idx_max]

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool; // true iff *i==val for some i in [first,last)

How many ways could a caller of each function get the arguments wrong, but that would silently compile without error? Name as many different ways as you can.

Guru Questions

2. Show how you can improve the function declarations in Question 1 by:

(a) just grouping parameters, using a struct with public variables

(b) just using an encapsulated class, using a class with private variables (an abstraction with its own invariant)

(c) just using post-C++20 contract preconditions (not yet valid C++, but something like the syntax in [2])

In each case, how many of the possible kinds of mistakes for each function can the approach prevent?

3. Consider these three examples, where each shows expressing a boolean condition either as a function precondition or as an encapsulated invariant inside a new type:

// (a) A vector that is sorted

template <typename T>
void f( vector<T> const& v ) [[pre( is_sorted(v) )]] ;

template <typename T>
void f( sorted<vector<T>> const& v );


// (b) A vector that is not empty

template <typename T>
void f( vector<T> const& v ) [[pre( !v.empty() )]] ;

template <typename T>
void f( not_empty<vector<T>> const& v );


// (c) A pointer that is not null

void f( int* p ) [[pre( p != nullptr )]] ;

void f( not_null<int*> p );

In each of these cases, which way is better? Explain your answer.

Notes

[1] A. Krzemieński. “Contracts, preconditions and invariants” (Andrzej’s C++ blog, December 2020).

[2] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ). That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

GotW #100 Solution: Preconditions, Part 1 (Difficulty: 8/10)

This special Guru of the Week series focuses on contracts. We’ve seen how postconditions are directly related to assertions (see GotWs #97 and #99). So are preconditions, but in one important way that makes them fundamentally different. What is that? And why would having language support benefit us even more for writing preconditions than for the other two?

1. What is a precondition, and how is it related to an assertion?

A precondition is a “call site prerequisite on the inputs”: a condition that must be true at each call site before the caller can invoke this function. In math terms, it’s about expressing the domain of recognized inputs for the function. If preconditions don’t hold, the function can’t possibly do its work (achieve its postconditions), because the caller hasn’t given it a starting point it understands.

A precondition IS-AN assertion in every way described in GotW #97, with the special addition that whereas a general assertion is always checked where it is written, a precondition is written on the function and conceptually checked at every call site. (In less-ideal implementations, including if we write it as a library today, the precondition check might be in the function body; see Question 2.)

Explain your answer using the following example, which uses a variation of a proposed post-C++20 syntax for preconditions. [1]

// Example 1(a): A precondition along the lines proposed in [1]

void f( int min, int max )
    [[pre( min <= max )]]
{
    // ...
}

The above would be roughly equivalent to writing the test before the call at every call site instead. For example, for a call site that performs f(x, y), we want to check the precondition at this specific call site at least when it is being tested (and possibly earlier and/or later, see GotW #97 Question 4):

// Example 1(b): What a compiler might generate at a call site
//               “f(x, y)” for the precondition in Example 1(a)

assert( x <= y ); // implicitly injected assertion at this call site,
                  // checked (at least) when this call site is tested
f(x, y);

And, as we’ll see in Question 4, language support for preconditions should apply this rewrite recursively for subexpressions that are themselves function calls with preconditions.

GUIDELINE: Use a precondition to write “this is what a bug is” as code the caller can check. A precondition states in code the circumstances under which this function’s behavior is not defined.

2. Rewrite the example in Question 1 to show how to approximate the same effect using assertions in today’s C++.

Here’s one way we can do it, that extends the MY_POST technique from GotW #99 Example 2 to also support preconditions. Again, instead of MY_ you’d use your company’s preferred unique macro prefix: [2]

// Eliminate forward-boilerplate with a macro (written only once)
#define MY_PRE_POST(preconditions, postconditions)         \
    assert( preconditions );                               \
    auto post = [&](auto&& _return_) -> auto&& {           \
        assert( postconditions );                          \
        return std::forward<decltype(_return_)>(_return_); \
    };

And then the programmer can just write:

// Example 2: Sample precondition

void f( int min, int max )
{   MY_PRE_POST_V( min <= max, true ); // true == no postconditions here
    // ...
}

This has the big benefit that it works using today’s C++. It has the same advantages as MY_POST in GotW #99, including that it’s future-friendly… if we use the macro as shown above, then if in the future C++ has language support for preconditions and postconditions with a syntax like [1], migrating your code to that could be as simple as search-and-replace:

Search for:    { MY_PRE_POST( **, * )
Replace with:  [[pre: ** ]] [[post _return_: * ]] {

Search for:    return post( * )
Replace with:  return *

GUIDELINE (extended from GotW #99): If you don’t already use a way to write preconditions and postconditions as code, consider trying something like MY_PRE_POST until language support is available. It’s legal C++ today, it’s not terrible, and it’s future-friendly to adopting future C++ language contracts.
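To see the macro carry a precondition and a postcondition together, here is a minimal sketch that reuses the MY_PRE_POST definition from above on a function that returns a value. The function clamp_to is a made-up example, not something from the earlier GotWs:

```cpp
#include <cassert>
#include <utility>

// Same macro as shown above (written only once per code base)
#define MY_PRE_POST(preconditions, postconditions)         \
    assert( preconditions );                               \
    auto post = [&](auto&& _return_) -> auto&& {           \
        assert( postconditions );                          \
        return std::forward<decltype(_return_)>(_return_); \
    };

// Hypothetical example function
int clamp_to( int val, int min, int max )
{   MY_PRE_POST( min <= max,                            // precondition
                 min <= _return_ && _return_ <= max );  // postcondition
    if (val < min) return post(min);
    if (val > max) return post(max);
    return post(val);
}
```

Every return goes through post, so the postcondition is checked on each path when assertions are enabled; with NDEBUG defined, both checks compile away.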

But even if macros don’t trigger your fight-or-flight response, it’s still a far cry from language support …

Are there any drawbacks to your solution compared to having language support for preconditions?

Yes:

  • Callee-body checking only. This method can run the check only inside the function’s body. First, this means we can’t easily perform the check at each call site, which would be ideal including so we can turn the check on for one call site but not another when we are testing a specific caller. Second, for constructors it can’t run at the very beginning of construction because member initialization happens before we enter the constructor body.
  • Doesn’t directly handle nested preconditions, meaning preconditions of functions invoked as part of the precondition itself. We’ll come to this in Question 4.

3. If a precondition fails, what does that indicate, and who is responsible for fixing the failure?

Each call site is responsible for making sure it meets all of a function’s preconditions before calling that function. If a precondition is false, it’s a bug in the calling code, and it’s the calling code author who is responsible for fixing it.

Explain how this makes a precondition fundamentally different from every other kind of contract.

A precondition is the only kind of contract you can write that someone else has to fulfill, and so if it’s ever false then it’s someone else’s fault — it’s the caller’s bug that they need to go fix.

GUIDELINE: Remember the fundamental way preconditions are unique… if they’re false, then it’s someone else’s fault (the calling code author). When you write any of the other contracts (assertions, function postconditions, class invariants), you state something that must be true about your own function or class, and if prior contracts were written and well tested then likely it’s your function or class that created the first unexpected state.

4. Consider this example, expanded from a suggestion by Gábor Horváth:

// Example 4(a): What are the implicit preconditions?

auto calc( std::vector<int> const&  x ,
           std::floating_point auto y ) -> double
    [[pre( x[0] <= std::sqrt(y) )]] ;

Note that std::floating_point is a C++20 concept.

a) What kinds of preconditions must a caller of calc satisfy that can’t generally be written as testable boolean expressions?

The language requires the number and types of arguments to match the parameter list. Here, calc must be called with two arguments. The first must be a std::vector<int> or something convertible to that. The second one’s type has to satisfy the floating_point concept (it must be float, double, or long double).

It’s worth remembering that these language-enforced rules are conceptually part of the function’s precondition, in the sense that they are requirements on call sites. Even though we generally can’t write testable boolean predicates for these to check that we didn’t write a bug, we also never need to do that because if we write a bug the code just won’t compile. [3] Code that is “correct by construction” doesn’t need to add assertions to find potential bugs.

GUIDELINE: Remember that a static type is a (non-boolean) precondition. It’s just enforced by language semantics with always-static checking (edit or compile time), and never needs to be tested using a boolean predicate whose test could be delayed until dynamic checking (test or run time).

COROLLARY: A function’s number, order, and types of parameters are all (non-boolean) parts of its precondition. This falls out of the “static type” statement because the function’s own static type includes those things. For example, the language won’t let us invoke this function with the argument lists () or (1,2,3,4,5) or (3.14, myvector). We’ll delve into this more deeply in GotW #101.

COROLLARY: All functions have preconditions. Even void f() { }, which takes no inputs at all including that it reads no global state, has the precondition that it must be passed zero arguments. The only counterexample I can think of is pathological: void f(...) { } can be invoked with any number of arguments but ignores them all.

b) What kinds of boolean-testable preconditions are implicit within the explicitly written declaration of calc?

There are three possible kinds of implied boolean preconditions. All three are present in this example.

(1) Type invariants

Each object must meet the invariants of its type. This is subtly different from “the object’s type matches” (a static property) that we saw in 4(a), because this means additionally “the object’s value is not corrupt” (a dynamic property).

Here, this means x must obey the invariant of vector<int>, even though that invariant isn’t expressed in code in today’s C++. [4] For y this is fairly easy because all bit patterns are valid floating point values (more about NaNs in just a moment).

(2) Subexpression preconditions

The subexpression x[0] calls x.operator[] which has its own precondition, namely that the subscript be non-negative and less than x.size(). For 0, that’s true if x.size() > 0 is true, or equivalently !x.empty(), so that becomes an implicit part of our whole precondition.

(3) Subexpressions that make the whole precondition false

The subexpression std::sqrt(y) invokes C’s sqrt. The C standard says that unless y >= 0, the result of sqrt(y) is NaN (“not a number”), which means our precondition amounts to something <= NaN which is always false. Therefore, y >= 0 is effectively part of calc’s precondition too. [5]
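A quick sketch of that effect, assuming an implementation with IEEE-style floating point and without options such as fast-math that relax NaN handling. The function names are made up for this illustration:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: just the comparison part of calc's written precondition.
bool explicit_part_holds(int x0, double y) {
    return x0 <= std::sqrt(y);
}

// Usage checks for the sketch
void check_examples() {
    assert( std::isnan(std::sqrt(-1.0)));     // negative input yields NaN
    assert(!explicit_part_holds(0, -1.0));    // anything <= NaN is false
    assert( explicit_part_holds(2, 4.0));     // 2 <= 2.0
    assert(!explicit_part_holds(3, 4.0));     // 3 <= 2.0 is false
}
```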

Putting it all together

If we were to write this all out, the full precondition would be something like this — and note that the order is important! Here we’ll ignore the parts that are enforced by the language, such as parameter arity, and focus on the parts that can be written as boolean expressions:

// Example 4(b): Trying to write the precondition out more explicitly
//               (NOT all are recommended, this is for exposition)

auto calc( std::vector<int> const&  x ,
           std::floating_point auto y ) -> double
    [[pre(

//  1. parameter type invariants:
           /* x is a valid object, but we can’t spell that, so: */ true

//  2. subexpression preconditions:
        && x.size() > 0  // so checking x[0] won’t be undefined (!)

//  3. subexpression values that make our precondition false:
        && y >= 0        // redundant with the expression below

// finally, our explicit precondition itself:
        && x[0] <= std::sqrt(y) 
    )]] ;

GUIDELINE: Remember that your function’s full effective precondition is the precondition you write plus all its implicit prerequisites. Those are: (1) each parameter’s type invariants, (2) any preconditions of other function calls within the precondition, and (3) any defined results of function calls within the precondition that would make the precondition false.

c) Should any of these boolean-testable implicit preconditions also be written explicitly here in this precondition code? Explain.

For #1 and #3, we generally shouldn’t be repeating them as in 4(b):

  • We can skip repeating #1 because it’s enforced by the type system, plus if there is a bug it’s likely in the type itself rather than in our code or our caller’s code and will be checked when we check the type’s invariants.
  • We can skip repeating #3 because it’ll just make the whole condition be false and so is already covered.

But #2 is the problematic case: If x is actually empty, the subexpression’s precondition would actually make our precondition undefined to evaluate! “Undefined” is a very bad answer if we ever check this precondition, because if in our checking the full precondition is ever violated then we absolutely want that check to do something well-defined — we want it to evaluate to false and fail.

If a subexpression of our precondition itself has a real precondition, then we do want to check that first, otherwise we cannot check our full precondition without undefined behavior if that subexpression’s precondition was not met:

// Example 4(c): Today, we should repeat our subexpressions’ real
//               preconditions, so we can check our precondition
//               without undefined behavior

auto calc( std::vector<int> const&  x ,
           std::floating_point auto y ) -> double
    [[pre( x.size() > 0 && x[0] <= std::sqrt(y) )]] ;

With today’s library-based preconditions, such as the one shown in Question 2, we need to repeat subexpressions’ preconditions if we want to check our precondition without undefined behavior. One of the potential advantages of a language-supported contracts system is that it can “flatten” the preconditions to automatically test category #2, so that nested preconditions like this one don’t need to be repeated (assuming that the types and functions you use, here std::vector and its member functions, have written their preconditions and invariants)… and then we could still debate whether or not to explicitly repeat subexpression preconditions in our preconditions, but it would be just a water-cooler stylistic debate, not a “can this even be checked at all without invoking undefined behavior” correctness debate.
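As a sketch of why the order matters, here is the condition from Example 4(c) written as an ordinary function we can call today. The name calc_precondition is an assumption for illustration, and y is fixed to double to keep it short. Because && short-circuits, x[0] is never evaluated for an empty vector, so the check itself is always well-defined:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper mirroring the precondition in Example 4(c).
bool calc_precondition(std::vector<int> const& x, double y) {
    return !x.empty()               // subexpression precondition goes first...
        && x[0] <= std::sqrt(y);    // ...so this is evaluated only when x[0] is valid
}

// Usage checks for the sketch
void check_examples() {
    assert(!calc_precondition(std::vector<int>{},  4.0));   // fails cleanly, no UB
    assert( calc_precondition(std::vector<int>{2}, 4.0));
    assert(!calc_precondition(std::vector<int>{3}, 4.0));
}
```

Swapping the two operands would compile just as happily, but then an empty x would make the check itself undefined.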

Here’s a subtle variation suggested by Andrzej Krzemieński. For the sake of discussion, suppose we have a nested precondition that is not used in the function body (which I think is terribly unlikely, but let’s just consider it):

void display( /*...*/ )
    [[pre( globalData->helloMessageHasBeenPrinted() )]]
{
    // assume for sake of discussion that globalData is not
    // dereferenced directly or indirectly by this function body
}

Here, someone could argue: “If globalData is null, only actually checking the precondition would be undefined behavior, but executing the function body would not be undefined behavior.”

Question: Is globalData != nullptr an implicit precondition of display, since it applies only to the precondition, and is not actually used in the function body? Think about it for a moment before continuing…

… okay, here’s my answer: Yes, it’s absolutely part of the precondition of display, because by definition a precondition is something the caller is required to ensure is true before calling display, and a condition that is undefined to evaluate at all cannot be true.

GUIDELINE: If your checked precondition has a subexpression with its own preconditions, make sure those are checked first. Otherwise, you might find your precondition check doesn’t fire even when it’s violated. In the future, language support for preconditions might automate this for you; until then, be careful to write out the subexpression precondition by hand and put it first.

Notes

[1] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ), and to name the return value _return_ for postconditions. That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

[2] Again, as in GotW #99 Note 4, in a real system we’d want a few more variations, such as:

// A separate _V version for functions that don’t return
// a value, because 'void' isn’t regular
#define MY_PRE_POST_V(preconditions, postconditions) \
    assert( preconditions );                         \
    auto post = [&]{ assert( postconditions ); };

// Parallel _DECL forms to work on forward declarations,
// for people who want to repeat the postcondition there
#define MY_PRE_POST_DECL(preconditions, postconditions)
#define MY_PRE_POST_V_DECL(preconditions, postconditions)

And see GotW #99 Note 5 for how to guarantee the programmer didn’t forget to write “return post” at each return.

[3] Sure, there are things like is_invocable, but the point is we can’t always write those expressions, and we don’t have to here.

[4] Upcoming GotWs will cover invariants and violation handling. For type invariants, today’s C++ doesn’t yet provide a way to write those as checkable assertions to help us find bugs where we got it wrong and corrupted an object. The language just flatly assumes that every object meets the invariants of its type during the object’s lifetime, which is from the end of its construction to the beginning of its destruction.

[5] There’s more nuance to the details of what the C standard says, but it ends up that we should expect the result of passing a negative or NaN value to sqrt will be NaN. Although C calls negative and NaN inputs “domain errors,” which hints at a precondition, it still defines the results for all inputs and so strictly speaking doesn’t have a precondition.

Acknowledgments

Thank you to the following for their feedback on this material: Joshua Berne, Gabriel Dos Reis, J. Daniel Garcia, Gábor Horváth, Andrzej Krzemieński, Jean-Heyd Meneide, Bjarne Stroustrup, Andrew Sutton, Jim Thomas, Ville Voutilainen.

Trip report: Winter 2021 ISO C++ standards meeting (virtual)

Today, the ISO C++ committee held its second full-committee (plenary) meeting of the pandemic and adopted a few more features and improvements for draft C++23.

A record of 18 voting nations sent representatives to this meeting: Austria, Bulgaria, Canada, Czech Republic, Finland, France, Germany, Israel, Italy, Japan, Netherlands, Poland, Romania, Russia, Spain, Switzerland, United Kingdom, and United States. Japan had participated in person during C++98 and C++11, and has always given us good remote ballot feedback during C++14/17/20, and is attending again now; welcome back! Italy and Romania are our newest national bodies; welcome!

Our virtual 2021

We continue to have the same priorities and the same schedule we originally adopted for C++23. However, since the pandemic began, WG21 and its subgroups have had to meet all-virtually via Zoom, and we are not going to try to have a face-to-face meeting in 2021 (see What’s Next below). Some subgroups had already been having virtual meetings for years, but this was a major change for other groups including our two main design groups – the language and library evolution working groups (EWG and LEWG). In all, over the past year we have held approximately 200 virtual meetings.

Today: A few more C++23 features adopted

Today we formally adopted a second round of small features for C++23, as well as a number of bug fixes. Below, I’ll list some of the more user-noticeable changes and credit all those paper authors, but note that this is far from an exhaustive list of important contributors… even for these papers, nothing gets done without help from a lot of people and unsung heroes, so thank you first to all of the people not named here who helped the authors move their proposals forward! And thank you to everyone who worked on the adopted issue resolutions and smaller papers I didn’t include in this list.

P1102 by Alex Christensen and JF Bastien is the main noticeable change we adopted for the core language itself. It’s just a tiny bit of cleanup, but one that I’m personally fond of: In C++23 we will be able to omit empty ( ) lambda parameter lists even when we have to declare the lambda mutable. I’m the one who proposed the lambda syntax we have today (except for the mutable part which wasn’t mine and I never liked), including that it enabled making unused parts of the syntax optional so that we can write simple lambdas simply. For example, today we can already write

[x]{ return f(x); }

as a legal synonym for

[x] () -> auto { return f(x); }

and omit the empty parameter list and deduced return type. Even so, I’ve noticed a lot of people write the ( ) part anyway, which isn’t wrong or anything, it’s just that often they write it because they don’t know they can omit it too. And part of the problem was the oddity in pre-C++23 that if you need to write mutable, then you actually do have to also write the ( ) (but not the return type), which was just weird but was another reason for people to just write ( ) all the time, because sometimes they had to. With P1102, we don’t have to. That’s more consistent. Thanks, Alex and JF!
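Here is a minimal sketch that puts the spellings side by side. The helper f and the three wrapper functions are hypothetical names chosen for illustration. The C++23 form appears only in a comment, so the code should still compile with a pre-C++23 compiler:

```cpp
#include <cassert>

// Hypothetical helper for illustration; not from the papers discussed here.
inline int f(int x) { return x + 1; }

inline int shortest_form(int x) {
    auto g = [x]{ return f(x); };               // no ( ), no return type
    return g();
}

inline int longest_form(int x) {
    auto g = [x] () -> auto { return f(x); };   // same lambda, fully spelled out
    return g();
}

inline int mutable_form(int x) {
    // Before C++23, the ( ) is required as soon as 'mutable' appears.
    // With P1102 (C++23) this could instead be written as:
    //     [x] mutable { return f(++x); }
    auto g = [x] () mutable { return f(++x); };
    return g();
}
```

The first two lambdas behave identically; only the third needs its parentheses, and only until P1102 is available in your compiler.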

In the spirit of “completing C++20,” P2259 by Tim Song makes several fixes to iterator_category to make it work better with ranges and adaptors. Here is an example of code that does not compile today for arcane reasons (see the paper), but will be legal C++23 thanks to Tim:

std::vector<int> vec = {42};
auto r = vec | std::views::transform([](int c) { return std::views::single(c);})
             | std::views::join
             | std::views::filter([](int c) { return c > 0; });
r.begin();

Further in the “completing C++20” spirit, P2017 by Barry Revzin fixes some additional glitches in ranges to make them work better. Here is an example of safe and efficient code that does not compile today, where for arcane reasons the declaration of e isn’t supported and today’s workaround is to make the code more complex and less efficient. This will be legal C++23 thanks to Barry:

auto trim(std::string const& s) {
    auto isalpha = [](unsigned char c){ return std::isalpha(c); };
    auto b = ranges::find_if(s, isalpha);
    auto e = ranges::find_if(s | views::reverse, isalpha).base();
    return subrange(b, e);
}

P2212 by Alexey Dmitriev and Howard Hinnant generalizes time_point::clock to allow for greater flexibility in the kinds of clocks it supports, including stateful clocks, external system clocks that don’t really have time_points, representing “time of day” as a distinct time_point, and more.

P2162 by Barry Revzin takes an important first step toward cleaning up std::visit and laying the groundwork for its further generalization. Even if you don’t yet love std::visit, it’s a useful tool that P2162 makes more useful by making it work more regularly. We expect to see further generalization in the future, which is much easier to do with a cleaner and more regular existing feature to build upon.

Finally, I saw cheers and celebratory emoji erupt in the Zoom chat window when we adopted P1682 by JeanHeyd Meneide. It’s very small, but very useful. When passing an enum to an API that uses the underlying type, today we have to write a static_cast to the std::underlying_type, which makes us repeat the enum’s name and so is cumbersome all the time and brittle for type-safety under maintenance if we change to use a different enum:

some_untyped_api( static_cast<std::underlying_type_t<ABCD>>(some_value) );

Thanks to JeanHeyd, in C++23 we will be able to write:

some_untyped_api( std::to_underlying(some_value) );
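If your standard library doesn’t ship std::to_underlying yet, a stand-in is easy to sketch in C++14. The names my_to_underlying and color, and this particular some_untyped_api, are assumptions made up for the example and are not the standard library’s implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for C++23's std::to_underlying.
// The enum's type is deduced, so its name is never repeated at the call site.
template <typename Enum>
constexpr std::underlying_type_t<Enum> my_to_underlying(Enum e) noexcept {
    return static_cast<std::underlying_type_t<Enum>>(e);
}

// Hypothetical enum for illustration.
enum class color : std::uint8_t { red = 1, green = 2 };

// Hypothetical API that takes the raw underlying value.
inline int some_untyped_api(std::uint8_t raw) { return raw * 10; }
```

A call then reads some_untyped_api( my_to_underlying(color::green) ). If color later changes its underlying type, the call site needs no edit.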

Note that of course standard library vendors don’t have to wait until 2023 to provide to_underlying or any of these other fixes and improvements. Just having a feature like this one voted into the draft standard is often enough for vendors to be proactive in providing it… these days, vendors are more closely tracking our draft standard meeting by meeting rather than waiting for the official release, in part because we are shipping regularly and predictably and we don’t vote features into the draft standard until we think they’re pretty well baked so that vendors have less risk in implementing them early.

We also adopted a number of other issue resolutions and small papers that made additional improvements.

Finally, we came close to adopting P0533 by Edward Rosten and Oliver Rosten, which is about adding constexpr to many of the functions in math.h that we share with C. This is clearly a Good Thing and therefore many voted in favor of adopting the paper. The only hesitation that stopped it from getting consensus this time was concern that it needed more time to iron out how implementations would implement it, such as how to deal with errno in a constexpr context. This is the kind of question that often arises when we want to make improvements to entities declared in the C headers, because not only are they governed by the C standard rather than the C++ standard, but typically they are provided and controlled by the operating system vendor rather than by the C++ compiler/library writer, and those constraints always mean a bit of extra work when we want to make improvements for C++ programmers and remain compatible. As far as I know, everyone wants to see these functions made constexpr, so we expect to see this paper come to plenary again in the future. Thanks for your perseverance, Edward and Oliver!

What’s next

As long as we are meeting virtually, we will continue to have virtual plenaries like the one we had this week to formally adopt new features as they progress through subgroups. Our next two virtual plenaries to adopt features into the C++23 working draft will be held in June and November. Progress will be slower than when we can meet face-to-face, and we’ll doubtless defer some topics that really need in-person discussion until we can meet again safely, but in the meantime we’ll make what progress we can and we’ll ship C++23 on time.

The next tentatively planned face-to-face meeting is February 2022 in Portland, OR, USA; however, we likely won’t know until well into the autumn whether we’ll be able to confirm that or need to postpone it. You can find a list of our meeting plans on the Upcoming Meetings page.

Thank you again to the hundreds of people who are working tirelessly on C++, even in our current altered world. Your flexibility and willingness to adjust are much appreciated by all of us in the committee and by all the C++ communities! Thank you, and see you on Zoom.

GotW #100: Preconditions, Part 1 (Difficulty: 8/10)

This special Guru of the Week series focuses on contracts. We’ve seen how postconditions are directly related to assertions (see GotWs #97 and #99). So are preconditions, but in one important way that makes them fundamentally different. What is that? And why would having language support benefit us even more for writing preconditions than for the other two?

JG Question

1. What is a precondition, and how is it related to an assertion? Explain your answer using the following example, which uses a variation of a proposed post-C++20 syntax for preconditions. [1]

// A precondition along the lines proposed in [1]

void f( int min, int max )
    [[pre( min <= max )]]
{
    // ...
}

Guru Questions

2. Rewrite the example in Question 1 to show how to approximate the same effect using assertions in today’s C++. Are there any drawbacks to your solution compared to having language support for preconditions?

3. If a precondition fails, what does that indicate, and who is responsible for fixing the failure? Explain how this makes a precondition fundamentally different from every other kind of contract.

4. Consider this example, expanded from a suggestion by Gábor Horváth:

auto calc( std::vector<int> const&  x ,
           std::floating_point auto y ) -> double
    [[pre( x[0] <= std::sqrt(y) )]] ;

Note that std::floating_point is a C++20 concept.

  • What kinds of preconditions must a caller of calc satisfy that can’t generally be written as testable boolean expressions?
  • What kinds of boolean-testable preconditions are implicit within the explicitly written declaration of calc?
  • Should any of these boolean-testable implicit preconditions also be written explicitly here in this precondition code? Explain.

Notes

[1] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ). That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

GotW #99 Solution: Postconditions (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Postconditions are directly related to assertions (see GotW #97)… but how, exactly? And since we can already write postconditions using assertions, why would having language support benefit us more for writing postconditions than for writing (ordinary) assertions?

1. What is a postcondition, and how is it related to an assertion?

A function’s postconditions document “what it does” — they assert the function’s intended effects, including the return value and any other caller-visible side effects, which must hold at every return point when the function returns to the caller.

A postcondition IS-AN assertion in every way described in GotW #97, with the special addition that whereas a general assertion is always checked where it is written, a postcondition is written on the function and checked at every return (which could be multiple places). Otherwise, it’s “just an assertion”: As with an assertion, if a postcondition is false then it means there is a bug, likely right there inside the function on which the postcondition is written (or in the postcondition itself), because if prior contracts were well tested then likely this function created the first unexpected state. [2]

Explain your answer using the following example, which uses a variation of a proposed post-C++20 syntax for postconditions. [1]

// Example 1(a): A postcondition along the lines proposed in [1]

string combine_and_decorate( const string& x, const string& y )
    [[post( _return_.size() > x.size() + y.size() )]]
{
    if (x.empty()) {
        return "[missing] " + y + optional_suffix();
    } else {
        return x + ' ' + y + something_computed_from(x);
    }
}

The above would be roughly equivalent to writing the test before every return statement instead:

// Example 1(b): What a compiler might generate for Example 1(a)

string combine_and_decorate( const string& x, const string& y )
{
    if (x.empty()) {
        auto&& _return_ = "[missing] " + y + optional_suffix();
        assert( _return_.size() > x.size() + y.size() );
        return std::forward<decltype(_return_)>(_return_);
    } else {
        auto&& _return_ = x + ' ' + y + something_computed_from(x);
        assert( _return_.size() > x.size() + y.size() );
        return std::forward<decltype(_return_)>(_return_);
    }
}

2. Rewrite the example in Question 1 to show how to approximate the same effect using assertions in today’s C++. Are there any drawbacks to your solution compared to having language support for postconditions?

We could always write Example 1(b) by hand, but language support for postconditions is better in two key ways:

(A) The programmer should only write the condition once.

(B) The programmer should not need to write forwarding boilerplate by hand to make looking at the return value efficient.

How can we approximate those advantages?

Option 1 (basic): Named return object + an exit guard

The simplest way to achieve (A) would be to use the C-style goto exit; pattern:

// Example 2(a)(i): C-style “goto exit;” postcondition pattern

string combine_and_decorate( const string& x, const string& y )
{
    auto _return_ = string();
    if (x.empty()) {
        _return_ = "[missing] " + y + optional_suffix();
        goto post;
    } else {
        _return_ = x + ' ' + y + something_computed_from(x);
        goto post;
    }

post:
    assert( _return_.size() > x.size() + y.size() );
    return _return_;
}

If you were thinking, “in C++ this wants a scope guard,” you’re right! [3] Guards still need access to the return value, so the structure is basically similar:

// Example 2(a)(ii): scope_guard pattern, along the lines of [3]

string combine_and_decorate( const string& x, const string& y )
{
    auto _return_ = string();
    auto post = std::experimental::scope_success([&]{
        assert( _return_.size() > x.size() + y.size() );
    });

    if (x.empty()) {
        _return_ = "[missing] " + y + optional_suffix();
        return _return_;
    } else {
        _return_ = x + ' ' + y + something_computed_from(x);
        return _return_;
    }
}

Advantages:

  • Achieved (A). The programmer writes the condition only once.

Drawbacks:

  • Didn’t achieve (B). There’s no forwarding boilerplate, but only because we’re not even trying to forward…
  • Overhead (maybe). … and to look at the return values we require a named return value and a move assignment into that object, which is overhead if the function wasn’t already doing that.
  • Brittle. The programmer has to remember to convert every return site to _return_ = ...; goto post; or _return_ = ...; return _return_;… If they forget, the code silently compiles but doesn’t check the postcondition.

Option 2 (better): “return post” postcondition pattern

Here’s a second way to do it that achieves both goals, using a local function (which we have to write as a lambda in C++):

// Example 2(b): “return post” postcondition pattern

string combine_and_decorate( const string& x, const string& y )
{
    auto post = [&](auto&& _return_) -> auto&& {
        assert( _return_.size() > x.size() + y.size() );
        return std::forward<decltype(_return_)>(_return_);
    };

    if (x.empty()) {
        return post( "[missing] " + y + optional_suffix() );
    } else {
        return post( x + ' ' + y + something_computed_from(x) );
    }
}

Advantages:

  • Achieved (A). The programmer writes the condition only once.
  • Efficient. We can look at return values efficiently, without requiring a named return value and a move assignment.

Drawbacks:

  • Didn’t achieve (B). We still have to write the forwarding boilerplate, but at least it’s only in one place.
  • Brittle. The programmer has to remember to convert every return site to return post. If they forget, the code silently compiles but doesn’t check the postcondition.
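To make the “Efficient” claim for Option 2 concrete, here is a small sketch that counts copies and moves through the same forwarding lambda. The tracked type and make_tracked function are hypothetical, invented only to make the copy and move operations observable:

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how often it is copied or moved.
struct tracked {
    static int copies;
    static int moves;
    int value = 0;
    explicit tracked(int v) : value(v) {}
    tracked(const tracked& o) : value(o.value) { ++copies; }
    tracked(tracked&& o) noexcept : value(o.value) { ++moves; }
};
int tracked::copies = 0;
int tracked::moves = 0;

// Same shape as Example 2(b), with a trivial postcondition.
inline tracked make_tracked(int v) {
    auto post = [&](auto&& _return_) -> auto&& {
        assert(_return_.value == v);
        return std::forward<decltype(_return_)>(_return_);
    };
    return post(tracked{v});   // the temporary is forwarded, never copied
}
```

Because post returns a reference, the returned object is move-constructed from the temporary rather than elided, so expect at least one move but no copies. For a type like string that move is cheap, but it is not free.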

Option 3 (mo’betta): Wrapping up option 2… with a macro

We can improve Option 2 by wrapping the boilerplate up in a macro (sorry). Note that instead of “MY_” you’d use your company’s preferred unique macro prefix: [4]

// Eliminate forward-boilerplate with a macro (written only once)
#define MY_POST(postconditions)                            \
    auto post = [&](auto&& _return_) -> auto&& {           \
        assert( postconditions );                          \
        return std::forward<decltype(_return_)>(_return_); \
    };

And then the programmer can just write:

// Example 2(c): “return post” with boilerplate inside a macro

string combine_and_decorate( const string& x, const string& y )
{   MY_POST( _return_.size() > x.size() + y.size() );

    if (x.empty()) {
        return post( "[missing] " + y + optional_suffix() );
    } else {
        return post( x + ' ' + y + something_computed_from(x) );
    }
}

Advantages:

  • Achieved (A) and (B). The programmer writes the condition only once, and doesn’t write the forwarding boilerplate.
  • Efficient. We can look at the return value without requiring a local variable for the return value, and without an extra move operation to put the value there.
  • Future-friendly. You may have noticed that I changed my usual brace style to write { MY_POST on a single line; that’s to make it easily replaceable with search-and-replace. If you systematically declare the condition as { MY_POST at the start of the function, and systematically write return post() to use it, the code is likely more future-proof — if we get language support for postconditions with a syntax like [1], migrating your code to that could be as simple as search-and-replace:

{ MY_POST( * )  →  [[post _return_: * ]] {

return post( * )  →  return *

Drawbacks:

  • (improved) Brittle. It’s still a manual pattern, but now we have the option of making it impossible for the programmer to forget return post by extending the macro to include a check that post was used before each return (see [5]). That’s feasible to put into the Option 3 macro, whereas it was not realistic to ask the programmer to write out by hand in Options 1 and 2.

GUIDELINE: If you don’t already use a way to write postconditions as code, consider trying something like MY_POST until language support is available. It’s legal C++ today, it’s not terrible, and it’s future-friendly to adopting future C++ language contracts.

Finally, all of these options share a common drawback:

  • Less composable/toolable. The next library or team will have THEIR_POST convention that’s different, which makes it hard to write tools to support both styles. Language support has an important incidental benefit of providing a common syntax that portable code and tools can rely upon.
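To try Option 3 end to end, here is a self-contained sketch that uses MY_POST together with the MY_POST_V variant from note [4]. The functions bracketed and append_twice are hypothetical examples, not part of the original question:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

#define MY_POST(postconditions)                            \
    auto post = [&](auto&& _return_) -> auto&& {           \
        assert( postconditions );                          \
        return std::forward<decltype(_return_)>(_return_); \
    };

#define MY_POST_V(postconditions)                          \
    auto post = [&]{ assert( postconditions ); };

// Hypothetical value-returning function: every return goes through post.
inline std::string bracketed( const std::string& s )
{   MY_POST( _return_.size() == s.size() + 2 );

    if (s.empty()) {
        return post( std::string("[]") );
    }
    return post( "[" + s + "]" );
}

// Hypothetical void function: call post() as the return expression.
inline void append_twice( std::vector<int>& v, int x )
{   MY_POST_V( v.size() >= 2 && v.back() == x );

    v.push_back(x);
    v.push_back(x);
    return post();
}
```

Note that a condition containing a top-level comma would need extra parentheses to survive being passed as a macro argument.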

3. Should a postcondition be expected to be true if the function throws an exception back to the caller?

No.

First, let’s generalize the question: Anytime you see “if the function throws an exception,” mentally rewrite it to “if the function reports that it couldn’t do what it advertised, namely complete its side effects.” That’s independent of whether it reports said failure using an exception, std::error_code, HRESULT, errno, or any other way.

Then the question answers itself: No, by definition. A postcondition documents the side effects, and if those weren’t achieved then there’s nothing to check. And for postconditions involving the return value we can add: No, those are meaningless by construction, because it doesn’t exist.

“But wait!” someone might interrupt. “Aren’t there still things that need to be true on function exit even if the function failed?” Yes, but those aren’t postconditions. Let’s take a look.

Justify your answer with example(s).

Consider this code:

// Example 3: (Not) a reasonable postcondition?

void append_and_decorate( string& x, string&& y )
    [[post( x.size() <= x.capacity() && /* other non-corruption */ )]]
{
    x += y + optional_suffix();
}

This can seem like a sensible “postcondition” even when an exception is thrown, but it is testing whether x is still a valid object of its type… and sure, that had better be true. But that’s an invariant, which should be written once on the type [2], not a postcondition to be laboriously repeated arbitrarily many times on every function that ever might touch an object of that type.

When reasoning about function failures, we use the well-known Abrahams error safety guarantees, and now it becomes important to understand them in terms of invariants:

  • The nofail guarantee is “the function cannot fail” (e.g., such functions should be noexcept), and so doesn’t apply here since we’re discussing what happens if the function does fail.
  • The basic guarantee is “no corruption,” every object we might have tried to modify is still a valid object of its type… but that’s identical to saying “the object still meets the invariants of its type.”
  • The strong guarantee is “all or nothing,” so in the case we’re talking about where an error is being reported, a strong guarantee function is again saying that all invariants hold. (It also says observable state did not change, but I’ll ignore that for now; for how we might want to check that, see [6].)

So we’re talking primarily about class invariants… and those should hold on both successful return and error exit, and they should be written on the type rather than on every function that uses the type.

GUIDELINE: If you’re trying to write a “postcondition” that should still be true even if an exception or other error is reported, you’re probably either trying to write an invariant instead [2], or trying to check the strong did-nothing guarantee [6].
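As a rough illustration of “written once on the type,” here is a sketch of a class that states its invariant in one place and still satisfies it after a member function fails. The bounded_stack class and its invariant() member are hypothetical, and this is only one possible way to spell an invariant in today’s C++:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical type for illustration: the invariant is written once, here.
class bounded_stack {
    std::vector<int> items_;
    std::size_t      limit_;
public:
    explicit bounded_stack(std::size_t limit) : limit_(limit) {}

    // Must hold whenever a member function exits, normally or by exception.
    bool invariant() const { return items_.size() <= limit_; }

    std::size_t size() const { return items_.size(); }

    void push(int x) {
        if (items_.size() == limit_) {
            // Failure is reported: there is no postcondition to check,
            // but the object must still be valid.
            throw std::length_error("bounded_stack is full");
        }
        items_.push_back(x);
        assert(invariant());
    }
};
```

After a failed push, callers can rely on invariant() without push having to restate it as a “postcondition.”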

4. Should postconditions be able to refer to both the initial (on entry) and final (on exit) value of a parameter, if those could be different?

Yes.

If so, give an example.

Consider this code, which uses a strawman _in_() syntax for referring to subexpressions of the postcondition that should be computed on entry so they can refer to the “in” value of the parameter (note: this was not proposed in [1]):

// Example 4(a): Consulting “in” state in a postcondition

void instrumented_push( vector<widget>& c, const widget& value )
    [[post( _in_(c.size())+1 == c.size() )]]
{

    c.push_back(value);

    // perform some extra work, such as logging which
    // values are added to which containers, then return
}

Postconditions like this one express relative side effects, where the “out” state is a delta from the “in” state of the parameter. To write postconditions like this one, we have to be able to refer to both states of the parameter, even for parameters that must be modifiable.

Note that this doesn’t require taking a copy of the parameter… that would be expensive for c! Rather, an implementation would just evaluate any _in_ subexpression on entry and store only that result as a temporary, then evaluate the rest of the expression on exit. For example, in this case the implementation could generate something like this:

// Example 4(b): What an implementation might generate for 4(a)

void instrumented_push( vector<widget>& c, const widget& value )
{
    auto __in_c_size = c.size();

    c.push_back(value);

    // perform some extra work, such as logging which
    // values are added to which containers, then return

    assert( __in_c_size+1 == c.size() );
}
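Until something like _in_ exists, the same effect can be approximated by hand with a lambda init-capture, which evaluates its initializer on entry and stores only the result. This is a sketch under assumptions: the widget definition is a hypothetical placeholder, and the logging work is omitted:

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type; the article does not define 'widget'.
struct widget { int id; };

inline void instrumented_push( std::vector<widget>& c, const widget& value )
{
    // in_size is computed here, on entry; c itself is captured by reference
    // so that c.size() inside the lambda sees the "out" state.
    auto post = [&c, in_size = c.size()]{
        assert( in_size + 1 == c.size() );
    };

    c.push_back(value);
    return post();
}
```

Only one size_t is stored, so the cost matches Example 4(b). The drawback is the same brittleness as before: the programmer must remember to call post on every return path.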

Notes

[1] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ), and to name the return value _return_ for postconditions. That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

[2] Upcoming GotWs will cover preconditions and invariants, including how invariants relate to postconditions.

[3] P. Sommerlad and A. L. Sandoval. “P0052: Generic Scope Guard and RAII Wrapper for the Standard Library” (WG21 paper, February 2019). Based on pioneering work by Andrei Alexandrescu and Petru Marginean starting with “Change the Way You Write Exception-Safe Code – Forever” (Dr. Dobb’s Journal, December 2000), and widely implemented in D and other languages, the Folly library, and more.

[4] In a real system we’d want a few more variations, such as:

// A separate _V version for functions that don’t return
// a value, because 'void' isn’t regular
#define MY_POST_V(postconditions)                          \
    auto post = [&]{ assert( postconditions ); };

// Parallel _DECL forms to work on forward declarations,
// for people who want to repeat the postcondition there
#define MY_POST_DECL(postconditions)   // intentionally empty 
#define MY_POST_V_DECL(postconditions) // intentionally empty

Note: We could try to combine MY_POST_V and MY_POST by always creating both a single-parameter lambda and a no-parameter lambda, and then “overloading” them using something like compose from Boost’s wonderful Higher-Order Functions library by Paul Fultz II. Then in a void-returning function return post() still works fine even with empty parens. I didn’t do that because the in-language contracts proposed in [1] use a slightly different syntax depending on whether there’s a return value, so if our syntax doesn’t somehow have such a distinction then it will be harder to migrate this macro to a syntax like [1] with a simple search-and-replace.

[5] We could add extra machinery to help the programmer remember to write return post, so that just executing a return without post will assert… set a flag on every post() evaluation, and then assert that flag in the destructor of an RAII object for every normal return. The code is pretty simple with a scope guard [3]:

// Check that the programmer wrote “return post” each time
#define MY_POST_CHECKED                                     \
    auto post_checked = false;                              \
    auto post_guard = std::experimental::scope_success([&]{ \
        assert( post_checked );                             \
    });

Then in MY_POST and MY_POST_V, pull in this machinery and then also set post_checked:

#define MY_POST(postconditions)                             \
    MY_POST_CHECKED                                         \
    auto post = [&](auto&& _return_) -> auto&& {            \
        assert( postconditions );                           \
        post_checked = true;                                \
        return std::forward<decltype(_return_)>(_return_);  \
    };

#define MY_POST_V(postconditions)                           \
    MY_POST_CHECKED                                         \
    auto post = [&]{                                        \
        assert( postconditions );                           \
        post_checked = true;                                \
    };

If you don’t have a scope guard helper, you can roll your own, where “successful exit” is detectable by seeing that the std::uncaught_exceptions() exception count hasn’t changed:

// Hand-rolled alternative if you don’t have a scope guard
#define MY_POST_CHECKED                                     \
    auto post_checked = false;                              \
    struct post_checked_ {                                  \
        const bool *pflag;                                  \
        const int  ecount = std::uncaught_exceptions();     \
        post_checked_(const bool* p) : pflag{p} {}          \
        ~post_checked_() {                                  \
            assert( *pflag ||                               \
                    ecount != std::uncaught_exceptions() ); \
        }                                                   \
    } post_checked_guard{&post_checked}; 
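The std::uncaught_exceptions() trick can be seen in isolation in the following sketch, which records how a scope was exited instead of asserting. It requires C++17. The names exit_recorder and exits_normally are hypothetical:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical guard: on destruction, records whether the scope exited
// normally (no new exception in flight) or by stack unwinding.
struct exit_recorder {
    bool& exited_normally;
    const int ecount = std::uncaught_exceptions();   // count on entry
    explicit exit_recorder(bool& flag) : exited_normally(flag) {}
    ~exit_recorder() {
        exited_normally = (ecount == std::uncaught_exceptions());
    }
};

inline bool exits_normally(bool should_throw) {
    bool normal = false;
    try {
        exit_recorder guard{normal};
        if (should_throw) throw std::runtime_error("failure");
    } catch (const std::runtime_error&) {
        // Swallow the exception: only the recorded flag matters here.
    }
    return normal;
}
```

Comparing counts, rather than testing for a nonzero count, keeps the guard correct even when it is itself created inside a destructor that runs during unwinding.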

[6] For strong-guarantee functions, we could try to check that all observable state is the same as on function entry. In some cases, we can partly do that… for example, writing the test that a failed vector::push_back didn’t invalidate any pointers into the container may sound hard, but it’s actually the easy part of that function’s “error exit” condition! Using a strawman syntax like [1], extended to include an “error” exit condition:

// (Using a hypothetical “error exit” condition)
// This is enough to check that no pointers into *this are invalid

template <typename T, typename Allocator>
constexpr void vector<T, Allocator>::push_back( const T& )
    [[error( _in_.data() == data() && _in_.size() == size() )]] ;

But other “error exit” checks for this same function would be hard, expensive, or impossible to express. For example, it would be expensive to write the check that all elements in the vector have their original values, which would require first taking a deep copy of the container.

Acknowledgments

Thank you to the following for their feedback on this material: Joshua Berne, Gábor Horváth, Andrzej Krzemieński, James Probert, Bjarne Stroustrup, Andrew Sutton.