Cpp2 and cppfront: Year-end mini-update

As we close out 2022, I thought I’d write a short update on what’s been happening in Cpp2 and cppfront. If you don’t know what this personal project is, please see the CppCon 2022 talk on YouTube.

Most of this post is about improvements I’ve been making and merging over the year-end holidays, and an increasing number of contributions from others via pull requests to the cppfront repo and in companion projects. Thanks again to so many of you who expressed your interest and support for this personal experiment, including the over 3,000 comments on Reddit and YouTube and the over 200 issues and PRs on the cppfront repo!

10 design notes

On the cppfront wiki, I’ve written more design notes about specific parts of the Cpp2 language design that answer common questions. They include:

  • Broad strategic topics, such as addressing ABI and versioning, “unsafe” code, and aiming to eliminate the preprocessor with reflection.
  • Specific language feature design topics, such as unified function call syntax (UFCS), const, and namespaces.
  • Syntactic choices, such as postfix operators and capture syntax.
  • Implementation topics, such as parsing strategies and and grammar details.

117 issues (3 open), 74 pull requests (9 open), 6 related projects, and new collaborators

I started cppfront with “just a blank text editor and the C++ standard library.” Cppfront continues to have no dependencies on other libraries, but since I open-sourced the project in September I’ve found that people have started contributing working code — thank you! Authors of merged pull requests include:

  • The prolific Filip Sajdak contributed a number of improvements, probably the most important being generalizing my UFCS implementation, implementing more of is and as as described in P2392, and providing Apple-Clang regression test results. Thanks, Filip!
  • Gabriel Gerlero contributed refinements in the Cpp2 language support library, cpp2util.h.
  • Jarosław Głowacki contributed test improvements and ensuring all the code compiles cleanly at high warning levels on all major compilers.
  • Konstantin Akimov contributed command-line usability improvements and more test improvements.
  • Fernando Pelliccioni contributed improvements to the Cpp2 language support library.
  • Jessy De Lannoit contributed improvements to the documentation.

Thanks also to these six related projects, which you can find listed on the wiki:

Thanks again to Matt Godbolt for hosting cppfront on Godbolt Compiler Explorer and giving feedback.

Thanks also to over 100 other people who reported bugs and made suggestions via the Issues. See below for some more details about these features and more.

Compiler/language improvements

Here are some highlights of things added to the cppfront compiler since I gave the first Cpp2 and cppfront talk in September. Most of these were implemented by me, but some were implemented by the PR authors I mentioned above.

Roughly in commit order (you can find the whole commit history here), and like everything else in cppfront some of these continue to be experimental:

  • Lots of bug fixes and diagnostic improvements.
  • Everything compiles cleanly under MSVC -W4 and GCC/Clang -Wall -Wextra.
  • Enabled implicit move-from-last-use for all local variables. As I already did for copy parameters.
  • After repeated user requests, I turned -n and -s (null/subscript dynamic checking) on by default. Yes, you can always still opt out to disable them and get zero cost, Cpp2 will always stay a “zero-overhead don’t-pay-for-what-you-don’t-use” true-C++ environment. All I did was change the default to enable them.
  • Support explicit forward of members/subobjects of composite types. For a parameter declared forward x: SomeType, the default continues to be that the last use of x is automatically forwarded for you; for example, if the last use is call_something( x ); then cppfront automatically emits that call as call_something( std::forward<decltype(x)>(x) ); and you never have to write out that incantation. But now you also have the option to separately forward parts of a composite variable, such as that for a forward x: pair<string, string>> parameter you can write things like do_this( forward x.first ) and do_that( 1, 2, 3, forward x.second ).
  • Support is template-name and is ValueOrPredicate: is now supports asking whether this is an instantiation of a template (e.g., x is std::vector), and it supports comparing values (e.g., x is 14) and using predicates (e.g., x is (less_than(20)) invoking a lambda) including for values inside a std::variant, std::any, and std::optional (e.g., x is 42 where x is a variant<int,string> or an any).
  • Regression test results for all major compilers: MSVC, GCC, Clang, and Apple-Clang. All are now checked in and can be conveniently compared before each commit.
  • Finished support for >> and >>= expressions. In today’s syntax, C++ currently max-munches the >> and >>= tokens and then situationally breaks off individual > tokens, so that we can write things like vector<vector<int>> without putting a space between the two closing angle brackets. In Cpp2 I took the opposite choice, which was to not parse >> or >>= as a token (so max munch is not an issue), and just merge closing angles where a >> or >>= can grammatically go. I’ve now finished the latter, and this should be done.
  • Generalized support for UFCS. In September, I had only implemented UFCS for a single call of the form x.f(y), where x could not be a qualified name or have template arguments. Thanks to Filip Sajdak for generalizing this to qualified names, templated names, and chaining multiple UFCS calls! That was a lot of work, and as far as I can tell UFCS should now be generally complete.
  • Support declaring multi-level pointers/const.
  • Zero-cost implementation of UFCS. The implementation of UFCS is now force-inlined on all compilers. In the tests I’ve looked at, even when calling a nonmember function f(x,y), using Cpp2’s x.f(y) unified function call syntax (which tries a member function first if there is one, else falls back to a nonmember function), the generated object code at all optimization levels is now identical, or occasionally better, compared to calling the nonmember function directly. Thanks to Pierre Renaux for pointing this out!
  • Support today’s C++ (Cpp1) multi-token fundamental types (e.g., signed long long int). I added these mainly for compatibility because 100% seamless interoperability with today’s C++ is a core goal of Cpp2, but note that in Cpp2 these work but without any of the grammar and parsing quirks they have in today’s syntax. That’s because I decided to represent such multi-word names them as a single Cpp2 token, which happens to internally contain whitespace. Seems to work pretty elegantly so far.
  • Support fixed-width integer type aliases (i32, u64, etc.), including optional _fast and _small (e.g., i32_fast).

I think that this completes the basic implementation of Cpp2’s initial subset that I showed in my talk in September, including that support for multi-level pointers and the multi-word C/C++ fundamental type names should complete support for being able to invoke any existing C and C++ code seamlessly.

Which brings us to…

What’s next

Next, as I said in the talk, I’ll be adding support for user-defined types (classes)… I’ll post an update about that when there’s more that’s ready to see.

Again, thanks to everyone who expressed interest and support for this personal experiment, and may you all have a happy and safe 2023.

Trip report: Autumn ISO C++ standards meeting (Kona)

A few minutes ago, the ISO C++ committee completed its second-to-last meeting of C++23 in Kona, HI, USA. Our host, the Standard C++ Foundation, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We currently have 26 active subgroups, nine of which met in six parallel tracks throughout the week; some groups ran all week, and others ran for a few days or a part of a day, depending on their workloads. We had over 160 attendees, approximately two-thirds in-person and one-third remote via Zoom.

This was our first in-person meeting since Prague in February 2020 just a few weeks before the lockdowns began. It was also our first-ever hybrid meeting with remote Zoom participation for all subgroups that met.

You can find a brief summary of ISO procedures here.

From Prague, through the pandemic, to Kona

During the pandemic, the committee’s subgroups began regularly meeting virtually, and over the past nearly three years there have been hundreds of virtual subgroup meetings and thrice-a-year virtual plenary sessions to continue approving features for C++23.

This week, we resumed in-person meetings with remote Zoom support. In the months before Kona, a group of volunteers did a lot of planning and testing: We did a trial run of a hybrid meeting with the subgroup SG14 at CppCon in September, using some of the equipment we planned to use in Kona. That initial September test was a pretty rough experience for many of the remote attendees, but it led to valuable learnings , and though we entered Kona with some trepidation, the hybrid meetings went amazingly smoothly with very few hiccups, and we got a lot of good work done in the second-to-last meeting to finalize C++23 including with remote presentations and comments.

This was only possible because of a huge amount of work by many volunteers, and I want to especially thank Jens Maurer and Dietmar Kühl for leading that group. But it was a true team effort, and so many people helped with the planning, with bringing equipment, and with running the meetings. Thank you very much to all those volunteers and helpers! We received many such appreciative comments of thanks on the committee mailing lists, and from national bodies on Saturday, from experts participating remotely who wanted to thank the volunteers for how smoothly they were able to participate.

Now that we have resumed in-person meetings, the current intent is that:

This week’s meeting

Per our published C++23 schedule, this was our second-to-last meeting to finish technical work on C++23. No features were added or removed, we just handled fit-and-finish issues and primarily focused on addressing the 137 national body comments we received in the summer’s international comment ballot (Committee Draft, or CD).

Today, the committee approved final resolutions for 92 (67%) of the 137 national comments. That leaves 45 comments, some of which have already been partly worked on, still to be completed between now and early February at our last meeting for completing C++23.

An example of a comment we just approved is adopting the proposal from Nicolai Josuttis et al. to extend the lifetime all temporaries (not just the last one) for the for-range-initializer of the range-for loop (see also the more detailed earlier paper). This closes a lifetime safety hole in C++. Here’s one of the many examples that will now work correctly:

std::vector<std::string> createStrings();

...

for (std::string s : createStrings()) ... // OK

for (char c : createStrings().at(0)) ...
    // use-after-free in C++20
    // OK, safe in C++23

In addition to C++23 work, we also had time to make progress on a number of post-C++23 proposals, including continued work on contracts, executors (std::execution), pattern matching, and more. We also decided to ship the third Library Fundamentals TS, which includes support for a number of additional experimental library features such as propagate_const, scope_exit and related scope guards, observer_ptr, resource_adapter, a helper to make getting a random numbers easier, and more. These can then be considered for C++26.

The contracts subgroup adopted a roadmap and timeline to try to get contracts into C++26. The group also had initial discussion of Gabriel Dos Reis’ proposal to control side effects in contracts, with the plan to follow up with a telecon between now and the next in-person meeting in February.

The concurrency and parallelism subgroup agreed to move forward with std::execution and SIMD parallelism for C++26, which in the words of the subgroup chair will make C++26 a huge release for the concurrency and parallelism group… and recall that C++26 is not just something distant that’s three years away, but we will start approving features for C++26 starting this June, and when specific features are early and stable in the working draft the vendors often don’t wait for the final standard to start shipping implementations.

The language evolution group considered national body comments and C++26 proposals, and approved nine papers for C++26 including to progress Jean-Heyd Meneide’s proposal for #embed for C++26.

The language evolution group also held a well-attended evening session (so that experts from all subgroups could participate) to start discussion of the long-term future of C++, with over 100 experts attending (75% on-site, 25% on-line). Nearly all of the discussion was focused on improving safety (mostly) and simplicity (secondarily), including discussion about going beyond our business-as-usual evolution to help C++ programmers with these issues. We expect this discussion to continue and lead to further concrete papers for C++ evolution.

The library evolution group addressed all its national body comments and papers, forwarded several papers for C++26 including std::execution, and for the first time in a while does not have a backlog to catch up with which was happy news for LEWG.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next meeting will be in Issaquah, WA, USA in February. At that meeting we will finish C++23 by resolving the remaining national body comments on the C++23 draft, and producing the final document to be sent out for its international approval ballot (Draft International Standard, or DIS) and be published later in 2023.

Wrapping up

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in less than three months from now we’ll be meeting again in Issaquah, WA, USA for the final meeting of C++23 to finish and ship the C++23 international standard. I look forward to seeing many of you there. Thank you again to the over 160 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies! And thank you also to everyone reading this for your interest and support for C++ and its standardization.

Weekend update: Operator and parsing design notes

Thanks again for all the bug reports and feedback for Cpp2 and cppfront! As I mentioned last weekend, I’ve started a wiki with “Design notes” about specific aspects of the design to answer why I’ve made them they way they currently are… basic rationale, alternatives considered, in a nutshell, as quick answers to common questions I encounter repeatedly.

This weekend I wrote up three more short design notes, the first of which is the writeup on “why postfix unary operators?” that I promised in my CppCon 2022 talk.

Cpp2 design notes: UFCS, “const”, “unsafe”, and (yes) ABI

Thanks to everyone who has offered bug reports and constructive suggestions for Cpp2 and cppfront.

To answer common questions I encounter repeatedly, I’ve started a wiki with “Design notes” about specific aspects of the design to answer why I’ve made them they way they currently are… basic rationale, alternatives considered, in a nutshell. There are four design notes so far… pasting from the wiki:

  • Design note: UFCS Why does UFCS use fallback semantics (prefer a member function)? Doesn’t that mean that adding a member function later could silently change behavior of existing call sites?
  • Design note: const objects by default Should objects be const? Mostly yes.
  • Design note: unsafe code Yes, I intend that we should be able to write very-low-level facilities in Cpp2. No, that doesn’t mean a monolithic “unsafe” block… I think we can do better.
  • Design note: ABI Cpp2 is ABI-neutral, but its immunity from backward compatibility constraints presents an opportunity for link-level improvements, not just source-level improvements.

The wiki also contains links to related projects. There are two of those so far:

Thanks again for the feedback and interest.

Something I implemented today: “is void”

[Edited to add pre-publication link to next draft of P2392, revision 2, and correct iterator comparison]

Brief background

As I presented at CppCon 2021 starting at 11:15, I’m proposing is (a general type or value query) and as (a general cast, for only the safe casts) for C++ evolution. The talk, and the ISO C++ evolution paper P2392 it’s based on, explained why I hope that is and as can provide a general mechanism to power pattern matching with inspect, while conversely also liberating the power of pattern matching beyond just inspect for use generally in the language (e.g., in requires clauses, in general code). Here’s the key slide from last year, that I cited again in this year’s talk:

Note this isn’t about “making it look pretty.” is and as do lead to simpler and prettier code, but whereas human programmers love clear and consistent spellings, generic code demands that consistency. Here’s the key 1-min clip from last year, summarizing the argument in a nutshell:

Today: Divergent emptiness

In that vein, today I was catching up with some cppfront PRs, and Drew Gross pointed out that as I’ve begun implementing is and as in Cpp2 syntax, one thing didn’t work as Drew expected:

is std::nullopt_t doesn’t appear to match an empty optional

Drew Gross in cppfront PR #5

That got me thinking.

First, that this wasn’t supported in the current P2392 was intentional, because nullopt_t isn’t a “real type”… it’s a signal for “empty / no value” which until now wasn’t covered in P2392. Recall that is and as are related, including that if is T succeeds it means that a dynamic as T cast to the same type will succeed. But that’s not true for nullopt_t, which is optional‘s way of signaling an empty state; you can’t cast to “no type.”

Still, this got me thinking that testing for “empty” could be useful. And if we do provide an is test for an empty optional, it makes sense for there not to be an as cast for that, which simplifies how we think about it. And it should be spelled generically in a way that works equally for other kind of empty things, so we wouldn’t want to spell it is nullopt_t because that “empty state” name is specific to optional only. It is one of many existing divergent ad-hoc spellings we’ve added for “empty state” (just like we had lots of divergent spellings of type queries and type casts):

  • nullopt_t is the empty state for std::optional
  • nullptr_t is the empty state for raw/smart pointers
  • More generally, the default-constructed state T() is the empty state for all Pointer-like things including iterators, and this is already the way the Lifetime profile handles it: a Pointer is any type that can be dereferenced, and a default-constructed Pointer is considered its null/empty value (including that an STL iterator is treated identically to a default-constructed (null) raw or smart pointer) and you can already see this in cppfront’s cpp2util.h null test (currently at line 298). [Edited to add: Note that it turns out the Standard doesn’t make this usefully testable for STL iterators, because it says that a default-constructed STL iterator can only be reliably compared to another default-constructed one. So while the default-constructed state is indeed an “empty” state for the iterator, as far as I know there is no way to portably test whether a given STL iterator object is actually in that state. Equality testing against a default-constructed iterator may work or it may not.]
  • monostate (and arguably valueless_by_exception) is the empty state for std::variant
  • !has_value is the empty state for std::any
  • !is_ready (which has a longer spelling in today’s standard library) is the empty state for std::*future

And so we have an opportunity to unify these too, which goes beyond what I showed last year and in my previous revision of paper P2392.

But wait, on top of all that, the language itself has already had a way to spell “no type” since the 1970s: void. And even though void is not a regular type (it doesn’t work as a type in some places in the C++ type system) it works in enough of the places we need to implement is void as the generic spelling of “is empty.”

A possible convergence: is void

So today I implemented is void as a generic “empty state” test in Cpp2 syntax in cppfront. I also checked in the following Cpp2-syntax test case, which now works as self-documented — and I couldn’t resist the nod to William Tyndale:

main: () -> int = {
    p: std::unique_ptr<int> = ();
    i: std::vector<int>::iterator = ();  // see "edited to add" note above
    v: std::variant<std::monostate, int, std::string> = ();
    a: std::any = ();
    o: std::optional<std::string> = ();

    std::cout << "\nAll these cases satisfy \"VOYDE AND EMPTIE\"\n";

    test_generic(p);
    test_generic(i);
    test_generic(v);
    test_generic(a);
    test_generic(o);
}

test_generic: ( x: _ ) = {
    std::cout
        << "\n" << typeid(x).name() << "\n    ..."
        << inspect x -> std::string {
            is void = " VOYDE AND EMPTIE";
            is _    = " no match";
           }
        << "\n";
}

Note that this generic function would be impossible to write without some kind of is void unification to eliminate all of today’s non-generic divergent “empty state” queries.

Here’s the result on my machine in Ubuntu using GCC and libstdc++… I’m glad to show GCC here after having my machine’s WSL 2 subsystem quit on me on-stage so that I couldn’t show the GCC and Clang live demos in the CppCon 2022 talk (sigh!):

Implementing it ensured the implementation worked, including that it exposed where an if constexpr is needed for std::variant‘s is void test (see cpp2util.h, currently lines 610-614; note that empty is an alias for void). Once I got it working in cppfront (prototypes matter! they help us debug our proposals) I added it to the next draft of my ISO C++ proposal paper P2392 for is/as/inspect in today’s Cpp1 syntax, including the suggested implementation.

Thanks, Drew!

My CppCon 2022 talk is online: “Can C++ be 10x simpler & safer … ?”

It was great to see many of you at CppCon, in person and online! It was a really fun conference this year, and the exhibitor hall felt crowded again which was a good feeling as we all start traveling more again.

The talk I gave on Friday is now on YouTube. In it I describe my experimental work on a potential alternate syntax for C++ (aka ‘syntax 2’ or Cpp2 for short) and my cppfront compiler that I’ve begun writing to implement it.

I hope you enjoy the talk. You can find cppfront at the GitHub repo here:

My CppCon 2021 talk video is online

Whew — I’m now back from CppCon, after remembering how to travel.

My talk video is now online. If you haven’t already seen this via JetBrains’ CppCon 2021 video page or the Reddit post, here’s a link:

Please direct technical comments to the Reddit thread and I’ll watch for them there and respond to as many comments as I can. Thanks!

Thanks again to everyone who attended in person for supporting our requirements for meeting together safely. Interestingly, this was the largest CppCon ever (and the largest C++-specific conference ever as far as I know) in terms of total attendance, though most were attending online. It was good to see and e-see you all! With any luck, by CppCon 2022 our lives will be much closer to normal everywhere in the world… here’s hoping. Thanks again, and stay safe.

Trip report: Summer 2021 ISO C++ standards meeting (virtual)

On Monday, the ISO C++ committee held its third full-committee (plenary) meeting of the pandemic and adopted a few more features and improvements for draft C++23.

We had representatives from 17 voting nations at this meeting: Austria, Bulgaria, Canada, Czech Republic, Finland, France, Germany, Israel, Italy, Netherlands, Poland, Russia, Slovakia, Spain, Switzerland, United Kingdom, and United States. Slovakia is our newest national body to officially join international C++ work. Welcome!

We continue to have the same priorities and the same schedule we originally adopted for C++23, but online via Zoom during the pandemic.

This week: A few more C++23 features adopted

This week we formally adopted a third round of small features for C++23, as well as a number of bug fixes. Below, I’ll list some of the more user-noticeable changes and credit all those paper authors, but note that this is far from an exhaustive list of important contributors… even for these papers, nothing gets done without help from a lot of people and unsung heroes, so thank you first to all of the people not named here who helped the authors move their proposals forward! And thank you to everyone who worked on the adopted issue resolutions and smaller papers I didn’t include in this list.

P1938  by Barry Revzin, Richard Smith, Andrew Sutton, and Daveed Vandevoorde adds the if consteval feature to C++23. If you know about C++17 if constexpr and C++20 std::is_constant_evaluated, then you might think we already have this feature under the spelling if constexpr (std::is_constant_evaluated())… and that’s one of the reasons to add this feature, because that code actually doesn’t do what one might think. See the paper for details, and why we really want if consteval in the language.

P1401 by Andrzej Krzemieński enables testing integers as booleans in static_cast and if constexpr without having to cast the result to bool first (or test against zero). This is a small-but-nice example of removing redundant ceremony to help make C++ code that much cleaner and more readable.

P1132 by Jean-Heyd Meneide, Todor Buyukliev, and Isabella Muerte add out_ptr and inout_ptr abstractions to help with potential pointer ownership transfer when passing a smart pointer to a function that is declared with a T** “out” parameter. In a nutshell, if you’ve ever wanted to call a C API by writing something like some_c_function( &my_unique_ptr ); then these types will likely help you. The idea is that a call site can use one of these types to wrap a smart pointer argument, and then when the helper type is destroyed it automatically updates the pointer it wraps (using a reset call or semantically equivalent behavior).

P1659 by Christopher DiBella generalizes the C++20 starts_with and ends_with on string and string_view by adding the general forms ranges::starts_with and ranges::ends_with to C++23. These can work on arbitrary ranges, and also answer questions such as “are the starting elements of r1 less than the elements of r2?” and “are the final elements of r1 greater than the elements of r2?”.

P2166 by Yuriy Chernyshov helps reduce a commonly-taught pitfall with std::string. You know how since forever (C++98) you can construct a string from a string literal, like std::string("xyzzy")? But that you’d better watch out (and you’d better not cry or pout) not to pass a null pointer, like std::string(nullptr), because that’s undefined behavior where implementations aren’t required to check the pointer for null and can do just whatever they liked, including crash? That’s still the case if you pass a pointer variable whose value is null (sorry!), but with this paper, as of C++23 at least now we have overloads that reject attempts to construct or assign a std::string from nullptr specifically, as a compile-time “d’oh! don’t do that.”

We also adopted a number of other issue resolutions and small papers that made additional improvements, including a number that will be backported retroactively to C++20. Quite a few were of the “oh, you didn’t know that rare case didn’t work? now it does” variety.

Other progress

We also approved work on a second Concurrency TS. Recall that a “TS” or “Technical Specification” is like doing work in a feature branch, which can later be merged into the C++ standards (trunk).

Two related pieces of work were approved to go into the Concurrency TS: P1121 and P1122 by Paul McKenney, Maged M. Michael, Michael Wong, Geoffrey Romer, Andrew Hunter, Arthur O’Dwyer, Daisy Hollman, JF Bastien, Hans Boehm, David Goldblatt, Frank Birbacher, Erik Rigtorp, Tomasz Kamiński, and Jens Maurer add support for hazard pointers and read-copy-update (RCU) which are useful in highly concurrent applications.

What’s next

We’re going to keep meeting virtually in subgroups, and then have at least one more virtual plenary session to adopt features into the C++23 working draft in October.

The next tentatively planned ISO C++ face-to-face meeting is February 2022 in Portland, OR, USA. (Per our C++23 schedule, this is the “feature freeze” deadline for design-approving new features targeting the C++23 standard, whether the meeting is physical or virtual.) Meeting in person next February continues to look promising – barring unexpected surprises, it’s possible that by that time most ISO C++ participating nations will have been able to resume local sports/theatre/concert events with normal audiences, and removed travel restrictions among each other, so that people from most nations will be able to participate at an in-person meeting. But we still have to wait and see… we likely won’t know for sure until well into the autumn, and so we’re still calling this one “tentative” for now. You can find a list of our meeting plans on the Upcoming Meetings page.

Thank you again to the hundreds of people who are working tirelessly on C++, even in our current altered world. Your flexibility and willingness to adjust are much appreciated by all of us in the committee and by all the C++ communities! Thank you, and see you on Zoom.

GotW #102 Solution: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

1. Briefly, what is the difference among:

(a) undefined behavior

Undefined behavior is what happens when your program tries to do something whose meaning is not defined at all in the C++ standard language or library (illegal code and/or data). A compiler is allowed to generate an executable that does anything at all, from data corruption (objects not meeting the requirements of their types) to injecting new code to reformat your hard drive if the program is run on a Tuesday, even if there’s nothing in your source code that could possibly reformat anything. Note that undefined behavior is a global property — it always applies not only to the undefined operation, but to the whole program. [1]

(b) unspecified behavior

Unspecified behavior is what happens when your program does something for which the C++ standard doesn’t document the results. You’ll get some valid result, but you won’t know what the result is until your code looks at it. A compiler is not allowed to give you a corrupted object or to inject new code to reformat your hard drive, not even on Tuesdays.

(c) implementation-defined behavior

Implementation-defined behavior is like unspecified behavior, where the implementation additionally is required to document what the actual result will be on this particular implementation. You can’t rely on a particular answer in portable code because another implementation could choose to do something different, but you can rely on what it will be on this compiler and platform.

2. For each of the following, write a short function … where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

Easy peasy! Let’s dereference a null pointer:

// Example 2(a): If assert is violated, always undefined behavior

void deref_and_set( int* p ) {
    assert( p );
    *p = 42;
}

The function asserts that p is not null, and then on the next line unconditionally dereferences p and scribbles over the location it points to. If p is null and the assertion checking is off so that we can get to the next line, the compiler is allowed to make running the whole program format our hard drive.

(b) possibly results in undefined behavior

A general way to describe this class of program is that the call site has two bugs: first, it violates a precondition (so the callee’s results are always at least unspecified), and then it additionally then uses the unspecified result without checking it and/or in a dangerous way.

To make up an example, let’s bisect a numeric range:

// Example 2(b): If assert is violated, might lead to undefined behavior

int midpoint( int low, int high ) {
    assert( low <= high );
    return low + (high-low)/2;
        // less overflow-prone than “(low+high)/2”
        // more accurate than “low/2 + high/2”
}

The author of midpoint could have made the function more robust to take the values in either order, and thus eliminated the assertion, but assume they had a reason not to, as alluded to in the comments.

Violating the assertion does not result in undefined behavior directly. The function just doesn’t specify (ahem!) its results if call sites call it in a way that violates the precondition the assertion is testing. If the precondition is violated, then the function can add a negative number to low. But just calculating and returning some other int is not (yet) undefined behavior.

For many call sites, a bad call to midpoint won’t lead to later undefined behavior.

However, it’s possible that some call site might go on to use the unspecified result in a way that does end up being real undefined behavior, such as using it as an array index that performs an out-of-bounds access:

auto m = midpoint( low_index(arr1), high_index(arr2) );   // unspecified
   // here we expect m >= low_index(arr1) ...
stats[m-low_index(arr1)]++;                 // --> potentially undefined

This call site code has a typo, and accidentally mixes the low and high indexes of unrelated containers, which can violate the precondition and result in an index that is less than the “low” value. Then in the next line it tries to use it as an offset index into an instrumentation statistics array, which is undefined behavior for a negative number.

GUIDELINE: Remember that an unspecified result is not in itself undefined behavior, but a call site can run with it and end up with real undefined behavior later. This happen particularly when the calculated value is a pointer, or an integer used as an array index (which, remember, is basically the same thing; a pointer value is just an index into all available memory viewed as an array). If a program relies on unspecified behavior to avoid performing undefined behavior, then it has a path to undefined behavior, and so unspecified behavior is a Crouching Tiger, if you will… still dangerous, and can be turned into to the full dragon.

GUIDELINE: Don’t specify your function’s behavior (output postconditions) for invalid inputs (precondition violations), except for defense in depth (see Example 2(c)). By definition, if a function’s preconditions are violated, then the results are not specified. If you specify the outputs for precondition violations, then (a) callers will depend on the outputs, and (b) those “preconditions” aren’t really preconditions at all.

While we’re at it, here’s a second example: Let’s compare pointers in a way the C++ standard says is unspecified. This program attempts to use pointer comparisons to see whether a pointer points into the contiguous data stored in a vector, but this technique doesn’t work because today’s C++ standard only specifies the results of raw pointer comparison when the pointers point at (into, or one-past-the-end of) the same allocation, and so when ptr is not pointing into v’s buffer it’s unspecified whether either pointer comparison in this test evaluates to false:

// Example 2(b)(ii): If assert is violated, might lead to undefined behavior

// std::vector<int> v = ...;
assert(&v[0] <= ptr && ptr < (&v[0])+v.size());           // unspecified
*ptr = 42;                                  // --> potentially undefined

(c) is never undefined or unspecified behavior

An assertion violation is never undefined behavior if the function specifies what happens in every case even when the assertion is violated. Here’s an example mentioned in my paper P2064, distilled from real-world code:

// Example 2(c): If assert is violated, never undefined behavior
//               (function documents its result when x!=0)

some_result_value DoSomething( int x ) {
    assert( x != 0 );
    if    ( x == 0 ) { return error_value; }
    return sensible_result(x);
}

The function asserts that the parameter is not zero, to express that the call site shouldn’t do that, in a way the call site can check and test… but then it also immediately turns around and checks for the errant value and takes a well-defined fallback path anyway even if it does happen. Why? This is an example of “defense in depth,” and can be a useful technique for writing robust software. This means that even though the assertion may be violated, we are always still in a well-defined state and so this violation does not lead to undefined behavior.

GUIDELINE: Remember that violating an assertion does not necessarily lead to undefined behavior.

GUIDELINE: Function authors, always document your function’s requirements on inputs (preconditions). The caller needs to know what inputs are and aren’t valid. The requirements that are reasonably checkable should be written as code so that the caller can perform the checks when testing their code.

GUIDELINE: Always satisfy the requirements of a function you call. Otherwise, you are feeding “garbage in,” and the best you can hope for is “garbage out.” Make sure your code’s tests includes verifying all the reasonably checkable preconditions of functions that it calls.

Writing the above pattern has two problems: First, it repeats the condition, which invites copy/paste errors. Second, it makes life harder for static analysis tools, which often trust assertions to be true in order to reduce false positive results, but then will think the fallback path is unreachable and so won’t properly analyze that path. So it’s better to use a helper to express the “either assert this or check it and do a fallback operation” in one shot, which always avoids repeating the condition, and could in principle help static analysis tools that are aware of this macro (yes, it would be nicer to do it without resorting to a macro, but it’s annoyingly difficult to write the early return without a macro, because a return statement inside a lambda doesn’t mean the same thing):

// Using a helper that asserts the condition or performs the fallback

#define ASSERT_OR_FALLBACK(B, ACTION) { \
    bool b = B;                         \
    assert(b);                          \
    if(!b) ACTION;                      \
}

some_result_value DoSomething( int x ) {
    ASSERT_OR_FALLBACK( x != 0, return error_value; );
    return sensible_result(x);
}

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

In Example 2(a), violating the assertion leads to undefined behavior, 1(a).

In Example 2(b), violating the assertion leads to unspecified behavior, 1(b). At buggy call sites, this could subsequently lead to undefined behavior.

In Example 2(c), violating the assertion leads to implementation-defined behavior, 1(c), which never in itself leads to  undefined behavior.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.

There are many. Here is just one example, that happens to be nice because it is perfectly accurate.

Let’s say we have all the code examples in question 2, written using C assert today (or even with those assertions missing!), and then at some future time we get a version of standard C++ that can express them as preconditions. Then only in Example 2(a), where we can see that the function body (and possibly transitively its further callees with the help of inlining) exercises undefined behavior, a tool can infer the precondition annotation and add it mechanically, and get the benefit of diagnosing existing bugs at call sites:

// What a precondition-aware tool could generate for Example 2(a)

auto f( int* p ) 
    [[pre( p )]]  // can add this automatically: because a violation
                  // leads to undefined behavior, this precondition
                  // is guaranteed to never cause a false positive
{
    assert( p );
    *p = 42;
}

For example, after some future C++2x ships with contracts, a vendor could write an automated tool that goes through every open source C++ project on GitHub and mechanically generates a pull request to insert preconditions for functions like Example 2(a) – but not (b) or (c) – whether or not the assertion already exists, just by noticing the undefined behavior. And it can inject those contract preconditions with complete confidence that none of them will ever cause a false positive, that they will purely expose existing bugs at call sites when that call site is built with contract checking enabled. I would expect such tool to identify a good number of (at least latent if not actual) bugs, and be a boon for C++ users, and it’s possible only for functions in the category of 2(a).

“Automated adoption” of at least part of a new C++ feature, combined with “automatically identifies existing bugs” in today’s code, is a pretty good value proposition.

Acknowledgments

Thank you to the following for their comments on this material: Joshua Berne, Gabriel Dos Reis, Gábor Horváth, Andrzej Krzemieński, Ville Voutilainen.

Notes

[1] In the standard, there are two flavors of undefined behavior. The basic “undefined behavior” is allowed to enter your program only once you actually try to execute the undefined part. But some code is so extremely ill-formed (with magical names like “IF-NDR”) that its very existence in the program makes the entire program invalid, whether you try to execute it or not.

GotW #102: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

JG Question

1. Briefly, what is the difference among:

(a) undefined behavior

(b) unspecified behavior

(c) implementation-defined behavior

Guru Questions

2. For each of the following, write a short function of the form:

/*...function name and signature...*/
{
    assert( /*...some condition about the parameters...*/ );
    /*...do something with parameters...*/;
}

where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

(b) possibly results in undefined behavior

(c) is never undefined or unspecified behavior

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.