cppfront: Spring update

Since the year-end mini-update, progress has continued on cppfront. (If you don’t know what this personal project is, please see the CppCon 2022 talk on YouTube.)

This update covers Acknowledgments, and highlights of what’s new in the compiler and language since last time, including:

  • simple, mathematically safe, and efficient chained comparisons
  • named break and continue
  • “simple and safe” starts with . . . main
  • user-defined type, including unifying all special member functions as operator=
  • type/namespace/function/object aliases
  • header reflect.h with the start of the reflection API and the first 10 working compile-time metafunctions from P0707
  • unifying functions and blocks, including removing : and = from the for loop syntax

Acknowledgments: 267 issues, 128 pull requests, and new collaborators

I want to say a big “thank you” again to everyone who has participated in the cppfront repo. Since the last update, I’ve merged PRs from Jo Bates, Gabriel Gerlero, jarzec, Greg Marr, Pierre Renaux, Filip Sajdak and Nick Treleaven. Thanks also to many great issues opened by (as alphabetically as I can): Abhinav00, Robert Adam, Adam, Aaron Albers, Alex, Graham Asher, Peter Barnett, Sean Baxter, Jan Bielak, Simon Buchan, Michael Clausen, ct-clmsn, Joshua Dahl, Denis, Matthew Deweese, dmicsa, dobkeratops, Deven Dranaga, Konstantin F, Igor Ferreira, Stefano Fiorentino, fknauf, Robert Fry, Artie Fuffkin, Gabriel Gerlero, Matt Godbolt, William Gooch, ILoveGoulash, Víctor M. González, Terence J. Grant, GrigorenkoPV, HALL9kv0, Morten Hattesen, Neil Henderson, Michael Hermier, h-vetinari, Stefan Isak, Tim Keitt, Vanya Khodor, Hugo Lindström, Ferenc Nandor Janky, jarzec, jgarvin, Dominik Kaszewski, kelbon, Marek Knápek, Emilia Kond, Vladimir Kraus, Ahmed Al Lakani, Junyoung Lee, Greg Marr, megajocke, Thomas Neumann, Niel, Jim Northrup, Daniel Oberhoff, Jussi Pakkanen, PaTiToMaSteR, Johel Ernesto Guerrero Peña, Bastien Penavayre, Daniel Pfeifer, Piotr, Davide Pomi, Andrei Rabusov, rconde01, realgdman, Alex Reinking, Pierre Renaux, Alexey Rochev, RPeschke, Sadeq, Filip Sajdak, satu, Wolf Seifert, Tor Shepherd, Luke Shore, Zenja Solaja, Francis Grizzly Smit, Sören Sprößig, Benjamin Summerton, Hypatia of Sva, SwitchBlade, Ramy Tarchichy, tkielan, Marzo Sette Torres Junior, Nick Treleaven, Jan Tusil, userxfce, Ezekiel Warren, Kayla Washburn, Tyler Weaver, Will Wray, and thanks also to many others who participated on PR reviews and comment threads.

These contributors represent people from high school and undergrad students to full professors, from commercial developers to conference speakers, and from every continent except Antarctica. Thank you!

Next, here are some highlights of things added to the cppfront compiler in the four months since the previous update linked at top.

Simple, mathematically safe, and efficient chained comparisons (commit)

P0515 “Consistent comparison” (aka “spaceship”) was the first feature derived from this Cpp2 work to be adopted into the ISO Standard for C++, in C++20. That means cppfront didn’t have to do much to implement operator<=> and its generative semantics, because C++20 compilers already do so, which is great. Thank you again to everyone who helped land this Cpp2 feature in the ISO C++ Standard.

However, one part of P0515 isn’t yet merged into ISO C++: chained comparisons from P0515 section 3.3, such as min <= index < max. See also Barry Revzin‘s great followup ISO C++ proposal paper P0893 “Chaining comparisons.” The cppfront compiler now implements this as described in those ISO proposal papers, and:

  • Supports all mathematically meaningful and safe chains like min <= index < max, with efficient single evaluation of index. (In today’s C++, this kind of comparison silently compiles but is a bug. See P0893 for examples from real-world use.)
  • Rejects nonsense chains like a >= b < c and d != e != f at compile time. (In today’s C++, and in other languages like Python, they silently compile but are necessarily a bug because they are conceptually meaningless.)

I think this is a great example to demonstrate that “simple,” “safe,” and “fast” are often not in tension, and how it’s often possible to get all three at the same time without compromises.

Named break and continue (commit)

This feature further expands the Cpp2 “name :” way of introducing all names, to also support introducing loop names. Examples like the following now work… see test file pure2-break-continue.cpp2 for more examples.

outer: while i<M next i++ {      // loop named "outer"
    // ...
    inner: while j<N next j++ {  // loop named "inner"
        // ...
        if something() {
            continue inner;      // continue the inner loop
        }
        // ...
        if something_else() {
            break outer;         // break the outer loop
        }
        // ...
    }
    // ...
}

“Simple and safe” starts with . . . main

main can now be defined to return nothing, and/or as main: (args) to have a single argument of type std::vector<std::string_view>.

For example, here is a complete compilable and runnable program (in -pure-cpp2 mode, no #include is needed to use the C++ standard library)…

main: (args) =
    std::cout << "This program's name is (args[0])$";

Yes, this really is 100% C++ under the covers as you can see on Godbolt Compiler Explorer… “just nicer”:

  • The entire C++ standard library is available directly with zero thunking, zero wrapping, and zero need to #include or import because in pure Cpp2 the entire ISO C++ standard library is just always automatically there. (Yes, if you don’t like cout, you can use the hot-off-the-press C++23 std::print too the moment that your C++ implementation supports it.)
  • Convenient defaults, such as no need to write -> int, and no need to write braces around a single-statement function body.
  • Convenient semantics and services, such as $ string interpolation. Again, all fully compatible with today’s C++ (e.g., string interpolation uses std::to_string where available).
  • Type and memory safety by default even in this example: Not only is args defaulting to the existing best practices of C++ standard safety with ISO C++ vector and string_view, but the args[0] call is automatically bounds-checked by default too.

type: User-defined types

User-defined types are written using the same name : kind = value syntax as everything in Cpp2:

mytype: type =
{
    // data members are private by default
    x: std::string;

    // functions are public by default
    protected f: (this) = { do_something_with(x); }

    // ...
}

Here are some highlights…

First, types are order-independent. Cpp2 still has no forward declarations, and you can just write types that refer to each other. For example, see the test case pure2-types-order-independence-and-nesting.cpp2.

The this parameter is explicit, and has special sauce:

  • this is a synonym for the current object (not a pointer).
  • this defaults to the current type.
  • this‘s parameter passing style declares what kind of function you’re writing. For example, (in this) (or just (this) since “in” is the default as usual) clearly means a “const” member function because “in” parameters always imply constness; (inout this) means a non-const member function; (move this) expresses and emits a Cpp1 &&-qualified member function; and so on.

For example, here is how to write const member function named print that takes a const string value and prints this object’s data value and the string message (yes, everything in Cpp2 is const by default except for local-scope variables):

mytype: type =
{
    data: i32;   // some data member (private by default)

    print: (this, msg: std::string) = {
        std::cout << data << msg;
                 // "data" is shorthand for "this.data"
    }

    // ...
}

All Cpp1 special member functions (including construction, assignment, destruction) and conversions are unified as operator=, default to memberwise semantics and safe “explicit” by default, and there’s a special that parameter that makes writing copy/move in particular simpler and safer. On the cppfront wiki, see the Design Note “operator=, this & that” for details. Briefly summarizing here:

  • The only special function every type must have is the destructor. If you don’t write it by hand, a public nonvirtual destructor is generated by default.
  • If no operator= functions are written by hand, a public default constructor is generated by default.
  • All other operator= functions are explicitly written, either by hand or by opting into applying a metafunction (see below).

Note: Because generated functions are always opt-in, you can never get a generated function that’s wrong for your type, and so Cpp2 doesn’t need to support “=delete” for the purpose of suppressing unwanted generated functions.

  • The most general form of operator= is operator=: (out this, that) which works as a unified general {copy, move} x { constructor, assignment } operator, and generates all of four of those in the lowered Cpp1 code if you didn’t write a more specific one yourself (see Design Note linked above for details).
  • All copy/move/comparison operator= functions are memberwise by default in Cpp2 ( [corrected:] including memberwise construction and assignment when you write them yourself, in which case they aren’t memberwise by default in today’s Cpp1).
  • All conversion operator= functions are safely “explicit” by default. To opt into an implicit conversion, write the implicit qualifier on the this parameter.
  • All functions can have a that parameter which is just like this (knows it’s the current type, can be passed in all the usual ways, etc.) but refers to some other object of this type rather than the current object. It has some special sauce for simplicity and safety, including that the language ensures that the members of a that object are safely moved from only once.

Virtual functions and base classes are all about “this”:

  • Virtual functions are written by specifying exactly one of virtual, override, or final on the this parameter.
  • Base classes are written as members named this. For example, just as a class could write a data member as data: string = "xyzzy";, which in Cpp2 is pronounced “data is a string with default value ‘xyzzy'”, a base class is written as this: Shape = (default, values);, which is naturally pronounced as “this IS-A Shape with these default values.” There is no separate base class list or separate member initializer list.
  • Because base and member subobjects are all declared in the same place (the type body) and initialized in the same place (an operator= function body), they can be written in any order, including interleaved, and are still guaranteed to be safely initialized in declared order. This means that in Cpp2 you can declare a data member object before a base class object, so that it naturally outlives the base class object, and so you don’t need workarounds like Boost’s base_from_member because all of the motivating examples for that can be written directly in Cpp2. See my comments on cppfront issue #334 for details.

Alias support: Type, namespace, function, and object aliases (commit)

Cpp2 already defines every new entity using the syntax “name : kind = value“.

So how should it declare aliases, which declare not a new entity but a synonym for an existing entity? I considered several alternatives, and decided to try out the identical declaration syntax except changing = (which connotes value setting) to == (which connotes sameness):

// Namespace alias
lit: namespace == ::std::literals;

// Type alias
pmr_vec: <T> type
    == std::vector<T, std::pmr::polymorphic_allocator<T>>;

// Function alias
func :== some_original::inconvenient::function_name;

// Object alias
vec :== my_vector;  // note: const&, aliases are never mutable

Note again the const default. For now, Cpp2 supports only read-only aliases, not read-write aliases.

Header reflect.h: Initial support for reflection API, and implementing the first 10 working metafunctions from P0707interface, polymorphic_base, ordered, weakly_ordered, partially_ordered, basic_value, value, weakly_ordered_value, partially_ordered_value, struct

Disclaimer: I have not yet implemented a reflection operator that Cpp2 code can invoke, or written a Cpp2 interpreter to run inside the compiler. But I am doing everything else needed for type metafunctions: cppfront has started a usable reflection metatype API, and has started getting working metafunctions that are compile-time code that uses that metatype API… the only thing missing is that those functions aren’t run through an interpreter (yet).

For example, cppfront now supports code like the following… importantly, “value” and “interface” are not built-in types hardwired into the language as they are in Java and C# and other languages, but rather each is a function that uses the reflection API to apply requirements and defaults to the type (C++ class) being written:

// Point2D is declaratively a value type: it is guaranteed to have
// default/copy/move construction and <=> std::strong_ordering
// comparison (each generated with memberwise semantics
// if the user didn't write their own, because "@value" explicitly
// opts in to ask for these functions), a public destructor, and
// no protected or virtual functions... the word "value" carries
// all that meaning as a convenient and readable opt-in, but
// without hardwiring "value" specially into the language
//
Point2D: @value type = {
    x: i32 = 0;  // data members (private by default)
    y: i32 = 0;  // with default values
    // ...
}

// Shape is declaratively an abstract base class having only public
// and pure virtual functions (with "public" and "virtual" applied
// by default if the user didn't write an access specifier on a
// function, because "@interface" explicitly opts in to ask for
// these defaults), and a public pure virtual destructor (generated
// by default if not user-written)... the word "interface" carries
// all that meaning as a convenient and readable opt-in, but
// without hardwiring "interface" specially into the language
//
Shape: @interface type = {
    draw: (this);
    move: (inout this, offset: Point2D);
}

At compile time, cppfront parses the type’s body and then invokes the compile-time metafunction (here value or interface), which enforces requirements and applies defaults and generates functions, such as… well, I can just paste the actual code for interface from reflect.h, it’s pretty readable:

Note: For now I wrote the code in today’s Cpp1 syntax, which works fine as Cpp2 is just a fully compatible alternate syntax for the same true C++… later this year I aim to start self-hosting and writing more of cppfront itself in Cpp2 syntax, including functions like these.

//-----------------------------------------------------------------------
//  Some common metafunction helpers (metafunctions are just
//  functions, so they can be factored as usual)
//
auto add_virtual_destructor(meta::type_declaration& t)
    -> void
{
    t.require( t.add_member( "operator=: (virtual move this) = { }"),
               "could not add virtual destructor");
}

//-----------------------------------------------------------------------
// 
//      "... an abstract base class defines an interface ..."
// 
//          -- Stroustrup (The Design and Evolution of C++, 12.3.1)
//
//-----------------------------------------------------------------------
// 
//  interface
//
//  an abstract base class having only pure virtual functions
//
auto interface(meta::type_declaration& t)
    -> void
{
    auto has_dtor = false;

    for (auto m : t.get_members())
    {
        m.require( !m.is_object(),
                   "interfaces may not contain data objects");
        if (m.is_function()) {
            auto mf = m.as_function();
            mf.require( !mf.is_copy_or_move(),
                        "interfaces may not copy or move; consider a virtual clone() instead");
            mf.require( !mf.has_initializer(),
                        "interface functions must not have a function body; remove the '=' initializer");
            mf.require( mf.make_public(),
                        "interface functions must be public");
            mf.make_virtual();
            has_dtor |= mf.is_destructor();
        }
    }

    if (!has_dtor) {
        add_virtual_destructor(t);
    }
}

Note a few things that are demonstrated here:

  • .require (a convenience to combine a boolean test with the call to .error if the test fails) shows how to implement enforcing custom requirements. For example, an interface should not contain data members. If any requirement fails, the error output is presented as part of the regular compiler output — metafunctions extend the compiler, in a disciplined way.
  • .make_virtual shows how to implement applying a default. For example, interface functions are virtual by default even if the user didn’t write (virtual this) explicitly.
  • .add_member shows how to generate new members from legal source code strings. In this example, if the user didn’t write a destructor, we write a virtual destructor for them by passing the ordinary code to the .add_member function, which reinvokes the lexer to tokenize the code, the parser to generate a declaration_node parse tree from the code, and then if that succeeds adds the new declaration to this type.
  • The whole metafunction is invoked by the compiler right after initial parsing is complete (right after we parse the statement-node that is the initializer) and before the type is considered defined. Once the metafunction returns, if it had no errors then the type definition is complete and henceforth immutable as usual. This is how the metafunction gets to participate in deciding the meaning of the code the user wrote, but does not create any ODR confusion — there is only one immutable definition of the type, a type cannot be changed after it is defined, and the metafunction just gets to participate in defining the type just before the definition is cast in stone, that’s all.
  • The metafunction is ordinary compile-time code. It just gets invoked by the compiler at compile time in disciplined and bounded ways, and with access to bounded things.

Today in cppfront, metafunctions like value and interface are legitimately doing everything envisioned for them in P0707 except for being run through an interpreter — the metafunctions are using the meta:: API and exercising it so I can learn how that API should expand and become richer, cppfront is spinning up a new lexer and parser when a metafunction asks to do code generation to add a member, and then cppfront is stitching the generated result into the parse tree as if it had been written by the user explicitly… this implementation is doing everything I envisioned for it in P0707 except for being run through an interpreter.

As of this writing, here are the currently implemented metafunctions in reflect.h are as described in P0707 section 3, sometimes with a minor name change… and including links to the function source code…

interface: An abstract class having only pure virtual functions.

  • Requires (else diagnoses a compile-time error) that the user did not write a data member, a copy or move operation, or a function with a body.
  • Defaults functions to be virtual, if the user didn’t write that explicitly.
  • Generates a pure virtual destructor, if the user didn’t write that explicitly.

polymorphic_base (in P0707, originally named base_class): A pure polymorphic base type that has no instance data, is not copyable, and whose destructor is either public and virtual or protected and nonvirtual.

  • Requires (else diagnoses a compile-time error) that the user did not write a data member, a copy or move operation, and that the destructor is either public+virtual or protected+nonvirtual.
  • Defaults members to be public, if the user didn’t write that explicitly.
  • Generates a public pure virtual destructor, if the user didn’t write that explicitly.

ordered: A totally ordered type with operator<=> that implements std::strong_ordering.

  • Requires (else diagnoses a compile-time error) that the user did not write an operator<=> that returns something other than strong_ordering.
  • Generates that operator<=> if the user didn’t write one explicitly by hand.

Similarly, weakly_ordered and partially_ordered do the same for std::weak_ordering and std::partial_ordering respectively. I chose to call the strongly-ordered one “ordered,” not “strong_ordered,” because I think the one that should be encouraged as the default should get the nice name.

basic_value: A type that is copyable and has value semantics. It must have all-public default construction, copy/move construction/assignment, and destruction, all of which are generated by default if not user-written; and it must not have any protected or virtual functions (including the destructor).

  • Requires (else diagnoses a compile-time error) that the user did not write some but not all of the copy/move/ construction/assignment and destruction functions, a non-public destructor, or any protected or virtual function.
  • Generates a default constructor and memberwise copy/move construction and assignment functions, if the user didn’t write them explicitly.

value: A basic_value that is totally ordered.

Note: Many of you would call this a “regular” type… but I recognize that there’s a difference of opinion about whether “regular” includes ordering. That’s one reason I’ve avoided the word “regular” here, and this way we can all separately talk about a basic_value (which may not include ordering) or a value (which does include strong total ordering; see next paragraph for weaker orderings) and we can know we’re all talking about the same thing.

Similarly, weakly_ordered_value and partially_ordered_value do the same for weakly_ordered and partially_ordered respectively. I again chose to call the strongly-ordered one “value,” not “strongly_ordered_value,” because I think the one that should be encouraged as the default should get the nice name.

struct (in P0707, originally named plain_struct because struct is a reserved word in Cpp1… but struct isn’t a reserved word in Cpp2): A basic_value where all members are public, there are no virtual functions, and there are no user-written (non-default operator=) constructors, assignment operators, or destructors.

  • Requires (else diagnoses a compile-time error) that the user did not write a virtual function or a user-written operator=.
  • Defaults members to be public, if the user didn’t write that explicitly.

Local statement/block parameters (commit)

I had long intended to support the following unification of functions and blocks, where cppfront already provided all of these except only the third case:

f:(x: int = init) = { ... }     // x is a parameter to the function
f:(x: int = init) = statement;  // same, { } is implicit

 :(x: int = init) = { ... }     // x is a parameter to the lambda
 :(x: int = init) = statement;  // same, { } is implicit

  (x: int = init)   { ... }     // x is a parameter to the block
  (x: int = init)   statement;  // same, { } is implicit

                    { ... }     // x is a parameter to the block
                    statement;  // same, { } is implicit

(Recall that in Cpp2 : always and only means “declaring a new thing,” and therefore also always has an = immediately or eventually to set the value of that new thing.)

The idea is to treat functions and blocks/statements uniformly, as syntactic and semantic subsets of each other:

  • A named function has all the parts: A name, a : (and therefore =) because we’re declaring a new entity and setting its value, a parameter list, and a block (possibly an implicit block in the convenience syntax for single-statement bodies).
  • An unnamed function drops only the name: It’s still a declared new entity so it still has : (and =), still has a parameter list, still has a block.
  • (not implemented until now) A parameterized block drops only the name and : (and therefore =). A parameterized block is not a separate entity (there’s no : or =), it’s part of its enclosing entity, and therefore it doesn’t need to capture.
  • Finally, if you drop also the parameter list, you have an ordinary block.

In this model, the third (just now implemented) option above allows a block parameter list, which does the same work as “let” variables in other languages, but without a “let” keyword. This would subsume all the Cpp1 loop/branch scope variables (and more generally than in Cpp1 today, because you could declare multiple parameters easily which you can’t currently do with the Cpp1 loop/branch scope variables).

So this now works, pasting from test case pure2-statement-scope-parameters.cpp2:

main: (args) = 
{
    local_int := 42;

    //  'in' statement scope variable
    // declares read-only access to local_int via i
    (i := local_int) for args do (arg) {
        std::cout << i << "\n";       // prints 42
    }

    //  'inout' statement scope variable
    // declares read-write access to local_int via i
    (inout i := local_int) {
        i++;
    }
    std::cout << local_int << "\n";   // prints 43
}

Note that block parameters enable us to use the same declarative data-flow for local statements and blocks as for functions: Above, we declare a block (a statement, in this case a single loop, is implicitly treated as a block) that is read-only with respect to the local variable, and declare another to be read-write with respect to that variable. Being able to declare data flow is important for writing correct and safe code.

Corollary: Removed : and = from for

Eagle-eyed readers of the above example will notice a change: As a result of unifying functions and blocks, I realized that the for loop syntax should use the third syntax, not the first or second, because the loop body is a parameterized block, not a local function. So changed the for syntax from this

// previous syntax
for items do: (item) = {
    x := local + item;
    // ...
}

to this, which is the same except that it removes : and =

// current syntax
for items do (item) {
    x := local + item;
    // ...
}

Note that what follows for ... do is exactly a local block, just the parameter item doesn’t write an initializer because it is implicitly initialized by the for loop with each successive value in the range.

By the way, this is the first breaking change from code that I’ve shown publicly, so cppfront also includes a diagnostic for the old syntax to steer you to the new syntax. Compatibility!

Other features

Also implemented since last time:

  • As always, lots of bug fixes and diagnostic improvements.
  • Use _ as wildcard everywhere, and give a helpful diagnostic if the programmer tries to use “auto.”
  • Namespaces. Every namespace must have a name, and the anonymous namespace is supported by naming it _ (the “don’t care” wildcard). For now these are a separate language feature, but I’m still interested in exploring making them just another metafunction.
  • Explicit template parameter lists. A type parameter, spelled “: type”, is the default. For examples, see test case pure2-template-parameter-lists.cpp2
  • Add requires-clause support.
  • Make : _ (deduced type) the default for function parameters. In response to a lot of sustained user demand in issues and comments — thanks! For example, add: (x, y) -> _ = x+y; is a valid Cpp2 generic function that means the same as (and compiles to) [[nodiscard]] auto add(auto const& x, auto const& y) -> auto { return x+y; } in Cpp1 syntax.
  • Add alien_memory<T> as a better spelling for T volatile. The main problem with volatile isn’t the semantics — those are deliberately underspecified, and appropriate for talking about “memory that’s outside the C++ program that the compiler can’t assume it knows anything about” which is an important low-level concept. The problems with volatile are that (a) it’s wired throughout the language as a type qualifier which is undesirable and unnecessary, and (b) the current name is confusing and has baggage and so it should be named something that connotes what it’s actually for (and I like “alien” rather than “foreign” because I think “alien” has a better and stronger connotation).
  • Reject more implicit narrowing, notably floating point narrowing.
  • Reject shadowing of type scope names. For example, in a type that has a member named data, a member function can’t write a local variable named data.
  • Add support for forward return and generic out parameters.
  • Add support for raw string literals with interpolation.
  • Add compiler switches for compatibility with popular no-exceptions/no-RTTI modes (-fno-exceptions and -fno-rtti, as usual), specifying the output file (-o, with the option of -o stdout), and source line/column format for error output (MSVC style or GCC style)
  • Add single-word aliases (e.g., ulonglong) to replace today’s multi-keyword platform-width C types, with diagnostics support to aid migration. This is in addition to known-width Cpp2 types (e.g., i32) that are already there and should often be preferred.
  • Allow unnamed objects (not just unnamed functions, aka lambdas) at expression scope.
  • Reclaim many Cpp1 keywords for ordinary use. For example, a type or variable can be named “and” or “struct” in Cpp2, and it’s fully compatible (it’s prefixed with “cpp2_” when lowered to Cpp1, so Cpp1 code still has a way to refer to it, but Cpp2 gets to use the nice names). This isn’t just sugar… without this, I couldn’t write the “struct” metafunction and give it the expected nice name.
  • Support final on a type.
  • Add support for .h2 header files.

What’s next

Well, that’s all so far.

For cppfront, over the summer and fall I plan to:

  • implement more metafunctions from my paper P0707, probably starting with enum and union (a safe union) — not only because they’re next in the paper, but also because I use those features in cppfront today and so I’ll need them working in Cpp2 when it comes time to…
  • … start self-hosting cppfront, i.e., start migrating parts of cppfront itself to be written in Cpp2 syntax;
  • continue working my list of pending Cpp2 features and implementing them in cppfront; and
  • start finding a few private alpha testers to work with, to start writing a bit of code in Cpp2 to alpha-test cppfront and also to alpha-test my (so far unpublished) draft documentation.

For conferences:

  • One week from today, I’ll be at C++Now to give a talk about this progress and why full-fidelity compatibility with ISO C++ is essential (and what it means). C++Now is a limited-attendance conference, and it’s nearly sold out but the organizers say there are a few seats left… you can still register for C++Now until Friday.
  • In early October I hope to present a major update at CppCon 2023, where registration just opened (yes, you can register now! run, don’t walk!). I hope to see many more of you there at the biggest C++ event, and that only happens once a year — like every year, I’ll be there all week long to not miss a minute.

Interview on CppCast

A few days ago I recorded CppCast episode 357. Thanks to Timur Doumler and Phil Nash for inviting me on their show – and for continuing CppCast, which was so wonderfully founded by Rob Irving and Jason Turner!

This time, we chatted about news in the C++ world, and then about my Cpp2 and cppfront experimental work.

The podcast doesn’t seem to have chapters, but here are a few of my own notes about sections of interest:

  • 00:00 Intro
  • 04:30 News: LLVM 16.0.0, “C++ Initialisation” book, new user groups
  • 15:45 Start of interview
  • 16:08 Why I don’t view Cpp2 as a “successor language”
  • 16:25 A transpiler is a compiler (see also: cfront, PyPy, TypeScript, …)
  • 17:20 Origins of the Cpp2 project, 2015/16
  • 19:00 100% compatibility as a primary goal and design constraint
  • 22:00 Avoid divergence, continue in same path C++ is already going
  • 22:50 What compatibility means: 100% link compat always on, 100% source compat always available but pay only if you need it
  • 24:14 Making the syntax unique in a simple way, start with “name :”
  • 28:10 Avoid divergence and still make a major simplification, by letting programmers directly declare their intent
  • 30:30 Bringing the pieces to ISO and the community for feedback
  • 31:55 What about “epochs”/“editions”? tl;dr: It’s exactly the right question, but I think the right answer is “epoch” (singular)
  • 35:42 C++ is popular and will endure no matter what we do; question is can we make it nicer
  • 37:05 My personal experiment, and others are now helping
  • 38:20 What “safeties” I’m targeting, and what degree of safety, and why formally provable guarantees are nice but neither necessary nor sufficient (I expect this view to be controversial)
  • 44:00 The issue is making things 50x (98%) safer, vs. 100% safer, because what does requiring that last 2% actually cost the design in incompatibility / difficulty of use
  • 47:05 The zero-overhead principle is non-negotiable, and so is always being able to “open the hood” to take control, otherwise it’s not C++ anymore
  • 48:20 Examples: dynamic bounds/null checking is opt-out, now the default but still pay for it only if you use it
  • 50:20 Will cppfront support all major compilers and platforms? It already does, any reasonably conforming C++20 compiler (any gcc/clang/msvc since about 2020), and that will continue
  • 52:15 Keeping the generated source code very close to the original is a priority
  • 53:25 “TypeScript for C++” plan vs. “Dart for C++” plan
  • 55:20 TypeScript did what Bjarne’s cfront did: Transpiler that let you always keep your generated JavaScript/C code, so you could drop using the new language anytime if you want with #NoRegrets, risk-free
  • 57:20 Shout out to Anders Hejlsberg, IIRC the only human to produce a million-user programming language more than once, and his approach to TypeScript vs. C#
  • 59:20 Why generating C++ code isn’t in tension with the goal of compatibility (it’s actually synergistic), and the targeted subset is C++20 (with a workaround only when modules are not yet available on a given compiler)
  • 1:00:40 Why C++20 is super important (if constexpr, requires-expressions)
  • 1:01:40 Why any C++ evolution/successor language attempt that for now only tries to be compatible with C++17 faces a big hill/disadvantage
  • 1:02:40 What’s next for Cpp2 and cppfront
  • 1:05:35 Where can people learn more: cppfront repo, CppCon 2022 talk, C++Now talk coming up in a month, then CppCon 2023 in October
  • 1:07:28 C++ world is alive and well and thriving, now embracing challenges like safety to keep C++ working well for all of us

In at least one place I said “cppfront” where I meant “cfront”… I think the intent should be clear from context. 😉

Thanks again to everyone who has helped me personally with cppfront through issues and PRs, and to all the good folks who helped the entire C++ world by working hard and creatively through the pandemic and shipping another solid release of C++ in C++23.

C++23 “Pandemic Edition” is complete (Trip report: Winter ISO C++ standards meeting, Issaquah, WA, USA)

On Saturday, the ISO C++ committee completed technical work on C++23 in Issaquah, WA, USA! We resolved the remaining international comments on the C++23 draft, and are now producing the final document to be sent out for its international approval ballot (Draft International Standard, or DIS) and final editorial work, to be published later in 2023.

Our hosts, the Standard C++ Foundation, WorldQuant, and Edison Design Group, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We had about 160 attendees, more than half in-person and the others remote via Zoom. We had 19 nations formally represented, 9 in-person and 10 via Zoom. Also, at each meeting we regularly have new attendees who have never attended before, and this time there were 25 new first-time attendees in-person or on Zoom; to all of them, once again welcome!

The C++ committee currently has 26 active subgroups, 13 of which met in parallel tracks throughout the week. Some groups ran all week, and others ran for a few days or a part of a day and/or evening, depending on their workloads. You can find a brief summary of ISO procedures here.

From Prague, through the pandemic, to an on-time C++23 “Pandemic Edition”

The previous standard, C++20, was completed in Prague in February 2020, a month before the pandemic lockdowns began. At that same meeting, we adopted and published our C++23 schedule… without realizing that the world was about to turn upside down in just a few weeks. Incredibly, thanks to the effort and resilience of scores of subgroup chairs and hundreds of committee members, we still did it: Despite a global pandemic, C++23 has shipped exactly on time and at high quality.

The first pandemic-cancelled in-person meeting would have been the first meeting of the three-year C++23 cycle. This meant that nearly all of the C++23 release cycle, and the entire “development” phase of the cycle, was done virtually via Zoom with many hundreds of telecons from 2020 through 2022. Last week’s meeting was only our second in-person meeting since February 2020, and our second-ever hybrid meeting with remote Zoom participation. Both had a  high-quality hybrid Zoom experience for remote attendees around the world, and I want to repeat my thanks from November to the many volunteers who worked hard and carried hardware to Kona and Issaquah to make this possible. I want to again especially thank Jens Maurer and Dietmar Kühl for leading that group, and everyone who helped plan, equip, and run the meetings. Thank you very much to all those volunteers and helpers!

The current plan is that we’ve now returned to our normal cadence of having full-week meetings three times a year, as we did before the pandemic, but now those will be not only in-person but also have remote participation via Zoom. Most subgroups will additionally still continue to meet regularly via Zoom.

This week’s meeting

Per our published C++23 schedule, this was our final meeting to finish technical work on C++23. No features were added or removed, we just handled fit-and-finish issues and primarily focused on finishing addressing the 137 national body comments we received in the summer’s international comment ballot (Committee Draft, or CD). You can find a list of C++23 features here, many of them already implemented in major compilers and libraries. C++23’s main theme was “completing C++20,” and some of the highlights include module “std”, “if consteval,” explicit “this” parameters, still more constexpr, still more CTAD, “[[assume]]”, simplifying implicit move, multidimensional and static “operator[]”, a bunch of Unicode improvements, and Nicolai Josuttis’ personal favorite: fixing the lifetime of temporaries in range-for loops (some would add, “finally!”… thanks again for the persistence, Nico).

In addition to C++23 work, we also had time to make progress on a number of post-C++23 proposals, including continued work on contracts, SIMD execution, and more. We also decided to send the second Concurrency TS for international comment ballot, which includes hazard pointers, read-copy-update (RCU) data structures… and as of this week we also added Anthony Williams’ P0290 “synchronized_value” type.

The contracts subgroup made further progress on refining contract semantics targeting C++26.

The concurrency and parallelism subgroup is still on track to move forward with “std::execution” and SIMD parallelism for C++26, which in the words of the subgroup chair will make C++26 a huge release for the concurrency and parallelism group.

Again, when you see “C++26” above, that doesn’t mean “three long years away”… we just closed the C++23 branch, and the C++26 branch is opening immediately and we will start approving features for C++26 at our next meeting in June, less than four months from now. Implementers interested in specific features often don’t wait for the final standard to start shipping implementations; note that C++23, which was just finished, already has many features shipping today in major implementations.

The newly-created SG23 Safety and Security subgroup met on Thursday for a well-attended session on hitting the ground running for making a targeted improvement in safety and security in C++, including that it approved the first two safety papers to progress to review next meeting by the full language evolution group.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next two meetings will be in Varna, Bulgaria in June and in Kona, HI, USA in November. At those two meetings we will start work on adding features into the new C++26 working draft.

Wrapping up

Thank you again to the approximately 160 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies!

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in less than four months from now we’ll be meeting again in Bulgaria to start adding features to C++26. I look forward to seeing many of you there. Thank you again to everyone reading this for your interest and support for C++ and its standardization.

Cpp2 and cppfront: Year-end mini-update

As we close out 2022, I thought I’d write a short update on what’s been happening in Cpp2 and cppfront. If you don’t know what this personal project is, please see the CppCon 2022 talk on YouTube.

Most of this post is about improvements I’ve been making and merging over the year-end holidays, and an increasing number of contributions from others via pull requests to the cppfront repo and in companion projects. Thanks again to so many of you who expressed your interest and support for this personal experiment, including the over 3,000 comments on Reddit and YouTube and the over 200 issues and PRs on the cppfront repo!

10 design notes

On the cppfront wiki, I’ve written more design notes about specific parts of the Cpp2 language design that answer common questions. They include:

  • Broad strategic topics, such as addressing ABI and versioning, “unsafe” code, and aiming to eliminate the preprocessor with reflection.
  • Specific language feature design topics, such as unified function call syntax (UFCS), const, and namespaces.
  • Syntactic choices, such as postfix operators and capture syntax.
  • Implementation topics, such as parsing strategies and and grammar details.

117 issues (3 open), 74 pull requests (9 open), 6 related projects, and new collaborators

I started cppfront with “just a blank text editor and the C++ standard library.” Cppfront continues to have no dependencies on other libraries, but since I open-sourced the project in September I’ve found that people have started contributing working code — thank you! Authors of merged pull requests include:

  • The prolific Filip Sajdak contributed a number of improvements, probably the most important being generalizing my UFCS implementation, implementing more of is and as as described in P2392, and providing Apple-Clang regression test results. Thanks, Filip!
  • Gabriel Gerlero contributed refinements in the Cpp2 language support library, cpp2util.h.
  • Jarosław Głowacki contributed test improvements and ensuring all the code compiles cleanly at high warning levels on all major compilers.
  • Konstantin Akimov contributed command-line usability improvements and more test improvements.
  • Fernando Pelliccioni contributed improvements to the Cpp2 language support library.
  • Jessy De Lannoit contributed improvements to the documentation.

Thanks also to these six related projects, which you can find listed on the wiki:

Thanks again to Matt Godbolt for hosting cppfront on Godbolt Compiler Explorer and giving feedback.

Thanks also to over 100 other people who reported bugs and made suggestions via the Issues. See below for some more details about these features and more.

Compiler/language improvements

Here are some highlights of things added to the cppfront compiler since I gave the first Cpp2 and cppfront talk in September. Most of these were implemented by me, but some were implemented by the PR authors I mentioned above.

Roughly in commit order (you can find the whole commit history here), and like everything else in cppfront some of these continue to be experimental:

  • Lots of bug fixes and diagnostic improvements.
  • Everything compiles cleanly under MSVC -W4 and GCC/Clang -Wall -Wextra.
  • Enabled implicit move-from-last-use for all local variables. As I already did for copy parameters.
  • After repeated user requests, I turned -n and -s (null/subscript dynamic checking) on by default. Yes, you can always still opt out to disable them and get zero cost, Cpp2 will always stay a “zero-overhead don’t-pay-for-what-you-don’t-use” true-C++ environment. All I did was change the default to enable them.
  • Support explicit forward of members/subobjects of composite types. For a parameter declared forward x: SomeType, the default continues to be that the last use of x is automatically forwarded for you; for example, if the last use is call_something( x ); then cppfront automatically emits that call as call_something( std::forward<decltype(x)>(x) ); and you never have to write out that incantation. But now you also have the option to separately forward parts of a composite variable, such as that for a forward x: pair<string, string>> parameter you can write things like do_this( forward x.first ) and do_that( 1, 2, 3, forward x.second ).
  • Support is template-name and is ValueOrPredicate: is now supports asking whether this is an instantiation of a template (e.g., x is std::vector), and it supports comparing values (e.g., x is 14) and using predicates (e.g., x is (less_than(20)) invoking a lambda) including for values inside a std::variant, std::any, and std::optional (e.g., x is 42 where x is a variant<int,string> or an any).
  • Regression test results for all major compilers: MSVC, GCC, Clang, and Apple-Clang. All are now checked in and can be conveniently compared before each commit.
  • Finished support for >> and >>= expressions. In today’s syntax, C++ currently max-munches the >> and >>= tokens and then situationally breaks off individual > tokens, so that we can write things like vector<vector<int>> without putting a space between the two closing angle brackets. In Cpp2 I took the opposite choice, which was to not parse >> or >>= as a token (so max munch is not an issue), and just merge closing angles where a >> or >>= can grammatically go. I’ve now finished the latter, and this should be done.
  • Generalized support for UFCS. In September, I had only implemented UFCS for a single call of the form x.f(y), where x could not be a qualified name or have template arguments. Thanks to Filip Sajdak for generalizing this to qualified names, templated names, and chaining multiple UFCS calls! That was a lot of work, and as far as I can tell UFCS should now be generally complete.
  • Support declaring multi-level pointers/const.
  • Zero-cost implementation of UFCS. The implementation of UFCS is now force-inlined on all compilers. In the tests I’ve looked at, even when calling a nonmember function f(x,y), using Cpp2’s x.f(y) unified function call syntax (which tries a member function first if there is one, else falls back to a nonmember function), the generated object code at all optimization levels is now identical, or occasionally better, compared to calling the nonmember function directly. Thanks to Pierre Renaux for pointing this out!
  • Support today’s C++ (Cpp1) multi-token fundamental types (e.g., signed long long int). I added these mainly for compatibility because 100% seamless interoperability with today’s C++ is a core goal of Cpp2, but note that in Cpp2 these work but without any of the grammar and parsing quirks they have in today’s syntax. That’s because I decided to represent such multi-word names them as a single Cpp2 token, which happens to internally contain whitespace. Seems to work pretty elegantly so far.
  • Support fixed-width integer type aliases (i32, u64, etc.), including optional _fast and _small (e.g., i32_fast).

I think that this completes the basic implementation of Cpp2’s initial subset that I showed in my talk in September, including that support for multi-level pointers and the multi-word C/C++ fundamental type names should complete support for being able to invoke any existing C and C++ code seamlessly.

Which brings us to…

What’s next

Next, as I said in the talk, I’ll be adding support for user-defined types (classes)… I’ll post an update about that when there’s more that’s ready to see.

Again, thanks to everyone who expressed interest and support for this personal experiment, and may you all have a happy and safe 2023.

Trip report: Autumn ISO C++ standards meeting (Kona)

A few minutes ago, the ISO C++ committee completed its second-to-last meeting of C++23 in Kona, HI, USA. Our host, the Standard C++ Foundation, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We currently have 26 active subgroups, nine of which met in six parallel tracks throughout the week; some groups ran all week, and others ran for a few days or a part of a day, depending on their workloads. We had over 160 attendees, approximately two-thirds in-person and one-third remote via Zoom.

This was our first in-person meeting since Prague in February 2020 just a few weeks before the lockdowns began. It was also our first-ever hybrid meeting with remote Zoom participation for all subgroups that met.

You can find a brief summary of ISO procedures here.

From Prague, through the pandemic, to Kona

During the pandemic, the committee’s subgroups began regularly meeting virtually, and over the past nearly three years there have been hundreds of virtual subgroup meetings and thrice-a-year virtual plenary sessions to continue approving features for C++23.

This week, we resumed in-person meetings with remote Zoom support. In the months before Kona, a group of volunteers did a lot of planning and testing: We did a trial run of a hybrid meeting with the subgroup SG14 at CppCon in September, using some of the equipment we planned to use in Kona. That initial September test was a pretty rough experience for many of the remote attendees, but it led to valuable learnings , and though we entered Kona with some trepidation, the hybrid meetings went amazingly smoothly with very few hiccups, and we got a lot of good work done in the second-to-last meeting to finalize C++23 including with remote presentations and comments.

This was only possible because of a huge amount of work by many volunteers, and I want to especially thank Jens Maurer and Dietmar Kühl for leading that group. But it was a true team effort, and so many people helped with the planning, with bringing equipment, and with running the meetings. Thank you very much to all those volunteers and helpers! We received many such appreciative comments of thanks on the committee mailing lists, and from national bodies on Saturday, from experts participating remotely who wanted to thank the volunteers for how smoothly they were able to participate.

Now that we have resumed in-person meetings, the current intent is that:

This week’s meeting

Per our published C++23 schedule, this was our second-to-last meeting to finish technical work on C++23. No features were added or removed, we just handled fit-and-finish issues and primarily focused on addressing the 137 national body comments we received in the summer’s international comment ballot (Committee Draft, or CD).

Today, the committee approved final resolutions for 92 (67%) of the 137 national comments. That leaves 45 comments, some of which have already been partly worked on, still to be completed between now and early February at our last meeting for completing C++23.

An example of a comment we just approved is adopting the proposal from Nicolai Josuttis et al. to extend the lifetime all temporaries (not just the last one) for the for-range-initializer of the range-for loop (see also the more detailed earlier paper). This closes a lifetime safety hole in C++. Here’s one of the many examples that will now work correctly:

std::vector<std::string> createStrings();

...

for (std::string s : createStrings()) ... // OK

for (char c : createStrings().at(0)) ...
    // use-after-free in C++20
    // OK, safe in C++23

In addition to C++23 work, we also had time to make progress on a number of post-C++23 proposals, including continued work on contracts, executors (std::execution), pattern matching, and more. We also decided to ship the third Library Fundamentals TS, which includes support for a number of additional experimental library features such as propagate_const, scope_exit and related scope guards, observer_ptr, resource_adapter, a helper to make getting a random numbers easier, and more. These can then be considered for C++26.

The contracts subgroup adopted a roadmap and timeline to try to get contracts into C++26. The group also had initial discussion of Gabriel Dos Reis’ proposal to control side effects in contracts, with the plan to follow up with a telecon between now and the next in-person meeting in February.

The concurrency and parallelism subgroup agreed to move forward with std::execution and SIMD parallelism for C++26, which in the words of the subgroup chair will make C++26 a huge release for the concurrency and parallelism group… and recall that C++26 is not just something distant that’s three years away, but we will start approving features for C++26 starting this June, and when specific features are early and stable in the working draft the vendors often don’t wait for the final standard to start shipping implementations.

The language evolution group considered national body comments and C++26 proposals, and approved nine papers for C++26 including to progress Jean-Heyd Meneide’s proposal for #embed for C++26.

The language evolution group also held a well-attended evening session (so that experts from all subgroups could participate) to start discussion of the long-term future of C++, with over 100 experts attending (75% on-site, 25% on-line). Nearly all of the discussion was focused on improving safety (mostly) and simplicity (secondarily), including discussion about going beyond our business-as-usual evolution to help C++ programmers with these issues. We expect this discussion to continue and lead to further concrete papers for C++ evolution.

The library evolution group addressed all its national body comments and papers, forwarded several papers for C++26 including std::execution, and for the first time in a while does not have a backlog to catch up with which was happy news for LEWG.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next meeting will be in Issaquah, WA, USA in February. At that meeting we will finish C++23 by resolving the remaining national body comments on the C++23 draft, and producing the final document to be sent out for its international approval ballot (Draft International Standard, or DIS) and be published later in 2023.

Wrapping up

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in less than three months from now we’ll be meeting again in Issaquah, WA, USA for the final meeting of C++23 to finish and ship the C++23 international standard. I look forward to seeing many of you there. Thank you again to the over 160 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies! And thank you also to everyone reading this for your interest and support for C++ and its standardization.

My CppCon 2021 talk video is online

Whew — I’m now back from CppCon, after remembering how to travel.

My talk video is now online. If you haven’t already seen this via JetBrains’ CppCon 2021 video page or the Reddit post, here’s a link:

Please direct technical comments to the Reddit thread and I’ll watch for them there and respond to as many comments as I can. Thanks!

Thanks again to everyone who attended in person for supporting our requirements for meeting together safely. Interestingly, this was the largest CppCon ever (and the largest C++-specific conference ever as far as I know) in terms of total attendance, though most were attending online. It was good to see and e-see you all! With any luck, by CppCon 2022 our lives will be much closer to normal everywhere in the world… here’s hoping. Thanks again, and stay safe.

Trip report: Summer 2021 ISO C++ standards meeting (virtual)

On Monday, the ISO C++ committee held its third full-committee (plenary) meeting of the pandemic and adopted a few more features and improvements for draft C++23.

We had representatives from 17 voting nations at this meeting: Austria, Bulgaria, Canada, Czech Republic, Finland, France, Germany, Israel, Italy, Netherlands, Poland, Russia, Slovakia, Spain, Switzerland, United Kingdom, and United States. Slovakia is our newest national body to officially join international C++ work. Welcome!

We continue to have the same priorities and the same schedule we originally adopted for C++23, but online via Zoom during the pandemic.

This week: A few more C++23 features adopted

This week we formally adopted a third round of small features for C++23, as well as a number of bug fixes. Below, I’ll list some of the more user-noticeable changes and credit all those paper authors, but note that this is far from an exhaustive list of important contributors… even for these papers, nothing gets done without help from a lot of people and unsung heroes, so thank you first to all of the people not named here who helped the authors move their proposals forward! And thank you to everyone who worked on the adopted issue resolutions and smaller papers I didn’t include in this list.

P1938  by Barry Revzin, Richard Smith, Andrew Sutton, and Daveed Vandevoorde adds the if consteval feature to C++23. If you know about C++17 if constexpr and C++20 std::is_constant_evaluated, then you might think we already have this feature under the spelling if constexpr (std::is_constant_evaluated())… and that’s one of the reasons to add this feature, because that code actually doesn’t do what one might think. See the paper for details, and why we really want if consteval in the language.

P1401 by Andrzej Krzemieński enables testing integers as booleans in static_cast and if constexpr without having to cast the result to bool first (or test against zero). This is a small-but-nice example of removing redundant ceremony to help make C++ code that much cleaner and more readable.

P1132 by Jean-Heyd Meneide, Todor Buyukliev, and Isabella Muerte add out_ptr and inout_ptr abstractions to help with potential pointer ownership transfer when passing a smart pointer to a function that is declared with a T** “out” parameter. In a nutshell, if you’ve ever wanted to call a C API by writing something like some_c_function( &my_unique_ptr ); then these types will likely help you. The idea is that a call site can use one of these types to wrap a smart pointer argument, and then when the helper type is destroyed it automatically updates the pointer it wraps (using a reset call or semantically equivalent behavior).

P1659 by Christopher DiBella generalizes the C++20 starts_with and ends_with on string and string_view by adding the general forms ranges::starts_with and ranges::ends_with to C++23. These can work on arbitrary ranges, and also answer questions such as “are the starting elements of r1 less than the elements of r2?” and “are the final elements of r1 greater than the elements of r2?”.

P2166 by Yuriy Chernyshov helps reduce a commonly-taught pitfall with std::string. You know how since forever (C++98) you can construct a string from a string literal, like std::string("xyzzy")? But that you’d better watch out (and you’d better not cry or pout) not to pass a null pointer, like std::string(nullptr), because that’s undefined behavior where implementations aren’t required to check the pointer for null and can do just whatever they liked, including crash? That’s still the case if you pass a pointer variable whose value is null (sorry!), but with this paper, as of C++23 at least now we have overloads that reject attempts to construct or assign a std::string from nullptr specifically, as a compile-time “d’oh! don’t do that.”

We also adopted a number of other issue resolutions and small papers that made additional improvements, including a number that will be backported retroactively to C++20. Quite a few were of the “oh, you didn’t know that rare case didn’t work? now it does” variety.

Other progress

We also approved work on a second Concurrency TS. Recall that a “TS” or “Technical Specification” is like doing work in a feature branch, which can later be merged into the C++ standards (trunk).

Two related pieces of work were approved to go into the Concurrency TS: P1121 and P1122 by Paul McKenney, Maged M. Michael, Michael Wong, Geoffrey Romer, Andrew Hunter, Arthur O’Dwyer, Daisy Hollman, JF Bastien, Hans Boehm, David Goldblatt, Frank Birbacher, Erik Rigtorp, Tomasz Kamiński, and Jens Maurer add support for hazard pointers and read-copy-update (RCU) which are useful in highly concurrent applications.

What’s next

We’re going to keep meeting virtually in subgroups, and then have at least one more virtual plenary session to adopt features into the C++23 working draft in October.

The next tentatively planned ISO C++ face-to-face meeting is February 2022 in Portland, OR, USA. (Per our C++23 schedule, this is the “feature freeze” deadline for design-approving new features targeting the C++23 standard, whether the meeting is physical or virtual.) Meeting in person next February continues to look promising – barring unexpected surprises, it’s possible that by that time most ISO C++ participating nations will have been able to resume local sports/theatre/concert events with normal audiences, and removed travel restrictions among each other, so that people from most nations will be able to participate at an in-person meeting. But we still have to wait and see… we likely won’t know for sure until well into the autumn, and so we’re still calling this one “tentative” for now. You can find a list of our meeting plans on the Upcoming Meetings page.

Thank you again to the hundreds of people who are working tirelessly on C++, even in our current altered world. Your flexibility and willingness to adjust are much appreciated by all of us in the committee and by all the C++ communities! Thank you, and see you on Zoom.

GotW #102 Solution: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

1. Briefly, what is the difference among:

(a) undefined behavior

Undefined behavior is what happens when your program tries to do something whose meaning is not defined at all in the C++ standard language or library (illegal code and/or data). A compiler is allowed to generate an executable that does anything at all, from data corruption (objects not meeting the requirements of their types) to injecting new code to reformat your hard drive if the program is run on a Tuesday, even if there’s nothing in your source code that could possibly reformat anything. Note that undefined behavior is a global property — it always applies not only to the undefined operation, but to the whole program. [1]

(b) unspecified behavior

Unspecified behavior is what happens when your program does something for which the C++ standard doesn’t document the results. You’ll get some valid result, but you won’t know what the result is until your code looks at it. A compiler is not allowed to give you a corrupted object or to inject new code to reformat your hard drive, not even on Tuesdays.

(c) implementation-defined behavior

Implementation-defined behavior is like unspecified behavior, where the implementation additionally is required to document what the actual result will be on this particular implementation. You can’t rely on a particular answer in portable code because another implementation could choose to do something different, but you can rely on what it will be on this compiler and platform.

2. For each of the following, write a short function … where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

Easy peasy! Let’s dereference a null pointer:

// Example 2(a): If assert is violated, always undefined behavior

void deref_and_set( int* p ) {
    assert( p );
    *p = 42;
}

The function asserts that p is not null, and then on the next line unconditionally dereferences p and scribbles over the location it points to. If p is null and the assertion checking is off so that we can get to the next line, the compiler is allowed to make running the whole program format our hard drive.

(b) possibly results in undefined behavior

A general way to describe this class of program is that the call site has two bugs: first, it violates a precondition (so the callee’s results are always at least unspecified), and then it additionally then uses the unspecified result without checking it and/or in a dangerous way.

To make up an example, let’s bisect a numeric range:

// Example 2(b): If assert is violated, might lead to undefined behavior

int midpoint( int low, int high ) {
    assert( low <= high );
    return low + (high-low)/2;
        // less overflow-prone than “(low+high)/2”
        // more accurate than “low/2 + high/2”
}

The author of midpoint could have made the function more robust to take the values in either order, and thus eliminated the assertion, but assume they had a reason not to, as alluded to in the comments.

Violating the assertion does not result in undefined behavior directly. The function just doesn’t specify (ahem!) its results if call sites call it in a way that violates the precondition the assertion is testing. If the precondition is violated, then the function can add a negative number to low. But just calculating and returning some other int is not (yet) undefined behavior.

For many call sites, a bad call to midpoint won’t lead to later undefined behavior.

However, it’s possible that some call site might go on to use the unspecified result in a way that does end up being real undefined behavior, such as using it as an array index that performs an out-of-bounds access:

auto m = midpoint( low_index(arr1), high_index(arr2) );   // unspecified
   // here we expect m >= low_index(arr1) ...
stats[m-low_index(arr1)]++;                 // --> potentially undefined

This call site code has a typo, and accidentally mixes the low and high indexes of unrelated containers, which can violate the precondition and result in an index that is less than the “low” value. Then in the next line it tries to use it as an offset index into an instrumentation statistics array, which is undefined behavior for a negative number.

GUIDELINE: Remember that an unspecified result is not in itself undefined behavior, but a call site can run with it and end up with real undefined behavior later. This happen particularly when the calculated value is a pointer, or an integer used as an array index (which, remember, is basically the same thing; a pointer value is just an index into all available memory viewed as an array). If a program relies on unspecified behavior to avoid performing undefined behavior, then it has a path to undefined behavior, and so unspecified behavior is a Crouching Tiger, if you will… still dangerous, and can be turned into to the full dragon.

GUIDELINE: Don’t specify your function’s behavior (output postconditions) for invalid inputs (precondition violations), except for defense in depth (see Example 2(c)). By definition, if a function’s preconditions are violated, then the results are not specified. If you specify the outputs for precondition violations, then (a) callers will depend on the outputs, and (b) those “preconditions” aren’t really preconditions at all.

While we’re at it, here’s a second example: Let’s compare pointers in a way the C++ standard says is unspecified. This program attempts to use pointer comparisons to see whether a pointer points into the contiguous data stored in a vector, but this technique doesn’t work because today’s C++ standard only specifies the results of raw pointer comparison when the pointers point at (into, or one-past-the-end of) the same allocation, and so when ptr is not pointing into v’s buffer it’s unspecified whether either pointer comparison in this test evaluates to false:

// Example 2(b)(ii): If assert is violated, might lead to undefined behavior

// std::vector<int> v = ...;
assert(&v[0] <= ptr && ptr < (&v[0])+v.size());           // unspecified
*ptr = 42;                                  // --> potentially undefined

(c) is never undefined or unspecified behavior

An assertion violation is never undefined behavior if the function specifies what happens in every case even when the assertion is violated. Here’s an example mentioned in my paper P2064, distilled from real-world code:

// Example 2(c): If assert is violated, never undefined behavior
//               (function documents its result when x!=0)

some_result_value DoSomething( int x ) {
    assert( x != 0 );
    if    ( x == 0 ) { return error_value; }
    return sensible_result(x);
}

The function asserts that the parameter is not zero, to express that the call site shouldn’t do that, in a way the call site can check and test… but then it also immediately turns around and checks for the errant value and takes a well-defined fallback path anyway even if it does happen. Why? This is an example of “defense in depth,” and can be a useful technique for writing robust software. This means that even though the assertion may be violated, we are always still in a well-defined state and so this violation does not lead to undefined behavior.

GUIDELINE: Remember that violating an assertion does not necessarily lead to undefined behavior.

GUIDELINE: Function authors, always document your function’s requirements on inputs (preconditions). The caller needs to know what inputs are and aren’t valid. The requirements that are reasonably checkable should be written as code so that the caller can perform the checks when testing their code.

GUIDELINE: Always satisfy the requirements of a function you call. Otherwise, you are feeding “garbage in,” and the best you can hope for is “garbage out.” Make sure your code’s tests includes verifying all the reasonably checkable preconditions of functions that it calls.

Writing the above pattern has two problems: First, it repeats the condition, which invites copy/paste errors. Second, it makes life harder for static analysis tools, which often trust assertions to be true in order to reduce false positive results, but then will think the fallback path is unreachable and so won’t properly analyze that path. So it’s better to use a helper to express the “either assert this or check it and do a fallback operation” in one shot, which always avoids repeating the condition, and could in principle help static analysis tools that are aware of this macro (yes, it would be nicer to do it without resorting to a macro, but it’s annoyingly difficult to write the early return without a macro, because a return statement inside a lambda doesn’t mean the same thing):

// Using a helper that asserts the condition or performs the fallback

#define ASSERT_OR_FALLBACK(B, ACTION) { \
    bool b = B;                         \
    assert(b);                          \
    if(!b) ACTION;                      \
}

some_result_value DoSomething( int x ) {
    ASSERT_OR_FALLBACK( x != 0, return error_value; );
    return sensible_result(x);
}

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

In Example 2(a), violating the assertion leads to undefined behavior, 1(a).

In Example 2(b), violating the assertion leads to unspecified behavior, 1(b). At buggy call sites, this could subsequently lead to undefined behavior.

In Example 2(c), violating the assertion leads to implementation-defined behavior, 1(c), which never in itself leads to  undefined behavior.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.

There are many. Here is just one example, that happens to be nice because it is perfectly accurate.

Let’s say we have all the code examples in question 2, written using C assert today (or even with those assertions missing!), and then at some future time we get a version of standard C++ that can express them as preconditions. Then only in Example 2(a), where we can see that the function body (and possibly transitively its further callees with the help of inlining) exercises undefined behavior, a tool can infer the precondition annotation and add it mechanically, and get the benefit of diagnosing existing bugs at call sites:

// What a precondition-aware tool could generate for Example 2(a)

auto f( int* p ) 
    [[pre( p )]]  // can add this automatically: because a violation
                  // leads to undefined behavior, this precondition
                  // is guaranteed to never cause a false positive
{
    assert( p );
    *p = 42;
}

For example, after some future C++2x ships with contracts, a vendor could write an automated tool that goes through every open source C++ project on GitHub and mechanically generates a pull request to insert preconditions for functions like Example 2(a) – but not (b) or (c) – whether or not the assertion already exists, just by noticing the undefined behavior. And it can inject those contract preconditions with complete confidence that none of them will ever cause a false positive, that they will purely expose existing bugs at call sites when that call site is built with contract checking enabled. I would expect such tool to identify a good number of (at least latent if not actual) bugs, and be a boon for C++ users, and it’s possible only for functions in the category of 2(a).

“Automated adoption” of at least part of a new C++ feature, combined with “automatically identifies existing bugs” in today’s code, is a pretty good value proposition.

Acknowledgments

Thank you to the following for their comments on this material: Joshua Berne, Gabriel Dos Reis, Gábor Horváth, Andrzej Krzemieński, Ville Voutilainen.

Notes

[1] In the standard, there are two flavors of undefined behavior. The basic “undefined behavior” is allowed to enter your program only once you actually try to execute the undefined part. But some code is so extremely ill-formed (with magical names like “IF-NDR”) that its very existence in the program makes the entire program invalid, whether you try to execute it or not.

GotW #102: Assertions and “UB” (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. Now that we have considered assertions, postconditions, and preconditions in GotWs #97-101, let’s pause and reflect: To what extent does a failed contract imply “UB”… either the Hidden Dragon of Undefined Behavior, or the Crouching Tiger of Unspecified Behavior?

JG Question

1. Briefly, what is the difference among:

(a) undefined behavior

(b) unspecified behavior

(c) implementation-defined behavior

Guru Questions

2. For each of the following, write a short function of the form:

/*...function name and signature...*/
{
    assert( /*...some condition about the parameters...*/ );
    /*...do something with parameters...*/;
}

where if the assertion is not checked and is false then the effect:

(a) is always undefined behavior

(b) possibly results in undefined behavior

(c) is never undefined or unspecified behavior

3. Explain how your answers to Questions 1 and 2 do, or do not, correspond with each other.

4. BONUS: Describe a valuable service that a tool could perform for assertions that satisfy the requirement in 2(a), that is not possible for other assertions.

GotW #101 Solution: Preconditions, Part 2 (Difficulty: 7/10)

This special Guru of the Week series focuses on contracts. We covered some basics of preconditions in GotW #100. This time, let’s see how we can use preconditions in some practical examples…

1. Consider these functions, expanded from an article by Andrzej Krzemieński: [1] … How many ways could a caller of each function get the arguments wrong, but that would silently compile without error? Name as many different ways as you can.

There are several ways to break this down. I’ll use three major categories of possible mistakes, the first two of which overlap:

  • wrong order: passing an argument in the wrong position
  • wrong value: passing an argument with a valid but wrong value (e.g., index out of range)
  • invalid value: passing an argument that is already invalid (e.g., an invalid iterator)

Let’s see how these play out with our three examples, starting with (a).

(a) is_in_values (int val, int min, int max)

// Example 1: Adapted from [1]

auto is_in_values (int val, int min, int max)
  -> bool;  // true iff val is in the values [min, max]

Oh my, three identically typed integer parameters… what could be confusing about that?!

Wrong order (5 ways): First, there are five ways to pass these in the wrong order, because there are 3! = 6 permutations, all of which compile but only the first of which is correct:

is_in_values( v,  lo, hi );    // correct

is_in_values( v,  hi, lo );    // all these are wrong, but compile :(
is_in_values( lo, v,  hi ); 
is_in_values( lo, hi, v  ); 
is_in_values( hi, v,  lo ); 
is_in_values( hi, lo, v  );

Some of these argument orders may seem strange, but some are orders other libraries’ similar APIs might use which makes confusion easier, we all make mistakes… and the type system isn’t helping us at all.

Wrong value (1 way): Second, there is an implicit precondition that min <= max, so passing arguments where min > max would be wrong, but would silently compile. Some of these are exercised by the “wrong order” permutations above, but even call sites that remember the right argument order can make mistakes about the actual values.

Invalid value (0 ways): Finally, all possible values of an int are valid — some may be suspiciously big or small, but int doesn’t have the concept of “not a number” (NaN) as we have with floats, or the concept of “invalidated” like we have with iterators.

(b) is_in_container (int val, int idx_min, int idx_max)

It sure doesn’t help that the next function has the identical signature as is_in_values, but with very different meaning:

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool; // true iff container[i]==val for some i in [idx_min, idx_max]

Wrong order (5 ways): As in (a), we again have five ways to pass these in the wrong order, all of which compile but only the first is correct:

is_in_container( v,  lo, hi );    // correct

is_in_container( v,  hi, lo );    // all these are wrong, but compile :(
is_in_container( lo, v,  hi ); 
is_in_container( lo, hi, v  ); 
is_in_container( hi, v,  lo ); 
is_in_container( hi, lo, v  );

Wrong value (3 ways): Again as in (a), we have the implicit precondition that idx_min <= idx_max, so passing idx_min > idx_max would be wrong, but would silently compile. But this time there are two additional ways to go wrong, because idx_min and idx_max must both be valid subscripts into container, so if either is outside the range [0, container.size()) it is a valid integer but an out of bounds value for this use.

Invalid value (0 ways): Again as in (a), all possible values of an int are valid — though some may be wrong values if they’re out of bounds as we noted above, they’re still valid integers.

(c) is_in_range (T val, Iter first, Iter last)

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool; // true iff *i==val for some i in [first,last)

Wrong order (1 way): This time there’s only one way to pass the parameters in the wrong order (ignoring pathological cases where the same argument might convert both T and Iter):

is_in_container( v, istart, iend );    // correct

is_in_container( v, iend, istart );    // wrong, but compiles :(

Wrong value (2 ways): We could pass a first and last that are not a valid range in two ways:

  • they point into the same container, but first doesn’t precede last
  • they point into different containers

Invalid value (2 ways): And finally, either of first or last could actually be an invalidated iterator (e.g., dangling). For example, the container they point into may be destroyed so that both are invalid; or one of the two iterators might have been calculated before a more recent operation like vector::push_back that could have invalidated it.

But if the sight of these function signatures has had you pulling your hair and shouting “use the type system, Luke!” at your screen, you’re not alone… now let’s make things better.

2. Show how can you improve the function declarations in Question 1 by …

(a) just grouping parameters, using a struct with public variables

Interestingly, we actually get a lot of benefit simply by grouping ‘parameters that go together,’ using an creating an aggregate or “grouping” helper struct.[3] For example:

// Example 2(a)(i): Improving Example 1 with aggregate types

struct min_max { int min, max; };

auto is_in_values (int val, min_max minmax) -> bool;
auto is_in_container (int val, min_max rng) -> bool;

template <typename Iter> struct two_iters { Iter first, last; };

template <typename T, typename Iter>
auto is_in_range (T val, two_iters<Iter> rng) -> bool;

Or even just venerable anonymous std::pair is better than no grouping:

// Example 2(a)(ii): Improving Example 1 with aggregate types

auto is_in_values (int val, std::pair<int,int> minmax) -> bool;
auto is_in_container (int val, std::pair<int,int> rng) -> bool;

template <typename T, typename Iter>
auto is_in_range (T val, std::pair<Iter,Iter> rng) -> bool;

With either of the above, there’s only one way for callers to get the argument order wrong. And it requires only two extra characters at call sites, because we can use { } to group the arguments without creating actual named objects of the helper struct:

is_in_values( v, {lo, hi} );	// correct
is_in_values( v, {hi, lo} );	// wrong, but compiles

is_in_container( v, {lo, hi} );	// correct
is_in_container( v, {hi, lo} );	// wrong, but compiles

is_in_range( v, {i1, i2} );		// correct
is_in_range( v, {i2, i1} ); 	// wrong, but compiles

So just grouping parameters using a struct eliminates some errors. But really using the type system is even better…

(b) just using an encapsulated class, using a class with private variables (an abstraction with its own invariant)

Clearly all three functions are crying out for a “range”-like abstraction for its pair of parameters, in the first two cases a range of values and in the third a range of iterators. How do we know? Because:

Here’s one way we can apply class types we can find in the standard library or Boost today:

// Example 2(b): Improving Example 1 with encapsulated class types

auto is_in_values (int val, boost::integer_range<int> rng) -> bool;

auto is_in_container (int val, boost::integer_range<int> rng) -> bool;

template <typename T, std::ranges::input_range Range>
auto is_in_range (T val, Range&& rng) -> bool;

This gives us all the mistake-reduction goodness we got in (a), plus more.

First, as in (a), absent pathological conversions, it’s very difficult to get arguments in the wrong order simply because of being forced to group the parameters:

auto minmax = boost::irange(10, 100);
is_in_values( 42, minmax );

auto minmax2 = boost::irange(0, ssize(myvec)-1);
is_in_container( 42, minmax2 );

auto myvec = std::vector<int>();
is_in_range( 42, myvec );

But, unlike our helper structs in (a), we now get additional safety because the types can express constructor preconditions that move some of those mistakes (such as (hi,lo) misordering) to constructors of class abstractions that can then preserve them as invariants [4] – so the mistake can still be made but in fewer places, to where we construct or modify the abstracted object (e.g., range), rather than every time we use un-abstracted separately values (e.g., a couple of iterator objects we have lying around and whose relationship we have to maintain by hand over time). This is why we sometimes say “types are predicates,” because a type encapsulates a predicate, namely its invariant.

GUIDELINE: When multiple functions state the same precondition, it’s a telltale sign there’s a missing class that should turn it into an invariant. A repeated precondition is nearly always a “naked invariant” that should be encapsulated up inside a type. This is more obvious when the precondition involves multiple parameters (or ordinary variables for that matter); a poster child is the STL’s pervasive use of iterator pairs, which have long been crying out to be encapsulated using a range abstraction, and fortunately we now have that in C++20. Consider using a class instead.

GUIDELINE: Remember that a key reason why encapsulated classes are powerful is that they wrap up preconditions and turn them into invariants. Hiding data members is good dependency management because it limits the code that can depend on the details of the data and is responsible for maintaining the correct relationship among the data members.

(c) just using post-C++20 contract preconditions (not yet valid C++, but something like the syntax in [2])

Preconditions test values, so they can let us eliminate the “wrong values” kinds of mistakes. Consider this code:

// Example 2(c): Improving Example 1 with boolean preconditions

auto is_in_values (int val, int min, int max)
  -> bool // true iff val is in the values [min, max]
     [[pre (min <= max)]]
;

auto is_in_container (int val, int idx_min, int idx_max)
  -> bool // true iff container[i]==val for i in [idx_min, idx_max]
     [[pre (0       <= idx_min
         && idx_min <= idx_max
         && idx_max <  container.size())]]          // see note [5]
;

template <typename T, typename Iter>
auto is_in_range (T val, Iter first, Iter last)
  -> bool // true iff *i==val for some i in [first,last)
     [[pre (/*... is_reachable? is_not_dangling? hmm ...*/)]]
;

For the first two functions, we can write clear preconditions that can check the “wrong value” bugs.

In these particular examples, the best place to write the preconditions is right on the constructors of the class types we saw in (b), and if we write them there then we don’t have to repeat them as explicit contracts on every function.

But is (b) always better than (c), in other examples? This brings us to our last question, which is all about “can” versus “should”…

3. Consider these three examples, where each shows expressing a boolean condition either as a function precondition or as an encapsulated invariant inside a new type… In each of these cases, which way is better? Explain your answer.

In Question 2, writing a type was often the best choice, but it isn’t always.

The benefits to writing a type include:

  • Encapsulation. We limit the code that is responsible for maintaining the boolean condition.
  • Language support. We get the help of the type system to statically enforce requirements.

But there are costs and limitations too:

  • What’s the abstraction? There may not be a suitable one. We can’t write a good type unless we can discover a useful abstraction that the type’s interface should support. A good type represents a useful reusable domain abstraction that programmers can understand and that makes their code clearer by elevating the vocabulary of the code. There won’t always be a practical and reusable abstraction; when there isn’t, we won’t be able to write a useful and reusable type. — Even when there is, we have to design that all ahead of time, which requires a lot more advance knowledge and engineering than just writing ad-hoc boolean conditions on individual functions.
  • What’s the cost? It may not be feasible to maintain the invariant. We have to do any extra work it takes to maintain the invariant, and it has to be practical to do. When it isn’t, we can’t maintain the invariant without help from outside code, and so we won’t be able to really encapsulate it properly.
  • Does it make sense as an independent abstraction? Will the user be carrying around objects of this type, or are we just jamming a precondition common to a few functions (or only one) into a type and calling it useful? Occam’s Razor: Don’t multiply entities beyond necessity.
  • What’s the type the caller is using? This is where a real usable abstraction shines, because many callers will be using it independently of calling our function. But if the caller isn’t using this type, then there typically has to be an implicit or explicit conversion (because inheritance from all argument types our callers might already have usually isn’t an option), and that conversion would need to be usable and sufficiently cheap.

GUIDELINE: Remember that types and contracts are “better together.” Use both. They are complementary, neither is a substitute for the other. All we are trying to accomplish with contracts is to augment the language’s static type checking with runtime checking where that is more appropriate because we can’t design a practical abstraction. And this is why we want contracts on functions (preconditions, postconditions) even though we already have types, and why we also want contracts on types (invariants).

Let’s consider the three examples.

(a) A vector that is sorted

template <typename T>
void f( vector<T> const& v ) [[pre( is_sorted(v) )]] ;

template <typename T>
void f( sorted<vector<T>> const& v );

If this looks familiar, it’s because is_sorted is one of the classic examples we saw in GotW #98 of conditions that are often impractical to check and enforce as an assertion, in this case a precondition.

Can we do better by making it a type, perhaps a sorted wrapper around a container like vector that maintains the guarantee that it’s always sorted? Well, we have to answer some questions about a sorted<T>:

  • What’s the abstraction it provides? It can’t easily fulfill the requirements of a sequence container like vector itself; for example, push_back doesn’t make much sense because letting the caller insert an arbitrary value at a specific location would easily cause the container to be unsorted. Instead, it would naturally want a more general insert function instead, and the interface would be more like set. This part could be workable.
  • What’s the cost? This where it starts to breaks down: Keeping a vector sorted all the time means that every insertion would cost O(N) work all the time. Which leads into…
  • Does it make sense as an independent abstraction? … that it’s very common for code to maintain an “almost-sorted” vector, such as by inserting new elements at the end which is fast (and, hmm, affects our abstraction design, because then it would make sense to have push_back after all, wouldn’t it? hmm) but leaves a suffix of unsorted elements in the container, and then periodically sorting the whole container so that the sorting cost is amortized. But an almost-sorted vector isn’t good enough, and so doesn’t fit the bill. We don’t have empirical evidence of such types in general use.
  • What’s the type the caller is using? And now we’re busted all the way, because we want this interface to be usable by anyone who has a vector<T>, which would require a conversion to sorted<vector<T>>. If we do a deep copy, that’s prohibitively expensive. Even if the conversion is lightweight by avoiding a deep copy, such as by just wrapping an existing vector object, it wouldn’t be very useful unless it did O(N) work every time unconditionally to verify the invariant. And even then the abstraction design is affected and compromised: If the user can still see and modify the original vector, then that’s still part of the accessible interface to the data, so the user can make the container be not fully sorted and we’re unable to really encapsulate and maintain our intended invariant.

So is_sorted is much better as a function precondition.

// (b) A vector that is not empty

template <typename T>
void f( vector<T> const& v ) [[pre( !v.empty() )]] ;

template <typename T>
void f( not_empty<vector<T>> const& v );

This one is more feasible as a type, but still not ideal:

  • What’s the abstraction it provides? It’s a vector, and we can make the interface identical to vector with just extra preconditions on pop and erase functions to not remove the last element in the container.
  • What’s the cost? Emptiness is cheap enough to check and maintain.
  • Does it make sense as an independent abstraction? This is where it starts to get questionable… the answer is at best “maybe.” It’s not clear to me than a “nonempty vector” is a generally useful abstraction.
  • What’s the type the caller is using? This is where I think we break down again. Again, we want this interface to be usable by anyone who has a vector<T>, and that means a conversion to not_empty<vector<T>>. If we do a deep copy, that’s prohibitively expensive. This time if we just wrap an existing vector object to avoid the deep copy, the check is cheap. But then we still have the problem that the abstraction design is affected and compromised so that it can’t maintain its invariant, because if the user can still see and modify the original vector, they can remove the last element on us.

So not_empty seems better as a function precondition.

(c) A pointer that is not null

void f( int* p ) [[pre( p != nullptr )]] ;

void f( not_null<int*> p );

This time we can do better:

  • What’s the abstraction it provides? This one’s easy to state: It’s a not-null pointer. That’s a far simpler interface than a container, because we just need operator* and operator->, construction, destruction, and copying. Even so it’s not totally without subtlety, because not_null should not have move operations that modify the source object. This means that a not_null<unique_ptr<T>> is legal but there’s not much you can do with it besides dereference it and destroy it: It can’t be copyable because unique_ptr isn’t copyable, and it must not be movable because moving a unique_ptr leaves the source null.
  • What’s the cost? Nullness is cheap enough to check and maintain.
  • Does it make sense as an independent abstraction? Definitely. A “non-null pointer” has been widely rediscovered and reinvented as a generally useful abstraction.
  • What’s the type the caller is using? A not_null<int*> is a useful object in its own right in the calling code, independently of calling this particular function. And if our function is invoked by someone who has only an ordinary int*, doing a full copy of the pointer is cheap, and applying the nullness check as a precondition on that converting constructor is exactly equivalent to writing the precondition by hand, but is automated.

So not_null seems better as a type, primarily because it is independently useful. This is why it has been reinvented a number of times, including as gsl::not_null. [6]

GUIDELINE: Wherever practical, design interfaces so that incorrect call sites are illegal (won’t compile, using the type system) or loud (won’t pass unit tests, using preconditions). This is a key part of achieving the goal to “make interfaces easy to use correctly, and hard to use incorrectly.” Preconditions directly help with that by letting us catch entire groups of errors at test time, and are a complement to the type system which makes incorrect uses “not fit” through the compiler and also carries extra preconditions around for us in the form of invariants.

GUIDELINE: Remember that the type system is a hammer, and not every precondition is a nail. The type system is a powerful tool, but not every precondition is naturally (part of) an invariant of a useful type that provides a good reusable abstraction that’s generally useful independently of this function.

Notes

[1] A. Krzemieński. “Contracts, preconditions and invariants” (Andrzej’s C++ blog, December 2020).

[2] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and to keep it as legal compilable (if unenforced) C++20 for this article I also modified the syntax from : to ( ), and to name the return value _return_ for postconditions. That’s not a statement of preference, it’s just so the examples can compile today to make them easier to check.

[3] For 2(a) and 2(b), on platform ABIs that do not pass small structs/classes in registers, turning individual parameters into a struct/class could cause them to be passed in stack memory instead of in registers.

[4] Upcoming GotWs will cover invariants and violation handling.

[5] If C++ gets chained comparisons as proposed in P0515 and P0893 we could write this much more clearly, and with fewer opportunities for mistakes, as:

[[pre( 0 <= idx_min <= idx_max < container.size() )]]

[6] B. Stroustrup and H. Sutter (eds.) “I.12 Declare a pointer that must not be null as not_null” (C++ Core Guidelines.) If the not_null<T> type we are using is implicitly convertible from T, which is the intent of I.12 to provide a drop-in replacement for pointer parameters, then the usability is the same as with the precondition. Otherwise, the caller has to provide a not_null argument at the call site, either by doing an explicit conversion or by just using a not_null local variable in their own body.

Acknowledgments

Thank you to the following for their feedback on this material: Joshua Berne, Gabriel Dos Reis, J. Daniel Garcia, Gábor Horváth, Andrzej Krzemieński, Bjarne Stroustrup, Andrew Sutton, Ville Voutilainen