C++ safety, in context

Scope. To talk about C++’s current safety problems and solutions well, I need to include the context of the broad landscape of security and safety threats facing all software. I chair the ISO C++ standards committee and I work for Microsoft, but these are my personal opinions and I hope they will invite more dialog across programming language and security communities.

Acknowledgments. Many thanks to people from the C, C++, C#, Python, Rust, MITRE, and other language and security communities whose feedback on drafts of this material has been invaluable, including: Jean-François Bastien, Joe Bialek, Andrew Lilley Brinker, Jonathan Caves, Gabriel Dos Reis, Daniel Frampton, Tanveer Gani, Daniel Griffing, Russell Hadley, Mark Hall, Tom Honermann, Michael Howard, Marian Luparu, Ulzii Luvsanbat, Rico Mariani, Chris McKinsey, Bogdan Mihalcea, Roger Orr, Robert Seacord, Bjarne Stroustrup, Mads Torgersen, Guido van Rossum, Roy Williams, Michael Wong.

Terminology (see ISO/IEC 23643:2020). “Software security” (or “cybersecurity” or similar) means making software able to protect its assets from a malicious attacker. “Software safety” (or “life safety” or similar) means making software free from unacceptable risk of causing unintended harm to humans, property, or the environment. “Programming language safety” means a language’s (including its standard libraries’) static and dynamic guarantees, including but not limited to type and memory safety, which help us make our software both more secure and more safe. When I say “safety” unqualified here, I mean programming language safety, which benefits both software security and software safety.

We must make our software infrastructure more secure against the rise in cyberattacks (such as on power grids, hospitals, and banks), and safer against accidental failures with the increased use of software in life-critical systems (such as autonomous vehicles and autonomous weapons).

The past two years in particular have seen extra attention on programming language safety as a way to help build more secure and safer software; on the real benefits of memory-safe languages (MSLs); and on the need for C and C++ language safety to improve. I agree.

But there have been misconceptions, too, including focusing too narrowly on programming language safety as our industry’s primary security and safety problem — it isn’t. Many of the most damaging recent security breaches happened to code written in MSLs (e.g., Log4j) or had nothing to do with programming languages (e.g., Kubernetes Secrets stored on public GitHub repos).

In that context, I’ll focus on C++ and try to:

  • highlight what needs attention (what C++’s problem “is”), and how we can get there by building on solutions already underway;
  • address some common misconceptions (what C++’s problem “isn’t”), including practical considerations of MSLs; and
  • leave a call to action for programmers using all languages.

tl;dr: I don’t want C++ to limit what I can express efficiently. I just want C++ to let me enforce our already-well-known safety rules and best practices by default, and make me opt out explicitly if that’s what I want. Then I can still use fully modern C++… just nicer.

Let’s dig in.

The immediate problem “is” that it’s Too Easy By Default™ to write security and safety vulnerabilities in C++ that would have been caught by stricter enforcement of known rules for type, bounds, initialization, and lifetime language safety

In C++, we need to start with improving these four categories. These are the main four sources of improvement provided by all the MSLs that NIST/NSA/CISA/etc. recommend using instead of C++ (example), so by definition addressing these four would address the immediate NIST/NSA/CISA/etc. issues with C++. (More on this under “The problem ‘isn’t’… (1)” below.)

And in all recent years including 2023 (see figure 1’s four highlighted rows, and figure 2), these four constitute the bulk of those oft-quoted 70% of CVEs (Common [Security] Vulnerabilities and Exposures) related to language memory unsafety. (However, that “70% of language memory unsafety CVEs” is misleading; for example, in figure 1, most of MITRE’s 2023 “most dangerous weaknesses” did not involve language safety and so are outside that denominator. More on this under “The problem ‘isn’t’… (3)” below.)

The C++ guidance literature already broadly agrees on safety rules in those categories. It’s true that there is some conflicting guidance literature, particularly in environments that ban exceptions or run-time type support and so use some alternative rules. But there is consensus on core safety rules, such as banning unsafe casts, uninitialized variables, and out-of-bounds accesses (see Appendix).

C++ should provide a way to enforce them by default, and require explicit opt-out where needed. We can and do write “good” code and secure applications in C++. But it’s easy even for experienced C++ developers to accidentally write “bad” code and security vulnerabilities that C++ silently accepts, and that would be rejected as safety violations in other languages. We need the standard language to help more by enforcing the known best practices, rather than relying on additional nonstandard tools to recommend them.

These are not the only four aspects of language safety we should address. They are just the immediate ones, a set of clear low-hanging fruit where there is both a clear need and clear way to improve (see Appendix).

Note: Safety categories are of course interrelated. For example, full type safety (that an accessed object is a valid object of its type) requires eliminating out-of-bounds accesses to unallocated objects. Conversely, full bounds safety (that accessed memory is inside allocated bounds) requires eliminating type-unsafe downcasts to larger derived-type objects that would appear to extend beyond the actual allocation.

Software safety is also important. Cyberattacks are urgent, so it’s natural that recent discussions have focused more on security and CVEs first. But as we specify and evolve default language safety rules, we must also include our stakeholders who care deeply about functional safety issues that are not reflected in the major CVE buckets but are just as harmful to life and property when left in code. Programming language safety helps both software security and software safety, and we should start somewhere, so let’s start (but not end) with the known pain points of security CVEs.

In those four buckets, a 10-50x improvement (90-98% reduction) is sufficient

If there were 90-98% fewer C++ type/bounds/initialization/lifetime vulnerabilities we wouldn’t be having this discussion. All languages have CVEs; C++ just has more (and C still more). [Updated: Removed count of 2024 Rust vs C/C++ CVEs because MITRE.org search doesn’t have a great way of accurately counting the latter.] So zero isn’t the goal; something like a 90% reduction is necessary, and a 98% reduction is sufficient, to achieve security parity with the levels of language safety provided by MSLs… and has the strong benefit that I believe it can be achieved with perfect backward link compatibility (i.e., without changing C++’s object model, and its lifetime model which does not depend on universal tracing garbage collection and is not limited to tree-based data structures), which is essential to our being able to adopt the improvements in existing C++ projects as easily as we can adopt other new editions of C++. After that, we can pursue additional improvements to other buckets, such as thread safety and overflow safety.

Aiming for 100%, or zero CVEs in those four buckets, would be a mistake:

  • 100% is not necessary because none of the MSLs we’re being told to use instead are there either. More on this in “The problem ‘isn’t’… (2)” below.
  • 100% is not sufficient because many cyberattacks exploit security weaknesses other than memory safety.

And getting that last 2% would be too costly, because it would require giving up on link compatibility and seamless interoperability (or “interop”) with today’s C++ code. For example, Rust’s object model and borrow checker deliver great guarantees, but require fundamental incompatibility with C++ and so make interop hard beyond the usual C interop level. One reason is that Rust’s safe language pointers are limited to expressing tree-shaped data structures that have no cycles; that unique ownership is essential to having great language-enforced aliasing guarantees, but it also requires programmers to use ‘something else’ for anything more complex than a tree (e.g., using Rc, or using integer indexes as ersatz pointers); it’s not just about linked lists but those are a simple well-known illustrative example.

If we can get a 98% improvement and still have fully compatible interop with existing C++, that would be a holy grail worth serious investment.

A 98% reduction across those four categories is achievable in new/updated C++ code, and partially in existing code

Since at least 2014, Bjarne Stroustrup has advocated addressing safety in C++ via a “subset of a superset”: That is, first “superset” to add essential items not available in C++14, then “subset” to exclude the unsafe constructs that now all have replacements.

As of C++20, I believe we have achieved the “superset,” notably by standardizing span, string_view, concepts, and bounds-aware ranges. We may still want a handful more features, such as a null-terminated zstring_view, but the major additions already exist.

Now we should “subset”: Enable C++ programmers to enforce best practices around type and memory safety, by default, in new code and code they can update to conform to the subset. Enabling safety rules by default would not limit the language’s power but would require explicit opt-outs for non-standard practices, thereby reducing inadvertent risks. And it could be evolved over time, which is important because C++ is a living language and adversaries will keep changing their attacks.

ISO C++ evolution is already pursuing Safety Profiles for C++. The suggestions in the Appendix are refinements to that, to demonstrate specific enforcements and to try to maximize their adoptability and useful impact. For example, everyone agrees that many safety bugs will require code changes to fix. However, how many safety bugs could be fixed without manual source code changes, so that just recompiling existing code with safety profiles enabled delivers some safety benefits? For example, we could by default inject a call-site bounds check 0 <= b < a.size() on every subscript expression a[b] when a.size() exists and a is a contiguous container, without requiring any source code changes and without upgrading to a new internally bounds-checked container library; that checking would Just Work out of the box with every contiguous C++ standard container, span, string_view, and third-party custom container with no library updates needed (including therefore also no concern about ABI breakage).
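To make the call-site idea concrete, here is a minimal sketch of what an injected check on a[b] could be equivalent to. The helper name checked_subscript, the Samples type, and the choice to throw on a violation are my assumptions for illustration only; a real implementation would live in the compiler rather than in a function the programmer calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical helper: stands in for the check a compiler could inject
// around every a[b]. It is not a standard or proposed API.
template <typename Container, typename Index>
decltype(auto) checked_subscript(Container& a, Index b)
{
    static_assert(std::is_integral_v<Index>, "index must be an integral type");
    // 0 <= b: only a signed index can be negative.
    if constexpr (std::is_signed_v<Index>) {
        if (b < 0) throw std::out_of_range("negative subscript");
    }
    // b < a.size(): b is non-negative here, so widening to unsigned is safe.
    if (static_cast<std::uintmax_t>(b) >= static_cast<std::uintmax_t>(a.size())) {
        throw std::out_of_range("subscript past the end");
    }
    return a[b];
}

// A user-written contiguous container with no bounds checking of its own.
struct Samples {
    int data[4] = {10, 20, 30, 40};
    std::size_t size() const { return 4; }
    int& operator[](std::size_t i) { return data[i]; }
};

inline void example_usage()
{
    std::vector<int> v{1, 2, 3};
    std::string_view sv = "abc";
    Samples c;
    assert(checked_subscript(v, 1) == 2);
    assert(checked_subscript(sv, 0) == 'a');
    assert(checked_subscript(c, 3) == 40);
    // checked_subscript(v, 3) and checked_subscript(c, -1) would throw.
}
```

The point of the sketch is that the check needs only size() and operator[] from the container, so neither the standard types nor the user-written Samples type had to change.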

Rules like those summarized in the Appendix would have prevented (at compile time, test time or run time) most of the past CVEs I’ve reviewed in the type, bounds, and initialization categories, and would have prevented many of the lifetime CVEs. I estimate a roughly 98% reduction in those categories is achievable in a well-defined and standardized way for C++ to enable safety rules by default, while still retaining perfect backward link compatibility. See the Appendix for a more detailed description.

We can and should emphasize adoptability and benefit also for C++ code that cannot easily be changed. Any code change to conform to safety rules carries a cost; worse, not all code can be easily updated to conform to safety rules (e.g., it’s old and not understood, it belongs to a third party that won’t allow updates, it belongs to a shared project that won’t take upstream changes and can’t easily be forked). That’s why above (and in the Appendix) I stress that C++ should seriously try to deliver as many of the safety improvements as practical without requiring manual source code changes, notably by automatically making existing code do the right thing when that is clear (e.g., the bounds checks mentioned above, or emitting static_cast pointer downcasts as effectively dynamic_cast without requiring the code to be changed), and by offering automated fixits that the programmer can choose to apply (e.g., to change the source for static_cast pointer downcasts to actually say dynamic_cast). Even though in many cases a programmer will need to thoughtfully update code to replace inherently unsafe constructs that can’t be automatically fixed, I believe for some percentage of cases we can deliver safety improvements by just recompiling existing code in the safety-rules-by-default mode, and we should try because it’s essential to maximizing safety profiles’ adoptability and impact.

What the problem “isn’t”: Some common misconceptions

(1) The problem “isn’t” defining what we mean by “C++’s most urgent language safety problem.” We know the four kinds of safety that most urgently need to be improved: type, bounds, initialization, and lifetime safety.

We know these four are the low-hanging fruit (see “The problem ‘is’…” above). It’s true that these are just four of perhaps two dozen kinds of “safety” categories, including ones like safe integer arithmetic. But:

  • Most of the others are either much smaller sources of problems, or are primarily important because they contribute to those four main categories. For example, the integer overflows we care most about are indexes and sizes, which fall under bounds safety.
  • Most MSLs don’t address making these safe by default either, typically due to the checking cost. But all languages (including C++) usually have libraries and tools to address them. For example, Microsoft ships a SafeInt library for C++ to handle integer overflows, which is opt-in. C# has a checked arithmetic language feature to handle integer overflows, which is opt-in. Python’s built-in integers are overflow-safe by default because they automatically expand; however, the popular NumPy fixed-size integer types do not check for overflow by default and require using checked functions, which is opt-in.
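As a small illustration of opt-in overflow checking for the index-and-size case, here is a sketch in portable C++. The names checked_add and range_in_buffer are hypothetical and are not the SafeInt API; they only show the kind of check such a library performs.

```cpp
#include <cassert>
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or nullopt if the mathematically
// correct sum does not fit in an int. No signed overflow is ever evaluated.
constexpr std::optional<int> checked_add(int a, int b)
{
    constexpr int max = std::numeric_limits<int>::max();
    constexpr int min = std::numeric_limits<int>::min();
    if (b > 0 && a > max - b) return std::nullopt;  // would overflow upward
    if (b < 0 && a < min - b) return std::nullopt;  // would overflow downward
    return a + b;
}

// Example use: does [offset, offset + length) lie inside a buffer?
// The overflow check is what keeps this bounds test trustworthy.
constexpr bool range_in_buffer(int offset, int length, int buffer_size)
{
    if (offset < 0 || length < 0) return false;
    const std::optional<int> end = checked_add(offset, length);
    return end.has_value() && *end <= buffer_size;
}

inline void example_usage()
{
    assert(checked_add(1, 2).value() == 3);
    assert(!checked_add(std::numeric_limits<int>::max(), 1).has_value());
    assert(range_in_buffer(4, 4, 8));
    assert(!range_in_buffer(4, 5, 8));
}
```

Without the checked addition, a huge offset plus a small length could wrap and make an out-of-range request look valid, which is why I count this kind of overflow under bounds safety.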

Thread safety is obviously important too, and I’m not ignoring it. I’m just pointing out that it is not one of the top target buckets: Most of the MSLs that NIST/NSA/CISA/etc. recommend over C++ (except uniquely Rust, and to a lesser extent Python) address thread safety impact on user data corruption about as well as C++. The main improvement MSLs give is that a program data race will not corrupt the language’s own virtual machine (whereas in C++ a data race is currently all-bets-are-off undefined behavior). Some languages do give some additional protection, such as that Python guarantees two racing threads cannot see a torn write of an integer and reduces other possible interleavings because of the global interpreter lock (GIL).

(2) The problem “isn’t” that C++ code is not formally provably safe.

Yes, C++ code makes it too easy to write silently-unsafe code by default (see “The problem ‘is’…” above).

But I’ve seen some people claim we need to require languages to be formally provably safe, and that would be a bridge too far. Much to the chagrin of CS theorists, mainstream commercial programming languages aren’t formally provably safe. Consider some examples:

  • None of the widely-used languages we view as MSLs (except uniquely Rust) claim to be thread-safe and race-free by construction, as covered in the previous section. Yet we still call C#, Go, Java, Python, and similar languages “safe.” Therefore, formally guaranteeing thread safety properties can’t be a requirement to be considered a sufficiently safe language.
  • That’s because a language’s choice of safety guarantees is a tradeoff: For example, in Rust, safe code uses tree-based dynamic data structures only. This feature lets Rust deliver stronger thread safety guarantees than other safe languages, because it can more easily reason about and control aliasing. However, this same feature also requires Rust programs to use unsafe code more often to represent common data structures that do not require unsafe code to represent in other MSLs such as C# or Java, and so 30% to 50% of Rust crates use unsafe code, compared for example to 25% of Java libraries.
  • C#, Java, and other MSLs still have use-before-initialized and use-after-destroyed type safety problems too: They guarantee not accessing memory outside its allocated lifetime, but object lifetime is a subset of memory lifetime (objects are constructed after, and destroyed/disposed before, the raw memory is allocated and deallocated; before construction and after dispose, the memory is allocated but contains “raw bits” that likely don’t represent a valid object of its type). If you doubt, please run (don’t walk) and ask ChatGPT about Java and C# problems with: access-unconstructed-object bugs (e.g., in those languages, any virtual call in a constructor is “deep” and executes in a derived object before the derived object’s state is initialized); use-after-dispose bugs; “resurrection” bugs; and why those languages tell people never to use their finalizers. Yet these are great languages and we rightly consider them safe languages. Therefore, formally guaranteeing no-use-before-initialized and no-use-after-dispose can’t be a requirement to be considered a sufficiently safe language.
  • Rust, Go, and other languages support sanitizers too, including ThreadSanitizer and undefined behavior sanitizers, and related tools like fuzzers. Sanitizers are known to be still needed as a complement to language safety, and not only for when programmers use ‘unsafe’ code; furthermore, they go beyond finding memory safety issues. The uses of Rust at scale that I know of also enforce use of sanitizers. So using sanitizers can’t be an indicator that a language is unsafe — we should use the supported sanitizers for code written in any language.

Note: “Use your sanitizers” does not mean to use all of them all the time. Some sanitizers conflict with each other, so you can only use those one at a time. Some sanitizers are expensive, so they should only be run periodically. Some sanitizers should not be run in production, including because their presence can create new security vulnerabilities.

(3) The problem “isn’t” that moving the world’s C and C++ code to memory-safe languages (MSLs) would eliminate 70% of security vulnerabilities.

MSLs are wonderful! They just aren’t a silver bullet.

An oft-quoted number is that “70%” of programming language-caused CVEs (reported security vulnerabilities) in C and C++ code are due to language safety problems. That number is true and repeatable, but has been badly misinterpreted in the press: No security expert I know believes that if we could wave a magic wand and instantly transform all the world’s code to MSLs, we’d have 70% fewer CVEs, data breaches, and ransomware attacks. (For example, see this February 2024 example analysis paper.)

Consider some reasons.

  • That 70% is of the subset of security CVEs that can be addressed by programming language safety. See figure 1 again: Most of 2023’s top 10 “most dangerous software weaknesses” were not related to memory safety. Many of 2023’s largest data breaches and other cyberattacks and cybercrime had nothing to do with programming languages at all. In 2023, attackers reduced their use of malware because software is getting hardened and endpoint protection is effective (CRN), and attackers go after the slowest animal in the herd. Most of the issues listed in NISTIR-8397 affect all languages equally, as they go beyond memory safety (e.g., Log4j) or even programming languages (e.g., automated testing, hardcoded secrets, enabling OS protections, string/SQL injections, software bills of materials). For more detail see the Microsoft response to NISTIR-8397, for which I was the editor. (More on this in the Call to Action.)
  • MSLs get CVEs too, though definitely fewer (again, e.g., Log4j). For example, see MITRE list of Rust CVEs, including six so far in 2024. And all programs use unsafe code; for example, see the Conclusions section of Firouzi et al.’s study of uses of C#’s unsafe on StackOverflow and prevalence of vulnerabilities, and that all programs eventually call trusted native libraries or operating system code.
  • Saying the quiet part out loud: CVEs are known to be an imprecise metric. We use it because it’s the metric we have, at least for security vulnerabilities, but we should use it with care. This may surprise you, as it did me, because we hear a lot about CVEs. But whenever I’ve suggested improvements for C++ and measuring “success” via a reduction in CVEs (including in this essay), security experts insist to me that CVEs aren’t a great metric to use… including the same experts who had previously quoted the 70% CVE number to me. — Reasons why CVEs aren’t a great metric include that CVEs are self-reported and often self-selected, and not all are equally exploitable; but there can be pressure to report a bug as a vulnerability even if there’s no reasonable exploit because of the benefits of getting one’s name on a CVE. In August 2023, the Python Software Foundation became a CVE Numbering Authority (CNA) for Python and pip distributions, and now has more control over Python and pip CVEs. The C++ community has not done so.
  • CVEs target only software security vulnerabilities (cyberattacks and intrusions), and we also need to consider software safety (life-critical systems and unintended harm to humans).

(4) The problem “isn’t” that C++ programmers aren’t trying hard enough / using the existing tools well enough. The challenge is making it easier to enable them.

Today, the mitigations and tools we do have for C++ code are an uneven mix, and all are off-by-default:

  • Kind. They are a mix of static tools, dynamic tools, compiler switches, libraries, and language features.
  • Acquisition. They are acquired in a mix of ways: in-the-box in the C++ compiler, optional downloads, third-party products, and some you need to google around to discover.
  • Accuracy. Existing rulesets mix rules with low and high false positives. The latter are effectively unadoptable by programmers, and their presence makes it difficult to “just adopt this whole set of rules.”
  • Determinism. Some rules, such as ones that rely on interprocedural analysis of full call trees, are inherently nondeterministic (because an implementation gives up when fully evaluating a case exceeds the space and time available; a.k.a. “best effort” analysis). This means that two implementations of the identical rule can give different answers for identical code (and therefore nondeterministic rules are also not portable, see below).
  • Efficiency. Existing rulesets mix rules with low and high (and sometimes impossible) cost to diagnose. The rules that are not efficient enough to implement in the compiler will always be relegated to optional standalone tools.
  • Portability. Not all rules are supported by all vendors. “Conforms to ISO/IEC 14882 (Standard C++)” is the only thing every C++ tool vendor supports portably.

To address all these points, I think we need the C++ standard to specify a mode of well-agreed and low-or-zero-false-positive deterministic rules that are sufficiently low-cost to implement in-the-box at build time.

Call(s) to action

As an industry generally, we must make a major improvement in programming language memory safety — and we will.

In C++ specifically, we should first target the four key safety categories that are our perennial empirical attack points (type, bounds, initialization, and lifetime safety), and drive vulnerabilities in these four areas down to the noise for new/updated C++ code — and we can.

But we must also recognize that programming language safety is not a silver bullet to achieve cybersecurity and software safety. It’s one battle (not even the biggest) in a long war: Whenever we harden one part of our systems and make that more expensive to attack, attackers always switch to the next slowest animal in the herd. Many of 2023’s worst data breaches did not involve malware, but were caused by inadequately stored credentials (e.g., Kubernetes Secrets on public GitHub repos), misconfigured servers (e.g., DarkBeam, Kid Security), lack of testing, supply chain vulnerabilities, social engineering, and other problems that are independent of programming languages. Apple’s white paper about 2023’s rise in cybercrime emphasizes improving the handling, not of program code, but of the data: “it’s imperative that organizations consider limiting the amount of personal data they store in readable format while making a greater effort to protect the sensitive consumer data that they do store [including by using] end-to-end [E2E] encryption.”

No matter what programming language we use, security hygiene is essential:

  • Do use your language’s static analyzers and sanitizers. Never pretend using static analyzers and sanitizers is unnecessary “because I’m using a safe language.” If you’re using C++, Go, or Rust, then use those languages’ supported analyzers and sanitizers. If you’re a manager, don’t allow your product to be shipped without using these tools. (Again: This doesn’t mean running all sanitizers all the time; some sanitizers conflict and so can’t be used at the same time, some are expensive and so should be used periodically, and some should be run only in testing and never in production including because their presence can create new security vulnerabilities.)
  • Do keep all your tools updated. Regular patching is not just for iOS and Windows, but also for your compilers, libraries, and IDEs.
  • Do secure your software supply chain. Do use package management for library dependencies. Do track a software bill of materials for your projects.
  • Don’t store secrets in code. (Or, for goodness’ sake, on GitHub!)
  • Do configure your servers correctly, especially public Internet-facing ones. (Turn authentication on! Change the default password!)
  • Do keep non-public data encrypted, both when at rest (on disk) and when in motion (ideally E2E… and oppose proposed legislation that tries to neuter E2E encryption with ‘backdoors only good guys will use’ because there’s no such thing).
  • Do keep investing long-term in keeping your threat modeling current, so that you can stay adaptive as your adversaries keep trying different attack methods.

We need to improve software security and software safety across the industry, especially by improving programming language safety in C and C++, and in C++ a 98% improvement in the four most common problem areas is achievable in the medium term. But if we focus on programming language safety alone, we may find ourselves fighting yesterday’s war and missing larger past and future security dangers that affect software written in any language.

Sadly, there are too many bad actors. For the foreseeable future, our software and data will continue to be under attack, written in any language and stored anywhere. But we can defend our programs and systems, and we will.

Be well, and may we all keep working to have a safer and more secure 2024.

Appendix: Illustrating why a 98% reduction is feasible

This Appendix exists to support why I think a 98% reduction in type/bounds/initialization/lifetime CVEs in C++ code is believable. This is not a formal proposal, but an overview of concrete ways to achieve such an improvement in new and updatable code, and ways to even get some fraction of that improvement in existing code we cannot update but can recompile. These notes are aligned with the proposals currently being pursued in the ISO C++ safety subgroup, and if they pan out as I expect in ongoing discussions and experiments, then I intend to write further details about them in a future paper.

There are runtime and code size overheads to some of the suggestions in all four buckets, notably checking bounds and casts. But there is no reason to think those overheads need to be inherently worse in C++ than other languages, and we can make them on by default and still provide a way to opt out to regain full performance where needed.

Note: For example, bounds checking can cause a major impact on some hot loops, when using a compiler whose optimizer does not hoist bounds checks; not only can the loops incur redundant checking, but they also may not get other optimizations such as not being vectorized. This is why making bounds-checking on by default is good, but all performance-oriented languages also need to provide a way to say “trust me” and explicitly opt out of bounds checking tactically where needed.
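To show what hoisting means here, the following sketch contrasts a per-iteration check with a single check made before the loop. The function names are mine, and the second function does by hand what a good optimizer can often do automatically; it also stands in for the explicit "trust me" opt-out.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked on every iteration: simple, but if the optimizer cannot prove
// i < v.size(), the branch stays inside the loop.
inline long long sum_first_n_checked(const std::vector<int>& v, std::size_t n)
{
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += v.at(i);  // std::vector::at throws std::out_of_range
    }
    return total;
}

// Check hoisted by hand: one test up front covers every index the loop
// uses, so the loop body can use unchecked access.
inline long long sum_first_n_hoisted(const std::vector<int>& v, std::size_t n)
{
    if (n > v.size()) {
        throw std::out_of_range("n exceeds v.size()");
    }
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += v[i];  // explicit "trust me": covered by the check above
    }
    return total;
}

inline void example_usage()
{
    const std::vector<int> v{1, 2, 3, 4};
    assert(sum_first_n_checked(v, 3) == 6);
    assert(sum_first_n_hoisted(v, 3) == 6);
    // Both functions throw std::out_of_range when n is 5.
}
```

The two versions reject the same bad inputs; the hoisted one simply rejects them before doing any work, which leaves the loop body free of branches.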

This appendix refers to the “profiles” in the C++ Core Guidelines safety profiles, a set of about two dozen enforceable rules for type and memory safety of which I am a coauthor. I refer to them only as examples, to show “what” already-known rules exist that we can enforce, to support that my claimed improvement is possible. They are broadly consistent with rules in other sources, such as: The C++ Programming Language’s advice on type safety; C++ Coding Standards’ section on type safety; the Joint Strike Fighter Coding Standards; High Integrity C++; the C++ Core Guidelines section on safety profiles (a small enforceable set of safety rules); and the recently-released MISRA C++:2023.

The best way for “how” to let the programmer control enabling those rules (e.g., via source code annotations, compiler switches, and/or something else) is an orthogonal UX issue that is now being actively discussed in the C++ standards committee and community.

Type safety

Enforce the Pro.Type safety profile by default. That includes either banning or checking all unsafe casts and conversions (e.g., static_cast pointer downcasts, reinterpret_cast), including implicit unsafe type punning via C union and vararg.

However, these rules haven’t yet been systematically enforced in the industry. For example, in recent years I’ve painfully observed a significant set of type safety-caused security vulnerabilities whose root cause was that code used static_cast instead of dynamic_cast for pointer downcasts, and “C++” gets blamed even when the actual problem was failure to follow the well-publicized guidance to use the language’s existing safe recommended feature. It’s time for a standardized C++ mode that enforces these rules by default.
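Here is a minimal sketch of the downcast problem, using made-up Shape, Circle, and Square types. The unchecked version is shown only in a comment because performing that cast on the wrong object is undefined behavior.

```cpp
#include <cassert>

struct Shape  { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };
// Square is larger than Circle, so treating a Circle as a Square would
// also read past the end of the Circle's allocation.
struct Square : Shape { double side = 2.0; double pad[4] = {}; };

// Unchecked: compiles for any Shape*, and is undefined behavior if the
// object is not actually a Square.
//   Square* as_square_unchecked(Shape* s) { return static_cast<Square*>(s); }

// Checked: returns nullptr when s does not point to a Square.
inline Square* as_square(Shape* s) { return dynamic_cast<Square*>(s); }

inline double side_or_zero(Shape* s)
{
    if (Square* sq = as_square(s)) return sq->side;
    return 0.0;  // not a Square: handled instead of silently misread
}

inline void example_usage()
{
    Circle c;
    Square q;
    assert(as_square(&c) == nullptr);
    assert(as_square(&q) == &q);
    assert(side_or_zero(&c) == 0.0);
    assert(side_or_zero(&q) == 2.0);
}
```

A safety mode could either reject the commented-out static_cast or compile it as if it were the dynamic_cast version, which is the kind of recompile-only benefit I describe above.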

Note: On some platforms and for some applications, dynamic_cast has problematic space and time overheads that hinder its use. Many implementations bundle dynamic_cast indivisibly with all C++ run-time typing (RTTI) features (e.g., typeid), and so require storing full potentially-heavyweight RTTI data even though dynamic_cast needs only a small subset. Some implementations also use needlessly inefficient algorithms for dynamic_cast itself. So the standard must encourage (and, if possible, enforce for conformance, such as by setting algorithmic complexity requirements) that dynamic_cast implementations be more efficient and decoupled from other RTTI overheads, so that programmers do not have a legitimate performance reason not to use the safe feature. That decoupling could require an ABI break; if that is unacceptable, the standard must provide an alternative lightweight facility such as a fast_dynamic_cast that is separate from (other) RTTI and performs the dynamic cast with minimum space and time cost.

Bounds safety

Enforce the Pro.Bounds safety profile by default, and guarantee bounds checking. We should additionally guarantee that:

  • Pointer arithmetic is banned (use std::span instead); this enforces that a pointer refers to a single object. Array-to-pointer decay, if allowed, will point to only the first object in the array.
  • Only bounds-checked iterator arithmetic is allowed (also, prefer ranges instead).
  • All subscript operations are bounds-checked at the call site, by having the compiler inject an automatic subscript bounds check on every expression of the form a[b], where a is a contiguous sequence with a size/ssize function and b is an integral index. When a violation happens, the action taken can be customized using a global bounds violation handler; some programs will want to terminate (the default), while others will want to log-and-continue, throw an exception, or integrate with a project-specific critical fault infrastructure.

Importantly, the latter explicitly avoids implementing bounds-checking intrusively for each individual container/range/view type. Implementing bounds-checking non-intrusively and automatically at the call site makes full bounds checking available for every existing standard and user-written container/range/view type out of the box: Every subscript into a vector, span, deque, or similar existing type in third-party and company-internal libraries would be usable in checked mode without any need for a library upgrade.
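As a rough illustration of the call-site idea, here is a minimal sketch in today's C++ of what an injected check on a[b] could amount to. The names checked_subscript, bounds_handler, terminate_on_violation, throw_on_violation, and third_element are all hypothetical; a real implementation would be emitted by the compiler rather than called by hand. This sketch also simplifies by terminating if the handler returns, so it does not model log-and-continue.

```cpp
#include <cstddef>
#include <exception>
#include <iterator>    // std::size
#include <stdexcept>
#include <vector>

// Hypothetical handler type: receives the offending index and the sequence size.
using bounds_violation_handler = void (*)(std::ptrdiff_t index, std::size_t size);

inline void terminate_on_violation(std::ptrdiff_t, std::size_t) { std::terminate(); }

inline void throw_on_violation(std::ptrdiff_t, std::size_t) {
    throw std::out_of_range("subscript out of bounds");
}

// Hypothetical global hook; terminate is the default, as the text suggests.
inline bounds_violation_handler& bounds_handler() {
    static bounds_violation_handler handler = terminate_on_violation;
    return handler;
}

// Non-intrusive check: works for any type that supports std::size and operator[],
// without modifying the container itself.
template <typename Container>
decltype(auto) checked_subscript(Container& a, std::ptrdiff_t b) {
    std::size_t const size = std::size(a);
    if (b < 0 || static_cast<std::size_t>(b) >= size) {
        bounds_handler()(b, size);
        std::terminate();   // simplification: a returning handler cannot make the access valid
    }
    return a[static_cast<std::size_t>(b)];
}

// Example call site: what v[2] would effectively become in checked mode.
int third_element(std::vector<int> const& v) {
    return checked_subscript(v, 2);
}
```

Because the check lives outside the container, the same function template covers std::vector, std::array, built-in arrays, and user-written types that expose a size and a subscript operator.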

It’s important to add automatic call-site checking now before libraries continue adding more subscript bounds checking in each library, so that we avoid duplicating checks at the call site and in the callee. As a counterexample, C# took many years to get rid of duplicate caller-and-callee checking, but succeeded and .NET Core addresses this better now; we can avoid most of that duplicate-check-elimination optimization work by offering automatic call-site checking sooner.

Language constructs like the range-for loop are already safe by construction and need no checks.

In cases where bounds checking incurs a performance impact, code can still explicitly opt out of the bounds check in just those paths to retain full performance and still have full bounds checking in the rest of the application.

Initialization safety

Enforce initialization-before-use by default. That’s pretty easy to statically guarantee, except for some cases of the unused parts of lazily constructed array/vector storage. Two simple alternatives we could enforce are (either is sufficient):

  • Initialize-at-declaration as required by Pro.Type and ES.20; and possibly zero-initialize data by default as currently proposed in P2723. These two are good but have some drawbacks: both have some performance costs for cases that require ‘dummy’ writes that are never used but hard for optimizers to eliminate; and the latter has some correctness costs, because it ‘fixes’ some uninitialized cases where zero is a valid value but masks others for which zero is not a valid initializer. In those cases the behavior is still wrong, but because a zero has been jammed in, it’s harder for sanitizers to detect.
  • Guaranteed initialization-before-use, similar to what Ada and C# successfully do. This is still simple to use, but can be more efficient because it avoids the need for artificial ‘dummy’ writes, and can be more flexible because it allows alternative constructors to be used for the same object on different paths. For details, see: example diagnostic; definite-first-use rules.
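For the first alternative, one way to get initialize-at-declaration in today's C++ when the value depends on branching logic is an immediately invoked lambda. This is only a sketch, and scale_for is a hypothetical function invented for the example; the second alternative (guaranteed initialization-before-use) has no direct equivalent in today's C++ and is not shown.

```cpp
// Returns a multiplier for a unit suffix.
int scale_for(char unit) {
    // Risky pattern (deliberately not used here):
    //     int scale;
    //     if (unit == 'k') scale = 1000;
    // which leaves scale uninitialized on every other path.
    //
    // Instead, compute the value in one expression so that scale is
    // initialized at the point where it is declared.
    int const scale = [unit] {
        switch (unit) {
            case 'k': return 1000;
            case 'M': return 1000000;
            default:  return 1;
        }
    }();
    return scale;
}
```

No dummy write is needed here, and the variable can even be const because it never has to be assigned later.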

Lifetime safety

Enforce the Pro.Lifetime safety profile by default, ban manual allocation by default, and guarantee null checking. The Lifetime profile is a static analysis that diagnoses many common sources of dangling and use-after-free, including for iterators and views (not just raw pointers and references), in a way that is efficient enough to run during compilation. It can be used as a basis to iterate on and further improve. And we should additionally guarantee that:

  • All manual memory management is banned by default (new, delete, malloc, and free). Corollary: ‘Owning’ raw pointers are banned by default, since they require delete or free. Use RAII instead, such as by calling make_unique or make_shared.
  • All dereferences are null-checked. The compiler injects an automatic check on every expression of the form *p or p->, where p can be compared to nullptr, to null-check all dereferences at the call site (similar to the bounds checks above). When a violation happens, the action taken can be customized using a global null violation handler; some programs will want to terminate (the default), while others will want to log-and-continue, throw an exception, or integrate with a project-specific critical fault infrastructure.
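The two rules above can be sketched in today's C++ as follows. Widget, make_widget, checked_deref, null_handler, terminate_on_null, and throw_on_null are hypothetical names used only for illustration; std::make_unique and std::unique_ptr are the real standard facilities. As a simplification, this sketch terminates if the handler returns.

```cpp
#include <exception>
#include <memory>
#include <stdexcept>

struct Widget {
    int value = 42;
};

// RAII ownership: no new/delete in user code and no owning raw pointer.
std::unique_ptr<Widget> make_widget() {
    return std::make_unique<Widget>();
}

// Hypothetical null violation hook; terminate is the default.
using null_violation_handler = void (*)();

inline void terminate_on_null() { std::terminate(); }
inline void throw_on_null() { throw std::logic_error("null dereference"); }

inline null_violation_handler& null_handler() {
    static null_violation_handler handler = terminate_on_null;
    return handler;
}

// Roughly what an injected check on *p could do, for any p that can be
// compared to nullptr (raw pointers and smart pointers alike).
template <typename Pointer>
decltype(auto) checked_deref(Pointer const& p) {
    if (p == nullptr) {
        null_handler()();
        std::terminate();   // simplification: never dereference after a violation
    }
    return *p;
}
```

A program that prefers exceptions would install throw_on_null once at startup; every checked dereference then reports through the same hook.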

Note: The compiler could choose to not emit this check (and not perform optimizations that benefit from the check) when targeting platforms that already trap null dereferences, such as platforms that mark low memory pages as unaddressable. Some C++ features, such as delete, have always done call-site null checking.

Reducing undefined behavior and semantic bugs

Tactically, reduce some undefined behavior (UB) and other semantic bugs (pitfalls), for cases where we can automatically diagnose or even fix well-known antipatterns. Not all UB is bad; any performance-oriented language needs some. But we know there is low-hanging fruit where the programmer’s intent is clear and any UB or pitfall is a definite bug, so we can do one of two things:

(A – Good) Make the pitfall a diagnosed error, with zero false positives — every violation is a real bug. Two examples mentioned above are to automatically check a[b] to be in bounds and *p and p-> to be non-null.

(B – Ideal) Make the code actually do what the programmer intended, with zero false positives — i.e., fix it by just recompiling. An example, discussed at the most recent ISO C++ November 2023 meeting, is to default to an implicit return *this; when the programmer writes an assignment operator for their type C that returns a C& (note: the same type), but forgets to write a return statement. Today, that is undefined behavior. Yet it’s clear that the programmer meant return *this; — nothing else can be valid. If we make return *this; be the default, all the existing code that accidentally omits the return is not just “no longer UB,” but is guaranteed to do the right and intended thing.
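For reference, this is the shape of code in question, written correctly for today's C++. Counter is a hypothetical class invented for the example. The point is the last statement of the assignment operator: omit it today and flowing off the end of the function is undefined behavior, whereas under the proposed default it would behave exactly as written here.

```cpp
class Counter {
public:
    explicit Counter(int start = 0) : count_(start) {}
    Counter(Counter const&) = default;

    // Assignment operator for Counter that returns Counter& (the same type).
    Counter& operator=(Counter const& other) {
        count_ = other.count_;
        return *this;   // required today; the proposal would make this the implicit default
    }

    int count() const { return count_; }

private:
    int count_;
};
```

Returning *this is also what makes chained assignment such as a = b = c work.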

An example of both (A) and (B) is to support chained comparisons, which makes the mathematically valid chains work correctly and rejects the mathematically invalid ones at compile time. Real-world code does write such chains by accident (see: [a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [k]).

  • For (A): We can reject all mathematically invalid chains like a != b > c at compile time. This automatically diagnoses bugs in existing code that tries to do such nonsense chains, with perfect accuracy.
  • For (B): We can fix all existing code that writes would-be-correct chains like 0 <= index < max. Today those silently compile but are completely wrong, and we can make them mean the right thing. This automatically fixes those bugs, just by recompiling the existing code.
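To see why a chain like 0 <= index < max is wrong today, it helps to spell out how it is parsed. In this sketch the two function names are hypothetical; the first writes out the meaning today's C++ gives the chain, and the second writes out what the programmer intended.

```cpp
// Today's meaning of `0 <= index < max`: the left comparison yields a bool,
// which is converted to int (0 or 1) and then compared against max.
bool in_range_as_parsed_today(int index, int max) {
    bool const left = (0 <= index);
    return static_cast<int>(left) < max;
}

// The intended mathematical meaning, written out explicitly.
bool in_range_as_intended(int index, int max) {
    return 0 <= index && index < max;
}
```

For any max greater than 1 the first function returns true regardless of index, which is why such chains silently pass out-of-range values.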

These examples are not exhaustive. We should review the list of UB in the standard for a more thorough list of cases we can automatically fix (ideally) or diagnose.

Summarizing: Better defaults for C++

C++ could enable turning safety rules on by default that would make code:

  • fully type-safe,
  • fully bounds-safe,
  • fully initialization-safe,

and for lifetime safety, which is the hardest of the four, and where I would expect the remaining vulnerabilities in these categories would mostly lie:

  • fully null-safe,
  • fully free of owning raw pointers,
  • with lifetime-safety static analysis that diagnoses most common pointer/iterator/view lifetime errors;

and, finally:

  • with less undefined behavior including by automatically fixing existing bugs just by recompiling code with safety enabled by default.

All of this is efficiently implementable and has been implemented. Most of the Lifetime rules have been implemented in Visual Studio and CLion, and I’m prototyping a proof-of-concept mode of C++ that includes all of the other above language safeties on-by-default in my cppfront compiler, as well as other safety improvements including an implementation of the current proposal for ISO C++ contracts. I haven’t yet used the prototype at scale. However, I can report that the first major change request I received from early users was to change the bounds checking and null checking from opt-in (off by default) to opt-out (on by default).

Note: Please don’t be distracted by the fact that cppfront uses an experimental alternate syntax for C++. That’s because I’m additionally trying to see if we can reach a second orthogonal goal: to make the C++ language itself simpler, and eliminate the need to teach ~90% of the C++ guidance literature related to language complexity and quirks. This essay’s language safety improvements are orthogonal to that, however, and can be applied equally to today’s C++ syntax.

Solutions need to distinguish between (A) “solution for new-or-updatable code” and (B) “solution for existing code.”

(A) A “solution for new-or-updatable code” means that to help existing code we have to change/rewrite our code. This includes not only “(re)write in C#/Rust/Go/Python/…,” but also “annotate your code with SAL” or “change your code to use std::span.”

One of the costs of (A) is that anytime we write/change code to fix bugs, we also introduce new bugs; change is never free. We need to recognize that changing our code to use std::span often means non-trivially rewriting parts of it which can also create other bugs. Even annotating our code means writing annotations that can have bugs (this is a common experience in the annotation languages I’ve seen used at scale, such as SAL). All these are significant adoption barriers.

Actually switching to another language means losing a mature ecosystem. C++ is the well-trod path: It’s taught, people know it, the tools exist, interop works, and current regulations have an industry around C++ (such as for functional safety). It takes another decade at least for another language to become the well-trod path, whereas a better C++, and its benefits to the industry broadly, can be here much sooner.

(B) A “solution for existing code” emphasizes the adoptability benefits of not having to make manual code changes. It includes anything that makes existing code more secure with “just a recompile” (i.e., no binary/ABI/link issues; e.g., ASAN, compiler switches to enable stack checks, static analysis that produces only true positives, or a reliable automated code modernizer).

We will still need (B) no matter how successful new languages or new C++ types/annotations are. And (B) has the strong benefit that it is easier to adopt. Getting to a 98% reduction in CVEs will require both (A) and (B), but if we can deliver even a 30% reduction using just (B) that will be a major benefit for adoption and effective impact in large existing code bases that are hard to change.

Consider how the ideas earlier in this appendix map onto (A) and (B):

In C++, by default enforce …
(A) = Solution for new/updated code (can require code changes; no link/binary changes)
(B) = Solution for existing code (requires recompile only; no manual code changes, no link/binary changes)

Type safety
  • (A) Ban all inherently unsafe casts and conversions
  • (B) Make unsafe casts and conversions with a safe alternative do the safe thing

Bounds safety
  • (A) Ban pointer arithmetic; ban unchecked iterator arithmetic
  • (B) Check in-bounds for all allowed iterator arithmetic; check in-bounds for all subscript operations

Initialization safety
  • (A) Require all variables to be initialized (either at declaration, or before first use)

Lifetime safety
  • (A) Statically diagnose many common pointer/iterator lifetime error cases
  • (B) Check not-null for all pointer dereferences

Less undefined behavior
  • (A) Statically diagnose known UB/bug cases, to error on actual bugs in existing code with just a recompile and zero false positives: ban mathematically invalid comparison chains; (add additional cases from UB Annex review)
  • (B) Automatically fix known UB/bug cases, to make current bugs in existing code be actually correct with just a recompile and zero false positives: define mathematically valid comparison chains; default return *this; for C assignment operators that return C&; (add additional cases from UB Annex review)

By prioritizing adoptability, we can get at least some of the safety benefits just by recompiling existing code, and make the total improvement easier to deploy even when code updates are required. I think that makes it a valuable strategy to pursue.

Finally, please see again the main post’s conclusion: Call(s) to action.

Effective Concurrency: Live online course in April

I generally give one or two courses a year on C++ and related technologies. This year, on April 22-25, I’ll be giving a live online public course for four half-days, on the topic of high-performance low-latency coding in C++ — and the early registration discount is available for a few more days until this Thursday:

Effective Concurrency with Herb Sutter

High performance and low latency code, via concurrency and parallelism

22-25th April 2024, from 14:00 – 18:00 CEST daily

Participants in this intensive course will acquire the knowledge and skills required to write high-performance and low-latency code on today’s modern systems using modern C++. Presented by Alfasoft.

See the course link for details and a syllabus of topics that will be covered.

The times are intended to be friendly to the home time zones of attendees anywhere in EMEA and also to early risers in the Americas. If you live in a part of the world where these times can’t work for you, and you’d like another offering of the course that is friendlier to your home time zone, please email Alfasoft to let them know!

Because “high-performance low-latency” is kind of C++’s bailiwick, and because it’s my course, you’ll be unsurprised to learn that the topics and code focus on C++ and include coverage of modern C++17/20/23 features. But we are polyglots, after all… so don’t be overly shocked that I may sometimes show a few code examples in other popular languages, if only for comparison and to show how the other half lives.

Trip report: Autumn ISO C++ standards meeting (Kona, HI, USA)

Today, the ISO C++ committee completed its second meeting of C++26, held in Kona, HI, USA.

Our hosts, Standard C++ Foundation and WorldQuant, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We had over 170 attendees, about two-thirds in-person and the others remote via Zoom, formally representing 21 nations. Also, at each meeting we regularly have new attendees who have never attended before, and this time there were over a dozen new first-time attendees, mostly in-person; to all of them, once again welcome!

The committee currently has 23 active subgroups, most of which met in parallel tracks throughout the week. Some groups ran all week, and others ran for a few days or a part of a day and/or evening, depending on their workloads. You can find a brief summary of ISO procedures here.

This week’s meeting: Meeting #2 of C++26

At the previous meeting in June, the committee adopted the first 40 proposed changes for C++26, including many that had been ready for a couple of meetings and were just waiting for the C++26 train to open to be adopted. For those highlights, see the previous trip report.

This time, the committee adopted the next set of features for C++26. It also made significant progress on other features that are now expected to be complete in time for C++26 — including contracts and reflection.

Here are some of the highlights…

Adopted for C++26: Core language changes/features

The core language adopted four papers, including P2662R3 “Pack indexing” by Corentin Jabot and Pablo Halpern, which officially adds support for using [idx] subscripting into variadic parameter packs. Here is an example from the paper that will now be legal:

template <typename... T>
constexpr auto first_plus_last(T... values) -> T...[0] {
    return T...[0](values...[0] + values...[sizeof...(values)-1]);
}

int main() {
    static_assert( first_plus_last(1, 2, 10) == 11 );
}

For those interested in writing standards proposals, I would suggest looking at this and its two predecessors P1858 and P2632 as well-written papers: The earlier papers delve into the motivating use cases, and this paper has a detailed treatment of other design alternatives considered and why this is the one chosen. Seeing only the end result of T...[0] would be easy to call “obvious” in hindsight, but it’s far from the only option and this paper’s analysis shows a thorough consideration of alternatives, including their effects on existing and future code and future language evolution.

Adopted for C++26: Standard library changes/features

The standard library adopted 19 papers, including the following…

The biggest, and probably this meeting’s award for “proposal being worked on the longest,” is P1673R13, “A free function linear algebra interface based on the BLAS” by Mark Hoemmen, Daisy Hollman, Christian Trott, Daniel Sunderland, Nevin Liber, Alicia Klinvex, Li-Ta Lo, Damien Lebrun-Grandie, Graham Lopez, Peter Caday, Sarah Knepper, Piotr Luszczek, and Timothy Costa, with the help of Bob Steagall, Guy Davidson, Andrew Lumsdaine, and Davis Herring. If you want to do efficient linear algebra, you don’t want to write your own code by hand; that would be slow. Instead, you want a library that is tuned for your target hardware architecture and ready for par_unseq vectorized algorithms, for blazing speed. This is that library. For detailed rationale, see in particular sections 5 “Why include dense linear algebra in the C++ Standard Library?” and 6 “Why base a C++ linear algebra library on the BLAS?”

P2905R2 “Runtime format strings” and P2918R2 “Runtime format strings II” by Victor Zverovich build on the C++20 format library, which already supported compile-time format strings. Now with this pair of papers, we will have direct support for format strings not known at compile time and be able to opt out of compile-time format string checks.

P2546R5 “Debugging support” by René Ferdinand Rivera Morell, building on prior work by Isabella Muerte in P1279, adds std::breakpoint(), std::breakpoint_if_debugging(), and std::is_debugger_present(). This standardizes prior art already available in environments from Amazon Web Services to Unreal Engine and more, under a common standard API that gives the programmer full runtime control over breakpoints, including (quoting from the paper):

  • “allowing printing out extra output to help diagnose problems,
  • executing extra test code,
  • displaying an extra user interface to help in debugging, …
  • … breaking when an infrequent non-critical condition is detected,
  • allowing programmatic control with complex runtime sensitive conditions,
  • breaking on user input to inspect context in interactive programs without needing to switch to the debugger application,
  • and more.”

I can immediately think of times I would have used this in the past month, and probably you can too.

Those are some of the “bigger” papers as highlights… there were 16 other papers adopted too, including more extensions and fixes for the C++26 language and standard library.

On track for targeting C++26: Contracts

The contracts subgroup, SG21, decided several long-open questions that needed to be answered to land contracts in C++26. Perhaps not the most important one, but the one that’s the most visible, is the contracts syntax: This week, SG21 approved pursuing P2961R2 “A natural syntax for contracts” by Jens Maurer and Timur Doumler as the syntax for C++26 contracts. The major visible change is that instead of writing contracts like this:

// previous draft syntax
int f(int i)
    [[pre: i >= 0]]
    [[post r: r > 0]]
{
    [[assert: i >= 0]]
    return i+1;
}

we’ll write them like this, changing “assert” to “contract_assert”… pretty much everyone would prefer “assert,” if only it were backward-compatible, but in this new syntax it would hit an incompatibility with the C assert macro:

// newly adopted syntax
int f(int i)
    pre (i >= 0)
    post (r: r > 0)
{
    contract_assert (i >= 0);
    return i+1;
}

I already had a contracts implementation in my cppfront compiler, which used the previous [[ ]] syntax (because, when I have nothing clearly better, I try to follow syntax in existing/proposed C++). So, once P2961 was approved in the subgroup on Tuesday morning, I decided to take Tuesday afternoon to implement the change to this syntax, except that I kept the nice word “assert” because I can do that without a breaking change in my experimental alternate syntax. The work ended up taking not quite an hour, including updating the repo’s own code where I’m using contracts myself in the compiler and its unit tests. You can check out the diff in these commits. My initial personal reaction, as an early contracts user, is that I like the result.

There are a handful of design questions still to decide, notably the semantics of implicit lambda capture, consteval, and multiple declarations. Six contracts telecons have been scheduled between now and the next meeting in March in Tokyo. The group is aiming to have a feature-complete proposal for Tokyo to forward to other groups for review.

Today when this progress was reported to the full committee, there was applause. As there should be, because this week’s progress increases the confidence that the feature is on track for C++26!

Note that “for C++26” doesn’t mean “that’s still three years away, maybe my kids can use it someday.” It means the feature has to be finished in just the next 18 months or so, and once it’s finished that unleashes implementations to be able to confidently go implement it. It’s quite possible we may see implementations available sooner, as we do with other popular in-demand draft standard features.

Speaking of major features that made great progress this meeting to be confidently on track for C++26…

On track for targeting C++26: Reflection

The reflection subgroup, SG7, saw two experience reports from people actively using the prototype implementation of P2996 by Lock3 Software: P3010R0 “Using reflection to replace a metalanguage for generating JS bindings” by Dan Katz, and P2911R1 “Python bindings with value-based reflection” by Adam Lach and Jagrut Dave. As you can see from the titles, these were serious attempts to try out reflection for major use cases. Both experience reports supported P2996R1, so…

The group then voted unanimously to adopt P2996R1 “Reflection for C++26” by Wyatt Childers, Peter Dimov, Barry Revzin, Andrew Sutton, Faisal Vali, and Daveed Vandevoorde and forward it on to the main Evolution and Library Evolution subgroups targeting C++26. This is a “core” of static reflection that is useful enough to solve many important problems, while letting us also plan to continue building on it further post-C++26.

This is particularly exciting for me personally, because we desperately need reflection in C++, and based on this week’s progress now is the first time I’ve felt confident enough to mention a target ship vehicle for this super important feature.

Perhaps the most common example of reflection is “enum to string”, so here’s that example:

template <typename E>
    requires std::is_enum_v<E>
constexpr std::string enum_to_string(E value) {
    template for (constexpr auto e : std::meta::members_of(^E)) {
        if (value == [:e:]) {
            return std::string(std::meta::name_of(e));
        }
    }
    return "<unnamed>";
}

Note that the above uses some of the new reflection syntax, but this is just the implementation… the new syntax stays encapsulated there. The code that uses enum_to_string gets to not know anything about reflection, and just use the function:

enum Color { red, green, blue };
static_assert(enum_to_string(Color::red) == "red");
static_assert(enum_to_string(Color(42)) == "<unnamed>");

See the paper for much more detail, including more about enum-to-string in section 2.6.

Adding to the excitement, Edison Design Group noted that they expect to have an experimental implementation available on Godbolt Compiler Explorer by Christmas.

P2996 builds on the core of the original Reflection TS, and mainly changes the “top” and “bottom” layers that we knew we would likely change from the TS:

  • At the “top” or programming model layer, P2996 avoids having to do temp<late,meta<pro,gram>>::ming to use the API and lets us write something more like ordinary C++ code instead.
  • And at the “bottom” implementation layer, it uses a value-based implementation which is more efficient to implement.

This doesn’t mean the Reflection TS wasn’t useful; it was instrumental. Progress to this point would have been slower if we hadn’t been able to do the TS first, and we deeply appreciate all the work that went into that, as well as the new progress to move forward with P2996 as the reflection feature targeting C++26.

After the unanimous approval vote to forward this paper for C++26, there was a round of applause in the subgroup.

Then today, when this progress toward targeting C++26 was reported to the whole committee in the closing plenary session, the whole room was filled with sustained applause.

Other progress

Many other subgroups continued to make progress during the week. Here are a few highlights…

SG1 (Concurrency) will be working on out-of-thin-air issues for relaxed atomics at a face-to-face meeting or telecon between meetings. They are still on track to move forward with std::execution and SIMD parallelism for C++26, and SIMD was reviewed in the Library Evolution (LEWG) main subgroup; these features, in the words of the subgroup chair, will make C++26 a huge release for the concurrency and parallelism group.

SG4 (Networking) continued working on updating the networking proposal for std::execution senders and receivers. There is a lot of work still to be done and it is not clear whether networking will be on track for C++26.

SG9 (Ranges) set a list of features and priorities for ranges for C++26. There are papers that need authors, including ones that would be good “first papers” for new authors, so please reach out to the Ranges chair, Daisy Hollman, if you are interested in contributing toward a Ranges paper.

SG15 (Tooling) considered papers on improving modules to enable better tooling, and work toward the first C++ Ecosystem standard.

SG23 (Safety) made further progress towards safety profiles for C++ as proposed by Bjarne Stroustrup, and adopted that approach as the near-term direction for safety in C++. The updated paper will be available in the next mailing in mid-December.

Library Evolution (LEWG) started setting a framework for policies for new C++ libraries. The group also made good progress on a number of proposals targeting C++26, including std::hive, SIMD (vector unit parallelism), ranges extensions, std::execution, and possibly some support for physical units.

Language Evolution (EWG) worked on improving/forwarding/rejecting many proposals, including a set of discussions about improving undefined behavior in conjunction with the C committee, including eight papers about undefined behavior in the preprocessor. The group also decided to pursue doing a full audit of “ill-formed, no diagnostic required” undefined behavior that compilers currently are not required to detect and diagnose. The plan for our next meeting in Tokyo is to spend a lot of time on reflection, and prepare for focusing on contracts.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next meeting will be in Tokyo, Japan in March hosted by Woven by Toyota.

Wrapping up

Thank you again to the over 170 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies!

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in just four months from now we’ll be meeting again in person + Zoom to continue adding features to C++26. Thank you again to everyone reading this for your interest and support for C++ and its standardization.

My new CppCon talk is on YouTube: “Cooperative C++ Evolution – Toward a TypeScript for C++”

My Thursday CppCon talk is now online.

Note: There’s already a Reddit thread for it, so if you want to comment on the video I suggest you use that thread instead of creating a new one.

At CppCon 2022, I argued for why we should try to make C++ 10x simpler and 50x safer, and this update is an evolution of the update talk I gave at C++ Now in May, with additional news and demos.

The “Dart plan” and the “TypeScript plan”

The back half of this talk clearly distinguishes between what I call the “Dart plan” and the “TypeScript plan” for aiming at a 10x improvement for an incumbent popular language. Both plans have value, but they have different priorities and therefore choose different constraints… most of all, they either embrace up-front the design constraint of perfect C++ interop and ecosystem compatibility, or they forgo it (forever; as I argue in the talk, it can never be achieved retroactively, except by starting over, because it’s a fundamental up-front constraint). No one else has tried the TypeScript plan for C++ yet, and I see value in trying it, and so that’s the plan I’m following for cppfront.

When people ask me “how is cppfront different from all the other projects trying to improve/replace C++?” my answer is “cppfront is on the TypeScript plan.” All the other past and present projects have been on the Dart plan, which again is a fine plan too, it just has different priorities and tradeoffs particularly around

  • full seamless interop compatibility with ISO Standard C++ code and libraries without any wrapping/thunking/marshaling,
  • full ecosystem compatibility with all of today’s C++ compilers, IDEs, build systems, and tooling, and
  • full standards evolution support with ISO C++, including not creating incompatible features (e.g., a different concepts feature than C++20’s, a different modules system than C++20’s) and bringing all major new pieces to today’s ISO C++ evolution as also incremental proposals for today’s C++.

See just the final 10 minutes of the talk to see what I mean — I demo’d syntax 2 “just working” with four different IDEs’ debuggers and visualizers, but I could also have demo’d that profilers just work, build systems just work, and so on.

I call my experimental syntax 2 (aka Cpp2) “still 100% pure C++” not only because cppfront translates it to 100% today’s C++ syntax (aka Cpp1), but because:

  • every syntax-2 type is an ordinary C++ type that ordinary existing C++ code can use, recognized by tools that know C++ types including IDE visualizers;
  • every syntax-2 function is an ordinary C++ function that ordinary existing C++ code can use, recognized by tools that know C++ functions including debuggers to step into them;
  • every syntax-2 object is an ordinary C++ object that ordinary existing C++ code can use;
  • every syntax-2 feature can be (and has been) brought as a normal ISO C++ standards proposal to evolve today’s syntax, because Cpp2 embraces and follows today’s C++ standard and guidance and evolution instead of competing with them;

and that’s because I want a way to keep writing 100% pure C++, just nicer.

“Nicer” means: 10x simpler by having more generality and consistency, better defaults, and less ceremony; and 50x safer by having 98% fewer vulnerabilities in the four areas of initialization safety (guaranteed in Cpp2), type safety (guaranteed in Cpp2), bounds safety (on by default in Cpp2), and lifetime safety (still to be implemented in cppfront is the C++ Core Guidelines Lifetime static analysis which I designed for Cpp2).

Cpp2 and cppfront don’t replace your C++ compilers. Cpp2 and cppfront work with all your existing C++ compilers (and build systems, profilers, debuggers, visualizers, custom in-house tools, test harnesses, and everything else in the established C++ ecosystem, from the big commercial public C++ products to your team’s internal bespoke C++ tools). If you’re already using GCC, Clang, and/or MSVC, keep using them, they all work fine. If you’re already using CMake or build2, or lldb or the Qt Creator debugger, or your favorite profiler or test framework, keep using them, it’s all still C++ that all C++ tools can understand. There’s no new ecosystem.

There are only two plans for 10x major improvement. (1-min clip) This is the fundamental difference with all the other attempts at a major improvement of today’s C++ I know of, which are all on the Dart plan — and those are great projects by really smart people and I hope we all learn from each other. But for my work I want to pursue the TypeScript plan, which I think is the only major evolution plan that can legitimately call itself “still 100% C++.” That’s important to me, because like I said at the very beginning of my talk last year (1-min clip), I want to encourage us to pursue major evolution that brings C++ itself forward and to double down on C++, not switch to something else — to aim for major C++ evolution directed to things that will make us better C++ programmers, not programmers of something else.

I’m spending time on this experiment first of all for myself, because C++ is the language that best lets me express the programs I need to write, so I want to keep writing real C++ types and real C++ functions and real C++ everything else… just nicer.

Thanks again to the over 120 people who have contributed issues and PRs to cppfront, and the many more who have provided thoughtful comments and feedback! I appreciate your help.

cppfront: Autumn update

Since the 2022-12-31 year-end mini-update and the 2023-04-30 spring update, progress has continued on cppfront. (If you don’t know what this personal project is, please see the CppCon 2022 talk on YouTube for an overview, and the CppNow 2023 talk on YouTube for an interim update.)

I’ll be giving a major update next week at CppCon. I hope to see many of you there! In the meantime, here are some notes about what’s been happening since the spring update post, including:

  • Acknowledgments and thanks
  • Started self-hosting
  • No data left behind: Mandatory explicit discard
  • requires clauses
  • Generalized aliases+constexpr with ==
  • Safe enum and flag_enum metafunctions
  • Safe union metafunction
  • What’s next

Acknowledgments: Thank you!

Thank you to all these folks who have participated in the cppfront repo by opening issues and PRs, and to many more who participated on PR reviews and comment threads! These contributors represent people from high school and undergrad students to full professors, from commercial developers to conference speakers, and from every continent except Antarctica.

Started self-hosting

I haven’t spent a lot of time yet converting cppfront’s own code from today’s syntax 1 to my alternate syntax 2 (which I’m calling “Cpp1” and “Cpp2” for short), but I started with all of cppfront’s reflection API and metafunctions which are now mostly written in Cpp2. Here’s what that reflect.h2 file compilation looks like when compiled on the command line on my laptop:

But note you can still build cppfront as all-today’s-C++ using any fairly recent C++20 compiler because I distribute the sources also as C++ (just as Bjarne distributed the cfront sources also as C).

No data left behind: Mandatory explicit discard

Initialization and data flow are fundamental to safe code, so from the outset I ensured that syntax 2 guaranteed initialization before use, I made all converting constructors explicit by default… and I made [[nodiscard]] the default for return values (1-min talk clip).

The more I thought about [[nodiscard]], the more determined I was that data must never be silently lost, and data-lossy operations should be explicit. So I’ve decided to try an aggressive experiment:

  • make “nodiscard” the law of the land, implicitly required all the time, with no opt-out…
  • including when calling existing C++ libraries (including std::) that were never designed for their return values to be treated as [[nodiscard]]!

Now, I wasn’t totally crazy: See the Design note: Explicit discard for details on how I first surveyed other languages’ designers about experience in their languages — notably C#, F#, and Python. In particular, F# does the same thing with .NET APIs — F# requires explicit |> ignore to discard unused return values, including for .NET APIs that were never designed for that and were largely written in other languages. Don Syme told me it has not been a significant pain point, and that was encouraging, so I’m following suit.

My experience so far is that it’s pretty painless, and I write about one explicit discard for every 200 lines of code, even when using the C++ standard library (which cppfront does pervasively, because the C++ standard library is the only library cppfront uses). And, so far, every time cppfront told me I had to write an explicit discard, I learned something useful (e.g., before this I never realized that emplace_back started to return something since C++17! push_back still doesn’t) and I found I liked that my code explicitly self-documented it was not looking at output values… my code looked better.

The way to do an explicit discard is to assign the result to the “don’t care” wildcard. It’s unobtrusive, but explicit and clear:

 _ = vec.emplace_back(1,2,3);

Now all Cpp2-authored C++ functions are emitted as [[nodiscard]], except only for assignment and streaming operators because those are designed for chaining and every chain always ends with a discarded return.
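For readers who want to see the same idea in today's syntax, here is a minimal sketch, assuming C++17 or later. The `point` type and the `count_points` and `demo_explicit_discard` functions are hypothetical names made up for this illustration, and the cast to `void` is only one common Cpp1 way to spell a discard; it is not a claim about what cppfront emits.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical element type, only for this illustration.
struct point {
    int x, y, z;
    point(int x_, int y_, int z_) : x{x_}, y{y_}, z{z_} {}
};

using points = std::vector<point>;

// push_back returns void, so there is nothing to discard...
static_assert(std::is_void_v<
    decltype(std::declval<points&>().push_back(std::declval<point>()))>);
// ...but since C++17 emplace_back returns a reference to the new element.
static_assert(std::is_same_v<
    decltype(std::declval<points&>().emplace_back(1, 2, 3)), point&>);

// Hypothetical function whose result should not be silently dropped.
[[nodiscard]] inline std::size_t count_points(const points& v) { return v.size(); }

inline std::size_t demo_explicit_discard() {
    points vec;
    static_cast<void>(vec.emplace_back(1, 2, 3));  // explicit, visible discard
    // count_points(vec);  // ignoring this result would typically draw a warning
    const std::size_t n = count_points(vec);       // result is used, so no discard needed
    assert(n == 1);
    return n;
}
```

Calling `demo_explicit_discard()` from any `main` exercises the sketch; the two `static_assert`s check the `push_back` versus `emplace_back` difference at compile time.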

And the whole language hangs together well: Explicit discard works very naturally with inout and out parameters too, not just return values. If you have a local variable x and pass it to an inout parameter, what if that’s the last use of the variable?

{
    x := my_vector.begin();
    std::advance(x, 2);
        // ERROR, if param is Cpp2 'inout' or Cpp1 non-const '&'
}

In this example, that call to std::advance(x, 2); is a definite last use of x, and so Cpp2 will automatically pass x as an rvalue and make it a move candidate… and presto! the call won’t compile because you can’t pass an rvalue to a Cpp2 inout parameter (the same as a Cpp1 non-const-& parameter, so this correctly detects the output side effects also when calling existing C++ functions that take references to non-const). That’s a feature, not a bug, because if that’s the last use of x that means the function is not looking at x again, so it’s ignoring the “out” value of the std::advance(x, 2) function call, which is exactly like ignoring a return value. And the guidance is the same: If you really meant to do that, just explicitly discard x’s final value:

{
    x := my_vector.begin();
    std::advance(x, 2);
    _ = x; // all right, you said you meant it, carry on then...
}

Adding _ = x; afterward naturally makes that the last use of x instead. Problem solved, and it self-documents that the code really meant to ignore a function’s output value.
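The underlying Cpp1 rule can be checked directly in today's C++. The following is a minimal sketch, assuming C++17 or later; `advance_by_two` and `demo_use_out_value` are hypothetical names used only here, with `advance_by_two` standing in for any function that takes its argument by reference to non-const.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

using iter = std::vector<int>::iterator;

// Hypothetical wrapper with the same parameter shape as std::advance:
// the iterator is taken by reference to non-const (an "inout" parameter).
inline void advance_by_two(iter& it) { std::advance(it, 2); }

// An lvalue iterator is accepted...
static_assert( std::is_invocable_v<decltype(&advance_by_two), iter&>);
// ...but an rvalue, which is how a definite last use is passed, is rejected.
static_assert(!std::is_invocable_v<decltype(&advance_by_two), iter>);

inline int demo_use_out_value() {
    std::vector<int> v{10, 20, 30};
    auto x = v.begin();
    advance_by_two(x);  // fine: x is an lvalue here
    assert(*x == 30);
    return *x;          // the updated value of x is actually read afterward
}
```

Because the function's output is read afterward, nothing is being ignored, which is the situation the rule is meant to encourage.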

I really, really like how my C++ code’s data flow is explicit, and fully protected and safe, in syntax 2. And I’m very pleased to see how it just works naturally throughout the language — from universal guaranteed initialization, to explicit constructors by default, to banning implicitly discarding any values, to uniform treatment of returned values whether returned by return value or the “out” part of inout and out parameters, and all of it working also with existing C++ libraries so they’re safer and nicer to use from syntax 2. Data is always initialized, data is never silently lost, data flow is always visible. Data is precious, and it’s always safe. This feels right and proper to me.

requires clauses

I also added support for requires clauses, so now you can write those on all templates. The cppfront implementation was already generating some requires clauses (see this 1-min video clip). Now programmers can write their own too.

This required a bit of fighting with a GCC 10 bug about requires-clauses on declarations, which was fixed in GCC 11+ but never backported. That was the only GCC 10 problem I couldn’t paper over, and I could give a clear diagnostic for the few Cpp2 features that rely on requires clauses, so I’ve been able to retain GCC 10 as a supported compiler and emit diagnostics if you try to use those few features it doesn’t support. GCC 11 and higher are all fine and support all Cpp2 semantics.

Generalized aliases+constexpr with ==

In the April blog post, I mentioned I needed a way to write type and namespace aliases, and that because all Cpp2 declarations are of the form thing : type = value, I decided to try using the same syntax but with == to denote “always equal to.”

// namespace alias
lit: namespace == ::std::literals;

// type alias
pmr_vec: <T> type
    == std::vector<T, std::pmr::polymorphic_allocator<T>>;

I think this clearly denotes that lit is always the same as ::std::literals, and pmr_vec<int> is always the same as std::vector<int, std::pmr::polymorphic_allocator<int>>.

Since then, I’ve thought about how this should be best extended to functions and objects, and I realized the requirements seem to overlap with something else I needed to support: constexpr functions and objects. Which, after all, are functions/objects that return/have “always the same values” known at compile time…

// function with "always the same value" (constexpr function)
increment: (value: int) -> int == value+1;
    // Cpp2 lets you omit { return } around 1-line bodies

// object with "always the same value" (constexpr object)
forty_two: i64 == 42;

I particularly needed these in order to write the enum metafunctions…
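For comparison, the four == declarations above correspond roughly to the following today's-C++ spellings. This is a sketch under two assumptions: that Cpp2's i64 corresponds to std::int64_t, and that a C++17 or later compiler with <memory_resource> is available. `demo_literals` is a hypothetical helper, and none of this is meant as cppfront's literal output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <string>
#include <type_traits>
#include <vector>

// namespace alias
namespace lit = ::std::literals;

// type alias (an alias template)
template <typename T>
using pmr_vec = std::vector<T, std::pmr::polymorphic_allocator<T>>;

static_assert(std::is_same_v<pmr_vec<int>, std::pmr::vector<int>>);

// function with "always the same value" (constexpr function)
constexpr int increment(int value) { return value + 1; }

// object with "always the same value" (constexpr object)
inline constexpr std::int64_t forty_two = 42;

static_assert(increment(41) == forty_two);

// Hypothetical helper showing the namespace alias in use.
inline std::size_t demo_literals() {
    using namespace lit;
    const std::size_t n = "hello"s.size();  // operator""s found via the alias
    assert(n == 5);
    return n;
}
```

Today's syntax needs four different declaration forms (namespace alias, alias template, constexpr function, constexpr variable) for what Cpp2 writes with one == form.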

Safe enum and flag_enum metafunctions

In the spring update blog post, I described the first 10 working compile-time metafunctions I implemented in cppfront, from the set of metafunctions I described in my ISO C++ paper P0707. Since then, I’ve also implemented enum and union.

The most important thing about metafunctions is that they are compile-time library code that uses the reflection and code generation API, that lets the author of an ordinary C++ class type easily opt into a named set of defaults, requirements, and generated contents. This approach is essential to making the language simpler, because it lets us avoid hardwiring special “extra” types into the language and compiler.

In Cpp2, there’s no enum feature hardwired into the language. Instead you write an ordinary class type and just apply the enum metafunction:

// skat_game is declaratively a safe enumeration type: it has
// default/copy/move construction/assignment and <=> with 
// std::strong_ordering, a minimal-size signed underlying type
// by default if the user didn't specify a type, no implicit
// conversion to/from the underlying type, in fact no public
// construction except copy construction so that it can never
// have a value different from its listed enumerators, inline
// constexpr enumerators with values that automatically start
// at 1 and increment by 1 if the user didn't write their own
// value, and conveniences like to_string()... the word "enum"
// carries all that meaning as a convenient and readable
// opt-in, without hardwiring "enum" specially into the language
//
skat_game: @enum<i16> type = {
    diamonds := 9;
    hearts;  // 10
    spades;  // 11
    clubs;   // 12
    grand    := 20;
    null     := 23;
}

Consider hearts: It’s a member object declaration, but it doesn’t have a type (or a default value) which is normally illegal, but it’s okay because the @enum<i16> metafunction fills them in: It iterates over all the data members and gives each one the underlying type (here explicitly specified as i16, otherwise it would be computed as the smallest signed type that’s big enough), and an initializer (by default one higher than the previous enumerator).

Why have this metafunction on an ordinary C++ class, when C++ already has both C’s enum and C++11’s enum class? Because:

  • it keeps the language smaller and simpler, because it doesn’t hardwire special-purpose divergent splinter types into the language and compiler
    • (cue Beatles, and: “all you need is class (wa-wa, wa-wa-wa), all you need is class (wa-wa, wa-wa-wa)”);
  • it’s a better enum than C enum, because C enum is unscoped and not as strongly typed (it implicitly converts to the underlying type); and
  • it’s a better enum class than C++11 enum class, because it’s more flexible…

… consider: Because an enumeration type is now “just a type,” it just naturally can also have member functions and other things that are not possible for Cpp1 enums and enum classes (see this StackOverflow question):

janus: @enum type = {
    past;
    future;

    flip: (inout this) == {
        if this == past { this = future; }
        else { this = past; }
    }
}

There’s also a flag_enum variation with power-of-two semantics and an unsigned underlying type:

// file_attributes is declaratively a safe flag enum type:
// same as enum, but with a minimal-size unsigned underlying
// type by default, and values that automatically start at 1
// and rise by powers of two if the user didn't write their 
// own value, and bitwise operations plus .has(flags), 
// .set(flags), and .clear(flags)... the word "flag_enum"
// carries all that meaning as a convenient and readable
// opt-in without hardwiring "[Flags]" specially into the
// language
//
file_attributes: @flag_enum<u8> type = {
    cached;     // 1
    current;    // 2
    obsolete;   // 4
    cached_and_current := cached | current;
}

Safe union metafunction

And you can declaratively opt into writing a safe discriminated union/variant type:

// name_or_number is declaratively a safe union/variant type: 
// it has a discriminant that enforces only one alternative 
// can be active at a time, members always have a name, and
// each member has .is_member() and .member() accessors...
// the word "union" carries all that meaning as a convenient 
// and readable opt-in without hardwiring "union" specially 
// into the language
//
name_or_number: @union type = {
    name: std::string;
    num : i32;
}

Why have this metafunction on an ordinary C++ class, when C++ already has both C’s union and C++17’s std::variant? Because:

  • it keeps the language smaller and simpler, because it doesn’t hardwire special-purpose divergent splinter types into the language and compiler
    • (cue the Beatles earworm again: “class is all you need, class is all you need…”);
  • it’s a better union than C union, because C union is unsafe; and
  • it’s a better variant than C++17 std::variant, because std::variant is hard to use because its alternatives are anonymous (as is the type itself; there’s no way to distinguish in the type system between a variant<int,string> that stores either an employee id or employee name, and a variant<int,string> that stores either a lucky number or a pet unicorn’s dominant color).
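The anonymity problem is easy to demonstrate in today's C++. In this minimal C++17 sketch, the alias names and the `describe` function are hypothetical and exist only for illustration:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Two aliases with different intended meanings...
using employee_id_or_name   = std::variant<int, std::string>;
using lucky_number_or_color = std::variant<int, std::string>;

// ...are exactly the same type as far as the type system is concerned.
static_assert(std::is_same_v<employee_id_or_name, lucky_number_or_color>);

// Alternatives are reached by type, not by a meaningful member name.
inline std::string describe(const employee_id_or_name& e) {
    if (std::holds_alternative<int>(e)) {
        return "id " + std::to_string(std::get<int>(e));
    }
    return "name " + std::get<std::string>(e);
}

inline std::string demo_mixup() {
    const lucky_number_or_color lucky{7};
    const std::string s = describe(lucky);  // compiles: nothing stops the mix-up
    assert(s == "id 7");
    return s;
}
```

The call in `demo_mixup` passes a lucky number where an employee was expected, and the compiler has no way to object.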

Each @union type has its own type-safe name, has clear and unambiguous named members, and safely encapsulates a discriminator to rule them all. Sure, it uses unsafe casts in the implementation, but they are fully encapsulated, where they can be tested once and be safe in all uses. That makes @union:

  • as easy to use as a C union,
  • as safe to use as a std::variant… and
  • as a bonus, because it’s an ordinary type, it can naturally have other things normal types can have, such as template parameter lists and member functions:

// a templated custom safe union
name_or_other: @union <T:type> type
= {
    name  : std::string;
    other : T;

    // a custom member function
    to_string: (this) -> std::string = {
        if is_name()       { return name(); }
        else if is_other() { return other() as std::string; }
        else               { return "invalid value"; }
    }
}

main: () = {
    x: name_or_other<int> = ();
    x.set_other(42);
    std::cout << x.other() * 3.14 << "\n";
    std::cout << x.to_string(); // prints "42", but is legal whichever alternative is active
}
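For comparison, here is what a hand-written named wrapper with similar accessors could look like in today's C++17, using the earlier name_or_number example. This is a sketch built on std::variant purely for illustration; it is not how @union is implemented (the text above says the real implementation uses encapsulated casts), and the `storage_` member and the choice of std::monostate for the empty state are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>

class name_or_number {
    // std::monostate models "no alternative active yet" (an assumption of this sketch).
    std::variant<std::monostate, std::string, std::int32_t> storage_;

public:
    bool is_name() const { return std::holds_alternative<std::string>(storage_); }
    bool is_num()  const { return std::holds_alternative<std::int32_t>(storage_); }

    // std::get throws std::bad_variant_access if the wrong alternative is requested.
    const std::string& name() const { return std::get<std::string>(storage_); }
    std::int32_t       num()  const { return std::get<std::int32_t>(storage_); }

    void set_name(std::string s) { storage_.emplace<std::string>(std::move(s)); }
    void set_num(std::int32_t n) { storage_.emplace<std::int32_t>(n); }
};

inline std::int32_t demo_named_union() {
    name_or_number x;
    assert(!x.is_name() && !x.is_num());
    x.set_name("xyzzy");
    assert(x.is_name() && x.name() == "xyzzy");
    x.set_num(42);
    assert(x.is_num() && !x.is_name());
    return x.num();
}
```

The wrapper gives the type its own name and gives each alternative a named accessor, at the cost of boilerplate that has to be repeated for every such type.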

What’s next

For the rest of the year, I plan to:

  • continue self-hosting cppfront, i.e., migrate more of cppfront’s own code to be written in Cpp2 syntax, particularly now that I have enum and union (cppfront uses enum class and std::variant pervasively);
  • continue working my list of pending Cpp2 features and implementing them in cppfront; and
  • work with a few private alpha testers to start writing a bit of code in Cpp2, to alpha-test cppfront and also to alpha-test my (so far unpublished) draft documentation.

But first, one week from today, I’ll be at CppCon to give a talk about this progress and why full-fidelity compatibility with ISO C++ is essential (and what that means): “Cooperative C++ Evolution: Toward a TypeScript for C++.” I look forward to seeing many of you there!

My C++ Now 2023 talk is online: “A TypeScript for C++”

Thanks again to C++ Now for inviting me to speak this year in glorious Aspen, Colorado, USA! It was nice to see many old friends again there and make a few new ones too.

The talk I gave there was just posted on YouTube, you can find it here:

At CppCon 2022, I argued for why we should try to make C++ 10x simpler and safer, and I presented my own incomplete experimental compiler, cppfront. Since then, cppfront has continued progressing: My spring update post covered the addition of types, a reflection API, and metafunctions, and this talk was given a week after that post and shows off those features with discussion and live demos.

This talk also clearly distinguishes between what I call the “Dart plan” and the “TypeScript plan” for aiming at a 10x improvement for an incumbent popular language. Both plans have value, but they have different priorities and therefore choose different constraints… most of all, they either embrace up-front the design constraint of perfect C++ interop compatibility, or they forgo it (forever; as I argue in the talk, it can never be achieved retroactively, except by starting over, because it’s a fundamental up-front constraint). No one else has tried the TypeScript plan for C++ yet, and I see value in trying it, and so that’s the plan I’m following for cppfront.

When people ask me “how is cppfront different from all the other projects trying to improve/replace C++?” my answer is “cppfront is on the TypeScript plan.” All the other past and present projects have been on the Dart plan, which again is a fine plan too, it just has different priorities and tradeoffs particularly around compatibility.

The video description has a topical guide linking to major points in the talk. Here below is a more detailed version of that topical guide… I hope you enjoy the talk!

1:00 Intro and roadmap for the talk

2:28 1. cppfront recap

2:35 – green-field experiments are great; but cppfront is about refreshing C++ itself

3:28 – “when I say compatibility .. I mean I can call any C++ code that exists today … with no shims, no thunks, no overheads, no indirections, no wrapping”

4:05 – can’t take a breaking change to existing code without breaking the world

5:22 – to me, the most impactful release of C++ was C++11, it most changed the way we wrote our code

6:20 – what if we could do C++11 again, but a coordinated set of features to internally evolve C++

6:52 – cppfront is an experiment in progress, still incomplete

7:41 – thanks to 100+ cppfront contributors!

8:00 – summary slide of features demonstrated at CppCon 2022

– safety for C++; goal of 50x fewer CVEs due to type/bounds/lifetime/init safety

– simplicity for C++; goal of 10x less to know

10:00 – 2. cppfront: what’s new

10:05 – (a) 3 smaller new features showing simplicity+safety+efficiency

10:15 – <=> from this work has already been standardized

11:05 – simplicity, safety and efficiency rarely in tension, with the right abstractions

12:55 – chained comparisons: simple, safe (mathematically), efficient (single eval)

15:08 – named loops/break/continue: simple, safe (structured), efficient

16:51 – main’s arguments: simple (std:: always available), safe (bounds/null check by default), efficient (pay only if you ask for main’s parameters)

18:30 – (b) user-defined types

19:20 – explicit `this`

20:20 – defaults: rarely write access-specifiers

21:30 – (recall from CppCon 2022: composable initialization safety with `out` parameters)

23:50 – unified `operator=`: {construct,assign}x{copy,move} is a single function (by default)

25:48 – visual for unified `operator=`

27:28 – walk through example code generation for unified `operator=`

31:35 – virtual/override/final are qualifiers on `this`

35:05 – DEMO: inheritance (GCC this time)

40:43 – easter egg

41:55 – can interleave bases and members, more control over layout and lifetime

43:10 – (c) reflection and type metafunctions

43:10 – recap overview from CppCon 2017

54:23 – DEMO: applying type metafunctions

56:10 – 3. compatibility for C++

56:35 – John Carmack on compatibility in the real world

59:40 – recall: summary of “Bridge to NewThingia” talk

1:02:05 – avoiding an adoption step function requires high fidelity compatibility

1:04:25 – C++ from C, TypeScript from JavaScript, Swift from Objective-C, Roslyn from prior compiler

1:05:45 – emphasizing and dissecting TypeScript’s compatibility story

1:07:55 – Dart: similar goal, but not designed to be compatible, and you’ll never be able to back into compatibility without starting over

1:08:55 – examples of why incompatibility costs a decade:

1:08:57 – – VC++ 6.0 to 10.0 … 12 years

1:10:28 – – Python 2 to 3 … 12 years (Python is C++’s #1 sister language)

1:18:30 – – C99 to C11 … 12 years

1:18:50 – – C++11 basic_string (approved in 2008) to 2019 support on all major platforms … 11 years

1:19:25 – the “lost decade” pattern: lack of seamless compatibility will cost you a decade in adoption

1:20:45 – three “plans”: the “10% plan”, the “Dart plan”, and the “TypeScript plan”

1:21:00 – “10% plan”: incremental evolution-as-usual

1:21:40 – so how do we get a 10x improvement?

1:21:50 – “Dart plan”: designing something new, not worry about compatible interop, competitive

1:23:20 – “TypeScript plan”: designing for something compatible, cooperative

1:25:40 – what it takes to evolve C++ compatibly, which no other effort has tried before

1:28:50 – filling in the blank: ______ for C++

Trip report: Summer ISO C++ standards meeting (Varna, Bulgaria)

Minutes ago, the ISO C++ committee finished its meeting in-person in Varna, Bulgaria and online via Zoom, where we formally began adopting features into C++26.

Our hosts, VMware and Chaos, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We had nearly 180 attendees, about two-thirds in-person and the others remote via Zoom, formally representing 20 nations. Also, at each meeting we regularly have new attendees who have never attended before, and this time there were 17 new first-time attendees, mostly in-person; to all of them, once again welcome!

The committee currently has 23 active subgroups, most of which met in parallel tracks throughout the week. Some groups ran all week, and others ran for a few days or a part of a day and/or evening, depending on their workloads. You can find a brief summary of ISO procedures here.

This week’s meeting: Starting C++26

ISO C++ is on a three-year development cycle, which includes a “feature freeze” about one year before we ship and publish that edition of the standard. For example, the feature freeze for C++23 was in early 2022.

But this doesn’t mean we only have two years’ worth of development time in the cycle and the third year is bug fixes and red tape. Instead, the subgroups are a three-stage pipeline and continue concurrently working on new feature development all the time, and the feature freezes are just the checkpoints where we pause loading new features into this particular train. So for the past year, as the subgroups finished work on fit-and-finish for the C++23 features, they also increasingly worked on C++26 features.

That showed this week, as we adopted the first 40 proposed change papers for C++26, many of which had been ready for a couple of meetings and were just waiting for the C++26 train to open for loading to be adopted. Of those 40 change papers, two were “apply the resolutions for all Ready issues” papers that applied a bunch of generally-minor changes. The other 38 were individual changes, everything from bug fixes to new features like hazard pointers and RCU.

Here are some of the highlights…

Adopted for C++26: Core language changes/features

The core language adopted 11 papers, including the following. Taking them in paper number order, which is roughly the order in which work started on the paper…

P2169 “A nice placeholder with no name” by Corentin Jabot and Michael Park officially adds support for the _ wildcard in C++26. Thanks to the authors for all their research and evidence for how it could be done in a backward-compatible way! Here are some examples that will now be legal as compilers start to support draft-C++26 syntax:

std::lock_guard _(mutex);

auto [x, y, _] = f();

inspect(foo) { _ => bar; };

Some compiler needs to implement -Wunderbar.

The palindromic P2552 “On the ignorability of standard attributes” by Timur Doumler sets forth the Three Laws of Robotics… er, I mean, the Three Rules of Ignorability for standard attributes. The Three Rules are a language design guideline for all current and future standard attributes going forward… see the paper for the full rules, but my informal summary is:

  • [Already in C++23] Rule 1. Standard attributes must be parseable (i.e., can’t just contain random nonsense).
  • [Already in C++23] Rule 2. Removing a standard attribute can’t change the program’s meaning: It can reduce the program’s possible legal behaviors, but it can’t invent new behaviors.
  • [New] Rule 3. Feature test macros shouldn’t pretend to support an attribute unless the implementation actually implements the attribute’s optional semantics (i.e., doesn’t just parse it but then ignore it).

P2558 “Add @, $, and ` to the basic character set” by Steve Downey is not a paper whose name was redacted for cussing; it’s a language extension paper that follows in C’s footsteps, and allows these characters to be used in valid C++ programs, and possibly in future C++ language evolution.

P2621 “UB? In my lexer?” by Corentin Jabot removes the possibility that just tokenizing C++ code can be a source of undefined behavior in a C++ compiler itself. (Did you know it could be UB? Now it can’t.) Note however that this does not remove all possible UB during compilation; future papers may address more of those compile-time UB sources.

P2738 “constexpr cast from void*: towards constexpr type-erasure” by Corentin Jabot and David Ledger takes another step toward powerful compile-time libraries, including enabling std::format to potentially support constexpr compile-time string formatting. Speaking of which…

P2741 “User-generated static_assert messages” by Corentin Jabot lets compile-time static_assert accept stringlike messages that are not string literals. For example, the popular {fmt} library (but not yet std::format, but see above!) supports constexpr string formatting, and so this code would work in C++26:

static_assert(sizeof(S) == 1, fmt::format("Unexpected sizeof: expected 1, got {}", sizeof(S)));

Together with P2738, an implementation of std::format that uses both of the above features would now be able to be used in a static_assert.

Adopted for C++26: Standard library changes/features

The standard library adopted 28 papers, including the following. Starting again with the lowest paper number…

This first one gets the award for “being worked on the longest” (just look at the paper number, and the R revision number): P0792R14, “function_ref: A type-erased callable reference” by Vittorio Romeo, Zhihao Yuan, and Jarrad Waterloo adds function_ref<R(Args...)> as a vocabulary type with reference semantics for passing callable entities to the standard library.
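To illustrate the general idea of a type-erased callable reference, here is a deliberately minimal hand-rolled sketch, assuming C++17 or later. The name `fn_ref` and the helpers `apply_twice` and `demo_fn_ref` are hypothetical. This is not the P0792 specification or any library's implementation: it only handles function objects such as lambdas (not plain function pointers) and ignores the const and noexcept qualifiers of the signature.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

template <typename Signature>
class fn_ref;  // hypothetical name; not std::function_ref

template <typename R, typename... Args>
class fn_ref<R(Args...)> {
    void* object_;                // points at the caller's callable; never owns it
    R (*thunk_)(void*, Args...);  // restores the callable's type and invokes it

public:
    template <typename F,
              typename Callee = std::remove_reference_t<F>,
              typename = std::enable_if_t<
                  std::is_object_v<Callee> &&
                  !std::is_same_v<std::remove_cv_t<Callee>, fn_ref> &&
                  std::is_invocable_r_v<R, Callee&, Args...>>>
    fn_ref(F&& f) noexcept
        : object_{const_cast<void*>(static_cast<const void*>(std::addressof(f)))},
          thunk_{[](void* object, Args... args) -> R {
              auto& callee = *static_cast<Callee*>(object);
              if constexpr (std::is_void_v<R>) {
                  static_cast<void>(callee(std::forward<Args>(args)...));
              } else {
                  return callee(std::forward<Args>(args)...);
              }
          }} {}

    R operator()(Args... args) const {
        return thunk_(object_, std::forward<Args>(args)...);
    }
};

// Takes any suitable callable without a template and without allocating.
inline int apply_twice(fn_ref<int(int)> f, int x) { return f(f(x)); }

inline int demo_fn_ref() {
    int step = 1;
    auto add_step = [&step](int v) { return v + step; };
    const int result = apply_twice(add_step, 40);
    assert(result == 42);
    return result;
}
```

Because the wrapper only stores a pointer to the caller's object, it must not outlive the callable it refers to; that reference-semantics tradeoff is what makes it cheap to pass as a parameter.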

P1383 “More constexpr for <cmath> and <complex>” by Oliver J. Rosten adds constexpr to over 100 more standard library functions. The march toward making increasing swathes of the standard library usable at compile time continues… Jason Turner is out there somewhere saying “Moar Constexpr!” and “constexpr all the things!”

Then, still in paper number order, we get to the “Freestanding group”:

P2510 “Formatting pointers” by Mark de Wever allows nice formatting of pointer values without incanting reinterpret_cast to an integer type first. For example, this will now work: format("{:P}", ptr);

P2530 “Hazard pointers for C++26” by Maged M. Michael, Michael Wong, Paul McKenney, Andrew Hunter, Daisy S. Hollman, JF Bastien, Hans Boehm, David Goldblatt, Frank Birbacher, and Mathias Stearn adds a subset of the Concurrency TS2 hazard pointer feature, bringing hazard pointer-based deferred cleanup to C++26.

P2545 “Read-Copy-Update (RCU)” by Paul McKenney, Michael Wong, Maged M. Michael, Andrew Hunter, Daisy Hollman, JF Bastien, Hans Boehm, David Goldblatt, Frank Birbacher, Erik Rigtorp, Tomasz Kamiński, Olivier Giroux, David Vernet, and Timur Doumler adds RCU as another complementary way to do deferred cleanup in C++26.

P2548 “copyable_function” by Michael Hava adds a copyable replacement for std::function, modeled on move_only_function.

P2562 “constexpr stable sorting” by Oliver J. Rosten enables compile-time use of the standard library’s stable sorts (stable_sort, stable_partition, inplace_merge, and the ranges:: versions). Jason Turner is probably saying “Moar!”…

P2641 “Checking if a union alternative is active” by Barry Revzin and Daveed Vandevoorde introduces the consteval bool is_within_lifetime(const T* p) noexcept function, which works in certain compile-time contexts to find out whether p is a pointer to an object that is within its lifetime, such as checking the active member of a union; during development the feature was made even more generally useful than just that one use case. (This is technically a core language feature, but it’s in one of the “magic std:: features that look like library functions but are actually implemented by the compiler” sections of the standard, in this case the metaprogramming clause.)

P2757 “Type-checking format args” by Barry Revzin enables even more compile-time checking for std::format format strings.

Those are just 12 of the adopted papers as highlights… there were 16 more papers adopted that also apply more extensions and fixes for the C++26 standard library.

Other progress

We also adopted the C++26 schedule for our next three-year cycle. It’s the same as the schedule for C++23 but just with three years added everywhere, just as the C++23 schedule was in turn the same as the schedule for C++20 plus three years.

The language evolution subgroup (EWG) saw 30 presentations for papers during the week, mostly proposals targeting C++26, including fine-tuning for some of the above that made it into C++26 at this meeting.

The standard library evolution subgroup (LEWG) focused on advancing “big” papers in the queue that really benefit from face-to-face meetings. Notably, there is now design consensus on P1928 SIMD, P0876 Fibers, and P0843 inplace_vector, and those papers have been forwarded to the library wording specification subgroup (LWG) and may come up for adoption into C++26 at our next meeting in November. Additional progress was made on P0447 hive, P0260 Concurrent Queues, P1030 path_view, and P2781 constexpr_v.

The library wording specification subgroup (LWG) is now caught up with their backlog, and spent a lot of time iterating on the std::execution and sub_mdspan proposals (the latter was adopted this week).

The contracts subgroup made further progress on refining contract semantics targeting C++26, including to get consensus on removing build modes and having a contract violation handling API.

The concurrency and parallelism subgroup is still on track to move forward with std::execution and SIMD parallelism for C++26, which in the words of the subgroup chair will make C++26 a huge release for the concurrency and parallelism group.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next two meetings will be in Kona, HI, USA in November hosted by WorldQuant and the Standard C++ Foundation, and Tokyo, Japan in March hosted by Woven by Toyota.

Wrapping up

Thank you again to the nearly 180 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies!

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in less than five months from now we’ll be meeting again in person + Zoom to continue adding features to C++26. Thank you again to everyone reading this for your interest and support for C++ and its standardization.

cppfront: Spring update

Since the year-end mini-update, progress has continued on cppfront. (If you don’t know what this personal project is, please see the CppCon 2022 talk on YouTube.)

This update covers Acknowledgments, and highlights of what’s new in the compiler and language since last time, including:

  • simple, mathematically safe, and efficient chained comparisons
  • named break and continue
  • “simple and safe” starts with . . . main
  • user-defined types, including unifying all special member functions as operator=
  • type/namespace/function/object aliases
  • header reflect.h with the start of the reflection API and the first 10 working compile-time metafunctions from P0707
  • unifying functions and blocks, including removing : and = from the for loop syntax

Acknowledgments: 267 issues, 128 pull requests, and new collaborators

I want to say a big “thank you” again to everyone who has participated in the cppfront repo. Since the last update, I’ve merged PRs from Jo Bates, Gabriel Gerlero, jarzec, Greg Marr, Pierre Renaux, Filip Sajdak and Nick Treleaven. Thanks also to many great issues opened by (as alphabetically as I can): Abhinav00, Robert Adam, Adam, Aaron Albers, Alex, Graham Asher, Peter Barnett, Sean Baxter, Jan Bielak, Simon Buchan, Michael Clausen, ct-clmsn, Joshua Dahl, Denis, Matthew Deweese, dmicsa, dobkeratops, Deven Dranaga, Konstantin F, Igor Ferreira, Stefano Fiorentino, fknauf, Robert Fry, Artie Fuffkin, Gabriel Gerlero, Matt Godbolt, William Gooch, ILoveGoulash, Víctor M. González, Terence J. Grant, GrigorenkoPV, HALL9kv0, Morten Hattesen, Neil Henderson, Michael Hermier, h-vetinari, Stefan Isak, Tim Keitt, Vanya Khodor, Hugo Lindström, Ferenc Nandor Janky, jarzec, jgarvin, Dominik Kaszewski, kelbon, Marek Knápek, Emilia Kond, Vladimir Kraus, Ahmed Al Lakani, Junyoung Lee, Greg Marr, megajocke, Thomas Neumann, Niel, Jim Northrup, Daniel Oberhoff, Jussi Pakkanen, PaTiToMaSteR, Johel Ernesto Guerrero Peña, Bastien Penavayre, Daniel Pfeifer, Piotr, Davide Pomi, Andrei Rabusov, rconde01, realgdman, Alex Reinking, Pierre Renaux, Alexey Rochev, RPeschke, Sadeq, Filip Sajdak, satu, Wolf Seifert, Tor Shepherd, Luke Shore, Zenja Solaja, Francis Grizzly Smit, Sören Sprößig, Benjamin Summerton, Hypatia of Sva, SwitchBlade, Ramy Tarchichy, tkielan, Marzo Sette Torres Junior, Nick Treleaven, Jan Tusil, userxfce, Ezekiel Warren, Kayla Washburn, Tyler Weaver, Will Wray, and thanks also to many others who participated on PR reviews and comment threads.

These contributors represent people from high school and undergrad students to full professors, from commercial developers to conference speakers, and from every continent except Antarctica. Thank you!

Next, here are some highlights of things added to the cppfront compiler in the four months since the previous update linked at top.

Simple, mathematically safe, and efficient chained comparisons (commit)

P0515 “Consistent comparison” (aka “spaceship”) was the first feature derived from this Cpp2 work to be adopted into the ISO Standard for C++, in C++20. That means cppfront didn’t have to do much to implement operator<=> and its generative semantics, because C++20 compilers already do so, which is great. Thank you again to everyone who helped land this Cpp2 feature in the ISO C++ Standard.

However, one part of P0515 isn’t yet merged into ISO C++: chained comparisons from P0515 section 3.3, such as min <= index < max. See also Barry Revzin’s great followup ISO C++ proposal paper P0893 “Chaining comparisons.” The cppfront compiler now implements this as described in those ISO proposal papers, and:

  • Supports all mathematically meaningful and safe chains like min <= index < max, with efficient single evaluation of index. (In today’s C++, this kind of comparison silently compiles but is a bug. See P0893 for examples from real-world use.)
  • Rejects nonsense chains like a >= b < c and d != e != f at compile time. (In today’s C++, and in other languages like Python, they silently compile but are necessarily a bug because they are conceptually meaningless.)
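To make the first bullet concrete, here is a minimal sketch in today’s C++ syntax of why the naive chain is a bug and what a safe chain has to mean. The function names are hypothetical (they are not part of cppfront or the standard library), and the parentheses in parsed_like_cpp1 only spell out how today’s C++ already groups min <= index < max.

```cpp
#include <cassert>

// How today's C++ groups `min <= index < max`: the bool result of the
// first comparison is converted to 0 or 1 and then compared with max.
constexpr bool parsed_like_cpp1(int min, int index, int max) {
    return (min <= index) < max;
}

// Hypothetical helper: what the chain is meant to say. Each operand is
// evaluated exactly once because it arrives as a function argument.
constexpr bool in_half_open_range(int min, int index, int max) {
    return min <= index && index < max;
}

static_assert( parsed_like_cpp1(0, 50, 10));    // 1 < 10, so it claims "in range"
static_assert(!in_half_open_range(0, 50, 10));  // 50 is not in [0, 10)
static_assert( in_half_open_range(0, 5, 10));   // 5 is in [0, 10)
```

The two functions disagree for index = 50, which is exactly the kind of silent bug the chained-comparison feature is meant to remove.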

I think this is a great example to demonstrate that “simple,” “safe,” and “fast” are often not in tension, and how it’s often possible to get all three at the same time without compromises.

Named break and continue (commit)

This feature further expands the Cpp2 “name :” way of introducing all names, to also support introducing loop names. Examples like the following now work… see test file pure2-break-continue.cpp2 for more examples.

outer: while i<M next i++ {      // loop named "outer"
    // ...
    inner: while j<N next j++ {  // loop named "inner"
        // ...
        if something() {
            continue inner;      // continue the inner loop
        }
        // ...
        if something_else() {
            break outer;         // break the outer loop
        }
        // ...
    }
    // ...
}
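For comparison, here is a rough sketch of how the same control flow is usually written in today’s Cpp1 syntax, where loops cannot be named and a multi-level break or continue needs goto (or extra flag variables). The function and label names are hypothetical, and this is hand-written code, not what cppfront emits.

```cpp
#include <cassert>
#include <vector>

// Counts the rows that contain no 0, and stops scanning entirely at the
// first -1 (a sentinel chosen only for this example).
int count_rows_until_sentinel(const std::vector<std::vector<int>>& grid) {
    int count = 0;
    for (const auto& row : grid) {            // the "outer" loop
        for (int v : row) {                   // the "inner" loop
            if (v == -1) goto break_outer;    // Cpp2: break outer;
            if (v == 0)  goto continue_outer; // Cpp2: continue outer;
        }
        ++count;                              // reached only for rows without 0 or -1
    continue_outer:;
    }
break_outer:
    return count;
}
```

With named loops the two labels and their placement rules disappear, and the intent is stated at the point of the jump.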

“Simple and safe” starts with . . . main

main can now be defined to return nothing, and/or as main: (args) to have a single argument of type std::vector<std::string_view>.

For example, here is a complete compilable and runnable program (in -pure-cpp2 mode, no #include is needed to use the C++ standard library)…

main: (args) =
    std::cout << "This program's name is (args[0])$";

Yes, this really is 100% C++ under the covers as you can see on Godbolt Compiler Explorer… “just nicer”:

  • The entire C++ standard library is available directly with zero thunking, zero wrapping, and zero need to #include or import because in pure Cpp2 the entire ISO C++ standard library is just always automatically there. (Yes, if you don’t like cout, you can use the hot-off-the-press C++23 std::print too the moment that your C++ implementation supports it.)
  • Convenient defaults, such as no need to write -> int, and no need to write braces around a single-statement function body.
  • Convenient semantics and services, such as $ string interpolation. Again, all fully compatible with today’s C++ (e.g., string interpolation uses std::to_string where available).
  • Type and memory safety by default even in this example: Not only is args defaulting to the existing best practices of C++ standard safety with ISO C++ vector and string_view, but the args[0] call is automatically bounds-checked by default too.
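As a rough illustration of the last bullet in Cpp1 terms, the sketch below builds the same kind of std::vector<std::string_view> from argc/argv and uses checked access. The helper names make_args and program_name are hypothetical, and .at() is only a stand-in for a bounds check: it throws std::out_of_range, whereas cppfront’s own check may report a violation differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical helper: view the C-style arguments as a vector of string_views.
std::vector<std::string_view> make_args(int argc, const char* const* argv) {
    return std::vector<std::string_view>(argv, argv + argc);
}

// Checked access: .at() throws instead of reading past the end of the vector.
std::string_view program_name(const std::vector<std::string_view>& args) {
    return args.at(0);
}
```

The point is only that the safe spelling becomes the default one, so an out-of-range index is caught rather than becoming undefined behavior.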

type: User-defined types

User-defined types are written using the same name : kind = value syntax as everything in Cpp2:

mytype: type =
{
    // data members are private by default
    x: std::string;

    // functions are public by default
    protected f: (this) = { do_something_with(x); }

    // ...
}

Here are some highlights…

First, types are order-independent. Cpp2 still has no forward declarations, and you can just write types that refer to each other. For example, see the test case pure2-types-order-independence-and-nesting.cpp2.

The this parameter is explicit, and has special sauce:

  • this is a synonym for the current object (not a pointer).
  • this defaults to the current type.
  • this’s parameter passing style declares what kind of function you’re writing. For example, (in this) (or just (this) since “in” is the default as usual) clearly means a “const” member function because “in” parameters always imply constness; (inout this) means a non-const member function; (move this) expresses and emits a Cpp1 &&-qualified member function; and so on.
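In Cpp1 terms, the three parameter passing styles in the last bullet correspond to the member function qualifiers sketched below. The Buffer class and its member names are hypothetical and exist only to show the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Buffer {
    std::string data_;
public:
    explicit Buffer(std::string s) : data_(std::move(s)) {}

    // (in this), or just (this): a const member function
    std::size_t size() const { return data_.size(); }

    // (inout this): a non-const member function
    void append(const std::string& s) { data_ += s; }

    // (move this): a &&-qualified member function, callable only on rvalues
    std::string take() && { return std::move(data_); }
};
```

A caller has to write std::move(b).take() to use the last function, which makes consuming the object visible at the call site.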

For example, here is how to write a const member function named print that takes a const string value and prints this object’s data value and the string message (yes, everything in Cpp2 is const by default except for local-scope variables):

mytype: type =
{
    data: i32;   // some data member (private by default)

    print: (this, msg: std::string) = {
        std::cout << data << msg;
                 // "data" is shorthand for "this.data"
    }

    // ...
}

All Cpp1 special member functions (including construction, assignment, destruction) and conversions are unified as operator=, default to memberwise semantics and safe “explicit” by default, and there’s a special that parameter that makes writing copy/move in particular simpler and safer. On the cppfront wiki, see the Design Note “operator=, this & that” for details. Briefly summarizing here:

  • The only special function every type must have is the destructor. If you don’t write it by hand, a public nonvirtual destructor is generated by default.
  • If no operator= functions are written by hand, a public default constructor is generated by default.
  • All other operator= functions are explicitly written, either by hand or by opting into applying a metafunction (see below).

Note: Because generated functions are always opt-in, you can never get a generated function that’s wrong for your type, and so Cpp2 doesn’t need to support “=delete” for the purpose of suppressing unwanted generated functions.

  • The most general form of operator= is operator=: (out this, that) which works as a unified general {copy, move} x { constructor, assignment } operator, and generates all four of those in the lowered Cpp1 code if you didn’t write a more specific one yourself (see Design Note linked above for details).
  • All copy/move/comparison operator= functions are memberwise by default in Cpp2 ( [corrected:] including memberwise construction and assignment when you write them yourself, in which case they aren’t memberwise by default in today’s Cpp1).
  • All conversion operator= functions are safely “explicit” by default. To opt into an implicit conversion, write the implicit qualifier on the this parameter.
  • All functions can have a that parameter which is just like this (knows it’s the current type, can be passed in all the usual ways, etc.) but refers to some other object of this type rather than the current object. It has some special sauce for simplicity and safety, including that the language ensures that the members of a that object are safely moved from only once.
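To show what these bullets save you from writing, here is a hand-written Cpp1 sketch of the four {copy, move} x {constructor, assignment} functions with memberwise semantics, plus a conversion that is explicit. The Name class is hypothetical, and this is not cppfront’s actual lowered output.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

class Name {
    std::string text_;
public:
    // Conversion from std::string: "explicit" has to be spelled out in Cpp1
    explicit Name(std::string s) : text_(std::move(s)) {}

    // The four functions that one general operator= stands for
    Name(const Name& that) : text_(that.text_) {}                   // copy construct
    Name(Name&& that) noexcept : text_(std::move(that.text_)) {}    // move construct
    Name& operator=(const Name& that) { text_ = that.text_; return *this; }
    Name& operator=(Name&& that) noexcept { text_ = std::move(that.text_); return *this; }

    const std::string& text() const { return text_; }
};

// The explicit constructor blocks implicit conversion but allows direct construction
static_assert(!std::is_convertible_v<std::string, Name>);
static_assert( std::is_constructible_v<Name, std::string>);
```

Each of the four bodies repeats the same memberwise pattern, which is the duplication a single unified function with a that parameter removes.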

Virtual functions and base classes are all about “this”:

  • Virtual functions are written by specifying exactly one of virtual, override, or final on the this parameter.
  • Base classes are written as members named this. For example, just as a class could write a data member as data: string = "xyzzy";, which in Cpp2 is pronounced “data is a string with default value ‘xyzzy’”, a base class is written as this: Shape = (default, values);, which is naturally pronounced as “this IS-A Shape with these default values.” There is no separate base class list or separate member initializer list.
  • Because base and member subobjects are all declared in the same place (the type body) and initialized in the same place (an operator= function body), they can be written in any order, including interleaved, and are still guaranteed to be safely initialized in declared order. This means that in Cpp2 you can declare a data member object before a base class object, so that it naturally outlives the base class object, and so you don’t need workarounds like Boost’s base_from_member because all of the motivating examples for that can be written directly in Cpp2. See my comments on cppfront issue #334 for details.
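For readers who haven’t met the problem in the last bullet, here is a minimal hand-rolled version of the workaround in today’s Cpp1 (the same idea that Boost’s base_from_member packages up; this sketch does not use Boost). The class names are hypothetical. The stream buffer must exist before std::ostream is constructed, and because bases are always initialized before members, the buffer has to be moved into an extra base class listed first.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Extra base whose only job is to hold the member that must be built first
struct BufHolder {
    std::stringbuf buf;
};

// Bases are initialized in declaration order, so BufHolder (and its buf)
// is fully constructed before std::ostream receives a pointer to it.
class StringSink : private BufHolder, public std::ostream {
public:
    StringSink() : BufHolder{}, std::ostream(&buf) {}
    std::string str() const { return buf.str(); }
};
```

If members and bases could simply be declared in the order they should be initialized, the BufHolder detour would not be needed.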

Alias support: Type, namespace, function, and object aliases (commit)

Cpp2 already defines every new entity using the syntax “name : kind = value”.

So how should it declare aliases, which declare not a new entity but a synonym for an existing entity? I considered several alternatives, and decided to try out the identical declaration syntax except changing = (which connotes value setting) to == (which connotes sameness):

// Namespace alias
lit: namespace == ::std::literals;

// Type alias
pmr_vec: <T> type
    == std::vector<T, std::pmr::polymorphic_allocator<T>>;

// Function alias
func :== some_original::inconvenient::function_name;

// Object alias
vec :== my_vector;  // note: const&, aliases are never mutable

Note again the const default. For now, Cpp2 supports only read-only aliases, not read-write aliases.
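For readers more familiar with Cpp1 syntax, here is one plausible way to spell the same four aliases by hand. The aliased function and object are hypothetical placeholders defined only so the example is self-contained, and this is not necessarily how cppfront lowers them.

```cpp
#include <cassert>
#include <memory_resource>
#include <string>
#include <vector>

// Hypothetical entities to alias
namespace some_original::inconvenient {
    inline int function_name(int x) { return x * 2; }
}
std::vector<int> my_vector{1, 2, 3};

// Namespace alias
namespace lit = ::std::literals;

// Type alias (an alias template)
template <typename T>
using pmr_vec = std::vector<T, std::pmr::polymorphic_allocator<T>>;

// Function alias: a constant reference to a (non-overloaded) function
constexpr auto& func = some_original::inconvenient::function_name;

// Object alias: a read-only reference to the existing object
const auto& vec = my_vector;

// Using the namespace alias to reach the standard "s" string literal
std::string greeting() {
    using namespace lit::string_literals;
    return "hi"s + "!";
}
```

Note how each kind of alias needs a different Cpp1 construct (namespace =, using, a reference), where Cpp2 uses one == form for all of them.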

Header reflect.h: Initial support for reflection API, and implementing the first 10 working metafunctions from P0707: interface, polymorphic_base, ordered, weakly_ordered, partially_ordered, basic_value, value, weakly_ordered_value, partially_ordered_value, struct

Disclaimer: I have not yet implemented a reflection operator that Cpp2 code can invoke, or written a Cpp2 interpreter to run inside the compiler. But I am doing everything else needed for type metafunctions: cppfront has started a usable reflection metatype API, and has started getting working metafunctions that are compile-time code that uses that metatype API… the only thing missing is that those functions aren’t run through an interpreter (yet).

For example, cppfront now supports code like the following… importantly, “value” and “interface” are not built-in types hardwired into the language as they are in Java and C# and other languages, but rather each is a function that uses the reflection API to apply requirements and defaults to the type (C++ class) being written:

// Point2D is declaratively a value type: it is guaranteed to have
// default/copy/move construction and <=> std::strong_ordering
// comparison (each generated with memberwise semantics
// if the user didn't write their own, because "@value" explicitly
// opts in to ask for these functions), a public destructor, and
// no protected or virtual functions... the word "value" carries
// all that meaning as a convenient and readable opt-in, but
// without hardwiring "value" specially into the language
//
Point2D: @value type = {
    x: i32 = 0;  // data members (private by default)
    y: i32 = 0;  // with default values
    // ...
}

// Shape is declaratively an abstract base class having only public
// and pure virtual functions (with "public" and "virtual" applied
// by default if the user didn't write an access specifier on a
// function, because "@interface" explicitly opts in to ask for
// these defaults), and a public pure virtual destructor (generated
// by default if not user-written)... the word "interface" carries
// all that meaning as a convenient and readable opt-in, but
// without hardwiring "interface" specially into the language
//
Shape: @interface type = {
    draw: (this);
    move: (inout this, offset: Point2D);
}

At compile time, cppfront parses the type’s body and then invokes the compile-time metafunction (here value or interface), which enforces requirements and applies defaults and generates functions, such as… well, I can just paste the actual code for interface from reflect.h, it’s pretty readable:

Note: For now I wrote the code in today’s Cpp1 syntax, which works fine as Cpp2 is just a fully compatible alternate syntax for the same true C++… later this year I aim to start self-hosting and writing more of cppfront itself in Cpp2 syntax, including functions like these.

//-----------------------------------------------------------------------
//  Some common metafunction helpers (metafunctions are just
//  functions, so they can be factored as usual)
//
auto add_virtual_destructor(meta::type_declaration& t)
    -> void
{
    t.require( t.add_member( "operator=: (virtual move this) = { }"),
               "could not add virtual destructor");
}

//-----------------------------------------------------------------------
// 
//      "... an abstract base class defines an interface ..."
// 
//          -- Stroustrup (The Design and Evolution of C++, 12.3.1)
//
//-----------------------------------------------------------------------
// 
//  interface
//
//  an abstract base class having only pure virtual functions
//
auto interface(meta::type_declaration& t)
    -> void
{
    auto has_dtor = false;

    for (auto m : t.get_members())
    {
        m.require( !m.is_object(),
                   "interfaces may not contain data objects");
        if (m.is_function()) {
            auto mf = m.as_function();
            mf.require( !mf.is_copy_or_move(),
                        "interfaces may not copy or move; consider a virtual clone() instead");
            mf.require( !mf.has_initializer(),
                        "interface functions must not have a function body; remove the '=' initializer");
            mf.require( mf.make_public(),
                        "interface functions must be public");
            mf.make_virtual();
            has_dtor |= mf.is_destructor();
        }
    }

    if (!has_dtor) {
        add_virtual_destructor(t);
    }
}

Note a few things that are demonstrated here:

  • .require (a convenience to combine a boolean test with the call to .error if the test fails) shows how to implement enforcing custom requirements. For example, an interface should not contain data members. If any requirement fails, the error output is presented as part of the regular compiler output — metafunctions extend the compiler, in a disciplined way.
  • .make_virtual shows how to implement applying a default. For example, interface functions are virtual by default even if the user didn’t write (virtual this) explicitly.
  • .add_member shows how to generate new members from legal source code strings. In this example, if the user didn’t write a destructor, we write a virtual destructor for them by passing the ordinary code to the .add_member function, which reinvokes the lexer to tokenize the code, the parser to generate a declaration_node parse tree from the code, and then if that succeeds adds the new declaration to this type.
  • The whole metafunction is invoked by the compiler right after initial parsing is complete (right after we parse the statement-node that is the initializer) and before the type is considered defined. Once the metafunction returns, if it had no errors then the type definition is complete and henceforth immutable as usual. This is how the metafunction gets to participate in deciding the meaning of the code the user wrote, but does not create any ODR confusion — there is only one immutable definition of the type, a type cannot be changed after it is defined, and the metafunction just gets to participate in defining the type just before the definition is cast in stone, that’s all.
  • The metafunction is ordinary compile-time code. It just gets invoked by the compiler at compile time in disciplined and bounded ways, and with access to bounded things.
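To see the enforce / default / generate pattern in isolation, here is a deliberately tiny, self-contained model of it. Everything in it (toy_member, toy_type, toy_interface) is a hypothetical stand-in invented for this sketch; the real API is meta::type_declaration in reflect.h and works on parse trees, not on flags like these.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a declared member (hypothetical, not the reflect.h API)
struct toy_member {
    std::string name;
    bool is_object     = false;  // true for a data member
    bool has_body      = false;  // true if a function has an initializer
    bool is_virtual    = false;
    bool is_destructor = false;
};

// Toy stand-in for a type that is still being defined
struct toy_type {
    std::string name;
    std::vector<toy_member> members;
    std::vector<std::string> errors;

    // In the spirit of .require: a boolean test plus an error if it fails
    void require(bool ok, const std::string& msg) {
        if (!ok) errors.push_back(name + ": " + msg);
    }
};

// Toy "interface" metafunction: enforce, apply defaults, then generate
void toy_interface(toy_type& t) {
    bool has_dtor = false;
    for (auto& m : t.members) {
        t.require(!m.is_object, "interfaces may not contain data objects");
        if (!m.is_object) {
            t.require(!m.has_body, "interface functions must not have a function body");
            m.is_virtual = true;                      // apply a default
            has_dtor = has_dtor || m.is_destructor;
        }
    }
    if (!has_dtor) {                                  // generate a missing member
        toy_member dtor;
        dtor.name = "(destructor)";
        dtor.is_virtual = true;
        dtor.is_destructor = true;
        t.members.push_back(dtor);
    }
}
```

The metafunction is just a function over a mutable description of the type, and once it returns without errors that description is final.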

Today in cppfront, metafunctions like value and interface are legitimately doing everything envisioned for them in P0707 except for being run through an interpreter: the metafunctions use the meta:: API and exercise it so I can learn how that API should expand and become richer, cppfront spins up a new lexer and parser when a metafunction asks to do code generation to add a member, and then cppfront stitches the generated result into the parse tree as if it had been written by the user explicitly.

As of this writing, the currently implemented metafunctions in reflect.h are as described in P0707 section 3, sometimes with a minor name change… and including links to the function source code…

interface: An abstract class having only pure virtual functions.

  • Requires (else diagnoses a compile-time error) that the user did not write a data member, a copy or move operation, or a function with a body.
  • Defaults functions to be virtual, if the user didn’t write that explicitly.
  • Generates a pure virtual destructor, if the user didn’t write that explicitly.
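For reference, this is roughly what those three bullets amount to when an interface is written by hand in Cpp1 syntax. The code is a sketch written for this comparison, not cppfront’s generated output; Circle is a hypothetical implementer, and Point2D is reduced to a plain aggregate to keep the example short.

```cpp
#include <cassert>
#include <type_traits>

struct Point2D { int x = 0; int y = 0; };

// Hand-written "interface": no data, only public pure virtual functions
class Shape {
public:
    virtual void draw() const = 0;
    virtual void move(const Point2D& offset) = 0;
    virtual ~Shape() = 0;          // pure virtual destructor...
};
inline Shape::~Shape() {}          // ...which still needs a definition

// Hypothetical implementer of the interface
class Circle final : public Shape {
    Point2D center_;
public:
    void draw() const override {}
    void move(const Point2D& offset) override {
        center_.x += offset.x;
        center_.y += offset.y;
    }
    Point2D center() const { return center_; }
};

static_assert(std::is_abstract_v<Shape>);
static_assert(std::has_virtual_destructor_v<Shape>);
```

Every "virtual", "= 0", and "public" here is boilerplate that the single word interface is meant to carry.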

polymorphic_base (in P0707, originally named base_class): A pure polymorphic base type that has no instance data, is not copyable, and whose destructor is either public and virtual or protected and nonvirtual.

  • Requires (else diagnoses a compile-time error) that the user did not write a data member, a copy or move operation, and that the destructor is either public+virtual or protected+nonvirtual.
  • Defaults members to be public, if the user didn’t write that explicitly.
  • Generates a public pure virtual destructor, if the user didn’t write that explicitly.

ordered: A totally ordered type with operator<=> that implements std::strong_ordering.

  • Requires (else diagnoses a compile-time error) that the user did not write an operator<=> that returns something other than strong_ordering.
  • Generates that operator<=> if the user didn’t write one explicitly by hand.

Similarly, weakly_ordered and partially_ordered do the same for std::weak_ordering and std::partial_ordering respectively. I chose to call the strongly-ordered one “ordered,” not “strong_ordered,” because I think the one that should be encouraged as the default should get the nice name.

basic_value: A type that is copyable and has value semantics. It must have all-public default construction, copy/move construction/assignment, and destruction, all of which are generated by default if not user-written; and it must not have any protected or virtual functions (including the destructor).

  • Requires (else diagnoses a compile-time error) that the user did not write some but not all of the copy/move construction/assignment and destruction functions, a non-public destructor, or any protected or virtual function.
  • Generates a default constructor and memberwise copy/move construction and assignment functions, if the user didn’t write them explicitly.

value: A basic_value that is totally ordered.

Note: Many of you would call this a “regular” type… but I recognize that there’s a difference of opinion about whether “regular” includes ordering. That’s one reason I’ve avoided the word “regular” here, and this way we can all separately talk about a basic_value (which may not include ordering) or a value (which does include strong total ordering; see next paragraph for weaker orderings) and we can know we’re all talking about the same thing.

Similarly, weakly_ordered_value and partially_ordered_value do the same for weakly_ordered and partially_ordered respectively. I again chose to call the strongly-ordered one “value,” not “strongly_ordered_value,” because I think the one that should be encouraged as the default should get the nice name.

struct (in P0707, originally named plain_struct because struct is a reserved word in Cpp1… but struct isn’t a reserved word in Cpp2): A basic_value where all members are public, there are no virtual functions, and there are no user-written (non-default operator=) constructors, assignment operators, or destructors.

  • Requires (else diagnoses a compile-time error) that the user did not write a virtual function or a user-written operator=.
  • Defaults members to be public, if the user didn’t write that explicitly.

Local statement/block parameters (commit)

I had long intended to support the following unification of functions and blocks, where cppfront already provided all of these except the third case:

f:(x: int = init) = { ... }     // x is a parameter to the function
f:(x: int = init) = statement;  // same, { } is implicit

 :(x: int = init) = { ... }     // x is a parameter to the lambda
 :(x: int = init) = statement;  // same, { } is implicit

  (x: int = init)   { ... }     // x is a parameter to the block
  (x: int = init)   statement;  // same, { } is implicit

                    { ... }     // an ordinary block with no parameters
                    statement;  // same, { } is implicit

(Recall that in Cpp2 : always and only means “declaring a new thing,” and therefore also always has an = immediately or eventually to set the value of that new thing.)

The idea is to treat functions and blocks/statements uniformly, as syntactic and semantic subsets of each other:

  • A named function has all the parts: A name, a : (and therefore =) because we’re declaring a new entity and setting its value, a parameter list, and a block (possibly an implicit block in the convenience syntax for single-statement bodies).
  • An unnamed function drops only the name: It’s still a declared new entity so it still has : (and =), still has a parameter list, still has a block.
  • (not implemented until now) A parameterized block drops only the name and : (and therefore =). A parameterized block is not a separate entity (there’s no : or =), it’s part of its enclosing entity, and therefore it doesn’t need to capture.
  • Finally, if you drop also the parameter list, you have an ordinary block.

In this model, the third (just now implemented) option above allows a block parameter list, which does the same work as “let” variables in other languages, but without a “let” keyword. This would subsume all the Cpp1 loop/branch scope variables (and more generally than in Cpp1 today, because you could declare multiple parameters easily which you can’t currently do with the Cpp1 loop/branch scope variables).

So this now works, pasting from test case pure2-statement-scope-parameters.cpp2:

main: (args) = 
{
    local_int := 42;

    //  'in' statement scope variable
    // declares read-only access to local_int via i
    (i := local_int) for args do (arg) {
        std::cout << i << "\n";       // prints 42
    }

    //  'inout' statement scope variable
    // declares read-write access to local_int via i
    (inout i := local_int) {
        i++;
    }
    std::cout << local_int << "\n";   // prints 43
}

Note that block parameters enable us to use the same declarative data-flow for local statements and blocks as for functions: Above, we declare a block (a statement, in this case a single loop, is implicitly treated as a block) that is read-only with respect to the local variable, and declare another to be read-write with respect to that variable. Being able to declare data flow is important for writing correct and safe code.
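For comparison, here is a small sketch of the nearest Cpp1 tools for the same job: a C++17 if statement with an initializer for a branch-scope variable, and a nested block with a local reference for scoped read-write access. The function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Branch-scope variable: `it` exists only for the duration of the if statement
int lookup_or(const std::map<std::string, int>& m, const std::string& key, int fallback) {
    if (auto it = m.find(key); it != m.end()) {
        return it->second;
    }
    return fallback;
}

// Read-write access confined to one block, similar in spirit to
// (inout i := local_int) { i++; } in the Cpp2 example
int increment_in_block(int local_int) {
    {
        int& i = local_int;
        i++;
    }
    return local_int;
}
```

In Cpp1 these are two unrelated mechanisms, and neither one states the direction of data flow the way an in or inout block parameter does.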

Corollary: Removed : and = from for

Eagle-eyed readers of the above example will notice a change: As a result of unifying functions and blocks, I realized that the for loop syntax should use the third syntax, not the first or second, because the loop body is a parameterized block, not a local function. So I changed the for syntax from this

// previous syntax
for items do: (item) = {
    x := local + item;
    // ...
}

to this, which is the same except that it removes : and =

// current syntax
for items do (item) {
    x := local + item;
    // ...
}

Note that what follows for ... do is exactly a local block, just the parameter item doesn’t write an initializer because it is implicitly initialized by the for loop with each successive value in the range.

By the way, this is the first breaking change from code that I’ve shown publicly, so cppfront also includes a diagnostic for the old syntax to steer you to the new syntax. Compatibility!

Other features

Also implemented since last time:

  • As always, lots of bug fixes and diagnostic improvements.
  • Use _ as wildcard everywhere, and give a helpful diagnostic if the programmer tries to use “auto.”
  • Namespaces. Every namespace must have a name, and the anonymous namespace is supported by naming it _ (the “don’t care” wildcard). For now these are a separate language feature, but I’m still interested in exploring making them just another metafunction.
  • Explicit template parameter lists. A type parameter, spelled “: type”, is the default. For examples, see test case pure2-template-parameter-lists.cpp2.
  • Add requires-clause support.
  • Make : _ (deduced type) the default for function parameters. In response to a lot of sustained user demand in issues and comments — thanks! For example, add: (x, y) -> _ = x+y; is a valid Cpp2 generic function that means the same as (and compiles to) [[nodiscard]] auto add(auto const& x, auto const& y) -> auto { return x+y; } in Cpp1 syntax.
  • Add alien_memory<T> as a better spelling for T volatile. The main problem with volatile isn’t the semantics — those are deliberately underspecified, and appropriate for talking about “memory that’s outside the C++ program that the compiler can’t assume it knows anything about” which is an important low-level concept. The problems with volatile are that (a) it’s wired throughout the language as a type qualifier which is undesirable and unnecessary, and (b) the current name is confusing and has baggage and so it should be named something that connotes what it’s actually for (and I like “alien” rather than “foreign” because I think “alien” has a better and stronger connotation).
  • Reject more implicit narrowing, notably floating point narrowing.
  • Reject shadowing of type scope names. For example, in a type that has a member named data, a member function can’t write a local variable named data.
  • Add support for forward return and generic out parameters.
  • Add support for raw string literals with interpolation.
  • Add compiler switches for compatibility with popular no-exceptions/no-RTTI modes (-fno-exceptions and -fno-rtti, as usual), specifying the output file (-o, with the option of -o stdout), and source line/column format for error output (MSVC style or GCC style).
  • Add single-word aliases (e.g., ulonglong) to replace today’s multi-keyword platform-width C types, with diagnostics support to aid migration. This is in addition to known-width Cpp2 types (e.g., i32) that are already there and should often be preferred.
  • Allow unnamed objects (not just unnamed functions, aka lambdas) at expression scope.
  • Reclaim many Cpp1 keywords for ordinary use. For example, a type or variable can be named “and” or “struct” in Cpp2, and it’s fully compatible (it’s prefixed with “cpp2_” when lowered to Cpp1, so Cpp1 code still has a way to refer to it, but Cpp2 gets to use the nice names). This isn’t just sugar… without this, I couldn’t write the “struct” metafunction and give it the expected nice name.
  • Support final on a type.
  • Add support for .h2 header files.
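Two items in the list above are easy to picture in Cpp1 syntax. The sketch below spells the generic add function with an explicit template header, which is the C++17-compatible equivalent of the auto const& parameter form quoted in that bullet, and shows one plausible definition of alien_memory<T> as an alias for T volatile. The alias definition is my assumption from the bullet’s description, not copied from cppfront’s headers.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Generic add with deduced parameter and return types
template <typename X, typename Y>
[[nodiscard]] auto add(const X& x, const Y& y) {
    return x + y;
}

// Assumed definition: a clearer name for "T volatile"
template <typename T>
using alien_memory = T volatile;

static_assert(std::is_same_v<alien_memory<int>, volatile int>);
static_assert(std::is_same_v<decltype(add(1, 2.5)), double>);
```

Because the alias is only a different spelling, code that uses alien_memory<int> stays fully interchangeable with code that writes volatile int.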

What’s next

Well, that’s all so far.

For cppfront, over the summer and fall I plan to:

  • implement more metafunctions from my paper P0707, probably starting with enum and union (a safe union) — not only because they’re next in the paper, but also because I use those features in cppfront today and so I’ll need them working in Cpp2 when it comes time to…
  • … start self-hosting cppfront, i.e., start migrating parts of cppfront itself to be written in Cpp2 syntax;
  • continue working my list of pending Cpp2 features and implementing them in cppfront; and
  • start finding a few private alpha testers to work with, to start writing a bit of code in Cpp2 to alpha-test cppfront and also to alpha-test my (so far unpublished) draft documentation.

For conferences:

  • One week from today, I’ll be at C++Now to give a talk about this progress and why full-fidelity compatibility with ISO C++ is essential (and what it means). C++Now is a limited-attendance conference, and it’s nearly sold out but the organizers say there are a few seats left… you can still register for C++Now until Friday.
  • In early October I hope to present a major update at CppCon 2023, where registration just opened (yes, you can register now! run, don’t walk!). I hope to see many more of you there at the biggest C++ event, and that only happens once a year — like every year, I’ll be there all week long to not miss a minute.

Interview on CppCast

A few days ago I recorded CppCast episode 357. Thanks to Timur Doumler and Phil Nash for inviting me on their show – and for continuing CppCast, which was so wonderfully founded by Rob Irving and Jason Turner!

This time, we chatted about news in the C++ world, and then about my Cpp2 and cppfront experimental work.

The podcast doesn’t seem to have chapters, but here are a few of my own notes about sections of interest:

  • 00:00 Intro
  • 04:30 News: LLVM 16.0.0, “C++ Initialisation” book, new user groups
  • 15:45 Start of interview
  • 16:08 Why I don’t view Cpp2 as a “successor language”
  • 16:25 A transpiler is a compiler (see also: cfront, PyPy, TypeScript, …)
  • 17:20 Origins of the Cpp2 project, 2015/16
  • 19:00 100% compatibility as a primary goal and design constraint
  • 22:00 Avoid divergence, continue in same path C++ is already going
  • 22:50 What compatibility means: 100% link compat always on, 100% source compat always available but pay only if you need it
  • 24:14 Making the syntax unique in a simple way, start with “name :”
  • 28:10 Avoid divergence and still make a major simplification, by letting programmers directly declare their intent
  • 30:30 Bringing the pieces to ISO and the community for feedback
  • 31:55 What about “epochs”/“editions”? tl;dr: It’s exactly the right question, but I think the right answer is “epoch” (singular)
  • 35:42 C++ is popular and will endure no matter what we do; question is can we make it nicer
  • 37:05 My personal experiment, and others are now helping
  • 38:20 What “safeties” I’m targeting, and what degree of safety, and why formally provable guarantees are nice but neither necessary nor sufficient (I expect this view to be controversial)
  • 44:00 The issue is making things 50x (98%) safer, vs. 100% safer, because what does requiring that last 2% actually cost the design in incompatibility / difficulty of use
  • 47:05 The zero-overhead principle is non-negotiable, and so is always being able to “open the hood” to take control, otherwise it’s not C++ anymore
  • 48:20 Examples: dynamic bounds/null checking is now on by default and opt-out, but you still pay for it only if you use it
  • 50:20 Will cppfront support all major compilers and platforms? It already does, any reasonably conforming C++20 compiler (any gcc/clang/msvc since about 2020), and that will continue
  • 52:15 Keeping the generated source code very close to the original is a priority
  • 53:25 “TypeScript for C++” plan vs. “Dart for C++” plan
  • 55:20 TypeScript did what Bjarne’s cfront did: Transpiler that let you always keep your generated JavaScript/C code, so you could drop using the new language anytime if you want with #NoRegrets, risk-free
  • 57:20 Shout out to Anders Hejlsberg, IIRC the only human to produce a million-user programming language more than once, and his approach to TypeScript vs. C#
  • 59:20 Why generating C++ code isn’t in tension with the goal of compatibility (it’s actually synergistic), and the targeted subset is C++20 (with a workaround only when modules are not yet available on a given compiler)
  • 1:00:40 Why C++20 is super important (if constexpr, requires-expressions)
  • 1:01:40 Why any C++ evolution/successor language attempt that for now only tries to be compatible with C++17 faces a big hill/disadvantage
  • 1:02:40 What’s next for Cpp2 and cppfront
  • 1:05:35 Where can people learn more: cppfront repo, CppCon 2022 talk, C++Now talk coming up in a month, then CppCon 2023 in October
  • 1:07:28 C++ world is alive and well and thriving, now embracing challenges like safety to keep C++ working well for all of us

In at least one place I said “cppfront” where I meant “cfront”… I think the intent should be clear from context. 😉

Thanks again to everyone who has helped me personally with cppfront through issues and PRs, and to all the good folks who helped the entire C++ world by working hard and creatively through the pandemic and shipping another solid release of C++ in C++23.

C++23 “Pandemic Edition” is complete (Trip report: Winter ISO C++ standards meeting, Issaquah, WA, USA)

On Saturday, the ISO C++ committee completed technical work on C++23 in Issaquah, WA, USA! We resolved the remaining international comments on the C++23 draft, and are now producing the final document to be sent out for its international approval ballot (Draft International Standard, or DIS) and final editorial work, to be published later in 2023.

Our hosts, the Standard C++ Foundation, WorldQuant, and Edison Design Group, arranged for high-quality facilities for our six-day meeting from Monday through Saturday. We had about 160 attendees, more than half in-person and the others remote via Zoom. We had 19 nations formally represented, 9 in-person and 10 via Zoom. Also, at each meeting we regularly have attendees who have never attended before, and this time there were 25 first-time attendees in-person or on Zoom; to all of them, once again welcome!

The C++ committee currently has 26 active subgroups, 13 of which met in parallel tracks throughout the week. Some groups ran all week, and others ran for a few days or a part of a day and/or evening, depending on their workloads. You can find a brief summary of ISO procedures here.

From Prague, through the pandemic, to an on-time C++23 “Pandemic Edition”

The previous standard, C++20, was completed in Prague in February 2020, a month before the pandemic lockdowns began. At that same meeting, we adopted and published our C++23 schedule… without realizing that the world was about to turn upside down in just a few weeks. Incredibly, thanks to the effort and resilience of scores of subgroup chairs and hundreds of committee members, we still did it: Despite a global pandemic, C++23 has shipped exactly on time and at high quality.

The first pandemic-cancelled in-person meeting would have been the first meeting of the three-year C++23 cycle. This meant that nearly all of the C++23 release cycle, and the entire “development” phase of the cycle, was done virtually via Zoom with many hundreds of telecons from 2020 through 2022. Last week’s meeting was only our second in-person meeting since February 2020, and our second-ever hybrid meeting with remote Zoom participation. Both had a high-quality hybrid Zoom experience for remote attendees around the world, and I want to repeat my thanks from November to the many volunteers who worked hard and carried hardware to Kona and Issaquah to make this possible. I want to again especially thank Jens Maurer and Dietmar Kühl for leading that group, and everyone who helped plan, equip, and run the meetings. Thank you very much to all those volunteers and helpers!

The current plan is that we’ve now returned to our normal cadence of having full-week meetings three times a year, as we did before the pandemic, but now those will be not only in-person but also have remote participation via Zoom. Most subgroups will additionally still continue to meet regularly via Zoom.

This week’s meeting

Per our published C++23 schedule, this was our final meeting to finish technical work on C++23. No features were added or removed; we just handled fit-and-finish issues and focused primarily on finishing our work addressing the 137 national body comments we received in the summer’s international comment ballot (Committee Draft, or CD). You can find a list of C++23 features here, many of them already implemented in major compilers and libraries. C++23’s main theme was “completing C++20,” and some of the highlights include module “std”, “if consteval,” explicit “this” parameters, still more constexpr, still more CTAD, “[[assume]]”, simplifying implicit move, multidimensional and static “operator[]”, a bunch of Unicode improvements, and Nicolai Josuttis’ personal favorite: fixing the lifetime of temporaries in range-for loops (some would add, “finally!”… thanks again for the persistence, Nico).
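
To make one of those highlights concrete, here is a minimal sketch of multidimensional “operator[]”. The type Grid3 and the function diagonal_sum are made-up names for illustration only. Because compilers are adopting C++23 features at different speeds, the sketch guards the new syntax with the standard feature-test macro __cpp_multidimensional_subscript and falls back to a call operator where the feature is not available, so the same file should build in either mode.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 3x3 grid type, used only for illustration.
struct Grid3 {
    std::array<int, 9> cells{};

#if defined(__cpp_multidimensional_subscript)
    // C++23: operator[] can take several arguments, so callers write g[r, c].
    constexpr int& operator[](std::size_t r, std::size_t c) {
        return cells[r * 3 + c];
    }
#endif

    // Call-operator spelling that also works before C++23.
    constexpr int& operator()(std::size_t r, std::size_t c) {
        return cells[r * 3 + c];
    }
};

// Fills the diagonal with 1, 2, 3 and returns the sum of the diagonal.
constexpr int diagonal_sum() {
    Grid3 g{};
    int sum = 0;
    for (std::size_t i = 0; i < 3; ++i) {
#if defined(__cpp_multidimensional_subscript)
        g[i, i] = static_cast<int>(i) + 1;  // C++23 spelling
        sum += g[i, i];
#else
        g(i, i) = static_cast<int>(i) + 1;  // pre-C++23 fallback
        sum += g(i, i);
#endif
    }
    return sum;
}

// Checked at compile time in both modes: 1 + 2 + 3.
static_assert(diagonal_sum() == 6, "diagonal should sum to 6");
```

The fallback branch is there only so the sketch stays buildable on toolchains that have not yet enabled the feature; in C++23 mode the g[r, c] form is the one being demonstrated.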

In addition to C++23 work, we also had time to make progress on a number of post-C++23 proposals, including continued work on contracts, SIMD execution, and more. We also decided to send the second Concurrency TS for international comment ballot, which includes hazard pointers, read-copy-update (RCU) data structures… and as of this week we also added Anthony Williams’ P0290 “synchronized_value” type.

The contracts subgroup made further progress on refining contract semantics targeting C++26.

The concurrency and parallelism subgroup is still on track to move forward with “std::execution” and SIMD parallelism for C++26, which in the words of the subgroup chair will make C++26 a huge release for the concurrency and parallelism group.

Again, when you see “C++26” above, that doesn’t mean “three long years away”… we just closed the C++23 branch, the C++26 branch is opening immediately, and we will start approving features for C++26 at our next meeting in June, less than four months from now. Implementers interested in specific features often don’t wait for the final standard to start shipping implementations; note that C++23, which was just finished, already has many features shipping today in major implementations.

The newly created SG23 Safety and Security subgroup met on Thursday for a well-attended session on hitting the ground running to make a targeted improvement in safety and security in C++. It also approved the first two safety papers to progress to review by the full language evolution group at the next meeting.

Thank you to all the experts who worked all week in all the subgroups to achieve so much this week!

What’s next

Our next two meetings will be in Varna, Bulgaria in June and in Kona, HI, USA in November. At those two meetings we will start work on adding features into the new C++26 working draft.

Wrapping up

Thank you again to the approximately 160 experts who attended on-site and on-line at this week’s meeting, and the many more who participate in standardization through their national bodies!

But we’re not slowing down… we’ll continue to have subgroup Zoom meetings, and then in less than four months from now we’ll be meeting again in Bulgaria to start adding features to C++26. I look forward to seeing many of you there. Thank you again to everyone reading this for your interest and support for C++ and its standardization.