This spring: High-Performance and Low-Latency C++ (Stockholm) and ACCU (Bristol)

I don’t get to Europe very often apart from ISO C++ standards meetings, but this spring I’ve been able to accept invitations for two English-language European events in the last week of April. If you’re interested in attending, please check out the links, and I look forward to meeting and re-meeting many of you there.

Tue-Thu Apr 25-27: High-Performance and Low-Latency C++ (Stockholm)

On April 25-27, I’ll be in Stockholm (Kista) giving a three-day seminar on “High-Performance and Low-Latency C++.” This contains updated and new material that reflects the latest C++ standards and compilers, with a focus to using modern C++11/14/17 effectively on modern hardware and memory architectures.

Note that the class size is limited to about 100, so that I’ll be able to interact with most attendees directly. Registration just opened recently and hasn’t been widely publicized yet, but today I was told that it’s already over 1/3 full. So if you or your colleagues might be interested in attending, please check out the link above; for group registrations, please contact Alfasoft directly.

Here is the summary, below; for a more detailed topic breakdown see the link above.


Performance and efficiency are C++’s bread and butter, and they matter more than ever on modern hardware: In processors, single-threaded performance improvements are slowing down (unless your code is parallel); in Internet of Things, we are often asked to do more work with less hardware; and in cloud computing, processor/hardware time is often the major component of cost and so making code twice as efficient often means saving close to half the processing cost. Today, getting the highest performance and the lowest latency on modern hardware often means being aware of the hardware in ways that most other programming languages can’t – from hardware caches where simply arranging our data in the right order can give 50x speedups with otherwise identical code, to hardware parallelism where using parallel algorithms turns on high-performance parallel and vector processor hardware that otherwise sits idle.

Additionally, low latency increasingly matters at all scales: In user interfaces, low latency means responsive apps and productive users without the dreaded “wait…” donut; in financial trading, low latency regularly saves large amounts of cash; in embedded real-time systems, low latency is crucial to meeting deadlines and can even save lives. Today, this makes concurrency more important than ever, because it delivers two things: It hides latencies we have to deal with and cannot remove, from disk I/O latency to speed-of-light network latency; and it makes our code responsive by not introducing needless latencies of our own even when we’re not hiding someone else’s latency.


This intensive three day course will provide developers with the knowledge and skills required to write high-performance and low-latency code on today’s modern systems using modern C++11/14/17. During the training you’ll learn how to get the highest performance and the lowest latency on modern hardware in ways that are unique to C++, including how to arrange data to use hardware caches effectively, and how to use standard and your own custom-written parallel algorithms to harness high-performance parallel and vector processor hardware to compute results faster. You’ll also learn how to manage latency for responsive apps and for real-time systems, and techniques to hide the underlying latencies we have to deal with and cannot remove such as disk and network latency, and to make your own code responsive by not introducing needless latencies in your own code.

Sat Apr 29: ACCU closing keynote (Bristol)

Next, I’ll be heading to Bristol to catch the end of the ACCU 2017 conference, and give the closing talk on “Something(s) New in C++.” No, the title is not intentionally a tease; it’s just that I have several topics available, and I won’t be sure until about a month before the event which will be the best one to speak about. Here is the current abstract:

By the time the ACCU 2017 conference begins, C++17 is expected to be technically complete and in its final approval ballot. What comes next? Will C++ continue growing forever? Can C++ code be simplified? This is a brand-new talk of material I’ve never given before, in which I’ll present one (or more) of three proposals I’m personally working on to further improve C++ post-C++17. All follow a common theme – adding a strategic language and/or library feature to C++ that leads to significant, and sometimes dramatic, simplification of real-world C++ code. I’ll pick which one (or more) of those topics to present sometime in March.

What I can say is that, whichever topic it ends up being, it’ll be something you haven’t seen before that’s forward-looking and aimed directly toward making C++ code simpler and easier… and of course without compromising C++’s model of efficient machine-near abstraction.

I look forward to seeing many of you in Europe this spring.

Recommended reading: Why mobile web apps are slow (Drew Crawford)

I don’t often link to other articles, but this one is worth reading.

Why mobile web apps are slow

by Drew Crawford

… So if you are trying to figure out exactly what brand of crazy all your native developer friends are on for continuing to write the evil native applications on the cusp of the open web revolution, or whatever, then bookmark this page, make yourself a cup of coffee, clear an afternoon, find a comfy chair, and then we’ll both be ready.

He offers data (imagine!) to justly debunk many common memes and “easy answers” that routinely litter HN/Reddit/Slashdot comment threads. The piece is also often subtly (and intentionally) hilarious – do watch for the subtle humor, not just the obvious wit.

Don’t be distracted by the author’s viewpoint and emphasis on “iOS and Javascript” development – the article covers lots of important ground, including:

  • developing for ARM vs. x86;
  • developing for desktop vs. mobile;
  • managed vs. native code performance;
  • JIT issues vs. inherent language design tensions;
  • why garbage collection is not at all the panacea it’s often billed to be and often needs to be emphatically avoided (did you realize Apple already jettisoned GC?); and
  • as many of you know already, why if you’re serious about performance you’ll be seriously serious about memory usage and access patterns as a first-order issue.

I agree with most of it, and not just because he quotes from my When Will Better JITs Save Managed Code blog post from last year.



A few takeaways from the conclusion (spoiler alert):

Garbage collection is exponentially bad in a memory-constrained environment. It is way, way worse than it is in desktop-class or server-class environments.

Every competent mobile developer, whether they use a GCed environment or not, spends a great deal of time thinking about the memory performance of the target device

JavaScript, as it currently exists, is fundamentally opposed to even allowing developers to think about the memory performance of the target device

If they did change their minds and allowed developers to think about memory, experience suggests this is a technically hard problem.

asm.js show some promise, but even if they win you will be using C/C++ or similar “backwards” language as a frontend, rather than something dynamic like JavaScript

Words of wisdom: Bjarne Stroustrup

Bjarne Stroustrup wrote the following a few minutes ago on the concepts mailing list:

Let me take this opportunity to remind people that

  • "being able to do something is not sufficient reason for doing it" and
  • "being able to do every trick is not a feature but a bug"

For the latter, remember Dijkstra’s famous "Goto considered harmful" paper. The point was not that the "new features" (loop constructs) could do every goto trick better/simpler, but that some of those tricks should be avoided to simplify good programming.

Concepts and concepts lite are meant to make good generic programming simpler. They are not meant to be a drop-in substitute for every metaprogramming and macroprogramming trick. If you are an expert, and if in your expert opinion you and your users really need those tricks, you can still use them, but we need to make many (most) uses of templates easier to get right, so that they can become more mainstream. That where concepts and concept lite fits in.

Some of you may find this hard to believe, but "back then" there was quite serious opposition to function declarations because "they restricted the way functions could be used and the way separate compilation could be used" and also serious opposition to virtual functions "because pointers to functions are so much more flexible." I see concepts lite (and concepts) in the same light as goto/for, unchecked-function-arguments/function-declarations, pointers-to-functions/abstract-classes.

Reminds me of the related antipattern: “Something must be done. This is something. Therefore we must do it!”

Java vulnerabilities

With the help of friends Robert Seacord and David Svoboda of CERT in particular, I posted a note and link to their CERT post today because people have been misunderstanding the recent Java vulnerabilities, thinking they’re somehow really C or C++ vulnerabilities because Java is implemented in C and C++.

From the post:

Are the Java vulnerabilities actually C and C++ vulnerabilities?

by Herb Sutter


You’ve probably seen the headlines:

[US-CERT] Java in Web Browser: Disable Now!

We’ve been telling people to disable Java for years. … We have confirmed that VU#625617 can be used to reliably execute code on Windows, OS X, and Linux platforms. And the exploit code for the vulnerability is publicly available and already incorporated into exploit kits. This should be enough motivation for you to turn Java off.

Firefox and Apple have blocked Java while U.S. Homeland Security recommends everyone disable it, because of vulnerabilities
Homeland Security still advises disabling Java, even after update

Some people have asked whether last week’s and similar recent Java vulnerabilities are actually C or C++ vulnerabilities – because, like virtually all modern systems software, Java is implemented in C and C++.

The answer is no, these particular exploits are pure Java. Some other exploits have indeed used vulnerabilities in Java’s native C code implementation, but the major vulnerabilities in the news lately are in Java itself, and they enable portable exploits on any operating system with a single program. …

Some other C++ experts who have better sense than I do won’t add the following bit publicly, but I can’t help myself: Insert “write once, pwn everywhere” joke here…


On yesterday’s thread, I just wrote in a comment:

@Jon: Yes, C++ is complex and the complexity is largely because of C compatibility. I agree with Bjarne that there’s a small language struggling to get out — I’ve participated in private experiments to specify such a language, and you can do it in well under half the complexity of C++ without losing the expressivity and flexibility of C++. However, it does require changes to syntax (mainly) and semantics (secondarily) that amount to making it a new language, which makes it a massive breaking change for programs (where existing code doesn’t compile and would need to be rewritten manually or with a tool) and programmers (where existing developers need retraining). That’s a very big tradeoff. So just as C++ ‘won’ in the 90s because of its own strengths plus its C compatibility, C++11 is being adopted because of its own strengths plus its C++98 compatibility. At the end of the day compatibility continues to be a very strong force and advantage that isn’t lightly tossed aside to “improve” things. However, it’s still worthwhile and some interesting things may come of it. Stay tuned, but expect (possible) news in years rather than months.

That reminded me of a snippet from the September 2012 issues of CACM. When I first read it, I thought it was so worthwhile that I made a magnified copy and pasted it on the window outside my office at work – it’s still there. The lightly annotated conclusion is shown here.

imageThe article itself was about a different technology, but the Lessons Learned conclusion remind us of the paramount importance of two things:

  • Backward compatibility with simple migration path (“compat”). That’s what keeps you “in the ecosystem.” If you don’t have such compatibility, don’t expect wide adoption unless there are hugely compelling reasons to put up with the breaking change. It’s roughly the same kind of ecosystem change for programmers to go from Language X to Language Y as for users to go from Platform X to Platform Y (e.g., Windows to Mac, iPhone to Android) – in each case you have a lot of relearning, and you lose understood tools and services some of which have replacements and some of which don’t.
  • Focusing on complete end-to-end solutions (“SFE” or scenario-focused engineering). This is why it’s important not to ship just a bunch of individual features, because that can leave holes where Steps 1, 2, and 4 of an end-to-end experience are wonderful but Step 3 is awkward, unreliable, or incomplete, which renders much of the good work in the other steps unuseful since they can’t be used as much or possibly at all.


“256 cores by 2013”?

I just saw a tweet that’s worth commenting on:


Almost right, and we have already reached that.

I said something similar to the above, but with two important differences:

  1. I said hardware “threads,” not only hardware “cores” – it was about the amount of hardware parallelism available on a mainstream system.
  2. What I gave was a min/max range estimate of roughly 16 to 256 (the latter being threads) under different sets of assumptions.

So: Was I was right about 2013 estimates?

Yes, pretty much, and in fact we already reached or exceeded that in 2011 and 2012:

  • Lower estimate line: In 2011 and 2012 parts, Intel Core i7 Sandy Bridge and Ivy Bridge are delivering almost the expected lower baseline, and offering 8-way and 12-way parallelism = 4-6 cores x 2 hardware threads per core.
  • Upper estimate line: In 2012, as mentioned in the article (which called it Larrabee, now known as MIC or Xeon Phi) is delivering 200-way to 256-way parallelism = 50-64 cores x 4 hardware threads per core. Also, in 2011 and 2012, GPUs have since emerged into more mainstream use for computation (GPGPU), and likewise offer massive compute parallelism, such as 1,536-way parallelism on a machine having a single NVidia Tesla card.

Yes, mainstream machines do in fact have examples of both ends of the “16 to 256 way parallelism” range. And beyond the upper end of the range, in fact, for those with higher-end graphics cards.

For more on these various kinds of compute cores and threads, see also my article Welcome to the Jungle.


Longer answer follows:

Here’s the main part from article, “Design for Manycore Systems” (August 11, 2009). Remember this was written over three years ago – in the Time Before iPad, when Android was under a year old:

How Much Scalability Does Your Application Need?

So how much parallel scalability should you aim to support in the application you‘re working on today, assuming that it’s compute-bound already or you can add killer features that are compute-bound and also amenable to parallel execution? The answer is that you want to match your application’s scalability to the amount of hardware parallelism in the target hardware that will be available during your application’s expected production or shelf lifetime. As shown in Figure 4, that equates to the number of hardware threads you expect to have on your end users’ machines.

Figure 4: How much concurrency does your program need in order to exploit given hardware?

Let’s say that YourCurrentApplication 1.0 will ship next year (mid-2010), and you expect that it’ll be another 18 months until you ship the 2.0 release (early 2012) and probably another 18 months after that before most users will have upgraded (mid-2013). Then you’d be interested in judging what will be the likely mainstream hardware target up to mid-2013.

If we stick with "just more of the same" as in Figure 2’s extrapolation, we’d expect aggressive early hardware adopters to be running 16-core machines (possibly double that if they’re aggressive enough to run dual-CPU workstations with two sockets), and we’d likely expect most general mainstream users to have 4-, 8- or maybe a smattering of 16-core machines (accounting for the time for new chips to be adopted in the marketplace). [[Note: I often get lazy and say “core” to mean all hardware parallelism. In context above and below, it’s clear we’re talking about “cores and threads.”]]

But what if the gating factor, parallel-ready software, goes away? Then CPU vendors would be free to take advantage of options like the one-time 16-fold hardware parallelism jump illustrated in Figure 3, and we get an envelope like that shown in Figure 5.

Figure 5: Extrapolation of “more of the same big cores” and “possible one-time switch to 4x smaller cores plus 4x threads per core” (not counting some transistors being used for other things like on-chip GPUs).

First, let’s look at the lower baseline, ‘most general mainstream users to have [4-16 way parallelism] machines in 2013’? So where are were in 2012 today for mainstream CPU hardware parallelism? Well, Intel Core i7 (e.g., Sandy Bridge, Ivy Bridge) are typically in the 4 to 6 core range – which, with hyperthreading == hardware threads, means 8 to 12 hardware threads.

Second, what about the higher potential line for 2013? As noted above:

  • Intel’s Xeon Phi (then Larrabee) is now delivering 50-64 cores x 4 threads = 200 to 256-way parallelism. That’s no surprise, because this article’s upper line was based on exactly the Larrabee data point (see quote below).
  • GPUs already blow the 256 upper bound away – any machine with a two-year-old Tesla has 1,536-way parallelism for programs (including mainstream programs like DVD encoders) that can harness the GPU.

So not only did we already reach the 2013 upper line early, in 2012, but we already exceeded it for applications that can harness the GPU for computation.

As I said in the article:

I don’t believe either the bottom line or the top line is the exact truth, but as long as sufficient parallel-capable software comes along, the truth will probably be somewhere in between, especially if we have processors that offer a mix of large- and small-core chips, or that use some chip real estate to bring GPUs or other devices on-die. That’s more hardware parallelism, and sooner, than most mainstream developers I’ve encountered expect.

Interestingly, though, we already noted two current examples: Sun’s Niagara, and Intel’s Larrabee, already provide double-digit parallelism in mainstream hardware via smaller cores with four or eight hardware threads each. "Manycore" chips, or perhaps more correctly "manythread" chips, are just waiting to enter the mainstream. Intel could have built a nice 100-core part in 2006. The gating factor is the software that can exploit the hardware parallelism; that is, the gating factor is you and me.

Podcast: Interview on Hanselminutes

imageA few weeks ago at the Build conference, Scott Hanselman and I sat down to talk about C++ and modern UI/UX. The podcast is now live here:

The Hanselminutes Podcast, Show #346
“Why C++” with Herb Sutter

Topics Scott raises include:

  • 2:00 Scott mentions he has used C++ in the past. C++ has changed. We still call it C++, but it’s a very different language now.
  • 5:30 (Why) do we care about performance any more?
  • 10:00 What’s this GPGPU thing? Think of your GPU as your modern 80387.
  • 13:45 C++ is having a resurgence. Where is C++ big?
  • 18:00 Why not just use one language? or, What is C++ good at? Efficient abstraction and portability.
  • 21:45 Programmers have a responsibility to support the business. Avoid the pitfall of speeds & feeds.
  • 24:00 My experience with my iPad, my iPhone, and my Slate 7 with Win8.
  • 28:45 We’re in two election seasons – (a) political and (b) technology (Nexus, iPad Mini, Surface, …). Everyone is wallpapering the media with ads (some of them attack ads), and vying for customer votes/$$, and seeing who’s going to be the winner.
  • 35:00 Natural user interfaces – we get so easily used to touch that we paw all screen, and Scott’s son gets so used to saying “Xbox pause” that anything that doesn’t respond is “broken.”

Friday’s Q&A session now online

imageMy live Q&A after Friday’s The Future of C++ talk is now online on Channel 9. The topics revolved around…

… recent progress and near-future directions for C++, both at Microsoft and across the industry, and talks about some announcements related to C++11 support in VC++ 2012 and the formation of the Standard C++ Foundation.

Herb takes questions from a live virtual audience and demos the new site on an 82 inch Perceptive Pixel display attached to a Windows 8 machine.

Thanks to everyone who tuned in.

Talk now online: The Future of C++ (VC++, ISO C++)

imageYesterday, many thousands of you were in the room or live online for my talk on The Future of C++. The talk is now available online.

This has been a phenomenal year for C++, since C++11’s publication just 12 months ago. And yesterday was a great day for C++.

Yesterday I had the privilege of announcing much of what Microsoft and the industry have been working on over the past year.

(minor) C++ at Microsoft

On September 12, we shipped VC++ 2012 with the complete C++11 standard library, and adding support for C++11 range-for, enum class, override and final. Less than two months later, yesterday we announced and shipped the November 2012 CTP, a compiler add-in to VC++ 2012 adding C++11 variadic templates, uniform initialization and initializer_lists, delegating constructors, function template default arguments, explicit conversion operators, and raw string literals. Details here, and download here.

Note that this is just the first batch of additional C++11 features. Expect further announcements and deliveries in the first half of 2013.

(major) C++ across the industry

Interest and investment in C++ continues to accelerate across the software world.

  • ISO C++ standardization is accelerating. Major companies are dedicating more people and resources to C++ standardization than they have in years. Over the next 24 months, we plan to ship three Technical Specifications and a new C++ International Standard.
  • C++ now has a home on the web at Launched yesterday, it both aggregates the best C++ content and hosts new content itself, including Bjarne Stroustrup’s new Tour of C++ and Scott Meyers’ new Universal References article.
  • We now have a Standard C++ Foundation. Announced yesterday, it is already funded by the largest companies in the industry down to startups, financial institutions to universities, book publishers to other consortia, with more members joining weekly. For the first time in C++’s history since AT&T relinquished control of the language, we have an entity – a trade organization – that exists exclusively to promote Standard C++ on all compilers and platforms, and companies are funding it because the world runs on C++, and investing in Standard C++ is good business.

This is an exciting time to be part of our industry, on any OS and using any language. It’s especially an exciting time to be involved with C++ on all compilers and platforms.

Thank you all, whatever platform and language you use, for being part of it.