atomic<> Weapons: The C++ Memory Model and Modern Hardware

[ETA: Updated OneDrive slides link]

Most of the talks I gave at C++ and Beyond 2012 last summer are already online at Channel 9. Here are two more.

This is a two-part talk that covers the C++ memory model, how locks and atomics and fences interact and map to hardware, and more. Even though we’re talking about C++, much of this is also applicable to Java and .NET, which have similar memory models but not all the features of C++ (such as relaxed atomics).

Note: This is about the basic structure and tools, not how to write lock-free algorithms using atomics. That next-level topic may be on deck for this year’s C++ and Beyond in December, we’ll see…

atomic<> Weapons: The C++ Memory Model and Modern Hardware

This session in one word: Deep.

It’s a session that includes topics I’ve publicly said for years is Stuff You Shouldn’t Need To Know and I Just Won’t Teach, but it’s becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers are just so addicted to full performance that they’ll reach for the big red levers with the flashing warning lights. Since we can’t keep people from pulling the big red levers, we’d better document the A to Z of what the levers actually do, so that people don’t SCRAM unless they really, really, really meant to.

Topics Covered:

  • The facts: The C++11 memory model and what it requires you to do to make sure your code is correct and stays correct. We’ll include clear answers to several FAQs: “how do the compiler and hardware cooperate to respect these rules?”, “what is a race condition?”, and the ageless one-hand-clapping question “how is a race condition like a debugger?”
  • The tools: The deep interrelationships and fundamental tradeoffs among mutexes, atomics, and fences/barriers. I’ll try to convince you why standalone memory barriers are bad, and why barriers should always be associated with a specific load or store.
  • The unspeakables: I’ll grudgingly and reluctantly talk about the Thing I Said I’d Never Teach That Programmers Should Never Need To Know: relaxed atomics. Don’t use them! If you can avoid it. But here’s what you need to know, even though it would be nice if you didn’t need to know it.
  • The rapidly-changing hardware reality: How locks and atomics map to hardware instructions on ARM and x86/x64, and throw in POWER and Itanium for good measure – and I’ll cover how and why the answers actually differed between last year and this year, and how they will likely be different again a few years from now. We’ll cover how the latest CPU and GPU hardware memory models are rapidly evolving, and how this directly affects C++ programmers.

Videos: Panel, and C++ Concurrency

I’m about two weeks late posting this, but two more C++ and Beyond 2012 videos are now available online.

The first is my concurrency talk:

C++ and Beyond 2012: C++ Concurrency (Herb Sutter)

I’ve spoken and written on these topics before. Here’s what’s different about this talk:

  • Brand new: This material goes beyond what I’ve written and taught about before in my Effective Concurrency articles and courses.
  • Cutting-edge current: It covers the best-practices state of the art techniques and shipping tools, and what parts of that are standardized in C++11 already (the answer to that one may surprise you!) and what’s en route to near-term standardization and why, with coverage of the latest discussions.
  • Blocking vs. non-blocking: What’s the difference between blocking and non-blocking styles, why on earth would you care, which kinds does C++11 support, and how are we looking at rounding it out in C++1y?

The answers all matter to you – even the ones not yet in the C++ standard – because they are real, available in shipping products, and affect how you design your software today.

The second is one of the panels:

C++ and Beyond 2012: Panel – Convincing your Colleagues

From C++ and Beyond 2012, Andrei, Herb and Scott present Convincing Your Colleagues – an interactive panel.

Abstract:

You can’t do a better job if you don’t change what you’re doing, but change is hard.  It’s especially hard when what needs to change is your colleagues’ approach to software development. Moving your team forward often requires persuading your peers to change their behavior, sometimes to do something they’re not doing, other times to stop doing something they’ve become accustomed to.  Whether the issue is to embrace or avoid C++ language features, to adopt new development tools or abandon old ones, to increase use of or scale back on overuse of design patterns, to adhere to coding standards, or any of the plethora of other matters that affect software creation, moving things forward typically requires getting your colleagues to buy into the change you’re proposing.  But how can you do that?

In this panel session, Andrei, Herb, and Scott share how they go about convincing their colleagues to change and take questions from the audience.

Truth be told, the panel ranged widely and probably most of the time was on other topics!

I hope you find them useful.

Java vulnerabilities

With the help of friends Robert Seacord and David Svoboda of CERT in particular, I posted a note and link to their CERT post today because people have been misunderstanding the recent Java vulnerabilities, thinking they’re somehow really C or C++ vulnerabilities because Java is implemented in C and C++.

From the post:

Are the Java vulnerabilities actually C and C++ vulnerabilities?

by Herb Sutter

 

You’ve probably seen the headlines:

[US-CERT] Java in Web Browser: Disable Now!

We’ve been telling people to disable Java for years. … We have confirmed that VU#625617 can be used to reliably execute code on Windows, OS X, and Linux platforms. And the exploit code for the vulnerability is publicly available and already incorporated into exploit kits. This should be enough motivation for you to turn Java off.

Firefox and Apple have blocked Java while U.S. Homeland Security recommends everyone disable it, because of vulnerabilities
Homeland Security still advises disabling Java, even after update

Some people have asked whether last week’s and similar recent Java vulnerabilities are actually C or C++ vulnerabilities – because, like virtually all modern systems software, Java is implemented in C and C++.

The answer is no, these particular exploits are pure Java. Some other exploits have indeed used vulnerabilities in Java’s native C code implementation, but the major vulnerabilities in the news lately are in Java itself, and they enable portable exploits on any operating system with a single program. …

Some other C++ experts who have better sense than I do won’t add the following bit publicly, but I can’t help myself: Insert “write once, pwn everywhere” joke here…

Video: You Don’t Know const and mutable

At C++ and Beyond in August, I gave a 30 min talk on the changed meaning of const and mutable. The talk video is now online:

You Don’t Know [keyword] and [keyword]

const means const.

Bonus: mutable is useful and continues to mean ‘already as good as const.’

This is another way C++ has become simpler: const now means what people always thought it meant, and the four-line code example doesn’t need deep analysis at all – it just works.

But we analyze the four-line example anyway as a motivating case to see why and how it works, so we can fully appreciate this new simplification in C++11…

An implementation of generic lambdas is now available

For those interested in C++ standardization and not already following along at isocpp.org, here’s an item of likely interest:

An implementation of generic lambdas (request for feedback)—Faisal Vali

This week, Faisal Vali shared an initial “alpha” implementation of generic lambdas in Clang. Faisal is the lead author of the proposal (N3418), with Herb Sutter and Dave Abrahams.

To read and participate in the active discussion, see the message thread on std-proposals.

Here is a copy of Faisal’s announcement…

Read more at isocpp.org…

Compatibility

On yesterday’s thread, I just wrote in a comment:

@Jon: Yes, C++ is complex and the complexity is largely because of C compatibility. I agree with Bjarne that there’s a small language struggling to get out — I’ve participated in private experiments to specify such a language, and you can do it in well under half the complexity of C++ without losing the expressivity and flexibility of C++. However, it does require changes to syntax (mainly) and semantics (secondarily) that amount to making it a new language, which makes it a massive breaking change for programs (where existing code doesn’t compile and would need to be rewritten manually or with a tool) and programmers (where existing developers need retraining). That’s a very big tradeoff. So just as C++ ‘won’ in the 90s because of its own strengths plus its C compatibility, C++11 is being adopted because of its own strengths plus its C++98 compatibility. At the end of the day compatibility continues to be a very strong force and advantage that isn’t lightly tossed aside to “improve” things. However, it’s still worthwhile and some interesting things may come of it. Stay tuned, but expect (possible) news in years rather than months.

That reminded me of a snippet from the September 2012 issues of CACM. When I first read it, I thought it was so worthwhile that I made a magnified copy and pasted it on the window outside my office at work – it’s still there. The lightly annotated conclusion is shown here.

The article itself was about a different technology, but the Lessons Learned conclusion reminds us of the paramount importance of two things:

  • Backward compatibility with simple migration path (“compat”). That’s what keeps you “in the ecosystem.” If you don’t have such compatibility, don’t expect wide adoption unless there are hugely compelling reasons to put up with the breaking change. It’s roughly the same kind of ecosystem change for programmers to go from Language X to Language Y as for users to go from Platform X to Platform Y (e.g., Windows to Mac, iPhone to Android) – in each case you have a lot of relearning, and you lose understood tools and services some of which have replacements and some of which don’t.
  • Focusing on complete end-to-end solutions (“SFE” or scenario-focused engineering). This is why it’s important not to ship just a bunch of individual features, because that can leave holes where Steps 1, 2, and 4 of an end-to-end experience are wonderful but Step 3 is awkward, unreliable, or incomplete, which undermines much of the good work in the other steps since they can’t be used as much, or possibly at all.

Enjoy.

Perspective: “Why C++ Is Not ‘Back’”

John Sonmez wrote a nice article on the weekend – both the article and the comments are worth reading.

“Why C++ Is Not ‘Back’”

by John Sonmez

I love C++. […] There are plenty of excellent developers I know today that still use C++ and teach others how to use it and there is nothing at all wrong with that.

So what is the problem then?

[…] Everyone keeps asking me if they need to learn C++, but just like my answer was a few years ago, it is the same today—NO!

Ok, so “NO” in caps is a bit harsh.  A better answer is “why?”

[…]

Although I don’t agree with everything John says, he presents something quite valuable, and unfortunately rare: a thoughtful hype-free opinion. This is valuable especially when (not just “even when”) it differs from your own opinion, because combining different thoughtful views of the same thing gives something exceedingly important and precious: perspective.

By definition, depth perception is something you get from seeing and combining more than one point of view. This is why one of my favorite parts of any conference or group interview is discussions between experts on points where they disagree – because experts do regularly disagree, and when they each present their thoughtful reasons and qualifications and specific cases (not just hype), you get to see why a point is valid, when it is valid and not valid, how it applies to a specific situation or doesn’t, and so on.

I encourage you to read the article and the comments. This quote isn’t the only thing I don’t fully agree with, but it’s the one thing I’ll respond to a little from the article:

There are only about three sensible reasons to learn C++ today that I can think of.

There are other major reasons in addition to those, such as:

  • Servicing, which is harder when you depend on a runtime.
  • Testing, since you lose the ability to test your entire application (compare doing all-static or mostly-static linking with having your application often be compiled/jitted for the first time on an end user’s machine).

These aren’t bad things in themselves, just tradeoffs and reasons to prefer native code vs managed code depending on your application’s needs.

But even the three points John does mention are very broad reasons that apply often, especially the issue of control over memory layouts, to the point where I would say: In any language, if you are serious about performance you will be using arrays a lot (not “always,” just “a lot”). Some languages make that easier and give you much better control over layout in general and arrays in particular, while other languages/environments make it harder (possible! but harder) and you have to “opt out” or “work against” the language’s/runtime’s strong preference for pointer-chasing node-based data structures.

For more, please see also my April 2012 Lang.NEXT talk “(Not Your Father’s) C++”, especially the second half:

  • From 19:20, I try to contrast the value and tenets of C++ and of managed language/environments, including that each is making a legitimate and useful design choice.
  • From 36:00, I try to address the question of: “can managed languages do essentially everything that’s interesting, so native C/C++ code should be really just for things like device drivers?”
  • From 39:00, I talk about when each one is valuable and should be used, especially that if programmer time is your biggest cost, that’s what you should optimize for and it’s exactly what managed languages/environments are designed for.

But again, I encourage you to read John’s article – and the comments – yourself. There’s an unusually high signal-to-noise ratio here.

It’s always nice to encounter a thoughtful balanced piece of writing, whether or not we agree with every detail. Thanks, John!

“256 cores by 2013”?

I just saw a tweet that’s worth commenting on:

[tweet image]

Almost right, and we have already reached that.

I said something similar to the above, but with two important differences:

  1. I said hardware “threads,” not only hardware “cores” – it was about the amount of hardware parallelism available on a mainstream system.
  2. What I gave was a min/max range estimate of roughly 16 to 256 (the latter being threads) under different sets of assumptions.

So: Was I right about the 2013 estimates?

Yes, pretty much, and in fact we already reached or exceeded that in 2011 and 2012:

  • Lower estimate line: In 2011 and 2012 parts, Intel Core i7 Sandy Bridge and Ivy Bridge are delivering almost the expected lower baseline, and offering 8-way and 12-way parallelism = 4-6 cores x 2 hardware threads per core.
  • Upper estimate line: In 2012, the chip mentioned in the article (then called Larrabee, now known as MIC or Xeon Phi) is delivering 200-way to 256-way parallelism = 50-64 cores x 4 hardware threads per core. Also, in 2011 and 2012, GPUs emerged into more mainstream use for computation (GPGPU), and likewise offer massive compute parallelism, such as 1,536-way parallelism on a machine having a single NVidia Tesla card.

Yes, mainstream machines do in fact have examples of both ends of the “16 to 256 way parallelism” range. And beyond the upper end of the range, in fact, for those with higher-end graphics cards.

For more on these various kinds of compute cores and threads, see also my article Welcome to the Jungle.

 

Longer answer follows:

Here’s the main part from the article, “Design for Manycore Systems” (August 11, 2009). Remember this was written over three years ago – in the Time Before iPad, when Android was under a year old:

How Much Scalability Does Your Application Need?

So how much parallel scalability should you aim to support in the application you‘re working on today, assuming that it’s compute-bound already or you can add killer features that are compute-bound and also amenable to parallel execution? The answer is that you want to match your application’s scalability to the amount of hardware parallelism in the target hardware that will be available during your application’s expected production or shelf lifetime. As shown in Figure 4, that equates to the number of hardware threads you expect to have on your end users’ machines.

Figure 4: How much concurrency does your program need in order to exploit given hardware?

Let’s say that YourCurrentApplication 1.0 will ship next year (mid-2010), and you expect that it’ll be another 18 months until you ship the 2.0 release (early 2012) and probably another 18 months after that before most users will have upgraded (mid-2013). Then you’d be interested in judging what will be the likely mainstream hardware target up to mid-2013.

If we stick with "just more of the same" as in Figure 2’s extrapolation, we’d expect aggressive early hardware adopters to be running 16-core machines (possibly double that if they’re aggressive enough to run dual-CPU workstations with two sockets), and we’d likely expect most general mainstream users to have 4-, 8- or maybe a smattering of 16-core machines (accounting for the time for new chips to be adopted in the marketplace). [[Note: I often get lazy and say “core” to mean all hardware parallelism. In context above and below, it’s clear we’re talking about “cores and threads.”]]

But what if the gating factor, parallel-ready software, goes away? Then CPU vendors would be free to take advantage of options like the one-time 16-fold hardware parallelism jump illustrated in Figure 3, and we get an envelope like that shown in Figure 5.

Figure 5: Extrapolation of “more of the same big cores” and “possible one-time switch to 4x smaller cores plus 4x threads per core” (not counting some transistors being used for other things like on-chip GPUs).

First, let’s look at the lower baseline: ‘most general mainstream users to have [4-16 way parallelism] machines in 2013.’ So where were we in 2012? Well, Intel Core i7 parts (e.g., Sandy Bridge, Ivy Bridge) are typically in the 4 to 6 core range – which, with hyperthreading == hardware threads, means 8 to 12 hardware threads.

Second, what about the higher potential line for 2013? As noted above:

  • Intel’s Xeon Phi (then Larrabee) is now delivering 50-64 cores x 4 threads = 200 to 256-way parallelism. That’s no surprise, because this article’s upper line was based on exactly the Larrabee data point (see quote below).
  • GPUs already blow the 256 upper bound away – any machine with a two-year-old Tesla has 1,536-way parallelism for programs (including mainstream programs like DVD encoders) that can harness the GPU.

So not only did we already reach the 2013 upper line early, in 2012, but we already exceeded it for applications that can harness the GPU for computation.

As I said in the article:

I don’t believe either the bottom line or the top line is the exact truth, but as long as sufficient parallel-capable software comes along, the truth will probably be somewhere in between, especially if we have processors that offer a mix of large- and small-core chips, or that use some chip real estate to bring GPUs or other devices on-die. That’s more hardware parallelism, and sooner, than most mainstream developers I’ve encountered expect.

Interestingly, though, we already noted two current examples: Sun’s Niagara, and Intel’s Larrabee, already provide double-digit parallelism in mainstream hardware via smaller cores with four or eight hardware threads each. "Manycore" chips, or perhaps more correctly "manythread" chips, are just waiting to enter the mainstream. Intel could have built a nice 100-core part in 2006. The gating factor is the software that can exploit the hardware parallelism; that is, the gating factor is you and me.

Podcast: Interview on Hanselminutes

A few weeks ago at the Build conference, Scott Hanselman and I sat down to talk about C++ and modern UI/UX. The podcast is now live here:

The Hanselminutes Podcast, Show #346
“Why C++” with Herb Sutter

Topics Scott raises include:

  • 2:00 Scott mentions he has used C++ in the past. C++ has changed. We still call it C++, but it’s a very different language now.
  • 5:30 (Why) do we care about performance any more?
  • 10:00 What’s this GPGPU thing? Think of your GPU as your modern 80387.
  • 13:45 C++ is having a resurgence. Where is C++ big?
  • 18:00 Why not just use one language? or, What is C++ good at? Efficient abstraction and portability.
  • 21:45 Programmers have a responsibility to support the business. Avoid the pitfall of speeds & feeds.
  • 24:00 My experience with my iPad, my iPhone, and my Slate 7 with Win8.
  • 28:45 We’re in two election seasons – (a) political and (b) technology (Nexus, iPad Mini, Surface, …). Everyone is wallpapering the media with ads (some of them attack ads), and vying for customer votes/$$, and seeing who’s going to be the winner.
  • 35:00 Natural user interfaces – we get so easily used to touch that we paw at all screens, and Scott’s son gets so used to saying “Xbox pause” that anything that doesn’t respond is “broken.”

Reader Q&A: A good book to learn C++11?

Last night a reader asked one of the questions that helped motivate the creation of isocpp.org:

I am trying to learn the new C++. I am wondering if you are aware of resources or courses that can help me learn a little. I was not able to find any books for C++11. Any help would be greatly appreciated.

By the way, the isocpp website is great :)

Thanks! And you beat me to the punchline. :)

A good place to start is isocpp.org/get-started. It recommends four books, three of which have been updated for C++11 (and two of those three are available now). C++ Primer 5e is a good choice.

We also maintain a list of new articles and books under isocpp.org/blog/category/articles-books.

As for courses, there’s also a growing list of good courses there under isocpp.org/blog/category/training, including two C++11 overview courses by Scott Meyers and Dave Abrahams respectively.

As with all blog categories, you have three ways to find the most recent items under Articles & Books, Training, and other categories:

  • Home page: The most recent items in each category are always listed right on the site’s newspaper-inspired home page.
  • RSS: You can subscribe to RSS feeds for all posts, or for specific categories to get notifications of new articles, books, training, events, and more as they become available.
  • Twitter: Follow @isocpp for notifications of new stuff.

The site and blog are curated, aiming for higher quality and lower volume. Even so, we know you won’t be able to read, watch, or listen to all of the content. So we’re doing our best to make sure that when you do pick something recommended by the site, it’s likely to be of high quality and address what you’re looking to find.

Enjoy! This is version 1 of the site… I hope that it’s already useful, and you’ll see quite a bit more over the coming year.