Late-Breaking C&B Session: A Special Announcement

At the end of the Monday afternoon session, I will be making a special announcement related to Standard C++ on all platforms. Be there to hear the details, and to receive an extra perk that’s being reserved for C&B 2012 attendees only.

Note: We sometimes record sessions and make them freely available online via Channel 9, and we intend to do that again this year for some selected sessions. However, this session is for C&B attendees only and will not be recorded.

Registration is open until Wednesday and the event is pretty full but a few spaces are still available. I’m looking forward to seeing many of you there for a top-notch C++ conference full of fresh new current material – I’ve seen Andrei’s and Scott’s talk slides too, and I think this C&B is going to be the best one yet.

You’ll leave exhausted, but with a full brain and quite likely a big silly grin as you think about all the ways to use the material right away on your current project back home.

C&B Session: atomic<> Weapons – The C++11 Memory Model and Modern Hardware

Here’s another deep session for C&B 2012 on August 5-8 – if you haven’t registered yet, register soon. We got a bigger venue this time, but as I write this the event is currently almost 75% full with five weeks to go.

I know, I’ve already posted three sessions and a panel. But there’s just so much about C++11 to cover, so here’s a fourth brand-new session I’ll do at C&B 2012 that goes deeper on its topic than I’ve ever been willing to go before.

atomic<> Weapons: The C++11 Memory Model and Modern Hardware

This session in one word: Deep.

It’s a session that includes topics I’ve publicly said for years is Stuff You Shouldn’t Need To Know and I Just Won’t Teach, but it’s becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers just are so addicted to full performance that they’ll reach for the big red levers with the flashing warning lights. Since we can’t keep people from pulling the big red levers, we’d better document the A to Z of what the levers actually do, so that people don’t SCRAM unless they really, really, really meant to.

This session covers:

The facts: The C++11 memory model and what it requires you to do to make sure your code is correct and stays correct. We’ll include clear answers to several FAQs: “how do the compiler and hardware cooperate to remember how to respect these rules?”, “what is a race condition?”, and the ageless one-hand-clapping question “how is a race condition like a debugger?”
The tools: The deep interrelationships and fundamental tradeoffs among mutexes, atomics, and fences/barriers. I’ll try to convince you why standalone memory barriers are bad, and why barriers should always be associated with a specific load or store.
The unspeakables: I’ll grudgingly and reluctantly talk about the Thing I Said I’d Never Teach That Programmers Should Never Need To Now: relaxed atomics. Don’t use them! If you can avoid it. But here’s what you need to know, even though it would be nice if you didn’t need to know it.
The rapidly-changing hardware reality: How locks and atomics map to hardware instructions on ARM and x86/x64, and throw in POWER and Itanium for good measure – and I’ll cover how and why the answers are actually different last year and this year, and how they will likely be different again a few years from now. We’ll cover how the latest CPU and GPU hardware memory models are rapidly evolving, and how this directly affects C++ programmers.
Coda: Volatile and “compiler-only” memory barriers. It’s important to understand exactly what atomic and volatile are and aren’t for. I’ll show both why they’re both utterly unrelated (they have exactly zero overlapping uses, really) and yet are fundamentally related when viewed from the perspective of talking about the memory model. Also, people keep seeing and asking about “compiler-only” memory barriers and when to use them – they do have a valid-though-rare use, but it’s not the use that most people are trying to use them for, so beware!

For me, this is going to be the deepest and most fun C&B yet. At previous C&Bs I’ve spoken about not only code, but also meta topics like design and C++’s role in the marketplace. This time it looks like all my talks will be back to Just Code. Fun times!

Here a snapshot of the list of C&B 2012 sessions so far:

Universal References in C++11 (Scott)
You Don’t Know [keyword] and [keyword] (Herb)
Convincing Your Colleagues (Panel)
Initial Thoughts on Effective C++11 (Scott)
Modern C++ = Clean, Safe, and Faster Than Ever (Panel)
Error Resilience in C++11 (Andrei)
C++ Concurrency – 2012 State of the Art (and Standard) (Herb)
C++ Parallelism – 2012 State of the Art (and Standard) (Herb)
Secrets of the C++11 Threading API (Scott)
atomic<> Weapons: The C++11 Memory Model and Modern Hardware (Herb)

It’ll be a blast. I hope to see many of you there. Register soon.

Reader Q&A: Why don’t modern smart pointers implicitly convert to *?

Today a reader asked a common question:

Why doesn’t unique_ptr (and the ilk) appear to have an operator overload somewhat as follows:

operator T*() { return get(); };

The reason I ask is because we have reams of old code wanting raw pointers (as function parms), and I would like to replace the outer layers of the code which deal with the allocation and deallocation with unique_ptrs without having to either ripple unique_ptrs through the entire system or explicitly call .get() every time the unique_ptr is a parm to a function which wants a raw pointer.

What my programmers are doing is creating a unique_ptr and immediately using get() to put it into a local raw pointer which is used from then on. Somehow that doesn’t feel right, but I don’t know what would be the best alternative.

In the olden days, smart pointers often did provide the convenience of implicit conversion to *. It was by using those smart pointers that we learned it caused more problems than it solves, and that requiring people to write .get() was actually not a big deal.

For an example of the problems of implicit conversions, consider:

unique_ptr p( new widget );
...
use( p + 42 ); // error (maybe he meant "*p + 42"?)
    // but if implicit conversion to * were allowed, would silently compile -- urk
...
delete p; // error
    // but if implicit conversion to * were allowed, would silently compile -- double urk

For more, see also Andrei’s Modern C++ Design section 7.7, “Implicit Conversion to Raw Pointer Types.”

However, this really isn’t as bad as most people fear for several reasons, including but not limited to:

The large majority of uses of the smart pointer, such as calling member functions on the object (e.g., p->foo()) just work naturally and effortlessly because we do have operator->.
You rarely if ever need to say unique_ptr on a local variable, because C++11’s auto is your friend – and “rarely” becomes “never” if you use make_unique which is described here and should become standard in the future.
Parameters (which you mention) themselves should almost never be smart pointers, but should be normal pointers and references. So if you’re managing an object’s lifetime by smart pointer, you do write .get() – but only once at the top of each call tree. More on this in the current GotW #105 – solution coming soon, watch this space.

Talk Video: Welcome to the Jungle (60 min version + Q&A)

While visiting Facebook earlier this month, I gave a shorter version of my “Welcome to the Jungle” talk, based on the eponymous WttJ article. They made a nice recording and it’s now available online here:

Facebook Engineering

Title: Herb Sutter: Welcome to the Jungle

In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend—mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done. — The free lunch is over. Now welcome to the hardware jungle.

The slides are available here. (There doesn’t seem to be a link to the slides on the page itself as I write this.)

For those interested in a longer version, in April I gave a 105-minute + Q&A version of this talk in Kansas City at Perceptive, also available online where I posted before.

A word about “cluster in a box”

I should have remembered that describing a PC as a “heterogeneous cluster in a box” is a big red button for people, in particular because “cluster” implies “parts can fail and program should continue.” So in the Q&A, one commenter made the point that I should have mentioned reliability is an issue.

As I answered there, I half agree – it’s true but it’s only half the story, and it doesn’t affect the programming model (see more below). One of the slides I omitted to shorten this version of the talk highlighted that there are actually two issues when you go from “Disjoint (tightly coupled)” to “Disjoint (loosely coupled)”: reliability and latency, and both are important. (I also mentioned this in the original WttJ article this is based on; just search for “reliability.”)

Even after the talk, I still got strong resistance along the lines that, ‘no, you obviously don’t get it, latency isn’t a significant issue at all, reliability is the central issue and it kills your argument because it makes the model fundamentally different.’ Paraphrasing subsequent email:

‘A fundamental difference between distributed computing and single-box multiprocessing is that in the former case you don’t know whether a failure was a communication failure (i.e. the task was completed but communication failed) or a genuine failure to carry the task. (Hence all complicated two-phase commit protocols etc.) In contrast, in a single-box scenario you can know the box you’re on is working.’

Let me respond further to this here, because clearly these guys know more about distributed systems than I do and I’m always happy to be educated, but I also think we have a disconnect on three things asserted above: It is not my understanding that reliability is more important than latency, or that apps have to distinguish comms failures from app exceptions, or that N-phase commit enters the picture.

First, I don’t agree with the assertion that reliability alone is what’s important, or that it’s more important than latency, for the following reason:

You can build reliable transports on top of unreliable ones. You do it through techniques like sequencing, redundancy, and retry. A classic example is TCP, which delivers reliable communications over notoriously- and deliberately-unreliable IP which can drop and reorder packets as network nodes and communications paths keep madly appearing and reappearing like a herd of crazed Cheshire cats. We can and do build secure reliable global banking systems on that.
Once you do that, you have turned a reliability issue into a performance (specifically latency) issue. Both reliability and latency are key issues when moving to loosely-coupled systems, but because you can turn the first into the second, it’s latency that is actually the more fundamental and important one – and the only one the developer needs to deal with.

For example, to use compute clouds like Azure and AWS, you usually start with two basic pieces:

the queue(s), which you use to push the work items out/around and results back/around; and
an elastic set of compute nodes, each of which pulls work items from the queue and processes them.

What happens when you encounter a reliability problem? A node can pull a work item but fail to complete it, for example if the node crashes or the system encounters a partial network outage or other communication problem.

Many modern systems already automatically recover and have another node re-pull the same work item to make sure each work item gets done even in the face of partial failures. From the app’s point of view, such failures just manifest as degraded performance (higher latency or time-to-solution) and therefore mainly affect the granularity of parallel work items – they have to be big enough to be worth sending elsewhere and so minimum size is directly proportional to latency so that the overheads do not dominate. They do not manifest as app-visible failures.

Yes, the elastic cloud implementation has to deal with things like network failures and retries. But no, this isn’t your problem; it’s not supposed to be your job to implement the elastic cloud, it’s supposed to be your job just to implement each node’s local logic and to create whatever queues you want and push your work item data into them.

Aside: Of course, as with any retry-based model, you have to make sure that a partly-executed work item doesn’t expose any partial side effects it shouldn’t, and normally you prevent that by doing the work in a transaction and rolling it back on failure, or in the extreme (not generally recommended but sometimes okay) resorting to compensating writes to back out partial work.

That covers everything except the comment about two-phase commit: Citing that struck me as odd because I haven’t heard much us of that kind of coupled approach in years. Perhaps I’m misinformed, but my impression of 2- or N-phase commit protocols was that they have some serious problems:

They are inherently nonscalable.
They increase rather than decrease interdependencies in the system – even with heroic efforts like majority voting and such schemes that try to allow for subsets of nodes being unavailable, which always seemed fragile to me.
Also, I seem to remember that NPC is a blocking protocol, which if so is inherently anti-concurrency. One of the big realizations in modern mainstream concurrency in the past few years is that Blocking Is Nearly Always Evil. (I’m looking at you, future.get(), and this is why the committee is now considering adding the nonblocking future.then() as well.)

So my impression is that these were primarily of historical interest – if they are still current in modern datacenters, I would appreciate learning more about it and seeing if I’m overly jaded about N-phase commit.

GotW #105: Smart Pointers, Part 3 (Difficulty: 7/10)

JG Question

1. What are the performance and correctness implications of the following function declaration? Explain.

void f( shared_ptr<widget> );

Guru Question

2. A colleague is writing a function f that takes an existing object of type widget as a required input-only parameter, and trying to decide among the following basic ways to take the parameter (omitting const):

void f( widget& );
void f( unique_ptr<widget> );
void f( unique_ptr<widget>& );
void f( shared_ptr<widget> );
void f( shared_ptr<widget>& );

Under what circumstances is each appropriate? Explain your answer, including where const should or should not be added anywhere in the parameter type.

(There are other ways to pass the parameter, but we will consider only the ones shown above.)

GotW #104: Solution

The solution to GotW #104 is now live.

Facebook Folly – OSS C++ Libraries

I’ve been beating the drum this year (see the last section of the talk) that the biggest problem facing C++ today is the lack of a large set of de jure and de facto standard libraries. My team at Microsoft just recently announced Casablanca, a cloud-oriented C++ library and that we intend to open source, and we’re making other even bigger efforts that I hope will bear fruit and I’ll be able to share soon. But it can’t be just one company – many companies have already shared great open libraries on the past, but still more of us have to step up.

In that vein, I was extremely happy to see another effort come to fruition today. A few hours ago I was at Facebook to speak at their C++ day, and I got to be in the room when Andrei Alexandrescu dropped his arm to officially launch Folly:

Folly: The Facebook Open Source Library

by Jordan DeLong on Saturday, June 2, 2012 at 2:59pm ·

Facebook is built on open source from top to bottom, and could not exist without it. As engineers here, we use, contribute to, and release a lot of open source software, including pieces of our core infrastructure such as HipHop and Thrift.

But in our C++ services code, one clear bottleneck to releasing more work has been that any open sourced project needed to break dependencies on unreleased internal library code. To help solve that problem, today we open sourced an initial release of Folly, a collection of reusable C++ library artifacts developed and used at Facebook. This announcement was made at our C++ conference at Facebook in Menlo Park, CA.

Our primary aim with this ‘foolishness’ is to create a solution that allows us to continue open sourcing parts of our stack without resorting to reinventing some of our internal wheels. And because Folly’s components typically perform significantly faster than counterparts available elsewhere, are easy to use, and complement existing libraries, we think C++ developers might find parts of this library interesting in their own right.

Andrei announced that Folly stood for the Facebook Open Llibrary. He claimed it was a Welsh name, which is even funnier when said by a charming Romanian man.

Microsoft, and I personally, would like to congratulate Facebook on this release! Getting more high-quality open C++ libraries is a Good Thing, and I think it is safe to say that Casablanca and Folly are just the beginning. There’s a lot more coming across our industry this year. Stay tuned.

We’re hiring (again & more)

The Visual C++ team is looking for a number of people to do great work on C++11, parallelizing/vectorizing, cloud, libraries, and more. All I can say is that there’s a lot of cool stuff in the pipeline that directly addresses real needs, including things people regularly comment about on this blog that I can’t answer specifically yet but will soon.

If you might like to be part of it, here’s how – 13 positions right now and more to come as we update this list:

Be What’s Next (We’re hiring!)

The C++ organization is growing and hiring across all feature areas (C++ 11, compiler front-end, compiler back-end, C++ AMP, PPL, libraries & runtime, IDE, Casablanca). We are looking for passionate program managers, developers and testers to bang out the next versions of the toolset!

What’s in it for you:

Be part of the C++ standards evolution – you’ll have the opportunity to work side-by-side with folks like Herb Sutter

Solve exciting challenges as we navigate the hardware evolution (newer chipsets, multi-core, GPU, heterogeneous cores etc.)

Be part of the technology that builds all of Microsoft’s platforms like Windows, Xbox, Windows Phone and Windows Embedded.

Please apply directly using the links below. We’ll keep this list updated for the next couple of months.

Current job list is available on that page.

Two Sessions: C++ Concurrency and Parallelism – 2012 State of the Art (and Standard)

It’s time for, not one, but two brand-new, up-to-date talks on the state of the art of concurrency and parallelism in C++. I’m going to put them together especially and only for C++ and Beyond 2012, and I’ll be giving them nowhere else this year:

C++ Concurrency – 2012 State of the Art (and Standard)
C++ Parallelism – 2012 State of the Art (and Standard)

And there’s a lot to tell. 2012 has already been a busy year for the pushing the boundaries of both “shipping-and-practical” and “proto-standard” concurrency and parallelism in C++:

In February, the spring ISO C++ standards meeting saw record attendance at 73 experts (normal is 50-55), and spent the full week primarily on new language and library proposals, with notable emphasis on the area of concurrency and parallelism. There was so much interest that I formed four Study Groups and appointed chairs: the largest on concurrency and parallelism (SG1, Hans Boehm), and three others on modules (SG2, Doug Gregor), filesystem (SG3, Beman Dawes), and networking (SG4, Kyle Kloepper).
Three weeks ago, we hosted another three-day face-to-face meeting for SG1 and SG4 – and at nearly 40 people the SG1 attendance rivaled that of a normal full ISO C++ meeting, with a who’s-who of the world’s concurrency and parallelism experts in attendance and further proposal presentations from companies like IBM, Intel, and Microsoft. There was so much interest that I had to form a new Study Group 5 for Transactional Memory (SG5), and appointed Michael Wong of IBM as chair.
Over the summer, we’ll all be working on updated proposals for the October ISO C++ meeting in Portland.

Things are heating up, and we’re narrowing down which areas to focus on.

I’ve spoken and written on these topics before. Here’s what’s different about these talks:

Brand new: This material goes beyond what I’ve written and taught about before in my Effective Concurrency articles and courses.
Cutting-edge current: It covers the best-practices state of the art techniques and shipping tools, and what parts of that are standardized in C++11 already (the answer to that one may surprise you!) and what’s en route to near-term standardization and why, with coverage of the latest discussions.
Mainstream hardware – many kinds of parallelism: What’s the relationship among multi-core CPUs, hardware threads, SIMD vector units (Intel SSE and AVX, ARM Neon), and GPGPU (general-purpose computation on GPUs, which I covered at C++ and Beyond 2011)? Which are most interesting, what technologies are available now, and what’s being considered for near-term standardization?
Blocking vs. non-blocking: What’s the difference between blocking and non-blocking styles, why on earth would you care, which kinds does C++11 support, and how are we looking at rounding it out in C++1y?
Task and data parallelism: What’s the difference between task parallelism and data parallelism, which kind of of hardware does each allow you to exploit, and why?
Work stealing: What’s the difference between thread pools and work stealing, what are the major flavors of work stealing, which of these (if any) does C++11 already support and is already shipping on some advanced commercial C++ compilers today (this answer will likely surprise you), and what needs to be done in the next round for a complete state-of-the-art parallelism story in C++1y?

The answers all matter to you – even the ones not yet in the C++ standard – because they are real, available in shipping products, and affect how you design your software today.

This will be a broad and deep dive. At C++ and Beyond 2011, the attendees (audience!) included some of the world’s leading experts on parallelism and compilers. At these sessions of C&B 2012, I expect anyone who wasn’t personally at the SG1 meeting this month, even world-class experts, will learn something new in these talks. I certainly did, and that’s why I’m motivated to turn the information into talks and share. This isn’t just cool stuff – it’s important and useful in production code today.

I hope to see many of you at C&B 2012. I’m excited about these topics, and about Scott’s and Andrei’s new material – you just can’t get this stuff anywhere else.

Asheville is going to be blast. I can’t wait.

Herb

P.S.: I haven’t seen this much attention and investment in C++ since last century – C++ conferences at record numbers, C++ compiler investments by the biggest companies in the industry (e.g., Clang), and much more that we’ve seen already…

… and a little bird tells me there’s a lot more major C++ news coming this year. Stay tuned, and fasten your seat belts. 2012 ain’t done yet, not by a long shot, and I’ll be able to say more about C++ as a whole (besides the specific topics mentioned above) for the first time at C&B in August. I hope to see you there.

FYI, C&B is already over 60% full, and early bird registration ends this Friday, June 1 – so register today.

Reminder: Win8 tablet apps and VC++

Day-before reminder: If you are interested in tablet apps using VC++, check out the livestream starting at 9am U.S. Pacific time tomorrow, or come back later to watch the talks on demand.

Sutter’s Mill

Want to know how to write cool tablet apps using Visual C++?

On May 18, Microsoft is hosting a one-day free technical event for developers who want to write Metro apps for Windows 8 using Visual C++. I’m giving the opening talk, and the rest of the day is full of useful technical information on everything from XAML and DirectX to networking and VC++ compiler flags.

Please note: This event is Windows-specific, and talks will use both portable ISO C++ as well as Visual C++-specific libraries and compiler extensions; for brevity these are being referred to as "C++" to highlight to a Microsoft-specific audience that that this day is about Visual C++, not about Visual C# or Visual Basic or JavaScript.

From the page:

Join the Microsoft Visual C++ and Windows teams in Redmond on May 18, 2012 for a free, all-day event focused on building Windows 8 Metro…

View original post 275 more words