Effective Concurrency: Design for Manycore Systems

This month’s Effective Concurrency column, Design for Manycore Systems, is now live on DDJ’s website.

From the article:

Why worry about “manycore” today?

Dual- and quad-core computers are obviously here to stay for mainstream desktops and notebooks. But do we really need to think about "many-core" systems if we’re building a typical mainstream application right now? I find that, to many developers, "many-core" systems still feel fairly remote, and not an immediate issue to think about as they’re working on their current product.

This column is about why it’s time right now for most of us to think about systems with lots of cores. In short: Software is the (only) gating factor; as that gate falls, hardware parallelism is coming more and sooner than many people yet believe. …

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

Eliminate False Sharing (May 2009)

Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

The Power of “In Progress” (Jul 2009)

Design for Manycore Systems (Aug 2009)

Suggestions on improving C++ skills

Someone just asked me about getting more proficient in C++, and with their permission I thought I’d share the question and my answer in case it’s of broader interest to folks wanting to improve their C++ skills.

Here’s the question:

I need to take my C++ knowledge up a notch – or two. On a scale of 1-10 I’d consider my C++ knowledge a 5 and would like to get to 7-8 in the next 4, or so, months. As C++ is a large language my quandary is what subject areas, using what study materials, should I focus on to get myself to this next level.

Questions (assuming that I’m going to study approx 2 hrs/day for 4 months):

1. How would you suggest I structure such a course of study (subject areas, 5 of time for each) ?
1.1 If you were in my shoes, how would you proceed?
2. If I was going to study and hopefully know cold one of your books, which do you think I should select?

Here’s my response, which I didn’t limit to my own books. These are the first two “next books” books I would recommend to anyone who already knew the basics of the language and wanted to improve their proficiency with using C++ in production code:

If you’re not already familiar with Scott Meyers’ Effective C++, the current third edition is the plate to start (it’s substantially reworked since the first and second editions).

After that, I’d say the most important is C++ Coding Standards which I wrote with Andrei Alexandrescu. Not only does it cover the top 100 things we felt were important to say about using C++ in production code, and had its guidance peer-reviewed in advance by a who’s-who of the C++ community more than any other C++ book I’ve ever heard of, but it also includes references in each Item for where to find more in-depth treatment. Most Items are just one or two pages, so if you know that Item already you can use it as a refresher; and if you discover ones that you’re less familiar with or rusty on the details, it tells you where to go to find out more.

I’d say those are the two most important that I can suggest. Best wishes with your studies!

Trip Report: Exit Concepts, Final ISO C++ Draft in ~18 Months

A week ago, I attended the summer ISO C++ meeting in Frankfurt, Germany. The C++ committee made a lot of progress on addressing national body comments on the full committee draft published last year, and is well on the way to publishing a second and final CD this winter with a final draft international standard a year after that. To meet that schedule, the committee decided to defer a major feature, “concepts”, and not include it in this standard.

I’m surprised at some of the commentary I’ve been reading on the web about the deferral of concepts, including not just technical inaccuracies but also that their absence in C++0x is somehow a big deal for C++ as a language or is a major geopolitical firestorm. It may feel like a big deal because of the work that’s gone into concepts and the large number of pages of standardese that were needed to specify the feature, and there was a lot of discussion about important design details including heavy traffic on the committee reflectors this spring and early summer that probably made them look more controversial than they are.

But my opinion is that concepts’ presence or absence in C++0x just won’t make much difference to most users. Let me give some reasons why I feel that’s so, and maybe debunk some common misconceptions along the way, in the form of an unofficial FAQ. Note that I was never involved in the design of concepts and don’t speak for the designers, whose views might differ here and there; the following are observations from my point of view as a participant in the standards body (including in Frankfurt last week), and chair of the ISO C++ committee during the entire time concepts were being developed up to the meeting they were voted into the ISO C++0x working draft last fall.

Q: Bjarne Stroustrup is the creator of C++ and a primary designer of concepts. What’s his take?

A: Great question. See Bjarne’s coverage here, published earlier today in Dr. Dobb’s.

Q: Were concepts removed because of political reason X, Y, or Z?

A: No. Concepts were removed because of normal project management considerations. The basic constraint has been phrased in many ways, all of them hitting at the same underlying tension among time, scope, cost, and quality. Here are two examples: “You can pick what goes on the train, or you can pick when the train leaves the station, but you can’t pick both.” “You can have good, fast, and cheap – pick any two.”

Fundamentally, the committee members felt that concepts are desirable but many (though not all) believed that they were still in design mode and needed more “bake time” – a formal poll of committee members in Frankfurt put the expected time at between three and four years – to get up to the quality of a feature that belongs in an international standard. Therefore, the two major options, and the ones that people generally argued for, were to either:

Prioritize (bigger) scope. Wait for concepts, and accept a delay to the rest of the standard including features that are already finalized and ready to ship.
Prioritize (faster) schedule. Finish and ship the standard to get already-done features into users’ hands, and decouple concepts to continue work on them for a future standard.

The committee decided to prioritize schedule and ship C++0x sooner while still leaving the door wide open to also adopt concepts later.

Q: Isn’t removing concepts a lot of work? Concepts are used pervasively in the C++0x CD published last fall.

A: Yes, but it’s less work than you’d think. We won’t have a de-conceptized working draft for the post-meeting mailing, two weeks after the meeting, but should have one soon after that. (I would be remiss not to add: Thanks to our always-hardworking project editor, Pete Becker!)

Recall that, until last September, we had a complete draft of C++0x features specified entirely without the use of concepts. And many of the uses of concepts in the draft standard library were just replacements of preexisting non-concepts wording or implicit requirements (e.g., DefaultConstructible, MemberContainer), and will just revert to their previous wording or disappear back into implicitness.

The main C++0x feature that was always specified in terms of concepts was the new range-based for loop (e.g., for( x : collection ) ). I’ve seen some wonder what we’ll do about that, but it’s already done; the non-concepts text for that feature was already voted in at the same Frankfurt meeting.

Q: Wasn’t this C++0x’s one big feature?

A: No. Concepts would be great, but for most users, the presence or absence of concepts will make no difference to their experience with C++0x except for quality of error messages (see below).

C++0x is still a major revision with many new major features, including improvements to templates (e.g., variadic templates, template aliases). C++’s other major new features include lambda functions (which unfortunately for Java were rejected for Java 7), move semantics (aka rvalue references), a concurrency memory model, threading and atomics libraries, initializer lists, and more. Other handy new features include delegating constructors, inherited constructors, defaulted and deleted functions, explicit virtual overriding control, alignment support, static assertions, and more. Not to mention convenience features like range-based for loops and “auto” type inference that are small but will certainly get a lot of use (as they do in C# and Java).

A number of those features are available now or soon in real compilers, including Gnu gcc, Visual C++ 2010, and Intel C++.

Q: Aren’t concepts about adding major new expressive power to the language, and so enable major new kinds of programs or programming styles?

A: Not really. Concepts are almost entirely about getting better error messages.

Yes, concepts would add one truly new expressive capability to the language, namely the ability to overload on concepts. That’s inconvenient to simulate without language support. Concept-based overloading would have been used in a handful of places in the standard library. But that’s it.

By far the most visible benefit of concepts lies in clearer template error messages, including the ability to do separate checking – confirming that a template’s implementation type-checks correctly with all possible valid types without having to instantiate it, and confirming that a given type can be used to instantiate a template without knowing the template’s implementation.

Q: Weren’t concepts all about bringing templates into the 21st century? Where does this leave templates?

A: Templates are still king of the genericity hill, for all their faults in the area of opaque compiler messages.

It’s important to remember that, in 1994, C++ was the only major language whose type genericity capabilities were strong enough to create the Standard Template Library (STL). Today, 15 years later, that is still true; you can’t express the STL’s containers-algorithms design separation well, or at all, using generics facilities in Ada, Java, .NET, or any other significant commercial or research language that I know of; as we learned when doing STL.NET, you can do the containers well with other generics, but not the orthogonalization with algorithms that is the heart of STL design style. Concepts or not, that hasn’t changed.

So my personal view is that templates have been in the 21st century since about 1994. No other language has yet caught up to their expressive power. And C++0x is adding some further power to templates, in particular by adding variadic templates and template aliases, both of which will help to simplify template code.

Effective Concurrency: The Power of “In Progress”

This month’s Effective Concurrency column, The Power of “In Progress”, is now live on DDJ’s website.

From the article:

Don’t let a long-running operation take hostages. When some work that takes a long time to complete holds exclusive access to one or more popular shared resources, such as a thread or a mutex that controls access to some popular shared data, it can block other work that needs the same resource. Forcing that other work to wait its turn hurts both throughput and responsiveness.

…

Let’s look at the question from another angle, suggested by my colleague Terry Crowley: Instead of viewing partially-updated state with in-progress work as an ‘unfortunate’ special case to remember and recover from, what if we simply embrace it as the normal case? …

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

Eliminate False Sharing (May 2009)

Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

The Power of “In Progress” (Jul 2009)

Answering email about error handling in concurrent code

Someone emailed me today asking:

I’m writing because I’m somewhat conscious of what I would consider a rather large hole in the parallel programming literature.

… What if one or more of your tasks throws an exception? Should the thread that runs the task swallow it? Should the caught exceptions get stashed somewhere so that the "parent" thread can deal with them once the tasks are complete? (This is somewhat tricky currently in a language such as C++(98) where one cannot store an exception caught with the "catch(…)" construct). Perhaps all tasks should have a no-throw guarantee? Perhaps some kind of asynchronous error handlers might be installed, somewhat like POSIX signals? The options are many, but choosing a strategy is hard for those of us with little parallel programming experience.

I thought I’d share my response here:

That’s an excellent question. Someone asked that very question in Stockholm last month at my Effective Concurrency course, and my answer started out somewhat dismissive: "Well, it’s about the same as you do in sequential code, and all the same guarantees apply; nothrow/nofail is only for a few key functions used for commit/rollback operations, and you’d usually target the basic guarantee unless adding the strong guarantee comes along naturally for near-free. So it’s pretty much the same as always. Although, well, of course futures may transport exceptions across threads, but that’s still the same because they manifest on .get(). And of course for parallel loops you may get multiple concurrent exceptions from multiple concurrent loop bodies that get aggregated into a single exception; and then there’s the question of whether you start new loop bodies that haven’t started yet (usually no) but do you interrupt loop bodies that are in progress (probably not), and… oh, hmm, yeah, I guess it would be good to write an article about that."

So the above is now adding to my notes of things to write about. :-) Maybe some of that stream-of-consciousness may be helpful until I can get to writing it up in more detail.

I pointed him to Doug Lea’s Concurrent Programming in Java pages 161-176, "Dealing with Failure", adding that I haven’t read it in detail but the subtopics look right. Also Joe Duffy’s Concurrent Programming on Windows, pages 721-733.

If you know of a good standalone treatise focused on error handling in concurrent code, please mention it in the comments.

Effective Concurrency: Break Up and Interleave Work to Keep Threads Responsive

This month’s Effective Concurrency column, “Break Up and Interleave Work to Keep Threads Responsive”, is now live on DDJ’s website.

Sorry for the long title; suggestions welcome. I always try to word the title to make it (a) short, (b) active, and (c) advice, but sometimes I’ll settle for two of those, or just one, until a better suggestion comes along.

From the article:

What happens when this thread must remain responsive to new incoming messages that have to be handled quickly, even when we’re in the middle of servicing an earlier lower-priority message that may take a long time to process?

If all the messages must be handled on this same thread, then we have a problem. Fortunately, we also have two good solutions, both of which follow the same basic strategy: Somehow break apart the large piece of work to allow the thread to perform other work in between, interleaved between the chunks of the large item. Let’s consider the two major ways to implement that interleaving, and their respective tradeoffs in the areas of fairness and performance.

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

Eliminate False Sharing (May 2009)

Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

Truth In Spam

This afternoon I was just finishing up my next Effective Concurrency article (it’ll be up in a few days), when some spam email arrived. Just as my fingers’ auto-delete macro was about to fire, I noticed something odd about the name of the attachment and did a double-take:

Cool! There must be some kind of new truth-in-advertising laws for spammers.

Yes, I know that as programmers we could argue about naming all day long. We could point out that maybe “virusLoader.gif” or “exploit_exploit_muhaha.gif” would be a little better, and argue about the relative merits of camel case and underscores. But there’s no need; I think “runnable.gif” is short, clear, and definitely good enough. (Evidently someone else thought so too, and just shipped it.)

Dress Re-Hearsal?

An amusing hearse, seen on a neighborhood street:

Here’s a close-up of the license plate:

Made my morning.

VS2010 Beta 1 Now Available

For those of you who are interested in using or trying Microsoft development tools, I’m happy to report that Visual Studio 2010 Beta 1 is now available.

If you’re interested in:

concurrency and parallel computing, check out the new concurrency runtime (ConcRT) that implements efficient work stealing for scalable code, the Asynchronous Agents Library and the Parallel Patterns Library (PPL) for C++, and the Task Parallel Library (TPL) and Parallel LINQ (PLINQ) for .NET. And some nifty concurrency debugging and profiling tools, too.
C++0x features, check out the lambda functions (my fave), move-semantic rvalue references, the new auto, its good friend decltype, and the ever-helpful static_assert.
functional programming in VS, enjoy the F# programming language.

Remember this is just a beta and not intended for production use, but there’s a lot of cool stuff to play around with and it should run fine side-by-side with VS2008. Feedback via forums or filing a bug/suggestion is always appreciated. Enjoy!

Effective Concurrency: Eliminate False Sharing

This month’s Effective Concurrency column, “Eliminate False Sharing”, is now live on DDJ’s website.

People keep writing asking me about my previous mentions of false sharing, even debating whether it’s really a problem. So this month I decided to treat it in depth, including:

A compelling and realistic example where just changing a couple of lines to remove false sharing takes an algorithm from zero scaling to perfect scaling – even when many threads are merely doing reads. Hopefully after this nobody will argue that false sharing isn’t a problem. :-)
How your performance monitoring and analysis tools do and/or don’t help you uncover the problem, and how to use them effectively to identify the culprit. Short answer: CPU activity monitors aren’t very helpful, but cycles-per-instruction (CPI) and cache miss rate measurements attributed to specific lines of source code are your friend.
The two ways to correct the code: Reduce the frequency of writes to the too-popular cache line, or add padding to move other data off the line.
Reusable code in C++ and C#, and a note about Java, that you can use to use padding (and alignment if available) to put frequently-updated objects on their own cache lines.

From the article:

In two previous articles I pointed out the performance issue of false sharing (aka cache line ping-ponging), where threads use different objects but those objects happen to be close enough in memory that they fall on the same cache line, and the cache system treats them as a single lump that is effectively protected by a hardware write lock that only one core can hold at a time. … It’s easy to see why the problem arises when multiple cores are writing to different parts of the same cache line… In practice, however, it can be even more common to encounter a reader thread using what it thinks is read-only data still getting throttled by a writer thread updating a different but nearby memory location…

A number of readers have asked for more information and examples on where false sharing arises and how to deal with it. … This month, let’s consider a concrete example that shows an algorithm in extremis due to false sharing distress, how to use tools to analyze the problem, and the two coding techniques we can use to eliminate false sharing trouble. …

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

“Use Thread Pools Correctly: Keep Tasks Short and Nonblocking” (Apr 2009)

“Eliminate False Sharing” (May 2009)