Effective Concurrency: Understanding Parallel Performance

Wow, DDJ just posted the previous one a few days ago, and already the next Effective Concurrency column is available: “Understanding Parallel Performance” just went live, and will also appear in the print magazine.

From the article:

Let’s say that we’ve slickly written our code to apply divide-and-conquer algorithms and concurrent data structures and parallel traversals and all our other cool tricks that make our code wonderfully scalable in theory. Question: How do we know how well we’ve actually succeeded? Do we really know, or did we just try a couple of tests on a quad-core that looked reasonable and call it good? What key factors must we measure to understand our code’s performance, and answer not only whether our code scales, but quantify how well under different circumstances and workloads? What costs of concurrency do we have to take into account?

This month, I’ll summarize some key issues we need to keep in mind to accurately analyze the real performance of our parallel code. I’ll list some basic considerations, and then some common costs. Next month, I have a treat in store: We’ll take some real code and apply these techniques to analyze its performance in detail as we successively apply a number of optimizations and measure how much each one actually buys us, under what conditions and in what directions, and why.

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):

August 2007: The Pillars of Concurrency

September 2007: How Much Scalability Do You Have or Need?

October 2007: Use Critical Sections (Preferably Locks) to Eliminate Races

November 2007: Apply Critical Sections Consistently

December 2007: Avoid Calling Unknown Code While Inside a Critical Section

January 2007: Use Lock Hierarchies to Avoid Deadlock

February 2008: Break Amdahl’s Law!

March 2008: Going Superlinear

April 2008: Super Linearity and the Bigger Machine

May 2008: Interrupt Politely

June 2008: Maximize Locality, Minimize Contention

July 2008: Choose Concurrency-Friendly Data Structures

August 2008: The Many Faces of Deadlock

September 2008: Lock-Free Code: A False Sense of Security

October 2008: Writing Lock-Free Code: A Corrected Queue

November 2008: Writing a Generalized Concurrent Queue

December 2008: Understanding Parallel Performance

Effective Concurrency: Writing a Generalized Concurrent Queue

The next Effective Concurrency column, “Writing a Generalized Concurrent Queue”, just went live on DDJ’s site, and also appears in the print magazine.

From the article:

Last month [1], I showed code for a lock-free queue that supported the limited case of exactly two threads—one producer, and one consumer. That’s useful, but maybe not as exciting now that our first rush of lock-free coding glee has worn off. This month, let’s tackle the general problem of supporting multiple producers and multiple consumers with as much concurrency as possible. The code in this article uses four main design techniques: …

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):

August 2007: The Pillars of Concurrency

September 2007: How Much Scalability Do You Have or Need?

October 2007: Use Critical Sections (Preferably Locks) to Eliminate Races

November 2007: Apply Critical Sections Consistently

December 2007: Avoid Calling Unknown Code While Inside a Critical Section

January 2007: Use Lock Hierarchies to Avoid Deadlock

February 2008: Break Amdahl’s Law!

March 2008: Going Superlinear

April 2008: Super Linearity and the Bigger Machine

May 2008: Interrupt Politely

June 2008: Maximize Locality, Minimize Contention

July 2008: Choose Concurrency-Friendly Data Structures

August 2008: The Many Faces of Deadlock

September 2008: Lock-Free Code: A False Sense of Security

October 2008: Writing Lock-Free Code: A Corrected Queue

November 2008: Writing a Generalized Concurrent Queue

September 2008 ISO C++ Standards Meeting: The Draft Has Landed, and a New Convener

The ISO C++ committee met in San Francisco, CA, on September 15-20. You can find the minutes here, including the votes to approve papers.

The most important thing the committee accomplished was this:

Complete C++0x draft published for international ballot

The biggest goal entering this meeting was to make C++0x feature-complete and stay on track to publish a complete public draft of C++0x for international review and comment — in ISO-speak, an official Committee Draft or CD. As I predicted in the summer, the committee achieved that at this meeting. Now the world will know the shape of C++0x in good detail. Here’s where to find it: The September C++0x working draft document is essentially the same as the September 2008 CD.

This is “it”, feature-complete C++0x, including the major feature of “concepts” which had its own extensive set of papers for language and library extensions — I’ll stop there, but there are still more concepts papers at the mailing page and some more still to come during the CD phase. (If you get the impression that concepts is a big feature, well, it is indeed easily the biggest addition we made in C++0x.)

What’s next? As I’ve mentioned before, we’re planning to have two rounds of international comment review. The first of two opportunities for national bodies to give their comments is now underway; the second round will probably be this time next year. The only changes expected to be made between that CD and the final International Standard are bug fixes and clarifications. It’s helpful to think of a CD as a feature-complete beta, and we’re on track to ship one more beta before the full release.

And a new convener

On a personal note, I’m very happy to see this accomplished at the last meeting during my convenership. I’ve now served as secretary and then convener (chair) of the ISO C++ committee for over 10 years, and my second three-year term as convener ended one week after the San Francisco meeting. A decade is enough; I decided not to volunteer for another term as chair.

As of a few weeks ago, P. J. Plauger is the new convener of ISO/IEC JTC1/SC22/WG21 (C++). Many of you will know P.J. (or Bill, as he’s known within the committee) from his long service to the C and C++ communities, including that he has been a past convener of the ISO C standards committee, past editor of the C/C++ Users Journal, the principal author of the Dinkumware implementation of the C++ standard library, and recipient of the 2004 Dr Dobb’s Journal Excellence in Programming Award, among various other qualifications and honors. He has been a regular participant at ISO C++ meetings for about as long as they’ve been held, and his long experience with both the technology and the ISO standards world will serve WG21 well.

I’m very happy to have been able to chair the committee during the development of C++0x. Now as we move from “develop mode” into “ship mode” it will be great to have his experienced hand guiding the committee through the final ISO process. Thanks for volunteering, Bill!

Stroustrup & Sutter on C++ 2008, Second Showing: October 30-31, 2008, in Boston, MA, USA

This spring at SD West in Santa Clara, Bjarne and I did a fresh-and-updated S&S event with lots of new material.

We don’t usually repeat the same material, but this time there’s been such demand that we agreed to do a repeat… four weeks from today, in Boston. More information and talk descriptions follow.

CONTENT ADVISORY

Again, usually our S&S events feature mostly new material, but this one is almost identical to the material we did in spring 2008 in Santa Clara. Bjarne is substituting one talk, and will present “C++ in Safety-Critical Systems” instead of his talk on C++’s design and evolution; and we’ll both be updating to the material to reflect the current state of the draft ISO C++0x standard. But otherwise it’ll be identical.

So:

If you missed our event this spring, here’s your second chance! It was our highest-rated S&S ever, and in the post-conference survey we asked the question “Would you recommend this course to a colleague?” and 100% said yes.
If you already attended this spring and came to all our sessions, you’ve seen nearly all this material already, but feel free to encourage a friend or colleague to attend who you think would benefit from the material.

The Talks

Wednesday, October 29, 2008

C++0x Overview (Bjarne Stroustrup)

Safe Locking: Best Practices to Eliminate Race Conditions (Herb Sutter)

How to Design Good Interfaces (Bjarne Stroustrup)

Lock-Free Programming in C++—or How to Juggle Razor Blades (Herb Sutter)

Grill the Experts: Ask Us Anything! (Bjarne Stroustrup & Herb Sutter)

Thursday, October 30, 2008

[“Best of Stroustrup & Sutter”] Update of talk voted “Most Informative” at S&S 2007: Concepts and Generic Programming in C++0x (Bjarne Stroustrup)

What Not to Code: Avoiding Bad Design Choices and Worse Implementations (Herb Sutter)

C++ in Safety-Critical Systems (Bjarne Stroustrup)

How to Migrate C++ Code to the Manycore “Free Lunch” (Herb Sutter)

Discussion on Questions Raised During the Seminar (Herb Sutter & Bjarne Stroustrup)

Registration

This two-day seminar is getting billing on two different conferences that are running at the same time in the Hynes Convention Center: SD Best Practices and the Embedded Systems Conference. S&S is technically part of both conferences, which means you can attend S&S via either one… either

and both ways will include options that include our two-day seminar.

I look forward to seeing many of you in Boston! Best wishes,

Herb

Data and Perspective

Even genuinely newsworthy topics can get distorted when commentators exaggerate or use data selectively. Here are two recent examples I noticed.

“This is the worst financial crisis since the Great Depression.” It’s true that it’s bad and even historic, and this sound bite correctly doesn’t actually claim it’s as bad as the Depression. I hope it doesn’t turn out to be in the same league as that; then, people were lining up at soup kitchens. For now, however, Apple is still on track to sell 10 million iPhone 3Gs this year, which says something.

“Yesterday [Monday, September 29, 2008] saw the worst single-day plunge in Dow Jones history.” “It’s a new Black Monday.” Well, these got my attention, because I remember Black Monday on October 19, 1987 very well. I was working in IT at a major bank, doing software application support for traders and related departments. When I went up to the trading floor that day, I immediately knew something was badly wrong because of the eerie sound as the elevator doors opened — a sound you never hear during trading hours, and certainly not from a room full of traders standing at their desks: silence.

Yesterday’s loss of 777 points was stunning as the largest single-day point loss in Dow history. But as a percentage loss that’s not even in the top 10 Bad Dow Days, all but two of which occurred before 1935. Those two since the Depression occurred on October 19 and 26, 1987, when the Dow lost 22.6% and then another 8% of its total value in single sessions, respectively #2 and #9 on All-Time Bad Dow Days list. For perspective, as of this writing the Dow is down 19.8% so far this entire year, and it surely hasn’t been a good year. Today’s crisis is already historic and could well get worse yet, of course, but some of us do remember some pretty bad ones in the past.

As good old Sam Clemens said (approximately), there are lies, darned lies, and statistics. Even when the statistics are true, always cross-check them for perspective. Even the best news and the worst news can be overstated, and viewing the same data from multiple angles helps ensure we understand it properly.

[Edited 9/30 to add: At 22.6%, Black Monday in 1987 was actually the worst Dow Day ever if you don’t count “reopening days” after unusual market closures when the markets catch up with events that happened while they were closed. So when was the all-time worst Dow Day ever? Perhaps surprisingly, the answer is not in 1929, though several of the all-time top 10 were in that year. Rather, it was December 12, 1914, when the Dow dropped 24.4% after the markets reopened after being closed entirely for over four months due to the outbreak of World War I. (The markets closed on July 30, two days after Austria-Hungary declared war and a day before Germany did.) That helps put in perspective just how bad October 1987 was, and of course that today’s crisis is also pretty bad even if it hasn’t beaten those prior records, yet.]

Ralph Johnson on Parallel Programming Patterns

A few days ago at UIUC, Ralph Johnson gave a very nice talk on “Parallel Programming Patterns.” It’s now online, and here’s the abstract:

Parallel programming is hard. One proposed solution is to provide a standard set of patterns. Learning the patterns would help people to become expert parallel programmers. The patterns would provide a vocabulary that would let programmers think about their programs at a higher level than the programming language. The patterns could steer programmers away from common errors and towards good design principles.

There have been a number of papers about parallel patterns, and one book Patterns for Parallel Programming. None of them have become popular. I think the problem is that parallel programming is diverse and requires more design expertise than traditional software design. Thus, parallel programming experts use more patterns than parallel programming expert. I’ll critique the existing patterns and explain what I think should be done to make a set of patterns that can be as effective for parallel programming as patterns have been for object-oriented design.

If Johnson’s name sounds familiar, it should: He’s one of the “Gang of Four” authors of the seminal book Design Patterns.

Recommended viewing.

Effective Concurrency Course: Sep 22-24, 2008

The first offering of the three-day Effective Concurrency course in May went very well. We’re doing it again later this month — this will be the last offering this year.

Here’s the brief information (more details below):

3-Day Seminar: Effective Concurrency

September 22-4, 2008
Bellevue, WA, USA
Developed and taught by Herb Sutter

This course covers the fundamental tools that software developers need to write effective concurrent software for both single-core and multi-core/many-core machines. To use concurrency effectively, we must identify and solve four key challenges:

Leverage the ability to perform and manage work asynchronously
Build applications that naturally run faster on new hardware having more and more cores
Manage shared objects in memory effectively to avoid races and deadlocks
Engineer specifically for high performance

This seminar will equip attendees to reason correctly about concurrency requirements and tradeoffs, to migrate existing code bases to be concurrency-enabled, and to achieve key success factors for a concurrent programming project. Most code examples in the course can be directly translated to popular platforms and concurrency libraries, including Linux, Windows, Java, .NET, pthreads, and the forthcoming ISO C++0x standard.

Here’s a summary of what we’ll cover during the three days:

Fundamentals

Define basic concurrency goals and requirements
Understand applications’ scalability needs
Key concurrency patterns

Isolation: Keep Work Separate

Running tasks in isolation and communicate via async messages
Integrating multiple messaging systems, including GUIs and sockets
Building responsive applications using background workers
Threads vs. thread pools

Scalability: Re-enable the Free Lunch

When and how to use more cores
Exploiting parallelism in algorithms
Exploiting parallelism in data structures
Breaking the scalability barrier

Consistency: Don’t Corrupt Shared State

The many pitfalls of locks–deadlock, convoys, etc.
Locking best practices
Reducing the need for locking shared data
Safe lock-free coding patterns
Avoiding the pitfalls of general lock-free coding
Races and race-related effects

Migrating Existing Code Bases to Use Concurrency

Near-Future Tools and Features

High Performance Concurrency

Machine architecture and concurrency
Costs of fundamental operations, including locks, context switches, and system calls
Memory and cache effects
Data structures that support and undermine concurrency
Enabling linear and superlinear scaling

I hope to get to meet some of you in the Seattle area!

Anon on Data

The adage, quoted again this week by Bruce Schneier:

The plural of “anecdote” is not “data.”

But lest we enshrine raw data as holy in itself, another perspective:

And the plural of “datum” is not “proof.”

Embedded Multicore Development Webinar with Lee, Reinders, and Truchard

Last month, I was privileged to be part of a panel in a webinar on Embedded Multicore Development moderated by Richard Nass, Editor-in-Chief of Embedded Systems Design, Embedded.com, and the Embedded Systems Conferences. It’s online and available on demand.

I say “privileged” especially because of the stature of the other panelists. These distinguished gentlemen were:

Edward A. Lee, Professor and Chair of the EE&CS department at U.C. Berkeley, and known among other things for “The Problem with Threads”.
James Reinders, the chief concurrency evangelist at Intel and author of the Threading Building Blocks book.
James Truchard, the co-founder, President and CEO of National Instruments.

It was a fun discussion, and I hope informative.

Effective Concurrency: Lock-Free Code — A False Sense of Security

DDJ posted the next Effective Concurrency column a couple of weeks earlier than usual. You can find it here: “Lock-Free Code: A False Sense of Security”, just went live on DDJ’s site, and also appears in the print magazine.

This is a special column in a way, because I rarely critique someone else’s published code. However, mere weeks ago DDJ itself published an article with fundamentally broken code that tried to show how to write a simplified lock-free queue. I corresponded with the author, Petru Marginean, and his main reviewer, Andrei Alexandrescu, to discuss the problems, and they have patched the code somewhat and added a disclaimer to the article. But the issues need to be addressed, and so Petru kindly let me vivisect his code in public in this column (and the next, not yet available, which will show how to do it right).

From the article:

[Lock-free code is] hard even for experts. It’s easy to write lock-free code that appears to work, but it’s very difficult to write lock-free code that is correct and performs well. Even good magazines and refereed journals have published a substantial amount of lock-free code that was actually broken in subtle ways and needed correction.

To illustrate, let’s dissect some peer-reviewed lock-free code that was published here in DDJ just two months ago …

I hope you enjoy it. (Note: Yes, the title is a riff on Tom Cargill’s classic article “Exception Handling: A False Sense of Security”, one of Scott Meyers’ five “Most Important C++ Non-Book Publications…Ever”.)

Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):

August 2007	The Pillars of Concurrency
September 2007	How Much Scalability Do You Have or Need?
October 2007	Use Critical Sections (Preferably Locks) to Eliminate Races
November 2007	Apply Critical Sections Consistently
December 2007	Avoid Calling Unknown Code While Inside a Critical Section
January 2007	Use Lock Hierarchies to Avoid Deadlock
February 2008	Break Amdahl’s Law!
March 2008	Going Superlinear
April 2008	Super Linearity and the Bigger Machine
May 2008	Interrupt Politely
June 2008	Maximize Locality, Minimize Contention
July 2008	Choose Concurrency-Friendly Data Structures
August 2008	The Many Faces of Deadlock
September 2008	“Lock-Free Code: A False Sense of Security”

Sutter’s Mill

Effective Concurrency: Understanding Parallel Performance

Effective Concurrency: Writing a Generalized Concurrent Queue

September 2008 ISO C++ Standards Meeting: The Draft Has Landed, and a New Convener