Effective Concurrency: Going Superlinear

The latest Effective Concurrency column, "Going Superlinear", just went live on DDJ’s site, and will also appear in the print magazine. From the article:

We spend most of our scalability lives inside a triangular box, shown in Figure 1. It reminds me of the early days of flight: We try to lift ourselves away from the rough ground of zero scalability and fly as close as possible to the cloud ceiling of linear speedup. Normally, the Holy Grail of parallel scalability is to linearly use P processors or cores to complete some work almost P times faster, up to some hopefully high number of cores before our code becomes bound on memory or I/O or something else that means diminishing returns for adding more cores. As Figure 1 illustrates, the traditional shape of our "success" curve lies inside the triangle.

Sometimes, however, we can equip our performance plane with extra tools and safely break through the linear ceiling into the superlinear stratosphere. So the question is: "Under what circumstances can we use P cores to do work more than P times faster?" There are two main ways to enter that rarefied realm:

Do disproportionately less work.

Harness disproportionately more resources.

This month and next, we’ll consider situations and techniques that fall into one or both of these categories. …
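
One way to picture "do disproportionately less work" is a search that stops as soon as any worker finds the target. The sketch below is only a step-counting model, not real threads and not code from the column. It assumes the workers scan equal contiguous chunks in lockstep, and the names sequential_steps and parallel_steps are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements a sequential left-to-right search visits before it
// finds `target`. Returns data.size() if the target is absent.
std::size_t sequential_steps(const std::vector<int>& data, int target) {
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] == target) return i + 1;
    return data.size();
}

// Modeled parallel search (an assumption, not real threading): split the data
// into `workers` contiguous chunks, let every worker scan its own chunk one
// element per step, and stop everyone as soon as any worker finds the target.
// Returns the number of lockstep steps taken, i.e. the modeled elapsed time.
std::size_t parallel_steps(const std::vector<int>& data, int target,
                           std::size_t workers) {
    assert(workers > 0);
    const std::size_t chunk = (data.size() + workers - 1) / workers; // ceiling
    for (std::size_t step = 0; step < chunk; ++step) {
        for (std::size_t w = 0; w < workers; ++w) {
            const std::size_t i = w * chunk + step;
            if (i < data.size() && data[i] == target) return step + 1;
        }
    }
    return chunk; // target absent: every worker scanned its whole chunk
}
```

With 1,000 distinct values, the target at index 900, and 4 workers, the sequential scan takes 901 steps while the modeled parallel scan takes 151, a ratio of roughly 6 on 4 workers. If the target sits at the very front of the array, the same model shows no speedup at all, so the effect depends on where the answer happens to be.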

I hope you enjoy it.

Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):

July 2007	The Pillars of Concurrency
August 2007	How Much Scalability Do You Have or Need?
September 2007	Use Critical Sections (Preferably Locks) to Eliminate Races
October 2007	Apply Critical Sections Consistently
November 2007	Avoid Calling Unknown Code While Inside a Critical Section
December 2007	Use Lock Hierarchies to Avoid Deadlock
January 2008	Break Amdahl’s Law!
February 2008	Going Superlinear

What Not To Code

At Stroustrup & Sutter on C++ this March, one of my sessions will be on "What Not To Code" (submission link). The premise is to try something new I haven’t done before: A session dedicated to making over code nominated by you, the public, in the few weeks before the talk.

In return for your submission, here’s what I’ll do if your entry is selected:

Full notes: Whether or not you attend the S&S event, I’ll send you the full talk handouts — not only for your submission, but for all selected submissions — as your thank-you for participating.
Participation: If you’re there in the audience, you can participate in the makeover on stage (if you want to, your choice). Oh, and there might be an additional live prize.

I’ll select as many of the submissions as I can, and analyze/critique/improve them during the session, including talking about tradeoffs and alternatives that can make the code clearer, faster, simpler, and/or safer. As the talk blurb concludes:

… you will refresh your sense of elegance and beauty, not to mention old-fashioned performance and robustness and maintainability, so often lacking in the broken code littering today’s bleak postmodern corporate landscape.

So if you’ve seen a friend’s or coworker’s (or your own) code that could use a makeover, please nominate it anytime here. Thanks!

Many Books

When I walk into a Chapters or a Borders, seeing the many shelves of books often recalls the ancient writer’s words about quality vs. quantity, circa 1000 BC:

"To the making of many books there is no end."

So true. Yet that observation predates the printing press… and netnews… and now RSS.

(Yes, I’ve been thinking of managing-down my Google Reader subscriptions again…)

Stroustrup & Sutter on C++: March 3-4, 2008, in Santa Clara, CA, USA

I’m pleased to announce that Bjarne and I are going to have another two-day event co-located with SD West in Santa Clara, California, this March. Most of the talks are new ones we’ve never given publicly before, along with updated classics that people liked the best in the past. This year, three of my four talks have a strong emphasis on concurrency: making your application manycore-scalable, safe locking, and bleeding-fast lock-free coding.

SD graciously let me publish an extra-discount code for readers of this blog. Here it is… if you register before Feb 8, use the following code to get the early bird price (up to $300 off) plus an extra $50 off any package:

Discount code: 8WSUT (expires Feb 8)

Here’s a link to the conference site, and their summary:

Stroustrup & Sutter #4

Back with brand new and freshly updated content!

The preeminent super session on the C++ language is back at SD West—and full of new and updated material! Join Bjarne Stroustrup, C++ creator and original implementer, and Herb Sutter, C++ and concurrency guru, for two jam-packed days of new and completely updated courses. The seminar is filled with instructive, revealing and highly pragmatic material, and is structured with both talks and panels—not to mention liberal break times, so that the instructors and attendees can mix, eat and chat together.

In addition to lots of information you can use today, Herb and Bjarne will also reveal important, forward-looking information about what’s coming in the next version of the C++ Standard, C++0x, and related efforts. As key designers of several of the new core features, their personal experiences are invaluable.

Finally, for convenience, below is a cut-and-paste of the session topics and abstracts.

I look forward to seeing many of you in Santa Clara! Best wishes,

Herb

DAY 1: Monday, March 3

C++0x Overview

Bjarne Stroustrup

We now know the expected contents of the next C++ Standard, which is targeted to be feature-complete in mid-2008 with final text in 2009. This presentation articulates the main principles of the design of C++0x, outlines the ISO C++ standards process, summarizes the new features and libraries, and gives key examples using new features. Major features, such as concepts, the memory model, and major libraries (such as threads and regular expression matching) are covered by other tutorials, so they will be only briefly mentioned here. The focus of this presentation is the various "minor" features, such as the unified initializer syntax (including variable length initializer lists), decltype and the new form of auto, template aliases, nullptr, generalized constant expressions, "strong" enumerations, the new for statement, static assertions, and rvalue references. But a language is far more than a mere list of features: My aim is to show how these features fit together and fit with C++98 features to better support programming techniques. As ever, the ultimate aim of this language design is to allow clearer expression of real-world ideas, leading to better-performing and easier-to-maintain code. Even the "minor features" can significantly affect your programming style.

What Not to Code: Avoiding Bad Design Choices and Worse Implementations

Herb Sutter

Our reality show premise is simple: Friends and coworkers nominate code that they consider poorly "fashioned" and ask the show to make over the victim. Our Code Police swoop in with a deal: We’ll provide up to 15 minutes of public assistance in the form of amusing analyses and insightful improvements, discussing tradeoffs and alternatives that can make the code clearer, faster, simpler, and/or safer… but only on the condition that they allow our instructor (that’s Herb) to ruthlessly critique their existing code, and in some cases throw it out altogether, in front of a live studio audience (that’s you). As members of that interactive audience, you will refresh your sense of elegance and beauty, not to mention old-fashioned performance and robustness and maintainability, so often lacking in the broken code littering today’s bleak postmodern corporate landscape.

How to Design Good Interfaces

Bjarne Stroustrup

So: We have classes, derived classes, virtual bases, templates, const, overloading, exceptions, and a host of other useful language features. How do we use them to produce well-performing, maintainable code? All too often we get seduced into using powerful language features to write clever (i.e., complicated) code rather than to simplify our interfaces and to make the organization of our code easier to understand. This presentation is a tour of the most useful C++ features from the point of view of how they can be used to express the structure of code and to define interfaces that serve basic needs such as flexibility, early error detection, acceptable compile time, performance, decent error reporting, and maintainability.

How to Migrate C++ Code to the Manycore "Free Lunch"

Herb Sutter

For decades, we enjoyed the "free lunch" of seeing existing applications naturally run faster on new hardware with a faster single CPU core. Computation power is still growing, but in a fundamentally different direction — more and more cores. We can regain the free lunch, but only by building applications in new ways that correctly apply concurrency and parallelism to express lots of latent concurrency that can scale well to a given number of cores but avoids penalizing the application when running on older hardware with one or few cores. This talk addresses the question of how to design new code, and how to migrate existing code, to be multicore/manycore enabled. We will cover best practices for finding and exploiting parallelism in algorithms and data structures, avoiding data structures that harm concurrency, using thread pools and background tasks effectively, and ways to cheat (if not entirely avoid) the specter of Amdahl’s Law. Most code examples will be illustrated using draft standard C++0x syntax, but can be directly translated to popular platforms and concurrency libraries, including Linux, Windows, .NET, and pthreads.

Grill the Experts: Ask Us Anything!

Bjarne Stroustrup and Herb Sutter

This is your opportunity to get "thought leader" answers to your favorite C++ questions! We strongly encourage you to submit your questions in advance, preferably by email or in writing at the beginning of the seminar. Audience questions will also be taken from the floor. Both instructors will answer as many questions as time permits.

DAY 2: Tuesday, March 4

"Best of Stroustrup & Sutter": Concepts and Generic Programming in C++0x

(Update of talk voted “Most Informative” at S&S 2007)
Bjarne Stroustrup

An updated version of the talk voted "Most Informative" at S&S 2007: C++ templates are immensely flexible and the basis of most modern C++ high-performance programming techniques and of many elegant library designs. They are the key language feature behind the standard library’s algorithms and containers: the STL. However, they can also be tricky to use, cause spectacularly bad error messages when misused, and sometimes require unreasonable amounts of code to express apparently simple ideas. C++0x will address these issues directly, and the key to resolving the problems with templates without loss of flexibility or loss of performance is "concepts." Concepts provide a type system for C++ types and for combinations of C++ types and values. Thus, we are able to provide what feels a lot like conventional type checking for template arguments (including simple and elegant overloading based on template arguments). This presentation explains the notion of concepts and shows how to use concepts to write clearer and more robust generic code using templates. People who can’t wait for C++0x before trying out concepts (and other new C++0x features related to generic programming) can try the proof-of-concept implementation, ConceptGCC.

Safe Locking: Best Practices to Eliminate Race Conditions

Herb Sutter

From manycore to web services, writing highly concurrent software is increasingly becoming a mainstream requirement. But how can we best manage shared state, specifically objects in shared memory? We need to chart a safe course between the Scylla of data corruption due to race conditions on one side, and the Charybdis of excessive contention and even deadlock or livelock on the other side. This talk covers these important problems, as well as the simplicity vs. scalability tradeoff and the composability conundrum. It focuses on solutions, from basics like scoped locks through correct use of lock hierarchies, disciplines to associate data with locks, essential guidelines for writing lock-safe code, and other important best practices. Most code examples will be illustrated using draft standard C++0x syntax, but can be directly translated to popular platforms and concurrency libraries, including Linux, Windows, .NET, and pthreads.

Q&A: C++ Design and Evolution

Bjarne Stroustrup

This is a unique opportunity for a "fireside interview" with the creator of C++, moderated by Herb Sutter. After a brief introduction and opening thoughts, Bjarne Stroustrup will take all questions and share thoughtful observations on topics ranging from essential trends affecting software development across languages today, to observations on the strengths and applicability of existing and new languages, to the role of C++ in the 21st century, and more. Attendees are strongly encouraged to submit questions in advance.

Lock-Free Programming in C++—or How to Juggle Razor Blades

Herb Sutter

Concurrent programs increasingly face pressure to avoid locks altogether. This talk focuses on techniques we can sometimes apply to avoid the need for locking and its difficulties. We will cover many effective best practices, from ways to avoid or better manage shared state through to effective use of atomic operations for lock-free coding, including patterns like lock-free mailboxes, low-contention lazy initialization, internally versioned objects, and more. Most code examples will be illustrated using draft standard C++0x syntax and the C++0x memory model, but can be directly translated to popular platforms and concurrency libraries, including Linux, Windows, .NET, and pthreads.

Discussion on Questions Raised During the Seminar

Herb Sutter and Bjarne Stroustrup

This panel is set aside for follow-up comments and discussion on issues that are raised during the seminar. During the other talks and panels, or during between-session chats, questions often come up that the instructors want to research. Some of the resulting information will be of general interest, and this final panel provides the needed convenient opportunity to promulgate it to everyone.

Newton on Tact

"Tact is the knack of making a point without making an enemy."

Effective Concurrency: Break Amdahl’s Law!

The latest Effective Concurrency column, "Break Amdahl’s Law!", just went live on DDJ’s site, and will also appear in the print magazine. From the article:

Back in 1967, Gene Amdahl famously pointed out what seemed like a fundamental limit to how fast you can make your concurrent code: Some amount of a program’s processing is fully "O(N)" parallelizable (call this portion p), and only that portion can scale directly on machines having more and more processor cores. The rest of the program’s work is "O(1)" sequential (s). [1,2] Assuming perfect use of all available cores and no parallelization overhead, Amdahl’s Law says that the best possible speedup of that program workload on a machine with N cores is given by:

Speedup = (s + p) / (s + p/N)

Note that, as N increases to infinity, the best speedup we can ever get is (s+p)/s. Figure 1 illustrates why a program that is half scalably parallelizable and half not won’t scale beyond a factor of two even given infinite cores. Some people find this depressing. They conclude that Amdahl’s Law means there’s no point in trying to write multicore- and manycore-exploiting applications except for a few "embarrassingly parallel" patterns and problem domains with essentially no sequential work at all.
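
The arithmetic is easy to check with a small sketch. It uses the usual statement of Amdahl’s Law, speedup = (s + p) / (s + p/N), with s and p as defined in the excerpt. The function names amdahl_speedup and amdahl_limit are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Best-case speedup under Amdahl's Law for sequential work `s`,
// perfectly parallelizable work `p`, and `n` cores.
double amdahl_speedup(double s, double p, double n) {
    assert(s >= 0 && p >= 0 && s + p > 0 && n >= 1);
    return (s + p) / (s + p / n);
}

// The ceiling the speedup approaches as the core count grows without bound.
// Only meaningful when there is some sequential work (s > 0).
double amdahl_limit(double s, double p) {
    assert(s > 0 && p >= 0);
    return (s + p) / s;
}
```

For the half-and-half program described above, amdahl_speedup(0.5, 0.5, 2) is about 1.33, and amdahl_limit(0.5, 0.5) is exactly 2: however many cores are added, the sequential half still takes half the original time.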

Fortunately, they’re wrong. If Amdahl’s Game is rigged, well then, to paraphrase a line from the movie WarGames: The only way to win is not to play. …

I hope you enjoy it.

Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):

July 2007	The Pillars of Concurrency
August 2007	How Much Scalability Do You Have or Need?
September 2007	Use Critical Sections (Preferably Locks) to Eliminate Races
October 2007	Apply Critical Sections Consistently
November 2007	Avoid Calling Unknown Code While Inside a Critical Section
December 2007	Use Lock Hierarchies to Avoid Deadlock

GotW #88: A Candidate For the “Most Important const”

A friend recently asked me whether Example 1 below is legal, and if so what it means. It led to a nice discussion I thought I’d post here. Since it was in close to GotW style already, I thought I’d do another honorary one after all these years… no, I have not made a New Year’s Resolution to resume writing regular GotWs. :-)

JG Questions

Q1: Is the following code legal C++?

// Example 1

#include <iostream>
#include <string>
using namespace std;

string f() { return "abc"; }

void g() {
    const string& s = f();
    cout << s << endl; // can we still use the "temporary" object?
}

A1: Yes.

This is a C++ feature… the code is valid and does exactly what it appears to do.

Normally, a temporary object lasts only until the end of the full expression in which it appears. However, C++ deliberately specifies that binding a temporary object to a reference to const on the stack lengthens the lifetime of the temporary to the lifetime of the reference itself, and thus avoids what would otherwise be a common dangling-reference error. In the example above, the temporary returned by f() lives until the closing curly brace. (Note this only applies to stack-based references. It doesn’t work for references that are members of objects.)

Does this work in practice? Yes, it works on all compilers I tried (except Digital Mars 8.50, so I sent a bug report to Walter to rattle his cage :-) and he quickly fixed it for the Digital Mars 8.51.0 beta).
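
To watch the lifetime extension happen, a rough sketch like the following records construction and destruction events in a log. The Traced type, event_log(), and the two test functions are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: a global log of lifetime events, in order.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type that records when it is created, copied, and destroyed.
struct Traced {
    Traced() { event_log().push_back("ctor"); }
    Traced(const Traced&) { event_log().push_back("copy"); }
    ~Traced() { event_log().push_back("dtor"); }
};

Traced make_traced() { return Traced(); }

void bound_to_const_ref() {
    const Traced& t = make_traced(); // lifetime extended to match t
    (void)t;
    event_log().push_back("end of scope");
}   // the temporary is destroyed here, after "end of scope" was logged

void not_bound() {
    make_traced();                   // destroyed at the end of this full expression
    event_log().push_back("end of scope");
}
```

After calling bound_to_const_ref(), the last two log entries are "end of scope" followed by "dtor". After calling not_bound(), the order is reversed: "dtor" comes before "end of scope".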

Q2: What if we take out the const… is Example 2 still legal C++?

// Example 2

string f() { return "abc"; }

void g() {
    string& s = f(); // still legal?
    cout << s << endl;
}

A2: No.

The "const" is important. The first line is an error and the code won’t compile portably with this reference to non-const, because f() returns a temporary object (i.e., rvalue) and only lvalues can be bound to references to non-const.

Note: Visual C++ does allow Example 2 but emits a "nonstandard extension used" warning by default. A conforming C++ compiler can always allow otherwise-illegal C++ code to compile and give it some meaning — hey, it could choose to allow inline COBOL if some kooky compiler writer was willing to implement that extension, maybe after a few too many Tequilas. For some kinds of extensions the C++ standard requires that the compiler at least emit some diagnostic to say that the code isn’t valid ISO C++, as this compiler does.
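
If the goal in Example 2 was a string that can be modified, there are portable ways to write it. The sketch below shows two options under the same f() as the examples. The wrapper functions via_const_ref and via_value are hypothetical names used only so the snippet can be compiled and checked.

```cpp
#include <cassert>
#include <string>

std::string f() { return "abc"; }

// Option 1: bind to a reference to const, as in Example 1.
// The temporary stays alive for the whole function body, but is read-only.
std::string::size_type via_const_ref() {
    const std::string& s = f();
    return s.size();
}

// Option 2: take the result by value. The local object is ours to modify.
std::string via_value() {
    std::string s = f();
    s += "def";
    return s;
}
```

Taking the result by value is usually the simplest choice when the string needs to change. The reference to const is enough when it is only read.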

I once heard Andrei Alexandrescu give a talk on ScopeGuard (invented by Petru Marginean) where he used this C++ feature and called it "the most important const I ever wrote." And this brings us to the Guru Question, which highlights the additional subtlety that Andrei’s code deftly leveraged…

Guru Question

Q3: When the reference goes out of scope, which destructor gets called?

A3: The same destructor that would be called for the temporary object. It’s just being delayed.

Corollary: You can take a const Base& to a Derived temporary and it will be destroyed without virtual dispatch on the destructor call.

This is nifty. Consider the following code:

// Example 3

Derived factory(); // construct a Derived object

void g() {
    const Base& b = factory(); // calls Derived::Derived here
    // … use b …
} // calls Derived::~Derived directly here — not Base::~Base + virtual dispatch!

Does this work in practice on real compilers? Yes: Every compiler I have access to calls the correct Derived destructor, including even ancient Borland 5.5 and Visual C++ 6.0 (and Digital Mars, though DM calls the destructor at the wrong time, as noted above).
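
Example 3 can be filled out into something that compiles and can be checked. In this sketch the bodies of Base and Derived, the dtor_log() helper, and the extra log entry in g() are assumptions added for illustration. Only the shape of factory() and g() comes from the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: records which destructors ran, in order.
std::vector<std::string>& dtor_log() {
    static std::vector<std::string> log;
    return log;
}

struct Base {
    ~Base() { dtor_log().push_back("~Base"); }       // deliberately non-virtual
};

struct Derived : Base {
    ~Derived() { dtor_log().push_back("~Derived"); }
};

Derived factory() { return Derived(); }  // construct a Derived object

void g() {
    const Base& b = factory();  // the temporary is a Derived, seen through a Base&
    (void)b;
    dtor_log().push_back("end of g");
}   // the temporary is destroyed as a Derived: ~Derived runs, then ~Base
```

After a call to g(), the log ends with "end of g", "~Derived", "~Base". The compiler knows the temporary's real type at the point where it was created, so no virtual destructor is needed to destroy it correctly.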

Andrei leverages this subtlety (of course) in his ScopeGuard implementation to avoid making the implementation classes’ destructors virtual at all, which is okay in that case because those classes otherwise have no need for one.
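
The idea can be sketched in a few lines. This is a heavily simplified illustration of the technique, not the actual ScopeGuard code. All names here (GuardBase, GuardImpl, make_guard, Guard, Increment) are hypothetical, and the real implementation handles more cases, such as actions that take arguments.

```cpp
#include <cassert>

// Base class with a non-virtual destructor. Clients only ever see it through
// a reference to const bound to a temporary of the derived type.
class GuardBase {
public:
    void dismiss() const { dismissed_ = true; }  // cancel the pending action
protected:
    GuardBase() : dismissed_(false) {}
    // Copying hands responsibility to the copy, so the action runs at most
    // once even if the compiler copies the returned temporary.
    GuardBase(const GuardBase& other) : dismissed_(other.dismissed_) {
        other.dismiss();
    }
    ~GuardBase() {}                              // non-virtual on purpose
    mutable bool dismissed_;
private:
    GuardBase& operator=(const GuardBase&);      // not assignable
};

template <typename F>
class GuardImpl : public GuardBase {
public:
    explicit GuardImpl(F f) : f_(f) {}
    ~GuardImpl() { if (!dismissed_) f_(); }      // run the action unless dismissed
private:
    F f_;
};

template <typename F>
GuardImpl<F> make_guard(F f) { return GuardImpl<F>(f); }

// The reference to const keeps the GuardImpl<F> temporary alive until the end
// of the enclosing scope, where its own destructor is called directly.
typedef const GuardBase& Guard;

// Hypothetical cleanup action used only to demonstrate the guard.
struct Increment {
    int* target;
    void operator()() const { ++*target; }
};
```

Writing Guard g = make_guard(action); inside a block runs the action when the block exits, and g.dismiss() cancels it. The caller never names GuardImpl<F>, yet the right destructor still runs without any virtual function.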

Updates:

08.01.02 to emphasize the feature applies to stack-based references, and mention Walter’s fix for DM.
08.02.05 to clarify Petru Marginean invented ScopeGuard.