Effective Concurrency Course: Sep 22-24, 2008

The first offering of the three-day Effective Concurrency course in May went very well. We’re doing it again later this month — this will be the last offering this year.

Here’s the brief information (more details below):

3-Day Seminar: Effective Concurrency

September 22-24, 2008
Bellevue, WA, USA
Developed and taught by Herb Sutter

This course covers the fundamental tools that software developers need to write effective concurrent software for both single-core and multi-core/many-core machines. To use concurrency effectively, we must identify and solve four key challenges:

  • Leverage the ability to perform and manage work asynchronously
  • Build applications that naturally run faster on new hardware having more and more cores
  • Manage shared objects in memory effectively to avoid races and deadlocks
  • Engineer specifically for high performance

This seminar will equip attendees to reason correctly about concurrency requirements and tradeoffs, to migrate existing code bases to be concurrency-enabled, and to achieve key success factors for a concurrent programming project. Most code examples in the course can be directly translated to popular platforms and concurrency libraries, including Linux, Windows, Java, .NET, pthreads, and the forthcoming ISO C++0x standard.

Here’s a summary of what we’ll cover during the three days:

Fundamentals

  • Define basic concurrency goals and requirements
  • Understand applications’ scalability needs
  • Key concurrency patterns

Isolation: Keep Work Separate

  • Running tasks in isolation and communicating via async messages
  • Integrating multiple messaging systems, including GUIs and sockets
  • Building responsive applications using background workers
  • Threads vs. thread pools

Scalability: Re-enable the Free Lunch

  • When and how to use more cores 
  • Exploiting parallelism in algorithms 
  • Exploiting parallelism in data structures 
  • Breaking the scalability barrier

Consistency: Don’t Corrupt Shared State

  • The many pitfalls of locks: deadlock, convoys, etc.
  • Locking best practices
  • Reducing the need for locking shared data
  • Safe lock-free coding patterns
  • Avoiding the pitfalls of general lock-free coding
  • Races and race-related effects

Migrating Existing Code Bases to Use Concurrency

Near-Future Tools and Features

High Performance Concurrency

  • Machine architecture and concurrency
  • Costs of fundamental operations, including locks, context switches, and system calls
  • Memory and cache effects
  • Data structures that support and undermine concurrency
  • Enabling linear and superlinear scaling

I hope to get to meet some of you in the Seattle area!

Embedded Multicore Development Webinar with Lee, Reinders, and Truchard

Last month, I was privileged to be part of a panel in a webinar on Embedded Multicore Development moderated by Richard Nass, Editor-in-Chief of Embedded Systems Design, Embedded.com, and the Embedded Systems Conferences. It’s online and available on demand.

I say “privileged” especially because of the stature of the other panelists: Lee, Reinders, and Truchard.

It was a fun discussion, and I hope informative.

Effective Concurrency: Lock-Free Code — A False Sense of Security

DDJ posted the next Effective Concurrency column a couple of weeks earlier than usual: “Lock-Free Code: A False Sense of Security” just went live on DDJ’s site, and also appears in the print magazine.
 
This is a special column in a way, because I rarely critique someone else’s published code. However, mere weeks ago DDJ itself published an article with fundamentally broken code that tried to show how to write a simplified lock-free queue. I corresponded with the author, Petru Marginean, and his main reviewer, Andrei Alexandrescu, to discuss the problems, and they have patched the code somewhat and added a disclaimer to the article. But the issues need to be addressed, and so Petru kindly let me vivisect his code in public in this column (and the next, not yet available, which will show how to do it right).
 
From the article:

[Lock-free code is] hard even for experts. It’s easy to write lock-free code that appears to work, but it’s very difficult to write lock-free code that is correct and performs well. Even good magazines and refereed journals have published a substantial amount of lock-free code that was actually broken in subtle ways and needed correction.

To illustrate, let’s dissect some peer-reviewed lock-free code that was published here in DDJ just two months ago …

I hope you enjoy it. (Note: Yes, the title is a riff on Tom Cargill’s classic article “Exception Handling: A False Sense of Security”, one of Scott Meyers’ five “Most Important C++ Non-Book Publications…Ever”.)
 
Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):
August 2007 The Pillars of Concurrency
September 2007 How Much Scalability Do You Have or Need?
October 2007 Use Critical Sections (Preferably Locks) to Eliminate Races
November 2007 Apply Critical Sections Consistently
December 2007 Avoid Calling Unknown Code While Inside a Critical Section
January 2008 Use Lock Hierarchies to Avoid Deadlock
February 2008 Break Amdahl’s Law!
March 2008 Going Superlinear
April 2008 Super Linearity and the Bigger Machine
May 2008 Interrupt Politely
June 2008 Maximize Locality, Minimize Contention
July 2008 Choose Concurrency-Friendly Data Structures
August 2008 The Many Faces of Deadlock
September 2008 Lock-Free Code: A False Sense of Security

Server Concurrency != Client Concurrency

Today I received an email that asked:

I have recently come across your excellent articles on concurrency and the changes in software writing paradigm. They make a lot of sense, but I am having trouble translating them to my world of Telecom oriented web services, where practically everything is run through a DBMS. It seems to me we get everything “free”, simply by using an inherently concurrent multi-everything beast such as that :-) .

Could you please share your thoughts on the issue in one of your coming blog entries? It seems to me nowadays most complex systems would take advantage of a DBMS, certainly any application that is internet based, telecom oriented, or enterprise level. Be it in C++, Java, or PHP and its ilk, using a DBMS – often as a sort of message queue – is one of the best practices that ensures parallelism.

Sure. At right is a slide I show in talks that summarizes the answer to this question, and I’ve addressed this and other similar issues in an ACM Queue article.

The problem with taking advantage of multicore/manycore hardware isn’t (as much) on the server, it’s on the client. When experienced people say things like, “but the concurrency problem is already solved, we’ve been building scalable software for years,” that’s server or niche-client application people talking. That kind of laid-back sound bite sure isn’t coming from mainstream client application developers.

Typical Server Workloads

On the server:

  • Workloads typically already have lots of inherent concurrency (thousands or millions of incoming requests for web/dbms/etc. operations), and it’s easy to launch independent requests concurrently.
  • Shared data is typically inside highly structured relational databases where we have decades of experience with automatic concurrency control. The DBMS itself knows how to optimistically run transactions in parallel and control conflicts by escalating row locks to page locks, index locks to table locks, and so on.
  • The programming model is typically transactions, and all the programmer has to know about is to write “begin transaction; /* … read/write whatever stuff you feel the need to, then … */ end transaction;” which is about as sweet as it gets.

We already know how to build somewhat scalable server apps. Sure, it’s still rocket science and takes expert knowledge to do well. But we generally know the rocket science, have experts who can implement it with repeatable success, and have regularly scheduled missions to the “scalable servers” space station. With some care, we can say that the “concurrency problem is already solved” here.

Typical Client Workloads

The world is very different for typical mainstream client applications (i.e., I’m not talking about Photoshop and a handful of others), where:

  • Workloads don’t have lots of inherent concurrency. The user clicks one button, and we have to figure out how to divide the work and recombine it in order to get the answer faster on many cores.
  • Shared data is typically in unstructured pointer-chasing graphs of objects in shared memory that require explicit concurrency control. Note that “unstructured” doesn’t mean there’s no structure — of course there’s some — but it’s a gloriously diverse pile of objects and containers, and more like an organically growing shantytown with unplanned twisty little alleys and passages, than the nice rectangular downtown city street plan of a nice rectangular database table.
  • The programming model to protect shared data is to use error-prone explicit locks. You have to remember which locks protect which data, and not to acquire them in inconsistent and deadlock-prone orders.

We’re still discovering and productizing the rocket science here. You could say that the tools like OpenMP that we do have now are still at the V-2 stage — they have limited applicability, are somewhat fussy, and don’t always land where you aim them.

But we’re working on it. Up-and-coming tools like Threading Building Blocks are like the Mercury and Venera missions, setting out to reach successively higher goals and repeatability… and we’re starting to see what are perhaps Apollo– and ISS-class missions in the form of PLINQ, the Task Parallel Library, and one for native C++ we’ll be announcing in October at PDC. In part, these tools are trying to see how much we can make client workloads look more like server workloads, notably in providing a transaction-oriented programming model. For example, transactional memory is an area of active research that would let us write “begin transaction; /* … read/write whatever memory variables you feel the need to, then … */ end transaction;”, and if successful it could eventually replace many or even most existing uses of locks.

We have rightly celebrated some successful ‘manned’ flights with client products like Photoshop (parallel rendering) and Excel (parallel recalc) that scale to a number of cores. We’re on the road to, but still working toward, establishing the infrastructure and technology base to enable regularly scheduled commercial flights/shipments of scalable client applications that “light up” on multicore/manycore machines.

Effective Concurrency: The Many Faces of Deadlock

The latest Effective Concurrency column, “The Many Faces of Deadlock”, just went live on DDJ’s site, and also appears in the print magazine. From the article:

… That’s the classic deadlock example from college. Of course, two isn’t a magic number. An improved definition of deadlock is: “When N threads enter a locking cycle where each tries to take a lock the next already holds.”

“But wait,” someone might say. “I once had a deadlock just like the code you just showed, but it didn’t involve locks at all—it involved messages.” …

I hope you enjoy it.
 
Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):
August 2007 The Pillars of Concurrency
September 2007 How Much Scalability Do You Have or Need?
October 2007 Use Critical Sections (Preferably Locks) to Eliminate Races
November 2007 Apply Critical Sections Consistently
December 2007 Avoid Calling Unknown Code While Inside a Critical Section
January 2008 Use Lock Hierarchies to Avoid Deadlock
February 2008 Break Amdahl’s Law!
March 2008 Going Superlinear
April 2008 Super Linearity and the Bigger Machine
May 2008 Interrupt Politely
June 2008 Maximize Locality, Minimize Contention
July 2008 Choose Concurrency-Friendly Data Structures
August 2008 The Many Faces of Deadlock

Constructor Exceptions in C++, C#, and Java

I just received the following question, whose answer is the same in C++, C#, and Java.

Question: In the following code, why isn’t the destructor/disposer ever called to clean up the Widget when the constructor emits an exception? You can entertain this question in your mainstream language of choice:

// C++ (an edited version of the original code
// that the reader emailed me)
class Gadget {
   Widget* w;

public:
   Gadget() {
     w = new Widget();
     throw std::exception();
     // … or some API call might throw
   }

   ~Gadget() {
      if( w != nullptr ) delete w;
   }
};

// C# (equivalent)
class Gadget : IDisposable {
   private Widget w;

   public Gadget() {
     w = new Widget();
     throw new Exception();
     // … or some API call might throw
   }

   public void Dispose() {
     // … eliding other mechanics, eventually calls:
     if( w != null ) w.Dispose();
     // or just “w = null;” if Widget isn’t IDisposable
   }
}

// Java (equivalent)
class Gadget {
   private Widget w;

   public Gadget() throws Exception {
     w = new Widget();
     throw new Exception();
     // … or some API call might throw
   }

   public void dispose() {
     if( w != null ) w.dispose();
     // or just “w = null;” if Widget isn’t disposable
   }
}

Interestingly, we can answer this even without knowing the calling code… but there is typical calling code that is similar in all three languages that reinforces the answer.

Construction and Lifetime

The fundamental principles behind the answer are the same in C++, C#, and Java:

  1. A constructor conceptually turns a suitably sized chunk of raw memory into an object that obeys its invariants. An object’s lifetime doesn’t begin until its constructor completes successfully. If a constructor ends by throwing an exception, that means it never finished creating the object and setting up its invariants — and at the point the exceptional constructor exits, the object not only doesn’t exist, but never existed.
  2. A destructor/disposer conceptually turns an object back into raw memory. Therefore, just like all other nonprivate methods, destructors/disposers assume as a precondition that “this” object is actually a valid object and that its invariants hold. Hence, destructors/disposers only run on successfully constructed objects.

I’ve covered some of these concepts and consequences before in GotW #66, “Constructor Failures,” which appeared in updated and expanded form as Items 17 and 18 of More Exceptional C++.

In particular, if Gadget’s constructor throws, it means that the Gadget object wasn’t created and never existed. So there’s nothing to destroy or dispose: The destructor/disposer not only doesn’t need to run, it can’t run, because it doesn’t have a valid object to run against.

Incidentally, it also means that the Gadget constructor isn’t exception-safe, because it and only it can clean up resources it might leak. In the C++ version, the usual simple way to write the code correctly is to change the w member’s type from Widget* to shared_ptr<Widget> or an equivalent smart pointer type that owns the object. But let’s leave that aside for now to explore the more general issue.

A Look At the Calling Code

Next, let’s see how these semantics are actually enforced, whether by language rules or by convention, on the calling side in each language. Here are the major idiomatic ways in each language to use a Gadget object in an exception-safe way:

// C++ caller

{
  Gadget myGadget;
  // do something with myGadget
}

// C# caller

using( Gadget myGadget = new Gadget() )
{
  // do something with myGadget
}

// Java caller

Gadget myGadget = new Gadget();

try {
  // do something with myGadget
}
finally {
  myGadget.dispose();
}

Consider the two cases where an exception can occur:

  • What if an exception is thrown while using myGadget — that is, during “do something with myGadget”? In all cases, we know that myGadget’s destructor/dispose method is guaranteed to be called.
  • But what if an exception is thrown while constructing myGadget? Now in all cases the answer is that the destructor/dispose method is guaranteed not to be called.

Put another way, we can say for all cases: The destructor/dispose is guaranteed to be run if and only if the constructor completed successfully.

Another Look At the Destructor/Dispose Code

Finally, let’s return to each key line of code one more time:

// C++
      if( w != nullptr ) delete w;
// C#
     if( w != null ) w.Dispose();
// Java
     if( w != null ) w.dispose();

The motivation for the nullness tests in the original example was to clean up partly-constructed objects. That motivation is suspect in principle — it means the constructors aren’t exception-safe because only they can clean up after themselves — and as we’ve seen it’s flawed in practice because the destructors/disposers won’t ever run on the code paths that the original motivation cared about. So we don’t need the nullness tests for that reason, although you might still have nullness tests in destructors/disposers to handle ‘optional’ parts of an object where a valid object might hold a null pointer or reference member during its lifetime.

In this particular example, we can observe that the nullness tests are actually unnecessary, because w will always be non-null if the object was constructed successfully; there is no (legitimate) way to call the destructor/disposer on an object whose constructor never completed. (Furthermore, C++ developers will know that the test is unnecessary for a second reason: delete is a no-op if the pointer passed to it is null, so there’s never a need to check for that special case.)

Conclusion

When it comes to object lifetimes, all OO languages are more alike than different. Object and resource lifetime matters, whether or not you have a managed language, garbage collection (finalizers are not destructors/disposers!), templates, generics, or any other fancy bell or whistle layered on top of the basic humble class.

The same basic issues arise everywhere… the code you write to deal with them is just spelled using different language features and/or idioms in the different languages, that’s all. We can have lively discussions about which language offers the easiest/best/greenest/lowest-carbon-footprint way to express how we want to deal with a particular aspect of object construction, lifetime, and teardown, but those concerns are general and fundamental no matter which favorite language you’re using to write your code this week.

Research Firms Are Good At Research, Not Technology Predictions

This story has been picked up semi-widely since last night. I’m sure this Steven Prentice they quote is a fine (Gartner) Fellow, but really:

The computer mouse is set to die out in the next five years and will be usurped by touch screens and facial recognition, analysts believe.

Seriously, does anyone who uses computers daily really believe this kind of prediction just because someone at Gartner says so? Dude, sanity check: 1. What functions do you use your mouse for? 2. How many of those functions can be done by pointing at your screen or smiling at the camera: a) at all; and b) with equivalent high precision and low arm fatigue? Of course the mouse, including direct equivalents like the touchpad/trackpad, will be replaced someday. But to notice that people like to turn and shake their Wii controllers and iPhones and then make the leap to conclude that this will replace mice outright in the short term seems pretty thin even for Gartner.

When you read a report from Gartner, Forrester, IDC and their brethren research firms, remember that you’re either getting real-world data (aka research) or a single analyst’s personal predictions (aka crystal-ball gazing). Research firms are good at what they’re good at, namely research:

  • They’re “decent” at compiling current industry market data. Grade: A.
  • They’re “pretty okay” when they limit themselves to simple short-term extrapolation of that data, such as two-year projections of cost changes of high-speed networking in Canada or cell phone penetration in India. Grade: A-.

But when they try bigger technology movement predictions like “X will replace Y in Z years” they average somewhere around “spotty,” and on their off days they dip down into “I think you forgot to sanity check that sound bite” territory. It’s a pity that some venture capitalists take the research analysts’ word as gospel. Reliability of technology shift predictions: D+.

Kindling

Two weeks ago, I broke down and bought a Kindle. I like it:

  • It’s a good and well-designed reader, and the experience is much better than the other e-book reading I’ve done before on phones and PDAs. I like how when you bookmark a page, you can see it… the corner of the page gets a little dog-ear. [1]
  • It’s got a nice e-paper screen that uses ambient light, not backlight, which makes it readable anywhere just like a printed page — it’s even better, not worse, in direct sunlight.
  • It’s light and thin and sturdy. Sure beats carrying three or four books on a trip.
  • It has great battery life. I’ve only charged it once so far, when I first received it… since then I’ve had it for 11 days and read a full book and a half, and it still has 75% of its first charge left. (It helps that I turn the wireless off unless I’m actively using it.)
  • Fast, free wireless everywhere in the U.S., without computers or WiFi.

But today, it transformed my reading experience.

This morning, I was unsuspectingly reading my feeds in Google Reader as usual, blissfully unaware that the way I read books was about to change. Among other articles, I noticed that Slashdot ran a book review of Inside Steve’s Brain (that’s Jobs, not Wozniak or Ballmer). The review made me want to read the book. That’s when the new reality started, because I was interested in the book now, and had time to start reading it now:

  • Normally, I would have ordered it from Amazon and waited for it to arrive. But what usually happens then is that the book arrives a week later, and when it gets here I don’t have time to start it right away or I don’t quite feel like that kind of book just at the moment… and it goes on a shelf, with a 70% probability of being picked up at some point in the future.
  • Today, I searched for the book title on my Kindle, clicked “Buy”, and a few minutes later started reading Inside Steve’s Brain while eating lunch. [2]

That convenience isn’t merely instant gratification, it’s transformative. I suspect I’m going to be reading even more books now, even though I have a few little nits with the device, such as that the next and previous page buttons are a little too easy to press in some positions.

In other news, the Kindle also supports reading blogs and newspapers and some web surfing, but those are less compelling for me because I tend to do those things in the context of work, which means I’m already sitting at a computer with a bigger color screen and full keyboard. Maybe someday I’ll do it on e-paper. Until then, just living inside a virtual bookstore is plenty for me. Kindle + the Amazon Kindle store = iPod + iTunes for books. [3]

Here’s a useful summary article on Kindle features from a prospective user’s point of view.

Notes

1. The first two books I downloaded? The Design of Everyday Things, which was interestingly apropos to read on a new device like the Kindle with its nice dog-ear feedback and other well-designed features, and Foundation, which I hadn’t read in ages.

2. And it cost less than half what the dead-tree version would (though the latter was hardcover).

3. Caveat: I’m not actually an iPod owner, and I hate how Apple keeps insisting on installing iTunes on my computer just because I have Safari or QuickTime installed (mutter grumble evil product-tying monopolies mutter grumble :-) ). But apparently everyone else loves them, and they have indeed changed their industry.

Hungarian Notation Is Clearly (Good|Bad)

A commenter asked:

thread_local X tlsX; ??

Herb, I hope you aren’t backtracking on Hungarian Notation now that you work for Microsoft. Say it aint so…

It ain’t so. Besides, Microsoft’s Framework Developer’s Guide prominently intones: “Do not use Hungarian notation.”

Warts like “tls” and “i” are about lifetime and usage, not type. Here “tls” denotes that each thread gets its own copy of the value that is constructed and destroyed once per thread that uses it (lifetime) and doesn’t need to be synchronized using mutexes (usage). As another example of usage warts, I’ll also use “i” for a variable that’s used as an index or for iteration — and given that the “i” convention goes back to before BASIC, we shouldn’t try to pin the Hungarian tail on that donkey.

Having said that, though, many people have variously decried and defended different forms of Hungarian, and you may notice a pattern: the decriers are mostly talking about Systems Hungarian, while the defenders are mostly talking about Apps Hungarian.

Today, “Hungarian” nearly always means Systems Hungarian. The main trouble with Systems Hungarian comes from trying to embed information about a variable’s type into the variable’s name by prepending an encoded wart like the venerable sz, pach, ul, and their ilk. Although potentially helpful in a weakly-typed language like C, that’s known to be brittle and the prefixes tend to turn into lies as variable types morph during maintenance. The warting systems also don’t extend well to user-defined types and templates.

I’ve railed against the limitations and perils of Hungarian with Andrei Alexandrescu in our book C++ Coding Standards, where I made sure it was stated right up front as part of “Item 0: Don’t sweat the small stuff (or, Know what not to standardize)”:

Example 3: Hungarian notation. Notations that incorporate type information in variable names have mixed utility in type-unsafe languages (notably C), are possible but have no benefits (only drawbacks) in object-oriented languages, and are impossible in generic programming. Therefore, no C++ coding standard should require Hungarian notation, though a C++ coding standard might legitimately choose to ban it.

and with Jim Hyslop in our article “Hungarian wartHogs”:

“… the compiler already knows much more than you do about an object’s type. Changing the variable’s name to embed type information adds little value and is in fact brittle. And if there ever was reason to use some Hungarian notation in C-style languages, which is debatable, there certainly remains no value when using type-safe languages.”

There’s an amusing real-world note later in that article. I’ll pick up where the Guru is saying:

“I recall only one time when Hungarian notation was useful on a project. … One of the programmers on the project was named Paul,” the Guru explained. “Several months into the project, while still struggling to grow a ponytail and build his report-writing module, he pointed out that Hungarian notation had helped him find a sense of identity, for he now knew what he was…” She paused.

I blinked. It took me about ten seconds, and then I shut my eyes and grimaced painfully. “Pointer to array of unsigned long,” I groaned.

She smiled, enjoying my pain. “True story,” she said.

It is indeed a true story. I worked with him in 1993. The joke was bad then, too, but at least using Hungarian was more defensible because the project’s code was written in C. I found it handy at the time, but now that I work in more strongly typed languages I find the type-embedding version of Hungarian more actively harmful than actually useful.

You might call me a Hungarian emigrant, now living happily as an expat out in eastern C++ near the Isle of Managed Languages.