Effective Concurrency: Eliminate False Sharing

This month’s Effective Concurrency column, “Eliminate False Sharing”, is now live on DDJ’s website.

People keep writing asking me about my previous mentions of false sharing, even debating whether it’s really a problem. So this month I decided to treat it in depth, including:

A compelling and realistic example where just changing a couple of lines to remove false sharing takes an algorithm from zero scaling to perfect scaling – even when many threads are merely doing reads. Hopefully after this nobody will argue that false sharing isn’t a problem. :-)
How your performance monitoring and analysis tools do and/or don’t help you uncover the problem, and how to use them effectively to identify the culprit. Short answer: CPU activity monitors aren’t very helpful, but cycles-per-instruction (CPI) and cache miss rate measurements attributed to specific lines of source code are your friend.
The two ways to correct the code: Reduce the frequency of writes to the too-popular cache line, or add padding to move other data off the line.
Reusable code in C++ and C#, and a note about Java, that you can use to use padding (and alignment if available) to put frequently-updated objects on their own cache lines.

From the article:

In two previous articles I pointed out the performance issue of false sharing (aka cache line ping-ponging), where threads use different objects but those objects happen to be close enough in memory that they fall on the same cache line, and the cache system treats them as a single lump that is effectively protected by a hardware write lock that only one core can hold at a time. … It’s easy to see why the problem arises when multiple cores are writing to different parts of the same cache line… In practice, however, it can be even more common to encounter a reader thread using what it thinks is read-only data still getting throttled by a writer thread updating a different but nearby memory location…

A number of readers have asked for more information and examples on where false sharing arises and how to deal with it. … This month, let’s consider a concrete example that shows an algorithm in extremis due to false sharing distress, how to use tools to analyze the problem, and the two coding techniques we can use to eliminate false sharing trouble. …

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

“Use Thread Pools Correctly: Keep Tasks Short and Nonblocking” (Apr 2009)

“Eliminate False Sharing” (May 2009)

You Know When Your UI Needs Help When…

Seen at a gas station:

You know your UI has usability issues when people tape multiple signs on your gas pump to help people get through the intricate and error-prone process of purchasing fuel.

Why does the upper note exist? The trouble is that there’s a Debit button but not a Credit button, and so since Credit is the default, users need to remember to override the default before swiping the card. People will sometimes naturally forget the special step because there was originally no reminder that this needed to be done, and because most other pumps don’t work that way. One solution would be to print the information prominently near the card reader, thus standardizing the note and making it more visible. A better and simpler solution would be to do what most pumps do: Avoid the opportunity for forgetting to specify the right thing by having both buttons, Debit and Credit, and simply prompting the user to press one or the other after they swipe their card.

But the trouble behind the lower note is even more blatant and, frankly, inexcusable: Is there any good reason not to conditionally print “ENTER PIN” or “ENTER ZIP,” instead of just always printing “ENTER DATA” which is not only unclear but also one of the classic geeky words to avoid in a consumer-oriented UI?

Then again, I’m amazed that in this day and age I still see output like “1 item(s) purchased.” Apparently it’s still more important to write the programmer-friendly printf( “%d item(s)”, count ) than the user-friendly printf( “%d item%s”, count, (count==1 ? “” : “s”) ).

Effective Concurrency: Use Thread Pools Correctly – Keep Tasks Short and Nonblocking

This month’s Effective Concurrency column, “Use Thread Pools Correctly: Keep Tasks Short and Nonblocking”, is now live on DDJ’s website.

From the article:

… But the thread pool is a leaky abstraction. That is, the pool hides a lot of details from us, but to use it effectively we do need to be aware of some things a pool does under the covers so that we can avoid inadvertently hitting performance and correctness pitfalls. Here’s the summary up front:

1. Tasks should be small, but not too small, otherwise performance overheads will dominate.

2. Tasks should avoid blocking (waiting idly for other events, including inbound messages or contested locks), otherwise the pool won’t consistently utilize the hardware well — and, in the extreme worst case, the pool could even deadlock.

Let’s see why. …

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

“Use Thread Pools Correctly: Keep Tasks Short and Nonblocking” (Apr 2009)

A Wryly Repurposed Quotation

In my travels, I recently came across this empty store with an almost-empty box beside the front door. As seen in Monterey, CA:

Evidently some character had also noticed the empty store with its empty box, and decided to do a little walk-by wry economic commentary via repurposed quotation. Zooming on the once-empty box:

New Dates for Effective Concurrency Seminar in Europe: May 27-29, Stockholm, Sweden

Now that I’m over the icky flu that forced me to postpone the seminar two weeks ago, I’m happy to say that we have new dates: Effective Concurrency (Europe) will be held on May 27-29, 2009, in Stockholm, Sweden. I’ll cover the following topics:

Fundamentals: Define basic concurrency goals and requirements • Understand applications’ scalability needs • Key concurrency patterns
Isolation — Keep work separate: Running tasks in isolation and communicate via async messages • Integrating multiple messaging systems, including GUIs and sockets • Building responsive applications using background workers • Threads vs. thread pools
Scalability — Re-enable the Free Lunch: When and how to use more cores • Exploiting parallelism in algorithms • Exploiting parallelism in data structures • Breaking the scalability barrier
Consistency — Don’t Corrupt Shared State: The many pitfalls of locks–deadlock, convoys, etc. • Locking best practices • Reducing the need for locking shared data • Safe lock-free coding patterns • Avoiding the pitfalls of general lock-free coding • Races and race-related effects
High Performance Concurrency: Machine architecture and concurrency • Costs of fundamental operations, including locks, context switches, and system calls • Memory and cache effects • Data structures that support and undermine concurrency • Enabling linear and superlinear scaling
Migrating Existing Code Bases to Use Concurrency
Near-Future Tools and Features

I hope to see some of you there!

RIP: SD Conferences

The latest casualties in the technical education world are the Software Development conferences – SD West, SD Best Practices, and Architecture & Design World – which are being discontinued effective immediately, making the SD West that was just held earlier this month the last of its kind. The conferences were run by the same company that ran some important but now-defunct magazines including C/C++ Users Journal and of course Software Development Magazine, as well as Dr. Dobb’s which still exists but recently went web-only.

SD, especially SD West, has been a mainstay of the programming conferences world since 1985. I’m going to miss these shows. They were a wonderful gathering place for talks and tutorials by A-list speakers covering many technologies, from C++ to Java to AJAX and more. More personally, Bjarne Stroustrup and I immensely enjoyed giving our two-day Stroustrup & Sutter seminar there several times in recent years, and skipped this year’s event in anticipation of doing a fresh one again next spring. Alas, it’s not to be (at least not with SD).

SD, you will be missed.

Effective Concurrency: Use Threads Correctly = Isolation + Asynchronous Messages

This month’s Effective Concurrency column, “Use Threads Correctly = Isolation + Asynchronous Messages”, is now live on DDJ’s website.

From the article:

Explicit threads are undisciplined. They need some structure to keep them in line. In this column, we’re going to see what that structure is, as we motivate and illustrate best practices for using threads — techniques that will make our concurrent code easier to write correctly and to reason about with confidence.

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns and the DDJ print magazine issue in which they first appeared:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

Postponed: Effective Concurrency Europe

Right now I should be at 40,000 feet somewhere over Baffin Island on my way to Stockholm for Effective Concurrency Europe, but instead I’m in bed with a fever that I’ve had since Wednesday night and still unable to talk. The organizer and I have been staying in touch with flu updates every few hours to see if lots of “TLC and OTC” (tender loving care and over-the-counter drugs) could get me well enough to fly and push through teaching for three days; that usually works, but this time it’s a really nasty one, and we have to postpone.

I’m sorry to have to disappoint this week and for any inconvenience this will surely cause some of you, and I hope that you’ll understand. The organizer has already emailed everyone who has registered to notify you of the cancellation of this week’s dates (this post partly serves as a backup to that email), and we’ll keep you posted as we reschedule. The response has exceeded expectations by a good margin, and I’m grateful for your strong interest and planning to take time away from your schedules to come and talk concurrency with me. I’m still very much looking forward to seeing you there in the near future. (And minus potentially spreading wicked germs.)

Best wishes to you and yours,

Herb

Free Training For Laid-Off Developers

Like many areas in the United States, Seattle has recently been hit with layoffs and downsizing in our industry. So it’s quite timely that Steve McConnell’s company Construx, in the Seattle area, is offering free training for laid-off software workers:

After listening to doom and gloom economic reports for the past few months, we decided we would try to do something to brighten our little corner of the world. Here’s our official press release about it:

Construx Software has designated 25% of its public seminar seats free of charge to software workers who have been laid off. Construx seminars help software professionals improve their technical and managerial skills. Seminar attendees will be more effective when they reenter the workforce. Construx hopes this program will help laid-off software workers reenter the workforce more quickly.

Please read the full article for more details.

I’m very pleased to report that this program will include my next U.S. Effective Concurrency seminar which will be held in Bellevue, WA on July 27-30, 2009. I’ll blog about that seminar again as we get closer, but it’s on the Construx calendar now.

I hope that this will be helpful to some people, and I look forward to seeing some of you in Bellevue.

Aside: Please Be Professional

One problem with offering things for free is that people don’t always value them. I was surprised to learn that a surprising number of people who’ve asked for and received this free admission at Construx have been no-shows.

The following should go without saying, but here it is: As a courtesy to Construx and to other attendees, please don’t sign up unless you intend to come, and if something comes up that prevents you from attending please notify Construx. Otherwise, you may be taking the place of someone who would have liked to attend in your place (seating is limited), and Construx is out the cost of food (they provide breakfast and lunch) and printed color binders and such that go to waste.

Other Free Materials

Most of you know that virtually all the articles and columns I’ve ever written have always been available for free on my own website or via magazine websites (modulo some link rot, sigh). These include all the Effective Concurrency articles, and my C++-focused “Guru of the Week” and magazine articles.

The only things I’ve written that aren’t legally freely available are the final texts of my books, including translations, to which the publishing company owns the copyright; but the books are based on the same freely-available English articles, and Addison-Wesley/Pearson has never had any issue with those staying available, so the basic material is all there.

But there’s news on the “final book contents” front, too, at least in English: Once my current book(s) on Effective Concurrency is done, Jim Hyslop and I plan to compile a book of the C++ Conversations articles we coauthored in C++ Report and C/C++ Users Journal (thanks again for your patience, Jim!) and I also plain to write a second edition of Exceptional C++ – and both books will be updated for C++0x, which is now feature-complete and undergoing its first round of international review. It’s quite interesting just how often using C++0x language and library features (from lambdas to shared_ptrs and concurrency) really help solve the old problems in even more elegant and robust ways… and I’m sure we’ll throw in a couple of new problems and solutions too. My goal is to post these books’ draft Items, perhaps in the form of blog entries, as we write them. But I’m also working with the publisher to see if perhaps we could even post the very final Items in that format with their permission. I’ll post more news about that as there’s something to say.

Effective Concurrency: Sharing Is the Root of All Contention

This month’s Effective Concurrency column, “Sharing Is the Root of All Contention”, is now live on DDJ’s website.

This article aims to address the root cause behind some frequently made assertions: Statements like “locks kill scalability” and “CAS kills scalability” are mostly true but focus on symptoms rather than causes; and others such as “reader/writer mutexes are great for scalability” at minimum need some heavy qualification.

From the article:

Quick: What fundamental principle do all of the following have in common?

Two children drawing with crayons.

Three customers at Starbucks adding cream to their coffee.

Four processes running on a CPU core.

Five threads updating an object protected by a mutex.

Six threads running on different processors updating the same lock-free queue.

Seven modules using the same reference-counted shared object.

Answer: In each case, multiple users share a resource that requires coordination because not everyone can use it simultaneously — and such sharing is the root cause of all resulting contention that will degrade everyone’s performance. (Note: The only kind of shared resource that doesn’t create contention is one that inherently requires no synchronization because all users can use it simultaneously without restriction, which means it doesn’t fall into the description above. For example, an immutable object can have its unchanging bits be read by any number of threads or processors at any time without synchronization.)

In this article, I’ll deliberately focus most of the examples on one important case, namely mutable (writable) shared objects in memory, which are an inherent bottleneck to scalability on multicore systems. But please don’t lose sight of the key point, namely that "sharing causes contention" is a general principle that is not limited to shared memory or even to computing.

The Inherent Costs of Sharing

Here’s the issue in one sentence: Sharing fundamentally requires waiting and demands answers to expensive questions. …

Aside: This will be in the first electronic-only Dr. Dobb’s Report, and I’m going to have to get used to Dobb’s being electronic-only. One consequence of this change that I just discovered today is that an article can go online hours after I correct the proofs. That may seem like it shouldn’t be surprising in the days of blogs, but a magazine isn’t (or at least wasn’t) like a blog. When I wrote for magazines like C++ Report in the late 1990s, the time between submitting the final column manuscript and seeing it in print was about two to three months, depending on the magazine. Since the mid-2000s with Dobb’s, it’s been only one month from proofs to print — and Dobb’s could have posted the online versions even sooner, but held them back to be semi-synced with the print magazine. Now that the print constraint has disappeared, apparently I can (and, today, did) start work on a Friday morning to find the Dobb’s proofs I corrected the night before already on the web. Nifty. (And thanks again, Deirdre!)

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns and the DDJ print magazine issue in which they first appeared:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)