This month’s Effective Concurrency column is special — it turned into a feature-length article. (I don’t know whether it’ll officially be called a “feature” or a “column” in the print issue.) “Measuring Parallel Performance: Optimizing a Concurrent Queue” just went live on DDJ’s site, and will also appear in the print magazine.
From the article:
How would you write a fast, internally synchronized queue, one that callers can use without any explicit external locking or other synchronization? Let us count the ways…or four of them, at least, and compare their performance. We’ll start with a baseline program and then successively apply three optimization techniques, each time stopping to measure each change’s relative performance for queue items of different sizes to see how much each trick really bought us.
The goal of the article is to see how to measure and understand our code’s parallel performance and measure the actual effect of specific optimizations. Disclaimer: The goal of this article is not to write the fastest possible queue in the world (though it’s pretty good). I’ve already had plenty of email on recent queue-related columns from people who sent me their “faster” implementations; writing lock-free queues seems to be a popular indoor sport. Interestingly, for well of half of the ones I received, a 30-second glance at the code was enough to determine that the code had to be incorrect. Why? Because if it doesn’t do any synchronization on the shared variables — if there aren’t any locks, atomics, fences or other synchronization in the code — then it has races, which will manifest in practice even on forgiving platforms like x86/x64, and there’s no need to look further. (For more details, see the September 2008 column, Lock-Free Code: A False Sense of Security. Even some code submissions I received in response to that very article were broken for the same reasons shown in that article.)
I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns and the DDJ print magazine issue in which they first appeared:
The Pillars of Concurrency (Aug 2007)
How Much Scalability Do You Have or Need? (Sep 2007)
Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)
Apply Critical Sections Consistently (Nov 2007)
Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)
Use Lock Hierarchies to Avoid Deadlock (Jan 2008)
Break Amdahl’s Law! (Feb 2008)
Going Superlinear (Mar 2008)
Super Linearity and the Bigger Machine (Apr 2008)
Interrupt Politely (May 2008)
Maximize Locality, Minimize Contention (Jun 2008)
Choose Concurrency-Friendly Data Structures (Jul 2008)
The Many Faces of Deadlock (Aug 2008)
Lock-Free Code: A False Sense of Security (Sep 2008)
Writing Lock-Free Code: A Corrected Queue (Oct 2008)
Writing a Generalized Concurrent Queue (Nov 2008)
Understanding Parallel Performance (Dec 2008)
Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)