This month’s Effective Concurrency column, “Eliminate False Sharing”, is now live on DDJ’s website.
People keep writing to ask me about my previous mentions of false sharing, and some even debate whether it’s really a problem. So this month I decided to treat it in depth, including:
- A compelling and realistic example where just changing a couple of lines to remove false sharing takes an algorithm from zero scaling to perfect scaling – even when many threads are merely doing reads. Hopefully after this nobody will argue that false sharing isn’t a problem. :-)
- How your performance monitoring and analysis tools do and don’t help you uncover the problem, and how to use them effectively to identify the culprit. Short answer: CPU activity monitors aren’t very helpful, but cycles-per-instruction (CPI) and cache miss rate measurements attributed to specific lines of source code are your friends.
- The two ways to correct the code: Reduce the frequency of writes to the too-popular cache line, or add padding to move other data off the line.
- Reusable code in C++ and C#, and a note about Java, that you can use to apply padding (and alignment, where available) to put frequently-updated objects on their own cache lines; see the sketch after this list.
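To give a flavor of the padding technique, here’s a minimal sketch of my own, not the article’s actual code: it assumes a 64-byte cache line and a C++11-or-later compiler with alignas. The wrapper name `CacheLinePadded` is hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed line size: 64 bytes is typical for x86; check your target hardware.
constexpr std::size_t kCacheLineSize = 64;

// Pads and aligns T so each instance occupies its own cache line.
// alignas rounds sizeof(CacheLinePadded<T>) up to a multiple of the
// alignment, so adjacent array elements cannot share a line.
template <typename T>
struct alignas(kCacheLineSize) CacheLinePadded {
    T value;
};

int main() {
    // One counter per worker, each on its own cache line: the threads
    // write to different lines, so their updates no longer ping-pong
    // a single line between cores.
    CacheLinePadded<std::atomic<long>> counters[4];
    for (auto& c : counters) c.value.store(0);

    std::vector<std::thread> workers;
    for (auto& c : counters) {
        workers.emplace_back([&c] {
            for (int i = 0; i < 1000000; ++i)
                c.value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : workers) t.join();
}
```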
From the article:
In two previous articles I pointed out the performance issue of false sharing (aka cache line ping-ponging), where threads use different objects but those objects happen to be close enough in memory that they fall on the same cache line, and the cache system treats them as a single lump that is effectively protected by a hardware write lock that only one core can hold at a time. … It’s easy to see why the problem arises when multiple cores are writing to different parts of the same cache line… In practice, however, it can be even more common to encounter a reader thread using what it thinks is read-only data still getting throttled by a writer thread updating a different but nearby memory location…
A number of readers have asked for more information and examples on where false sharing arises and how to deal with it. … This month, let’s consider a concrete example that shows an algorithm in extremis due to false sharing distress, how to use tools to analyze the problem, and the two coding techniques we can use to eliminate false sharing trouble. …
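To make the quoted failure mode concrete, here is a small illustration of my own (not from the article) of the first fix, reducing write frequency: each thread accumulates into a function-local variable, which lives in its own stack frame and typically in a register on the hot path, and writes the shared result slot only once at the end. The `count_matches` functions are hypothetical workers invented for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// BAD: every increment writes result[id], and adjacent result slots
// typically share a cache line, so the writes ping-pong the line
// between cores even though each thread touches only its own slot.
void count_matches_bad(const std::vector<int>& data,
                       std::size_t begin, std::size_t end,
                       int* result) {
    for (std::size_t i = begin; i < end; ++i)
        if (data[i] % 2 == 0) ++*result;   // hot write to a shared line
}

// GOOD: accumulate in a local, write the shared location once.
void count_matches_good(const std::vector<int>& data,
                        std::size_t begin, std::size_t end,
                        int* result) {
    int local = 0;                          // register/stack, thread-private
    for (std::size_t i = begin; i < end; ++i)
        if (data[i] % 2 == 0) ++local;
    *result = local;                        // single write at the end
}

int main() {
    const std::size_t n = 1 << 24;
    std::vector<int> data(n, 1);
    const int nthreads = 4;
    std::vector<int> results(nthreads, 0);  // slots are adjacent in memory

    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        std::size_t chunk = n / nthreads;
        workers.emplace_back(count_matches_good, std::cref(data),
                             t * chunk, (t + 1) * chunk, &results[t]);
    }
    for (auto& w : workers) w.join();
}
```

Note that the good version still leaves the result slots on a shared line; it just writes to them rarely enough that the contention disappears. Padding, as in the earlier sketch, is the complementary fix when the writes themselves can’t be batched.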
I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:
The Pillars of Concurrency (Aug 2007)
How Much Scalability Do You Have or Need? (Sep 2007)
Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)
Apply Critical Sections Consistently (Nov 2007)
Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)
Use Lock Hierarchies to Avoid Deadlock (Jan 2008)
Break Amdahl’s Law! (Feb 2008)
Going Superlinear (Mar 2008)
Super Linearity and the Bigger Machine (Apr 2008)
Interrupt Politely (May 2008)
Maximize Locality, Minimize Contention (Jun 2008)
Choose Concurrency-Friendly Data Structures (Jul 2008)
The Many Faces of Deadlock (Aug 2008)
Lock-Free Code: A False Sense of Security (Sep 2008)
Writing Lock-Free Code: A Corrected Queue (Oct 2008)
Writing a Generalized Concurrent Queue (Nov 2008)
Understanding Parallel Performance (Dec 2008)
Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)
volatile vs. volatile (Feb 2009)
Sharing Is the Root of All Contention (Mar 2009)
Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)
Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)
Eliminate False Sharing (May 2009)