We spend most of our scalability lives inside a triangular box, shown in Figure 1. It reminds me of the early days of flight: We try to lift ourselves away from the rough ground of zero scalability and fly as close as possible to the cloud ceiling of linear speedup. Normally, the Holy Grail of parallel scalability is to use P processors or cores to complete some work almost P times faster, up to some hopefully high number of cores before our code becomes bound by memory, I/O, or something else that brings diminishing returns from adding more cores. As Figure 1 illustrates, the traditional shape of our "success" curve lies inside the triangle.
Sometimes, however, we can equip our performance plane with extra tools and safely break through the linear ceiling into the superlinear stratosphere. So the question is: "Under what circumstances can we use P cores to do work more than P times faster?" There are two main ways to enter that rarefied realm:
- Do disproportionately less work.
- Harness disproportionately more resources.
This month and next, we’ll consider situations and techniques that fall into one or both of these categories. …
| Date | Column |
|------|--------|
| July 2007 | The Pillars of Concurrency |
| August 2007 | How Much Scalability Do You Have or Need? |
| September 2007 | Use Critical Sections (Preferably Locks) to Eliminate Races |
| October 2007 | Apply Critical Sections Consistently |
| November 2007 | Avoid Calling Unknown Code While Inside a Critical Section |
| December 2007 | Use Lock Hierarchies to Avoid Deadlock |
| January 2008 | Break Amdahl's Law! |
| February 2008 | Going Superlinear |