Effective Concurrency: Understanding Parallel Performance

Wow, DDJ posted the previous column just a few days ago, and the next Effective Concurrency column is already available: “Understanding Parallel Performance” just went live, and it will also appear in the print magazine.

From the article:

Let’s say that we’ve slickly written our code to apply divide-and-conquer algorithms and concurrent data structures and parallel traversals and all our other cool tricks that make our code wonderfully scalable in theory. Question: How do we know how well we’ve actually succeeded? Do we really know, or did we just try a couple of tests on a quad-core that looked reasonable and call it good? What key factors must we measure to understand our code’s performance, and answer not only whether our code scales, but quantify how well under different circumstances and workloads? What costs of concurrency do we have to take into account?

This month, I’ll summarize some key issues we need to keep in mind to accurately analyze the real performance of our parallel code. I’ll list some basic considerations, and then some common costs. Next month, I have a treat in store: We’ll take some real code and apply these techniques to analyze its performance in detail as we successively apply a number of optimizations and measure how much each one actually buys us, under what conditions and in what directions, and why.
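To make “how do we know how well we’ve succeeded?” concrete, here is a minimal sketch of a scalability harness. It assumes a toy workload (summing a vector) that is not from the column, and `ParallelSum`, `BestTimeMs` and `MeasureSpeedups` are hypothetical names. The harness times the same computation at 1 through N worker threads, checks that every worker count produces the same answer, and reports speedup as T(1)/T(k) rather than trusting a single run on one machine.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical workload: sum `data` split into `workers` contiguous chunks,
// one std::thread per chunk; each thread writes only its own partial slot.
std::uint64_t ParallelSum(const std::vector<std::uint32_t>& data, unsigned workers) {
  assert(workers > 0);
  std::vector<std::uint64_t> partial(workers, 0);
  std::vector<std::thread> threads;
  const std::size_t chunk = (data.size() + workers - 1) / workers;
  for (unsigned w = 0; w < workers; ++w) {
    threads.emplace_back([&data, &partial, chunk, w] {
      const std::size_t begin = std::min(data.size(), w * chunk);
      const std::size_t end = std::min(data.size(), begin + chunk);
      std::uint64_t sum = 0;
      for (std::size_t i = begin; i < end; ++i) sum += data[i];
      partial[w] = sum;
    });
  }
  for (auto& t : threads) t.join();
  return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}

// Best-of-`reps` wall-clock time in milliseconds for one worker count;
// the computed sum is stored in `result` so the caller can check it.
double BestTimeMs(const std::vector<std::uint32_t>& data, unsigned workers,
                  int reps, std::uint64_t& result) {
  assert(reps > 0);
  double best = 0.0;
  for (int r = 0; r < reps; ++r) {
    const auto start = std::chrono::steady_clock::now();
    result = ParallelSum(data, workers);
    const auto stop = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    if (r == 0 || ms < best) best = ms;
  }
  return best;
}

// Speedup relative to one worker for worker counts 1..maxWorkers:
// speedups[k - 1] = T(1) / T(k), so speedups[0] is 1.0 by construction.
// Also checks that every worker count computes the same sum.
std::vector<double> MeasureSpeedups(const std::vector<std::uint32_t>& data,
                                    unsigned maxWorkers, int reps = 5) {
  std::vector<double> times;
  std::uint64_t expected = 0;
  for (unsigned k = 1; k <= maxWorkers; ++k) {
    std::uint64_t result = 0;
    times.push_back(BestTimeMs(data, k, reps, result));
    if (k == 1) expected = result;
    assert(result == expected);  // a faster wrong answer is not a speedup
  }
  std::vector<double> speedups;
  for (double t : times) speedups.push_back(times[0] / t);
  return speedups;
}
```

A caller might pass `std::thread::hardware_concurrency()` as `maxWorkers` and print each entry. A plain summation like this is often limited by memory bandwidth rather than by core count, so its speedup curve may flatten well before N. Finding where and why a curve flattens is exactly the kind of question worth answering with measurements.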

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):

August 2007: The Pillars of Concurrency

September 2007: How Much Scalability Do You Have or Need?

October 2007: Use Critical Sections (Preferably Locks) to Eliminate Races

November 2007: Apply Critical Sections Consistently

December 2007: Avoid Calling Unknown Code While Inside a Critical Section

January 2008: Use Lock Hierarchies to Avoid Deadlock

February 2008: Break Amdahl’s Law!

March 2008: Going Superlinear

April 2008: Super Linearity and the Bigger Machine

May 2008: Interrupt Politely

June 2008: Maximize Locality, Minimize Contention

July 2008: Choose Concurrency-Friendly Data Structures

August 2008: The Many Faces of Deadlock

September 2008: Lock-Free Code: A False Sense of Security

October 2008: Writing Lock-Free Code: A Corrected Queue

November 2008: Writing a Generalized Concurrent Queue

December 2008: Understanding Parallel Performance

2 thoughts on “Effective Concurrency: Understanding Parallel Performance”

  1. Hello Herb,

    Nice series of articles. Per our previous email discussion, I would like to suggest that a ring buffer is faster and simpler, and gives you control over memory usage.

    In my testing, I ran a series of data object sizes (1, 10, 100, 200, 500 and 1000) and graphed both transactions per second and bytes per second.

    I posted my code to:
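The commenter’s code is not reproduced here. As a hedged illustration of the idea only, here is a minimal bounded single-producer/single-consumer ring buffer in C++17. `SpscRing`, `TryPush`, `TryPop` and `TransferSum` are hypothetical names, and this is one common way to build such a buffer, not the commenter’s implementation. Because the slots live in a fixed-size `std::array`, memory use is fixed at compile time, which is the “control over memory usage” the comment mentions.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Bounded single-producer/single-consumer ring buffer (a sketch). Capacity is
// fixed at compile time; one slot is kept empty to tell "full" from "empty",
// so at most N - 1 items can be buffered.
template <typename T, std::size_t N>
class SpscRing {
 public:
  // Producer thread only. Returns false if the buffer is full.
  bool TryPush(const T& value) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % N;
    if (next == tail_.load(std::memory_order_acquire)) return false;  // full
    slots_[head] = value;
    head_.store(next, std::memory_order_release);  // publish the filled slot
    return true;
  }

  // Consumer thread only. Returns std::nullopt if the buffer is empty.
  std::optional<T> TryPop() {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
    T value = slots_[tail];
    tail_.store((tail + 1) % N, std::memory_order_release);  // hand the slot back
    return value;
  }

 private:
  static_assert(N >= 2, "need at least one usable slot");
  std::array<T, N> slots_{};
  std::atomic<std::size_t> head_{0};  // next slot to write (written by producer)
  std::atomic<std::size_t> tail_{0};  // next slot to read (written by consumer)
};

// Usage sketch: a producer thread pushes 1..count; the calling thread consumes
// until it has received `count` items and returns their sum.
long long TransferSum(int count) {
  assert(count >= 0);
  SpscRing<int, 1024> ring;
  std::thread producer([&ring, count] {
    for (int i = 1; i <= count; ++i)
      while (!ring.TryPush(i)) std::this_thread::yield();  // retry while full
  });
  long long sum = 0;
  for (int received = 0; received < count;) {
    if (std::optional<int> v = ring.TryPop()) {
      sum += *v;
      ++received;
    } else {
      std::this_thread::yield();  // retry while empty
    }
  }
  producer.join();
  return sum;
}
```

The design trades generality for simplicity: it is only safe with exactly one producer thread and one consumer thread, and the caller decides what to do when the buffer is full or empty (`TransferSum` simply yields and retries).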

  2. Hi,

    I’ve read your interesting article “Understanding Parallel Performance” in DDJ.

    In your Example 2, you said that this was a common mistake. I wonder whether, in some cases, it is really so bad. Imagine that the three functions cost about the same. If you run it on 2 cores, at least you know that the 3 threads will finish at the same time. By comparison, with a pool, the last task will run on one core while the other core has no work to do.

    Am I right? I have had to deal with this kind of problem, which is why I think Example 2 is not so bad.

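    // Example 2 (flawed): Naïve parallel code for MyApp 2.0?
    #include <future>

    int CalcWholesale();  // subcomputations, defined elsewhere
    int CalcRetail();
    int TotalReturns();

    int NetSales() {
      // perform the subcomputations concurrently, each on its own new thread
      auto wholesale = std::async( std::launch::async, [] { return CalcWholesale(); } );
      auto retail    = std::async( std::launch::async, [] { return CalcRetail(); } );
      auto returns   = std::async( std::launch::async, [] { return TotalReturns(); } );

      // now block for the results
      return wholesale.get() + retail.get() - returns.get();
    }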


    Christophe Bailly
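The trade-off Christophe describes can be checked with a little arithmetic. The sketch below compares two idealized models, both assumptions that ignore thread-creation, context-switch and cache costs. `TimeSlicedMakespan` treats one thread per task as fair time slicing (each runnable thread advances at min(1, cores/runnable) of a core), and `PoolMakespan` models a pool whose workers run each task to completion in submission order. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Idealized one-thread-per-task model: all threads start at t = 0, and while
// n threads are runnable on `cores` cores, each advances at min(1, cores / n).
// Returns the time at which the last thread finishes.
double TimeSlicedMakespan(std::vector<double> work, unsigned cores) {
  assert(cores > 0);
  std::sort(work.begin(), work.end());  // the shortest remaining task finishes first
  double t = 0.0;
  double done = 0.0;  // work completed so far by every still-running thread
  for (std::size_t i = 0; i < work.size(); ++i) {
    const std::size_t running = work.size() - i;
    const double slowdown = std::max(1.0, static_cast<double>(running) / cores);
    t += (work[i] - done) * slowdown;  // time until task i completes
    done = work[i];
  }
  return t;
}

// Idealized pool model: `workers` threads; each task, in submission order,
// runs to completion on whichever worker becomes free first.
double PoolMakespan(const std::vector<double>& work, unsigned workers) {
  assert(workers > 0);
  std::vector<double> freeAt(workers, 0.0);  // when each worker is next idle
  for (double w : work) {
    auto earliest = std::min_element(freeAt.begin(), freeAt.end());
    *earliest += w;
  }
  return *std::max_element(freeAt.begin(), freeAt.end());
}
```

Under these models, three equal 1-unit tasks on 2 cores finish at 1.5 units with one thread per task versus 2 units with a 2-worker pool, which matches the comment’s intuition. With tasks of 1, 1 and 4 units, fair time slicing finishes at 4.5 units, while the pool finishes at 5 units if the long task is submitted last but at 4 units if it is submitted first. Which approach wins therefore depends on task sizes, their ordering, and the overheads this model leaves out.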
