Quad-core a "waste of electricity"?

Jeff Atwood wrote:

In my opinion, quad-core CPUs are still a waste of electricity unless you’re putting them in a server. Four cores on the desktop is great for bragging rights and mathematical superiority (yep, 4 > 2), but those four cores provide almost no benchmarkable improvement in the type of applications most people use. Including software development tools.

Really? You must not be using the right tools. :-) For example, here are three I’m familiar with:

Visual C++ 2008’s /MP flag tells the compiler to compile files in the same project in parallel. I typically get linear speedups on the compile phase. The link phase is still sequential, but on most projects compilation dominates.
Since Visual Studio 2005 we’ve supported parallel project builds in Batch Build mode, where you can build multiple subprojects in parallel (e.g., compile your release and debug builds in parallel), though that feature didn’t let you compile multiple files in the same project in parallel. (As I’ve blogged about before, Visual C++ 2005 actually already shipped with the /MP feature, but it was undocumented.)
Excel 2007 does parallel recalculation. Assuming the spreadsheet is large and doesn’t just contain sequential dependencies between cells, it usually scales linearly up to at least 8 cores (the most I heard that was tested before shipping). I’m told that customers who are working on big financial spreadsheets love it.
And need I mention games? (This is just a snarky comment… Jeff already correctly noted that “rendering, encoding, or scientific applications” are often scalable today.)

And of course, even if you’re having a terrible day and not a single one of your applications can use more than one core, you can still see real improvement on CPU-intensive multi-application workloads on a multicore machine today, such as by being able to run other foreground applications at full speed while encoding a movie in the background.

Granted, as I’ve said before, we do need to see examples of mainstream applications (e.g., something your dad might use) that exploit manycore (e.g., >10 cores). But it’s overreaching to claim that there are no applications at all that exploit multicore (e.g., <10 cores), not even development tools. We may not yet have achieved the mainstream manycore killer app, but it isn’t like we have nothing to show at all. We have started out on the road that will take us there.

Usability: Watch out for those non-errors that start with “ER”

Today I had a nice lesson in transaction codes. I did a happy little online transaction, and then the confirmation screen came up with what at first glance looked like an error. It startled me, until I read more closely:

Thank you. Your transaction has been placed and received by SuperMondoCorp.

Transaction Confirmation Number: ER6661234567

“Yikes!” thought I to myself, thought I. Then, “oh, the bolded confirmation number just starts with ER which only looks like ERR.” (And yes, the rest of the number did start with 666. I only altered the other numbers.)

I realize you can’t anticipate everything, but it is a reminder about usability. If the thing you draw the customer’s eye to on a confirmation screen can start with what looks like a negative confirmation, it’s not the greatest thing.

Effective Concurrency: Interrupt Politely

The latest Effective Concurrency column, “Interrupt Politely”, just went live on DDJ’s site, and will also appear in the print magazine. From the article:

Violence isn’t the answer.

We want to be able to stop a running thread or task when we discover that we no longer need or want to finish it. As we saw in the last two columns, in a simple parallel search we can stop other workers once one finds a match, and when speculatively running two alternative algorithms to compute the same result we can stop the longer-running one once the first finds a result. [1,2] Stopping threads or tasks lets us reclaim their resources, including locks, and apply them to other work.

But how do you stop a thread or task you no longer need or want? Table 1 summarizes the four main ways, and how they are supported on several major platforms. Let’s consider them in turn. …
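As a rough illustration of what a polite stop can look like, here is a minimal sketch of cooperative cancellation: the worker polls a shared flag at a safe point and returns normally when asked to stop. This is not code from the column, and the excerpt does not say which of the four ways the column recommends. The names run_until_stopped, start_then_cancel, and stop_requested are hypothetical, and the sketch assumes the std::thread and std::atomic facilities of the C++0x standard library.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker: does units of "work" until asked to stop.
long run_until_stopped( const std::atomic<bool>& stop_requested ) {
  long iterations = 0;
  while( !stop_requested.load() ) {   // poll the flag at a safe point
    ++iterations;                     // stand-in for one unit of real work
    std::this_thread::yield();
  }
  return iterations;                  // normal return, so cleanup runs as usual
}

// Starts the worker, asks it to stop, and waits for it to finish.
long start_then_cancel() {
  std::atomic<bool> stop_requested( false );
  long result = -1;
  std::thread worker( [&] { result = run_until_stopped( stop_requested ); } );
  stop_requested.store( true );       // a request, not a kill
  worker.join();                      // wait until the worker has noticed it
  return result;                      // safe to read: join() synchronizes
}
```

Because the worker leaves through an ordinary return, any locks or other resources it holds are released by the usual scope-exit rules, which is the point of asking rather than killing.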

I hope you enjoy it.
 
Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):
July 2007 The Pillars of Concurrency
August 2007 How Much Scalability Do You Have or Need?
September 2007 Use Critical Sections (Preferably Locks) to Eliminate Races
October 2007 Apply Critical Sections Consistently
November 2007 Avoid Calling Unknown Code While Inside a Critical Section
December 2007 Use Lock Hierarchies to Avoid Deadlock
January 2008 Break Amdahl’s Law!
February 2008 Going Superlinear
March 2008 Super Linearity and the Bigger Machine
April 2008 Interrupt Politely

Cringe not: Vectors are guaranteed to be contiguous

Andy Koenig is the expert’s expert, and I rarely disagree with him. And, well, when I do disagree I’m invariably wrong… but there’s a first time for everything, so I’ll take my chances one more time.

I completely agree with the overall sentiment of Andy’s blog entry today:

I spend a fair amount of time reading (and sometimes responding to) questions in the C++ newsgroups. Every once in a while, someone asks a question that makes me cringe.

What makes a question cringe-worthy?

Usually it is a question that implies that the person asking it is trying to do something inappropriate.

Asking how to violate programming-language abstractions is similar: If you have to ask, you probably shouldn’t be doing it.

Amen! Bravo! Absolutely correct. Great stuff. Except that the example in question isn’t violating an abstraction:

For example, I just saw one such question: Are the elements of a std::vector contiguous? Here is why that question made me cringe.

Every C++ container is part of an abstraction that includes several companion iterators. The normal way of accessing a container’s elements is through such an iterator.

I can think of only one reason why one should care whether the elements of a vector are in continuous memory, and that is if you intend to use pointers, rather than iterators, to access those elements. Doing so, of course, violates the abstraction.

There is nothing wrong per se with violating abstractions: As Robert Dewar told me more years ago than I care to remember, some programs are poorly designed on purpose. However, there is something wrong with violating abstractions when you know so little of the data structures used to implement those abstractions that you have to ask strangers on Usenet about your proposed violation. To put it more bluntly: If you have to ask whether vector elements are contiguous, you probably should not be trying to make use of that knowledge.

The reason this analysis isn’t quite fair is that contiguity is in fact part of the vector abstraction. It’s so important, in fact, that when it was discovered that the C++98 standard didn’t completely guarantee contiguity, the C++03 standard was amended to explicitly add the guarantee.

Why is it so important that vectors be contiguous? Because that’s what you need to guarantee that a vector is layout-compatible with a C array, and therefore we have no reason not to use vector as a superior and type-safe alternative to arrays even when we need to exchange data with C code. So vector is our gateway to other languages and most operating systems features, whose lingua franca is the venerable C array.

And it’s not just vector: The TR1 and C++0x std::array, which implements fixed-size arrays, is also guaranteed to be contiguous for the same reasons. (std::array is available in Boost and, ahem, the VC++ TR1 implementation we shipped today.)

So why do people continually ask whether the elements of a std::vector (or std::array) are stored contiguously? The most likely reason is that they want to know if they can cough up pointers to the internals to share the data, either to read or to write, with other code that deals in C arrays. That’s a valid use, and one important enough to guarantee in the standard.
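A minimal sketch of that use, assuming a hypothetical C-style function fill_buffer that writes into a caller-supplied array: because the elements are contiguous, the address of the first element can be handed to it directly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C API that fills a caller-supplied buffer.
extern "C" void fill_buffer( int* buf, std::size_t n ) {
  for( std::size_t i = 0; i < n; ++i )
    buf[i] = static_cast<int>( i * i );
}

// Lets the C-style function write straight into the vector's storage.
std::vector<int> squares( std::size_t n ) {
  std::vector<int> v( n );
  if( !v.empty() )                     // &v[0] is only valid when v is non-empty
    fill_buffer( &v[0], v.size() );    // contiguity makes this a valid int[n]
  return v;
}
```

The same guarantee means that &v[0] + i and &v[i] are the same address for every valid index i.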

Visual C++ 2008 Feature Pack now available

Back in November, I reported that we’d be shipping Visual C++ 2008 that month (we did!) and that we’d soon thereafter be doing the “agile thing” and shipping a major update mere months later, instead of waiting two years between releases per our prior tradition. I wrote:

The update is expected to be available in beta form in January 2008, and to ship in the first half of 2008. Enjoy!

Well, it’s official: It’s available. Enjoy! Besides major updates to MFC for the latest Office/VS/Vista look-and-feel to support first-class native code development, it also includes most of TR1 (everything except C99 compatibility, and the special math functions that didn’t make it into C++0x):

TR1 (“Technical Report 1”) is a set of proposed additions to the C++0x standard.  Our implementation of TR1 contains a number of important features such as smart pointers, regular expression parsing, containers (tuple, array, unordered set, etc) and sophisticated random number generators.

More information on TR1 can be found at the sites below:

TR1 documentation

Channel 9: Digging into TR1

TR1 slide decks (recommended)

Enjoy, everyone – and thanks, team!

Notes

1. This feature pack requires Visual C++ 2008 Standard or above. The only VC08 edition it doesn’t work with is Express; we’ll support Express in a future release.

2. When I wrote that it would be available “in the first half of 2008,” a number of people seemed to automatically interpret that as code for “maybe around June 31.” We’re not always that bad at shipping, fortunately. :-)

3. Yes, I know that June has 30 days.

Trip Report: February/March 2008 ISO C++ Standards Meeting

[Updated Apr 3 to note automatic deduction of return type.]

The ISO C++ committee met in Bellevue, WA, USA on February 24 to March 1, 2008. Here’s a quick summary of what we did (with links to the relevant papers to read for more details), and information about upcoming meetings.

Lambda functions and closures (N2550)

For me, easily the biggest news of the meeting was that we voted lambda functions and closures into C++0x. I think this will make STL algorithms an order of magnitude more usable, and it will be a great boon to concurrent code where it’s important to be able to conveniently pass around a piece of code like an object, to be invoked wherever the program sees fit (e.g., on a worker thread).

C++ has always supported this via function objects, and lambdas/closures are merely syntactic sugar for writing function objects. But, though “merely” a convenience, they are an incredibly powerful convenience for many reasons, including that they can be written right at the point of use instead of somewhere far away.

Example: Write collection to console

For example, let’s say you want to write each of a collection of Widgets to the console.

// Writing a collection to cout, in today’s C++, option 1:

for( vector<Widget>::iterator i = w.begin(); i != w.end(); ++i )
  cout << *i << " ";

Or we can leverage that C++ already has a special-purpose ostream_iterator type that does what we want:

// Writing a collection to cout, in today’s C++, option 2:

copy( w.begin(), w.end(),
          ostream_iterator<const Widget>( cout, " " ) );

In C++0x, just use a lambda that writes the right function object on the fly:

// Writing a collection to cout, in C++0x:

for_each( w.begin(), w.end(),
                []( const Widget& w ) { cout << w << " "; } );

(Usability note: The lambda version was the only one I wrote correctly the first time as I tried these examples on compilers to check them. ‘Nuff said. <tease type=”shameless”> Yes, that means I tried it on a compiler. No, I’m not making any product feature announcements about VC++ version 10. At least not right now. </tease>)

Example: Find element with Weight() > 100

For another example, let’s say you want to find an element of a collection of Widgets whose weight is greater than 100. Here’s what you might write today:

// Calling find_if using a functor, in today’s C++:

// outside the function, at namespace scope
class GreaterThan {
  int weight;
public:
  GreaterThan( int weight_ )
    : weight(weight_) { }
  bool operator()( const Widget& w ) const {
    return w.Weight() > weight;
  }
};

// at point of use
find_if( w.begin(), w.end(), GreaterThan(100) );

At this point some people will point out that (a) we have C++98 standard binder helpers like bind2nd or (b) that we have Boost’s bind and lambda libraries. They don’t really help much here, at least not if you’re interested in having the code be readable and maintainable. If you doubt, try and see.

In C++0x, you can just write:

// Calling find_if using a lambda, in C++0x:

find_if( w.begin(), w.end(),
            []( const Widget& w ) { return w.Weight() > 100; } );

Ah. Much better.

Most algorithms are loops… hmm…

In fact, every loop-like algorithm is now usable as a loop. Quick examples using std::for_each and std::transform:

for_each( v.begin(), v.end(), []( Widget& w )
{

  …
  … use or modify w …
  …
} );

transform( v.begin(), v.end(), output.begin(), []( Widget& w )
{
  …
  return SomeResultCalculatedFrom( w );
} );

Hmm. Who knows: As C++0x lambdas start to be supported in upcoming compilers, we may start getting more used to seeing “});” as the end of a loop body.
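For a version of the two loops above that can actually be compiled, here is a small sketch. The Widget class, its SetWeight member, and the double_and_collect function are hypothetical, since the post never shows Widget’s definition.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical Widget, just enough to make the loops compile.
class Widget {
  int weight_;
public:
  explicit Widget( int w ) : weight_( w ) { }
  int  Weight() const { return weight_; }
  void SetWeight( int w ) { weight_ = w; }
};

// Doubles every weight in place, then collects the new weights.
std::vector<int> double_and_collect( std::vector<Widget>& v ) {
  std::for_each( v.begin(), v.end(), []( Widget& w )
  {
    w.SetWeight( w.Weight() * 2 );     // modify w
  } );

  std::vector<int> output( v.size() );
  std::transform( v.begin(), v.end(), output.begin(), []( const Widget& w )
  {
    return w.Weight();                 // one result per element
  } );
  return output;
}
```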

Concurrency teaser

Finally, want to pass a piece of code to be executed on a thread pool without tediously having to define a functor class out at namespace scope? Do it directly:

// Passing work to a thread pool, in C++0x:

mypool.run( [] { cout << "Hello there (from the pool)"; } );
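The mypool object above is hypothetical. As a compilable stand-in, the same idea can be sketched with std::async from the C++0x standard library, which runs the lambda on another thread and hands back the result through a future. Note that std::async is not a thread pool; it is used here only to show a lambda being passed off as a unit of work, and greet_from_another_thread is an illustrative name.

```cpp
#include <future>
#include <string>

// std::async stands in for mypool.run here; it is not a thread pool.
std::string greet_from_another_thread() {
  std::future<std::string> result = std::async( std::launch::async, [] {
    return std::string( "Hello there (from the pool)" );
  } );
  return result.get();   // blocks until the lambda has finished
}
```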

Gnarly.

Other approved features

  • N2535 Namespace associations (inline namespace)
  • N2540 Inheriting constructors
  • N2541 New function declarator syntax
  • N2543 STL singly linked lists (forward_list)
  • N2544 Unrestricted unions
  • N2546 Removal of auto as a storage-class specifier
  • N2551 Variadic template versions of std::min, std::max, and std::minmax
  • N2554 Scoped allocator model
  • N2525 Allocator-specific swap and move behavior
  • N2547 Allow lock-free atomic<T> in signal handlers
  • N2555 Extended variadic template template parameters
  • N2559 Nesting exceptions (aka wrapped exceptions)
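To give a feel for a few of these, here is a short sketch that touches inheriting constructors, the new function declarator syntax, forward_list, and nested exceptions. It uses the syntax these features have in the published C++11 standard, which may differ in detail from the papers listed above, and the names Base, Derived, sum, and innermost_message are made up for the example.

```cpp
#include <exception>
#include <forward_list>
#include <stdexcept>
#include <string>

// Inheriting constructors: Derived gets Base's constructors.
struct Base {
  int value;
  explicit Base( int v ) : value( v ) { }
};
struct Derived : Base {
  using Base::Base;
};

// New function declarator syntax (trailing return type) over a forward_list.
auto sum( const std::forward_list<int>& items ) -> int {
  int total = 0;
  for( auto i = items.begin(); i != items.end(); ++i )
    total += *i;
  return total;
}

// Nesting exceptions: wrap a low-level error, then dig it back out.
std::string innermost_message() {
  try {
    try {
      throw std::runtime_error( "low-level failure" );
    } catch( ... ) {
      // Throws a logic_error that carries the current exception inside it.
      std::throw_with_nested( std::logic_error( "high-level failure" ) );
    }
  } catch( const std::exception& outer ) {
    try {
      std::rethrow_if_nested( outer );   // rethrows the wrapped runtime_error
    } catch( const std::exception& inner ) {
      return inner.what();
    }
  }
  return "";
}
```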

Next Meetings

Here are the next meetings of the ISO C++ standards committee, with links to meeting information where available.

The meetings are public, and if you’re in the area please feel free to drop by.

Concurrency Interview with DevX

I recently spent an hour on the phone to talk concurrency with DevX’s Alexa Weber Morales. Part 1 of that interview just went live on the web, and focuses mostly on what concurrency and parallelism are, how to take advantage of multicore chips, and whether concurrency will ever be really accessible to mainstream developers. The site seems to be having intermittent problems displaying the pages; just hit the link a few more times if it doesn’t work right away.

Disclaimer: I am not responsible for the article title (yikes! “whisperer”?! my goodness gracious) and Alexa’s intro blurb is way too kind. But it is true that it’s important for us-the-industry to bring concurrency to the mainstream in a grokkable way, as we have already successfully done with OO and GUIs in the past.

New Course Available: Effective Concurrency

Many of you have kindly sent mail about my Effective Concurrency columns, asking when there’ll be a course. Well, I’m happy to announce that the answer is: May 19-21, 2008.

Here’s the brief information (more details below):

3-Day Seminar: Effective Concurrency

May 19-21, 2008
Bellevue, WA, USA
Developed and taught by Herb Sutter

This course covers the fundamental tools that software developers need to write effective concurrent software for both single-core and multi-core/many-core machines. To use concurrency effectively, we must identify and solve four key challenges:

  • Leverage the ability to perform and manage work asynchronously
  • Build applications that naturally run faster on new hardware having more and more cores
  • Manage shared objects in memory effectively to avoid races and deadlocks
  • Engineer specifically for high performance

This seminar will equip attendees to reason correctly about concurrency requirements and tradeoffs, to migrate existing code bases to be concurrency-enabled, and to achieve key success factors for a concurrent programming project. Most code examples in the course can be directly translated to popular platforms and concurrency libraries, including Linux, Windows, Java, .NET, pthreads, and the forthcoming ISO C++0x standard.

Note on class size limit and possible waitlist: There is a hard limit on attendance at this first one (really). But if the registration site says you’ll get waitlisted, don’t give up: Go ahead and sign up anyway because we may be able to put together a second installment of the seminar a week or two later if there’s enough interest.

Finally, here’s a summary of what we’ll cover during the three days.

Fundamentals

  • Define basic concurrency goals and requirements
  • Understand applications’ scalability needs
  • Key concurrency patterns

Isolation: Keep Work Separate

  • Running tasks in isolation and communicating via async messages
  • Integrating multiple messaging systems, including GUIs and sockets
  • Building responsive applications using background workers
  • Threads vs. thread pools

Scalability: Re-enable the Free Lunch

  • When and how to use more cores 
  • Exploiting parallelism in algorithms 
  • Exploiting parallelism in data structures 
  • Breaking the scalability barrier

Consistency: Don’t Corrupt Shared State

  • The many pitfalls of locks–deadlock, convoys, etc.
  • Locking best practices
  • Reducing the need for locking shared data
  • Safe lock-free coding patterns
  • Avoiding the pitfalls of general lock-free coding
  • Races and race-related effects

Migrating Existing Code Bases to Use Concurrency

Near-Future Tools and Features

High Performance Concurrency

  • Machine architecture and concurrency
  • Costs of fundamental operations, including locks, context switches, and system calls
  • Memory and cache effects
  • Data structures that support and undermine concurrency
  • Enabling linear and superlinear scaling

I hope to get to meet some of you here in the Seattle area!

Effective Concurrency: Super Linearity and the Bigger Machine

The latest Effective Concurrency column, "Super Linearity and the Bigger Machine", just went live on DDJ’s site, and will also appear in the print magazine. From the article:

There are two main ways to achieve superlinear scalability, or to use P processors to compute an answer more than P times faster…:

  • Do disproportionately less work.
  • Harness disproportionately more resources.

Last month, we focused on the first point by illustrating parallel search and how it naturally achieves superlinear speedups when matches are not distributed evenly because some workers get "rich" subranges and will find a match faster, which benefits the whole search because we can stop as soon as any worker finds a match.

This month, we’ll conclude examining the first point with a few more examples, and then consider how to achieve superlinear speedups by harnessing more resources—quite literally, running on a bigger machine without any change in the hardware. …
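The parallel search described above can be sketched roughly as follows: each worker scans its own subrange and every worker gives up as soon as any one of them records a match. This is an illustrative sketch, not the column’s code. The name parallel_find and the even split into chunks are assumptions, and it relies on C++0x’s std::thread and std::atomic.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Returns the index of some element equal to target, or -1 if there is none.
long parallel_find( const std::vector<int>& data, int target, unsigned workers ) {
  assert( workers > 0 );
  std::atomic<long> found( -1 );
  std::vector<std::thread> pool;
  const std::size_t chunk = ( data.size() + workers - 1 ) / workers;

  for( unsigned w = 0; w < workers; ++w ) {
    const std::size_t begin = std::min( data.size(), w * chunk );
    const std::size_t end   = std::min( data.size(), begin + chunk );
    pool.emplace_back( [&data, &found, target, begin, end] {
      for( std::size_t i = begin; i < end; ++i ) {
        if( found.load() != -1 ) return;          // another worker already won
        if( data[i] == target ) {
          found.store( static_cast<long>( i ) );  // publish the match
          return;
        }
      }
    } );
  }
  for( auto& t : pool ) t.join();
  return found.load();
}
```

If the target occurs more than once, whichever worker stores last decides the returned index, so the result is a match but not necessarily the first one.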

I hope you enjoy it.

Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):
July 2007 The Pillars of Concurrency
August 2007 How Much Scalability Do You Have or Need?
September 2007 Use Critical Sections (Preferably Locks) to Eliminate Races
October 2007 Apply Critical Sections Consistently
November 2007 Avoid Calling Unknown Code While Inside a Critical Section
December 2007 Use Lock Hierarchies to Avoid Deadlock
January 2008 Break Amdahl’s Law!
February 2008 Going Superlinear
March 2008 Super Linearity and the Bigger Machine

Stroustrup & Sutter: The Lyrics

Last week’s Stroustrup & Sutter on C++ was a huge amount of fun, and Bjarne and I want to thank everyone who came. It was a record-shattering year, and it’s great to see C++ clearly thriving and growing.

A lot of people requested the (modified) lyrics to the songs we performed (yes, if you missed the event, you missed live music by geeks — imagine, if you will). To those who were there: You can now find the song lyrics at the same web page we gave out that contains the course eval link and the updated slides link. Just go back and you’ll see them, as well as the slides for What Not to Code in the handouts zipfile. Enjoy.

Thanks again for coming, and we hope to see you again next time. (The response to the post-seminar eval question about “would you recommend this course to a colleague” was a humbling 100.0%. Wow. It’s not often I see a pie chart that’s a solid circle. Thank you, and we’re glad you enjoyed it!)