Suggestion for “Required Viewing”: Machine Architecture Talk Online

Over the past 15 years or so that I’ve been giving software development talks, I’ve never had the chutzpah to suggest one of my own talks be considered "required viewing" for serious developers regardless of language or platform. But I’m going to suggest it now.

Two years ago, several highly experienced software architects I know (whose names many of you gentle readers would recognize) complained to me privately that "we shouldn’t let developers labor in ignorance" of how the enormous complexity of modern commodity computers affects our code’s performance and correctness. From that seed and others like it sprang this talk, now freely available online:

Machine Architecture: Things Your Programming Language Never Told You (117 minutes)

Video: Google video (recorded live on Sep 19, 2007)
Slides: PDF slides

Abstract: Programmers are routinely surprised at what simple code actually does and how expensive it can be, because so many of us are unaware of the increasing complexity of the machine on which the program actually runs. This talk examines the “real meanings” and “true costs” of the code we write and run especially on commodity and server systems, by delving into the performance effects of bandwidth vs. latency limitations, the ever-deepening memory hierarchy, the changing costs arising from the hardware concurrency explosion, memory model effects all the way from the compiler to the CPU to the chipset to the cache, and more — and what you can do about them.

Teaser: Would you be surprised to discover that only about 1% (one percent) of all the transistors on your modern CPU exist to ever compute anything? And that the other 99% (ninety-nine percent) of your CPU’s transistors are essentially dedicated to nothing but hiding memory latency? Those are round numbers, of course. But you get the idea…

So how do we cope with latency...?

This is a talk I wish I’d been able to attend years ago. Consider making this required viewing for your team, including for new hires in software development roles. I guarantee it’ll be time well invested.

Here’s one suggestion: Roll your own training session. Arrange an extended lunch brownbag for your developers in a conference or training room, give each person a printout of the PDF slides to follow along and make notes, and display the video on the big screen. For extra benefit, leave a little time for group discussion, both during the session (the Pause feature is your friend) and afterwards ("how does this apply to our projects?"). Presto, instant training session — and you don’t even have to put your developers on a plane to attend a class somewhere, or arrange for a speaker to fly out to your site.

Finally, if you like the material and agree that it’s worthwhile, please help spread the word. For example, you can tell Slashdot by going to Submit Story. Or, if you prefer, you can tell Digg, or Reddit, or one of the others… you know the rest of the usual suspects.


Omit Needless Words (in C++)

In C++ as in life, some people tend to use more words than they need to. As Strunk and White put it: Omit needless words.

Here’s an example I saw again yesterday in a recent peer-reviewed online magazine article showing how to write some C++ code to solve a particular problem. There’s nothing wrong with the code I’m going to show; but it uses a technique meant to "save typing" that accomplishes the opposite, because the author was unaware of a clever little C++ feature.

I’ll rearrange the particular code I saw to disguise the example (lots of people do this, and it would be unfair to target one person). The code started something like this:

template<
  typename T1,
  typename T2,
  typename T3,
  template <typename A1, typename A2> class CreationPolicy,
  template <typename A1, typename A2> class MemoryPolicy,
  template <typename A1, typename A2> class SnarkPolicy,
  template <typename A1, typename A2> class HumptyPolicy,
  typename SomeHelperType,
  typename StillAnotherType>
class MyClass

So far, so good; these words are needed for the purpose the author is trying to accomplish (it’s a heavy-duty template with enough type parameters to make Andrei proud).

Next comes the definition of MyClass, and some part of that may want to refer to "my type (this particular MyClass instantiation)." Now, it could spell it out as "MyClass< T1, T2, T3, CreationPolicy, MemoryPolicy, SnarkPolicy, HumptyPolicy, SomeHelperType, StillAnotherType >" — but, yuck, who wants to write all that every time?

In an attempt to avoid that verbosity, the class author next writes a convenience typedef:

  // Typedef — to save typing (?)
  typedef MyClass< T1, T2, T3, CreationPolicy, MemoryPolicy, SnarkPolicy, HumptyPolicy, SomeHelperType, StillAnotherType > ThisClass;

(Some of you may be about to interrupt our program with a "but why would he write that when he can just…" — right, but just wait for it.)

Then the code can go on to name "this particular MyClass instantiation" easily, say to pass "my type" to someone else:

  static bool SomeFunc()
  {
    // instantiate some other template with "my type"
    vector<ThisClass> v;

    // call another static function of "my own type"
    ThisClass::OtherFunc();

    … more code …
  }

There are two cases of needless words here.

First, the typedef doesn’t save any writing on the call to "ThisClass::OtherFunc()," because that call doesn’t need any qualification at all. It could simply have been written as "OtherFunc();" since that call is already inside MyClass.

Second, and more to the point, the typedef is completely unnecessary in the first place. Why? Because inside a class template, the simple name of the class template automatically implies "this instantiation" including the full parameter list. That is, in the above example there is no difference between typing

MyClass< T1, T2, T3, CreationPolicy, MemoryPolicy, SnarkPolicy, HumptyPolicy, SomeHelperType, StillAnotherType >

and just typing

MyClass

when inside the template.
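Here is a minimal sketch of that rule that you can compile to convince yourself. It uses a cut-down MyClass with only three parameters; DefaultPolicy, Spelled, and check_injected_name are hypothetical names invented for this illustration, and static_assert and std::is_same assume a C++11 compiler.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical policy, only here so the sketch compiles.
template<typename A1, typename A2>
class DefaultPolicy {};

// Cut-down stand-in for the article's MyClass.
template<
  typename T1,
  typename T2,
  template<typename A1, typename A2> class CreationPolicy>
class MyClass
{
public:
  // The long way to say "this instantiation".
  typedef MyClass<T1, T2, CreationPolicy> Spelled;

  // Inside the template, the bare name MyClass is the same type.
  static_assert(std::is_same<MyClass, Spelled>::value,
                "MyClass names the current instantiation");

  static bool OtherFunc() { return true; }

  static bool SomeFunc()
  {
    std::vector<MyClass> v;            // same as std::vector<Spelled>
    return OtherFunc() && v.empty();   // no qualification needed
  }
};

inline void check_injected_name()
{
  assert((MyClass<int, long, DefaultPolicy>::SomeFunc()));
}
```

If the bare name did not mean "this instantiation," the static_assert would fail for every instantiation of the template.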

So the typedef is completely unnecessary, and the original code:

  // Typedef — to save typing (?)
  typedef MyClass< T1, T2, T3, CreationPolicy, MemoryPolicy, SnarkPolicy, HumptyPolicy, SomeHelperType, StillAnotherType > ThisClass;

  static bool SomeFunc()
  {
    // instantiate some other template with "my type"
    vector<ThisClass> v;

    // call another static function of "my own type"
    ThisClass::OtherFunc();

    … more code …
  }

could have been written simply (and a bit more clearly) without any typedef, using just the class template’s name or nothing at all:

  static bool SomeFunc()
  {
    // instantiate some other template with "my type"
    vector<MyClass> v;

    // call another static function of "my own type"
    OtherFunc();

    … more code …
  }

Ironically, the typedef not only didn’t save typing, but actually added a little.

Sometimes less is more. Where possible, use fewer words.

Webcast (via Intel) on September 25

On Tuesday September 25, I’ll be doing the kickoff webcast in Intel’s fall 2007 developer "webinar" series. It’ll be closely based on a talk I’ve given before on "Software and the Concurrency Revolution," but I’m going to update-and-trim material to try to leave more time for some interactive discussion. Here’s the info:

Webcast: "The Concurrency Revolution"

Date: Tuesday, September 25, 2007
Time: 9:00am U.S. Pacific Time (click here for more time zones) *
Length: 1 hour

Although driven by the industry-wide shift to multi-core hardware architectures, concurrency is primarily a software revolution. We are now seeing the initial stages of the next major change in software development, as over the next few years the software industry brings concurrency pervasively into mainstream software development, just as it has done in the past for objects, garbage collection, generics, and other technologies. Sutter summarizes the issues involved, gives an overview of the impact, and describes what to expect over the coming decade.

They’ll be doing a concurrency-related webcast every two weeks. The second talk in the fall series will be "Steps to Parallelism NOW" by James Reinders, author of the new book Threading Building Blocks about Intel’s cool (and recently open-sourced) C++ template library for concurrency. Should be good.


* I love

Trip Report: July 2007 ISO C++ Standards Meeting

The ISO C++ committee met in Toronto on July 15-20. Here’s a quick summary of what we did, and information about upcoming meetings.

Features voted into draft C++0x

enum class (N2347)

This is an extension from C++/CLI that allows writing enums that have a predictable size and underlying type, have their own scope (so the enumerators don’t get injected into the enclosing scope where they can conflict with similar names), and don’t have a nasty implicit conversion to int (or anything else). The proposal was written by David Miller, myself, and Bjarne.
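To make those three properties concrete, here is a small sketch written against the syntax as it was eventually published in C++11. Color, Alert, to_int, and check_enums are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Two enums can reuse the enumerator names Red and Green without clashing.
enum class Color : unsigned char { Red, Green, Blue };
enum class Alert : unsigned char { Green, Yellow, Red };

// Predictable size and underlying type.
static_assert(sizeof(Color) == 1, "Color is stored in one byte");
static_assert(std::is_same<std::underlying_type<Color>::type,
                           unsigned char>::value,
              "the underlying type is the one we asked for");

// No implicit conversion to int.
static_assert(!std::is_convertible<Color, int>::value,
              "a cast is required to get at the value");

inline int to_int(Color c) { return static_cast<int>(c); }

inline void check_enums()
{
  Color c = Color::Red;   // enumerators must be qualified
  Alert a = Alert::Red;   // same spelling, different type and value
  assert(to_int(c) == 0);
  assert(static_cast<int>(a) == 2);
}
```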

Saving exceptions (N2179)

Added language and library support for saving an exception: This feature is useful in plain old sequential code if you want to catch and store an exception to be rethrown later. But it’ll be especially useful in concurrent code, so that we can catch an exception and transport it across threads (e.g., when waiting on an asynchronous function call).
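As a rough sketch of the catch, store, and rethrow pattern, here is what it looks like using the names that ended up in the published C++11 library (std::exception_ptr, std::current_exception, std::rethrow_exception). The helper functions capture and rethrow_and_describe, and the "disk full" error, are hypothetical; the sketch stays on one thread, but the saved std::exception_ptr could equally be handed to another thread.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Do some work and save whatever it throws instead of letting it propagate.
inline std::exception_ptr capture()
{
  try {
    throw std::runtime_error("disk full");   // hypothetical failure
  } catch (...) {
    return std::current_exception();         // save the in-flight exception
  }
}

// Later, rethrow the saved exception and handle it as usual.
inline std::string rethrow_and_describe(std::exception_ptr saved)
{
  try {
    if (saved) std::rethrow_exception(saved);
  } catch (const std::exception& e) {
    return e.what();
  }
  return "no exception";
}

inline void check_saved_exception()
{
  assert(rethrow_and_describe(capture()) == "disk full");
}
```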

constexpr (N2235 and N2349)

This feature permits generalized constant expressions — or, in English, being able to write your own compile-time constants that really act like compile-time constants. To illustrate, here’s one simple example from the paper:

constexpr int square(int x) { return x * x; }

What’s the point? Now if you call square() with an argument that is a compile-time constant, then the result of calling square() is still a compile-time constant, and you can use it anywhere you can use compile-time constants, such as specifying the length of an array:

float arr[ square(9) ]; // ok, array of length 81

Note that the above is not a C99 variable length array. It’s a normal fixed-size array, but we have more flexibility and convenience in specifying its size.
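Here is a slightly larger sketch of the same square() in a few other places that require compile-time constants, assuming a C++11 compiler. The names kCells, grid_size, square_at_runtime, and check_square are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int square(int x) { return x * x; }

// Checked by the compiler, not at run time.
static_assert(square(9) == 81, "square(9) is a constant expression");

enum { kCells = square(4) };               // enumerator value

inline std::size_t grid_size()
{
  std::array<float, square(3)> grid{};     // template argument
  return grid.size();
}

// The same function still works with values known only at run time.
inline int square_at_runtime(int n) { return square(n); }

inline void check_square()
{
  assert(grid_size() == 9);
  assert(square_at_runtime(7) == 49);
}
```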

See the linked paper for more details and examples. And to see how this new language feature is immediately also being used in the C++ standard library, here’s your source: N2349, "Constant Expressions in the Standard Library — Revision 2."

decltype (N2343 and N2194; for discussion see N2115 and N1978)

The decltype feature lets you get the type of an expression, so that you can do things with the type (e.g., declare more variables of that type) without knowing in advance what the type is or how to spell it.

This is a great boon to generic programming with templates, and without the runtime cost of reflection in other languages. For example, say that you’re handed an iterator, and you want to know what type it refers to. Today, you need to ask the iterator for its value_type, which is a manual "traits" convention everyone is expected to follow when writing and using iterators:

template<typename Iter>
void f( Iter it ) {
  typename Iter::value_type v = *it;
}

With decltype, we could instead write:

template<typename Iter>
void f( Iter it ) {
  // note: for most iterators, decltype(*it) is a reference type
  decltype(*it) v = *it;
}
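One detail worth spelling out, based on decltype as it was finally specified in C++11: for ordinary iterators *it is an lvalue, so decltype(*it) names a reference type, not value_type. The sketch below shows both the reference and a way to get an independent copy; bump_first and check_decltype are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// decltype(*it) for a vector<int> iterator is int&, not int.
static_assert(std::is_same<
                decltype(*std::declval<std::vector<int>::iterator>()),
                int&>::value,
              "dereferencing yields a reference");

template<typename Iter>
void bump_first( Iter it ) {
  decltype(*it) ref = *it;   // a reference to the element
  ref += 1;                  // modifies the element itself

  // Strip the reference to get a separate copy of the value.
  typename std::remove_reference<decltype(*it)>::type copy = *it;
  copy += 100;               // does not touch the element
}

inline void check_decltype()
{
  std::vector<int> v(1, 5);
  bump_first(v.begin());
  assert(v[0] == 6);
}
```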

Are you curious to see how decltype is being used in the C++0x standard library itself? Here’s the paper for you: N2194, "decltype for the C++0x Standard Library."

alignof (N2341)

A nice portable way to get aligned storage and inquire about the alignment requirements of types without performing system-dependent backflips.
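As a sketch of what this looks like in practice, here is raw byte storage aligned for a given type, using alignof together with the alignas specifier from the published C++11 standard (the post itself only names alignof). Sample, SampleSlot, is_aligned_for_sample, and construct_in_slot are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

struct Sample { double d; char c; };   // hypothetical type

// Raw bytes big enough and suitably aligned to hold a Sample.
struct SampleSlot {
  alignas(Sample) unsigned char bytes[sizeof(Sample)];
};

static_assert(alignof(Sample) >= alignof(double),
              "a struct is at least as aligned as its members");
static_assert(alignof(SampleSlot) >= alignof(Sample),
              "the slot can hold a Sample");

inline bool is_aligned_for_sample(const void* p)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignof(Sample) == 0;
}

inline double construct_in_slot()
{
  SampleSlot slot;
  Sample* s = new (slot.bytes) Sample{3.5, 'x'};   // placement new
  assert(is_aligned_for_sample(s));
  double d = s->d;
  s->~Sample();                                    // explicit destroy
  return d;
}
```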

=default and =delete (N2346 and N2292)

If you’ve ever wished you could control the four default special member functions (default constructor, copy constructor, copy assignment, and destructor) and especially how they’re inherited from base classes, this is the paper for you.

How could these be useful in the C++0x standard library itself, you ask? Here’s where to find the answer: N2292, "Standard Library Applications for Deleted Functions."
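For a quick feel of the syntax as it shipped in C++11, here is a sketch of a non-copyable class plus a deleted overload. Connection, half, and check_defaults are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <type_traits>

class Connection   // hypothetical non-copyable handle
{
public:
  Connection() = default;                              // keep the generated default constructor
  explicit Connection(int id) : id_(id) {}
  Connection(const Connection&) = delete;              // no copying
  Connection& operator=(const Connection&) = delete;   // no copy assignment
  ~Connection() = default;

  int id() const { return id_; }

private:
  int id_ = -1;
};

static_assert(std::is_default_constructible<Connection>::value,
              "the defaulted constructor is available");
static_assert(!std::is_copy_constructible<Connection>::value,
              "the copy constructor is deleted");

// =delete also works on ordinary functions, for example to reject an overload.
inline double half(double x) { return x / 2; }
double half(int) = delete;   // half(3) is now a compile-time error

inline void check_defaults()
{
  Connection c(7);
  assert(c.id() == 7);
  assert(half(3.0) == 1.5);
}
```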

Some other approved features

  • N2340 "C99 Compatibility : __func__ and predeclared identifiers (revision 2)"
  • N2342 "POD’s Revisited; Resolving Core Issue 568 (Revision 5)"
  • N2293 "Standard Library Applications for Explicit Conversion Operators"
  • N2348 "Wording for std::numeric_limits<T>::lowest()"
  • N2350 "Container insert/erase and iterator constness (Revision 1)"
  • N2351 "Improving shared_ptr for C++0x, Revision 2"
  • N2299 "Concatenating tuples" (except change the name of the four functions named concatenate to tuple_cat)
  • N2007 "PROPOSED LIBRARY ADDITIONS FOR CODE CONVERSION" (hey, don’t blame me for screaming, that’s the title…)
  • N2308 "Adding allocator support to std::function for C++0x" (except change the pass-by-value Allocator argument to pass-by-const-reference)
  • N2321 "Enhancing the time_get facet for POSIX® compatibility, Revision 2"
  • N2353 "A Specification for vector<bool>" (After all, no meeting would be complete without some discussion of vector<bool>…)

Next Meetings

Here are the next meetings of the ISO C++ standards committee, with links to meeting information where available.

  • October 1-6: Kona, Hawaii, USA [N2289]
  • February 24-29, 2008: Bellevue, Washington, USA
  • June 8-13, 2008: Sophia Antipolis, France

The meetings are public, and if you’re in the area please feel free to drop by.

Effective Concurrency: Use Critical Sections (Preferably Locks) to Eliminate Races

"In a race, no one can hear you scream."

That’s my tagline for the third Effective Concurrency column, "Use Critical Sections (Preferably Locks) to Eliminate Races." It just went live on DDJ’s site, and will also appear in the print magazine.

This article focuses on two main things:
  • The detailed facts of life about why most or all bets are off if you have a data race. You really, really, really don’t want to have any races in your code.
  • The commonality that unifies all synchronization constructs you’ve ever used or will use, from locking to lock-free styles to fences to transactional memory.

Here’s the article’s intro:

Everyone knows the basics of how to use locks:

  mut.lock(); // acquire lock on x
  … read/write x …
  mut.unlock(); // release lock on x

But why do locks, lock-free styles, and other synchronization techniques work at all, never mind interoperate well with each other and with aggressive optimizers that transform and reorder your program to make it run faster? Because every synchronization technique you’ve ever heard of must express, and every optimization that may ever be performed must respect and uphold, the common fundamental concept of a critical section. …
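The lock()/unlock() pattern in the excerpt above can be sketched with std::mutex and std::lock_guard from the published C++11 library. The excerpt does not say what type mut is, so that choice is an assumption, and Counter and check_counter are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

class Counter   // hypothetical shared object protecting x_
{
public:
  void increment()
  {
    std::lock_guard<std::mutex> hold(mut_);   // acquire lock on x_
    ++x_;                                     // read/write x_
  }                                           // lock released here, even on exceptions

  int get() const
  {
    std::lock_guard<std::mutex> hold(mut_);   // reads take the same lock
    return x_;
  }

private:
  mutable std::mutex mut_;
  int x_ = 0;
};

inline void check_counter()
{
  Counter c;
  for (int i = 0; i < 1000; ++i) c.increment();
  assert(c.get() == 1000);
}
```

The check runs on a single thread to keep the sketch small; the point of the lock is that increment() and get() remain safe when called from several threads at once.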

I hope you enjoy it.

Next month’s article is already in post-production. It follows directly from this one, and will be titled "Apply Critical Sections Consistently." I’ll blog here when it hits the web about a month from now.

Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):