Visual C++ 2008 Feature Pack now available

Back in November, I reported that we’d be shipping Visual C++ 2008 that month (we did!) and that we’d soon thereafter be doing the “agile thing” and shipping a major update mere months later, instead of waiting two years between releases per our prior tradition. I wrote:

The update is expected to be available in beta form in January 2008, and to ship in the first half of 2008. Enjoy!

Well, it’s official: It’s available. Enjoy! Besides major updates to MFC that bring the latest Office/VS/Vista look-and-feel to first-class native code development, it also includes most of TR1 (everything except the C99 compatibility portions, and the special math functions that didn’t make it into C++0x):

TR1 (“Technical Report 1”) is a set of proposed additions to the C++0x standard.  Our implementation of TR1 contains a number of important features such as smart pointers, regular expression parsing, containers (tuple, array, unordered set, etc) and sophisticated random number generators.

More information on TR1 can be found at the sites below:

TR1 documentation

Channel 9: Digging into TR1

TR1 slide decks (recommended)

Enjoy, everyone – and thanks, team!

Notes

1. This feature pack requires Visual C++ 2008 Standard or above. The only VC08 edition it doesn’t work with is Express; we’ll support Express in a future release.

2. When I wrote that it would be available “in the first half of 2008,” a number of people seemed to automatically interpret that as code for “maybe around June 31.” We’re not always that bad at shipping, fortunately. :-)

3. Yes, I know that June has 30 days.

Trip Report: February/March 2008 ISO C++ Standards Meeting

[Updated Apr 3 to note automatic deduction of return type.]

The ISO C++ committee met in Bellevue, WA, USA on February 24 to March 1, 2008. Here’s a quick summary of what we did (with links to the relevant papers to read for more details), and information about upcoming meetings.

Lambda functions and closures (N2550)

For me, easily the biggest news of the meeting was that we voted lambda functions and closures into C++0x. I think this will make STL algorithms an order of magnitude more usable, and it will be a great boon to concurrent code where it’s important to be able to conveniently pass around a piece of code like an object, to be invoked wherever the program sees fit (e.g., on a worker thread).

C++ has always supported this via function objects, and lambdas/closures are merely syntactic sugar for writing function objects. But, though “merely” a convenience, they are an incredibly powerful convenience for many reasons, including that they can be written right at the point of use instead of somewhere far away.

Example: Write collection to console

For example, let’s say you want to write each of a collection of Widgets to the console.

// Writing a collection to cout, in today’s C++, option 1:

for( vector<Widget>::iterator i = w.begin(); i != w.end(); ++i )
  cout << *i << " ";

Or we can leverage that C++ already has a special-purpose ostream_iterator type that does what we want:

// Writing a collection to cout, in today’s C++, option 2:

copy( w.begin(), w.end(),
      ostream_iterator<const Widget>( cout, " " ) );

In C++0x, just use a lambda that writes the right function object on the fly:

// Writing a collection to cout, in C++0x:

for_each( w.begin(), w.end(),
          []( const Widget& w ) { cout << w << " "; } );

(Usability note: The lambda version was the only one I wrote correctly the first time as I tried these examples on compilers to check them. ‘Nuff said. <tease type="shameless"> Yes, that means I tried it on a compiler. No, I’m not making any product feature announcements about VC++ version 10. At least not right now. </tease>)

Example: Find element with Weight() > 100

For another example, let’s say you want to find an element of a collection of Widgets whose weight is greater than 100. Here’s what you might write today:

// Calling find_if using a functor, in today’s C++:

// outside the function, at namespace scope
class GreaterThan {
  int weight;
public:
  GreaterThan( int weight_ )
    : weight(weight_) { }
  bool operator()( const Widget& w ) const {
    return w.Weight() > weight;
  }
};

// at point of use
find_if( w.begin(), w.end(), GreaterThan(100) );

At this point some people will point out that (a) we have C++98 standard binder helpers like bind2nd, or (b) we have Boost’s bind and lambda libraries. They don’t really help much here, at least not if you’re interested in having the code be readable and maintainable. If you doubt that, try them and see.

In C++0x, you can just write:

// Calling find_if using a lambda, in C++0x:

find_if( w.begin(), w.end(),
            []( const Widget& w ) { return w.Weight() > 100; } );

Ah. Much better.

Most algorithms are loops… hmm…

In fact, every loop-like algorithm is now conveniently usable as a loop, because the loop body can be written in place. Quick examples using std::for_each and std::transform:

for_each( v.begin(), v.end(), []( Widget& w )
{
  …
  … use or modify w …
  …
} );

transform( v.begin(), v.end(), output.begin(), []( Widget& w )
{
  …
  return SomeResultCalculatedFrom( w );
} );

Hmm. Who knows: As C++0x lambdas start to be supported in upcoming compilers, we may start getting more used to seeing “});” as the end of a loop body.

Concurrency teaser

Finally, want to pass a piece of code to be executed on a thread pool without tediously having to define a functor class out at namespace scope? Do it directly:

// Passing work to a thread pool, in C++0x:

mypool.run( [] { cout << "Hello there (from the pool)"; } );

Gnarly.

Other approved features

  • N2535 Namespace associations (inline namespace)
  • N2540 Inheriting constructors
  • N2541 New function declarator syntax
  • N2543 STL singly linked lists (forward_list)
  • N2544 Unrestricted unions
  • N2546 Removal of auto as a storage-class specifier
  • N2551 Variadic template versions of std::min, std::max, and std::minmax
  • N2554 Scoped allocator model
  • N2525 Allocator-specific swap and move behavior
  • N2547 Allow lock-free atomic<T> in signal handlers
  • N2555 Extended variadic template template parameters
  • N2559 Nesting exceptions (aka wrapped exceptions)

Next Meetings

Here are the next meetings of the ISO C++ standards committee, with links to meeting information where available.

The meetings are public, and if you’re in the area please feel free to drop by.

Concurrency Interview with DevX

I recently spent an hour on the phone talking concurrency with DevX’s Alexa Weber Morales. Part 1 of that interview just went live on the web, and focuses mostly on what concurrency and parallelism are, how to take advantage of multicore chips, and whether concurrency will ever be really accessible to mainstream developers. The site seems to be having intermittent problems displaying the pages; just hit the link a few more times if it doesn’t work right away.

Disclaimer: I am not responsible for the article title (yikes! “whisperer”?! my goodness gracious) and Alexa’s intro blurb is way too kind. But it is true that it’s important for us-the-industry to bring concurrency to the mainstream in a grokkable way, as we have already successfully done with OO and GUIs in the past.

New Course Available: Effective Concurrency

Many of you have kindly sent mail about my Effective Concurrency columns, asking when there’ll be a course. Well, I’m happy to announce that the answer is: May 19-21, 2008.

Here’s the brief information (more details below):

3-Day Seminar: Effective Concurrency

May 19-21, 2008
Bellevue, WA, USA
Developed and taught by Herb Sutter

This course covers the fundamental tools that software developers need to write effective concurrent software for both single-core and multi-core/many-core machines. To use concurrency effectively, we must identify and solve four key challenges:

  • Leverage the ability to perform and manage work asynchronously
  • Build applications that naturally run faster on new hardware having more and more cores
  • Manage shared objects in memory effectively to avoid races and deadlocks
  • Engineer specifically for high performance

This seminar will equip attendees to reason correctly about concurrency requirements and tradeoffs, to migrate existing code bases to be concurrency-enabled, and to achieve key success factors for a concurrent programming project. Most code examples in the course can be directly translated to popular platforms and concurrency libraries, including Linux, Windows, Java, .NET, pthreads, and the forthcoming ISO C++0x standard.

Note on class size limit and possible waitlist: There is a hard limit on attendance at this first one (really). But if the registration site says you’ll get waitlisted, don’t give up: Go ahead and sign up anyway because we may be able to put together a second installment of the seminar a week or two later if there’s enough interest.

Finally, here’s a summary of what we’ll cover during the three days.

Fundamentals

  • Define basic concurrency goals and requirements
  • Understand applications’ scalability needs
  • Key concurrency patterns

Isolation: Keep Work Separate

  • Running tasks in isolation and communicating via async messages
  • Integrating multiple messaging systems, including GUIs and sockets
  • Building responsive applications using background workers
  • Threads vs. thread pools

Scalability: Re-enable the Free Lunch

  • When and how to use more cores 
  • Exploiting parallelism in algorithms 
  • Exploiting parallelism in data structures 
  • Breaking the scalability barrier

Consistency: Don’t Corrupt Shared State

  • The many pitfalls of locks: deadlock, convoys, etc.
  • Locking best practices
  • Reducing the need for locking shared data
  • Safe lock-free coding patterns
  • Avoiding the pitfalls of general lock-free coding
  • Races and race-related effects

Migrating Existing Code Bases to Use Concurrency

Near-Future Tools and Features

High Performance Concurrency

  • Machine architecture and concurrency
  • Costs of fundamental operations, including locks, context switches, and system calls
  • Memory and cache effects
  • Data structures that support and undermine concurrency
  • Enabling linear and superlinear scaling

I hope to get to meet some of you here in the Seattle area!

Effective Concurrency: Super Linearity and the Bigger Machine

The latest Effective Concurrency column, "Super Linearity and the Bigger Machine", just went live on DDJ’s site, and will also appear in the print magazine. From the article:

There are two main ways to achieve superlinear scalability, or to use P processors to compute an answer more than P times faster…:

  • Do disproportionately less work.
  • Harness disproportionately more resources.

Last month, we focused on the first point by illustrating parallel search and how it naturally achieves superlinear speedups when matches are not distributed evenly because some workers get "rich" subranges and will find a match faster, which benefits the whole search because we can stop as soon as any worker finds a match.

This month, we’ll conclude examining the first point with a few more examples, and then consider how to achieve superlinear speedups by harnessing more resources—quite literally, running on a bigger machine without any change in the hardware. …

I hope you enjoy it.

Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):
July 2007 The Pillars of Concurrency
August 2007 How Much Scalability Do You Have or Need?
September 2007 Use Critical Sections (Preferably Locks) to Eliminate Races
October 2007 Apply Critical Sections Consistently
November 2007 Avoid Calling Unknown Code While Inside a Critical Section
December 2007 Use Lock Hierarchies to Avoid Deadlock
January 2008 Break Amdahl’s Law!
February 2008 Going Superlinear
March 2008 Super Linearity and the Bigger Machine

Stroustrup & Sutter: The Lyrics

Last week’s Stroustrup & Sutter on C++ was a huge amount of fun, and Bjarne and I want to thank everyone who came. It was a record-shattering year, and it’s great to see C++ clearly thriving and growing.

A lot of people requested the (modified) lyrics to the songs we performed (yes, if you missed the event, you missed live music by geeks — imagine, if you will). To those who were there: You can now find the song lyrics at the same web page we gave out that contains the course eval link and the updated slides link. Just go back and you’ll see them, as well as the slides for What Not to Code in the handouts zipfile. Enjoy.

Thanks again for coming, and we hope to see you again next time. (The response to the post-seminar eval question about “would you recommend this course to a colleague” was a humbling 100.0%. Wow. It’s not often I see a pie chart that’s a solid circle. Thank you, and we’re glad you enjoyed it!)

How parallelism demos are useful

In "Break Amdahl’s Law!", I described ways to enable scalable applications, and wrote in part:

But don’t show me ray-traced bouncing balls or Mandelbrot graphics or the other usual embarrassingly parallel but niche (or downright useless) clichés—what we’re looking for are real ideas of real software we could imagine real kids and grandmothers using that could become possible on manycore machines. Here’s a quick potential example: Researchers know how to do speech recognition with near-human-quality accuracy in the lab, which is astonishingly good and would enable breakthrough user interfaces if it could be done that reliably in real time. The only trouble is that the software takes a week to run…on a single core. Can it be parallelized, and if so how many cores would we need to get a useful answer in a useful time? Bunches of smart people (and the smart money behind them) are investing hard work not only to find out the answer to that question, but also to find more questions like it.

Just to be clear, in the first sentence above I didn’t mean to say that the standard demos are useless — far from it (see below). This was intended to be a challenging call to action to not be satisfied with demos alone, but for us as an industry to imagine and develop compelling mainstream end applications that are multicore- and manycore-scalable. (To make that clearer, I’m going to ditch and rewrite the first sentence above for the Effective Concurrency book.)

The standard demos are indeed important — not only as proofs of concept, that the technology really does enable scalable parallel code, but at least as importantly as helpful tools in helping us to understand how a given parallel technology or runtime works.

To understand concurrency mechanics/characteristics

For example, consider a standard Mandelbrot-rendering demo, with the twist that each worker thread (core) renders its portion of the work in a different color. On a traditional runtime with static scheduling, workers with easy-to-compute sections finish early and wait idly while the other workers grind through their harder-to-compute sections; visually, each colored section is the same size, and some colored sections fill in faster than others. But on a runtime with dynamic scheduling, and especially one that supports Cilk-style work stealing, we get efficient load balancing: workers that were assigned "easy" sections and finish early contribute to the remaining work in harder-to-compute areas. Visually, some sections fill in with one color, but then the same color starts to appear in other yet-unfinished sections. The bands of color let us see which worker did what work and helped out in what other areas, and the overall visual progress of the whole image lets us see that the system as a whole is doing useful work the whole time. So the colored Mandelbrot demo is a very useful tool for understanding quickly and clearly what’s going on, in a way that presenting the results in a numerical table can’t.

To illustrate a path to future applications

Similarly, ray tracing may well make multicore and manycore CPUs the future of photorealistic graphics in a way that may not be applicable to standard GPUs (time will tell). As shown in the blogs below, ray tracing makes a qualitative difference in the nature of lighting models. But can’t we do this already with GPUs? Interestingly, not necessarily: ray tracing seems to be an algorithm that is hard to accelerate on GPUs, which have limited ability to do the fine-grained scheduling that runtimes based on techniques like work stealing are well suited to do. Some links:

Yes, demos are useful

My point in the original quote above (which I see I could have stated more clearly) was simply this: Once we’ve achieved the demos, we shouldn’t sit back and declare victory. The demos aren’t the end goal; we still need the applications.

Concurrency demos are useful to help prove a technology can scale and to understand how it works, and some of them show potentially fruitful and exciting paths to real and compelling manycore-exploiting applications, but it’s still up to us as an industry to continue to imagine and build those applications. I believe that we can and will.

Effective Concurrency: Going Superlinear

The latest Effective Concurrency column, "Going Superlinear", just went live on DDJ’s site, and will also appear in the print magazine. From the article:

We spend most of our scalability lives inside a triangular box, shown in Figure 1. It reminds me of the early days of flight: We try to lift ourselves away from the rough ground of zero scalability and fly as close as possible to the cloud ceiling of linear speedup. Normally, the Holy Grail of parallel scalability is to linearly use P processors or cores to complete some work almost P times faster, up to some hopefully high number of cores before our code becomes bound on memory or I/O or something else that means diminishing returns for adding more cores. As Figure 1 illustrates, the traditional shape of our "success" curve lies inside the triangle.

Sometimes, however, we can equip our performance plane with extra tools and safely break through the linear ceiling into the superlinear stratosphere. So the question is: "Under what circumstances can we use P cores to do work more than P times faster?" There are two main ways to enter that rarefied realm:

  • Do disproportionately less work.
  • Harness disproportionately more resources.

This month and next, we’ll consider situations and techniques that fall into one or both of these categories. …

I hope you enjoy it.

Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):
July 2007 The Pillars of Concurrency
August 2007 How Much Scalability Do You Have or Need?
September 2007 Use Critical Sections (Preferably Locks) to Eliminate Races
October 2007 Apply Critical Sections Consistently
November 2007 Avoid Calling Unknown Code While Inside a Critical Section
December 2007 Use Lock Hierarchies to Avoid Deadlock
January 2008 Break Amdahl’s Law!
February 2008 Going Superlinear

What Not To Code

At Stroustrup & Sutter on C++ this March, one of my sessions will be on "What Not To Code" (submission link). The premise is to try something new I haven’t done before: A session dedicated to making over code nominated by you, the public, in the few weeks before the talk.

In return for your submission, here’s what I’ll do if your entry is selected:

  • Full notes: Whether or not you attend the S&S event, I’ll send you the full talk handouts — not only for your submission, but for all selected submissions — as your thank-you for participating.
  • Participation: If you’re there in the audience, you can participate in the makeover on stage (if you want to, your choice). Oh, and there might be an additional live prize.

I’ll select as many of the submissions as I can, and analyze/critique/improve them during the session, including talking about tradeoffs and alternatives that can make the code clearer, faster, simpler, and/or safer. As the talk blurb concludes:

… you will refresh your sense of elegance and beauty, not to mention old-fashioned performance and robustness and maintainability, so often lacking in the broken code littering today’s bleak postmodern corporate landscape.

So if you’ve seen a friend’s or coworker’s (or your own) code that could use a makeover, please nominate it anytime here. Thanks!

Many Books

When I walk into a Chapters or a Borders, seeing the many shelves of books often recalls the ancient writer’s words about quality vs. quantity, circa 1000 BC:

"To the making of many books there is no end."

So true. Yet that observation predates the printing press… and netnews… and now RSS.

(Yes, I’ve been thinking of managing-down my Google Reader subscriptions again…)