AFDS Keynote Live Stream

Just a reminder for those interested in using C++ to harness GPUs for fast code: my keynote at the AMD Fusion Developer Summit will be webcast live. I’ll post another link when the recorded talk is available for on-demand viewing.

The talk starts at 8:30am U.S. Pacific time tomorrow (Wed June 15).

Today Jem Davies of ARM also gave a keynote. He’s a great speaker with a great message; look for it when it becomes available on demand. Recommended viewing whether or not you target ARM processors.


Interview on Channel 9

Channel 9 just posted a new interview with me about ISO C++0x, C++’s place in the modern world, and all things C++. The topics we talked about ranged pretty widely, as you can see from the questions below.

Here’s the blurb as posted on Channel 9 with links to specific questions in the interview. Enjoy.

Herb

I was lucky enough to catch up with Herb Sutter not too long after the FDIS announcement (Final Draft International Standard is complete).

As usual when talking to Herb, the conversation is all about C++ (well, we do talk about C# for a little while, but in the context of C++. Why? Tune in…).

See below for the specific questions that were asked. You can simply click on a link to move directly to that point in the conversation. I do, however, strongly recommend that you watch the entire thing. I also recommend that you don’t get used to this level of categorization in my videos (it takes a fair amount of time to do this sort of thing, so enjoy the times when I actually do this, but don’t expect me to do this all of the time).

It’s always great to talk to Herb and get a glimpse of what goes on in the C++ Standards Committee (which Herb chairs). In this specific conversation, it’s uplifting to see how excited Herb is for the future of one of the world’s most capable and widely used general purpose programming languages. C++ is a modern programming language for power and performance, but it’s also a highly abstracted general purpose language for building user mode applications, mobile apps, etc. The amazing part is how C++ can provide rich general programming abstractions and also ensure that your code can run at machine speeds. We talk about this, of course.

Tune in. Learn. Go native!

1:37 -> What were the goals of the C++0x standard, at a high level?

2:40 -> Language and Library abstractions and performance (how high can you go and still be as fast as possible?)…

5:23 -> C++ as an application development language (in addition to the traditional “C++ is a systems programming language” meme)…

7:17 -> C++0x, or can we now call it C++11?

9:21 -> Standards committees and real-world user representation…

10:39 -> Who comes up with the new features that get standardized (or not…)?

13:01 -> What were the goals of the C++0x standard (non-canned answer)?

14:21 -> What does Bjarne mean by C++0x being a better C++ for novice programmers?

15:51 -> Why can’t C++ look more like C#?

18:50 -> At the end of the day, everything (in terms of programmer-controlled computing) boils down to memory, right?

23:12 -> What are some of the most significant new features in C++0x?

25:05 -> What can VC++ developers expect to see in terms of C++0x implementation in Visual C++ next?

27:09 -> C++ and type safety…

29:05 -> C++0x and backwards compatibility: any big breaking changes?

34:16 -> C++0x in the Standard Library…

37:01 -> Any thinking in the Committee about doing more frequent experimental releases of C++?

39:04 -> Are there features that didn’t make it into the standard that you really wanted to be standardized?

41:45 -> Are you comfortable with C++’s current state? Is it modern enough?

43:22 -> Conclusion (or Charles doesn’t end the conversation when his farewell begins – where does it go from there?)

Keynote at the AMD Fusion Developer Summit

In a couple of months, I’ll be giving a keynote at the AMD Fusion Developer Summit, which will be held on June 13-16, 2011, in Bellevue, WA, USA.

Here’s my talk’s description as it appears on the conference website:

AFDS Keynote: “Heterogeneous Parallelism at Microsoft”
Herb Sutter, Microsoft Principal Architect, Native Languages

Parallelism is not just in full bloom, but increasingly in full variety. We know that getting full computational performance out of most machines—nearly all desktops and laptops, most game consoles, and the newest smartphones—already means harnessing local parallel hardware, mainly in the form of multicore CPU processing. This is the commoditization of the supercomputer.

More and more, however, getting that full performance can also mean using gradually ever-more-heterogeneous processing, from local GPGPU and Accelerated Processing Unit (APU) flavors to “often-on” remote parallel computing power in the form of elastic compute clouds. This is the generalization of the heterogeneous cluster in all its NUMA glory, and it’s appearing at all scales from on-die to on-machine to on-cloud.

In this talk, Microsoft’s chief native languages architect shares a vision of what this will mean for native software on Microsoft platforms from servers to devices, and showcases upcoming innovations that bring access to increasingly heterogeneous compute resources — from vector units and multicore, to GPGPU and APU, to elastic cloud — directly into the world’s most popular native languages.

If you’re interested in high performance code for GPUs, APUs, and other high-performance TLAs, I hope to see you there.

Note: This talk is related to, but different from, the GPU talk I’ll be presenting in August at C++ and Beyond 2011 (aka C&B). You can expect the above keynote to be, well, keynote-y… oriented toward software product features and of course AMD’s hardware, with plenty of forward-looking, industry-vision-style material. My August C&B talk will be just that, a technical talk: an in-depth, performance-oriented, and sometimes-gritty session that will also mention product-related and hardware-specific material, but is primarily about heterogeneous hardware, with a more pragmatically focused forward-looking eye.

C++ and Beyond 2011

I’m very much looking forward to C++ and Beyond 2011 this August, again with Scott Meyers and Andrei Alexandrescu. All of my own talks will be brand-new material never given publicly before.

This year’s program will be heavily oriented toward performance (first) and C++0x (second). There are two talks announced so far:

  • Andrei will be giving an in-depth talk on “BIG: C++ Strategies, Data Structures, and Algorithms Aimed at Scalability.” Briefly, it’s about writing high-performance C++ code for highly distributed architectures, focusing on translating C++’s strong modeling capabilities directly to great scaling and/or great savings, and finding the right but non-intuitive C++ techniques and data structures to get there.
  • I’ll be giving a brand-new talk on “C++ and the GPU… and Beyond.” I’ll cover the state of the art for using C++ (not just C) for general-purpose computation on graphics processing units (GPGPU). The first half of the talk discusses the most important issues and techniques to consider when using GPUs for high-performance computation, especially where we have to change our traditional advice for doing the same computation on the CPU. The second half focuses on upcoming C++ language and library extensions that bring key abstractions for GPGPU — and in time considerably more — directly into C++.

An announcement for a third (also performance-focused) talk should be posted within the week, with more to follow as the talk schedule firms up.

Registration is now open. I hope many of you will be able to make it.

Book on PPL is now available

For those of you who may be interested in concurrency and parallelism using Microsoft tools, there’s a new book now available on the Visual C++ 2010 Parallel Patterns Library (PPL). I hope you enjoy it.

Normally I don’t write about other people’s platform-specific books, but I happened to be involved in the design of PPL, thought the book was nicely done, and contributed a Foreword. Here’s what I wrote to introduce the book:

This timely book comes as we navigate a major turning point in our industry: parallel hardware + mobile devices = the pocket supercomputer as the mainstream platform for the next 20 years.

Parallel applications are increasingly needed to exploit all kinds of target hardware. As I write this, getting full computational performance out of most machines—nearly all desktops and laptops, most game consoles, and the newest smartphones—already means harnessing local parallel hardware, mainly in the form of multicore CPU processing; this is the commoditization of the supercomputer. Increasingly in the coming years, getting that full performance will also mean using gradually ever-more-heterogeneous processing, from local general-purpose computation on graphics processing units (GPGPU) flavors to harnessing “often-on” remote parallel computing power in the form of elastic compute clouds; this is the generalization of the heterogeneous cluster in all its NUMA glory, with instantiations ranging from on-die to on-machine to on-cloud, with early examples of each kind already available in the wild.

Starting now and for the foreseeable future, for compute-bound applications, “fast” will be synonymous not just with “parallel,” but with “scalably parallel.” Only scalably parallel applications that can be shipped with lots of latent concurrency beyond what can be exploited in this year’s mainstream machines will be able to enjoy the new Free Lunch of getting substantially faster when today’s binaries can be installed and blossom on tomorrow’s hardware that will have more parallelism.

Visual C++ 2010 with its Parallel Patterns Library (PPL), described in this book, helps enable applications to take the first steps down this new path as it continues to unfold. During the design of PPL, many people did a lot of heavy lifting. For my part, I was glad to be able to contribute the heavy emphasis on lambda functions as the key central language extension that enabled the rest of PPL to be built as Standard Template Library (STL)-like algorithms implemented as a normal library. We could instead have built a half-dozen new kinds of special-purpose parallel loops into the language itself (and almost did), but that would have been terribly invasive and non-general. Adding a single general-purpose language feature like lambdas that can be used everywhere, including with PPL but not limited to only that, is vastly superior to baking special cases into the language.

The good news is that, in large parts of the world, we have as an industry already achieved pervasive computing: the vision of putting a computer on every desk, in every living room, and in everyone’s pocket. But now we are in the process of delivering pervasive and even elastic supercomputing: putting a supercomputer on every desk, in every living room, and in everyone’s pocket, with both local and non-local resources. In 1984, when I was just finishing high school, the world’s fastest computer was a Cray X-MP with four processors, 128MB of RAM, and peak performance of 942MFLOPS—or, put another way, a fraction of the parallelism, memory, and computational power of a 2005 vintage Xbox, never mind modern “phones” and Kinect. We’ve come a long way, and the pace of change is not only still strong, but still accelerating.

The industry turn to parallelism that has begun with multicore CPUs (for the reasons I outlined a few years ago in my essay “The Free Lunch Is Over”) will continue to be accelerated by GPGPU computing, elastic cloud computing, and other new and fundamentally parallel trends that deliver vast amounts of new computational power in forms that will become increasingly available to us through our mainstream programming languages. At Microsoft, we’re very happy to be able to be part of delivering this and future generations of tools for mainstream parallel computing across the industry. With PPL in particular, I’m very pleased to see how well the final product has turned out and look forward to seeing its capabilities continue to grow as we re-enable the new Free Lunch applications—scalable parallel applications ready for our next 20 years.

Herb Sutter
Principal Architect, Microsoft
Bellevue, WA, USA

February 2011
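
As a footnote to the lambda point in the foreword above, here is roughly what that style looks like in practice: an STL-like parallel algorithm from <ppl.h> driven by an ordinary lambda. This is a minimal sketch for illustration, not code from the book.

    // Minimal sketch: a PPL algorithm (Visual C++ 2010, <ppl.h>) whose loop
    // body is an ordinary lambda. No special-purpose parallel loop syntax
    // had to be baked into the language to support this.
    #include <ppl.h>
    #include <vector>

    void scale(std::vector<double>& v, double factor) {
        Concurrency::parallel_for(size_t(0), v.size(), [&](size_t i) {
            v[i] *= factor;   // each index may run on a different core
        });
    }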


Interview on Channel 9

Over the holidays, Erik Meijer interviewed me on Channel 9. We covered a wide variety of topics, mostly centered on C++ with some straying into C#/Java/Haskell/Clojure/Erlang, but ranging from auto and closures to why (not?) derive future<T> from T, and from what the two most important problems in parallelism are in 2011 to why and how to taste new programming languages regularly. I think it turned out well. Enjoy!

Effective Concurrency: Know When to Use an Active Object Instead of a Mutex

This month’s Effective Concurrency column, “Know When to Use an Active Object Instead of a Mutex,” is now live on DDJ’s website.

From the article:

Let’s say that your program has a shared log file object. The log file is likely to be a popular object; lots of different threads must be able to write to the file; and to avoid corruption, we need to ensure that only one thread may be writing to the file at any given time.

Quick: How would you serialize access to the log file?

Before reading on, please think about the question and pencil in some pseudocode to vet your design. More importantly, especially if you think this is an easy question with an easy answer, try to think of at least two completely different ways to satisfy the problem requirements, and jot down a bullet list of the advantages and disadvantages they trade off.

Ready? Then let’s begin.
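
For concreteness, the first answer most people reach for looks something like the following mutex-based sketch (C++11-flavored, for illustration only; it is not the article’s code):

    // Serialize access to the log file with a mutex: simple and correct,
    // but every caller blocks for the duration of the file I/O.
    #include <fstream>
    #include <mutex>
    #include <string>

    class Logger {
        std::ofstream file_;
        std::mutex    mtx_;
    public:
        explicit Logger(const std::string& path) : file_(path) {}

        void write(const std::string& line) {
            std::lock_guard<std::mutex> hold(mtx_);  // one writer at a time
            file_ << line << '\n';
        }
    };

That blocking is exactly the cost the article weighs against the alternative, an active object that queues each message and lets the caller return immediately.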

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

1 The Pillars of Concurrency (Aug 2007)

2 How Much Scalability Do You Have or Need? (Sep 2007)

3 Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

4 Apply Critical Sections Consistently (Nov 2007)

5 Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

6 Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

7 Break Amdahl’s Law! (Feb 2008)

8 Going Superlinear (Mar 2008)

9 Super Linearity and the Bigger Machine (Apr 2008)

10 Interrupt Politely (May 2008)

11 Maximize Locality, Minimize Contention (Jun 2008)

12 Choose Concurrency-Friendly Data Structures (Jul 2008)

13 The Many Faces of Deadlock (Aug 2008)

14 Lock-Free Code: A False Sense of Security (Sep 2008)

15 Writing Lock-Free Code: A Corrected Queue (Oct 2008)

16 Writing a Generalized Concurrent Queue (Nov 2008)

17 Understanding Parallel Performance (Dec 2008)

18 Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

19 volatile vs. volatile (Feb 2009)

20 Sharing Is the Root of All Contention (Mar 2009)

21 Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

22 Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

23 Eliminate False Sharing (May 2009)

24 Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

25 The Power of “In Progress” (Jul 2009)

26 Design for Manycore Systems (Aug 2009)

27 Avoid Exposing Concurrency – Hide It Inside Synchronous Methods (Oct 2009)

28 Prefer structured lifetimes – local, nested, bounded, deterministic (Nov 2009)

29 Prefer Futures to Baked-In “Async APIs” (Jan 2010)

30 Associate Mutexes with Data to Prevent Races (May 2010)

31 Prefer Using Active Objects Instead of Naked Threads (Jun 2010)

32 Prefer Using Futures or Callbacks to Communicate Asynchronous Results (Aug 2010)

33 Know When to Use an Active Object Instead of a Mutex (Sep 2010)

Effective Concurrency: Prefer Using Futures or Callbacks to Communicate Asynchronous Results

This month’s Effective Concurrency column, “Prefer Using Futures or Callbacks to Communicate Asynchronous Results,” is now live on DDJ’s website.

From the article:

This time, we’ll answer the following questions: How should we express return values and out parameters from an asynchronous function, including an active object method? How should we give back multiple partial results, such as partial computations or even just “percent done” progress information? Which mechanisms are suited to callers that want to “pull” results, as opposed to having the callee “push” the results back proactively? And how can “pull” be converted to “push” when we need it? Let’s dig in…
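
To make the pull/push distinction concrete before you dive in, here is a tiny sketch using a C++11 future for “pull” and a caller-supplied callback for “push” (illustrative only, not the article’s code):

    // Pull vs. push for an asynchronous result (C++11 sketch).
    #include <future>
    #include <iostream>
    #include <thread>

    int compute() { return 42; }   // stand-in for some long-running work

    int main() {
        // Pull: the caller holds a future and asks for the value when it
        // wants it; get() blocks until the result is available.
        std::future<int> f = std::async(std::launch::async, compute);
        std::cout << "pulled: " << f.get() << '\n';

        // Push: the callee invokes a caller-supplied callback as soon as
        // the result is ready.
        auto onDone = [](int r) { std::cout << "pushed: " << r << '\n'; };
        std::thread t([onDone] { onDone(compute()); });
        t.join();
    }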

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

1 The Pillars of Concurrency (Aug 2007)

2 How Much Scalability Do You Have or Need? (Sep 2007)

3 Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

4 Apply Critical Sections Consistently (Nov 2007)

5 Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

6 Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

7 Break Amdahl’s Law! (Feb 2008)

8 Going Superlinear (Mar 2008)

9 Super Linearity and the Bigger Machine (Apr 2008)

10 Interrupt Politely (May 2008)

11 Maximize Locality, Minimize Contention (Jun 2008)

12 Choose Concurrency-Friendly Data Structures (Jul 2008)

13 The Many Faces of Deadlock (Aug 2008)

14 Lock-Free Code: A False Sense of Security (Sep 2008)

15 Writing Lock-Free Code: A Corrected Queue (Oct 2008)

16 Writing a Generalized Concurrent Queue (Nov 2008)

17 Understanding Parallel Performance (Dec 2008)

18 Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

19 volatile vs. volatile (Feb 2009)

20 Sharing Is the Root of All Contention (Mar 2009)

21 Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

22 Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

23 Eliminate False Sharing (May 2009)

24 Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

25 The Power of “In Progress” (Jul 2009)

26 Design for Manycore Systems (Aug 2009)

27 Avoid Exposing Concurrency – Hide It Inside Synchronous Methods (Oct 2009)

28 Prefer structured lifetimes – local, nested, bounded, deterministic (Nov 2009)

29 Prefer Futures to Baked-In “Async APIs” (Jan 2010)

30 Associate Mutexes with Data to Prevent Races (May 2010)

31 Prefer Using Active Objects Instead of Naked Threads (Jun 2010)

32 Prefer Using Futures or Callbacks to Communicate Asynchronous Results (Aug 2010)

Effective Concurrency: Prefer Using Active Objects Instead of Naked Threads

This month’s Effective Concurrency column, “Prefer Using Active Objects Instead of Naked Threads,” is now live on DDJ’s website.

From the article:

… Active objects dramatically improve our ability to reason about our thread’s code and operation by giving us higher-level abstractions and idioms that raise the semantic level of our program and let us express our intent more directly. As with all good patterns, we also get better vocabulary to talk about our design. Note that active objects aren’t a novelty: UML and various libraries have provided support for active classes. Some actor-based languages already have variations of this pattern baked into the language itself; but fortunately, we aren’t limited to using only such languages to get the benefits of active objects.

This article will show how to implement the pattern, including a reusable helper to automate the common parts, in any of the popular mainstream languages and threading environments, including C++, C#/.NET, Java, and C/Pthreads.
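
To give a feel for the shape of the pattern, here is a stripped-down C++11 sketch of such a helper: callers enqueue messages via send(), which returns immediately, and a single private thread executes them in order. (Illustrative only; the article’s reusable helper is more complete and covers the other languages, too.)

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>

    class Active {
        std::queue<std::function<void()>> q_;   // pending messages
        std::mutex mtx_;
        std::condition_variable cv_;
        bool done_ = false;
        std::thread worker_;                    // declared last: starts after the rest

        void run() {
            for (;;) {
                std::function<void()> msg;
                {
                    std::unique_lock<std::mutex> lock(mtx_);
                    cv_.wait(lock, [this] { return done_ || !q_.empty(); });
                    if (q_.empty()) return;     // done_ set and queue drained
                    msg = std::move(q_.front());
                    q_.pop();
                }
                msg();                          // execute outside the lock
            }
        }

    public:
        Active() : worker_([this] { run(); }) {}

        ~Active() {                             // drain remaining work, then join
            { std::lock_guard<std::mutex> lock(mtx_); done_ = true; }
            cv_.notify_one();
            worker_.join();
        }

        void send(std::function<void()> msg) {  // asynchronous: returns at once
            { std::lock_guard<std::mutex> lock(mtx_); q_.push(std::move(msg)); }
            cv_.notify_one();
        }
    };

An active “logger” object, for example, would own the log file privately and implement its write method as a send() of a small lambda, so callers never wait for the disk.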

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

1 The Pillars of Concurrency (Aug 2007)

2 How Much Scalability Do You Have or Need? (Sep 2007)

3 Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

4 Apply Critical Sections Consistently (Nov 2007)

5 Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

6 Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

7 Break Amdahl’s Law! (Feb 2008)

8 Going Superlinear (Mar 2008)

9 Super Linearity and the Bigger Machine (Apr 2008)

10 Interrupt Politely (May 2008)

11 Maximize Locality, Minimize Contention (Jun 2008)

12 Choose Concurrency-Friendly Data Structures (Jul 2008)

13 The Many Faces of Deadlock (Aug 2008)

14 Lock-Free Code: A False Sense of Security (Sep 2008)

15 Writing Lock-Free Code: A Corrected Queue (Oct 2008)

16 Writing a Generalized Concurrent Queue (Nov 2008)

17 Understanding Parallel Performance (Dec 2008)

18 Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

19 volatile vs. volatile (Feb 2009)

20 Sharing Is the Root of All Contention (Mar 2009)

21 Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

22 Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

23 Eliminate False Sharing (May 2009)

24 Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

25 The Power of “In Progress” (Jul 2009)

26 Design for Manycore Systems (Aug 2009)

27 Avoid Exposing Concurrency – Hide It Inside Synchronous Methods (Oct 2009)

28 Prefer structured lifetimes – local, nested, bounded, deterministic (Nov 2009)

29 Prefer Futures to Baked-In “Async APIs” (Jan 2010)

30 Associate Mutexes with Data to Prevent Races (May 2010)

31 Prefer Using Active Objects Instead of Naked Threads (Jun 2010)

Effective Concurrency Course: June and (Not) October

I forgot to blog about this until now because I was focused on the Effective Concurrency course in Stockholm a few weeks ago. To answer those who wonder whether I’ll be giving it again in North America too: yes, I’m giving the public Effective Concurrency course again at the end of this month at the Construx facility in Bellevue, WA, USA. This will be the full four-day version of the course. Spaces are still available.

I’ll cover the following topics:

  • Fundamentals: Define basic concurrency goals and requirements • Understand applications’ scalability needs • Key concurrency patterns
  • Isolation — Keep work separate: Running tasks in isolation and communicating via async messages • Integrating multiple messaging systems, including GUIs and sockets • Building responsive applications using background workers • Threads vs. thread pools
  • Scalability — Re-enable the Free Lunch: When and how to use more cores • Exploiting parallelism in algorithms • Exploiting parallelism in data structures • Breaking the scalability barrier
  • Consistency — Don’t Corrupt Shared State: The many pitfalls of locks (deadlock, convoys, etc.) • Locking best practices • Reducing the need for locking shared data • Safe lock-free coding patterns • Avoiding the pitfalls of general lock-free coding • Races and race-related effects
  • High Performance Concurrency: Machine architecture and concurrency • Costs of fundamental operations, including locks, context switches, and system calls • Memory and cache effects • Data structures that support and undermine concurrency • Enabling linear and superlinear scaling
  • Migrating Existing Code Bases to Use Concurrency
  • Near-Future Tools and Features

I hope to see some of you there!

Update 8/19: I had planned to do the EC course again in October (hence the post title), but that now has to be deferred to sometime in the new year. Sorry for any inconvenience to those of you who had already registered, or were planning to. I’ll blog about it here when it’s rescheduled.