How parallelism demos are useful

In "Break Amdahl’s Law!", I described ways to enable scalable applications, and wrote in part:

But don’t show me ray-traced bouncing balls or Mandelbrot graphics or the other usual embarrassingly parallel but niche (or downright useless) clichés—what we’re looking for are real ideas of real software we could imagine real kids and grandmothers using that could become possible on manycore machines. Here’s a quick potential example: Researchers know how to do speech recognition with near-human-quality accuracy in the lab, which is astonishingly good and would enable breakthrough user interfaces if it could be done that reliably in real time. The only trouble is that the software takes a week to run…on a single core. Can it be parallelized, and if so how many cores would we need to get a useful answer in a useful time? Bunches of smart people (and the smart money behind them) are investing hard work not only to find out the answer to that question, but also to find more questions like it.

Just to be clear, in the first sentence above I didn’t mean to say that the standard demos are useless — far from it (see below). This was intended to be a challenging call to action to not be satisfied with demos alone, but for us as an industry to imagine and develop compelling mainstream end applications that are multicore- and manycore-scalable. (To make that clearer, I’m going to ditch and rewrite the first sentence above for the Effective Concurrency book.)

The standard demos are indeed important, not only as proofs of concept that the technology really does enable scalable parallel code, but at least as importantly as tools that help us understand how a given parallel technology or runtime works.

To understand concurrency mechanics/characteristics

For example, consider a standard Mandelbrot-rendering demo, but with the twist that each worker thread (core) renders its portion of the work in a different color.

On a traditional runtime with static scheduling, some workers with easy-to-compute sections will be done early and wait idly while the other workers finish their harder-to-compute sections, and we can see visually that each colored section is the same size and some colored sections fill in faster than others.

But on a runtime with dynamic scheduling, and especially one that supports Cilk-style work stealing, we get efficient load balancing where workers who are assigned "easy" sections and are done early can contribute to remaining work in harder-to-compute areas. Visually, some sections fill in with one color, but then the same color starts to add to other yet-unfinished sections.

The bands of color let us see which worker did what work and helped out in what other areas, and the overall visual progress of the whole image lets us see that the system as a whole is doing useful work the whole time. So the colored Mandelbrot demo is a very useful tool to let us understand what’s going on quickly and clearly, in a way that presenting the results in a numerical table can’t.
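The load-balancing effect described above can be approximated without any graphics or threads. The following is a minimal sketch, not the demo itself: it treats the escape-time iteration count of each image row as that row's cost, then compares a fixed equal-band split against a simple "next free worker takes the next row" policy. The function names, image size, and worker count are all hypothetical choices for illustration, and the dynamic policy is a plain shared-queue simulation that stands in for, and is much simpler than, Cilk-style work stealing.

```python
import heapq


def mandelbrot_row_cost(y, width=60, height=40, max_iter=200):
    """Return the total escape-time iterations for row y, used as that row's cost."""
    ci = -1.2 + 2.4 * y / (height - 1)
    cost = 0
    for x in range(width):
        c = complex(-2.0 + 3.0 * x / (width - 1), ci)
        z = 0j
        n = 0
        while n < max_iter and abs(z) <= 2.0:
            z = z * z + c
            n += 1
        cost += n
    return cost


def static_schedule(costs, workers):
    """Give each worker one equal-sized contiguous band of rows, fixed up front."""
    band = (len(costs) + workers - 1) // workers
    owner = [row // band for row in range(len(costs))]
    busy = [0] * workers
    for row, cost in enumerate(costs):
        busy[owner[row]] += cost
    return busy, owner


def dynamic_schedule(costs, workers):
    """Give the next unrendered row to whichever worker becomes free first."""
    free_at = [(0, w) for w in range(workers)]  # (time the worker is free, worker id)
    heapq.heapify(free_at)
    owner = []
    busy = [0] * workers
    for cost in costs:
        t, w = heapq.heappop(free_at)
        owner.append(w)  # the "color" that paints this row
        busy[w] += cost
        heapq.heappush(free_at, (t + cost, w))
    return busy, owner


if __name__ == "__main__":
    costs = [mandelbrot_row_cost(y) for y in range(40)]
    for name, schedule in (("static", static_schedule), ("dynamic", dynamic_schedule)):
        busy, owner = schedule(costs, 4)
        # The slowest worker determines when the whole image is finished.
        print(name, "finish time:", max(busy), "per-worker work:", busy)
        print(name, "row owners:", "".join(str(w) for w in owner))
```

In the printed row-owner strings, the static schedule shows each worker's digit as one solid band, while the dynamic schedule shows the same digit reappearing in several places, which is the textual analogue of one color spilling into other workers' unfinished areas.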

To illustrate a path to future applications

Similarly, ray-tracing may well make multicore and manycore CPUs the future of photorealistic graphics in a way that may not be applicable to standard GPUs (time will tell). As shown in the blogs below, ray-tracing makes a qualitative difference in the nature of lighting models. But can’t we do this already with GPUs? Interestingly, not necessarily; ray-tracing appears to be an algorithm that is hard to accelerate on GPUs, which have limited ability to do the fine-grain scheduling that runtimes based on techniques like work stealing are well suited to. Some links:

Yes, demos are useful

My point in the original quote above (which I see I could have stated more clearly) was simply this: Once we’ve achieved the demos, we shouldn’t sit back and declare victory. The demos aren’t the end goal; we still need the applications.

Concurrency demos are useful to help prove a technology can scale and to understand how it works, and some of them show potentially fruitful and exciting paths to real and compelling manycore-exploiting applications, but it’s still up to us as an industry to continue to imagine and build those applications. I believe that we can and will.

4 thoughts on “How parallelism demos are useful”

Great series of articles, Herb.
Regarding: "Researchers know how to do speech recognition with near-human-quality accuracy in the lab, which is astonishingly good and would enable breakthrough user interfaces if it could be done that reliably in real time." Do you have any references?

Good read. I’ve never seen the ray trace demo before.

I am reading your articles with great interest and I appreciate the work and time you put into making them. I would like to see some more challenging issues being debated, like:
– The CRT memory allocator doesn’t scale (it has a mutex), so even if you write multi-threaded programs for multiple cores you inevitably hit this barrier. Is MS going to address this? Both Hoard and Intel’s TBB have bugs in them, so I can’t use them for production code. Writing a scalable memory allocator is _very hard_ (that’s why nobody has managed to do it correctly).
– Intel is going NUMA, just like AMD. OS support for NUMA is almost nonexistent, and tools for developers to handle NUMA are completely missing.
Just like you say in your article, we need to tackle some more general problems, not only niche problems like image processing. Almost every paper or article about multi-core programming uses image processing as its example. OK, we got that. How about some hard, real-world problems? These would make a much more interesting read for me.
Thanks!

Published by Herb Sutter
