Answering email about error handling in concurrent code

Someone emailed me today asking:

I’m writing because I’m somewhat conscious of what I would consider a rather large hole in the parallel programming literature.

… What if one or more of your tasks throws an exception? Should the thread that runs the task swallow it? Should the caught exceptions get stashed somewhere so that the "parent" thread can deal with them once the tasks are complete? (This is somewhat tricky currently in a language such as C++(98) where one cannot store an exception caught with the "catch(…)" construct). Perhaps all tasks should have a no-throw guarantee? Perhaps some kind of asynchronous error handlers might be installed, somewhat like POSIX signals? The options are many, but choosing a strategy is hard for those of us with little parallel programming experience.

I thought I’d share my response here:

That’s an excellent question. Someone asked that very question in Stockholm last month at my Effective Concurrency course, and my answer started out somewhat dismissive: "Well, it’s about the same as you do in sequential code, and all the same guarantees apply; nothrow/nofail is only for a few key functions used for commit/rollback operations, and you’d usually target the basic guarantee unless adding the strong guarantee comes along naturally for near-free. So it’s pretty much the same as always. Although, well, of course futures may transport exceptions across threads, but that’s still the same because they manifest on .get(). And of course for parallel loops you may get multiple concurrent exceptions from multiple concurrent loop bodies that get aggregated into a single exception; and then there’s the question of whether you start new loop bodies that haven’t started yet (usually no) but do you interrupt loop bodies that are in progress (probably not), and… oh, hmm, yeah, I guess it would be good to write an article about that."

So the above is now adding to my notes of things to write about. :-) Maybe some of that stream-of-consciousness may be helpful until I can get to writing it up in more detail.

I pointed him to Doug Lea’s Concurrent Programming in Java pages 161-176, "Dealing with Failure", adding that I haven’t read it in detail but the subtopics look right. Also Joe Duffy’s Concurrent Programming on Windows, pages 721-733.

If you know of a good standalone treatise focused on error handling in concurrent code, please mention it in the comments.

Effective Concurrency: Break Up and Interleave Work to Keep Threads Responsive

This month’s Effective Concurrency column, “Break Up and Interleave Work to Keep Threads Responsive”, is now live on DDJ’s website.

Sorry for the long title; suggestions welcome. I always try to word the title to make it (a) short, (b) active, and (c) advice, but sometimes I’ll settle for two of those, or just one, until a better suggestion comes along.

From the article:

What happens when this thread must remain responsive to new incoming messages that have to be handled quickly, even when we’re in the middle of servicing an earlier lower-priority message that may take a long time to process?

If all the messages must be handled on this same thread, then we have a problem. Fortunately, we also have two good solutions, both of which follow the same basic strategy: Somehow break apart the large piece of work to allow the thread to perform other work in between, interleaved between the chunks of the large item. Let’s consider the two major ways to implement that interleaving, and their respective tradeoffs in the areas of fairness and performance.

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

Eliminate False Sharing (May 2009)

Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

Truth In Spam

This afternoon I was just finishing up my next Effective Concurrency article (it’ll be up in a few days), when some spam email arrived. Just as my fingers’ auto-delete macro was about to fire, I noticed something odd about the name of the attachment and did a double-take:

Cool! There must be some kind of new truth-in-advertising laws for spammers.

Yes, I know that as programmers we could argue about naming all day long. We could point out that maybe “virusLoader.gif” or “exploit_exploit_muhaha.gif” would be a little better, and argue about the relative merits of camel case and underscores. But there’s no need; I think “runnable.gif” is short, clear, and definitely good enough. (Evidently someone else thought so too, and just shipped it.)