2008-08-01 – Sutter’s Mill

Today I received an email that asked:

I have recently come across your excellent articles on concurrency and the changes in software writing paradigm. They make a lot of sense, but I am having trouble translating them to my world of Telecom oriented web services, where practically everything is run through a DBMS. It seems to me we get everything “free”, simply by using an inherently concurrent multi-everything beast such as that :-) .

Could you please share your thoughts on the issue in one of your coming blog entries? It seems to me nowadays most complex systems would take advantage of a DBMS, certainly any application that is internet based, telecom oriented, or enterprise level. Be it in C++, Java, or PHP and its ilk, using a DBMS – often as a sort of message queue – is one of the best practices that ensures parallelism.

Sure. At right is a slide I give in talks that summarizes the answer to this question, and I’ve addressed this and other similar issues in an ACM Queue article.

The problem with taking advantage of multicore/manycore hardware isn’t (as much) on the server, it’s on the client. When experienced people say things like, “but the concurrency problem is already solved, we’ve been building scalable software for years,” that’s server or niche-client application people talking. That kind of laid-back sound bite sure isn’t coming from mainstream client application developers.

Typical Server Workloads

On the server:

Workloads typically already have lots of inherent concurrency (000s or 000,000s of incoming requests for web/dbms/etc. operations), and it’s easy to launch independent requests concurrently.
Shared data is typically inside highly structured relational databases where we have decades of experience with automatic concurrency control. The DBMS itself knows how to optimistically run transactions in parallel and control conflicts by escalating row locks to page locks, index locks to table locks, and so on.
The programming model is typically transactions, and all the programmer has to know about is to write “begin transaction; /* … read/write whatever stuff you feel the need to, then … */ end transaction;” which is about as sweet as it gets.

We already know how to build somewhat scalable server apps. Sure, it’s still rocket science and takes expert knowledge to do well. But we generally know the rocket science, have experts who can implement it with repeatable success, and have regularly scheduled missions to the “scalable servers” space station. With some care, we can say that the “concurrency problem is already solved” here.

Typical Client Workloads

The world is very different for typical mainstream client applications (i.e., I’m not talking about Photoshop and a handful of others), where:

Workloads don’t have lots of inherent concurrency. The user clicks one button, and we have to figure out how to divide the work and recombine it in order to get the answer faster on many cores.
Shared data is typically in unstructured pointer-chasing graphs of objects in shared memory that require explicit concurrency control. Note that “unstructured” doesn’t mean there’s no structure — of course there’s some — but it’s a gloriously diverse pile of objects and containers, and more like an organically growing shantytown with unplanned twisty little alleys and passages, than the nice rectangular downtown city street plan of a nice rectangular database table.
The programming model to protect shared data is to use error-prone explicit locks. You have to remember which locks protect which data, and not to acquire them in inconsistent and deadlock-prone orders.

We’re still discovering and productizing the rocket science here. You could say that the tools like OpenMP that we do have now are still at the V-2 stage — they have limited applicability, are somewhat fussy, and don’t always land where you aim them.

But we’re working on it. Up-and-coming tools like Threading Building Blocks are like the Mercury and Venera missions, setting out to reach successively higher goals and repeatability… and we’re starting to see what are perhaps Apollo– and ISS-class missions in the form of PLINQ, the Task Parallel Library, and one for native C++ we’ll be announcing in October at PDC. In part, these tools are trying to see how much we can make client workloads look more like server workloads, notably in providing a transaction-oriented programming model. For example, transactional memory is an area of active research that would let us write “begin transaction; /* … read/write whatever memory variables you feel the need to, then … */ end transaction;”, and if successful it could eventually replace many or even most existing uses of locks.

We have rightly celebrated some successful ‘manned’ flights with client products like Photoshop (parallel rendering) and Excel (parallel recalc) that scale to a number of cores. We’re on the road to, but still working toward, establishing the infrastructure and technology base to enable regularly scheduled commercial flights/shipments of scalable client applications that “light up” on multicore/manycore machines.

Day: August 1, 2008

Server Concurrency != Client Concurrency

Typical Server Workloads

Typical Client Workloads