Effective Concurrency: How Much Scalability Do You Have or Need?

The second Effective Concurrency column in DDJ just went live. It’s titled "How Much Scalability Do You Have or Need?" and makes the case that there’s more than just one important category of throughput scalability, and one size does not fit all. From the article:
In your application, how many independent pieces of work are ready to run at any given time? Put another way, how many cores (or hardware threads, or nodes) can your code harness to get its answers faster? And when should the answers to these questions not be "as many as possible"?
I hope you enjoy it.
Next month’s article is already in post-production. It will be titled "Use Critical Regions (Preferably Locks) to Eliminate Races" and will hit the web about a month from now. One of the early questions it answers is, How bad can a race be? There’s a hint in the article’s tagline: "In a race no one can hear you scream…"
Finally, last month’s Effective Concurrency column and a prior locking article of interest provide nice background motivation for the next few EC articles to come starting next month.

The Pit and the Pendulum

Don’t fall into the pit of thinking there’s no pendulum, or that the pendulum can be nailed to one side.

Earlier today, Michael Swaine wrote an article commenting on the "trend" of Google Gears, Adobe AIR, and Microsoft Silverlight. Here’s the opening blurb and intro paragraph:

Return of the Desktop

Is the rediscovery of the desktop just the latest swing of some tech-trend pendulum, or is there something more going on here?

This year, some of the big boys gave every impression of having suddenly and simultaneously remembered that there is such a thing as a desktop. Google got Geared up, Adobe announced AIR, and Microsoft saw the light with Silverlight, all of which are tools to help web developers integrate operations on the Web and the desktop just a little better. That oft-repeated mantra that the web browser is the new operating system? In 2007, not so much.

Of course it’s a pendulum. More specifically, it’s the industry constantly rebalancing the mix of several key technology factors, notably:

  • computation capacity available on the edge (from motes and phones through to laptops and desktops) and in the center (from small and large servers through to local and global datacenters)
  • communication bandwidth, latency, cost, availability, and reliability

This balancing actually isn’t news; we’ve been doing it since the dawn of computing. Conceptually, it’s not much different from how the designers of your PC balanced the kind and speed of memory to match the speed of the processor and the bus and the hard drive etc. to create a balanced system. We do and redo this exercise all the time. Here are just a few of the pendulum swings we’ve seen historically:

Era/Epoch      The Center                      The Edge
Precambrian    ENIAC
Cambrian       Walk-up mainframes
Devonian       Terminals and time-sharing
Permian                                        Minicomputers
Triassic                                       Microcomputers, personal computers
Jurassic       File and print servers
Cretaceous     Client/Server, server tier      Client/Server, middle tier
Paleocene                                      PDA
Eocene         Web servers
Oligocene                                      ActiveX, JavaScript; PDA phone
Miocene        E-tailers
Pliocene                                       Flash, AJAX
Pleistocene    Web services; Data centers
Holocene                                       Google Gears; Adobe AIR; Silverlight

How many pendulum swings can you count on just that list? In my own career, I’ve missed only the Precambrian and Cambrian (I’m a child of terminals and micros, and never had to carry stacks of punched cards uphill both ways in snow up to my waist). Many of you have experienced most of these swings.

It’s also not news that neither the center nor the edge is going to go away. We’re in an expanding computing universe: The question is not whether one will replace the other, but what balance they will be in at a given point. This will continue to be true for the foreseeable future no matter how often people on either end of the pendulum swing try to nail the pendulum where they want it for their own business reasons. (Take it from someone who lived through trying to market early peer-to-peer database and application models in the midst of Larry Ellison’s screaming-loud "network computer" hype, and had to deal with VC after VC who believed desktops and notebooks were going to evaporate. Sigh.)

[Slide: “The Computing Pendulum,” from Craig Mundie’s talk]

What is news, of course, is how those factors are changing and therefore how their balance is changing. Craig Mundie has spoken about this pendulum in several talks this year, including last week’s Financial Analysts Meeting (transcript and WMP webcast link; slides, including the “Computing Pendulum” slide noted above).

Quoting from one of those talks:

One of the things that I also find fascinating at this point in time is how people, how easily we forget about the cyclical nature of the evolution of the computing paradigm.

And from another:

Right now, as the Internet has evolved, broadband has become more highly penetrated, and to some extent the computers seems to be not fully utilized, we’re in the middle of one of these natural pendulum like swings between centralized computing and computing at the edge. It started with the mainframe, and then we added terminals, and then we moved to departmental, and then we moved to personal; it just kind of moves back and forth. And there are a lot of people today who say, oh, you know, I think that in the future we’ll just have dumb presentation devices again, and we’ll do all the computing in the cloud.

But … I contend that since the cloud is made ultimately from the same microprocessors, as the utilization becomes higher, it becomes impractical for a whole variety of costs and latency reasons to think you can just push everything up the wire into some centralized computing utility.

And so, in fact, I think for the first time in a long time we’re going to see the pendulum come into a fairly balanced position where we, in fact, do have incredible power plants of the Internet in these huge datacenters that provide these integrating services across the network, but at the same time we’re going to see increasingly powerful local personal computing facilities in everything from embedded devices, cell phones, and on up the computing spectrum.

A nicely balanced view. The center (mainframes, datacenters) isn’t going away anytime soon. But neither is the edge (PDAs, laptops). It would obviously be foolish to imagine either away, at least yet, because they each have different capability, availability, performance, and reliability characteristics, so there’s plenty of reason to choose each one for a different part of an application or system.

Don’t fall into the pit of assuming the pendulum will get nailed to one side. That’s pretty unlikely. Bet on new technologies constantly being developed to bring the center and the edge into new balance, filling the holes where each is deficient as the center and the edge grow at different rates. Yesterday’s disconnected computers just couldn’t do everything you can do on an Internet — so as internetworks became mainstream, something like HTML and AJAX had to come along to let us exploit them. Early and current web apps just can’t do everything you can do on a rich client — hence first AJAX, then Gears, AIR, and Silverlight, with more still to come tomorrow and next year and next decade.

Fasten your seat belts.

Machine Architecture Talk at NWCPP in September

For anyone who’s interested and in the Pacific Northwest area, I’ll be giving a talk at NWCPP on September 19. Without giving too much away, I can tell you it will feature everything from a real rocket launch (well, on video) to a Roman chariot race (well, actually, sort of a simulation of a chariot race involving arrays, lists, and sets):

Machine Architecture: Things Your Programming Language Never Told You

High-level languages insulate the programmer from the machine. That’s a wonderful thing — except when it obscures the answers to the fundamental questions of “What does the program do?” and “How much does it cost?”

The C++/C#/Java programmer is less insulated than most, and still we find that programmers are consistently surprised at what simple code actually does and how expensive it can be — not because of any complexity of a language, but because of being unaware of the complexity of the machine on which the program actually runs.

This talk examines the “real meanings” and “true costs” of the code we write and run, especially on commodity and server systems, by delving into the performance effects of bandwidth vs. latency limitations, the ever-deepening memory hierarchy, the changing costs arising from the hardware concurrency explosion, memory model effects all the way from the compiler to the CPU to the chipset to the cache, and more — and what you can do about them.
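
To give a taste of the kind of surprise the talk is about, here is a crude toy race of my own (just a sketch, not the talk’s actual demo; the container choice and element count are illustrative). It does the same logical work with a std::vector and a std::list, and on typical hardware the contiguous vector wins handily, purely because of the memory hierarchy:

#include <cstddef>
#include <ctime>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

// Fill a container with the values 0..n-1, then sum it.
// The logical work is identical either way; only the memory layout
// (contiguous vs. one heap node per element) differs.
template <typename Container>
long long FillAndSum(std::size_t n) {
    Container c;
    for (std::size_t i = 0; i != n; ++i)
        c.push_back(static_cast<long long>(i));
    return std::accumulate(c.begin(), c.end(), 0LL);
}

template <typename Container>
void Race(const char* name, std::size_t n) {
    std::clock_t start = std::clock();
    long long sum = FillAndSum<Container>(n);
    double seconds = double(std::clock() - start) / CLOCKS_PER_SEC;
    std::cout << name << ": " << seconds << "s (sum = " << sum << ")\n";
}

int main() {
    const std::size_t n = 10000000;   // ten million elements
    Race<std::vector<long long> >("vector (contiguous)", n);
    Race<std::list<long long> >("list (one node per element)", n);
}

The point isn’t the exact numbers, which will vary by machine; it’s that two loops that look identical in the source can have very different costs once the hardware gets a vote.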

See the NWCPP website for more details as the date gets closer.

NWCPP meetings are open and free-as-in-beer, so come one and come all. Note, though, that some nights the room fills up quickly, so it can be worth arriving a little early to be sure of having an actual seat. NWCPP is currently meeting on the Microsoft campus in Building 41, which happens to be the building I work in, so unfortunately none of my usual excuses for being late will work and I’ll have to show up on time…

C++ Video Podcasts with Bjarne

While at Stroustrup & Sutter #3, Bjarne and I took some time out to be interviewed by Ted Neward on behalf of the OnSoftware series from our mutual publisher Addison Wesley. We were only half-serious when we asked Ted to turn his I Love C# shirt inside out, but he did it — what a sport!

Two 10-minute video podcasts have been published so far:

  • Design and Evolution of C++: Talking about C++98, C++0x, and related topics including concurrency.
  • High Performance Applications with C++: Resuming with concurrency, HOPL, and enabling more than any one person could imagine in advance.

I’m told that these were in the top 25 of all tech podcasts in early July. Wow.

Here’s where you can find them:

  • Now at iTunes: Search Podcasts for Stroustrup, or click here to subscribe to OnSoftware video podcasts or here to subscribe to audio podcasts.
  • (Workaround) Now through OnSoftware Video RSS: In MP4 format, if you don’t do iTunes and have an MP4 viewer.
  • Soon at InformIT: To escape the iTunes and MP4 tyrannies, watch this homepage to see when they go up. I’m told InformIT will be self-hosting the podcasts sometime in the next week or two.

Andy Koenig’s C++ Blog, and Parrots

I recently discovered that Andy Koenig has a blog on DDJ (recommended). For those keeping score at home, Andy is one of the folks that Scott Meyers named as his "most important C++ people ever" last year. Andy is #2 (in chronological order), after Bjarne Stroustrup of course.

On a whimsically related note, I also recently had the pleasure of watching the engaging documentary The Wild Parrots of Telegraph Hill (also recommended). I mention it here because the main subject, Mark Bittner, reminds me a lot of Andy. Mark talks very much like Andy, and he even looks a little similar. I apologize in advance to those who think any similarity is all in my head, especially if they’re Mark (or Judy) or Andy (or Barb).

For the attention-span-challenged, I-can’t-wait-for-Netflix generation (which sometimes admittedly includes me, sigh), you can watch an interview of Mark and Judy here along with a few other related videos.

[Updated to add the chronological order note.]

EC: A new column on Effective Concurrency

I’m pleased to say that I’m starting a new column on Effective Concurrency in DDJ, and the first installment just went live.
It’s titled "The Pillars of Concurrency" and tries to make the case that we need to have a consistent mental model for talking about concurrency issues if we’re going to make any serious headway in designing and using concurrent systems. In particular, as the article notes:
Have you ever talked with another developer about concurrency, and felt as though you were somehow speaking completely different languages? If so, you’re not alone. You can see the confusion in our vocabulary…
Next month’s article is already in post-production. It’s titled "How Much Scalability Do You Have or Need?" and it’ll be online about a month from now. I’ll let you know here once it hits the web.
And who knows? If this newfangled essential-Items-at-3-to-5-pages-each format turns out to be popular, there just might eventually be a book here…

Name Lookup Uses the Static Type

I recently received the following question in email from Vijay Visana. I’ve slightly edited it for brevity and/or flow. Vijay writes:

While tinkering with multiple-inheritance in C++ I have come across one peculiarity that baffled me a lot.

A derived (multiple inheritance – no virtual base class) class having all pure abstract base classes can have multiple copies of a distant base class embedded in it and can call that class’s methods without ambiguity.

In Figure 1, C implements all the Pure ABC methods of all the above pure Abstract Base Classes. When I call a method of C as follows:

C* p = new C;
p->Method_of_A(); // though it has two ways to reach A no ambiguity

Let’s pause here for a moment: Do you see why there’s no ambiguity? And did you notice the interesting tidbit of information in the brief description of C’s implementation?

But let’s read on and see the rest of the question:

Now in Figure 2 I twist the pure ABC hierarchy to introduce a closer path. Here C again implements all the Pure ABC methods of all the above pure Abstract Base Classes. When I call a method of C as follows:

B4* p = new C;
p->Method_of_A();

At this point the compiler (VC++) finds it ambiguous. I have seen that an adjustor thunk is created for it (when I remove the dubious method call and debug). Virtual inheritance can solve the problem, but just out of plain curiosity I want to understand the implementation of MI in C++ (or rather in VC++).

Okay, let’s look into this.

The difference in behavior has nothing to do with the complexity of the inheritance hierarchy, or with vtables or thunking. Rather, it has to do with name lookup (in this case, finding "Method_of_A"), which in turn depends on whether the static type of the object has the function you want, or whether the compiler has to look further (into base classes) to find the name.

Here’s a quick recap of what happens when we write a function call in C++:

  • First comes name lookup: The compiler looks around to find a function having the requested name. It starts in the current scope (in these cases, the scope of the class we’re calling the member function on) and makes a "candidate list" of all functions having that name; if the list is empty, it goes outward to the next enclosing scope (e.g., namespace or base class) and repeats. If it makes it all the way out to the global scope and still finds no candidates, sorry Charlie, you get "name not found." As soon as a scope is encountered that has at least one function with the requested name, the compiler goes to step two.
  • Second comes overload resolution: If the candidate list has more than one function in it, the compiler attempts to find a unique best match based on the argument and parameter types. If two or more functions are equally good (or bad) matches, sorry Charlie, you get "ambiguous call."
  • Third comes accessibility checking: Finally, the compiler checks whether you may actually call the function (e.g., that it’s not private). If you don’t have clearance to call it (say, you’re not calling from within a member function, a friend, or, for protected members, a derived class), sorry Charlie, you should have thought of that before trying to access an inaccessible protected or private function. For shame. (A short example after this list walks through all three steps.)
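
Here’s a small sketch of all three steps in action; the class and function names are invented for illustration and aren’t from any real code discussed here:

#include <iostream>

struct Base {
    void f(double) { std::cout << "Base::f(double)\n"; }
};

struct Derived : Base {
    // Step 1, name lookup: a call to f on a Derived finds these two
    // declarations and stops right here. Base::f(double) is hidden and
    // never even makes the candidate list.
    void f(int) { std::cout << "Derived::f(int)\n"; }
private:
    void f(const char*) { std::cout << "Derived::f(const char*)\n"; }
};

int main() {
    Derived d;

    // Step 2, overload resolution: among Derived's two f's, f(int) is the
    // best match for a double argument (via a conversion), even though the
    // hidden Base::f(double) would have been an exact match.
    d.f(3.14);        // prints "Derived::f(int)"

    // Step 3, accessibility checking, happens last: overload resolution
    // selects the private f(const char*), and only then does the access
    // check reject the call. Uncommenting this line gives an error, not a
    // silent fallback to some other overload.
    // d.f("hello");  // error: f(const char*) is private
}

(This is also why bringing a hidden base overload back into consideration takes an explicit using-declaration, such as using Base::f; inside Derived.)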

I’ve written more about this in my books and articles, several of which are available online.

All three steps consider only the static type of the object. Here’s the key piece of information that makes the first example work:

In Figure 1, C implements all the Pure ABC methods of all the above pure Abstract Base Classes.

In other words, there is a function C::Method_of_A. So when the reader did this

C* p = new C;
p->Method_of_A(); // though it has two ways to reach A no ambiguity

the comment is really a red herring because name lookup is not reaching up to A at all. Rather, this code is invoking C::Method_of_A directly.
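
Figure 1 isn’t reproduced here, but a minimal hierarchy along these lines shows the same effect (B1 and B2 are my stand-ins for whatever intermediate pure ABCs the reader actually had):

// A stand-in for Figure 1: two paths from C back to A, no virtual
// inheritance, and C itself implements every pure virtual function.
struct A {
    virtual void Method_of_A() = 0;
    virtual ~A() {}
};
struct B1 : A {};   // abstract classes in between; their details don't matter
struct B2 : A {};

struct C : B1, B2 {
    // Because C declares Method_of_A itself, name lookup on a C finds it
    // right here in C's own scope and never looks into the base classes,
    // so the two embedded A subobjects cause no ambiguity at all.
    virtual void Method_of_A() {}
};

int main() {
    C* p = new C;
    p->Method_of_A();   // fine: lookup finds C::Method_of_A and stops
    delete p;
}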

The second example is needlessly complex, but the key here is that the most-derived class is still the one actually implementing the overrides, while B4 itself does not. So when we do this:

B4* p = new C;
p->Method_of_A();

we’re using the object as a B4 and trying to invoke B4::Method_of_A, but since B4 doesn’t provide one itself, name lookup starts looking up through the base classes and finds two equal candidates that it can’t resolve using overload resolution, and so the call is ambiguous.
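
Again, I don’t have the reader’s actual Figure 2, but a minimal variation like the following reproduces the ambiguity (B4 is the only name taken from the question; B1 and B2 are again stand-ins):

// A stand-in for Figure 2: B4 now sits between C and the two paths to A,
// and B4 itself declares no Method_of_A.
struct A {
    virtual void Method_of_A() = 0;
    virtual ~A() {}
};
struct B1 : A {};
struct B2 : A {};
struct B4 : B1, B2 {};              // no Method_of_A declared in B4

struct C : B4 {
    virtual void Method_of_A() {}   // C still implements everything
};

int main() {
    B4* p = new C;          // fine: B4 is an unambiguous base of C
    // p->Method_of_A();    // error: lookup in B4 finds A::Method_of_A along
                            // two different paths (two distinct A subobjects),
                            // so the call is ambiguous even though the object
                            // really is a C
    delete p;               // fine, because the destructors are virtual
}

Making A a virtual base (struct B1 : virtual A, and likewise for B2) collapses the two A subobjects into one, which is why virtual inheritance makes the ambiguity go away, as the reader noted.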

Name lookup is done based on the static type of the object, or the "type that we’re using it as right now," not on its dynamic type, or the type it really is (the two happen to be the same in Figure 1, and different in Figure 2, which contributed to the confusion). In Figure 1, the static type of the object p points to is C, because p has type pointer to C (as opposed to, say, pointer to some base of C). In Figure 2, the static type is B4, which does not implement Method_of_A and so name lookup goes off into the tree of base classes, finds equal candidates that have identical signatures and so aren’t distinguishable by overload resolution, and the call is ambiguous — irrespective of the fact that the object’s dynamic type happens to be C which uniquely implements Method_of_A. We’re using it as a B4, and so a B4 it shall be… for name lookup purposes, at any rate.

Talks online about C++0x

Today, someone asked me if I was going to give a talk anytime soon about C++0x, the next C++ standard. I’ve blogged about it before, notably here. But I haven’t yet prepared a talk about C++0x as a whole — in the meantime, I pointed that person to the following talks, and I thought other people might enjoy knowing about them too. They’re all by experts who are active standards committee members participating in the evolution of C++, and at least one of the names is probably familiar: