TRS-80 vs. Alpha, and Parallel Optimization

Lest people get the wrong idea, I enjoy reading Jeff Atwood’s blog and agree with much of what he writes so entertainingly and provocatively. So far I’ve only responded when I strongly felt differently about something, which has been a grand total of twice now.

So let me also offer an example of something I wholeheartedly agree with. Yesterday, Jeff cited what is also my own favorite Programming Pearls figure:


Despite the enduring wonder of the yearly parade of newer, better hardware, we’d also do well to remember my all time favorite graph from Programming Pearls:

[Figure: TRS-80 vs. DEC Alpha, running time vs. n, from Programming Pearls]

Everything is fast for small n.

Spot on. If you’re a professional programmer and haven’t read Programming Pearls yet, “run don’t walk” to your bookstore of choice.
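Bentley’s point is easy to reproduce on any modern machine. Here’s a minimal sketch of my own (standard C++ only; the exact times and the crossover point will of course vary by hardware and compiler) that pits an O(n²) insertion sort against the O(n log n) std::sort: for small n both finish in the blink of an eye, and then the quadratic curve takes over.

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <random>
    #include <vector>

    // O(n^2) insertion sort, to race against O(n log n) std::sort.
    void insertion_sort(std::vector<int>& v) {
        for (std::size_t i = 1; i < v.size(); ++i) {
            int key = v[i];
            std::size_t j = i;
            while (j > 0 && v[j-1] > key) { v[j] = v[j-1]; --j; }
            v[j] = key;
        }
    }

    // Time a callable, in milliseconds.
    template<typename F>
    double time_ms(F f) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        std::mt19937 rng(42);
        for (std::size_t n : {1000u, 8000u, 32000u}) {
            std::vector<int> a(n);
            for (auto& x : a) x = static_cast<int>(rng());
            auto b = a;  // identical data for both sorts
            std::cout << "n=" << n
                      << "  insertion_sort: " << time_ms([&]{ insertion_sort(a); }) << " ms"
                      << "  std::sort: " << time_ms([&]{ std::sort(b.begin(), b.end()); }) << " ms\n";
        }
    }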

Incidentally, just to tie this in to parallel computing as well, Jeff’s article also cites a nice graph of optimizations that improved NDepend:

Patrick Smacchia’s lessons learned from a real-world focus on performance are a great case study in optimization.

[Figure: NDepend optimization results]

Patrick was able to improve NDepend analysis performance fourfold and cut memory consumption in half. As predicted, much of this improvement was algorithmic in nature, but a substantial share of the overall improvement came from a variety of other optimization techniques.

As I’ve said many times, measure twice, optimize once: Know when and where to optimize. Profilers are your friend. As Patrick writes:

When it comes to enhancing performance there is only one way to do things right: measure and focus your work on the part of the code that really takes the bulk of time, not the one that you think takes the bulk of time.
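In that spirit, even a trivial scope timer beats guessing, though a real profiler will give you far better data. Here is a minimal sketch (parse_input is just a hypothetical stand-in for whatever region you suspect):

    #include <chrono>
    #include <cstdio>

    // Poor man's profiler: prints how long the enclosing scope took.
    struct ScopedTimer {
        const char* label;
        std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now();
        explicit ScopedTimer(const char* l) : label(l) {}
        ~ScopedTimer() {
            auto ms = std::chrono::duration<double, std::milli>(
                std::chrono::steady_clock::now() - start).count();
            std::printf("%s: %.2f ms\n", label, ms);
        }
    };

    // Usage:
    //   { ScopedTimer t("parse"); parse_input(); }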

And of course, in our ever-more-multicore world, the contribution of parallelization gain will continue to grow and dominate the optimization of CPU-bound code. But as Patrick also notes, realizing that gain is not always trivial:

While we get a 15% gain between 1 and 2 processors, the gain is almost zero between 2 and 4 processors. We identified some potential IO contentions and memory issues that will require more attention in the future. This leads to another lesson: Don’t expect that scaling on many processors will be easy, even if you don’t share state and don’t use synchronization.
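Both effects are easy to see with a small harness. The sketch below is my own illustration, not Patrick’s code: it measures wall-clock speedup of a purely CPU-bound, share-nothing loop at 1, 2, and 4 threads. Even this toy usually falls short of an ideal 4x at four threads, and real workloads with I/O or memory-bandwidth contention fall much shorter, just as Patrick describes.

    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    // CPU-bound busywork with no shared state.
    double work(long iters) {
        double x = 0.0;
        for (long i = 0; i < iters; ++i) x += 1.0 / (1.0 + i % 1000);
        return x;
    }

    // Split the same total work across `threads` threads; return seconds taken.
    double run_with(int threads, long totalIters) {
        auto t0 = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (int t = 0; t < threads; ++t)
            pool.emplace_back([=] {
                volatile double sink = work(totalIters / threads);
                (void)sink;  // keep the work from being optimized away
            });
        for (auto& th : pool) th.join();
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count();
    }

    int main() {
        const long iters = 400000000L;   // tune for your machine
        const double base = run_with(1, iters);
        for (int t : {1, 2, 4}) {
            double secs = run_with(t, iters);
            std::cout << t << " thread(s): " << secs << " s, speedup "
                      << base / secs << "x\n";
        }
    }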

Rich-GUI SaaS/Web 2.0 Apps Should Not Be Considered Harmful

Yesterday, the ever-popular Jeff Atwood (of Coding Horror fame) wrote an article [*] on how not to write Web 2.0 UIs. Unfortunately, it’s exactly backwards: What he identifies as a problem is in fact not only desirable, but necessary.

  • [*] Aside: Jeff, I know you love pictures, but is that particular gratuitous one really necessary? Yes, I know it’s CGI, but it made me really hesitate about linking to your post and has nothing to do with your technical point.

Jeff observes correctly that when you write an application to run on a platform like Windows and/or OS X, your application should follow the local look-and-feel. Fine so far. But he then repeats a claim that I believe is incorrect, at least today, and based on a fallacy — and adds another fallacy:

[Quoting Bill Higgins]

  • … a Windows application should look and feel like a Windows application, a Mac application should look and feel like a Mac application, and a web application should look and feel like a web application.

Bill extends this to web applications: a web app that apes the conventions of a desktop application is attempting to cross the uncanny valley of user interface design. This is a bad idea for all the same reasons; the tiny flaws and imperfections of the simulation will be grossly magnified for users.

  • When you build a “desktop in the web browser”-style application, you’re violating users’ unwritten expectations of how a web application should look and behave.

There are actually two fallacies here.

Fallacy #1: “Look and feel like a web application”

The first fallacy is here:

a web application should look and feel like a web application.

violating users’ unwritten expectations of how a web application should look and behave.

These assertions raise the question: What does a web application “look and feel like,” and what do users expect? Also, are you talking about Web 1.0, where there is an answer to these questions, or Web 2.0, where there isn’t?

For Web 1.0 applications, the answer is fairly easy: They look like hyperlinked documents built on technologies like HTML and CSS. That’s what people expect, and get.

But the examples Bill uses aren’t Web 1.0 applications, they’re Web 2.0 applications. For Web 2.0 applications, there are no widely accepted UI standards, and applications are all over the map. Indeed, the whole point of Ajax-y/Web2.0-y applications is to get beyond the current 1.0-standard functionality.

Not only are there no widely-accepted UI standards, there aren’t even many widely-accepted UI technologies. Consider how many dissimilarities there are among just Flash, Silverlight, and JavaFX as these technologies compete for developer share. Then consider that even within any one of these technologies people actually build wildly diverse interfaces.

Here’s the main example these bloggers used:

Consider the Zimbra web-based email that Bill refers to.

[Screenshot: Zimbra web-based email]

It’s pretty obvious that their inspiration was Microsoft Outlook, a desktop application.

[Screenshot: Microsoft Outlook]

But what’s wrong with Zimbra?

Here’s a Better Question #1: How could you do better?

And for bonus points, Still Better Question #2: What about OWA? Consider that Microsoft already provides essentially the same thing, with the same approach, in the form of Outlook Web Access, which looks remarkably like the usual Outlook [caveat: this is an example of why I write above and below that ‘most’ Web 2.0 apps don’t try to emulate a particular OS look-and-feel; this one does]. For example (a couple of sample shots taken from this brief video overview):

  • [Screenshots: Outlook Web Access]

The rich UI isn’t a bug, it really is a feature — a killer feature we’re going to be seeing more of, not less of, because this is what delivering software-as-a-service (SaaS) is all about. Although I use the desktop Outlook most of the time, I like OWA and think it’s the best web-based email and calendaring I’ve seen, especially when I’m away from my machine (and I’ve tried several others, though granted you do need to be using Exchange). I suspect its UI conventions are pretty accessible even to non-Windows users, though that’s debatable and your mileage may vary.

Fallacy #2: “A web app that apes the conventions of a desktop application…”

The second fallacy is Jeff’s comment (boldface original):

a web app that apes the conventions of a desktop application is attempting to cross the uncanny valley of user interface design.

I think this is flawed for three reasons.

First, the “uncanny valley” part makes the assumption that people will find it jarring that the app tries to look like a desktop application. (This concern is related to, but different from, the concern of fallacy #1.) But most such apps aren’t doing that at all, because they know their users will access them from PCs, Macs, and lots of different environments, and they have to look reasonable regardless of the user’s native environment. They’re usually not trying to duplicate a given platform’s native look and feel.

Second, what they are doing is borrowing from UI components and conventions that already work well in desktop GUI environments, and are common across many of those environments. When you have no standards for Web 2.0 look and feel, then doing the best you can by borrowing from ideas we already know work pretty well isn’t just okay, it’s necessary. What else can you do?

Finally, the worst part is this: The whole point of SaaS is to deliver desktop-like rich-GUI applications on the web. So what is being labeled ‘wrong’ above is the whole point of what we’re doing as an industry.

“SaaS/Web 2.0 on Web 1.0”: The new “GUI on DOS”

Most SaaS/Web 2.0 applications today look and feel pretty much the way GUI applications looked and felt on DOS, before technologies like Windows and OS/2 PM existed. Around the late 1980s, people wrote lots of GUI applications that ran on DOS, but we didn’t have a widely-used common GUI infrastructure that handled basic windows and menus and events, much less standards like CUA that tried to say how to use such a common infrastructure if we had it. So they each did their own thing, borrowing where possible from what seemed to work well for GUIs on other platforms.

Twenty years ago, everyone writing GUIs on DOS designed the UIs as best they could, borrowing where possible from what they saw worked on platforms like the Macintosh and Xerox Alto and Star — but the results were all over the map, and would stay that way until a standard environment, followed by standard guidelines, came into being.

Today, everyone writing rich Web 2.0 applications is doing their own thing, borrowing as best they can from Macs and Windows and others — but the results are all over the map, and will continue to be until there actually is such a thing as a UI standard for rich-GUI web applications. You can see that in the differences between Zimbra and Outlook Web Access. In the meantime, it’s not just okay to borrow from what we’ve learned on the desktop; it’s necessary.

And the question isn’t whether metaphors users already understand on the desktop will migrate to the web, but which ones and how soon, because that migration is the whole point of SaaS. The industry will soon be going well beyond Google Apps, with offerings like Office Online already announced for the short term, which will put still more rich-client GUI apps like word processors and spreadsheets in the browser (with functionality somewhere between Google Apps and the desktop version of Office).

Zimbra and Outlook Web Access aren’t examples of poor web app design, but exactly the opposite: They’re just the beginning of the next wave of rich web apps.

Effective Concurrency: Measuring Parallel Performance — Optimizing a Concurrent Queue

This month’s Effective Concurrency column is special — it turned into a feature-length article. (I don’t know whether it’ll officially be called a “feature” or a “column” in the print issue.) “Measuring Parallel Performance: Optimizing a Concurrent Queue” just went live on DDJ’s site, and will also appear in the print magazine.

From the article:

How would you write a fast, internally synchronized queue, one that callers can use without any explicit external locking or other synchronization? Let us count the ways…or four of them, at least, and compare their performance. We’ll start with a baseline program and then successively apply three optimization techniques, each time stopping to measure each change’s relative performance for queue items of different sizes to see how much each trick really bought us.

The goal of the article is to see how to measure and understand our code’s parallel performance and the actual effect of specific optimizations. Disclaimer: The goal of this article is not to write the fastest possible queue in the world (though it’s pretty good). I’ve already had plenty of email on recent queue-related columns from people who sent me their “faster” implementations; writing lock-free queues seems to be a popular indoor sport. Interestingly, for well over half of the ones I received, a 30-second glance at the code was enough to determine that the code had to be incorrect. Why? Because if it doesn’t do any synchronization on the shared variables — if there aren’t any locks, atomics, fences, or other synchronization in the code — then it has races, which will manifest in practice even on forgiving platforms like x86/x64, and there’s no need to look further. (For more details, see the September 2008 column, Lock-Free Code: A False Sense of Security. Even some code submissions I received in response to that very article were broken for the same reasons shown in that article.)
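For reference, the baseline such measurements start from is conceptually like the following (my own minimal sketch, not the article’s actual code): a queue that is correct because every access holds one mutex, and slow for exactly the same reason, because all producers and consumers serialize on that single lock.

    #include <mutex>
    #include <queue>
    #include <utility>

    // Minimal internally synchronized queue: one mutex around a std::queue.
    // Correct but fully serialized; a baseline to measure optimizations against.
    template<typename T>
    class LockedQueue {
        std::queue<T> q_;
        std::mutex    m_;
    public:
        void push(T v) {
            std::lock_guard<std::mutex> hold(m_);
            q_.push(std::move(v));
        }
        bool pop(T& out) {   // non-blocking: returns false if empty
            std::lock_guard<std::mutex> hold(m_);
            if (q_.empty()) return false;
            out = std::move(q_.front());
            q_.pop();
            return true;
        }
    };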

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns and the DDJ print magazine issue in which they first appeared:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

(out of order) Effective Concurrency: Writing Lock-Free Code — A Corrected Queue

Oops, I just noticed that I forgot to blog about one recent Effective Concurrency column: “Writing Lock-Free Code: A Corrected Queue” which also appeared in the October 2008 print issue of Dr. Dobb’s Journal.

From the article:

As we saw last month [1], lock-free coding is hard even for experts. There, I dissected a published lock-free queue implementation [2] and examined why the code was quite broken. This month, let’s see how to do it right.
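The article’s corrected queue is linked-list based; for flavor, here is a different but also classic single-producer/single-consumer design (my sketch, not the article’s code): a fixed-capacity ring buffer in which the producer owns the tail index, the consumer owns the head index, and acquire/release atomics publish each side’s progress to the other. Note that, per the point above, it does synchronize every shared variable.

    #include <atomic>
    #include <cstddef>

    // Single-producer/single-consumer ring buffer. One slot is kept empty
    // to distinguish "full" from "empty".
    template<typename T, std::size_t Capacity>
    class SpscQueue {
        T buf_[Capacity];
        std::atomic<std::size_t> head_{0};  // advanced only by the consumer
        std::atomic<std::size_t> tail_{0};  // advanced only by the producer
    public:
        bool push(const T& v) {             // producer thread only
            std::size_t t    = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % Capacity;
            if (next == head_.load(std::memory_order_acquire))
                return false;               // full
            buf_[t] = v;
            tail_.store(next, std::memory_order_release);  // publish the item
            return true;
        }
        bool pop(T& out) {                  // consumer thread only
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return false;               // empty
            out = buf_[h];
            head_.store((h + 1) % Capacity, std::memory_order_release);
            return true;
        }
    };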

Here is the complete-as-of-this-writing set of links to the published Effective Concurrency columns. As always, the months reflect the magazine print issue dates; they usually hit the web a bit sooner:

August 2007: The Pillars of Concurrency

September 2007: How Much Scalability Do You Have or Need?

October 2007: Use Critical Sections (Preferably Locks) to Eliminate Races

November 2007: Apply Critical Sections Consistently

December 2007: Avoid Calling Unknown Code While Inside a Critical Section

January 2008: Use Lock Hierarchies to Avoid Deadlock

February 2008: Break Amdahl’s Law!

March 2008: Going Superlinear

April 2008: Super Linearity and the Bigger Machine

May 2008: Interrupt Politely

June 2008: Maximize Locality, Minimize Contention

July 2008: Choose Concurrency-Friendly Data Structures

August 2008: The Many Faces of Deadlock

September 2008: Lock-Free Code: A False Sense of Security

October 2008: Writing Lock-Free Code: A Corrected Queue

November 2008: Writing a Generalized Concurrent Queue

December 2008: Understanding Parallel Performance

Effective Concurrency: Understanding Parallel Performance

Wow, DDJ just posted the previous one a few days ago, and already the next Effective Concurrency column is available: “Understanding Parallel Performance” just went live, and will also appear in the print magazine.

From the article:

Let’s say that we’ve slickly written our code to apply divide-and-conquer algorithms and concurrent data structures and parallel traversals and all our other cool tricks that make our code wonderfully scalable in theory. Question: How do we know how well we’ve actually succeeded? Do we really know, or did we just try a couple of tests on a quad-core that looked reasonable and call it good? What key factors must we measure to understand our code’s performance, and answer not only whether our code scales, but quantify how well under different circumstances and workloads? What costs of concurrency do we have to take into account?

This month, I’ll summarize some key issues we need to keep in mind to accurately analyze the real performance of our parallel code. I’ll list some basic considerations, and then some common costs. Next month, I have a treat in store: We’ll take some real code and apply these techniques to analyze its performance in detail as we successively apply a number of optimizations and measure how much each one actually buys us, under what conditions and in what directions, and why.
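As one concrete example of such a key factor, recall Amdahl’s Law (the subject of the February 2008 column listed below): if only a fraction p of the work can be parallelized, the best possible speedup on N cores is roughly

    speedup(N) = 1 / ((1 - p) + p/N)

For instance, with p = 0.9 and N = 4 that gives 1 / (0.1 + 0.225) ≈ 3.1x rather than 4x, and no number of cores can ever push it past 1/(1 - p) = 10x. Comparing measured speedups against bounds like this quickly shows whether you’re fighting serialization or some other cost.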

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):

August 2007: The Pillars of Concurrency

September 2007: How Much Scalability Do You Have or Need?

October 2007: Use Critical Sections (Preferably Locks) to Eliminate Races

November 2007: Apply Critical Sections Consistently

December 2007: Avoid Calling Unknown Code While Inside a Critical Section

January 2008: Use Lock Hierarchies to Avoid Deadlock

February 2008: Break Amdahl’s Law!

March 2008: Going Superlinear

April 2008: Super Linearity and the Bigger Machine

May 2008: Interrupt Politely

June 2008: Maximize Locality, Minimize Contention

July 2008: Choose Concurrency-Friendly Data Structures

August 2008: The Many Faces of Deadlock

September 2008: Lock-Free Code: A False Sense of Security

October 2008: Writing Lock-Free Code: A Corrected Queue

November 2008: Writing a Generalized Concurrent Queue

December 2008: Understanding Parallel Performance

Effective Concurrency: Writing a Generalized Concurrent Queue

The next Effective Concurrency column, “Writing a Generalized Concurrent Queue”, just went live on DDJ’s site, and also appears in the print magazine.

From the article:

Last month [1], I showed code for a lock-free queue that supported the limited case of exactly two threads—one producer, and one consumer. That’s useful, but maybe not as exciting now that our first rush of lock-free coding glee has worn off. This month, let’s tackle the general problem of supporting multiple producers and multiple consumers with as much concurrency as possible. The code in this article uses four main design techniques: …
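To anchor the problem the article is solving, here is one classic and much simpler multi-producer/multi-consumer design for comparison (my own sketch, not the article’s code): the “two-lock” queue, where a dummy node decouples the head from the tail so that producers and consumers can proceed concurrently, serializing only among themselves. The next pointers are atomic because, when the queue is empty, a producer and a consumer can touch the same node’s next field at the same time.

    #include <atomic>
    #include <mutex>
    #include <utility>

    // Classic two-lock queue: producers take tailMutex_, consumers take
    // headMutex_, and a dummy node keeps the two ends from colliding.
    template<typename T>
    class TwoLockQueue {
        struct Node {
            T value{};
            std::atomic<Node*> next{nullptr};
        };
        Node*      head_;                 // dummy node; guarded by headMutex_
        Node*      tail_;                 // guarded by tailMutex_
        std::mutex headMutex_, tailMutex_;
    public:
        TwoLockQueue() : head_(new Node), tail_(head_) {}
        ~TwoLockQueue() {
            while (head_) {
                Node* n = head_->next.load();
                delete head_;
                head_ = n;
            }
        }
        void push(T v) {
            Node* n = new Node;
            n->value = std::move(v);
            std::lock_guard<std::mutex> hold(tailMutex_);
            tail_->next.store(n, std::memory_order_release);
            tail_ = n;
        }
        bool pop(T& out) {
            std::lock_guard<std::mutex> hold(headMutex_);
            Node* first = head_->next.load(std::memory_order_acquire);
            if (!first) return false;     // empty
            out = std::move(first->value);
            delete head_;
            head_ = first;                // first becomes the new dummy
            return true;
        }
    };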

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns (based on the magazine print issue dates):

August 2007: The Pillars of Concurrency

September 2007: How Much Scalability Do You Have or Need?

October 2007: Use Critical Sections (Preferably Locks) to Eliminate Races

November 2007: Apply Critical Sections Consistently

December 2007: Avoid Calling Unknown Code While Inside a Critical Section

January 2008: Use Lock Hierarchies to Avoid Deadlock

February 2008: Break Amdahl’s Law!

March 2008: Going Superlinear

April 2008: Super Linearity and the Bigger Machine

May 2008: Interrupt Politely

June 2008: Maximize Locality, Minimize Contention

July 2008: Choose Concurrency-Friendly Data Structures

August 2008: The Many Faces of Deadlock

September 2008: Lock-Free Code: A False Sense of Security

October 2008: Writing Lock-Free Code: A Corrected Queue

November 2008: Writing a Generalized Concurrent Queue

September 2008 ISO C++ Standards Meeting: The Draft Has Landed, and a New Convener

The ISO C++ committee met in San Francisco, CA, on September 15-20. You can find the minutes here, including the votes to approve papers.

The most important thing the committee accomplished was this:

Complete C++0x draft published for international ballot

The biggest goal entering this meeting was to make C++0x feature-complete and stay on track to publish a complete public draft of C++0x for international review and comment — in ISO-speak, an official Committee Draft or CD. As I predicted in the summer, the committee achieved that at this meeting. Now the world will know the shape of C++0x in good detail. Here’s where to find it: The September C++0x working draft document is essentially the same as the September 2008 CD.

This is “it”, feature-complete C++0x, including the major feature of “concepts”, which had its own extensive set of papers for language and library extensions — I’ll stop there, but there are still more concepts papers at the mailing page and some more still to come during the CD phase. (If you get the impression that concepts is a big feature, well, it is indeed easily the biggest addition we made in C++0x.)
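For a taste of what concepts look like, here is a tiny example in the draft’s approximate syntax (written from memory, so treat it as illustrative rather than authoritative): a concept states the operations a template requires, and the compiler checks both template definitions and template uses against it.

    // Approximate C++0x draft concepts syntax, for illustration only.
    auto concept LessThanComparable<typename T> {
        bool operator<(T, T);
    }

    template<LessThanComparable T>
    const T& min(const T& a, const T& b) {
        return b < a ? b : a;
    }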

What’s next? As I’ve mentioned before, we’re planning to have two rounds of international comment review. The first of two opportunities for national bodies to give their comments is now underway; the second round will probably be this time next year. The only changes expected to be made between that CD and the final International Standard are bug fixes and clarifications. It’s helpful to think of a CD as a feature-complete beta, and we’re on track to ship one more beta before the full release.

And a new convener

On a personal note, I’m very happy to see this accomplished at the last meeting during my convenership. I’ve now served as secretary and then convener (chair) of the ISO C++ committee for over 10 years, and my second three-year term as convener ended one week after the San Francisco meeting. A decade is enough; I decided not to volunteer for another term as chair.

As of a few weeks ago, P. J. Plauger is the new convener of ISO/IEC JTC1/SC22/WG21 (C++). Many of you will know P.J. (or Bill, as he’s known within the committee) from his long service to the C and C++ communities: he is a past convener of the ISO C standards committee, a past editor of the C/C++ Users Journal, the principal author of the Dinkumware implementation of the C++ standard library, and the recipient of the 2004 Dr. Dobb’s Journal Excellence in Programming Award, among various other qualifications and honors. He has been a regular participant at ISO C++ meetings for about as long as they’ve been held, and his long experience with both the technology and the ISO standards world will serve WG21 well.

I’m very happy to have been able to chair the committee during the development of C++0x. Now as we move from “develop mode” into “ship mode” it will be great to have his experienced hand guiding the committee through the final ISO process. Thanks for volunteering, Bill!

Stroustrup & Sutter on C++ 2008, Second Showing: October 29-30, 2008, in Boston, MA, USA

This spring at SD West in Santa Clara, Bjarne and I did a fresh-and-updated S&S event with lots of new material.

We don’t usually repeat the same material, but this time there’s been such demand that we agreed to do a repeat… four weeks from today, in Boston. More information and talk descriptions follow.

CONTENT ADVISORY

Again, usually our S&S events feature mostly new material, but this one is almost identical to the material we did in spring 2008 in Santa Clara. Bjarne is substituting one talk, and will present “C++ in Safety-Critical Systems” instead of his talk on C++’s design and evolution; and we’ll both be updating the material to reflect the current state of the draft ISO C++0x standard. But otherwise it’ll be identical.

So:

  • If you missed our event this spring, here’s your second chance! It was our highest-rated S&S ever, and in the post-conference survey we asked the question “Would you recommend this course to a colleague?” and 100% said yes.
  • If you already attended this spring and came to all our sessions, you’ve seen nearly all of this material already, but feel free to encourage a friend or colleague who you think would benefit from the material to attend.

The Talks

Wednesday, October 29, 2008

C++0x Overview (Bjarne Stroustrup)

Safe Locking: Best Practices to Eliminate Race Conditions (Herb Sutter)

How to Design Good Interfaces (Bjarne Stroustrup)

Lock-Free Programming in C++—or How to Juggle Razor Blades (Herb Sutter)

Grill the Experts: Ask Us Anything! (Bjarne Stroustrup & Herb Sutter)

Thursday, October 30, 2008

[“Best of Stroustrup & Sutter”] Update of talk voted “Most Informative” at S&S 2007: Concepts and Generic Programming in C++0x (Bjarne Stroustrup)

What Not to Code: Avoiding Bad Design Choices and Worse Implementations (Herb Sutter)

C++ in Safety-Critical Systems (Bjarne Stroustrup)

How to Migrate C++ Code to the Manycore “Free Lunch” (Herb Sutter)

Discussion on Questions Raised During the Seminar (Herb Sutter & Bjarne Stroustrup)

Registration

This two-day seminar is getting billing on two different conferences that are running at the same time in the Hynes Convention Center: SD Best Practices and the Embedded Systems Conference. S&S is technically part of both conferences, which means you can attend S&S via either one: register through either SD Best Practices or the Embedded Systems Conference, and both offer registration options that include our two-day seminar.

I look forward to seeing many of you in Boston! Best wishes,

Herb

Data and Perspective

Even genuinely newsworthy topics can get distorted when commentators exaggerate or use data selectively. Here are two recent examples I noticed.

“This is the worst financial crisis since the Great Depression.” It’s true that it’s bad and even historic, and this sound bite correctly doesn’t actually claim it’s as bad as the Depression. I hope it doesn’t turn out to be in the same league as that; then, people were lining up at soup kitchens. For now, however, Apple is still on track to sell 10 million iPhone 3Gs this year, which says something.

“Yesterday [Monday, September 29, 2008] saw the worst single-day plunge in Dow Jones history.” “It’s a new Black Monday.” Well, these got my attention, because I remember Black Monday on October 19, 1987 very well. I was working in IT at a major bank, doing software application support for traders and related departments. When I went up to the trading floor that day, I immediately knew something was badly wrong because of the eerie sound as the elevator doors opened — a sound you never hear during trading hours, and certainly not from a room full of traders standing at their desks: silence.

Yesterday’s loss of 777 points was stunning as the largest single-day point loss in Dow history. But as a percentage loss that’s not even in the top 10 Bad Dow Days, all but two of which occurred before 1935. Those two since the Depression occurred on October 19 and 26, 1987, when the Dow lost 22.6% and then another 8% of its total value in single sessions, respectively #2 and #9 on the all-time Bad Dow Days list. For perspective, as of this writing the Dow is down 19.8% so far this entire year, and it surely hasn’t been a good year. Today’s crisis is already historic and could well get worse yet, of course, but some of us do remember some pretty bad ones in the past.
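The arithmetic makes the contrast plain. The Dow closed at about 11,143 before the drop, so (numbers approximate):

    777.68 / 11,143 ≈ 7.0%

A 7% day is painful, but it is a long way from 1987’s 22.6%.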

As good old Sam Clemens said (approximately), there are lies, darned lies, and statistics. Even when the statistics are true, always cross-check them for perspective. Even the best news and the worst news can be overstated, and viewing the same data from multiple angles helps ensure we understand it properly.

 

[Edited 9/30 to add: At 22.6%, Black Monday in 1987 was actually the worst Dow Day ever if you don’t count “reopening days” after unusual market closures when the markets catch up with events that happened while they were closed. So when was the all-time worst Dow Day ever? Perhaps surprisingly, the answer is not in 1929, though several of the all-time top 10 were in that year. Rather, it was December 12, 1914, when the Dow dropped 24.4% after the markets reopened after being closed entirely for over four months due to the outbreak of World War I. (The markets closed on July 30, two days after Austria-Hungary declared war and a day before Germany did.) That helps put in perspective just how bad October 1987 was, and of course that today’s crisis is also pretty bad even if it hasn’t beaten those prior records, yet.]

Ralph Johnson on Parallel Programming Patterns

A few days ago at UIUC, Ralph Johnson gave a very nice talk on “Parallel Programming Patterns.” It’s now online, and here’s the abstract:

Parallel programming is hard. One proposed solution is to provide a standard set of patterns. Learning the patterns would help people to become expert parallel programmers. The patterns would provide a vocabulary that would let programmers think about their programs at a higher level than the programming language. The patterns could steer programmers away from common errors and towards good design principles.

There have been a number of papers about parallel patterns, and one book, Patterns for Parallel Programming. None of them have become popular. I think the problem is that parallel programming is diverse and requires more design expertise than traditional software design. Thus, parallel programming experts use more patterns than traditional software design experts. I’ll critique the existing patterns and explain what I think should be done to make a set of patterns that can be as effective for parallel programming as patterns have been for object-oriented design.

If Johnson’s name sounds familiar, it should: He’s one of the “Gang of Four” authors of the seminal book Design Patterns.

Recommended viewing.