Quad-core a "waste of electricity"?

Jeff Atwood wrote:

In my opinion, quad-core CPUs are still a waste of electricity unless you’re putting them in a server. Four cores on the desktop is great for bragging rights and mathematical superiority (yep, 4 > 2), but those four cores provide almost no benchmarkable improvement in the type of applications most people use. Including software development tools.

Really? You must not be using the right tools. :-) For example, here are three I’m familiar with:

- Visual C++ 2008’s /MP flag tells the compiler to compile files in the same project in parallel. I typically get linear speedups on the compile phase. The link phase is still sequential, but on most projects compilation dominates.
- Since Visual Studio 2005 we’ve supported parallel project builds in Batch Build mode, where you can build multiple subprojects in parallel (e.g., compile your release and debug builds in parallel), though that feature didn’t let you compile multiple files in the same project in parallel. (As I’ve blogged about before, Visual C++ 2005 actually already shipped with the /MP feature, but it was undocumented.)
- Excel 2007 does parallel recalculation. Assuming the spreadsheet is large and doesn’t just contain sequential dependencies between cells, it usually scales linearly up to at least 8 cores (the most I heard of being tested before shipping). I’m told that customers who work on big financial spreadsheets love it.
- And need I mention games? (This is just a snarky comment… Jeff already correctly noted that “rendering, encoding, or scientific applications” are often scalable today.)
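
To make the Excel item concrete, here is a minimal Python sketch of dependency-aware parallel recalculation. It is not how Excel is implemented; it assumes each cell’s formula is a function of already-computed values and that every cell’s direct dependencies are known up front. Cells are grouped into levels so that no cell reads another cell in its own level, and each level is evaluated on a thread pool. The names `dependency_levels` and `recalculate` and the sample sheet are hypothetical. (In CPython, pure-Python formulas won’t actually run faster on threads because of the GIL; the sketch shows the scheduling, not the speedup.)

```python
from concurrent.futures import ThreadPoolExecutor

def dependency_levels(deps):
    """Group cells into levels; each cell depends only on cells in earlier levels.
    Every referenced cell must also appear as a key in `deps`."""
    remaining = {cell: set(d) for cell, d in deps.items()}
    done, levels = set(), []
    while remaining:
        ready = sorted(c for c, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("circular reference among: " + ", ".join(sorted(remaining)))
        levels.append(ready)
        done.update(ready)
        for c in ready:
            del remaining[c]
    return levels

def recalculate(formulas, deps, max_workers=4):
    """Evaluate every cell; cells within one level run concurrently on a thread pool."""
    values = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in dependency_levels(deps):
            # Cells in the same level never read each other, so they are independent.
            results = list(pool.map(lambda c: formulas[c](values), level))
            values.update(zip(level, results))
    return values

# Hypothetical sheet: C1 = B1 + B2, B1 = A1 * 10, B2 = A2 + 1.
example_formulas = {
    "A1": lambda v: 2,
    "A2": lambda v: 3,
    "B1": lambda v: v["A1"] * 10,
    "B2": lambda v: v["A2"] + 1,
    "C1": lambda v: v["B1"] + v["B2"],
}
example_deps = {"A1": [], "A2": [], "B1": ["A1"], "B2": ["A2"], "C1": ["B1", "B2"]}

print(dependency_levels(example_deps))  # [['A1', 'A2'], ['B1', 'B2'], ['C1']]
print(recalculate(example_formulas, example_deps)["C1"])  # 24
```

A real spreadsheet engine would more likely start each cell as soon as its own inputs are ready instead of waiting for a whole level, which helps when levels are uneven; the level-by-level version is simply the easiest to read.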

And of course, even if you’re having a terrible day and not a single one of your applications can use more than one core, you can still see real improvement on CPU-intensive multi-application workloads on a multicore machine today, such as by being able to run other foreground applications at full speed while encoding a movie in the background.

Granted, as I’ve said before, we do need to see examples of mainstream applications (e.g., something your dad might use) that exploit manycore (e.g., >10 cores). But it’s overreaching to claim that there are no applications at all that exploit multicore (e.g., <10 cores), not even development tools. We may not yet have the mainstream manycore killer app, but it isn’t as though we have nothing to show. We have started down the road that will take us there.

26 thoughts on “Quad-core a "waste of electricity"?”

  1. Jeff Atwood is probably mostly a C# / ASP.NET developer, and compilation time is thus far less of an issue to him than to us poor C++ sods.

    As for games, I think most of them are still at the two-and-a-bit-threads simulation/rendering stage. We won’t see quad-cores utilized well by games soon IMHO, outside of the few job-queue-based PS3 ports.

  2. Also, modern operating systems can (usually) run on multiple processors now, which means things like sound multiplexing, networking, and I/O can keep operating while your application gobbles a CPU.

    The real issue with multicore chips is that per-core speed has gone down compared to single-core chips.

  3. Ivan-Assen wrote: “Jeff Atwood is probably mostly a C# / ASP.NET developer.”

    Note that one of the examples I gave was of Visual Studio (not just Visual C++) parallel batch builds. Even Jeff presumably likes to build at least debug and release versions of his system, and would benefit from doing that in parallel.

    At any rate, the claim that “four cores provide almost no benchmarkable improvement in … software development tools” is just plain wrong for a number of modern tools.

    Ivan-Assen continued: “I think most [games] are still at the two-and-a-bit-threads simulation/rendering stage. We won’t see quadcores utilized well by games soon IMHO, outside of the few job queue-based PS3 ports.”

    Actually, see the links I provided for both Xbox and PC parallel games. Since Xbox 360’s launch, many Xbox games have already been well wired for six-way parallelism (3 cores x 2 hardware threads per core). And on the PC, Valve’s game engine demonstrates scalable performance on commodity multicore hardware, though you’re right that this has yet to work its way into lots of PC games.

    Of course, the Valve example is still fairly recent, and these are just first-generation examples. But more will yet come.

  4. “Even Jeff presumably likes to build at least debug and release versions of his system, and would benefit from doing that in parallel.”
    But isn’t that two? Sounds like dual, not quad. I mean, you might want 4 builds, I suppose, but quad still seems excessive for what you gain there, especially as it sucks up more magic lightning.

  5. Arguing over whether to have 4 or 2 cores will in a couple of months be like arguing over whether you should buy a processor with an FP coprocessor or not: Just about every mid-level and up CPU will be quad or more. Jeff’s statement is for those who are buying right now, are on a budget, and have the same needs as he. By the time we have finished arguing over this, dual cores will have to be specially ordered and fabbed for $1M each from Intel & AMD’s division of computing history.

    That said, personally I’d like a quad or more. The applications I write have test runs of ten to fifteen minutes, so I usually kick off a test run and keep editing the source while it runs. That means I have the following running: editor, compiler, 3 x test processes w/ 2-3 threads each. Possibly a very odd workload, and I may well be in the extreme minority, but please count me as a data point.

    But what I’d like even more than a quad-core CPU is much lower I/O latency. Anyone have any ideas how best to match the storage and memory subsystems to multicore CPUs? Is it just a question of beefing up the cache, or can one be smart about selecting the right bus / HD types / config? SSD is still a mite too expensive for me, though.

  6. “As for games, I think most of them are still at the two-and-a-bit-threads simulation/rendering stage. We won’t see quadcores utilized well by games soon IMHO”

    That’s not true:
    one core for rendering is going to be too few, just as a single core for simulation is.

    Networking stuff should be placed in a separate thread to avoid facing fucky async stuff (and on Xbox 360 the XRNM library probably runs in a separate thread).

    Physics cards are coming, and they need special attention (and a thread).

    I believe in the equation: more threads = better performance

  7. ugasoft: I think games still have the problem of all threads trying to access a shared world model. If I remember correctly, Unreal Engine 3 still has only one thread for simulation and one for rendering.

  8. Hi Herb,

    Great response! You’re absolutely right about the huge C++ perf benefits for developers and I should have covered that.

    I posted a clarifying blog entry here:
    http://www.codinghorror.com/blog/archives/001103.html

    As for games, I’m a pretty avid gamer to put it mildly, and I have yet to see a single game scale to 4 cores in any significant way. Most barely scale to 2! I think there’s more room for hope on the dev tools side than the gaming side, frankly.

  9. So, I’ve got two of those puppies in my tender beast.

    I love it.

    Stop whining about me wasting juice, write me some software that will actually use that capacity.

    Get cracking already!

  10. Well, for all I know (and I do), 4+ cores in strategy games? That’s exactly the type of game that can utilize 64 or more of them. Parallel AI and task processing for tens or hundreds of units can add so much spice to the genre.

  11. As a developer of applications for Apple’s OS X, I regularly produce “Universal Binaries” containing machine code for both PowerPC and Intel. For those, even the “precompile headers” and link phases process both architectures in parallel.

    The unit tests, at the end of compilation, currently run one architecture at a time. (Xcode runs the PowerPC tests using Rosetta, the built-in emulator.)

    If my apps would benefit, I have the option of checking a single checkbox and getting a 4-way Universal Binary: 32-bit and 64-bit, for both Intel and PowerPC. But since Rosetta doesn’t have a 64-bit version, only 3 of the architectures would get their unit tests run. (I’d have to ship the code to an actual PowerPC system to run the 64-bit PowerPC unit tests.)

  12. One core for rendering is what you can afford on the PC nowadays, because you can’t feed DirectX with commands from more than one thread. (Of course you can, if you use its pseudo-MT mode, which makes it internally lock a critical section (CS) on every call. Ouch.)

    Sound and networking aren’t serious contenders for CPU time.

    Physics cards are officially dead, now that NVIDIA has bought Ageia and is porting its API to GPUs. Serious use of physics, however, can and should stress [a] separate core[s].
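
The constraint in comment 12, that only one thread may talk to the graphics API, is commonly worked around by parallelizing the preparation of draw commands and serializing only the submission. Below is a minimal Python sketch of that idea under stated assumptions: `Device`, `record_commands`, and `render_frame` are hypothetical stand-ins, not DirectX APIs, and the “commands” are just tuples. Instead of locking a critical section on every call, the device simply refuses calls from any thread other than its owner.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Device:
    """Hypothetical stand-in for a graphics API that must only be called from one thread."""

    def __init__(self):
        self.owner = threading.get_ident()  # the thread that created the device owns it
        self.submitted = []

    def submit(self, command):
        # Enforce the single-thread rule rather than taking a lock on every call.
        if threading.get_ident() != self.owner:
            raise RuntimeError("device called from a non-owning thread")
        self.submitted.append(command)

def record_commands(obj_id):
    """Worker-side work: build one object's command list without touching the device."""
    return [("set_mesh", obj_id), ("draw", obj_id)]

def render_frame(device, object_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Recording runs in parallel; map() yields the lists in input order.
        command_lists = list(pool.map(record_commands, object_ids))
    # Submission stays on the owning thread, one command at a time.
    for commands in command_lists:
        for command in commands:
            device.submit(command)

device = Device()
render_frame(device, [1, 2, 3])
print(device.submitted[:2])  # [('set_mesh', 1), ('draw', 1)]
```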

  13. I’ve been recommending multi-* systems to every software development peer who would listen since (OMG) I bought my first dual Pentium Pro system way back when.

    The win isn’t so much in single application performance (although I love getting my VC++ builds done in reasonable time), but in overall system experience. With one notable exception, every developer I know runs multiple applications (sometimes dozens). It’s a no-brainer. Get the most parallel system you can.

  14. What I found interesting was Jeff’s comment that it was “all about marketing.” Which, of course, right now, it is. But, as pointed out above, it won’t be for much longer… and at least the Intel techs have got the ball rolling with TBB, so even they don’t think it’s all about marketing.

    I see two basic reasons for developers to grab a four-core right now.

    The more general one is virtualisation. I’d give a lot to be able to run parallel builds on different platforms at the same time. In terms of “Marketing,” I’d also like to be able to persuade my employers to adopt a sane policy whereby the development, test, and (possibly multiple) production environments are entirely separate. I’ve not had much luck so far because of the perceived cost. I think terabyte disks and multi-core processors are a no-brainer for this non-problem.

    The more particular one relates to my background as a server-side programmer on multi-processor machines. I really don’t want to have to buy an ES10K just to test my system. On this point, I don’t even care about the inefficiencies implied by woeful locking strategies, queuing, etc; I’ll just sit back and model the results before putting the thing into production and watching it grind to a halt.

  15. Herb: “…but on most projects compilation dominates…”, I’ve found this to be completely untrue: linking takes most of the time, so much so that we actually have to use the horrible trick of “batchbuilds” (a single .cpp file that includes all the others in a module) to speed things up (and no, proper use of precompiled headers does not help much). Also, in my experience my work is usually I/O bound; that’s a huge problem that, in my opinion, desktop applications have to face…

    Ivan: “…We won’t see quadcores utilized well by games soon IMHO, outside of the few job queue-based PS3 ports…”: actually, since the 360 has 6 hardware threads, I’m pretty sure that most modern games and engines are capable of scaling at least up to 4 cores. Most engines employ high-level parallelism similar to the Parallel Extensions for .NET. See the Valve Software papers for more information.
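
For readers unfamiliar with the “batchbuild” trick in comment 15 (often called a unity build), here is a minimal Python sketch of a generator for such a file. The function name, module directory, and file names are hypothetical. Compiling one large translation unit means shared headers are parsed once and the linker sees fewer object files; the usual trade-off is that file-local names (static functions, anonymous namespaces) from different files can now collide.

```python
from pathlib import PurePosixPath

def make_unity_source(module_dir, sources):
    """Return the text of one .cpp file that #includes every source file of a module."""
    lines = ["// Generated unity build file: compile this instead of the individual .cpp files."]
    for src in sorted(sources):
        # Sorting keeps the include order, and thus the generated file, stable between runs.
        lines.append(f'#include "{PurePosixPath(module_dir, src)}"')
    return "\n".join(lines) + "\n"

print(make_unity_source("engine/render", ["mesh.cpp", "camera.cpp"]), end="")
```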

  16. > Sound and networking aren’t serious contenders for CPU time.

    untrue.

    Networking doesn’t simply mean “packet handling”.
    Online (massively) multiplayer management includes dead reckoning (which means physics simulation), data recovery, and data interpolation… in client/server applications this processing can be critical and driven by data quality.

    It IS critical in sport games like racing games (that’s what I’m working on).
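
Dead reckoning, mentioned in comment 16, means predicting where a remote object is now from the last state the network delivered. A minimal Python sketch, assuming a second-order (constant-acceleration) model and a simple blend toward the prediction to avoid visible snapping; the `State` layout and function names are hypothetical, and real racing-game netcode is considerably more involved.

```python
from dataclasses import dataclass

@dataclass
class State:
    """Last authoritative snapshot received for a remote car (hypothetical layout)."""
    t: float    # snapshot timestamp, seconds
    pos: tuple  # (x, y) position
    vel: tuple  # (vx, vy), units per second
    acc: tuple  # (ax, ay), units per second squared

def extrapolate(s, now):
    """Constant-acceleration dead reckoning: p = p0 + v*dt + 0.5*a*dt^2."""
    dt = now - s.t
    return tuple(p + v * dt + 0.5 * a * dt * dt for p, v, a in zip(s.pos, s.vel, s.acc))

def blend(shown, predicted, alpha):
    """Move the displayed position a fraction alpha toward the prediction to hide corrections."""
    return tuple(d + (p - d) * alpha for d, p in zip(shown, predicted))

snapshot = State(t=0.0, pos=(0.0, 0.0), vel=(10.0, 0.0), acc=(0.0, 2.0))
predicted = extrapolate(snapshot, now=0.5)
print(predicted)                          # (5.0, 0.25)
print(blend((0.0, 0.0), predicted, 0.5))  # (2.5, 0.125)
```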

  17. > Sound and networking aren’t serious contenders for CPU time.

    About sound, in games you regularly want to decode 30+ sounds simultaneously… and add filtering and effects on top of that :)

    It can easily take 1/10th of processing time in a video game.
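
To see why 30+ simultaneous sounds add up, here is a minimal Python sketch of two of the cheapest steps, mixing and a one-pole low-pass filter, assuming float samples in [-1.0, 1.0]. `mix` and `low_pass` are hypothetical helpers, and decoding compressed audio, which usually costs far more, is left out. At a 48 kHz sample rate, just summing 30 voices is 30 × 48,000 = 1,440,000 multiply-adds per second per channel, before any decoding or effects.

```python
def mix(voices, gains, frames):
    """Sum `frames` samples from each voice, scaled by its gain, then clamp to [-1.0, 1.0]."""
    out = [0.0] * frames
    for voice, gain in zip(voices, gains):
        for i in range(min(frames, len(voice))):
            out[i] += voice[i] * gain
    # Hard clipping keeps the sum within the valid float-PCM range.
    return [max(-1.0, min(1.0, s)) for s in out]

def low_pass(samples, alpha):
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]), with 0 < alpha <= 1."""
    y, out = 0.0, []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

print(mix([[0.5, 0.5], [0.75, -0.25]], [1.0, 1.0], 2))  # [1.0, 0.25]
print(low_pass([1.0, 1.0], 0.5))                         # [0.5, 0.75]
```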

  18. You know, I’ve been reading what some people put here, and it’s interesting to see people discussing the technical merits and pitfalls of the multi-core/multi-CPU issue, but overall I see a slight oversight of something very important. I love to wonder about the future of computing, but sometimes we forget to look at history, and in this case the history of computing comes to mind.

    I remember a time when many people had a lot of doubts about GPUs. Even the big IBM overlooked this very badly when it rolled out its flashy Personal System/2 line: the machines had higher resolutions and more colors with VGA and XGA, but in essence they came with dumb graphics controllers that left most of the computing to the CPU. This wasn’t the only mistake IBM made with these computers; there were several. At a time when people were talking about multimedia all over the place, and computers like the Amiga, Macintosh, and Atari ST were showing a lot of multimedia prowess, IBM didn’t even put a sound chip in its PC (no sound processor, DSP, or the like; this was a problem that Creative Labs and others had to solve). I knew when these PS/2 computers came out that they were going to be a failure, and I was right. Computers like the Amiga had graphics that ran circles around PCs with VGA, and I remember that companies like Texas Instruments made graphics accelerator chips for PCs, but the point is that, in my opinion, the industry did not put as much emphasis on these as it should have, and it took a long time for companies and consumers to realize that GPUs, sound cards, the GUI, and multimedia were the way to go for PCs. Today, what decent personal computer doesn’t have these capabilities?

    Now somebody might argue that Commodore and Atari disappeared from the computer business while IBM is still here, but I don’t think that happened because of technical issues with their computers. Many people would say that some of these machines were actually far superior to the PCs of that time. The problem was that PCs were better marketed, because they were targeted at the business segment, which was where the money was, while Apple, Commodore, and Atari were targeting the home market heavily, because many people had foreseen that computers were going to go mainstream sooner or later, and that was correct. The problem was that the home market AT THE TIME wasn’t ready for the personal computer yet. Now look how the PC is all over the place in people’s homes. But by now the PC has the multimedia capabilities and ease of use it needed, at the right price for this market; after all, IBM was a much more powerful company than any of these at the time, which gave it the wherewithal to survive some mistakes and adapt, and at least Apple made it through. You see, a lot of people back then (not some of us) didn’t think that things like graphics processors or floating-point units were that important for PCs, and lo and behold what GPUs have done for 3D graphics today: you’ve seen what modern games and 3D applications can do with this added power. And FPUs are now embedded in CPUs, and there is a lot of talk of embedding GPUs and other types of coprocessors in CPUs.

    The main point I’m trying to make is that we might be making the same mistake with multi-core CPUs that some people made back then with GPUs and other changes in the evolution of the personal computer. I remember that GPUs had to go through a lot of growing pains (remember when we didn’t have a 3D standard like OpenGL or Direct3D? It took time to solve that). It could well be that multi-core is the right way to go, but that it will take time for both hardware and software to grow to the point where these problems of not being able to harness all the power are resolved. We are in a transition period, and remember that it is a double transition, because we are also moving to 64-bit computing in the mainstream; I believe the history of the PC tells us that some of these things take time to evolve and get solved. A few years down the road we might have efficient 64-core CPUs, with operating systems and software that really put this power to good use, and we might be laughing at the old idea that multi-core CPUs were nothing but a waste of resources.

  19. Being a C++ dev, I can say that tools like TBB can be a gold mine!

    On a large-scale application, it is often better to design with multiple threads, each running a sequential task; doing more work with more cores is the way to go. But in other scenarios, a single code path has to go as fast as possible, so that path itself has to scale across cores. Libraries like TBB will make it a no-brainer to add parallelism to applications with almost no added effort! From now on, games built around tools like that will definitely scale to any number of cores.

  20. > The link phase is still sequential, but on most projects
    > compilation dominates.

    I don’t think that faster compilation of C++ projects is a good reason to buy a 4+ core CPU. If compilation dominates, then the project either
    a. is small enough that you’ll save only a few seconds on every compilation, or
    b. does not use dependency-breaking techniques (nicely described by… Herb Sutter).

    When a project is big, the real nightmare is *linking*, and a 4+ core CPU does not help. What can help is a parallel linker, like gold (http://google-opensource.blogspot.com/2008/04/gold-google-releases-new-and-improved.html).
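
The post’s /MP note and comments 15 and 20 disagree mainly about the ratio of compile time to link time, and Amdahl’s law makes that trade-off easy to quantify. A minimal Python sketch, assuming compilation scales perfectly with the number of cores (the best case described in the post) and linking stays serial; the function names are hypothetical and the timings are made-up illustrations, not measurements.

```python
def build_speedup(compile_s, link_s, cores):
    """Amdahl's law for a build: compile time divides across cores, link time stays serial."""
    serial = compile_s + link_s
    parallel = compile_s / cores + link_s
    return serial / parallel

def max_build_speedup(compile_s, link_s):
    """Upper bound as cores -> infinity: only the serial link remains."""
    return (compile_s + link_s) / link_s

# Hypothetical timings in seconds.
print(build_speedup(120, 10, 4))  # compile-dominated: 3.25
print(build_speedup(30, 60, 4))   # link-dominated: about 1.33
print(max_build_speedup(30, 60))  # 1.5, no matter how many cores
```

So both sides can be right: with a short link, four cores nearly quadruple build throughput, while with a dominant link even unlimited cores cannot beat 1.5x in this example, which is why a faster or parallel linker matters there.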

  21. On multicore CPUs (Intel’s Hyper-Threading included), race conditions and deadlocks show up much, much faster, provided you are not only developing your project on your workstation but also running and debugging the applications there. So if you deal with multithreaded code, a multicore CPU is a must.

  22. On the topic of games using multi-core: it’s very different to say you are using multi-core than to say you are using multi-core well.

    Using 6 threads is no feat. Using 6 threads with the Windows operating system is difficult. Using 6 threads in concert with an old-fashioned memory system is nearly impossible when the game alone has a memory footprint as big as 2 GB of RAM.

    A large majority of game engines are not using the memory system correctly. If you are not using the memory system efficiently in a single-threaded application, how can you expect better performance from multiple cores that share the memory system at once?

    Another issue: even with 6 concurrent hardware threads, as on the 360, it is hard to write scalable multi-threaded game subsystems. I don’t want to invest in new engine components that scale to 4 cores but not to 8.

    I think a reasonable multi-threaded game engine design won’t be possible until we have 16 or more cores. Well, at least I would rather wait.

    Lastly, another consideration is latency. Think about this… a queue for submitting tasks, a command buffer in DX, the latency of a pipelined graphics card. 66.6 ms of input-to-display latency (four frames) in a game running at 60 Hz is not acceptable for most games.

    *steps off soap box*
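
The 66.6 ms figure in comment 22 is four frames at 60 Hz. A minimal Python sketch of that arithmetic, assuming (hypothetically) that each pipeline stage holds a frame for one full refresh interval; the stage list follows the comment plus a simulation frame and is illustrative only.

```python
def input_to_display_ms(refresh_hz, stages):
    """Latency if every pipeline stage buffers exactly one full frame."""
    frame_ms = 1000.0 / refresh_hz
    return frame_ms * len(stages)

# Hypothetical one-frame-per-stage breakdown, following comment 22.
STAGES = ["simulation", "task queue", "D3D command buffer", "GPU pipeline"]

print(f"{1000.0 / 60:.2f} ms per frame")                  # 16.67 ms per frame
print(f"{input_to_display_ms(60, STAGES):.1f} ms total")  # 66.7 ms total
```

Under this model, removing one buffered stage saves a whole frame (about 16.7 ms at 60 Hz), which is why deep task queues trade latency for throughput.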
