Welcome to the Jungle

With so much happening in the computing world, now seemed like the right time to write “Welcome to the Jungle” – a sequel to my earlier “The Free Lunch Is Over” essay. Here’s the introduction:

 

Welcome to the Jungle

In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done.

The free lunch is over. Now welcome to the hardware jungle.

 

From 1975 to 2005, our industry accomplished a phenomenal mission: In 30 years, we put a personal computer on every desk, in every home, and in every pocket.

In 2005, however, mainstream computing hit a wall. In “The Free Lunch Is Over” (December 2004), I described the reasons for the then-upcoming industry transition from single-core to multi-core CPUs in mainstream machines, why it would require changes throughout the software stack from operating systems to languages to tools, and why it would permanently affect the way we as software developers have to write our code if we want our applications to continue exploiting Moore’s transistor dividend.

In 2005, our industry undertook a new mission: to put a personal parallel supercomputer on every desk, in every home, and in every pocket. 2011 was special: it’s the year that we completed the transition to parallel computing in all mainstream form factors, with the arrival of multicore tablets (e.g., iPad 2, Playbook, Kindle Fire, Nook Tablet) and smartphones (e.g., Galaxy S II, Droid X2, iPhone 4S). 2012 will see us continue to build out multicore with mainstream quad- and eight-core tablets (as Windows 8 brings a modern tablet experience to x86 as well as ARM), and the last single-core gaming console holdout will go multicore (as Nintendo’s Wii U replaces Wii).

This time it took us just six years to deliver mainstream parallel computing in all popular form factors. And we know the transition to multicore is permanent, because multicore delivers compute performance that single-core cannot and there will always be mainstream applications that run better on a multi-core machine. There’s no going back.

For the first time in the history of computing, mainstream hardware is no longer a single-processor von Neumann machine, and never will be again.

That was the first act.  . . .

 

I hope you enjoy it.

10 thoughts on “Welcome to the Jungle”

  1. Great article! It really does put things into perspective.
    Is it safe to say that “Moore’s Law” is going out of the window?…
    (I mean we have been exceeding Moore’s Law for several years now, right?)
    The acceleration of advancement appears to have no bounds.

  2. Brilliant! Just brilliant. I often wondered what would follow your “Free lunch is over” meme, so the “Welcome to the jungle” meme comes as a perfect retro-futuristic follow-up. Retro for the (obvious?) GnR music undertones and futuristic for the ideas you outlined with your description here.

    I look forward to the full series of this meme.

    Many thanks! :)

  3. Herb – a good write-up, and I look forward to reading the rest of it. I started out on an Apple IIe with a Z80 CPU card, running Digital Research CP/M, dBase, VisiCalc, and WordStar; then along came IBM PC DOS 1.0, and I have never looked back. It has been a phenomenal ride from DOS to Windows to OS/2 to Windows NT. In the process, Microsoft did give us a hard time, especially in the development area. Visual Studio and C++ were always languishing in the Microsoft prison – at least lately we have seen some dramatic improvement in VS and the incorporation of C++11 features – though it is still very slow to start up compared to C++Builder.

    I have listened to all your presentations on C++, but there is one point I am not clear on: if one is developing a Metro style app that uses WinRT, am I right that, from a performance point of view, it should not make a difference whether one codes in C++ or C#, since both languages make direct calls to WinRT?

  4. Excellent, Herb, as always. Thanx!

    @John: I view “restrict” as a declaration of intent with two main purposes:

    1- Indicates to the compiler that it should search for a GPU (or special-purpose) translation of the code.
    2- Provides feedback to the programmer when the code snippet cannot be translated to conform to the special-purpose.

    If you think that well-typed data is a good thing, then this sort of facility provides for well-typed code. Like well-typed data, it’s possibly not strictly needed in the long run, but it’s nevertheless a good idea.

    I do believe that pervasive heterogeneous computing requires this sort of strict typing for the code, and also for the execution environment (which facilities and data it has access to).

    If you want to look at another initiative along the same lines, check out the GPipe package for Haskell (http://www.haskell.org/haskellwiki/GPipe), which was an attempt to blur the lines between CPU and GPU through the use of polymorphic (akin to templated, although the comparison is imperfect) but ultimately strictly-typed code. Unfortunately, it is in a state of bitrot.

  5. “developer express code that is restricted to use just a subset of a mainstream programming language” – if I write a lambda that is pure *and* low on branching, can’t I expect (eventually) a smart compiler to create code for both CPU and GPU without needing the C++ AMP restrict keyword? I see the use for restrict now, but should I expect that to continue?

  6. @Herb: Thanks for the reply!

    Yes, both ASFs are video links, just double-checked with Firefox (and watched before with VideoLAN – VLC). Being a Windows Media file type, I guess it should work w/ Windows Media Player, too! :-)

    BTW, the slides I’ve linked to unfortunately don’t correspond one-to-one to the video talk (they are a different revision), and the original slides link I had has already 404-ed. You can also try the following (also different revisions, but perhaps somewhat closer):

    http://www.docstoc.com/docs/16601629/The-Microprocessor-Ten-Years-from-Now-Why-it-is-Relevant-to-all
    http://internetconferences.net/belgrade2009/Yale%20Patt%20-%20Future%20Microprocessors%20-%20What%20must%20we%20do%20differently.ppt
    http://www.ece.utexas.edu/~patt/10s.382N/handouts/Implications_of_multi-core.ppt

    I guess it’s best to just watch the video of the talk!

    I’d be interested in hearing your comments if you can find the time; I’m curious whether you see things in a similar way (I also found the part about improvements in ILP not being over yet quite interesting)!

    This slide is interesting in the context of this discussion:

    * 50 billion transistors naturally leads to
    – A large number of simple processors, AND
    – A few very heavyweight processors, AND
    – Enough “accelerators” for handling lots of special tasks

    * Heavyweight processors mean we can exploit ILP
    – i.e., Improve performance on hard, sequential problems
    – The death of ILP has been greatly exaggerated
    – There is still plenty of head room yet to be pursued

    * We need software that can utilize both

    Also, the discussion on Tightly-coupled vs Loosely-coupled concurrency and the memory implications thereof (on another occasion — the following is from a course, not the talk) caught my attention in this context:

    http://www.ece.utexas.edu/~patt/11s.460N/handouts/concurrency_handout.pdf
    Outline // slide #10
    Multi-core (one thread spans an engine, multiple engines)
    – Tightly coupled vs. Loosely coupled
    – Interconnect
    – Cache Coherency
    – Memory Consistency

    Tightly-coupled vs Loosely-coupled // slide #25

    * Tightly coupled (i.e., Multiprocessor)
    – Shared memory
    – Each processor capable of doing work on its own
    – Easier for the software
    – Hardware has to worry about cache coherency, memory contention

    * Loosely-coupled (i.e., Multicomputer Network)
    – Message passing
    – Easier for the hardware
    – Programmer’s job is tougher

    [PPT, diff. revision] http://www.ece.utexas.edu/~patt/10s.382N/handouts/Approaches_to_Concurrency.ppt

  7. @Matt: No, I hadn’t seen that lecture, thanks. Yes, the small overlap seems to be that we both cover the first half of heterogeneity — big/fast vs. small/slow cores. But then one can go quite a bit further… (BTW, are those links to video, or just audio?)

  8. Well the intro has me hooked, sounds like a fascinating read, can’t wait :-)

    Despite the switch to multicore systems, however, I still find it amazing how many developers work in a single-core mindset!

  9. Interesting!

    This reminds me of the “Multicore, Meganonsense, and the Future, if We Get It Right” talk by Yale N. Patt /* http://www.ece.utexas.edu/~patt/ */ — especially the “Heterogeneous, not Homogeneous” part.

    I was wondering, have you had a chance to see it and would you agree that it’s basically in the same trend?

    Slides:
    http://www.bscmsrc.eu/sites/default/files/media/yale_bmw2011.pdf

    Videos:
    http://hps.ece.utexas.edu/videos.html
    http://hps.ece.utexas.edu/media/talks/UPCRC-2010-04-30yp.asf
    http://users.ece.utexas.edu/~patt/Videos/talk_videos/UPCRC-2010-04-30yp.asf
