Archive for the ‘Hardware’ Category

In Welcome to the Jungle, I predicted that “weak” hardware memory models will disappear. This is true, and it’s happening before our eyes:

  • x86 has always been considered a “strong” hardware memory model that supports sequentially consistent atomics efficiently.
  • The other major architecture, ARM, recently announced that they are now adding strong memory ordering in ARMv8 with the new sequentially consistent LDAR and STLR instructions, as I predicted they would. (Actually, Hans Boehm and I influenced ARM in this direction, so it was an ever-so-slightly disingenuous prediction…)

However, at least two people have been confused by what I meant by “weak” hardware memory models, so let me clarify. “Weak” means something different for hardware memory models than for software memory models, so perhaps those aren’t the clearest terms to use.

By “weak (hardware) memory model” CPUs I mean specifically ones that do not natively support efficient sequentially consistent (SC) atomics, because on the software side programming languages have converged on “sequential consistency for data-race-free programs” (SC-DRF, roughly aka DRF0 or RCsc) as the default (C11, C++11) or only (Java 5+) supported software memory model. POWER and ARMv7 notoriously do not support SC atomics efficiently.
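
To make “SC-DRF by default” concrete, here’s a minimal C++11 sketch: both atomic operations below default to memory_order_seq_cst, so this data-race-free program behaves as if its operations were simply interleaved.

```cpp
#include <atomic>

std::atomic<bool> ready{false};
int data = 0;

void producer() {             // runs on thread 1
    data = 42;
    ready.store(true);        // defaults to std::memory_order_seq_cst
}

void consumer() {             // runs on thread 2
    while (!ready.load()) {}  // also seq_cst by default
    // data == 42 is guaranteed here: SC for data-race-free programs
}
```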

Hardware that supports only memory models weaker than SC-DRF – that is, hardware that cannot support SC-DRF efficiently – is permanently disadvantaged and will either become stronger or atrophy. As I mentioned specifically in the article, the two main current hardware architectures with what I called “weak” memory models are current ARM (ARMv7) and POWER:

  • ARM recently announced ARMv8 which, as I predicted, is upgrading to SC acquire/release by adding new SC acquire/release instructions LDAR and STLR that are mandatory in both 32-bit and 64-bit mode. In fact, this is something of an industry first – ARMv8 is the first major CPU architecture to support SC acquire/release instructions directly like this; a sketch of how this changes code generation follows this list. (Note: That’s for CPUs, but the roadmap for ARM GPUs is similar. ARM GPUs currently have a stronger memory model, namely fully SC; ARM has announced that their GPU future roadmap has the GPUs fully coherent with the CPUs, and will likely add “SC load acquire” and “SC store release” to GPUs as well.)
  • It remains to be seen whether POWER will adapt similarly, or die out.
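
To see why this matters for code generation, here’s a sketch of the commonly cited instruction mappings for an SC store. Treat the mappings as illustrative, not authoritative – actual output varies by compiler and version:

```cpp
#include <atomic>

std::atomic<int> x{0};

void sc_store(int v) {
    x.store(v, std::memory_order_seq_cst);
    // Commonly cited mappings for the store above:
    //   x86/x64: mov + mfence (or a single xchg) -- cheap on a strong model
    //   ARMv7:   dmb; str; dmb                   -- full barriers both sides
    //   ARMv8:   stlr                            -- one SC store-release
    //   POWER:   hwsync; st; ...                 -- heavyweight sync needed
}
```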

Note that I’ve seen some people call x86 “weak”, but x86 has always been the poster child for a strong (hardware) memory model in all of our software memory model discussions for Java, C, and C++ during the 2000s. Therefore perhaps “weak” and “strong” are not useful terms if they mean different things to some people, and I’ve updated the WttJ text to make this clearer.

I will be discussing this in detail in my atomic<> Weapons talk at C&B next week, which I hope to make freely available online in the near future (as I do most of my talks). I’ll post a link on this blog when I can make it available online.

Read Full Post »

Here’s another deep session for C&B 2012 on August 5-8 – if you haven’t registered yet, register soon. We got a bigger venue this time, but as I write this the event is currently almost 75% full with five weeks to go.

I know, I’ve already posted three sessions and a panel. But there’s just so much about C++11 to cover, so here’s a fourth brand-new session I’ll do at C&B 2012 that goes deeper on its topic than I’ve ever been willing to go before.

atomic<> Weapons: The C++11 Memory Model and Modern Hardware

This session in one word: Deep.

It’s a session that includes topics I’ve publicly said for years are Stuff You Shouldn’t Need To Know and I Just Won’t Teach, but it’s becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers are just so addicted to full performance that they’ll reach for the big red levers with the flashing warning lights. Since we can’t keep people from pulling the big red levers, we’d better document the A to Z of what the levers actually do, so that people don’t SCRAM unless they really, really, really meant to.

This session covers:

  • The facts: The C++11 memory model and what it requires you to do to make sure your code is correct and stays correct. We’ll include clear answers to several FAQs: “how do the compiler and hardware cooperate to respect these rules?”, “what is a race condition?”, and the ageless one-hand-clapping question “how is a race condition like a debugger?”
  • The tools: The deep interrelationships and fundamental tradeoffs among mutexes, atomics, and fences/barriers. I’ll try to convince you why standalone memory barriers are bad, and why barriers should always be associated with a specific load or store.
  • The unspeakables: I’ll grudgingly and reluctantly talk about the Thing I Said I’d Never Teach That Programmers Should Never Need To Know: relaxed atomics. Don’t use them! If you can avoid it. But here’s what you need to know, even though it would be nice if you didn’t need to know it. (A minimal example follows this list.)
  • The rapidly-changing hardware reality: How locks and atomics map to hardware instructions on ARM and x86/x64 – and we’ll throw in POWER and Itanium for good measure. I’ll cover how and why the answers were different last year than they are this year, and how they will likely be different again a few years from now. We’ll cover how the latest CPU and GPU hardware memory models are rapidly evolving, and how this directly affects C++ programmers.
  • Coda: Volatile and “compiler-only” memory barriers. It’s important to understand exactly what atomic and volatile are and aren’t for. I’ll show both why they’re both utterly unrelated (they have exactly zero overlapping uses, really) and yet are fundamentally related when viewed from the perspective of talking about the memory model. Also, people keep seeing and asking about “compiler-only” memory barriers and when to use them – they do have a valid-though-rare use, but it’s not the use that most people are trying to use them for, so beware!
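
As a taste of the “unspeakables” bullet above, here is about the only relaxed-atomics pattern commonly considered defensible – a raw event counter that publishes no other data through its value. A sketch, not an endorsement:

```cpp
#include <atomic>

std::atomic<unsigned long> hits{0};

void on_event() {
    // No ordering is implied; we only need the increment itself to be atomic.
    hits.fetch_add(1, std::memory_order_relaxed);
}

unsigned long report() {
    // Safe only because nothing else is "published" through this value.
    return hits.load(std::memory_order_relaxed);
}
```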

For me, this is going to be the deepest and most fun C&B yet. At previous C&Bs I’ve spoken about not only code, but also meta topics like design and C++’s role in the marketplace. This time it looks like all my talks will be back to Just Code. Fun times!

Here’s a snapshot of the list of C&B 2012 sessions so far:

Universal References in C++11 (Scott)
You Don’t Know [keyword] and [keyword] (Herb)
Convincing Your Colleagues (Panel)
Initial Thoughts on Effective C++11 (Scott)
Modern C++ = Clean, Safe, and Faster Than Ever (Panel)
Error Resilience in C++11 (Andrei)
C++ Concurrency – 2012 State of the Art (and Standard) (Herb)
C++ Parallelism – 2012 State of the Art (and Standard) (Herb)
Secrets of the C++11 Threading API (Scott)
atomic<> Weapons: The C++11 Memory Model and Modern Hardware (Herb)

It’ll be a blast. I hope to see many of you there. Register soon.

Read Full Post »

While visiting Facebook earlier this month, I gave a shorter version of my “Welcome to the Jungle” talk, based on the eponymous WttJ article. They made a nice recording and it’s now available online here:

Facebook Engineering

Title: Herb Sutter: Welcome to the Jungle

In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend—mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done. — The free lunch is over. Now welcome to the hardware jungle.

The slides are available here. (There doesn’t seem to be a link to the slides on the page itself as I write this.)

For those interested in a longer version, in April I gave a 105-minute + Q&A version of this talk in Kansas City at Perceptive; it’s also available online, as I posted before.

A word about “cluster in a box”

I should have remembered that describing a PC as a “heterogeneous cluster in a box” is a big red button for people, in particular because “cluster” implies “parts can fail and program should continue.” So in the Q&A, one commenter made the point that I should have mentioned reliability is an issue.

As I answered there, I half agree – it’s true but it’s only half the story, and it doesn’t affect the programming model (see more below). One of the slides I omitted to shorten this version of the talk highlighted that there are actually two issues when you go from “Disjoint (tightly coupled)” to “Disjoint (loosely coupled)”: reliability and latency, and both are important. (I also mentioned this in the original WttJ article this is based on; just search for “reliability.”)

Even after the talk, I still got strong resistance along the lines that, ‘no, you obviously don’t get it, latency isn’t a significant issue at all, reliability is the central issue and it kills your argument because it makes the model fundamentally different.’ Paraphrasing subsequent email:

‘A fundamental difference between distributed computing and single-box multiprocessing is that in the former case you don’t know whether a failure was a communication failure (i.e. the task was completed but communication failed) or a genuine failure to carry out the task. (Hence all the complicated two-phase commit protocols, etc.) In contrast, in a single-box scenario you can know the box you’re on is working.’

Let me respond further to this here, because clearly these guys know more about distributed systems than I do and I’m always happy to be educated, but I also think we have a disconnect on three things asserted above: It is not my understanding that reliability is more important than latency, or that apps have to distinguish comms failures from app exceptions, or that N-phase commit enters the picture.

First, I don’t agree with the assertion that reliability alone is what’s important, or that it’s more important than latency, for the following reason:

  • You can build reliable transports on top of unreliable ones. You do it through techniques like sequencing, redundancy, and retry. A classic example is TCP, which delivers reliable communications over notoriously- and deliberately-unreliable IP, which can drop and reorder packets as network nodes and communications paths keep madly appearing and disappearing like a herd of crazed Cheshire cats. We can and do build secure, reliable global banking systems on that.
  • Once you do that, you have turned a reliability issue into a performance (specifically latency) issue. Both reliability and latency are key issues when moving to loosely coupled systems, but because you can turn the first into the second, latency is the more fundamental and important one – and the only one the developer needs to deal with. (A sketch of this move follows this list.)
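
Here’s a minimal sketch of that “turn reliability into latency” move, using a hypothetical try_send operation: retries absorb the failures, and what the caller observes is added delay.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical: try_send() returns false on a (possibly transient) failure.
bool send_reliably(const std::function<bool()>& try_send,
                   int max_attempts = 5) {
    auto delay = std::chrono::milliseconds(10);
    for (int i = 0; i < max_attempts; ++i) {
        if (try_send()) return true;         // failure absorbed by retry...
        std::this_thread::sleep_for(delay);  // ...and surfaced as latency
        delay *= 2;                          // exponential backoff
    }
    return false;  // bounded attempts: only persistent failure escapes
}
```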

For example, to use compute clouds like Azure and AWS, you usually start with two basic pieces:

  • the queue(s), which you use to push the work items out/around and results back/around; and
  • an elastic set of compute nodes, each of which pulls work items from the queue and processes them.

What happens when you encounter a reliability problem? A node can pull a work item but fail to complete it, for example if the node crashes or the system encounters a partial network outage or other communication problem.

Many modern systems already automatically recover and have another node re-pull the same work item, so that each work item gets done even in the face of partial failures. From the app’s point of view, such failures just manifest as degraded performance (higher latency or time-to-solution). Their main design effect is on the granularity of parallel work items: each item has to be big enough to be worth sending elsewhere, so the minimum useful item size grows in direct proportion to latency, to keep the overheads from dominating. But they do not manifest as app-visible failures.

Yes, the elastic cloud implementation has to deal with things like network failures and retries. But no, this isn’t your problem; it’s not supposed to be your job to implement the elastic cloud. Your job is just to implement each node’s local logic, and to create whatever queues you want and push your work item data into them. (A sketch of such a node loop follows.)
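
In code, the node-local logic is roughly this – a sketch with hypothetical WorkQueue/WorkItem types standing in for a real cloud queue client:

```cpp
#include <string>

// Hypothetical stand-ins for a cloud queue API (Azure, AWS, etc.).
struct WorkItem { std::string id, payload; };
struct WorkQueue {
    bool try_pull(WorkItem& out);       // pull and lease one work item
    void complete(const WorkItem& wi);  // ack so the item is retired
};

void process(const std::string& payload);  // the app's actual logic

// If this node dies mid-item, the lease expires and the queue re-delivers
// the item to another node; the app sees latency, not failure.
void node_main(WorkQueue& q) {
    WorkItem wi;
    for (;;) {
        if (!q.try_pull(wi)) continue;  // nothing available; poll again
        process(wi.payload);            // should be idempotent/transactional
        q.complete(wi);                 // ack only after the work is durable
    }
}
```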

Aside: Of course, as with any retry-based model, you have to make sure that a partly-executed work item doesn’t expose any partial side effects it shouldn’t, and normally you prevent that by doing the work in a transaction and rolling it back on failure, or in the extreme (not generally recommended but sometimes okay) resorting to compensating writes to back out partial work.

That covers everything except the comment about two-phase commit. Citing that struck me as odd, because I haven’t heard much use of that kind of coupled approach in years. Perhaps I’m misinformed, but my impression of 2- or N-phase commit protocols was that they have some serious problems:

  • They are inherently nonscalable.
  • They increase rather than decrease interdependencies in the system – even with heroic efforts like majority voting and such schemes that try to allow for subsets of nodes being unavailable, which always seemed fragile to me.
  • Also, I seem to remember that N-phase commit is a blocking protocol, which if so is inherently anti-concurrency. One of the big realizations in modern mainstream concurrency in the past few years is that Blocking Is Nearly Always Evil. (I’m looking at you, future.get(), and this is why the committee is now considering adding the nonblocking future.then() as well; see the contrast sketched after this list.)
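
For contrast, here’s the blocking style C++11 gives us today next to the continuation style under discussion. future::then is a proposal, not standard C++11 (similar forms exist in libraries such as PPL), so it’s shown commented out:

```cpp
#include <future>

int compute();
void use(int);

void blocking_style() {
    std::future<int> f = std::async(std::launch::async, compute);
    use(f.get());   // this thread stalls here until the result is ready
}

// Proposed continuation style (not in C++11):
// void nonblocking_style() {
//     std::async(std::launch::async, compute)
//         .then([](std::future<int> f) { use(f.get()); });
// }
```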

So my impression is that these were primarily of historical interest – if they are still current in modern datacenters, I would appreciate learning more about it and seeing if I’m overly jaded about N-phase commit.

Read Full Post »

It’s time for, not one, but two brand-new, up-to-date talks on the state of the art of concurrency and parallelism in C++. I’m going to put them together especially and only for C++ and Beyond 2012, and I’ll be giving them nowhere else this year:

  • C++ Concurrency – 2012 State of the Art (and Standard)
  • C++ Parallelism – 2012 State of the Art (and Standard)

And there’s a lot to tell. 2012 has already been a busy year for pushing the boundaries of both “shipping-and-practical” and “proto-standard” concurrency and parallelism in C++:

  • In February, the spring ISO C++ standards meeting saw record attendance at 73 experts (normal is 50-55), and spent the full week primarily on new language and library proposals, with notable emphasis on the area of concurrency and parallelism. There was so much interest that I formed four Study Groups and appointed chairs: the largest on concurrency and parallelism (SG1, Hans Boehm), and three others on modules (SG2, Doug Gregor), filesystem (SG3, Beman Dawes), and networking (SG4, Kyle Kloepper).
  • Three weeks ago, we hosted another three-day face-to-face meeting for SG1 and SG4 – and at nearly 40 people the SG1 attendance rivaled that of a normal full ISO C++ meeting, with a who’s-who of the world’s concurrency and parallelism experts in attendance and further proposal presentations from companies like IBM, Intel, and Microsoft. There was so much interest that I had to form a new Study Group 5 for Transactional Memory (SG5), and appointed Michael Wong of IBM as chair.
  • Over the summer, we’ll all be working on updated proposals for the October ISO C++ meeting in Portland.

Things are heating up, and we’re narrowing down which areas to focus on.

I’ve spoken and written on these topics before. Here’s what’s different about these talks:

  • Brand new: This material goes beyond what I’ve written and taught about before in my Effective Concurrency articles and courses.
  • Cutting-edge current: It covers the state-of-the-art best practices and shipping tools, what parts of that are standardized in C++11 already (the answer to that one may surprise you!), and what’s en route to near-term standardization and why, with coverage of the latest discussions.
  • Mainstream hardware – many kinds of parallelism: What’s the relationship among multi-core CPUs, hardware threads, SIMD vector units (Intel SSE and AVX, ARM Neon), and GPGPU (general-purpose computation on GPUs, which I covered at C++ and Beyond 2011)? Which are most interesting, what technologies are available now, and what’s being considered for near-term standardization?
  • Blocking vs. non-blocking: What’s the difference between blocking and non-blocking styles, why on earth would you care, which kinds does C++11 support, and how are we looking at rounding it out in C++1y?
  • Task and data parallelism: What’s the difference between task parallelism and data parallelism, which kind of hardware does each allow you to exploit, and why? (See the sketch after this list.)
  • Work stealing: What’s the difference between thread pools and work stealing, what are the major flavors of work stealing, which of these (if any) does C++11 already support and is already shipping on some advanced commercial C++ compilers today (this answer will likely surprise you), and what needs to be done in the next round for a complete state-of-the-art parallelism story in C++1y?

The answers all matter to you – even the ones not yet in the C++ standard – because they are real, available in shipping products, and affect how you design your software today.

This will be a broad and deep dive. At C++ and Beyond 2011, the attendees (audience!) included some of the world’s leading experts on parallelism and compilers. At these sessions of C&B 2012, I expect anyone who wasn’t personally at the SG1 meeting this month, even world-class experts, will learn something new in these talks. I certainly did, and that’s why I’m motivated to turn the information into talks and share. This isn’t just cool stuff – it’s important and useful in production code today.

I hope to see many of you at C&B 2012. I’m excited about these topics, and about Scott’s and Andrei’s new material – you just can’t get this stuff anywhere else.

Asheville is going to be a blast. I can’t wait.


P.S.: I haven’t seen this much attention and investment in C++ since last century – C++ conferences at record numbers, C++ compiler investments by the biggest companies in the industry (e.g., Clang), and much more that we’ve seen already…

… and a little bird tells me there’s a lot more major C++ news coming this year. Stay tuned, and fasten your seat belts. 2012 ain’t done yet, not by a long shot, and I’ll be able to say more about C++ as a whole (besides the specific topics mentioned above) for the first time at C&B in August. I hope to see you there.

FYI, C&B is already over 60% full, and early bird registration ends this Friday, June 1 – so register today.

Read Full Post »

In answering a reader question about Flash today, I linked to Adobe’s November press release and I commented:

Granted, Adobe says it’s abandoning Flash ‘only for new mobile device browsers while still supporting it for PC browsers.’ This is still a painful statement because [in part] … the distinction between mobile devices and PCs is quickly disappearing as of this year as PCs are becoming fully mobilized.

But what’s a “mobile device” vs. a “PC” as of 2012? Here’s a current data point, at least for me.

For almost two weeks now, my current primary machine has been a Slate 7 running Windows 8 Consumer Preview, and I’m extremely pleased with it. It’s a full Windows notebook (sans keyboard), and a full modern tablet. How do I slot it between “mobile device” and “PC,” exactly? Oh, and the desktop browser still supports Flash, but the tablet style browser doesn’t…

Since I’ve been using it (and am using it to write this post), let me write a mini-review.

I loved my iPad, and still do, and so I was surprised how quickly I came to love this snappy device even more. Here are a few thoughts, in rough order from least to most important:

  • It has a few nice touches that I miss on iOS, like task switching by simple swipe-from-left (much easier than double-clicking the home button and swiping, and my iPhone home button is starting to get unreliable with all the double-clicking [ETA: and I never got used to four-finger swiping, probably in part because it isn't useful on the iPhone]), having a second app open as a sidebar (which greatly relieves the aforementioned back-and-forth task-switching I find myself doing on iOS to refer to two apps), and some little things like including left- and right-cursor keys on the on-screen keyboard (compared to iOS’s touch-and-hold to position the cursor by finger using the magnification loupe). In general, the on-screen keyboard is not only unspeakably better than Win7’s attempt, but even slightly nicer than iPad’s, as I find myself not having to switch keyboards as much to get at common punctuation symbols.
  • I was happily surprised to find that some of my key web-related apps like Live Writer came already installed.
  • The App Store, which isn’t even live yet, already had many of my major apps including Kindle, USA Today, and Cut the Rope. Most seem very reliable; a few marked “App Preview” are definitely beta quality at the moment, though. The Kindle app is solid and has everything I expected, except for one complaint: It should really go to a two-column layout in landscape mode like it does on iPad, especially given the wider screen. Still, the non-“preview” apps do work, and the experience and content are surprisingly nice for a not-officially-open App Store.
  • Real pen+ink support. This is a Big Deal, as I said two years ago. Yes, I’ve tried several iPad pens and apps for sort-of-writing notes, and no, iOS has nothing comparable here; the best I can say for the very best of them is that they’re like using crayons. Be sure to try real “ink” before claiming otherwise – if you haven’t, you don’t know what you’re missing. iPad does have other good non-pen annotation apps, and I’ve enjoyed using iAnnotate PDF extensively to read and annotate almost half of Andrei’s D book. But for reading articles and papers I just really, really miss pen+ink.
  • All my software just works, from compilers and editors to desktop apps for full Office and other work.
  • Therefore, finally, I get my desktop environment and my modern tablet environment without carrying two devices. My entire environment, from apps to files, is always there without syncing between notebook and tablet devices, and I can finally eliminate a device. I expected I would do that this year, but I’m pleasantly surprised to be able to do it for real already this early in the year with a beta OS and beta app store.

I didn’t expect to switch over to it this quickly, but within a few days of getting it I had easily switched to reading my current book-in-progress on this device while traveling (thanks to the Kindle app), reading and pen-annotating a couple of research papers on lock-free coding techniques (it’s by far my favorite OneNote device ever, thanks to having both great touch and great pen+ink at light weight, so I can just write), and using it both as a notebook and as a tablet without having to switch devices – just docking when I’m at my desk to use the usual large monitors and my favorite keyboard+mouse, or holding it and using touch+pen only. It already feels like a dream and very familiar both ways. I’m pretty sure I’ll never go back to a traditional clamshell notebook, ever.

Interestingly, as a side benefit, even the desktop apps are often quite usable – sometimes more usable than before – in pure tablet+touch mode, despite the apparently-small targets. Those small targets do sometimes matter, and I occasionally reach for my pen when using those apps on my lap. But I’ve found in practice they often don’t matter at all once you can swipe to scroll a large region – I was surprised to find myself happily using Outlook in touch-only mode.

By the end of this week when I install a couple of more apps, including the rest of my test C++ compilers, it will have fully replaced my previous notebook and my previous tablet, with roughly equal price and power as the former alone (4GB RAM, 128GB SSD + Micro SD slot, Intel Core i5-2467M) and roughly equal weight and touch friendliness as the latter alone (1.98lb vs. 1.44lb). Dear Windows team, my back thanks you.

So, then, returning to the point – in our very near future, how much sense does it really make to distinguish between browsers for “mobile devices” and “PCs,” anyway? Convergence is already upon us and is only accelerating.

Read Full Post »

Last month in Kansas City I gave a talk on “Welcome to the Jungle,” based on my recent essay of the same name (sequel to “The Free Lunch Is Over”) concerning the turn to mainstream heterogeneous distributed computing and the end of Moore’s Law.

Perceptive Software has now made the talk available online [ETA: the talk itself starts six minutes in]:

Welcome to the Jungle

In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done. – The free lunch is over. Now welcome to the hardware jungle.

I hope you enjoy it.

Warning: It’s two hours (with Q&A) because of the broad and deep material. There’s a nice pause point between major sections at the one-hour mark that makes it convenient to split it into two one-hour lunchtime brownbag viewings.

Read Full Post »

Don’t trust hardware or software; then you can build trustworthy hardware and software.

James Hamilton on how to write reliable software in a world where anything that can fail, will fail.

Read Full Post »

With so much happening in the computing world, now seemed like the right time to write “Welcome to the Jungle” – a sequel to my earlier “The Free Lunch Is Over” essay. Here’s the introduction:


Welcome to the Jungle

In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done.

The free lunch is over. Now welcome to the hardware jungle.


From 1975 to 2005, our industry accomplished a phenomenal mission: In 30 years, we put a personal computer on every desk, in every home, and in every pocket.

In 2005, however, mainstream computing hit a wall. In “The Free Lunch Is Over” (December 2004), I described the reasons for the then-upcoming industry transition from single-core to multi-core CPUs in mainstream machines, why it would require changes throughout the software stack from operating systems to languages to tools, and why it would permanently affect the way we as software developers have to write our code if we want our applications to continue exploiting Moore’s transistor dividend.

In 2005, our industry undertook a new mission: to put a personal parallel supercomputer on every desk, in every home, and in every pocket. 2011 was special: it’s the year that we completed the transition to parallel computing in all mainstream form factors, with the arrival of multicore tablets (e.g., iPad 2, Playbook, Kindle Fire, Nook Tablet) and smartphones (e.g., Galaxy S II, Droid X2, iPhone 4S). 2012 will see us continue to build out multicore with mainstream quad- and eight-core tablets (as Windows 8 brings a modern tablet experience to x86 as well as ARM), and the last single-core gaming console holdout will go multicore (as Nintendo’s Wii U replaces Wii).

This time it took us just six years to deliver mainstream parallel computing in all popular form factors. And we know the transition to multicore is permanent, because multicore delivers compute performance that single-core cannot and there will always be mainstream applications that run better on a multi-core machine. There’s no going back.

For the first time in the history of computing, mainstream hardware is no longer a single-processor von Neumann machine, and never will be again.

That was the first act.  . . .


I hope you enjoy it.

Read Full Post »

In my keynote on Wednesday, I highlighted just the top two important features in the C++ AMP programming model. That afternoon, my coding colleague and demo demigod Daniel Moth gave a 45-minute session covering the entire C++ AMP programming model that walked through all the features with more examples. Daniel’s talk is now also online at Channel 9. I hope you enjoy it.

Note: The PDF slides link is small but important — the screen isn’t easy to see in the video itself.

Read Full Post »

Yesterday I had the privilege of talking about some of the work we’ve been doing to support massive parallelism on GPUs in the next version of Visual C++. The video of my talk announcing C++ AMP is now available on Channel 9. (Update: Here’s an alternate link; it seems to be posted twice.)

The first 20 minutes has nothing to do with C++ in particular or any platform in particular, but tries to make the case that the right way to view the “trends” of multicore computing, GPU computing, and cloud computing (HaaS) is that they are not three trends at all, but merely facets of the same single trend — heterogeneous parallel computing.

If they are, then one programming model should be able to address them all. We think we’ve found one.

The main reasons we decided to build a new model is that we believe there needs to be a single model that has all of the following attributes:

  • C++, not C: It should leverage C++’s power for strong abstraction without sacrificing performance, not just be a dialect of C.
  • Mainstream: It should be programmable by millions of developers, not just by a priesthood. Litmus test: Is the Hello World parallel GPU program a page and a half, or a couple of lines? (See the sketch after this list.)
  • Minimal: It adds just one general-purpose language extension that addresses not only the immediate problem (dealing with cores that can’t support full C++) but many others. With the right general-purpose extension, the rest can be done as just a library.
  • Portable: It allows shipping a single EXE that can use any combination of GPU vendors’ hardware. The initial implementation uses DirectCompute and supports all devices that are DX11 capable; DirectCompute is just an implementation detail of the first release, and the model can (and I expect will) be implemented to directly talk to any interesting hardware.
  • General and future-proof: The initial release will focus on GPU computing, but it’s intended to enable people to write code for the GPU in a way that in the future we can recompile with few or no changes to spread across any and all accessible compute cores, including ones in the cloud.
  • Open: I mentioned that Microsoft intends to make the C++ AMP specification open, and encourages its implementation on other C++ compilers for any hardware or OS target. AMD announced that they will implement C++ AMP in their FSA reference compiler. NVidia also announced support.

We’re really excited about this, and I hope you find the information in the talk to be useful. A prerelease implementation in Visual C++ that runs on Windows will be available later this year. More to come…

Read Full Post »
