VC++ 2012 Desktop Express (Free)

Today Microsoft released another free Express version of Visual C++ 2012. In addition to the free Express Visual C++ compiler for building tablet applications, Visual Studio Express 2012 for Windows Desktop directly supports traditional Windows and command-line applications in C++.

This a great free C++ compiler on Windows for everything from hobby development to using and contributing to open source projects. Besides additional C++11 standards conformance with range-for, override and final on the language side (with more to come in the coming months; watch this space) and a complete C++11 standard library implementation, the free compiler also includes unit testing framework for C++, code analysis for C++ (try /analyze today if you haven’t already, as John Carmack says so well), C++ AMP for GPGPU programming, and much more.

See also the longer announcement here.

Reader Q&A: How to write a CAS loop using std::atomics

The following is not intended to be a complete treatise on atomics, but just an answer to a specific question.

A colleague asked:

How should one write the following “conditional interlocked” function in the new C++ atomic<> style?

// if (*plValue >= 0) *plValue += lAdd  ; return the original value

LONG MpInterlockedAddNonNegative(__inout LONG volatile* plValue,  __in  LONG const  lAdd) 
{ 
    LONG lValue = 0; 
    for (;;)  {

        lValue = *plValue; // volatile plValue suppress compile optimizations in which

 

                           // lValue is optimized out hence MT correctness is broken

        if (lValue < 0)   break;

        if (lValue == InterlockedCompareExchange(plValue, lValue + lAdd, lValue)) { 
            break; 
        } 
    }

    return lValue; 
}

Note: ISO C/C++ volatile is not for inter-thread communication,[*] but this is legacy code that predates std::atomics and was using a combination of platform-specific volatile semantics and Windows InterlockedXxx APIs.

The answer is to use a CAS loop (see code at top), which for std::atomics is spelled compare_exchange:

  • Use compare_exchange_weak by default when looping on this which generally naturally tolerates spurious failures.
  • Use compare_exchange_strong for single tests when you generally don’t want spurious failures.
  • Usage note: In the code at top we save an explicit reload from ‘a’ in the loop because compare_exchange helpfully (or “helpfully” – this took me a while to discover and remember) stores the actual value in the ‘expected’ value slot on failure. This actually makes loops simpler, though some of us are still have different feelings on different days about whether this subtlety was a good idea… anyway, it’s in the standard.

For the std::atomic version, roughly (compiling in my head), and generalizing to any numeric type just because I’m in the habit, and renaming for symmetry with atomic<T>::fetch_add(), I think this is what you want:

template<typename T>
T fetch_add_if_nonnegative( std::atomic<T>& a,  T val ) {
    T old = a;
    while( old >= 0 && !a.compare_exchange_weak( old, old+val ) )
        { }
    return old;
}

Because the only test in your loop was to break on negative values, it naturally migrated into the loop condition. If you want to do more work, then follow the general pattern which is the following (pasting from the standard, 29.6.5/23 – and note that the explicit “.load()” is unnecessary but some people including the author of this clause of the standard prefer to be pedantically explicit :) ):

[ Example: the expected use of the compare-and-exchange operations is as follows.

The compare-and-exchange operations will update expected when another iteration of the loop is needed.

expected = current.load();

do {

desired = function(expected);

} while (!current.compare_exchange_weak(expected, desired));

—end example ]

So the direct implementation of your function in the general pattern would be:

T old = a; 
do { 
    if( old < 0 ) break; 
} while(!a.compare_exchange_weak( old, old+val ) )


but since that easily moves into the loop test I just did this instead in the code at top:

T old = a; 
while( old >= 0 && !a.compare_exchange_weak( old, old+val ) ) 
    { }

and hoping that no one will discover and point out that I’ve somehow written a subtle bug by trying to make the code cuter just before leaving for a holiday weekend.

 

[*] Here’s the difference between ISO C/C++ volatile vs. std::atomic<T>/atomic_T: ISO C/C++ volatile is intended to be used only for things like hardware access and setjmp/longjmp safety, to express that the variable is in storage that is not guaranteed to follow the C++11 memory model (e.g., the compiler can’t make any assumptions about it). It has nothing to do with inter-thread communication – the proper tool for that is std::atomic<T> which for C compatibility can also be spelled atomic_T (note that in Java and C# this is called volatile which adds to the confusion). For more, see my article “volatile vs. volatile” and Hans Boehm’s ISO C++ paper “Should volatile Acquire Atomicity and Thread Visibility Semantics?”.

C&B Panel: Alexandrescu, Meyers, Sutter on Static If, C++11, and Metaprogramming

The first panel from C++ and Beyond 2012 is now available on Channel 9:

On Static If, C++11 in 2012, Modern Libraries, and Metaprogramming

Andrei Alexandrescu, Scott Meyers, Herb Sutter

Channel 9 was invited to this year’s C++ and Beyond to film some sessions (that will appear on C9 over the coming months!)…

At the end of day 2, Andrei, Herb and Scott graciously agreed to spend some time discussing various modern C++ topics and, even better, answering questions from the community. In fact, the questions from Niners (and a conversation on reddit/r/cpp) drove the conversation.

Here’s what happened…

[more]

“Strong” and “weak” hardware memory models

In Welcome to the Jungle, I predicted that “weak” hardware memory models will disappear. This is true, and it’s happening before our eyes:

  • x86 has always been considered a “strong” hardware memory model that supports sequentially consistent atomics efficiently.
  • The other major architecture, ARM, recently announced that they are now adding strong memory ordering in ARMv8 with the new sequentially consistent ldra and strl instructions, as I predicted they would. (Actually, Hans Boehm and I influenced ARM in this direction, so it was an ever-so-slightly disingenuous prediction…)

However, at least two people have been confused by what I meant by “weak” hardware memory models, so let me clarify what “weak” means – it means something different for hardware memory models and software memory models, so perhaps those aren’t the clearest terms to use.

By “weak (hardware) memory model” CPUs I mean specifically ones that do not natively support efficient sequentially consistent (SC) atomics, because on the software side programming languages have converged on “sequential consistency for data-race-free programs” (SC-DRF, roughly aka DRF0 or RCsc) as the default (C11, C++11) or only (Java 5+) supported software memory model for software. POWER and ARMv7 notoriously do not support SC atomics efficiently.

Hardware that supports only hardware memory models weaker than SC-DRF, meaning that they do not support SC-DRF efficiently, are permanently disadvantaged and will either become stronger or atrophy. As I mentioned specifically in the article, the two main current hardware architectures with what I called “weak” memory models were current ARM (ARMv7) and POWER:

  • ARM recently announced ARMv8 which, as I predicted, is upgrading to SC acquire/release by adding new SC acquire/release instructions ldra and strl that are mandatory in both 32-bit and 64-bit mode. In fact, this is something of an industry first — ARMv8 is the first major CPU architecture to support SC acquire/release instructions directly like this. (Note: That’s for CPUs, but the roadmap for ARM GPUs is similar. ARM GPUs currently have a stronger memory model, namely fully SC; ARM has announced their GPU future roadmap has the GPUs fully coherent with the CPUs, and will likely add “SC load acquire” and “SC store release” to GPUs as well.)
  • It remains to be seen whether POWER will adapt similarly, or die out.

Note that I’ve seen some people call x86 “weak”, but x86 has always been the poster child for a strong (hardware) memory model in all of our software memory model discussions for Java, C, and C++ during the 2000s. Therefore perhaps “weak” and “strong” are not useful terms if they mean different things to some people, and I’ve updated the WttJ text to make this clearer.

I will be discussing this in detail in my atomic<> Weapons talk at C&B next week, which I hope to make freely available online in the near future (as I do most of my talks). I’ll post a link on this blog when I can make it available online.

Late-Breaking C&B Session: A Special Announcement

image_thumb

At the end of the Monday afternoon session, I will be making a special announcement related to Standard C++ on all platforms. Be there to hear the details, and to receive an extra perk that’s being reserved for C&B 2012 attendees only.

  • Note: We sometimes record sessions and make them freely available online via Channel 9, and we intend to do that again this year for some selected sessions. However, this session is for C&B attendees only and will not be recorded.

Registration is open until Wednesday and the event is pretty full but a few spaces are still available. I’m looking forward to seeing many of you there for a top-notch C++ conference full of fresh new current material – I’ve seen Andrei’s and Scott’s talk slides too, and I think this C&B is going to be the best one yet.

You’ll leave exhausted, but with a full brain and quite likely a big silly grin as you think about all the ways to use the material right away on your current project back home.

C&B Session: atomic<> Weapons – The C++11 Memory Model and Modern Hardware

imageHere’s another deep session for C&B 2012 on August 5-8 – if you haven’t registered yet, register soon. We got a bigger venue this time, but as I write this the event is currently almost 75% full with five weeks to go.

I know, I’ve already posted three sessions and a panel. But there’s just so much about C++11 to cover, so here’s a fourth brand-new session I’ll do at C&B 2012 that goes deeper on its topic than I’ve ever been willing to go before.

atomic<> Weapons: The C++11 Memory Model and Modern Hardware

This session in one word: Deep.

It’s a session that includes topics I’ve publicly said for years is Stuff You Shouldn’t Need To Know and I Just Won’t Teach, but it’s becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers just are so addicted to full performance that they’ll reach for the big red levers with the flashing warning lights. Since we can’t keep people from pulling the big red levers, we’d better document the A to Z of what the levers actually do, so that people don’t SCRAM unless they really, really, really meant to.

This session covers:

  • The facts: The C++11 memory model and what it requires you to do to make sure your code is correct and stays correct. We’ll include clear answers to several FAQs: “how do the compiler and hardware cooperate to remember how to respect these rules?”, “what is a race condition?”, and the ageless one-hand-clapping question “how is a race condition like a debugger?”
  • The tools: The deep interrelationships and fundamental tradeoffs among mutexes, atomics, and fences/barriers. I’ll try to convince you why standalone memory barriers are bad, and why barriers should always be associated with a specific load or store.
  • The unspeakables: I’ll grudgingly and reluctantly talk about the Thing I Said I’d Never Teach That Programmers Should Never Need To Now: relaxed atomics. Don’t use them! If you can avoid it. But here’s what you need to know, even though it would be nice if you didn’t need to know it.
  • The rapidly-changing hardware reality: How locks and atomics map to hardware instructions on ARM and x86/x64, and throw in POWER and Itanium for good measure – and I’ll cover how and why the answers are actually different last year and this year, and how they will likely be different again a few years from now. We’ll cover how the latest CPU and GPU hardware memory models are rapidly evolving, and how this directly affects C++ programmers.
  • Coda: Volatile and “compiler-only” memory barriers. It’s important to understand exactly what atomic and volatile are and aren’t for. I’ll show both why they’re both utterly unrelated (they have exactly zero overlapping uses, really) and yet are fundamentally related when viewed from the perspective of talking about the memory model. Also, people keep seeing and asking about “compiler-only” memory barriers and when to use them – they do have a valid-though-rare use, but it’s not the use that most people are trying to use them for, so beware!

For me, this is going to be the deepest and most fun C&B yet. At previous C&Bs I’ve spoken about not only code, but also meta topics like design and C++’s role in the marketplace. This time it looks like all my talks will be back to Just Code. Fun times!

Here a snapshot of the list of C&B 2012 sessions so far:

Universal References in C++11 (Scott)
You Don’t Know [keyword] and [keyword] (Herb)
Convincing Your Colleagues (Panel)
Initial Thoughts on Effective C++11 (Scott)
Modern C++ = Clean, Safe, and Faster Than Ever (Panel)
Error Resilience in C++11 (Andrei)
C++ Concurrency – 2012 State of the Art (and Standard) (Herb)
C++ Parallelism – 2012 State of the Art (and Standard) (Herb)
Secrets of the C++11 Threading API (Scott)
atomic<> Weapons: The C++11 Memory Model and Modern Hardware (Herb)

It’ll be a blast. I hope to see many of you there. Register soon.

Reader Q&A: Why don’t modern smart pointers implicitly convert to *?

Today a reader asked a common question:

Why doesn’t unique_ptr (and the ilk) appear to have an operator overload somewhat as follows:

operator T*() { return get(); };

The reason I ask is because we have reams of old code wanting raw pointers (as function parms), and I would like to replace the outer layers of the code which deal with the allocation and deallocation with unique_ptrs without having to either ripple unique_ptrs through the entire system or explicitly call .get() every time the unique_ptr is a parm to a function which wants a raw pointer.

What my programmers are doing is creating a unique_ptr and immediately using get() to put it into a local raw pointer which is used from then on. Somehow that doesn’t feel right, but I don’t know what would be the best alternative.

In the olden days, smart pointers often did provide the convenience of implicit conversion to *. It was by using those smart pointers that we learned it caused more problems than it solves, and that requiring people to write .get() was actually not a big deal.

For an example of the problems of implicit conversions, consider:

unique_ptr p( new widget );
...
use( p + 42 ); // error (maybe he meant "*p + 42"?)
    // but if implicit conversion to * were allowed, would silently compile -- urk
...
delete p; // error
    // but if implicit conversion to * were allowed, would silently compile -- double urk

For more, see also Andrei’s Modern C++ Design section 7.7, “Implicit Conversion to Raw Pointer Types.”

However, this really isn’t as bad as most people fear for several reasons, including but not limited to:

  • The large majority of uses of the smart pointer, such as calling member functions on the object (e.g., p->foo())  just work naturally and effortlessly because we do have operator->.
  • You rarely if ever need to say unique_ptr on a local variable, because C++11’s auto is your friend – and “rarely” becomes “never” if you use make_unique which is described here and should become standard in the future.
  • Parameters (which you mention) themselves should almost never be smart pointers, but should be normal pointers and references. So if you’re managing an object’s lifetime by smart pointer, you do write .get() – but only once at the top of each call tree. More on this in the current GotW #105 – solution coming soon, watch this space.

GotW #105: Smart Pointers, Part 3 (Difficulty: 7/10)

JG Question

1. What are the performance and correctness implications of the following function declaration? Explain.

void f( shared_ptr<widget> );

 

Guru Question

2. A colleague is writing a function f that takes an existing object of type widget as a required input-only parameter, and trying to decide among the following basic ways to take the parameter (omitting const):

void f( widget& );
void f( unique_ptr<widget> );
void f( unique_ptr<widget>& );
void f( shared_ptr<widget> );
void f( shared_ptr<widget>& );

Under what circumstances is each appropriate? Explain your answer, including where const should or should not be added anywhere in the parameter type.

(There are other ways to pass the parameter, but we will consider only the ones shown above.)

Facebook Folly – OSS C++ Libraries

I’ve been beating the drum this year (see the last section of the talk) that the biggest problem facing C++ today is the lack of a large set of de jure and de facto standard libraries. My team at Microsoft just recently announced Casablanca, a cloud-oriented C++ library and that we intend to open source, and we’re making other even bigger efforts that I hope will bear fruit and I’ll be able to share soon. But it can’t be just one company – many companies have already shared great open libraries on the past, but still more of us have to step up.

In that vein, I was extremely happy to see another effort come to fruition today. A few hours ago I was at Facebook to speak at their C++ day, and I got to be in the room when Andrei Alexandrescu dropped his arm to officially launch Folly:

Folly: The Facebook Open Source Library

by Jordan DeLong on Saturday, June 2, 2012 at 2:59pm ·

Facebook is built on open source from top to bottom, and could not exist without it. As engineers here, we use, contribute to, and release a lot of open source software, including pieces of our core infrastructure such as HipHop and Thrift.

But in our C++ services code, one clear bottleneck to releasing more work has been that any open sourced project needed to break dependencies on unreleased internal library code. To help solve that problem, today we open sourced an initial release of Folly, a collection of reusable C++ library artifacts developed and used at Facebook. This announcement was made at our C++ conference at Facebook in Menlo Park, CA.

Our primary aim with this ‘foolishness’ is to create a solution that allows us to continue open sourcing parts of our stack without resorting to reinventing some of our internal wheels. And because Folly’s components typically perform significantly faster than counterparts available elsewhere, are easy to use, and complement existing libraries, we think C++ developers might find parts of this library interesting in their own right.

Andrei announced that Folly stood for the Facebook Open Llibrary. He claimed it was a Welsh name, which is even funnier when said by a charming Romanian man.

Microsoft, and I personally, would like to congratulate Facebook on this release! Getting more high-quality open C++ libraries is a Good Thing, and I think it is safe to say that Casablanca and Folly are just the beginning. There’s a lot more coming  across our industry this year. Stay tuned.