Reader Q&A: How to write a CAS loop using std::atomics

The following is not intended to be a complete treatise on atomics, but just an answer to a specific question.

A colleague asked:

How should one write the following “conditional interlocked” function in the new C++ atomic<> style?

// if (*plValue >= 0) *plValue += lAdd  ; return the original value

LONG MpInterlockedAddNonNegative(__inout LONG volatile* plValue,  __in  LONG const  lAdd) 
    LONG lValue = 0; 
    for (;;)  {

        lValue = *plValue; // volatile plValue suppress compile optimizations in which


                           // lValue is optimized out hence MT correctness is broken

        if (lValue < 0)   break;

        if (lValue == InterlockedCompareExchange(plValue, lValue + lAdd, lValue)) { 

    return lValue; 

Note: ISO C/C++ volatile is not for inter-thread communication,[*] but this is legacy code that predates std::atomics and was using a combination of platform-specific volatile semantics and Windows InterlockedXxx APIs.

The answer is to use a CAS loop (see code at top), which for std::atomics is spelled compare_exchange:

  • Use compare_exchange_weak by default when looping on this which generally naturally tolerates spurious failures.
  • Use compare_exchange_strong for single tests when you generally don’t want spurious failures.
  • Usage note: In the code at top we save an explicit reload from ‘a’ in the loop because compare_exchange helpfully (or “helpfully” – this took me a while to discover and remember) stores the actual value in the ‘expected’ value slot on failure. This actually makes loops simpler, though some of us are still have different feelings on different days about whether this subtlety was a good idea… anyway, it’s in the standard.

For the std::atomic version, roughly (compiling in my head), and generalizing to any numeric type just because I’m in the habit, and renaming for symmetry with atomic<T>::fetch_add(), I think this is what you want:

template<typename T>
T fetch_add_if_nonnegative( std::atomic<T>& a,  T val ) {
    T old = a;
    while( old >= 0 && !a.compare_exchange_weak( old, old+val ) )
        { }
    return old;

Because the only test in your loop was to break on negative values, it naturally migrated into the loop condition. If you want to do more work, then follow the general pattern which is the following (pasting from the standard, 29.6.5/23 – and note that the explicit “.load()” is unnecessary but some people including the author of this clause of the standard prefer to be pedantically explicit :) ):

[ Example: the expected use of the compare-and-exchange operations is as follows.

The compare-and-exchange operations will update expected when another iteration of the loop is needed.

expected = current.load();

do {

desired = function(expected);

} while (!current.compare_exchange_weak(expected, desired));

—end example ]

So the direct implementation of your function in the general pattern would be:

T old = a; 
do { 
    if( old < 0 ) break; 
} while(!a.compare_exchange_weak( old, old+val ) )

but since that easily moves into the loop test I just did this instead in the code at top:

T old = a; 
while( old >= 0 && !a.compare_exchange_weak( old, old+val ) ) 
    { }

and hoping that no one will discover and point out that I’ve somehow written a subtle bug by trying to make the code cuter just before leaving for a holiday weekend.


[*] Here’s the difference between ISO C/C++ volatile vs. std::atomic<T>/atomic_T: ISO C/C++ volatile is intended to be used only for things like hardware access and setjmp/longjmp safety, to express that the variable is in storage that is not guaranteed to follow the C++11 memory model (e.g., the compiler can’t make any assumptions about it). It has nothing to do with inter-thread communication – the proper tool for that is std::atomic<T> which for C compatibility can also be spelled atomic_T (note that in Java and C# this is called volatile which adds to the confusion). For more, see my article “volatile vs. volatile” and Hans Boehm’s ISO C++ paper “Should volatile Acquire Atomicity and Thread Visibility Semantics?”.

13 thoughts on “Reader Q&A: How to write a CAS loop using std::atomics

  1. Herb, feel free to ignore my previous comment. Found the place (that I missed the first time around, apologies) where you clearly state that for unique_ptr “You still need to write the visible class’ destructor yourself”… and, I believe, it is equally applicable for copy constructor and op=(). So, with that “hassle” I personally find unique_ptr usefullness quite questionale in the context of pimpl. But that’s quite another matter. All the best,

  2. Herb, ‘ve come across your updated Pimpl-related GotW#100. There it says that “the most appropriate choice is to hold the Pimpl object by unique_ptr”. Very tempting indeed. However, I am under strong suspicion it won’t work. Unlike shared_ptr, unique_ptr does not implement the incomplete-type management technique (for efficiency reasons) and, consequently, requires the internal implementation class to be a complete/visible/known type, i.e. it won’t compile with only a forward declaration… that IMO beats the whole purpose of Pimpl, i.e. implementation hiding. I’d love to be wrong but gcc-4.8 only confirmed my expectations.


  3. Am I right in that the behavior of the ‘expected’ argument to compare_exchange is analogous to the behavior of the GCC intrinsic __sync_val_compare_and_swap()? That intrinsic returns the value it found at the memory location, instead of a boolean, and therefore makes an explicit reload unnecessary, IIUC.

  4. Would it be even better to do the initial load using a.load(std::memory_order_relaxed) ? That would probably be most equivalent to the original implementation, wouldn’t it? It would also save you from the unnecessary synchronization on the load, and just rely on the CAS to keeps things atomic.

  5. Herb, where’s the book ;-)

    With the amount of work in my plate I rather read this emerging good advice and good practices in book form.


  6. @Amir: It must have taken time to refresh the front page — it looks okay to me now.

    @Pizer: The operation is a RMW (read-modify-write). So you have to do an initial load (which is atomic in itself), then the work (in this case a test since we need do nothing if the value is less than 0), then the update write (which is also atomic) — and to make it all atomic, the update write has to test that the value hasn’t changed, and if it has then do nothing (we’re not atomic) and retry (until the whole transaction is atomic, not interleaved with something else). Note that, as the commentary says, the compare_exchange will again reload the current value on failure so you don’t have to write an explicit equivalent “old = a;” inside the loop — this is kind of subtle, but it’s the way it was specified because it gives a slight performance advantage.

  7. In the version that appears on the front page, you still have the “if( old >= 0 ) break” condition.
    Sigh…I thought I was be the first one who noticed the bug :-)

  8. I don’t understand why you put the old=a assignment out of the loop. Could you shed some more light on this, please? To me, it seems that it breaks the “atomicity” of the operation.

  9. I hate to disappoint you on your holiday, but I think you have reversed the ‘old >= 0’ test between the second to last and the last version.

Comments are closed.