Reader Q&A: Is std::atomic_compare_exchange_* implementable?

Updated 8/26: Duncan’s question is actually correct and compare_exchange should have the semantics he asks for. However, the answer to ‘is it implementable’ is I think still Yes.

Quick answer: Yes.

I see there was also a thread about this on StackOverflow, so I’ll echo this Q&A publicly for others’ benefit and hopefully to dispel confusion.

Duncan Forster asked:

I’m quite alarmed the C++ committee chose such a bad interface for std::atomic compare_exchange, i.e.:

    bool compare_exchange_???(T& expected, T desired, …);

I notice you have mentioned (here reader-qa-how-to-write-a-cas-loop-using-stdatomics) that the committee had doubts whether it was a good idea.
Your quote below:

  • Usage note: In the code at top we save an explicit reload from ‘a’ in the loop because compare_exchange helpfully (or “helpfully” – this took me a while to discover and remember) stores the actual value in the ‘expected’ value slot on failure. This actually makes loops simpler, though some of us are still have different feelings on different days about whether this subtlety was a good idea… anyway, it’s in the standard.

The reason I think it’s not only bad but also dangerous is that we now have a race condition baked into the standard. Race condition you say? All hardware CAS implementations that I know of only return 1 value (the old value). Yet the C++ version has 2 returns (success/failure as a boolean return and the old value by reference). So how can an atomic class which is suppose to implement atomic methods do this? Answer is it can’t, the boolean result is calculated after the atomic exchange has occurred. That leaves us with a method which is only partially atomic and with the bonus of a built-in race condition!

Perhaps I haven’t convinced you, so here’s some code to help. I have simulated the hardware CAS with simple C++ code to help demonstrate the problem. The crux of the problem is this statement: while(!atomic_bool_compare_and_swap(&head, new_node->next, new_node))
By creating a 1-line while loop and passing new_node->next as the expected value, if someone is also consuming data the new_node will temporarily be visible by 2 threads. The other thread may process and delete the node before atomic_bool_compare_and_swap has calculated success/failure. This would result in a spurious failure and the new_node actually being pushed twice onto the queue. As you can image this should lead to double delete and possibly the process aborting.

template<typename _T>
bool atomic_bool_compare_and_swap(_T *value, _T& expected, _T new_value)
{
_T old_value;

// Here be atomic
{
old_value = *value;
if(old_value == expected)
*value = new_value;
}

// Here be race conditions
return (old_value == expected);
}

[… more code that exercises this function …]

I don’t believe there is an implementability bug in the standard. Rather, your code is incorrect.

[Update: …off-point stuff omitted…]

Let’s say you have a CAS that returns only the old value, but doesn’t set “expected,” as you describe below. Then you should just be able to implement the standard one in terms of that – quick sketch (untested code):

    template<typename _T>
    bool atomic_compare_exchange(_T *value, _T& expected, _T new_value)
    {
        _T old_value;
        _T old_expected = expected;

        // If all you have is a CAS that returns the old value, use that:
        old_value = CAS(value, expected, new_value);

        bool result = old_value == old_expected;
        expected = old_value;
        return result;
    }

Now that there’s no use of “expected” after the CAS and so no timing window.

If I’m misunderstanding the question, or have a bug in my thinking, please let me know in the comments. [Update: Thanks to Duncan in particular for pointing out my original answer did have a bug in my thinking.]

23 thoughts on “Reader Q&A: Is std::atomic_compare_exchange_* implementable?

  1. this is all cool and interesting, but I wish people would pay attention a bit to the boring stuff also, like serialization(preferably to make class serializable automatically in a sense that it can be serialized automatically if all its members can be serialized automatically… and if you have serialization for all built in ntypes then you have default serialization for any type :D ), for each for enums, to_string for enums…

    atomics are for small perchentage of the people, other just need to know here be danger except in simple cases like counter, bool flag…

  2. I believe the issue is the while loop:
    while(!atomic_bool_compare_and_swap(&head, new_node->next, new_node)).

    First its been noted that 2 threads could try to add the “new_node”, so I expect this to be an existing node and a reachable next pointer, The problem then is that you should never update the existing pointer unless the cmpexchange suceeded.

    Instead you would need something like this:

    MyPointer temp = new_node->next;
    while(!atomic_bool_compare_and_swap(&head, temp, new_node))
       ;
    new_node->next = temp;
    

    (might use the compare_exchange_*_strong variant instead of a loop).

    In short, the expected variable must not be reachable from anywhere except the calling thread.

    The rest about CAs and compare_echange is covered already, I will just add that the boolean return value is bloody necessary when you implement the compare_exchange with loadlinked/storelinked. Look at a implementation with only the old value returned:

        template<typename _T>
        _T atomic_compare_exchange(_T *value, _T expected, _T new_value)
        {
            _T old_value = loadlnked(value);
            if (old_value == expected && storelinked(value, new_value))
              return new_value;
            else
            {
              return atomic_load(value);
            }
        }

    Lets assume _T=int and the value in question is 5, and wants to change it to 42.

    Thread A:
    1) _T old_value = loadlnked(value);
    2) another Thread interrupts and changes the value (or atleast causes the linked sequence to fail)
    3) storelinked fails
    4) another thread interrupts and changes the value to 42
    5) atomic_load(value) returns 5

    The caller of atomic_compare_exchange gets the expected value as return value, and thus would assume he changed the value, when in thruth the value got set from somewhere else. This might not be a problem for some uses, but for example reference-counting would mean the counter got incremented once instead of twice.

    So a usefull atomic_compare_exchange needs an indication of the operation failed, the write-back of expected is the only thing that could have been left out.

  3. Herb, I think Duncan’s ‘Here be atomic’ comment means that the following block is pseudo-code for the hardware CAS instruction. And I think he’s concerned that the use of the the ‘expected’ variable is not atomic. (Actually, it could be on some architectures.) But you’re right that that isn’t normally required.

    Norbert, your code is backwards. You have to set the ‘next’ pointer of the new node before swapping the list head. So you would do something like:

    MyPointer temp = head;
    do
        new_node->next = temp;
    while(!atomic_compare_exchange(&head, temp, new_node));
    
  4. I think you have misunderstood my example. The example function atomic_bool_compare_and_swap is not real code but used to illustrate how CAS works in hardware. So the block with comment ‘Here be atomic’ above for this example should be considered atomic. This serves the discussion well, except I made a mistake. In fact CMPXCHG does return success/failure in the ZF flag (on x86, thanks Anthony for correcting me). I overlooked this because every time I have used CMPXCHG (or seen it used) the result has always been calculated by another compare. This seems innocuous but is racy.

    I’m not talking about an issue in theory, this has caused serious production issues. So to recap the C++ interface (it turns) out is implementable with CMPXCHG (on x86) and in some cases is rather nice. Wait a second the interface is correct, CAS is correct, this doesn’t compute in regards to my real-world issue. I have a test case, I can prove this doesn’t work. Ah, the penny drops, this is actually a bug in the implementation (in my case GCC 4.8).

    Take this simple example:

    #include
    struct Node { Node* next; };
    void Push(std::atomic head, Node* node)
    {
    node->next = head.load();
    while(!head.compare_exchange_weak(node->next, node))
    ;
    }

    This is completely correct C++ and should work always in any situation (with the assumption nodes are unique and never re-used).

    I like to think of this example as the Push thread creating a private object. The compare_exchange_weak operation is an attempt by the Push thread to relinquish ownership and pass it to another thread. If the operation succeeds the object is no longer owned by the Push thread and it shouldn’t touch the object ever (otherwise you have a race and all bets are off). If the operation fails
    the Push thread still owns the object (still private), thus it may be modified race free.

    As I have discussed with Anthony Williams and also mentioned on SO, no current compiler implements this correctly. The only know correct implementation is the Just::Thread library. I won’t go thru the assembly for all the compilers but I will mention VC++:

    inline int _Compare_exchange_seq_cst_4(volatile _Uint4_t *_Tgt, _Uint4_t *_Exp, _Uint4_t _Value)
    { /* compare and exchange values atomically with
    sequentially consistent memory order */
    int _Res;
    _Uint4_t _Prev = _InterlockedCompareExchange((volatile long
    *)_Tgt, _Value, *_Exp);
    if (_Prev == *_Exp) !!!!!!!!!!!!!!!!!!!!!!!
    _Res = 1;
    else
    { /* copy old value */
    _Res = 0;
    *_Exp = _Prev;
    }
    return (_Res);
    }

    This is not correct, it has a race condition which means it’s reading a value it shouldn’t after the CAS. Since it’s a race it may calculate the wrong return code. If that happens it will also invent a write when in fact the CAS succeeded. This is very bad when you consider object ownership (from above).

  5. Duncan is unfortunately correct: there are bugs in at least gcc 4.8, clang 3.4 and Visual Studio 2013, all of which do an unconditional store to the “expected” parameter, even on successful exchanges, which can cause a race.

    The bugs have now been reported:

    MSVC: https://connect.microsoft.com/VisualStudio/feedback/details/819819/std-atomic-compare-exchange-weak-has-spurious-write-which-can-cause-race-conditions

    GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272

    Clang: http://llvm.org/bugs/show_bug.cgi?id=18899

  6. btw my apology, I thought it was just an another example of somebody not understanding some brainf**k atomic code, but it turns out to be about a serious bug. :D But since Herb said it is not I was like meh, boring…

  7. My reading of the standard is:

    part of 29.6.5/2:
    In the following operation definitions:
    — an A refers to one of the atomic types.
    — a C refers to its corresponding non-atomic type. (…)

    An example function definition from just before 29.6.5/20:

    bool atomic_compare_exchange_weak(A* object,C* expected,C desired) noexcept

    Expected is type C so is NOT atomic.

    If we assume what I’ve said above, the specific issue in your code is that by passing in a non-atomic variable (expected), you guarantee it will unexpectedly modified or invalidated during that call. Then you break that guarantee by deleting or modifying it in another thread. Ben’s code appears to account for this correctly.

    The compiler bugs listed only cover storing to expected in the success case where it would store the same value anyway.

  8. Or rather… you guarantee it will NOT unexpectedly modified/invalidated. Whoops.

  9. After reading both the SO as well as the above post. My initial reaction was that indeed Duncan and Anthony are both correct.

    However on second thoughts, I think Herb is also right. In this case for compare_exchange, I would rather say the atomicity was for the head node not for the node-next. Think of it this way suppose we have the following code which store the new value into the atomic_var and put the original value in the old_value.

    template
    void SuperStore(std::atomic& atmoic_var, const Ty1* new_value, Ty1* old_value);

    Obviously from the function signature, we safely say the the store to the atomic_var would be atomically safe, but to assume the write to old_value is safe would be a big no no. If it was atomic, old_value should have been std::atomic* old_value.

    I think to make the whole issue of compare_exchange better, ‘expected’ should have been an atomic variable.

    Or maybe the standard committee was referring to the atomicity of the calling object of compare_exchange_weak, rather than the atomicity of the compare_exchange_weak function parameters. It is just a guess and maybe I am wrong.

  10. In fact, I think Ducan is stretching the guarantee of atomicity to its function parameter which I don’t think was intended.

  11. I don’t think the issue is that implementations are failing to make the assignment to expected atomic with the compare/exchange. The problem is that the spec guarantees that, if the exchange occurs, then expected is _not_ written to. Thus if the exchange occurs then another thread which was waiting on the exchange can go ahead and access expected, knowing that the compare_exchange will not write to it and there will therefore be no race.

    But since actual implementations write to expected unconditionally real implementations introduce a race.

  12. My opinion on the issue is that the problem is not about race/unconditional write but more of sharing the ‘expected’ variable across thread. I would say there was never a guarantee in atomic write to ‘expected’. Thus I think trying to assume atomicity for ‘expected’ isn’t an appropriate thing. If there was never a guarantee to atomic write for ‘expected’ then we should not attempt to pass a variable to ‘expected’ which spans multiple thread.

    Think of it as a simple function:
    void foo(int* arg);

    In the above case, passing a variable to the function ‘arg’ which is shared across multiple thread is just asking for trouble.

    Whether function foo write arg unconditionally or does anything to it is beside the point. It doesn’t change the problem that passing in a variable used by multiple thread into arg is creating a problem of attempting atomicity/synchronisation on a variable/type not meant to do that.

  13. The write to expected does not need to be atomic. It should only be written to on failure, so if the compare-exchange succeeds, then other threads can touch the value passed as the “expected” parameter, because the compare-exchange will not do so (in a correct implementation).

    This is important when doing things like lock-free data structures, where the “expected” value may often be part of a node being added to the data structure by the compare-exchange. If the compare-exchange fails then the node is still not part of the data structure, so modifying expected is not a problem. If the compare-exchange succeeds then the node is now part of the data structure and may be accessed by other threads.

  14. BTW, I’ve started an email thread with the designers of the feature and Duncan and Anthony.

    Short update: I still think my answer is correct, but we’re checking on the intent of the feature and the standardese probably needs improving in any event.

    Longer update:

    I understood that the design of having compare_exchange_* take the “expected” value by reference, and update “expected” on failure to expose the old value, was only intended for convenience to simplify the calling loops which would otherwise always have to write an extra “reload” line of code.

    Now that “expected” is an lvalue, folks like Duncan are trying to use the success of compare_exchange_* to hand off ownership **of “expected” itself** to another thread. For that to be safe, if the compare_exchange_* succeeds then the thread that performed it must no longer read or write from “expected” else his technique contains a race.

    I don’t think this usage was ever intended to be supported, but I could be wrong (waiting to hear back from the feature’s designer), even if it wasn’t we may want to extend and add support for it now, and in any event I think we need to improve the standardese wording which is apparently unclear today.

  15. Herb,

    Has a defect report been filed on this (or do you know of one in the works)? The actual language of the spec does seem to imply much stronger guarantees than you are suggesting here. (In fact the actual language of the spec appears to guarantee behavior even stronger than Anthony’s library provides.)

  16. You might also want to check the thread on GCC mailing list where Linus is raising hell about atomics wording in C11 (and all the things around).

  17. He was free to raise his concerns during development of C11 and C++11. so I suggest skipping his rants.

  18. Update: Duncan and Anthony are quite correct, and now that I understand the issue I’ve discovered this affects code I’ve already written. So thank you again, Duncan!

    This post: Updated.

    Implementations: Fixed or being fixed. I believe the GCC libstd++, Clang libc++, and VC++ implementations already have fixed this in their current or next releases. (The good news is that I think the problem implementations inject a potential race but only in a very tiny window, so hopefully actual code won’t notice it while we get the fixes out, except that heavy stress does expose windows this small.)

    The standard: Being fixed. This will also be reviewed for the standard in WG21, possibly as soon as our SG1 meeting next week, and I believe everyone agrees we need to tighten the wording in the standard.

    Thanks again.

  19. @Anthony Williams have you also reported the bug to ICC? Didn’t see that being mentioned.

Comments are closed.