Reader Ernie Cohen emailed me this morning to ask a question about one slide in my atomic<> Weapons talk from last year’s C++ and Beyond:
In your atomic weapons talk (part 1) (updated 2/15/2013), page 18, titled “SC > Acq/Rel Alone: Some examples”, the first example listed “transitivity/causality”:
T0: g = 1; x = 1;
T1: if (x == 1) y = 1;
T2: if (y == 1) assert(g == 1);
I understood you to mean that the assertion might fail if the loads were simple C++11 acquires and the stores were simple C++ releases. But this works just fine with the weaker memory order; the operations in each thread are related by sequenced-before, the communications between the threads create happens-before, and without consumes happens-before is transitive, so there is a happens-before edge from g = 1 to the assertion. Am I missing something?
[Note: g is an ordinary variable, x and y are std::atomic, and all initially zero as usual.]
The motivation behind this example, and the other example on the same slide, was to show that when we specified the C++ memory model and atomics, we had to do more than consider individual acquire-release pairs in isolation; we also had to provide additional guarantees to ensure that the whole program was sequentially consistent (SC).
In the above example, yes, we guarantee that the assertion cannot fail with C++ acquire and release semantics, and making sure the memory model required this transitivity is exactly one of the two key points of this example. As you point out, it requires getting the “right” answer when combining sequenced-before and happens-before.
The second point illustrated here is that it was essential to support cases where the programmer tests whether a particular write was read and then relies on SC reasoning based on the outcome of that test; otherwise the whole program wouldn’t be SC.
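To make this concrete, here is one way the first example might be written out in C++11 with explicit (non-SC) acquire/release orderings. This is only an illustrative sketch: the slide gives just the three statements, and the thread scaffolding around them is mine.

#include <atomic>
#include <cassert>
#include <thread>

int g = 0;                   // ordinary (non-atomic) variable
std::atomic<int> x{0}, y{0}; // atomics, initially zero

int main() {
    std::thread t0([] {                                  // T0
        g = 1;                                           // ordinary store
        x.store(1, std::memory_order_release);
    });
    std::thread t1([] {                                  // T1
        if (x.load(std::memory_order_acquire) == 1)
            y.store(1, std::memory_order_release);
    });
    std::thread t2([] {                                  // T2
        if (y.load(std::memory_order_acquire) == 1)
            assert(g == 1);  // cannot fire: happens-before is transitive
    });
    t0.join(); t1.join(); t2.join();
}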
For completeness, the other example on the slide showed an additional case where individual pairwise acquire/release alone was insufficient to guarantee SC outcomes unless we added requirements. Here is the example, with x and y std::atomic and initially zero:
T1: x = 1;
T2: y = 1;
T3: if( x == 1 && y == 0 ) print( "x first" );
T4: if( y == 1 && x == 0 ) print( "y first" );
This illustrates the total store order requirement: It must be impossible to print both messages, else the result wouldn’t be SC.
Note that in most cases using (non-SC) memory_order_acquire and memory_order_release explicitly happens to give you SC results, except when they don’t (e.g., Dekker’s fails, and I think the second example above fails as well). And of course other relaxed atomics can allow non-SC results at the drop of a hat.
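For reference, here is a sketch of that second example using plain std::atomic operations, which default to memory_order_seq_cst; with the defaults there is a single total order over these operations, so at most one of the two messages can print. Again, the thread scaffolding is mine, not from the slide.

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    std::thread t1([] { x = 1; });                        // T1: seq_cst store
    std::thread t2([] { y = 1; });                        // T2: seq_cst store
    std::thread t3([] {                                   // T3: seq_cst loads
        if (x == 1 && y == 0) std::printf("x first\n");
    });
    std::thread t4([] {                                   // T4: seq_cst loads
        if (y == 1 && x == 0) std::printf("y first\n");
    });
    t1.join(); t2.join(); t3.join(); t4.join();
}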
I think the important case is when b and c are global variables, possibly side-by-side in memory.
Herb says “it’s fine because all modern processors have single-byte reads so no need even to inject alignment/padding”
Bjarne says (in the FAQ) “However, most modern processors cannot read or write a single character, it must read or write a whole word”
I think the answer @Fernando is looking for is that, with concurrency now in the standard, it is up to the compiler to make sure it works. Typically that means either the processor can do single-byte reads _or_ the chars are padded/aligned. Or whatever else the compiler thinks is best (but almost definitely not locks, I hope!).
I wonder what a compiler would do on a machine that doesn’t have single-byte reads if you set alignment/packing to 1…
@Fernando: C++98 didn’t cover this case because there was no notion of threads or other concurrency in the standard. Implementations generally did the right thing. Then when the standard introduced concurrency it also had to specify a memory model for concurrency.
Herb, thanks for your answer.
Does this mean that in C++98 we are also protected against such cases?
I think the code is correct because of ‘disjoint stacks’ even in pre-C++11 (leaving out ‘escape analysis and constant propagation’).
Why is it mentioned as something fixed by the C++11 memory model?
I remember an Effective Concurrency article (I don’t remember exactly which) that deals with the same subject; is that possible?
@Fernando: That code is fine and needs no special code generation for many reasons (disjoint stacks so those locals won’t be adjacent; escape analysis and constant propagation would eliminate c and b outright; even if c and b were static and laid out adjacent in memory it’s fine because all modern processors have single-byte reads so no need even to inject alignment/padding, etc.). Ah, and now I see Bjarne already answered this right in that FAQ. :)
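For illustration only, here is a minimal sketch of the “adjacent in memory” case; the struct and thread scaffolding are my own, not from the FAQ. Under the C++11 memory model, b and c below are distinct memory locations even though they are adjacent bytes, so there is no data race and the implementation must make this work without, say, rewriting the neighboring byte as part of a wider read-modify-write.

#include <thread>

struct S { char b; char c; } s;  // adjacent bytes, but distinct memory locations

int main() {
    std::thread t1([] { s.b = 1; });  // writes only s.b
    std::thread t2([] { s.c = 1; });  // writes only s.c
    t1.join(); t2.join();
    // No race: each thread touches a different memory location, so after the
    // joins both writes must be visible (s.b == 1 && s.c == 1).
}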
Hi Herb,
I’ve seen the atomic Weapons videos (parts 1 and 2) and found them great, thanks!
I have one question…
In all the examples where you explain the memory model, you’re using atomic declarations, right?
My question is:
How does C++11 protect us from a case like the following? (Extracted from Bjarne’s FAQ – http://www.stroustrup.com/C++11FAQ.html#memory-model )
// thread 1:
char c;
c = 1;
int x = c;
// thread 2:
char b;
b = 1;
int y = b;
…
“So, C++11 guarantees that no such problems occur for “separate memory locations.” More precisely: A memory location cannot be safely accessed by two threads without some form of locking unless they are both read accesses.”
…
Does this mean that a C++11-compliant compiler must insert “lock” instructions (barriers, fences, acquire/release, etc…) to protect non-shared, non-atomic memory?
Or… is this solved by assuming that there must be a ‘cache coherency’ implementation?
Or… does the Standard impose the existence of ‘cache coherency’?
Thanks and regards,
Fernando Pelliccioni,