GotW #95: Thread Safety and Synchronization

Herb Sutter C++ 2014-01-062014-01-06 1 Minute

This GotW was written to answer a set of related frequently asked questions. So here’s a mini-FAQ on “thread safety and synchronization in a nutshell,” and the points we’ll cover apply to thread safety and synchronization in pretty much any mainstream language.

Problem

JG Questions

1. What is a race condition, and how serious is it?

2. What is a correctly synchronized program? How do you achieve it? Be specific.

Guru Questions

3. Consider the following code, where some_obj is a shared variable visible to multiple threads.

// thread 1 (performs no additional synchronization)
code_that_reads_from( some_obj );  // passes some_obj by const &

// thread 2 (performs no additional synchronization)
code_that_modifies( some_obj );    // passes some_obj by non-const &

If threads 1 and 2 can run concurrently, is this code correctly synchronized if the type of some_obj is:

(a) int?

(b) string?

(d) shared_ptr<widget>?

(e) mutex?

(f) condition_variable?

(g) atomic<unsigned>?

Hint: This is actually a two-part question, not a seven-part question. There are only two unique answers, each of which covers a subset of the cases.

4. External synchronization means that the code that uses/owns a given shared object is responsible for performing synchronization on that object. Answer the following questions related to external synchronization:

(a) What is the normal external synchronization responsibility of code that owns and uses a given shared variable?

(b) What is the “basic thread safety guarantee” that all types must obey to enable calling code to perform normal external synchronization?

5. Full internal synchronization (a.k.a. “synchronized types” or “thread-safe types”) means that a shared object performs all necessary synchronization internally within that object, so that calling code does not need to perform any external synchronization. What types should be fully internally synchronized, and why?

Published by Herb Sutter

Herb Sutter is an author and speaker, a technical fellow at Citadel Securities, and serves as chair of the Standard C++ Foundation and chair emeritus of the ISO C++ standards committee. View all posts by Herb Sutter

Published 2014-01-062014-01-06

15 thoughts on “GotW #95: Thread Safety and Synchronization”

nosenseetal says:

2014-01-09 at 1:13 pm

btw thou shalt not return reference to a member variable… idk if that is answer to one of this questions but if it is not it should be added… it is extremly easy to forgett about that when you decide to mutexify your class by adding a lock_guard to ever method. :)
Bartosz Milewski says:

2014-01-09 at 11:47 am

As it often happens in OO programming, there is some confusion about what is meant by “object”. From the point of view of concurrency, a mutex protecting an object protects the transitive closure of all the pointers in that data structure. But that’s probably not what most people understand as an object, so the problem of pointers (references) going out of an object arises. If these pointers point to data that is independently shared — meaning: not only through the parent object — then there’s need for additional synchronization. It’s cases like these that may lead to deadlocks, but that’s probably a topic for another GotW.
Ralph Tandetzky says:

2014-01-08 at 3:44 pm

1. What is a race condition, and how serious is it?

Well, a race condition is the unpredictable behavior of a piece of software due to a dependence on sequence and timing of different threads or processes. A data race is a race condition that follows from a write access to a non-atomic variable and access by an other thread (read or write) to that same variable. Access to atomic variables from multiple threads can still lead to race condition. However, as long as the end result of the program does not depend on it we’re fine and the program is correctly synchronized.
A data race as defined above is much more dangerous since it implies undefined behavior. And undefined behavior means that your program may actually do **anything** including sending an e-mail to your boss that you quit or starting world war III against mankind. So a data race is indeed very serious!

2. What is a correctly synchronized program? How do you achieve it? Be specific.

A function is correctly synchronized, if the result of it does not depend on timing of threads or processes. A program is correctly synchronized if it consists of correctly synchronized functions only. That means that at least shared access to a common data amoung multiple threads must be safe: either the variables are atomic or it is ensured that while one thread writes to a location there are no accesses to the same location from other threads at the same time. Typically this is achieved by protecting the data by a mutex.

3. The types in (a)-(d) are not thread safe in the sense that concurrent read and write access is guaranteed to be without data race. Therefore, the program will not be correctly synchronized in these cases.

The types in (e)-(g) are thread safe. Hence, there will be no data race as defined above. However, the execution of the program may still depend on the order in which the variable is modified and read by the two functions. Therefore, the program might be correctly synchronized but it is also possible that the program is not correctly synchronized and there’s a latent hard to detect bug. (Even though there’s no undefined behavior.)

4. (a) What is the normal external synchronization responsibility of code that owns and uses a given shared variable?

The code must make sure that whenever there is a write to that variable there are no concurrent read or write accesses to it by other threads. Concurrent read accesses are allowed.

4. (b) What is the “basic thread safety guarantee” that all types must obey to enable calling code to perform normal external synchronization?

All access to a const object of that type must be thread-safe. Furthermore, if there are multiple instances of that type then it must be possible to access (read or write) different objects concurrently as long as each write to each object is mutually exclusive. In other words: As long as one thread has write access to an object there must be no other accesses from other threads to that objects, but other threads may access other objects at the same time.

This is what the standard library guarantees for all its types and functions (with only a few exceptions like rand()). The standard library also expects this from user types, for example when used as template arguments.

4. (c) What partial internal synchronization can still be required within the shared variable’s implementation?

If functions of a type access static or global data, then this access must be protected in order to provide the basic thread safety guarantee. Also different instances of a type may internally refer to some shared state. Access to that shared state must also be protected, if writes are possible. An example for this would be shared_ptr. When copying a shared_ptr the two resulting smart pointers will internally refer to the same object **and** the same reference counter. It is expected from the user to protect accesses to the pointee, but accesses to the reference counter must be synchronized internally. That’s the reason shared_ptr may incur a performance penalty compared to unique_ptr which is as efficient as a raw pointer in most cases.

5. What types should be fully internally synchronized, and why?

Types that are used in order to communicate between different threads or that enable synchronization between threads such as mutexes, condition variables, futures and concurrent containers should be internally synchronized. For almost all other types internal synchronization is not necessary.

There’s the so called monitor pattern which in essence turns a type providing the basic thread safety guarantee into a thread-safe type by adding a mutex and protecting every access to an instance by locking that mutex. This serializes the data accesses. Sometimes this kind of behavior is safer, but not very useful. An example are streams: E. g. writing to cout is thread-safe, but the letters arriving in the standard output might be interleaved. (That’s a race condition, but no data race as defined above. The program is not correctly synchronized.) That’s a mess with different threads and external synchronization is the only thing that really helps this issue.

For normal types there is no use in making them thread-safe, but it is usually better to let the caller decide on how to synchronize access. Most types don’t need to know about multi-threading. It a bit like exceptions. Most types don’t need to worry about exceptions, they let the caller take care of them. Don’t try to give every type the responsibility of synchronizing itself. Most often it is not necessary and it only slows down the program and external synchronization, if at all necessary, is the better alternative.

From my experience hardly any types need internal synchronization. Exceptions are typically types that are used for synchronization or types that have only one global instance (such as the standard output stream or particular pieces of hardware).
nosenseetal says:

2014-01-08 at 3:15 pm

@Bartosz nice A, I think Herb in 5 is aiming for stuff from his talk you dont know const and mutable…
and his M&M rule, mutexes and mutable, aka you should make your mutex mutable so external callers can call const methods although those methods modify the mutex by locking it…
cmeerw says:

2014-01-08 at 7:33 am

I guess for question 4 you have your “You Don’t Know const and mutable” talk in mind (i.e. “const” means thread-safe)
Luís Marques says:

2014-01-08 at 5:59 am

Weird. This second time my comment appears and says explicitly that “Your comment is awaiting moderation.” Must have been a bug the first time…
Luís Marques says:

2014-01-08 at 5:54 am

Ah, my comment now appeared (was it stuck in moderation limbo? wordpress would only say I was repeating myself if I posted it again). When I posted my comment Bartosz Milewski had not replied yet, I see now that I was not too off base :-) Bartosz, please bring your wizardry back to the D world!
Xin Huang says:

2014-01-08 at 4:30 am

I agree most on Bartosz’s reply, except for 4(c) which I don’t quite understand.

A little comment on 3(a):

It’s not safe to access an int concurrently, even only 1 thread will write. Because you will never know what the compiler would do, it can just load the int into one register and never read it again; or it can rewrite the int value back to the memory. (yes, unchanged value! See http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong)

There is no benign data races.
Łukasz Łucek says:

2014-01-08 at 2:13 am

Let me try my luck :) I should state that i’m not a native english speaker so don’t mind my writing problems.

So for the first two questions I think the answers are allready in the comments.

For question 3: mutex, condition_variable, and atomic are ok. Int is close but not quite there, because of possible code reordering and lack of sequential consistency.

4. (a) The tesponsibility of the code that uses a shared object is to make sure that when some thread is wrinting to an object it is the only thread that has access to this object. No other thread should be allowed acces to object neither for writing nor reading. On the other hand there can be many reader threads accessing the object in the same time, but as long as noone is writing to it.

4. (b) I belive that this “basic thread safety guarantee” is the constness of an object. That means that the type which is to be synchronized cannot change a single bit of itself in const methods. Especially copy constructor and comparison operations cannot change anything in the memory of an object.

4. (c) I Made a mistake in answer to (b), but it is on purpouse. An object is allowed to change its state (bits in memory) in the const operations as long as those changes are internally synchronizeb by that object. Only then this object can be properly externally synchronized.

5. My guess is that you should design types to be thread-safe if you intend to share objects of this type across threads :). If cost of designing and using such a type is lower than a cost of synchronizing objects of this type externally then go for it – the code using your type will be much nicer to read, and less error prone :). The tricky question is: Is it possible to make internal synchronization without signifficant cost on performance? For example can you afford to lock/unlock a mutex every time you use your variable.
Luís Marques says:

2014-01-07 at 9:14 pm

My C++ is very rusty, but here is my 5:00 AM attempt!

1. Consider a concurrent system, where some data can be accessed concurrently by two or more threads, at least one of which might modify the shared data. A race condition is present in the program if it can result in a pattern of interleaved reads and writes that breaks the guarantees of the language or of the problem domain.

A race condition is very serious for various reasons. If we have a data race we are subject to undefined behavior in theory, and to torn writes in practice. Surely we do not want to access, say, a half written pointer? (In x86 perhaps that could happen in real-mode, when writing to a “far” pointer, if the segment and the offset were written by different instructions? Or a misaligned pointer? Particularly if it crosses cache lines and pages?). Even if we do not break the type system (perhaps “the language rules” might be more accurate?) a race condition in the problem domain logic is highly problematic, because the defect will tend to be very hard to reproduce and isolate, due to its non-deterministic nature.

2. A correctly synchronized program is one which is free of data races and where you observe the relevant data with the causality guarantees required by your program / problem domain (I suppose that second part is not required by the “data race free” first part?). Generally those causality guarantees would be achieved by providing sequential consistency on top of the more fundamental data synchronization steps? (the “atomic” bit in the atomics, as opposed to the ordering guarantees and in contrast with relaxed atomics?) (BTW, does that mean we have “total order for the parts of the program that matter to the relevant other parts”, if we don’t use relaxed atomics? Now that’s a confusing question!).

You achieve correct synchronization by first ditching your old broken compiler, and then not inserting race conditions of your own :-). That last part is done by either not mutating your data or by synchronizing access to it. That means you either make your objects internally synchronized (e.g., use an atomic, or require the object to be fundamentally thread-safe, like a mutex) or you use an external synchronization primitive, like a lock, to serialize and order your accesses.

3. (a). I suppose that this is not safe / correctly synchronized, because you have a data race and an int is not internally synchronized (is it?). BTW, (assuming that is correct) I would imagine that the Java memory model departs the C++ memory model here, and that you could not observe a torn write to an int (or other undefined behaviours). Is this a correct guess? If so, I suppose that implies you can only reasonably (with acceptable performance) implement concurrent Java in a system where the native types are guaranteed by the host system to be atomic (not in the C++ sense of providing ordering guarantees)

3. (b-d) Also not safe, but you can modify elements of the containers that you know won’t accessed by thread 1?

3. (e-g) These are correctly synchronized because they are internally synchronized?

It’s getting way too late, I’ll continue tomorrow/soon (I don’t know how long after the questions are the answers posted). I hope I was not too off base :-)
Bartosz Milewski says:

2014-01-07 at 6:26 pm

Since nobody raises their hand, let me give it a try.

1. I think there is some subtlety in the nomenclature: data race vs. race condition, so without looking at the Standard: More than one thread accessing the same memory location without synchronization, at least one of them writing. (The tricky part is the definition of synchronization.)

It’s deadly serious because a data race may go undetected for a long time until one day it causes a disaster.

2. A correctly synchronized program has no data races. You achieve it by making sure that all accesses to shared mutable memory are properly synchronized. The simplest approach is to associate a mutex with every such object (while trying to avoid deadlocks). If you’re adventurous, you may use atomics. If you are reckless, you may use weak atomics.

3. a, b, c, and d are not correctly synchronized. None provides its own synchronization. There is part of shared_ptr that’s thread safe — the reference counter — but that doesn’t protect the widget itself.

Mutex provides synchronization for other data and is designed to be accessed from multiple threads. So is condition_variable, although it can only be used in tandem with a mutex. That mutex, however, is not there to protect the condition_variable itself, but the data that’s used for signaling. Atomic variables are synchronized.

4 a. The code should provide mutual exclusion. Any access, read or write, must occur under a lock.

4 b. I’m not sure I understand this question. You probably have something very specific in mind. Shared data has to have the right granularity in order to be correctly protected — you can’t protect a single bit because there is no way to write a bit without touching other bits in the word.

4 c. This must be a tricky question. I would think that under total mutual exclusion you wouldn’t need any additional synchronization. Are you thinking of (multiple) reader / writer locks?

5. I guess this is more of a design question. I guess if you are exporting objects from a multithreading library and your library must access them concurrently with your client, these objects should have internal synchronization.

I’m sure there is much more to it, but that should be enough to start a discussion.
Herb Sutter says:

2014-01-07 at 3:53 pm

The lack of answers is a sign that the question is either too easy or too hard. :) I doubt the former, and hopefully this GotW’s solution will help if it’s the latter…
sbohmann says:

2014-01-07 at 3:10 pm

Btw., Herb, would you say that C++ as of version /14 can now, as these high-level models for memory and concurrency have entered the stage, add CommonLisp to its tree of ancestors? :)
Sebastian Bohmann says:

2014-01-07 at 12:29 pm

There’s a truly baffling number of incredibly deep sub-questions and sub-topics in that :)

(I know I’m overusing adjectives here from a stylistic point of view, but gosh…)

To me it’s a bit like, now that C++ finally is getting comprehensive abstract memory and threading models plus all the necessary abstractions in the STL, please go and learn everything possible about it… and I most certainly will attempt that, but I’m so looking forward to Herb’s answer post as a starting point ;)
Pingback: Guru of the week 7 | C++, Qt, OpenGL, CUDA