GotW #100: Compilation Firewalls (Difficulty: 6/10)

[This is a C++11-updated version of the original GotW #24.]

JG Questions

1. What is the Pimpl Idiom, and why is it useful?

Guru Questions

2. What is the best way to express the basic Pimpl Idiom in C++11?

3. What parts of the class should go into the impl object? Some potential options include:

put all private data (but not functions) into impl;
put all private members into impl;
put all private and protected members into impl;
put all private nonvirtual members into impl;
put everything into impl, and write the public class itself as only the public interface, each implemented as a simple forwarding function (a handle/body variant).

What are the advantages/drawbacks of each? How would you choose among them?

4. Does the impl require a back pointer to the public object? If yes, what is the best way to provide it? If not, why not?

Solution

1. What is the Pimpl Idiom, and why is it useful?

In C++, when anything in a header file class definition changes, all users of that class must be recompiled – even if the only change was to the private class members that the users of the class cannot even access. This is because C++’s build model is based on textual inclusion, and because C++ assumes that callers know two main things about a class that can be affected by private members:

Size and Layout: The calling code must know the size and layout of the class, including private data members. This constraint of always being able to see implementations incurs the cost of more tightly coupling callers and callees, but is central to C++’s object model and philosophy because guaranteeing that the compiler has direct access to objects by default is an (perhaps “the”) essential ingredient in enabling C++ to achieve its famed heavily-optimizable efficiency.
Functions: The calling code must be able to resolve calls to member functions of the class, including inaccessible private functions that overload with nonprivate functions — if the private function is a better match, the calling code will fail to compile. (C++ took the deliberate design decision to perform overload resolution before accessibility checking for safety reasons. For example, it was felt that changing the accessibility of a function from private to public shouldn’t change the meaning of legal calling code.)
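Here is a minimal sketch of the overload-resolution point, using a hypothetical class gadget that is not from the original. Overload resolution picks the best match first and checks access only afterward, so a private overload can break a call that would otherwise have resolved to an accessible function:

```cpp
#include <cassert>

// Hypothetical class: a public overload and a private overload of f.
class gadget {
public:
    int f(double) { return 1; }   // accessible to callers
private:
    int f(int) { return 2; }      // private, but still an overload candidate
};

// Overload resolution considers both candidates first;
// access is checked only on the function it selected.
int call_with_double(gadget& g) { return g.f(1.5); }  // selects f(double): OK

// int call_with_int(gadget& g) { return g.f(1); }
// ^ would not compile: f(int) is the better match for an int argument,
//   and it is private, so the call is ill-formed even though f(double)
//   is public and viable.
```

This is why merely adding a private overload to a header can change whether client code compiles, and why clients must be recompiled when private functions change.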

To reduce these compilation dependencies, a common technique is to use an opaque pointer to hide some of the implementation details. Here’s the basic idea:

// Pimpl idiom - basic idea
class widget {
    // :::
private:
    struct impl;		// things to be hidden go here
    impl* pimpl_;		// opaque pointer to forward-declared class
};

Class widget uses a variant of the handle/body idiom. Coplien [1] describes handle/body as being primarily useful for reference counting of a shared implementation, but it also has more general implementation-hiding uses. For convenience, from now on I’ll call widget the “visible class” and impl the “Pimpl class.” [2]

One big advantage of this idiom is that it breaks compile-time dependencies. First, system builds run faster because using a Pimpl can eliminate extra #includes. I have worked on projects where converting just a few widely-visible classes to use Pimpls has halved the system’s build time. Second, it localizes the build impact of code changes because the parts of a class that reside in the Pimpl can be freely changed – that is, members can be freely added or removed – without recompiling client code. Because it’s so good at eliminating compilation cascades due to changes in only the now-hidden members, it’s often dubbed a “compilation firewall.”

But this leaves some questions open: Should pimpl be a raw pointer? What should go into the Pimpl class, anyway? So let’s look at these and other important details.

2. What is the best way to express the basic Pimpl Idiom in C++11?

Avoid using raw pointers and explicit delete. To express Pimpl using only standard C++ facilities, the most appropriate choice is to hold the Pimpl object by unique_ptr since the Pimpl object is uniquely owned by the visible class. Using unique_ptr also leads to the simplest code. [3]

// in header file
#include <memory>

class widget {
public:
    widget();
    ~widget();
private:
    class impl;                     // forward declaration only
    std::unique_ptr<impl> pimpl;    // uniquely owns the hidden object
};

// in implementation file
class widget::impl {
    // :::
};

widget::widget() : pimpl{ new impl{ /*...*/ } } { }
widget::~widget() { }               // or =default; impl is complete here

Note some key parts of the pattern:

Prefer to hold the Pimpl using a unique_ptr. It’s more efficient than using a shared_ptr, and correctly expresses the intent that the Pimpl object should not be shared.
Define and use the Pimpl object in your own implementation file. This is what keeps its details hidden.
In the visible class’ out-of-line constructor, allocate the Pimpl object.
You still need to write the visible class’ destructor yourself and define it out of line in the implementation file, even if normally it’s the same as what the compiler would generate. This is because although both unique_ptr and shared_ptr can be instantiated with an incomplete type, unique_ptr’s destructor requires a complete type in order to invoke delete (unlike shared_ptr which captures more information when it’s constructed). By writing it yourself in the implementation file, you force it to be defined in a place where impl is already defined, and this successfully prevents the compiler from trying to automatically generate the destructor on demand in the caller’s code where impl is not defined.
The above pattern does not make the visible class either copyable or movable by default, because C++11 is less eager to have the compiler generate default copying and moving operations for you. Because we’ve had to write a user-defined destructor, the compiler no longer generates the move constructor and move assignment operator, and the unique_ptr member causes the implicit copy operations to be deleted. If you do decide to supply copy and/or move, note that the copy assignment and move assignment operators need to be defined out of line in the implementation file for the same reason as the destructor.
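If you do opt into copying and moving, one way to do it is sketched below, assuming you want deep-copy (value) semantics. The name() and valid() members and the string payload are hypothetical, added only so the behavior can be observed; the "header" and "implementation file" parts are shown in one listing for brevity. Every special member that may destroy an impl is defined out of line, after impl is complete:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- widget.h (sketch) ----
class widget {
public:
    explicit widget(std::string name);
    ~widget();
    widget(const widget& other);                 // deep copy
    widget& operator=(const widget& other);
    widget(widget&& other) noexcept;             // defaulted out of line below
    widget& operator=(widget&& other) noexcept;
    const std::string& name() const;             // hypothetical accessor
    bool valid() const { return pimpl != nullptr; }  // false once moved from
private:
    class impl;
    std::unique_ptr<impl> pimpl;
};

// ---- widget.cpp (sketch) ----
class widget::impl {
public:
    explicit impl(std::string n) : name(std::move(n)) {}
    std::string name;
};

widget::widget(std::string name) : pimpl{ new impl{ std::move(name) } } { }
widget::~widget() = default;                     // impl is complete here

// Copying clones the hidden object rather than sharing it.
widget::widget(const widget& other)
    : pimpl{ other.pimpl ? new impl{ *other.pimpl } : nullptr } { }

widget& widget::operator=(const widget& other) {
    if (this != &other) {
        widget tmp(other);                       // copy first, then swap
        std::swap(pimpl, tmp.pimpl);             // old impl dies with tmp
    }
    return *this;
}

// Moving just transfers the single pointer; =default works here
// because impl is complete at this point in the implementation file.
widget::widget(widget&&) noexcept = default;
widget& widget::operator=(widget&&) noexcept = default;

const std::string& widget::name() const { return pimpl->name; }
```

Note that a move hands the very same impl object to the destination, which is what makes Pimpl’d types so cheap to move.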

A new advantage to Pimpl in C++11 is that Pimpl’d types are very move-friendly types because they only have to copy a single pointer value. C’est très cool.

Let’s consider some of the proposed options for what should go into the hidden Pimpl.

What’s in a Pimpl? [4]

3. What parts of the class should go into the impl object? Some potential options include:

put all private data (but not functions) into impl;

Option 1 (Score: 6 / 10): This is a good start, because now we can forward-declare any class which only appears as a data member, rather than #include the class’ actual definition which would make calling code depend on that too.

But there are drawbacks too: A slight annoyance is that in the implementation of the visible class we still need to write pimpl-> all the time. A more important annoyance is that we still recompile the world when we add or remove private member functions, and in rare cases those can also still interfere with overload resolution if they overload with non-private functions.
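A minimal sketch of Option 1, assuming a hypothetical catalog class: only the private data has moved into the impl, while the private helper function is still declared in the visible class, so adding or removing such helpers still forces clients to recompile. Both halves are shown in one listing; in a real split, only the implementation file would need to #include <vector>:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// ---- header (Option 1 sketch) ----
class catalog {
public:
    catalog();
    ~catalog();
    void add(const std::string& item);
    std::size_t size() const;
private:
    bool is_duplicate(const std::string& item) const;  // private function stays visible
    struct impl;                                       // only the private *data* lives here
    std::unique_ptr<impl> pimpl;
};

// ---- implementation file ----
struct catalog::impl {
    std::vector<std::string> items;
};

catalog::catalog() : pimpl{ new impl } { }
catalog::~catalog() = default;

bool catalog::is_duplicate(const std::string& item) const {
    for (const auto& existing : pimpl->items)   // the ever-present pimpl->
        if (existing == item) return true;
    return false;
}

void catalog::add(const std::string& item) {
    if (!is_duplicate(item)) pimpl->items.push_back(item);
}

std::size_t catalog::size() const { return pimpl->items.size(); }
```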

Can we do better?

put all private members into impl;

Option 2 (Score: 9 / 10): This is (almost) my usual practice these days. After all, in C++, the phrase “outside code shouldn’t and doesn’t care about these parts” is spelled private.

There are three things to keep in mind, the first of which is the reason for my “almost” above:

You can’t hide virtual member functions in the Pimpl, even if the virtual functions are private. If the virtual function overrides one inherited from a base class, then it must appear in the actual derived class; this is true even if it is final. If the virtual function is new, then it must still appear in the visible class in order to be available for overriding by further derived classes.
Functions in the Pimpl may require a “back pointer” to the visible object if they need to in turn use visible functions, which adds another level of indirection.
Often a good compromise is to use Option 2, and additionally put into the Pimpl only those non-private functions that need to be called by the private ones (see the “back pointer” comments below).

put all private and protected members into impl;

Option 3 (Score: 0 / 10): Taking this extra step to include protected members is actually wrong. Like virtual members, protected members should never go into a Pimpl since putting them there makes them worthless. After all, protected members exist specifically to be seen and used by derived classes, and so aren’t nearly as useful if derived classes can’t see or use them; derived types would be forced to also know about and derive from the Pimpl type and maintain a parallel two-object hierarchy.

However, note that there can be valid reasons to put virtuals into a Pimpl-like “body/implementation” class, though not for the reasons that motivate the Pimpl idiom. That situation arises in the Bridge pattern [5], which is about splitting a class into two parts, both of which may carry implementation and be independently extensible via virtual functions. But that’s a different pattern with a different motivation than Pimpl.

put all private nonvirtual members into impl;

Option 4 (Score: 10 / 10): This is the ideal. To reduce the need for storing or passing a back pointer, you may also put into the Pimpl any public functions that the private functions call, with passthroughs to them in the visible class. However, you won’t be able to move the protected or virtual functions into the Pimpl, as noted above.
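The following is a sketch of Option 4 under illustrative assumptions (the report and sales_report classes and their members are hypothetical). The public interface and the protected virtual hook stay in the visible class; all private nonvirtual members go into the impl; and the one public function that private code needs has its body moved into the impl with a passthrough in the visible class, so no back pointer is required:

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- header (Option 4 sketch) ----
class report {
public:
    report();
    virtual ~report();
    std::string render() const;   // interface for callers: stays visible
    int line_count() const;       // public, but also needed by private code
protected:
    // Interface for derivers: must stay visible so it can be overridden.
    virtual std::string title() const { return "Report"; }
private:
    struct impl;                  // all private nonvirtual members live here
    std::unique_ptr<impl> pimpl;
};

// ---- implementation file ----
struct report::impl {
    int lines = 3;                                // private data
    int line_count() const { return lines; }      // public function's body moved here
    std::string body() const {                    // private helper calls it directly,
        return std::to_string(line_count()) + " lines";  // so no back pointer needed
    }
};

report::report() : pimpl{ new impl } { }
report::~report() = default;

int report::line_count() const { return pimpl->line_count(); }  // passthrough

std::string report::render() const {
    return title() + ": " + pimpl->body();  // virtual call stays in the visible class
}

// A derived class can still customize the hook because it is visible.
class sales_report : public report {
protected:
    std::string title() const override { return "Sales"; }
};
```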

put everything into impl, and write the public class itself as only the public interface, each implemented as a simple forwarding function (a handle/body variant).

Option 5 (Score: 8 / 10 in restricted cases): This is useful in a few restricted cases, and has the benefit of avoiding a back pointer since all services are available within the Pimpl class. The chief drawbacks are that it requires an extra wrapper function call and normally makes the visible class useless for inheritance.
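A minimal sketch of Option 5, assuming a hypothetical counter class. Everything, including private helpers, lives in the impl, so impl functions can call each other without a back pointer; the visible class is reduced to one-line forwarding functions, which is where the extra wrapper call comes from:

```cpp
#include <cassert>
#include <memory>

// ---- header (Option 5 sketch) ----
class counter {
public:
    counter();
    ~counter();
    void increment();
    int value() const;
private:
    class impl;
    std::unique_ptr<impl> pimpl;
};

// ---- implementation file ----
class counter::impl {
public:
    void increment() { bump(1); }        // all services are available right here
    int value() const { return count; }
private:
    void bump(int by) { count += by; }   // private helper, also hidden
    int count = 0;
};

counter::counter() : pimpl{ new impl } { }
counter::~counter() = default;

// Every public function is a simple forwarder.
void counter::increment() { pimpl->increment(); }
int counter::value() const { return pimpl->value(); }
```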

What are the advantages/drawbacks of each? How would you choose among them?

The complete answer is actually much simpler than all the discussion we just had above. Instead of empirically analyzing specific combinations, we need to back up and answer the question from first principles.

Deep breath.

Exhale.

Okay.

The key observation is that there are exactly three parts to a class in any OO language. [6] They are:

1. The interface for callers = public members. This is all that outside callers can see and use.
2. The interface for derivers = protected or virtual members. This is what only derived classes can see and use.
3. Everything else = private and nonvirtual members. This is all the things that are just implementation details, by definition.

Only #3, and all of #3, is what can and should be hidden in Pimpl. From this we can derive all the other results mentioned above; for example, we can’t put virtuals into the Pimpl because those are part of #2 and need to be visible to the derived classes.

The accompanying table summarizes these design choices. Pimpl as described here covers the derivation interface too, which Coplien’s Handle/Body omits. And Bridge, while similar in some ways to Pimpl, has a very different motivation and structure.

4. Does the impl require a back pointer to the public object? If yes, what is the best way to provide it? If not, why not?

The answer is: Sometimes, unhappily, yes. After all, what we’re doing is (somewhat artificially) splitting each object into two halves for the purposes of hiding one part.

Consider: Whenever a function in the visible class is called, usually some function or data in the hidden half is needed to complete the request. That’s fine and reasonable. But as discussed already, sometimes a function in the Pimpl must call a nonprivate or virtual function in the visible class. In that case, it needs a pointer to the visible class.

There are two options:

Store a back pointer in the Pimpl. This incurs slight overhead and stores the pointer all the time when it may not be needed all the time. Also, when you repeat yourself you can lie – the back pointer can get out of sync if you’re not careful to maintain it correctly to point at the right visible object, for example during move operations which then could no longer be correctly =defaulted.
(Recommended) Pass this as a parameter to the Pimpl functions (e.g., pimpl->func(this, params) ). This incurs only brief space overhead on the stack (cheap) for the duration of the function call (brief), and can’t possibly get out of sync. However, it does mean adding an extra parameter to each hidden function.
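The recommended option can be sketched as follows, assuming a hypothetical document class whose hidden summary() needs to call the visible, nonprivate title(). The visible object passes this on each call, so there is no stored pointer that could go stale, for example after a move:

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- header (sketch) ----
class document {
public:
    document();
    ~document();
    std::string title() const { return "Untitled"; }  // visible, nonprivate function
    std::string summary() const;
private:
    class impl;
    std::unique_ptr<impl> pimpl;
};

// ---- implementation file ----
class document::impl {
public:
    // The visible object is passed in for the duration of the call only.
    std::string summary(const document* self) const {
        return self->title() + " (" + std::to_string(words) + " words)";
    }
private:
    int words = 42;
    // Alternative (not recommended): a stored "const document* owner;"
    // member, which would have to be re-pointed whenever the visible
    // object is moved.
};

document::document() : pimpl{ new impl } { }
document::~document() = default;

std::string document::summary() const { return pimpl->summary(this); }
```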

Acknowledgments

Thanks to Edd, pizer, and Howard Hinnant for clarifying why ~unique_ptr<T> requires T to be a complete type, and therefore the need to write a user-defined external class destructor; to Stephan Lavavej and Alisdair Meredith for reminding me to write =default on the move constructor and move assignment operator; and to Howard Hinnant for pointing out that even with =default the move assignment operator must be defined out of line in the implementation file because it requires a complete type (in case it has to delete it).

Notes

[1] James O. Coplien. “C++ Idioms” (EuroPLoP98).

[2] I originally named the pointer variable impl_. The eponymous pimpl was actually coined in 1996 by friend and colleague Jeff Sumner, who shares my penchant for “p” prefixes for pointer variables as well as my occasional taste for horrid puns.

[3] This is the simplest expression of the pattern in C++11. The major alternatives would be to use a shared_ptr or a raw pointer instead of a unique_ptr, and both are more complex to write correctly and/or more error-prone when compiler-generated operations do the wrong thing. If you used a shared_ptr, you’d get the right destructor, move constructor, and move assignment operator by default, but the compiler-generated copying operations would be wrong (copies would share one impl), so you’d have to write those explicitly to provide them or =delete them (and if you forget, you silently get the wrong semantics); it would also incur a small but needless inefficiency for unused-but-still-present reference counting. And if you used a raw pointer, you would have to write all five operations by hand: destructor, copy construction, copy assignment, move construction, and move assignment.
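The shared_ptr pitfall can be seen in a short sketch, assuming a hypothetical settings class. No user-declared destructor is needed, because shared_ptr captures its deleter at construction, but the compiler-generated copy quietly shares one impl between two objects:

```cpp
#include <cassert>
#include <memory>

// ---- header (sketch of the shared_ptr alternative) ----
class settings {
public:
    settings();
    void set_volume(int v);
    int volume() const;
private:
    class impl;
    std::shared_ptr<impl> pimpl;   // compiler-generated copy shares this
};

// ---- implementation file ----
class settings::impl {
public:
    int volume = 5;
};

settings::settings() : pimpl{ std::make_shared<impl>() } { }
void settings::set_volume(int v) { pimpl->volume = v; }
int settings::volume() const { return pimpl->volume; }
```

After settings b = a; a call to b.set_volume(9) also changes a.volume(): reference semantics where value semantics were almost certainly intended.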

[4] Please don’t email me jokes about this subheading. I can imagine most of the answers.

[5] Gamma et al. Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1994).

[6] Not all need be present in a given class. For example, #2 doesn’t apply to value-like classes that don’t participate in inheritance.

Major Change History

2011-11-27: Removed move operations from the basic pattern. Since not all Pimpl’d types need to be move-aware, it’s not really part of the core pattern.