GotW #7b Solution: Minimizing Compile-Time Dependencies, Part 2

Now that the unnecessary headers have been removed, it’s time for Phase 2: How can you limit dependencies on the internals of a class?

 

Problem

JG Questions

1. What does private mean for a class member in C++?

2. Why does changing the private members of a type cause a recompilation?

Guru Question

3. Below is how the header from the previous Item looks after the initial cleanup pass. What further #includes could be removed if we made some suitable changes, and how?

This time, you may make changes to X as long as X’s base classes and its public interface remain unchanged; any current code that already uses X should not be affected beyond requiring a simple recompilation.

//  x.h: sans gratuitous headers
//
#include <iosfwd>
#include <list>

// None of A, B, C, or D are templates.
// Only A and C have virtual functions.
#include "a.h"  // class A
#include "b.h"  // class B
#include "c.h"  // class C
#include "d.h"  // class D
class E;

class X : public A, private B {
public:
       X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    std::list<C> clist;
    D            d;
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}

 

Solution

1. What does private mean for a class member in C++?

It means that outside code cannot access that member. Specifically, it cannot name it or call it.

For example, given this class:

class widget {
public:
    void f() { }
private:
    void f(int) { }
    int i;
};

Outside code cannot use the name of the private members:

int main() {
    auto w = widget{};
    w.f();      // ok
    w.f(42);    // error, cannot access name "f(int)"
    w.i = 42;   // error, cannot access name "i"
}

 

2. Why does changing the private members of a type cause a recompilation?

Because private data members can change the size of the object, and private member functions participate in overload resolution.

Note that accessibility is still safely enforced: Calling code still doesn’t get to use the private parts of the class. However, the compiler gets to know all about them at all times, including as it compiles the calling code. This does increase build coupling, but it’s for a deliberate reason: C++ has always been designed for efficiency, and a little-appreciated cornerstone of that is that C++ is designed to expose a type’s full implementation to the compiler by default, in order to make aggressive optimization easier. It’s one of the fundamental reasons C++ is an efficient language.
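To make both effects concrete, here is a small sketch with two hypothetical versions of one class, gadget_v1 and gadget_v2, that differ only in their private sections; the detection trait can_call_f_with_int is likewise made up for illustration. It assumes C++11 or later, where access checking is part of template argument substitution. Adding a private data member changes the size every client sees, and adding a private overload changes which function a client’s existing call selects, even though that call then fails the access check.

```cpp
#include <type_traits>
#include <utility>

// Two hypothetical versions of the same class; only the private section differs.
class gadget_v1 {
public:
    int f(double) { return 1; }
private:
    int x_ = 0;
};

class gadget_v2 {
public:
    int f(double) { return 1; }
private:
    int f(int) { return 2; }   // new private overload
    int x_ = 0;
    double cache_ = 0.0;       // new private data member
};

// Effect 1: the object size seen by every client changes, so any code
// that creates, copies, or embeds a gadget must be recompiled.
static_assert(sizeof(gadget_v2) > sizeof(gadget_v1), "private data changed the size");

// Effect 2: private functions take part in overload resolution.
// This trait detects whether "t.f(42)" is a valid expression for T.
template <class T, class = void>
struct can_call_f_with_int : std::false_type {};

template <class T>
struct can_call_f_with_int<T, decltype(void(std::declval<T&>().f(42)))>
    : std::true_type {};

// v1: 42 converts to double, and the public f(double) is called.
static_assert(can_call_f_with_int<gadget_v1>::value, "v1: f(42) compiles");
// v2: overload resolution now prefers the private f(int), the access
// check fails, and the very same client call no longer compiles.
static_assert(!can_call_f_with_int<gadget_v2>::value, "v2: f(42) is rejected");
```

Both effects are visible only because the compiler sees the private section while compiling the client, which is exactly why a change there forces clients to recompile.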

 

3. What further #includes could be removed if we made some suitable changes, and how? … any current code that already uses X should not be affected beyond requiring a simple recompilation.

There are a few things we weren’t able to do in the previous problem:

  • We had to leave a.h and b.h. We couldn’t get rid of these because X inherits from both A and B, and you always have to have full definitions for base classes so that the compiler can determine X’s object size, virtual functions, and other fundamentals. (Can you anticipate how to remove one of these? Think about it: Which one can you remove, and why/how? The answer will come shortly.)
  • We had to leave list, c.h and d.h. We couldn’t get rid of these right away because a list<C> and a D appear as private data members of X. Although C appears as neither a base class nor a member, it is being used to instantiate the list member, and some compilers have required that when you instantiate list<C> you be able to see the definition of C. (The standard doesn’t require a definition here, though: since C++17 it explicitly allows std::list to be instantiated with an incomplete element type. So even if the compiler you are currently using has this restriction, you can expect the restriction to go away over time.)

Now let’s talk about the beauty of Pimpls.

 

The Pimpl Idiom

C++ lets us easily encapsulate the private parts of a class from unauthorized access. Unfortunately, because of the header file approach inherited from C, it can take a little more work to encapsulate dependencies on a class’ privates.

“But,” you say, “the whole point of encapsulation is that the client code shouldn’t have to know or care about a class’ private implementation details, right?” Right, and in C++ the client code doesn’t need to know or care about access to a class’ privates (because unless it’s a friend it isn’t allowed any), but because the privates are visible in the header the client code does have to depend upon any types they mention. This coupling between the caller and the class’s internal details creates dependencies on both (re)compilation and binary layout.

How can we better insulate clients from a class’ private implementation details? One good way is to use a special form of the handle/body idiom, popularly called the Pimpl Idiom because of the intentionally pronounceable pimpl pointer, as a compilation firewall.

A Pimpl is just an opaque pointer (a pointer to a forward-declared, but undefined, helper class) used to hide the private members of a class. That is, instead of writing this:

// file widget.h
//
class widget {
    // public and protected members
private:
    // private members; whenever these change,
    // all client code must be recompiled
};

We write instead:

// file widget.h
//
#include <memory>

class widget {
public:
    widget();
    ~widget();
    // public and protected members
private:
    struct impl;
    std::unique_ptr<impl> pimpl;  // ptr to a forward-declared class
};

// file widget.cpp
//
#include "widget.h"

struct widget::impl {
    // private members; fully hidden, can be
    // changed at will without recompiling clients
};

widget::widget() : pimpl{ std::make_unique<widget::impl>(/*...*/) } { }
widget::~widget() = default;

Every widget object dynamically allocates its impl object. If you think of an object as a physical block, we’ve essentially lopped off a large chunk of the block and in its place left only “a little bump on the side”: the opaque pointer, or Pimpl. If copy and move are appropriate for your type, write those four operations yourself: the copy operations should perform a deep copy that clones the impl state, and the move operations can simply transfer ownership of the impl.

The major advantages of this idiom come from the fact that it breaks the caller’s dependency on the private details, including breaking both compile-time dependencies and binary dependencies:

  • Types mentioned only in a class’ implementation need no longer be defined for client code, which can eliminate extra #includes and improve compile speeds.
  • A class’ implementation can be changed—that is, private members can be freely added or removed—without recompiling client code. This is a useful technique for providing ABI-safety or binary compatibility, so that the client code is not dependent on the exact layout of the object.
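Both advantages show up in a minimal single-file sketch, assuming C++14 for std::make_unique. The member function count() and the contents of widget::impl are hypothetical; the part marked as widget.h is all that client code would see, and the part marked as widget.cpp is free to change.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// ---- would live in widget.h: all that client code ever sees ----
class widget {
public:
    widget();
    ~widget();                    // declared here, defined where impl is complete
    int count() const;            // hypothetical public member
private:
    struct impl;                  // forward-declared only
    std::unique_ptr<impl> pimpl;  // ptr to a forward-declared class
};

// ---- would live in widget.cpp: can change without recompiling clients ----
struct widget::impl {
    // Hypothetical private state. Adding or removing members here
    // leaves sizeof(widget) unchanged.
    std::vector<std::string> names{ "a", "b", "c" };
};

widget::widget() : pimpl{ std::make_unique<impl>() } { }
widget::~widget() = default;      // impl is complete here, so unique_ptr can delete it

int widget::count() const {
    assert(pimpl);                // every widget owns an impl from construction on
    return static_cast<int>(pimpl->names.size());
}
```

On mainstream standard library implementations a std::unique_ptr with the default deleter is the size of a raw pointer, so sizeof(widget) typically stays at one pointer no matter how widget::impl evolves; the standard itself does not promise that exact size.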

The major costs of this idiom are in performance:

  • Each construction/destruction must allocate/deallocate memory.
  • Each access of a hidden member can require at least one extra indirection. (If the hidden member being accessed itself uses a back pointer to call a function in the visible class, there will be multiple indirections, but it is usually easy to avoid needing a back pointer.)

And of course we’re replacing any removed headers with the <memory> header.

We’ll come back to these and other Pimpl issues in GotW #24. For now, in our example, there were three headers that were needed simply because the types they declare were used in X’s private data members. If we instead restructure X to use a Pimpl, we can immediately make several further simplifications:

#include <list>
#include "c.h" // class C
#include "d.h" // class D

One of these headers (c.h) can be replaced with a forward declaration because C is still being mentioned elsewhere as a parameter or return type, and the other two (list and d.h) can disappear completely.
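Here is a small sketch of why a forward declaration is enough to replace c.h: a function declaration may name an incomplete class as a by-value parameter or return type, and only the function definitions and the call sites need the full definition. The class api (standing in for X’s interface) and the api_demo() usage function are hypothetical, and C and E here are trivial stand-ins for the real classes.

```cpp
#include <cassert>

class C;   // forward declarations only: no c.h needed for the declarations below
class E;

// Hypothetical stand-in for X's interface: by-value uses of
// incomplete types are fine in function declarations.
class api {
public:
    C f( int, C );
    E h( E );
};

// ---- later, e.g. in the implementation file: the full definitions ----
class C { public: int value = 0; };
class E { public: int value = 0; };

C api::f( int n, C c ) { c.value += n; return c; }
E api::h( E e ) { e.value *= 2; return e; }

// Callers do need C and E to be complete, because they create,
// pass, and receive those objects by value.
inline void api_demo() {
    api a;
    C c;
    c.value = 1;
    E e;
    e.value = 21;
    assert( a.f( 41, c ).value == 42 );
    assert( a.h( e ).value == 42 );
}
```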

Guideline: For widely-included classes whose implementations may change, or to provide ABI-safety or binary compatibility, consider using the compiler-firewall idiom (Pimpl Idiom) to hide implementation details. Use an opaque pointer (a pointer to a declared but undefined class) declared as struct impl; std::unique_ptr<impl> pimpl; to store private nonvirtual members.

 

Note: We can’t tell from the original code by itself whether or not X had (default) copy or move operations. If it did, then to preserve that we would need to write them again ourselves since the move-only unique_ptr member suppresses the implicit generation of copy construction and copy assignment, and the user-declared destructor suppresses the implicit generation of move construction and move assignment. If we do need to write them by hand, the move constructor and move assignment can be =defaulted, and the copy constructor and copy assignment will need to copy the Pimpl object.
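Here is one way to write those four operations by hand, as a sketch assuming C++14; the widget class, its add() and size() members, and its std::list<int> state are hypothetical. The copy operations clone the impl object rather than the pointer, and the move operations are defaulted out of line, just like the destructor, so that they are compiled where impl is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>

// ---- header part ----
class widget {
public:
    widget();
    ~widget();
    widget( const widget& other );             // deep copy: clones the impl
    widget& operator=( const widget& other );
    widget( widget&& ) noexcept;               // move: transfers the pointer
    widget& operator=( widget&& ) noexcept;

    void add( int v );                         // hypothetical members
    std::size_t size() const;

private:
    struct impl;
    std::unique_ptr<impl> pimpl;
};

// ---- implementation part ----
struct widget::impl {
    std::list<int> values;                     // hypothetical private state
};

widget::widget() : pimpl{ std::make_unique<impl>() } { }
widget::~widget() = default;

// Copy the pointed-to impl, not the pointer. A moved-from source
// (null pimpl) yields a copy that is also empty.
widget::widget( const widget& other )
    : pimpl{ other.pimpl ? std::make_unique<impl>( *other.pimpl ) : nullptr } { }

widget& widget::operator=( const widget& other ) {
    // Build the new impl first, then take ownership: self-assignment is
    // harmless, and *this is unchanged if the copy throws.
    pimpl = other.pimpl ? std::make_unique<impl>( *other.pimpl ) : nullptr;
    return *this;
}

// Defaulted out of line, where impl is complete (move assignment
// has to delete the old impl).
widget::widget( widget&& ) noexcept = default;
widget& widget::operator=( widget&& ) noexcept = default;

void widget::add( int v ) { pimpl->values.push_back( v ); }
std::size_t widget::size() const { return pimpl->values.size(); }

// Usage: copies are independent; a move hands over the impl.
inline void widget_demo() {
    widget a;
    a.add( 1 );
    widget b = a;                  // deep copy
    b.add( 2 );
    assert( a.size() == 1 && b.size() == 2 );
    widget c = std::move( b );     // c takes over b's impl
    assert( c.size() == 2 );
}
```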

After making that additional change, the header looks like this:

//  x.h: after converting to use a Pimpl
//
#include <iosfwd>
#include <memory>
#include "a.h"  // class A (has virtual functions)
#include "b.h"  // class B (has no virtual functions)
class C;
class E;

class X : public A, private B {
public:
       ~X();  // defined out of line
       // and copy/move operations if X had them before

       X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    struct impl;
    std::unique_ptr<impl> pimpl;  // ptr to a forward-declared class
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}

Without more extensive changes, we still need the definitions for A and B because they are base classes, and we have to know at least their sizes in order to define the derived class X.

The private details go into X’s implementation file where client code never sees them and therefore never depends upon them:

//  Implementation file x.cpp
//
#include "x.h"
#include <list>
#include "c.h"  // class C
#include "d.h"  // class D
using namespace std;

struct X::impl {
    list<C> clist;
    D       d;
};

X::X( const C& c ) : pimpl{ make_unique<impl>(/*...*/) } { }
X::~X() = default;

// ... definitions of X's other member functions go here
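To check that the pieces fit together, here is a single-file sketch that compiles the converted x.h and x.cpp against trivial stand-ins for a.h through e.h, assuming C++14. Everything in the stand-ins, the bodies of X’s member functions, and the x_demo() usage function is hypothetical, since the article elides them; the point is only that X’s base classes and public interface are exactly as before, so client code such as operator<< keeps working unchanged.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// ---- hypothetical stand-ins for a.h, b.h, c.h, d.h, and e.h ----
class A { public: virtual ~A() = default; virtual int id() const { return 1; } };
class B { public: int tag = 0; };                       // no virtual functions
class C { public: virtual int value() const { return n; } int n = 0; };
class D { public: std::string name = "d"; };
class E { public: int n = 0; };

// ---- x.h after converting to use a Pimpl ----
class X : public A, private B {
public:
       ~X();
       X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    struct impl;
    std::unique_ptr<impl> pimpl;
};

inline std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}

// ---- x.cpp: the private details, invisible to clients ----
struct X::impl {
    std::list<C> clist;
    D            d;
};

X::X( const C& c ) : pimpl{ std::make_unique<impl>() } { pimpl->clist.push_back( c ); }
X::~X() = default;

B  X::f( int, char* ) { return B{}; }
C  X::f( int i, C c ) { c.n += i; return c; }
C& X::g( B )          { return pimpl->clist.front(); }
E  X::h( E e )        { return e; }

std::ostream& X::print( std::ostream& os ) const {
    return os << "X(" << pimpl->clist.size() << " C, " << pimpl->d.name << ")";
}

// ---- client code: uses X exactly as before the conversion ----
inline void x_demo() {
    C c;
    c.n = 5;
    X x{ c };
    std::ostringstream out;
    out << x;                            // via the unchanged operator<<
    assert( out.str() == "X(1 C, d)" );
    assert( x.f( 2, c ).n == 7 );
}
```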

That brings us down to including only four headers, which is a great improvement—but it turns out that there is still a little more we could do, if only we were allowed to change the structure of X more extensively. This leads us nicely into Part 3…

 

Acknowledgments

Thanks to the following for their feedback to improve this article: John Humphrey, thokra, Motti Lanzkron, Marcelo Pinto.

14 thoughts on “GotW #7b Solution: Minimizing Compile-Time Dependencies, Part 2”

  1. Herb I guess you dont take wishlist requirements since you are not Santa, but it would be nice if VC++ compiler would support pimpl switch so people dont need to rewrite their code for daily hackery :) but still get the productivity boost and still get the perf in release mode without pimpling :) .

  2. In the requirements for #3 you say:
    * any current code that already uses X should not be affected beyond requiring a simple recompilation.

    However introducing a unique_ptr into X means that X no longer has value semantics, its copy constructor (as well as operator=) are disabled, this may affect current code that uses X

  3. @Motti: I did mention earlier in the article that “If copy and move are appropriate for your type, write those four operations to perform a deep copy that clones the impl state.” but I agree it bears repeating. I’ve added an extra paragraph and an extra comment line in the code. Note that not only does the move-only unique_ptr member disable compiler-generated copying by default, but the explicitly user-declared destructor also disables compiler-generated move by default. Thanks for the note!

  4. You can still remove one more header which is b.h . Just let the pimpl class X::impl inherit publicly from B and remove B as a private base of X . This would even work perfectly fine, if B had virtual functions which is excluded in the exercise.

  5. I just looked at the next question and figured out that it asks for what I already noticed. Sorry, I should have read on first. But here’s one more note: If both A and B inherit virtually from a common base (directly or indirectly) then the above change might not be as trivial. However, this is highly unlikely since B doesn’t have any virtual functions.

  6. Shouldn’t the standard headers come after our own header to avoid hiding some including errors in our own headers?

    And in the last code snippet you have “using namespace std” (that I don’t like very much) and then you have “std::list…”. Don’t you think you should avoid “using namespace std” in such a short code?

    Best regards

  7. @Marcelo: using namespace std; is perfectly fine (and should be encouraged) in an implementation file after all #includes. See C++ Coding Standards Item 59: “Don’t write namespace usings in a header file or before an #include.”

    However, you’re right that saying “std::” in front of the list member is now redundant. Removed, with credit. Thanks!

  8. @herb:
    “Note that not only does the move-only unique_ptr member disable compiler-generated copying by default, but the explicitly user-declared destructor also disables compiler-generated move by default. Thanks for the note!”
    I had no idea about this, if you dont consider it way tooo basic for gotw maybe you could write about that. also since it is 2014 if i could have one wish it is gotw on emplace_back vs push_back(for vector, i guess same conclusions apply to the other containers)

  9. Given that pimpl introduces an allocation anyway, I’ve never really seen what benefits it offers that couldn’t be provided more cleanly and flexibly by a factory returning a unique_ptr to a (probably abstract) base class. Pimpl trades vtable indirection for data-access indirection, which isn’t an obvious win, and makes subclassing a good deal uglier since you need to subclass both handle and body types.

    Could you expand on where you see the relative merits here?

  10. @Mike: For polymorphic classes your approach is an alternative. But for value classes (types being copyable and movable) a base class could introduce slicing problems. Instead of copying you would have to clone your value type. Hence normal copy semantics are impossible because of this implementation detail. The idea is to preserve the semantics of the class and only touch the internals in the private section of the class. Even in the case of a class hierarchy pimpling is sometimes preferred since it’s less code to write. Think about it: You have to create an interface class and mirror the functions there in the implementation class. This is not necessary when pimpling and only those functions are virtual which really need to be virtual giving the reader of your code an idea of intent. He or she can then know which functions may be customized.

  11. @Herb: regarding the using-directive — unless we’re talking about strictly scoped uses (e.g., within a function scope; but not within (still dangerous) namespace level), with time I’m becoming less and less inclined to agree with this. If anything, nowadays I find myself more inclined toward using-declaration — and even that only under extremely limited, specific circumstances (i.e., strictly for the idiomatic ADL pull-in technique: as in std::swap, std::begin, std::end, etc.).

    I’m talking here not solely from the legalistic perspective (“what is legal” / allowed by the standard — sure, no problems here), but also from the good software engineering principles perspective (what is right/maintainable/extensible — and I do see some problems here; let me know what you think about these).

    For instance, we all agree that it has no place in the header files. However, if we do, then this itself has implications that (IMHO) go further:

    1. Readability — this may vary depending on a domain, etc., I’m a heavy user of Boost libraries, which use the same naming convention as the C++ Standard Library. In practice, symbol named “vector” may correspond to around dozen different names (and it’s not unusual at all to use standard C++ vector, Boost.MPL vector and, say, Boost.uBLAS vector, in one codebase). A single misplaced using-directive will make name clashes unavoidable and not always trivial to track down (diagnostic messages for this can be “funny” at times).

    Yes, I’m very careful about coming up with distinct, maximally self-explanatory variable (and typedefs!) names (some would perhaps say to the point of being obsessive about it), yes, the context helps, no, I don’t think that’s enough (especially given that programming is primarily a social activity and we have to take into account that other team members coming from different domains may find different contexts “obvious” — and other “arcane”).

    If taken to extreme, this may imply having to define (sub)library-specific prefixed names, as in pre-namespaces Qt (e.g., QVector) — isn’t that a problem that namespaces are precisely aiming to solve?

    2. Complexity — effectively, encouraging using-directive for implementation files amounts to encouraging to have two different naming conventions for cpp and hpp/ipp files — multiple naming conventions raise overall coding style complexity, while one naming convention brings the advantage of a consistent, uniform style that can be used throughout the entire team/organization (independently of member/team-specific domains).

    3. Maintenance & refactoring — again, maintaining different naming conventions for cpp and hpp/ipp files means having to change (rename) the unqualified names each time we’re moving code from implementation (or specific application) to the interface (or general library); this makes the code less refactoring-friendly (again, YMMV, but another reason I may be biased here is that I routinely see “write a reusable library” as the preferred way to solve programming problems; anything that distracts from this ultimately does more harm than good).

    I do recall Item 59 stating “which would be onerous, and frankly people just won’t put up with it.” However, it seems that (mostly) header-only libraries, like Boost, have empirically demonstrated that people do put up with it after all. Yes, it is a few extra keystrokes, but personally I find the using-directive readability, complexity, maintenance, and refactoring costs to be more onerous. Given that most code is read/maintained more often than it’s written from scratch, isn’t optimizing the no. of keystrokes a classic case of premature optimization (on the suboptimal metric at that)?

    Let me know if I’m wrong, perhaps I am missing something?

  12. @mttpd In most of my production code I don’t use using namespace. However, sometimes I do and I find my code more readable because of it. Examples are using namespace std::chrono or some Boost namespaces which are tedious to spell out all the time. Sometimes it is idiomatic to write

    using namespace std;
    swap( x, y );
    

    or similar code as you already noted yourself. Name clashes can be avoided because it’s YOUR code and you can change it if it becomes necessary. I don’t believe it makes sense to produce another guideline to avoid using namespace in cpp files. Sometimes it is better to do so, sometimes it is not. Use your programmer gut feeling and don’t sweat the small stuff. After all it’s a matter of taste.

  13. @mttpd

    I regularly do what you argue against. I wish there was a file-scope using directive so I could make the header consistent with the implementation file. I do find the need to be explicit, in headers, frustrating, because I’d much rather use shorter names when there’s no ambiguity.

    I use typedefs in headers (mostly in class definitions), namespace aliases, and using declarations along with using directives, so you’d probably dislike my code as a result. Despite all of that, however, I don’t see that this approach increases complexity. I’ll grant that one must examine context to know, with certainty, which vector some code means when it isn’t scope resolved, and I do find myself forgetting to switch to fully qualified names in headers when I copy a type from the implementation file. Nevertheless, I’d hate to encourage or, worse still, require full qualification in all or most contexts. After all, namespaces are given long names, in many cases, specifically to reduce conflicts and because there are tools to obviate the line noise.

    Speaking of line noise, you noted that code is read more than written, so one shouldn’t be concerned with typing extra characters. That isn’t my reason for avoiding the fully qualified names. The extra characters on a line are usually not needed to increase code clarity. Indeed, I find that they normally detract from readability. You invoked Boost as disclaiming Herb’s assertion that “people just won’t put up with” the long names. The problem is that Boost must use fully qualified names in its headers, but in no way does that impose the same on Boost users. (I use many Boost libraries and use all of the tools at my disposal to shorten names when possible.)
