Archive for the ‘GotW’ Category

GotW #7a: Minimizing Compile-Time Dependencies, Part 1

Managing dependencies well is an essential part of writing solid code. C++ supports two powerful methods of abstraction: object-oriented programming and generic programming. Both of these are fundamentally tools to help manage dependencies, and therefore manage complexity. It’s telling that all of the common OO/generic buzzwords—including encapsulation, polymorphism, and type independence—along with most design patterns, are really about describing ways to manage complexity within a software system by managing the code’s interdependencies.

When we talk about dependencies, we usually think of run-time dependencies like class interactions. In this Item, we will focus instead on how to analyze and manage compile-time dependencies. As a first step, try to identify (and root out) unnecessary headers.

Problem

JG Question

1. For a function or a class, what is the difference between a forward declaration and a definition?

Guru Question

2. Many programmers habitually #include many more headers than necessary. Unfortunately, doing so can seriously degrade build times, especially when a popular header file includes too many other headers.

In the following header file, what #include directives could be immediately removed without ill effect? You may not make any changes other than removing or rewriting (including replacing) #include directives. Note that the comments are important.

//  x.h: original header
//
#include <iostream>
#include <ostream>
#include <list>

// None of A, B, C, D or E are templates.
// Only A and C have virtual functions.
#include "a.h"  // class A
#include "b.h"  // class B
#include "c.h"  // class C
#include "d.h"  // class D
#include "e.h"  // class E

class X : public A, private B {
public:
       X( const C& );
    B  f( int, char* );
    C  f( int, C );
    C& g( B );
    E  h( E );
    virtual std::ostream& print( std::ostream& ) const;

private:
    std::list<C> clist;
    D            d_;
};

std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}

Solution

1. For a function or class, what is the difference between a forward declaration and a definition?

A forward declaration of a (possibly templated) function or class simply introduces a name. For example:

class widget;  // "widget" names a class 

widget* p;     // ok: allocates sizeof(widget*) space typed as widget*

widget  w;     // error: wait, what? how big is that? does it have a
               //        default constructor?

Again, a forward declaration only introduces a name. It lets you do things that require only the name, such as declaring a pointer to it—all pointers to objects are the same size and have the same set of operations you can perform on them, and ditto for pointers to nonmember functions, so the name is all you need to make a strongly-typed and fully-usable variable that’s a pointer to class or pointer to function.

What a class forward declaration does not do is tell you anything about what you can do with the type itself, such as what constructors or member functions it has or how big it is if you want to allocate space for one. If you try to create a widget w; with only the above code, you’ll get a compile-time error because widget has no definition yet and so the compiler can’t know how much space to allocate or what functions the type has (including whether it has a default constructor).

A class definition has a body and lets you know the class’s size and know the names and types of its members:

class widget { // "{" means definition
    widget();
    // ...
};

widget* p;     // ok: allocs sizeof(ptr) space typed as widget*

widget  w;     // ok: allocs sizeof(widget) space typed as widget
               //     and calls default constructor

2. In the following header file, what #include directives could be immediately removed without ill effect?

Of the first two standard headers mentioned in x.h, one can be immediately removed because it’s not needed at all, and the second can be replaced with a smaller header:

1. Remove iostream.

#include <iostream>

Many programmers #include <iostream> purely out of habit as soon as they see anything resembling a stream nearby. Class X does make use of streams, that’s true; but it doesn’t mention anything specifically from iostream, which mainly declares the standard stream objects such as cout. At most, X needs <ostream> alone, for its basic_ostream type, and even that can be whittled down as we will see.

Guideline: Never #include unnecessary header files.

2. Replace ostream with iosfwd.

#include <ostream>

Parameter and return types only need to be forward-declared, so instead of the full definition of ostream we really only need its forward declaration.

However, you can’t write the forward declaration yourself as class ostream;. First, ostream lives in namespace std, and you aren’t allowed to redeclare existing standard types and objects there. Second, ostream is an alias for basic_ostream<char>, which you couldn’t reliably forward-declare even if you were allowed to, because library implementations may add their own extra template parameters beyond those required by the standard, which of course your code wouldn’t know about. That is one of the primary reasons for the rule that programmers aren’t allowed to write their own declarations for things in namespace std.

All is not lost, though: The standard library helpfully provides the header iosfwd, which contains forward declarations for all of the stream templates and their standard aliases, including basic_ostream and ostream. So all we need to do is replace #include <ostream> with #include <iosfwd>.

Guideline: Prefer to #include <iosfwd> when a forward declaration of a stream will suffice.

Incidentally, once you see iosfwd, you might think that the same trick would work for other standard library templates such as string and list. There are, however, no comparable “stringfwd” or “listfwd” standard headers. The iosfwd header was created to give streams special treatment for backwards compatibility, to avoid breaking code written in years past for the “old” non-templated version of the iostreams subsystem. It is hoped that a real solution will come in a future version of C++ that supports modules, but that’s a topic for a later time.

There, that was easy. We can now move on to…

… what? “Not so fast!” I hear some of you say. “This header does a lot more with ostream than just mention it as a parameter or return type. The inlined operator<< actually uses an ostream object! So it must need ostream’s definition, right?”

That’s a reasonable question. Happily, the answer is: No, it doesn’t. Consider again the function in question:

std::ostream& operator<<( std::ostream& os, const X& x ) {
    return x.print(os);
}

This function mentions an ostream& as both a parameter and a return type, which most people know doesn’t require a definition. It also passes its ostream& parameter in turn as a parameter to another function, which many people don’t know doesn’t require a definition either—it’s the same as if it were a pointer, ostream*, as discussed above. As long as that’s all we’re doing with the ostream&, there’s no need for the full ostream definition. We’re not really using an ostream object itself at all, such as by calling member functions on it; we’re only using a reference to a type, for which we only need to know the name. Of course, we would need the full definition if we did try to call any member functions, but we’re not doing anything like that here.
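
To see this point in isolation, here is a minimal sketch using a hypothetical class gadget; only the name is ever needed, even though the reference is passed along to another function:

class gadget;               // forward declaration: just a name

void use( gadget& );        // ok: parameter type needs only the name

gadget& relay( gadget& g ) {
    use( g );               // ok: passing the reference along
    return g;               // ok: still no definition of gadget required
}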

So, as I was saying, we can now move on to get rid of one of the other headers, but only one just yet:

3. Replace e.h with a forward declaration.

#include "e.h"  // class E

Class E is mentioned only as a parameter type and a return type, in E h( E ), so no definition is required. In fact, x.h shouldn’t be pulling in e.h in the first place: the caller couldn’t even be calling this function if he didn’t already have the definition of E, so there’s no point in including it again. (Note that this would not be true if E were only a return type, such as if the signature were E h();, because in that case it’s good style to include E’s definition for the caller’s convenience so he can easily write code like auto val = x.h();.) All we need to do is replace #include "e.h" with class E;.

Guideline: Never #include a header when a forward declaration will suffice.

That’s it.
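
For reference, here is a sketch of x.h after all three changes; only the #include lines and the new forward declaration differ from the original:

//  x.h: after removing <iostream>, replacing <ostream> with <iosfwd>,
//       and forward-declaring E
//
#include <iosfwd>
#include <list>

// None of A, B, C, D or E are templates.
// Only A and C have virtual functions.
#include "a.h"  // class A
#include "b.h"  // class B
#include "c.h"  // class C
#include "d.h"  // class D
class E;        // forward declaration replaces "e.h"

// ... class X and operator<< exactly as before ...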

You may be wondering why we can’t get rid of the other headers yet. It’s because defining class X requires knowing X’s size, so that the compiler knows how much space to allocate for an X object, and knowing X’s size requires knowing at least the size of every base class and data member. So we need the definitions of A and B because they are base classes, and we need the definitions of list, C, and D because they are used to define the data members. How we can begin to address some of these is the subject of Part 2…

 

Acknowledgments

Thanks to the following for their feedback to improve this article: Gennaro, Sebastien Redl, Emmanuel Thivierge.


Toward correct-by-default, efficient-by-default, and pitfall-free-by-default variable declarations, using “AAA style”… where “triple-A” is both a mnemonic and an evaluation of its value.

Problem

JG Questions

1. What does this code do? What would be a good name for some_function?

template<class Container, class Value>
void some_function( Container& c, const Value& v ) {
    if( find(begin(c), end(c), v) == end(c) )
        c.emplace_back(v); 
    assert( !c.empty() );
}

2. What does “write code against interfaces, not implementations” mean, and why is it generally beneficial?

Guru Questions

3. What are some popular concerns about using auto to declare variables? Are they valid? Discuss.

4. When declaring a new local variable x, what advantages are there to declaring it using auto and one of the two following syntaxes:

(a) auto x = init; when you don’t need to commit to a specific type? (Note: The expression init might include calling a helper that performs partial type adjustment, such as as_signed, while still not committing to a specific type.)

(b) auto x = type{ init }; when you do want to commit to a specific type by naming a type?

List as many as you can. (Hint: Look back to GotW #93.)

5. Explain how using the style suggested in #4 is consistent with, or actively leverages, the following other C++ features:

(a) Heap allocation syntax.

(b) Literal suffixes, including user-defined literal operators.

(c) Named lambda syntax.

(d) Function declarations.

(e) Template alias declarations.

6. Are there any cases where it is not possible to use the style in #4 to declare all local variables?

Solution

1. What does this code do? What would be a good name for some_function?

template<class Container, class Value>
void append_unique( Container& c, const Value& v ) {
    if( find(begin(c), end(c), v) == end(c) )
        c.emplace_back(v); 
    assert( !c.empty() );
}

Let’s call this function append_unique. First, it checks to see whether the value v is already in the container. If not, it appends it at the end. Finally, it asserts that c is not empty, since by now it must contain at least one copy of the value v.
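
For instance, here is a quick usage sketch (assuming the append_unique template above is in scope, along with <vector>, <string>, and <cassert>; the container and values are illustrative):

std::vector<std::string> names;
append_unique( names, "alice" );     // appends: "alice" isn't there yet
append_unique( names, "alice" );     // no-op: "alice" is already present
assert( names.size() == 1 );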

You probably thought this question was fairly easy.

Maybe too easy.

If so, good. That’s the point of the example. Hold the thought, and we’ll come back to this in Question 3.

2. What does “write code against interfaces, not implementations” mean, and why is it generally beneficial?

It means we should care principally about “what,” not “how.” This separation of concerns applies at all levels in high-quality modern software—hiding code, hiding data, and hiding type. Each increases encapsulation and reduces coupling, which are essential for large-scale and robust software.

Please indulge a little repetition in the following paragraphs. It’s there to make a point about similarity.

Hiding code. With the invention of separately compiled functions and structured programming, we gained “encapsulation to hide code.” The caller knows the signature only—the function’s internal code is not his concern and not accessible programmatically, even if the function is inline and the body happens to be visible in source code. We try hard not to inadvertently leak implementation details, such as internal data structure types. The point is that the caller does not, and should not, commit to knowledge of the current internal code; if he did, it would create interdependencies and make separately compiled libraries impossible.

Hiding data (and code). With object oriented styles (OO), we gained two new manifestations of this separation. First, we got “more encapsulation to hide both code and data.” The caller knows the class name, bases, and member function signatures only—the class’s internal data and internal code are hidden and not accessible programmatically, even though the private class members are lexically visible in the class definition and inline function bodies may also be visible. (In turn, dynamic libraries and the potential future-C++ modules work aim to accomplish the same thing at a still larger scale.) Again we try hard not to inadvertently leak implementation details, and again the point is that the caller does not, and should not, commit to knowledge of the current internal data or code, which would make the class difficult to ever change or to ship on its own as a library.

Hiding type (run-time polymorphism). Second, OO also gave us “separation of interfaces to hide type.” A base class or interface can delegate work to a concrete derived implementation via virtual functions. Now the interface the caller sees and the implementation are actually different types, and the caller knows the base type only—he doesn’t know or care about the concrete type, including even its size. The point, once again, is that the caller does not, and should not, commit to a single concrete type, which would make the caller’s code less general and less able to be reused with new types.

Hiding type (compile-time polymorphism). With templates, we gained a new compile-time form of this separation—and it’s still “separation of interfaces to hide type.” The caller knows an ad-hoc “duck typed” set of operations he wants to perform using a type, and any type that supports those operations will do just fine. The contemplated future C++ concepts feature will allow making this stricter and less ad-hoc, but still avoids committing to a concrete type at all. The whole point is still that the caller does not, and should not, commit to a single concrete type, which would make the caller’s code less generic and less able to be reused with new types.

3. What are some popular concerns about using auto to declare variables? Are they valid? Discuss.

In many languages, not just C++, there are several reasons people commonly give for why they are reluctant to use auto to declare variables (or the equivalent in another language, such as var or let). We could summarize them as: laziness, commitment, and readability. Let’s take them in order.

Laziness and commitment

First, laziness: One common concern is that “writing auto to declare a variable is primarily about saving typing.” However, this is just a misunderstanding of auto. As we saw in GotW #92 and #93 and will see again below, the main reasons to declare variables using auto are for correctness, performance, maintainability, and robustness—and, yes, convenience, but that’s in last place on the list.

Guideline: Remember that preferring auto variables is motivated primarily by correctness, performance, maintainability, and robustness—and only lastly about typing convenience.

Second, commitment: “But in some cases I do want to commit to a specific type, not automatically deduce it, so I can’t use auto.” It’s true that sometimes you do want to commit to a specific type, but you can still use auto. As demonstrated in GotW #92 and #93, not only can you still write declarations of the form auto x = type{ init }; (instead of type x{init};) to commit to a specific type, but there are good reasons for doing so, such as that saying auto means you can’t possibly forget to initialize the variable.

Guideline: Consider declaring local variables auto x = type{ expr }; when you do want to explicitly commit to a type. It is self-documenting to show that the code is explicitly requesting a conversion, it guarantees the variable will be initialized, and it won’t allow an accidental implicit narrowing conversion. Only when you do want explicit narrowing, use ( ) instead of { }.

(Un)readability?

The third and most common argument concerns readability: “My code gets unreadable quickly when I don’t know what exact type my variable is without hunting around to see what that function or expression returns, so I can’t just use auto all the time.” There is truth to this, including losing the ability to search for occurrences of specific types when using the non-typed syntax auto x = expr; in 4(a) below, so this appears at first to be a strong argument. And it’s true that any feature can be overused. However, I think this argument is actually weaker than it first seems for four reasons, two minor and two major.

The two minor counterarguments are:

  • The “can’t use auto” part isn’t actually true, because as we just saw above you can be explicit about your type and still use auto, with good benefit.
  • The argument doesn’t apply when you’re using an IDE, because you can always tell the exact type, for example by hovering over the variable. Granted, this mitigation goes away when you leave the IDE, such as if you print the code.

But we should focus on the two major counterarguments:

  • It reflects a bias to code against implementations, not interfaces. Overcommitting to explicit types makes code less generic and more interdependent, and therefore more brittle and limited. It runs counter to the excellent reasons to “write code against interfaces, not implementations” we saw in Question 2.
  • We (meaning you) already ignore actual types all the time…

“… Wait, what? I do not ignore types all the time,” someone might say. Actually, not only do you do it, but you’re so comfortable and cavalier about it that you may not even realize you’re doing it. Let’s go back to that code in Question 1:

template<class Container, class Value>
void append_unique( Container& c, const Value& v ) {
    if( find(begin(c), end(c), v) == end(c) )
        c.emplace_back(v); 
    assert( !c.empty() );
}

Quick quiz: How many specific types are mentioned in that function? Name as many as you can.

Take a moment to consider that before reading on…

… We can see pretty quickly that the answer is a nice round number: Zero. Zilch. (Pedantic mode: Yes, there’s void, but I’m going to declare that void doesn’t count, because it’s there to denote “no type”; it’s not a meaningful type.)

Not a single specific type appears anywhere in this code, and the lack of exact types makes it much more powerful and doesn’t significantly harm its readability. Like most people, you probably thought Question 1 felt “easy” when we did it in isolation. Granted, this is generic code, and not all your code will be templates—but the point is that the code isn’t unreadable even though it doesn’t mention specific types, and in fact auto gives you the ability to write generic code even when not writing a template.

So starting with the cases illustrated in this short example, let’s consider some places where we routinely ignore exact types. First, function template parameters:

  • What exact type is Container? We have no idea, and that’s great… anything we can call begin, end, emplace_back and empty on and otherwise use as needed by this code will do just fine. In fact, we’re glad we don’t know anything about the exact type, because it means we’re following the Open/Closed Principle and staying open for extension—this append_unique will work fine with a type that won’t be written until years from now. Interestingly, the concepts feature currently being proposed for ISO C++ to express template parameter constraints doesn’t change how this works at all; it only makes it more convenient to express and check the requirements. Note how much more powerful this is compared to OO-style frameworks: In OO frameworks where containers have to inherit from a base class or interface, that’s already inducing coupling and limiting the ability to just plug in and use arbitrary suitable types. It is important that we can know nothing at all about the type here besides its necessary interface, not even restricting it by as much as limiting it to types in a particular inheritance hierarchy. We should strongly resist compromising this wonderful and powerful “strictly typed but loosely coupled” genericity.
  • What exact type is Value? Again, we don’t know, and we don’t want to know… anything we can pass to find and emplace_back is just dandy. At this point some of you may be thinking: “Oh yes we know what type it is, it’s the container’s value type!” No, it doesn’t have to be that; it just has to be convertible, and that’s important. For example, we want vector<string> vec; append_unique(vec, "xyzzy"); to work, and "xyzzy" is a const char[6], not a string.

Second, function return values:

  • What type does find return? Some iterator type, the same as begin(c) coughed up, but we don’t know specifically what type it is just from reading this code, and it doesn’t matter. We can look up the signature if we’re feeling really curious, but nobody bothers doing that because anything that’s comparable to end(c) will do.
  • What type does empty return? We don’t even think twice about it. Something testable like a bool… we don’t care much what exactly as long as we can “not” it.

Third, many function parameters:

  • What specific type does emplace_back take? Don’t know; might be the same as v, might not. Really don’t care. Can we pass v to it? Yes? Groovy.

And that’s just in this example. We routinely and desirably ignore types in many other places, such as:

  • Fourth, any temporary object: We never get to name the object, much less name its type, and we may know what the type is but we don’t care about actually spelling out either name in our code.
  • Fifth, any use of a base class: We don’t know the dynamic concrete type we’re actually using, and that’s a benefit, not a bug.
  • Sixth, any call to a virtual function: Ditto; and on top of that, the virtual function’s return type could itself be covariant, adding another layer of “we don’t know the dynamic concrete type,” since in the presence of covariance we don’t know what type we’re actually getting back.
  • Seventh, any use of function<>, bind, or other type erasure: Just think about how little we actually know, and how happy it makes us. For example, given a function<int(string)>, not only don’t we know what specific function or object it’s bound to, we don’t even know that thing’s signature. It might not actually take a string or return an int at all, because conversions are allowed in both directions: it only has to take something a string can be converted to, and return something that can be converted to an int (see the sketch just after this list). All we know is that it’s something we can invoke with a string and that gives us back something we can use as an int. Ignorance is bliss.
  • Eighth, any use of a C++14 generic lambda function: A generic lambda just means the function call operator is a template, after all, and like any function template it gets stamped out individually for whatever actual argument types you pass each time you use it.

There are probably more.
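
Here is a small sketch of the function<int(string)> point above; the names and types are purely illustrative:

#include <functional>
#include <string>

struct text { text( const std::string& ) { } };   // constructible from string

long word_count( text ) { return 42; }             // takes text, returns long

int demo() {
    // Neither the parameter nor the return type matches exactly,
    // yet the target is callable through function<int(string)>:
    std::function<int(std::string)> f = word_count;
    return f( "hello" );   // const char* -> string -> text in, long -> int out
}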

Although lack of commitment may be a bad thing in other areas of life, not committing to a specific type is often desirable by default in reusable code.

4. When declaring a new local variable x, what advantages are there to declaring it using auto and one of the two following syntaxes:

Let’s consider the base case first, which has by far the strongest arguments in its favor and is gaining quite a bit of traction in the C++ community.

(a) auto x = init; when you don’t need to commit to a specific type?

GotW #93 offered many concrete examples to support habitually declaring local variables using auto x = expr; when you don’t need to explicitly commit to a type. The advantages include:

  • It guarantees the variable will be initialized. Uninitialized variables are impossible because once you start by saying auto the = is required and cannot be forgotten.
  • It is efficient by default and guarantees that no implicit conversions (including narrowing conversions), temporary objects, or wrapper indirections will occur. In particular, prefer using auto instead of function<> to name lambdas unless you need the type erasure and indirection.
  • It guarantees that you will use the correct exact type now.
  • It guarantees that you will continue to use the correct exact type under maintenance as the code changes, and the variable’s type automatically tracks other functions’ and expressions’ types unless you explicitly said otherwise.
  • It is the simplest way to portably spell the implementation-specific type of arithmetic operations on built-in types, which vary by platform, and ensure that you cannot accidentally get lossy narrowing conversions when storing the result.
  • It is the only good option for hard-to-spell and impossible-to-spell types such as lambdas, binders, detail:: helpers, and template helpers (including expression templates when they should stay unevaluated for performance), short of resorting to repetitive decltype expressions or more-expensive indirections like function<>.
  • It is more symmetric and consistent with other parts of modern C++ (see Question 5).
  • And yes, it is just generally simpler and less typing.

See GotW #93 for concrete examples of these cases, where using auto helps eliminate correctness bugs, performance bugs, and silently nonportable code.

As noted in the questions, the expression init might include calling a helper that performs partial type adjustment, such as as_signed, while still not committing to a specific type. As shown in GotW #93, prefer to use auto x = as_signed(integer_expr); or auto x = as_unsigned(integer_expr); to store the result of an integer computation that should be signed or unsigned—these should be viewed as “casts that preserve width,” so we are not casting to a specific type but rather casting an attribute of the type while correctly preserving the other basic characteristics of the type, notably by not forcing it to commit to a particular size.

Using auto together with as_signed or as_unsigned makes code more portable: the variable will both be large enough (thanks to auto) and preserve the required signedness on all platforms. Note that signed/unsigned conversions within integer_expr may still occur and so you may need additional finer-grained as_signed/as_unsigned casts within the expression for full portability.
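
The as_signed and as_unsigned helpers themselves are defined in GotW #93’s endnotes, which aren’t reproduced here; a minimal sketch of what such helpers could look like (my assumption, not the article’s exact definition) is:

#include <type_traits>

// Sketch: flip only the signedness of an integral value's (promoted) type,
// preserving its width, rather than casting to a specific fixed-size type.
template<class T>
constexpr std::make_signed_t<T> as_signed( T t ) {
    return static_cast<std::make_signed_t<T>>(t);
}

template<class T>
constexpr std::make_unsigned_t<T> as_unsigned( T t ) {
    return static_cast<std::make_unsigned_t<T>>(t);
}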

(b) auto x = type{ init }; when you do want to commit to a specific type by naming a type?

This is the explicitly typed form, and it still has advantages, but they are not as clear-cut as those of the implicitly typed form. The jury is still out on whether to recommend this one wholesale, as we’re still trying it out, but it does offer real benefits, and I suggest you try it for a while and see whether it works well for you.

So here’s the recommendation to consider trying out for yourself: Consider declaring local variables auto x = type{ expr }; when you do want to explicitly commit to a type. (Only when you do want to allow explicit narrowing, use ( ) instead of { }.) The advantages of this typed auto declaration style include:

  • It guarantees the variable will be initialized; you can’t forget.
  • It is self-documenting to show that the code is explicitly requesting a conversion.
  • It won’t allow an accidental implicit narrowing conversion.
  • It is more symmetric and consistent, both with the basic auto x = init; form and with other parts of C++…

… which brings us to Question 5.

5. Explain how using the style suggested in #4 is consistent with, or actively leverages, the following other C++ features:

Let’s start off this question with some side-by-side examples that give us a taste of the symmetry we gain when we habitually declare variables using modern auto style. Starting with two examples where we don’t need to commit to a type and then two where we do, we see that the right-hand style is not only more robust and maintainable for the reasons already given (for example, can you spot a subtle difference in the type of s, where the auto style is more correct?), but also arguably cleaner and more regular with the type consistently on the right when it is mentioned:

// Classic C++ declaration order     // Modern C++ style

const char* s = "Hello";             auto s = "Hello";
widget w = get_widget();             auto w = get_widget();

employee e{ empid };                 auto e = employee{ empid };
widget w{ 12, 34 };                  auto w = widget{ 12, 34 };

Now consider the (dare we say elegant) symmetry with each of the following.

(a) Heap allocation syntax.

When allocating heap variables, did you notice that the type name is already on the right naturally anyway? And since it’s there, we don’t want to have to repeat it. (I’ll show the raw “new” form for completeness, but prefer make_unique and make_shared in that order for allocation in modern code, resorting to raw new only well-encapsulated inside the implementation of low-level data structures.)

// Classic C++ declaration order     // Modern C++ style

widget* w = new widget{};            /* auto w = new widget{}; */
unique_ptr<widget> w                 auto w = make_unique<widget>();
  = make_unique<widget>();

(b) Literal suffixes, including user-defined literal operators.

The auto declaration style doesn’t merely work naturally with built-in literal suffixes like ul for unsigned long and with user-defined literals, including the standard ones now in draft C++14; it actively encourages using them:

// Classic C++ declaration order     // Modern C++ style

int x = 42;                          auto x = 42;
float x = 42.;                       auto x = 42.f;
unsigned long x = 42;                auto x = 42ul;
std::string x = "42";                auto x = "42"s;   // C++14
chrono::nanoseconds x{ 42 };         auto x = 42ns;    // C++14

Based on the examples so far, which do you think is more regular? But wait, there’s more…

(c) Named lambda syntax.
(d) Function declarations.

Lambdas have unutterable types, and auto is the best way to capture them exactly and efficiently. But because their declarations are now so similar, let’s consider lambdas and (other) functions together, and in the last two lines of this example also use C++14 return type deduction:

// Classic C++ declaration order     // Modern C++ style

int f( double );                     auto f (double) -> int;
…                                    auto f (double) { /*...*/ };
…                                    auto f = [=](double) { /*...*/ };

(e) Template alias declarations.

Modern C++ frees us from the tyranny of un-template-able typedef:

// Classic C++ workaround            // Modern C++ style

typedef set<string> dict;            using dict = set<string>;

template<class T> struct myvec {     template<class T>
  typedef vector<T,myalloc> type;    using myvec = vector<T,myalloc>;
};

An observation

Have you noticed that the C++ world is moving to a left-to-right declaration style everywhere, of the form

category name = type and/or initializer ;

where “category” can be auto or using?

Take a moment to re-skim the two columns of examples above. Even ignoring correctness and performance advantages, do you find the right-hand column to be the more consistent, and the more readable?

6. Are there any cases where it is not possible to use the style in #4 to declare all local variables?

There is one case I know of where this style cannot be followed, and it applies to the type-specific auto x = type{ init }; form. In that form, type has to be moveable (even though the move operation will be routinely elided by compilers), so these won’t work:

auto lock = lock_guard<mutex>{ m };  // error, not moveable
auto ai   = atomic<int>{};           // error, not moveable

(Aside: For at least some of these cases, an argument could be made that this is actually more of a defect in the type itself, in particular that perhaps atomic<int> should be moveable.)
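
In those cases the natural fallback is the classic direct-initialization form, which still initializes the variable even though it gives up the left-to-right auto symmetry:

lock_guard<mutex> lock{ m };   // ok: constructed in place, no move required
atomic<int>       ai{};        // ok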

Having said that, there are three other cases I know of that you might encounter that may at first look like they don’t work with this auto style, but actually do. Let’s consider those for completeness.

First, the basic form auto x = init; will exactly capture an initializer_list or a proxy type, such as an expression template. This is a feature, not a bug, because you have a convenient way to spell both “capture the list or proxy” and “resolve the computation,” depending on which you mean, and the default syntax goes to the more efficient one: If you want to efficiently capture the list or proxy, use the basic form, which gives you performance by default; if you mean to force a proxy to resolve the computation, specify the explicit type to ask for the conversion you want. For example:

auto i1 = { 1 };                       // initializer_list<int>
auto i2 = 1;                           // int

auto a = matrix{...}, b = matrix{...}; // some type that does lazy eval
auto ab = a * b;                       // to capture the lazy-eval proxy
auto c = matrix{ a * b };              // to force computation

Second, here is a rare case that you may discover now that we have auto: Due to the mechanics of the C++ grammar, you can’t legally write a multi-word type like long long or class widget in the place where type goes in the auto x = type{ init }; form. However, note that this affects only those two cases:

  • The multi-word built-in types like long long, where you’re better off anyway writing a known-width type alias or using a literal.
  • Elaborated type specifiers like class widget, where the “class” part is already redundant. The “class widget” syntax is allowed as a compatibility holdover from C, which liked seeing struct widget everywhere unless you typedef’d the struct part away.

So just avoid the multi-word form and use the better alternative instead:

auto x = long long{ 42 };            // error
auto x = int64_t{ 42 };              // ok, better 
auto x = 42LL;                       // ok, better 

auto y = class X{1,2,3};             // error
auto y = X{1,2,3};                   // ok

Summary

We already ignore explicit and exact types much of the time, including with temporary objects, virtual functions, templates, and more. This is a feature, not a bug, because it makes our code less tightly coupled, and more generic, flexible, reusable, and future-proof.

Declaring variables using auto, whether or not we want to commit to a type, offers advantages for correctness, performance, maintainability, and robustness, as well as typing convenience. Furthermore, it is an example of how the C++ world is moving to a left-to-right declaration style everywhere, of the form

category name = type and/or initializer ;

where “category” can be auto or using, and we can get not only correctness and performance but also consistency benefits by using the style to consistently declare local variables (including using literals and user-defined literals), function declarations, named lambdas, aliases, template aliases, and more.

Acknowledgments

Thanks in particular to Scott Meyers and Andrei Alexandrescu for their time and insights in reviewing and discussing drafts of this material. Both helped generate candidate names for this idiom; it was Alexandrescu who suggested the name “AAA (almost always auto)” which I merged with the best names I’d thought of to that point (“auto style” or “auto (+type) style”) to get “AAA Style (almost always auto).” Thanks also to the following for their feedback to improve this article: Adrian, avjewe, mttpd, ned, zadecn, noniussenior, Marcel Wid, J Guy Davidson, Mark Garcia, Jonathan Wakely.


Why prefer declaring variables using auto? Let us count some of the reasons why…

Problem

JG Question

1. In the following code, what actual or potential pitfalls exist in each labeled piece of code? Which of these pitfalls would using auto variable declarations fix, and why or why not?

// (a)
void traverser( const vector<int>& v ) {
    for( vector<int>::iterator i = begin(v); i != end(v); i += 2 )
        // ...
}

// (b)
vector<int> v1(5);
vector<int> v2 = 5;

// (c)
gadget get_gadget();
// ...
widget w = get_gadget();

// (d)
function<void(vector<int>)> get_size
    = [](const vector<int>& x) { return x.size(); };

Guru Question

2. Same question, subtler examples: In the following code, what actual or potential pitfalls exist in each labeled piece of code? Which of these pitfalls would using auto variable declarations fix, and why or why not?

// (a)
widget w;

// (b)
vector<string> v;
int size = v.size();

// (c) x and y are of some built-in integral type
int total = x + y;

// (d) x and y are of some built-in integral type
int diff = x - y;
if(diff < 0) { /*...*/ }

// (e)
int i = f(1,2,3) * 42.0;

Solution

As you worked through these cases, perhaps you noticed a pattern: The cases are mostly very different, but what they have in common is that they illustrate reason after reason motivating why (and how) to use auto to declare variables. Let’s dig in and see.

1. In the following code, what actual or potential pitfalls exist, which would using auto variable declarations fix, and why or why not?

(a) will not compile

// (a)
void traverser( const vector<int>& v ) {
    for( vector<int>::iterator i = begin(v); i != end(v); i += 2 )
        // ...
}

With (a), the most important pitfall is that the code doesn’t compile. Because v is const, you need a const_iterator. The old-school way to fix this is to write const_iterator:

vector<int>::const_iterator i = begin(v)     // ok + requires thinking

However, that requires thinking to remember, “ah, v is a reference to const, I better remember to write const_ in front of its iterator type… and take it off again if I ever change v to be a reference to non-const… and also change the “vector” part of i’s type if v is some other container type…”

Not that thinking is a bad thing, mind you, but this is really just a tax on your time when the simplest and clearest thing to write is auto:

auto i = begin(v)                           // ok, best

Using auto is not only correct and clear and simpler, but it stays correct if we change the type of the parameter to be non-const or pass some other type of container, such as if we make traverser into a template in the future.

Guideline: Prefer to declare local variables using auto x = expr; when you don’t need to explicitly commit to a type. It is simpler, guarantees that you will use the correct type, and guarantees that the type stays correct under maintenance.

Although our focus is on the variable declaration, there’s another independent bug in the code: The += 2 increment can zoom you off the end of the container. When writing a strided loop, check your iterator increment against end on each increment (best to write it once as a checked_next(i,end) helper that does it for you; see the sketch below), or use an indexed loop, something like for( auto i = 0L; i < v.size(); i += 2 ), which is more natural to write correctly.
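
The checked_next helper isn’t spelled out in the text; here is a minimal sketch of what it could look like (the name and intended use come from the paragraph above; the signature and body are an assumption):

#include <iterator>

// Sketch: advance up to n steps, but stop at 'last' instead of running past it.
template<class Iter>
Iter checked_next( Iter it, Iter last,
                   typename std::iterator_traits<Iter>::difference_type n = 1 ) {
    while( n-- > 0 && it != last )
        ++it;
    return it;
}

// usage in a strided loop:
//     for( auto i = begin(v); i != end(v); i = checked_next(i, end(v), 2) )
//         ...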

(b) and (c) rely on implicit conversions

// (b)
vector<int> v1(5);     // 1
vector<int> v2 = 5;    // 2

Line 1 performs an explicit conversion and so can call vector‘s explicit constructor that takes an initial size.

Line 2 doesn’t compile because its syntax won’t call an explicit constructor. As we saw in GotW #1, it really means “convert 5 to a temporary vector<int>, then move-construct v2 from that,” so line 2 only works for types where the conversion is not explicit.

Some people view the asymmetry between 1 and 2 as a pitfall, at least conceptually, for several reasons: First, the syntaxes are not quite the same and so learning when to use each can seem like finicky detail. Second, some people like line 2’s syntax better but have to switch to line 1 to get access to explicit constructors. Finally, with this syntax, it’s easy to forget the (5) or = 5 initializer, and then we’re into case 2(a), which we’ll get to in a moment.

If we use auto, we have a single syntax that is always obviously explicit:

auto v2 = vector<int>(5);

Next, case (c) is similar to (b):

// (c)
gadget get_gadget();
// ...
widget w = get_gadget();

This works, assuming that gadget is implicitly convertible to widget, but creates a temporary object. That’s a potential performance pitfall, as the creation of the temporary object is not at all obvious from reading the call site alone in a code review. If we can use a gadget just as well as a widget in this calling code and so don’t explicitly need to commit to the widget type, we could write the following which guarantees there is no implicit conversion because auto always deduces the basic type exactly:

// better, if you don't need an explicit type
auto w = get_gadget();

Guideline: Prefer to declare local variables using auto x = expr; when you don’t need to explicitly commit to a type. It is efficient by default and guarantees that no implicit conversions or temporary objects will occur.

By the way, if you’ve been wondering whether that “=” in auto x = expr; causes a temporary object plus a move or copy, wonder no longer: No, it constructs x directly. (See GotW #1.)

Now, what if we said widget here because we know about the conversion and really do want to deal with a widget? Then writing auto is still more self-documenting:

// better, if you do need to commit to an explicit type
auto w = widget{ get_gadget() };

Guideline: Consider declaring local variables auto x = type{ expr }; when you do want to explicitly commit to a type. It is self-documenting to show that the code is explicitly requesting a conversion.

Note that this last version technically requires a move operation, but compilers are explicitly allowed to elide that and construct w directly—and compilers routinely do that, so there is no performance penalty in practice.

(d) creates an indirection, and commits to a single type

// (d)
function<void(vector<int>)> get_size
    = [](const vector<int>& x) { return x.size(); };

Case (d) has two problems, and auto can help with both of them. (Bonus points if you noticed that a form of “auto” is actually already helping in a third way.)

First, the lambda object is converted to a function<>. That can be appropriate when passing or returning the lambda to a function, but it costs an indirection because function<> has to erase the actual type and create a wrapper around its target to hold it and invoke it. In this case, we appear to be using the lambda locally, and so the correct default way to capture it is using auto, which binds to the exact (compiler-generated and otherwise-unutterable-by-you) type of the lambda and so doesn’t incur an indirection:

// partly improved
auto get_size = [](const vector<int>& x) { return x.size(); };

Guideline: Prefer to use auto name = to name a lambda function object. Use std::function</*…*/> name = only when you need to rebind it to another target or pass it to another function that needs a std::function<>.

Second, the lambda commits to a specific argument type—it only works with vector<int>, and not with vector<double> or set<string> or anything else that is also able to report a .size(). The way to fix that is to write another auto:

// best
auto get_size = [](const auto& x) { return x.size(); };

// yes, you could use this "too cute" variation for slightly less typing
//              [](auto&& x) { return x.size(); };
// but you'll also get less const-enforcement and that isn't a good deal

This still creates just a single object, but one with a templated function call operator, so that it can be invoked with different types of arguments and will work with any type of container that supports calling .size().

Guideline: Prefer to use auto lambda parameter types. They are just as efficient as explicit parameter types, and allow you to call the same lambda with different argument types.

… and did you notice the “third auto” that was there all along? Even in the original example, we’ve been implicitly using automatic type deduction in a third place by allowing the lambda to deduce its return type, and so now with the fully generic “best” version of the code that return type will always be exactly whatever .size() returns for whatever kind of object we’re calling .size() on, which can be different for different argument types. All in all, that’s pretty nifty.

Guideline: Prefer to use implicit return type deduction for lambda functions.

2. Same question, subtler examples: In the following code, what actual or potential pitfalls exist, which would using auto variable declarations fix, and why or why not?

(a) might leave the variable uninitialized.

// (a)
widget w;

This creates an object of type widget. However, we can’t tell just looking at this line whether it’s initialized or contains garbage values. As noted in GotW #1, if widget is a built-in type or aggregate type, its members won’t get initialized. Uninitialized variables should be avoided by default, and only used deliberately in cases where you really want to start with an uninitialized memory region for performance reasons—notably when you have a large object, such as an array, that is expensive to zero-initialize and is immediately going to be overwritten anyway, such as if it’s being used as an “out” parameter.

Guideline: Always initialize variables, except only when you can prove garbage values are okay, typically because you will immediately overwrite the contents.

Would auto help here? Indeed it would:

auto w = widget{};    // guaranteed to be initialized

One of the key benefits of declaring a local variable using auto is that the “=” is required—there’s no way to declare the variable without setting an initial value. Further, this is explicit and clear just from reading the above variable declaration on its own during a code review, without having to go inquire in the type’s header about the exact details of the type and poll the neighborhood for character references who will swear it’s not now, and is even under maintenance never likely to become, an aggregate.

Guideline: Prefer to declare local variables using auto. It guarantees that you cannot accidentally leave the variable uninitialized.

(b) might perform a silent narrowing conversion.

// (b)
vector<string> v;
int size = v.size();

This will compile, run, and sometimes lose information because it uses an implicit narrowing conversion. Not the safest route to a happy weekend when the bug report from the field comes in on Friday night—normally from a large and important customer, because the bug will be exercised only with larger data sizes.

Here’s why: The return type of vector<string>::size() is vector<string>::size_type, but what’s that? It depends on your implementation, because the standard leaves it implementation-defined. But one thing I guarantee you is that “it ain’t no int“—for at least two reasons, which lead to at least two ways this can lose information by silent narrowing:

  • Sign:
    size_type is required to be an unsigned integer value, so this code is asking to convert it to a signed value. That’s bad enough even if sizeof(size_type) == sizeof(int) and it throws away the high bit—and with it the upper half of the representable values—to make room for the sign bit. It’s worse than that if sizeof(size_type) > sizeof(int), which brings us to the second problem, because that’s actually likely…
  • Size:
    size_type basically needs to be the same size as a pointer, since it may have to represent any offset in a vector<char> that is larger than half the machine’s address space. In 64-bit code, 64-bit pointers mean 64-bit size_types. However, if on the same system an int is still 32 bits for compatibility (and this is common), then size_type is bigger than int, and converting to int throws away not just the high-order bit, but over half of the bits and the vast majority of the representable values.

Of course, you won’t notice on small vectors, as long as .size() < 2^(CHAR_BIT*sizeof(int)-1). That doesn’t mean it’s not a bug; it just means it’s a latent bug.

Does auto help? Yes indeed:

auto size = v.size();    // exact type, guaranteed no narrowing

Guideline: Prefer to declare local variables using auto. It guarantees that you get the exact type and cannot accidentally get narrowing conversions.

(c), (d), and (e) have potential narrowing and signedness issues.

// (c) x and y are of some built-in integral type
int total = x + y;

In case (c), we might also have a narrowing conversion. The simplest way to see this is that if either x or y is larger than int, which is what we’re trying to store the result into, then we’ve definitely got a silent narrowing conversion here, with the same issues as already described in (b). And even if x and y are ints today, if under maintenance the type of one later changes to something like long or size_t, the code silently becomes lossy—and possibly only on some platforms, if it changes to long and that’s the same size as int on some platforms you target but larger than int on others.

Note that, even if you know the exact types of x and y, you will get different types for x+y on different platforms, particularly if one is signed and one is unsigned. If both x and y are signed, or both are unsigned, and one’s type has more bits than the other, that’s the type of the result. If one is signed and the other is unsigned then other rules kick in, and the size and signedness of the result can vary on different platforms depending on the relative actual sizes and the signedness of x and y on that platform. (This is one of the consequences of C and C++ not standardizing the sizes of the built-in types; for example, we know a long is guaranteed to be at least as big as an int, but we don’t know how many bits each is, and the answer varies by compiler and platform.)

Does auto help here? Almost always “yes,” but in one case “yes with a little help you really want to reach for anyway.”

By default, write for correctness, clarity, and portability first: To avoid lossy narrowing conversions, auto is your portability pal and you should use it by default. Writing auto is much better than writing the result type out by hand as std::common_type< decltype(x), decltype(y) >::type.

auto total = x + y;    // exact type, guaranteed no narrowing

Guideline: Prefer to declare local variables using auto. It guarantees that you get the exact type and so is the simplest way to portably spell the implementation-specific type of arithmetic operations on built-in types, which vary by platform, and ensure that you cannot accidentally get narrowing conversions when storing the result.

However, what if in rare cases this code may be in a tight loop where performance matters, and auto may select a wider type than you know you need to store all possible values? For example, in some cases performing arithmetic using uint64_t instead of uint32_t could be twice as slow. If you first prove that this actually matters using hard profiler data, and then further prove by performing other validation that you won’t (or won’t care if you do) encounter results that would lose value by narrowing, then go ahead and commit to an explicit type—but prefer to do it using the following style:

// rare cases: use auto + <cstdint> type
auto total = uint_fast64_t{ x+y };  // total is an unsigned 64-bit value
             // ^ see note [1]

// or use auto + size-preserving signed/unsigned helper [2]
auto total = as_unsigned( x+y );    // total is unsigned and size of x+y

  • Still use auto to naturally make this more self-documenting and make the code review easy, because auto syntax makes it explicit that you’re performing a conversion.
  • Use a portable sized type name from the standard <cstdint> header, because you almost certainly care about size and this makes the size portable.[1]

    Guideline: Prefer using the <cstdint> type aliases in code that cares about the size of your numeric variables. Avoid relying on what your current platform(s) happen to do.

    Guideline: Consider declaring local variables auto x = type{ expr }; when you do want to explicitly commit to a type. It is self-documenting to show that the code is explicitly requesting a conversion, and won’t allow an accidental implicit narrowing conversion. Only when you do want explicit narrowing, use ( ) instead of { }.

Case (d) is similar:

// (d) x and y are of some built-in integral type
int diff = x - y;
if(diff < 0) { /*...*/ }

This time, we’re doing a subtraction. No matter whether x and y are signed or not, putting the answer in a signed variable like this is the right thing to do—the result could be negative, after all.

However, we have two issues. The first, again, is that int may not be big enough to avoid truncating the result, so we might lose information if x – y produces something larger than an int. Using auto can help with that.

The second is that x – y might give a strange answer, which isn’t the programmer’s fault but is something you want to remember about arithmetic in C and C++. Consider this code:

unsigned long x    = 42;
signed short  y    = 43;
auto          diff = x - y;   // one actual result: 18446744073709551615
if(diff < 0) { /*...*/ }      // um, oops – branch won't be taken

“Wait, what?” you ask. On nearly all platforms, an unsigned long is bigger than a signed short, and because of the promotion rules the type of x – y, and therefore of diff, will be… unsigned long. Which is, well, not very signed. So depending on the types of x and y, and depending on your actual platform, it may be that the branch won’t be taken, which clearly isn’t the same as the original code.

Guideline: Combine signed and unsigned arithmetic carefully.

Before you say, “then I always want signed!” remember that on overflow unsigned arithmetic wraps, which can be valid for your use, whereas signed overflow is undefined behavior, which is quite unlikely to be useful. Sometimes you really need signed, and sometimes you really need unsigned, even though often you won’t care.
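
As a quick reminder of the difference, here’s a sketch (UINT_MAX and INT_MIN come from <climits>):

#include <climits>

unsigned int u = 0u;
auto wrapped = u - 1u;         // well-defined: wraps around to UINT_MAX

int i = INT_MIN;
// auto overflowed = i - 1;    // undefined behavior if evaluated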

From observing auto’s effect in case (d), it might seem like auto has helped with one problem… but was it at the expense of creating another?

Yes, on the one hand, auto did indeed help us: Using auto ensured we could write portable and correct code where the result wasn’t needlessly narrowed. If we didn’t care about signedness, which is often true, that’s quite sufficient.

On the other hand, using auto might not preserve signedness in a computation like x – y that’s supposed to return something with a sign, or it might not preserve unsignedness when that’s desirable. But this isn’t so much an issue with auto itself as a reminder that we have to be careful when combining signed and unsigned arithmetic; by binding to the exact type, auto simply exposes an issue in code that might already be nonportable, or have corner cases the developer wasn’t aware of when writing it.

So what’s a good answer? Consider using auto together with the as_signed or as_unsigned conversion helper we saw before, which is used in lieu of a cast to a specific type; the helper is written out more fully in the endnotes. [2] Then we get the best of both worlds—we don’t commit to an explicit type, but we ensure the basic size and signedness in portable code that will work as intended on many different compilers and platforms.

Guideline: Prefer to use auto x = as_signed(integer_expr); or auto x = as_unsigned(integer_expr); to store the result of an integer computation that should be signed or unsigned. Using auto together with as_signed or as_unsigned makes code more portable: the variable will both be large enough and preserve the required signedness on all platforms. (Signed/unsigned conversions within integer_expr may still occur.)
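
Applied to the earlier example, the fix might look like this (a sketch using the as_signed helper written out in note [2]; on today’s two’s-complement platforms the out-of-range unsigned value converts back to -1):

unsigned long x    = 42;
signed short  y    = 43;
auto          diff = as_signed( x - y );   // diff is signed and as wide as x - y
if(diff < 0) { /*...*/ }                   // now taken, as intended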

Finally, case (e) brings floating point into the picture:

// (e)
int i = f(1,2,3) * 42.0;

Here we have our by-now-yawnworthy-typical narrowing—and an easy case because it isn’t even hiding, it’s saying int and 42.0 right there in the same breath, which is narrowing almost regardless of what type f returns.

Does auto help? Yes, in making our code self-documenting and more reviewable, as we noted before. If we follow the auto x = type{expr}; declaration style, we would be (happily) forced to write the conversion explicitly, and when we initially use { } we get an error that in fact it’s a narrowing conversion, which we acknowledge (again explicitly) by switching to ( ):

auto i = int( f(1,2,3) * 42.0 );

This code is now free of implicit conversions, including implicit narrowing conversions. If our team’s coding style says to use auto x = expr; or auto x = type{expr}; wherever possible, then in a code review just seeing the ( ) parens can immediately connote explicit narrowing; adding a comment doesn’t hurt either.

But for floating point calculations, can using auto by itself hurt? Consider this example, contributed by Andrei Alexandrescu:

float f1 = /*...*/, f2 = /*...*/;

auto   f3 = f1 + f2;   // correct, but on some compilers/platforms...
double f4 = f1 + f2;   // ... this might keep more bits of precision

As Alexandrescu notes: “Machines are free to do intermediate calculations in a larger precision than the target, and in many cases (and traditionally in C) calculations are done in double precision. So for f3 we have a sum done in double precision, which is then truncated down to float. For f4, the sum is preserved at full precision.”

Does this mean using auto creates a potential flaw here? Not really. In the language, the type of f1 + f2 is still float, and the naked auto maintains that exact type for us. However, if we do want to follow the pattern of switching to double early in a complex computation, we can and should say so:

float f1 = /*...*/, f2 = /*...*/;

auto f5 = double{f1} + f2;

Summary

We’ve seen a number of reasons to prefer to declare variables using auto, optionally with an explicit type if you do want to commit to a specific type.

If you’ve observed a pattern in this GotW’s Guidelines, you’ll already have a sense of what’s coming in GotW #94… a Special Edition on, you guessed it, auto style.

Notes

[1] Another reason to prefer the <cstdint> typedef names is that, due to a quirk in the C++ language grammar, only a single-word type name is allowed where uint64_t appears in this example. That’s fine nearly always, because a single word is all you need for class types, for all typedef and using alias names, and for most built-in types. But you can’t directly name arrays or the multi-word built-in types like unsigned int or long long in that position; for the latter, use the uintNN_t-style typedef names instead. The exact-width ones, such as uint64_t, are “optional” in the standard, but they are in the standard and expected to be widely implemented, so I used them. The “least” and “fast” ones are required, so if you don’t have uint64_t you can use uint_least64_t or uint_fast64_t.
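
To illustrate the grammar quirk, here’s a sketch (x and y are illustrative unsigned values):

#include <cstdint>

unsigned int x = 1, y = 2;

// auto total = unsigned long long{ x+y };   // error: multi-word type name not allowed here
using ull   = unsigned long long;            // ...but an alias works...
auto total1 = ull{ x+y };
auto total2 = std::uint64_t{ x+y };          // ...as does a <cstdint> name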

[2] The helpers preserve the size of the type while changing only the signedness. Thanks to Andrei Alexandrescu for this basic idea; any errors are mine, not his. The C++98 way is to provide a set of overloads for each type, but a modern version might look something like the following, which uses the C++11 std::make_signed/std::make_unsigned facilities.

// C++11 version
//
#include <type_traits>   // for std::make_signed / std::make_unsigned

template<class T>
typename std::make_signed<T>::type as_signed(T t)
    { return typename std::make_signed<T>::type(t); }

template<class T>
typename std::make_unsigned<T>::type as_unsigned(T t)
    { return typename std::make_unsigned<T>::type(t); }

Note that with C++14 this gets even sweeter, using auto return type deduction to eliminate typename and repetition, and the _t alias to replace ::type:

// C++14 version, option 1
//
template<class T> auto as_signed  (T t){ return std::make_signed_t  <T>(t); }
template<class T> auto as_unsigned(T t){ return std::make_unsigned_t<T>(t); }

or you can equivalently write these function templates as named lambdas:

// C++14 version, option 2
//
auto as_signed   =[](auto x){ return std::make_signed_t  <decltype(x)>(x); };
auto as_unsigned =[](auto x){ return std::make_unsigned_t<decltype(x)>(x); };

Sweet, isn’t it? Once you have a compiler that supports these features, pick whichever suits your fancy.

Acknowledgments

Thanks in particular to Scott Meyers and Andrei Alexandrescu for their time and insights in reviewing and discussing drafts of this material. Thanks also to the following for their feedback to improve this article: mttpd, Jim Park, Yuri Khan, Arne, rhalbersma, Tom, Martin Ba, John, Frederic Dumont, Sebastian.


Why prefer declaring variables using auto? Let us count some of the reasons why…

 

Problem

JG Question

1. In the following code, what actual or potential pitfalls exist in each labeled piece of code? Which of these pitfalls would using auto variable declarations fix, and why or why not?

// (a)
void traverser( const vector<int>& v ) {
    for( vector<int>::iterator i = begin(v); i != end(v); i += 2 )
        // ...
}

// (b)
vector<int> v1(5);
vector<int> v2 = 5;

// (c)
gadget get_gadget();
// ...
widget w = get_gadget();

// (d)
function<void(vector<int>)> get_size
    = [](vector<int> x) { return x.size(); };

Guru Question

2. Same question, subtler examples: In the following code, what actual or potential pitfalls exist in each labeled piece of code? Which of these pitfalls would using auto variable declarations fix, and why or why not?

// (a)
widget w;

// (b)
vector<string> v;
int size = v.size();

// (c) x and y are of some built-in integral type
int total = x + y;

// (d) x and y are of some built-in integral type
int diff = x - y;
if(diff < 0) { /*...*/ }

// (e)
int i = f(1,2,3) * 42.0;


What does auto do on variable declarations, exactly? And how should we think about auto? In this GotW, we’ll start taking a look at C++’s oldest new feature.

 

Problem

JG Questions

1. What is the oldest C++11 feature? Explain.

2. What does auto mean when declaring a local variable?

Guru Questions

3. In the following code, what is the type of variables a through k, and why? Explain.

int         val = 0;
auto a = val;
auto& b = val;
const auto c = val;
const auto& d = val;

int& ir = val;
auto e = ir;

int* ip = &val;
auto f = ip;

const int ci = val;
auto g = ci;

const int& cir = val;
auto h = cir;

const int* cip = &val;
auto i = cip;

int* const ipc = &val;
auto j = ipc;

const int* const cipc = &val;
auto k = cipc;

4. In the following code, what type does auto deduce for variables a and b, and why? Explain.

int val = 0;

auto a { val };
auto b = { val };

 

Solution

1. What is the oldest C++11 feature? Explain.

auto x = something; to declare a new local variable whose type is deduced from something, and isn’t just always int.

Bjarne Stroustrup likes to point out that auto for deducing the type of local variables is the oldest feature added in the 2011 release of the C++ standard. He implemented it in C++ 28 years earlier, in 1983—which incidentally was the same year the language’s name was changed to C++ from C with Classes (the new name was unveiled publicly on January 1, 1984), and the same year Stroustrup added other fundamental features including const (later adopted by C), virtual functions, & references, and BCPL-style // comments.

Alas, Stroustrup was forced to remove auto because of compatibility concerns with C’s then-existing implicit int rule, which has since been abandoned in C. We’re glad auto is now back and here to stay.

2. What does auto mean when declaring a local variable?

It means to deduce the type from the expression used to initialize the new variable. In particular, type deduction for auto local variables works exactly like type deduction for function template parameters—by specification, the rule for auto variables says “do what function templates are required to do”—plus auto can additionally deduce an initializer_list type. For example:

template<class T> void f( T ) { }

int val = 0;

f( val ); // deduces T == int, calls f<int>( val )
auto x = val; // deduces T == int, x is of type int

When you’re new to auto, the key thing to remember is that you really are declaring your own new local variable. That is, “what’s on the left” is my new variable, and “what’s on the right” is just its initial value:

auto my_new_variable = its_initial_value;

You want your new variable to be just like some existing variable or expression over there, and be initialized from it, but that only means that you want the same basic type, not necessarily that other variable’s own personal secondary attributes such as top-level const- or volatile-ness and &/&& reference-ness which are per-variable. For example, just because he’s const doesn’t mean you’re const, and vice versa.

It’s kind of like being identical twins: Andy may be genetically just like his brother Bobby and is part of the same family, but he’s not the same person; he’s a distinct person and can make his own choice of clothes and/or jewelry, go to be seen on the scene in different parts of town, and so forth. So your new variable will be just like that other one and be part of the same type family, but it’s not the same variable; it’s a distinct variable with its own choice of whether it wants to be dressed with const, volatile, and/or a & or && reference, may be visible to different threads, and so forth.

Remembering this will let us easily answer the rest of our questions.

3. In the following code, what is the type of variables a through k, and why? Explain.

Quick reminder: auto means “take exactly the type on the right-hand side, but strip off top-level const/volatile and &/&&.” Armed with that, these are mostly pretty easy.

For simplicity, these examples use const and &. The rules for adding or removing const and volatile are the same, and the rules for adding or removing & and && are the same.

int         val = 0;
auto a = val;
auto& b = val;
const auto c = val;
const auto& d = val;

For a through d, the type is what you get from replacing auto with int: int, int&, const int, and const int&, respectively. The same ability to add const applies to volatile, and the same ability to add & applies to &&. (Note that && will be what Scott Meyers calls a universal reference, just as with templates, and does in some cases bring across the const-ness if it’s binding to something const.)
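
If you want to verify the deductions yourself, here’s a quick sketch using static_assert, std::is_same, and <type_traits>:

static_assert( std::is_same<decltype(a), int>::value,        "a is int" );
static_assert( std::is_same<decltype(b), int&>::value,       "b is int&" );
static_assert( std::is_same<decltype(c), const int>::value,  "c is const int" );
static_assert( std::is_same<decltype(d), const int&>::value, "d is const int&" );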

Now that we’ve exercised adding top-level const (or volatile) and & (or &&) on the left, let’s consider how they’re removed on the right. Note that the left hand side of a through d can be used in any combination with the right hand side of e through k.

int&        ir  = val;
auto e = ir;

The type of e is int. Because ir is a reference to val, which makes ir just another name for val, it’s exactly the same as if we had written auto e = val; here.

Remember, just because ir is a reference (another name for the existing variable val) doesn’t have any bearing on whether we want e to be a reference. If we wanted e to be a reference, we would have said auto& as we did in case b above, and it would have been a reference irrespective of whether ir happened to be a reference or not.

int*        ip  = &val; 
auto f = ip;

The type of f is int*.

const int   ci  = val;
auto g = ci;

The type of g is int.

Remember, just because ci is const (read-only) doesn’t have any bearing on whether we want g to be const. It’s a separate variable. If we wanted g to be const, we would have said const auto as we did in case c above, and it would have been const irrespective of whether ci happened to be const or not.

const int&  cir = val;
auto h = cir;

The type of h is int.

Again, remember we just drop top-level const and & to get the basic type. If we wanted h to be const and/or &, we could just add it as shown with b, c, and d above.

const int*  cip = &val;
auto i = cip;

The type of i is const int*.

Note that this isn’t a top-level const, so we don’t drop it. We pronounce cip‘s declaration right to left: The type of cip is “pointer to const int,” not “const pointer to int.” What’s const is not cip, but rather *cip, the int it’s pointing to.

int* const  ipc = &val;
auto j = ipc;

The type of j is int*. This const is a top-level const, and ipc‘s being const is immaterial to whether we want j to be const.

const int* const cipc = &val;
auto k = cipc;

The type of k is const int*.
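
And similarly for the pointer cases, continuing the static_assert sketch from above:

static_assert( std::is_same<decltype(i), const int*>::value, "low-level const is kept" );
static_assert( std::is_same<decltype(j), int*>::value,       "top-level const is dropped" );
static_assert( std::is_same<decltype(k), const int*>::value, "only the top-level const is dropped" );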

4. In the following code, what type does auto deduce for variables a and b, and why? Explain.

As we noted in #2, the only place where an auto variable deduces anything different from a template parameter is that auto deduces an initializer_list. This brings us to the final cases:

int val = 0;

auto a { val };
auto b = { val };

The type of both a and b is std::initializer_list<int>.

That’s the only difference between auto variable deduction and template parameter deduction—by specification, because auto deduction is defined in the standard as “follow those rules over there in the templates clause, plus deduce initializer_list.”

If you’re familiar with templates and curious how auto deduction and template deduction map to each other, the table below lists the main cases and shows the equivalent syntax between the two features. For the left column, I’ll put the variable and the initialization on separate lines to emphasize how they correspond to the separated template parameter and call site on the right.

Not only are the cases equivalent in expressive power, but you might even feel that some of the auto versions feel even slicker to you than their template counterparts.
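
For example, here’s a minimal sketch of a few of those correspondences (f and the variable names are just illustrative):

// auto variable                // equivalent template parameter deduction
auto        a = expr;           // template<class T> void f( T        a );  f( expr );
auto&       b = expr;           // template<class T> void f( T&       b );  f( expr );
const auto& c = expr;           // template<class T> void f( const T& c );  f( expr );
auto&&      d = expr;           // template<class T> void f( T&&      d );  f( expr );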

Summary

Having auto variables really brings a feature we already had (template deduction) to an even wider audience. But so far we’ve only seen what auto does. The even more interesting question is how to use it. Which brings us to our next GotW…

Acknowledgments

Thanks in particular to the following for their feedback to improve this article: davidphilliposter, Phil Barila, Ralph Tandetzky, Marcel Wild.

