GotW #1: Variable Initialization—or Is It? (3/10)

This first problem highlights the importance of understanding what you write. Here we have a few simple lines of code — most of which mean something different from all the others, even though the syntax varies only slightly.

 

Problem

JG Question

1. What is the difference, if any, among the following?

widget w;                   // (a)

widget w(); // (b)
widget w{}; // (c)

widget w(x); // (d)
widget w{x}; // (e)

widget w = x; // (f)
widget w = {x}; // (g)

auto w = x; // (h)
auto w = widget{x}; // (i)

Guru Questions

2. What do each of the following lines do?

vector<int> v1( 10, 20 );   // (a)

vector<int> v2{ 10, 20 }; // (b)

3. Besides the cases above, what other benefits are there to using { } to initialize objects?

4. When should you use ( ) vs. { } syntax to initialize objects? Why?

28 thoughts on “GotW #1: Variable Initialization—or Is It? (3/10)”

  1. @bittermanandy Thanks for the clarification.

    Now that I typed your @handle to start a response, I can see you probably chose that name to match your sentiment? I get the impression that you are bitter that C++ isn’t a better “general purpose language”.

    It may not be what you expected it to be: it’s a language amenable to all tasks, but not necessarily eminent for all tasks.

    The point of C++11 is not that it should correct all the shortcomings of C++’s legacy. In my opinion, the point of C++11 is that IFF C++ is the right choice of tool for your particular job, you’ll be an order of magnitude more empowered and productive using it.

    I’m willing to say that roughly 50% (or probably, more) of the stuff that populates (More) Effective C++, STL etc. is no longer essential knowledge to survive: C++11 no longer needs to be the minefield C++ used to be. This is major news if you need to be using C++.

    Of course, if you don’t need to be using C++, the news is … “meh” (under the hood, it’s still the same minefield, right?!). But perhaps that says more about the audience than about the quality of the news.

    I’m certainly thinking C++11 is becoming a _more viable_ choice for my own future work. And the trends are certainly making the choice relevant.

  2. @bames53: This is in fact a very good observation. Constructors that just take some (random) parameters are typically confusing.

    vector<string> v(10); 
    

    What does it mean? Set capacity to 10? Set size to 10? Create a single-element vector? I can of course look it up in the docs, but why force me to do that if the meaning could be made obvious:

    vector<string> v(with_size, 10); // some tag
    vector<string> v(size := 10); // if we had named parameters
    

    Worse, what if I know (incorrectly) that v(10) reserves capacity, because this is what vector did in the other library I used? I wouldn’t be looking up something I already (think I) know.

    Especially for constructors (as opposed to normal functions) I believe we are missing the feature that would help us to express what particular arguments mean.
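
The tag idea above can be sketched roughly as follows. This is only an illustration: `with_size`, `with_capacity` and the `string_list` wrapper are hypothetical names made up for this sketch, and nothing like them exists in the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tag types (assumptions for this sketch, not standard names).
struct with_size_t {};
struct with_capacity_t {};
constexpr with_size_t with_size{};
constexpr with_capacity_t with_capacity{};

// Hypothetical wrapper whose constructors state what the number means.
class string_list {
public:
    // Creates n default-constructed (empty) strings.
    string_list(with_size_t, std::size_t n) : items_(n) {}

    // Creates no elements, only reserves room for n of them.
    string_list(with_capacity_t, std::size_t n) { items_.reserve(n); }

    std::size_t size() const { return items_.size(); }
    std::size_t capacity() const { return items_.capacity(); }

private:
    std::vector<std::string> items_;
};

// Usage: the call site now says which meaning of "10" is intended.
inline void tag_usage() {
    string_list sized(with_size, 10);
    string_list reserved(with_capacity, 10);
    assert(sized.size() == 10);
    assert(reserved.size() == 0);
    assert(reserved.capacity() >= 10);
}
```

The cost is an extra argument at every call site; the gain is that the reader no longer has to look up what a bare number means.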

  3. My bad. In the first example, a default constructor is always chosen. It actually should be

    Widget w{};

    and

    Widget w{{}};

    That is, there should be no difference between initializing with empty initializer_list and default initialization.

  4. My take on 4.
    In the presence of an initializer-list constructor and a default constructor, the author of a class should make sure that

    Widget w;

    and

    Widget w{};

    are doing essentially the same thing. Moreover, IMHO if a class has an initializer-list constructor then it should not have any other constructors except special ones (default, copy, move constructors). Clearly, the STL already breaks those assumptions, so standard collections should be thought of as exceptions to the rule.

    Also, if we have some Composite with an initializer-list constructor then the author of the class should think about the interaction of the initializer-list constructor and the copy/move constructor:

    CompositeWidget w;
    CompositeWidget w2{w};

    and

    CompositeWidget w;
    CompositeWidget w2(w);

    should do the same thing. An initializer-list constructor should not be provided if it’s not desirable for these two statements to be semantically the same.

    If these rules are followed then we can say that brace-initialization should be used in all cases (except when dealing with standard collections or with unknown types in template code that may happen to be standard collections).

  5. (a) widget w;
    If w is of static or thread_local storage duration, it is zero-initialized. Otherwise, it is default-initialized.

    (b) widget w();
    This is a function declaration.

    (c) widget w{};
    w is direct-initialized.

    (d) widget w(x);
    If x is a type, w is a function declaration. Otherwise, w is direct-initialized.

    (e) widget w{x};
    w is direct-list-initialized.

    (f) widget w = x;
    w is copy-initialized.

    (g) widget w = {x};
    w is copy-initialized.

    (h) auto w = x;
    w is copy-initialized.

    (i) auto w = widget{x};
    w is copy-initialized.

  6. Ok, let’s play:

    1. (a) and (c) create an object of type widget and assign it to variable w calling the default constructor. (b) declares a function of name w which returns a widget and takes no parameters. (d) and (e) are equivalent and create an object of type widget calling a constructor that can take an object of the same type as x as argument. (f) creates a widget by calling the overloaded assignment operator that can take x as argument OR casting x to a widget and assigning it to w through the copy constructor. (g) tries to initialize w by using an initialization list with x. (h) Creates a variable w with the same type as x calling the proper copy constructor or assignment operator. (i) Creates an object of type widget using x as parameter for the proper constructor and calls the move constructor to assign it to w.

    2. (a) creates a vector of 10 elements and initialize all its elements with the value 20. (b) Creates a vector with two elements, with value 10 and 20.

    3. {} offers more clear declaration of intent and is consistent with array initialization at declaration time.

    4. The curly brackets style should be used whenever possible in modern C++ code, falling back to () only if you intend to write code that should be backwards compatible with older compilers.

  7. sehe: yes, I realise the C++ standards committee is balancing a lot of different things; and I realise other languages are an option. The point is, people *are taking* the other option. In droves. The C++ standards committee might not have nice syntax and usability of the language high on their list of priorities. That’s fine. It just means people won’t use it (…even when compiler writers finally implement it). I can’t personally see the point in spending sixteen-plus years’ worth of time and effort on something that barely anybody is going to use, that’s all.

    Herb’s books and GotW are fascinating things, and a decade or so ago I used to read them avidly, excited by how much more I was learning about a language I thought I already knew. In that decade I’ve realised that knowing the difference between widget w() and widget w{} isn’t a badge of honour to wear proudly to prove one’s coding credentials; it’s a syntax disaster that means people are going to write buggy code… in part because the Standards Committee didn’t prioritize “prettiness” and “consistency”.

    I mean, clearly there’s nothing I can write here that will change it now. I just think it’s a real shame that the new version(s) of C++ has/have left it “the same as before only more so”, when it could have made it a modern, usable language instead. You are quite right though, there are other options, for which we can all be thankful. I suppose there’s not much point in me continuing to bang this particular drum so I’ll leave this discussion to actually answering the questions in the post… sorry for the digression.

  8. @bittermanandy Welcome to C++. Similarity to any language other than C is not a driving design force. Rather, it is the other way around (referring to the C# case. ::global anyone?)

    Another key driving force is backwards code compatibility (using `var` over `auto` would have been nice. For a green-field language!).

    Also, the difference between a non-vexing parse and a most-vexing parse used to be 0 (zero) pixels, so actually, extending the lead to 2 (non-ugly) pixels is quite an improvement.

    Just a few quick shots. The C++ standards committee is balancing a **lot** more goals than just “prettiness” or even “consistency”. Feel free to use D, Boo, Go or C# if it pleases you more.

  9. > Consistency with other languages would be nice too. Why “auto” and not “var”, for example?
    I don’t see any reason to be consistent with “other languages” – especially when you’ll have a hard time being consistent with more than one at the same time. Using “var” would be consistent with C#. JavaScript also uses “var”, but modern JavaScript is moving away from that to “let”. Java doesn’t have type deduction and won’t get it, but the RFE that was rejected asked for “let”.
    “auto” has the big advantage that it was already a keyword, and so you don’t get the enormous code breakage that you would get from introducing something as common as “var” as a new keyword. (C# can afford this due to differences in general parsing strategy from C++ and stronger restrictions on where “var” can appear.)

  10. One thing that nobody has pointed out yet is the difference between 1e and 1g. Specifically, the former is direct-list-initialization, while the latter is copy-list-initialization. The difference being that only 1e will select explicit constructors. In other words, given a widget:

    struct widget {
        explicit widget(X x);
    };

    1e will compile, 1g will not. That’s the same as the difference between 1d and 1f.
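
A small sketch of that distinction, using type traits so the file still compiles. `X` is an empty placeholder type and `lenient_widget` is a hypothetical counterpart without `explicit`, both added only for illustration. `std::is_constructible` models the direct forms (1d, 1e), while `std::is_convertible` models copy-initialization (1f); the copy-list form (1g) is rejected for the same reason once the explicit constructor is selected.

```cpp
#include <cassert>
#include <type_traits>

struct X {};  // placeholder argument type (assumption for this sketch)

struct widget {
    explicit widget(X) {}
};

// Hypothetical counterpart whose constructor is not explicit.
struct lenient_widget {
    lenient_widget(X) {}
};

// Direct-initialization may use explicit constructors.
static_assert(std::is_constructible<widget, X>::value,
              "widget w(x); and widget w{x}; are allowed");

// Copy-initialization may not.
static_assert(!std::is_convertible<X, widget>::value,
              "widget w = x; is rejected");

static_assert(std::is_convertible<X, lenient_widget>::value,
              "lenient_widget w = x; is allowed");

inline void explicit_usage() {
    X x;
    widget d(x);             // (d) direct-initialization
    widget e{x};             // (e) direct-list-initialization
    // widget f = x;         // (f) would not compile: constructor is explicit
    // widget g = {x};       // (g) would not compile: constructor is explicit
    lenient_widget f = x;    // fine without explicit
    lenient_widget g = {x};  // fine without explicit
    (void)d; (void)e; (void)f; (void)g;
    assert((std::is_constructible<widget, X>::value));
}
```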

  11. I started answering these fairly confidently, then I assumed that the difference between () and {} was broadly the same as in C#, and I realise I’m not actually sure that’s the case. With that in mind, I think the latter zero-initialises types without a default constructor of their own, otherwise acts like a constructor, unless it’s an initialiser list, in which case it’s like C#.

    What I am sure of is that this is a syntactic disaster area. You know what the *visual* difference between () and {} is, in a typical font in a typical IDE? Two pixels. If there’s any meaningful difference between them – if the difference between them could ever cause a bug – it’s going to be a nightmare to actually *notice* it when it happens.

    (To illustrate, just look at the two vector examples. One is a vector with 10 elements with all the same value, the other is a vector with 2 elements with different values… and yet you have to look at it twice to even realise there’s a difference between them. When reading the code in an actual codebase, you probably wouldn’t spot it).

    It’s great that C++ is adding some new stuff, and all; but the syntax and usability are dreadful, two years after a major release there aren’t any fully conforming compilers (and VS, the one I use by far the most often, is *way* off), and an update is already being planned, suggesting C++11 wasn’t even properly finished. I have to say that the Standards Committee are doing a fantastic job of creating a theoretically better C++ up there in their ivory tower, but if nobody can use it (and of those that can, few do – witness the guy above who doesn’t even know what “auto” does, and he is *far* from alone) it’s no use to anybody.

    That’s not to say it shouldn’t be tried, of course. Seeking to improve C++ may well be a worthwhile endeavour (though taking sixteen years to do it just means everyone will have moved on). But when the “improvement” involves a two-pixel difference in syntax that totally changes how something behaves, that should be a massive red flag that something has gone badly wrong. Syntax matters, and the syntax of all the new stuff is universally fugly. (Consistency with other languages would be nice too. Why “auto” and not “var”, for example? And the syntax for lambdas – just why?) Too little, much too late, and clunky syntax. Sadly.

  12. Concerning the first question, besides what already applied in C++03, the main subtlety is brace or “uniform” initialization. What happens in all cases of w{….} depends on whether widget has a std::initializer_list constructor or is an aggregate. So a partial answer is “we can’t really know for sure what will happen without knowing more about widget”.

    Concerning the guru question, std::vector has an initializer_list constructor and that one trumps the others. So the first vector has size 10 and is full of 20s, the second one has size 2 and contains values 10 and 20.
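
The two vector declarations from question 2 can be checked directly. In this sketch they are wrapped in two helper functions whose names are made up for illustration:

```cpp
#include <cassert>
#include <vector>

// Parentheses select the (count, value) constructor.
inline std::vector<int> make_with_parens() {
    std::vector<int> v1(10, 20);
    return v1;
}

// Braces prefer the initializer_list constructor.
inline std::vector<int> make_with_braces() {
    std::vector<int> v2{10, 20};
    return v2;
}

inline void vector_usage() {
    const std::vector<int> v1 = make_with_parens();
    const std::vector<int> v2 = make_with_braces();
    assert(v1.size() == 10);  // ten elements, each equal to 20
    assert(v1.front() == 20 && v1.back() == 20);
    assert(v2.size() == 2);   // two elements: 10 and 20
    assert(v2[0] == 10 && v2[1] == 20);
}
```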

  13. @Chris Chiesa: The *auto* keyword is part of the C++11 standard, which brings lots of new features. You should have a look at it; it’s been in use since Nov 2011. You can find more info at http://isocpp.org/tour . Have a good C++11 update ;) !
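
For readers meeting `auto` for the first time, here is a minimal sketch of what it does in C++11: the compiler deduces the variable's type from its initializer. The function name `sum_with_auto` is made up for this illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Illustrative helper: sums a vector using only auto-typed locals.
inline int sum_with_auto(const std::vector<int>& values) {
    auto total = 0;  // deduced as int from the literal 0
    static_assert(std::is_same<decltype(total), int>::value,
                  "total is an int");

    // The iterator type is deduced, so it does not have to be spelled out.
    for (auto it = values.begin(); it != values.end(); ++it) {
        static_assert(
            std::is_same<decltype(it),
                         std::vector<int>::const_iterator>::value,
            "it is a const_iterator because values is const");
        total += *it;
    }
    return total;
}

inline void auto_usage() {
    assert(sum_with_auto({1, 2, 3}) == 6);
}
```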

  14. Interesting. When I click the “comment” link, I am taken to a snarky version of a 404 page, that asks “Where did you get such a link?” Got it from yer own danged email, is where, just this morning!

    The comment I was *going* to post would have been something like, “I’ve been a C++ programmer for years, and while I know I’ve never been ‘world class’ at it, I must *really* be out of the loop; I don’t recognize the *auto* keyword; what does that do?”

    Cheers, Chris Chiesa

  15. Having thought it through, my comment has at least two obvious mistakes.

    b is illegal altogether because of this kind of syntax being a function declaration returning widget. It makes me cry every time almost as much as having to place “typename” and “template” because of compiler warnings. It gives another benefit to using {}: if such a call is created by macro or vararg template expansion, it can be very troublesome to find out why this w() thingie does not work. A function declaration is of course legal C++, but among the other pieces of code it is an obvious typo.
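
A short sketch of the `w()` pitfall, checked with type traits. The `widget` here is a stand-in with a single member, made up for this example; most compilers accept the function declaration but may warn about it.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in type for this sketch (the post never defines widget).
struct widget {
    int value = 7;
};

inline int braces_give_an_object() {
    widget w();  // declares a function named w that returns a widget
    widget v{};  // defines an object named v

    static_assert(std::is_function<decltype(w)>::value,
                  "w is a function, not an object");
    static_assert(std::is_same<decltype(v), widget>::value,
                  "v is a widget object");

    // w.value would not compile here; v.value works as expected.
    return v.value;
}

inline void vexing_usage() {
    assert(braces_give_an_object() == 7);
}
```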

    f and g of course use move-assignment (or copy assignment if move is not available) after construction and not move-construct. That’s just my typo. These will almost always be optimized to use copy construction, but compilers are not obligated to do it.

  16. Widget x = y; // if Widget has a constructor that takes decltype( y ), the compiler will optimize to call that instead of op=()

  17. @MichalMocny

    I think this would have been very bad. ‘a’ should be the same as ‘c’, should be the same as ‘d’. And I don’t want to see ‘b’. The non-uniformity imposed by having {} prefer initializer_list constructors is far less than the non-uniformity you’re suggesting here:

    Although the rule could be worded simply “always use {}”, the actual usage would be more complicated; “use {} sometimes, use = {} sometimes, use {{}} sometimes,” etc. The exception when non-brace initialization is required is relatively rare compared to when you would require different kinds of brace initialization.

    Additionally, components designed with brace initialization in mind can and should avoid the problem by not having any constructors that might collide with initializer_list constructors. For example a `vector` designed today might have a third dummy parameter to disambiguate.

  18. 1) These are pretty easy. Let `X` be whatever type `x` is.

    a) default constructs `w` via default initialization. The contents of `widget` will be default-initialized if `widget` does not have a user-provided default constructor. Otherwise, they will be value-initialized.

    b) declares a function named `w` that takes no parameters and returns a `widget`.

    c) default constructs `w` via value initialization. The contents of `widget` will always be value-initialized.

    d) constructs `w` by calling a `widget` constructor that can take a single `X` value either by value or by l-value reference.

    e) this depends. If there is an initializer-list constructor of `widget` whose initializer_list is the type of `x`, then that constructor will be called with an initializer_list that is one element long. Otherwise, it will call a constructor that can take a single `X` value either by value or by l-value reference.

    f) Implicit conversion is attempted on the value `x`. If `X` has a non-explicit `operator widget()` overload, then that will be used to convert `x` into `w`. Otherwise, if `widget` has a non-explicit constructor that can take a single `X` by value or by l-value reference, then that constructor will be called. If any of these result in finding an explicit function, an error will result.

    g) If there is an initializer-list constructor of `widget` whose initializer_list is the type of `x`, then that constructor will be called with an initializer_list that is one element long. Otherwise, it will attempt implicit conversion as in f) above.

    h) Constructs a `w`, of type `X`, via copy initialization from `x`. If no copy constructor is accessible (ie: has been deleted or otherwise cannot be called), a compiler error will result.

    i) Constructs a `w` of type `widget`, via copy initialization from a temporary. The temporary will be constructed in accord with e) above. Note that the actual copy of the temporary to `w` may be elided.

    2)

    a) Constructs a 10-element `vector`, where each element is initialized to 20.

    b) Constructs a 2-element `vector`. The first element is 10, the second is 20.

    3)

    This is kind of a difficult question. I say that because #2 is not a benefit of {} syntax at all. It’s a detriment, because you *cannot* use {} syntax to access the two-element constructor of `vector<int>`. You could if you used `vector<widget>{20, 10}`; that would attempt to create `20` widgets, with the widgets being copy-constructed from a `widget` that was implicitly constructed from `10`.

    It is not a benefit that `vector<T>{20, 10}` is *completely ambiguous* without knowing what `T` is. That `T=int` resolves to *completely* different code from `T=widget`. That’s not an advantage; that’s a detriment.

    The main benefit to using {}, to me, is to not have to use the typename of a variable type so often. For example:

    std::string GetLuaString(lua_State *L)
    {
        size_t len = 0;
        const char *str = lua_tolstring(L, -1, &len);
        return {str, len};
    }

    I don’t need to repeat `std::string` here. The compiler knows what I’m returning, so it figures it out.

    This also works for calling functions:

    void SomeFunc(std::string str);

    SomeFunc({"stuff", 5});

    That’s not the most useful example, but you get the idea. It’s like a form of `auto` for parameters and return values. Though return values won’t be as important in C++14 since we can auto-deduce them and thus stick the type in the return statement. But it’ll still be important for those non-inline functions where we can’t auto-deduce them.
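
The same idea can be tried without Lua. The sketch below uses two made-up functions, `first_n` and `length_of`, that stand in for `GetLuaString` and `SomeFunc`; in both cases the braced list is used to construct a `std::string` through its `(const char*, size_t)` constructor without the type being named.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for GetLuaString: the return type tells the compiler what to build.
inline std::string first_n(const char* str, std::size_t len) {
    return {str, len};  // same as std::string(str, len)
}

// Stand-in for SomeFunc: the parameter type tells the compiler what to build.
inline std::size_t length_of(std::string str) {
    return str.size();
}

inline void braced_usage() {
    assert(first_n("stuffing", 5) == "stuff");
    assert(length_of({"stuff", 5}) == 5);
}
```

This only works because the selected constructor is not `explicit`; a braced list in a return statement or argument is copy-list-initialization.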

    4)

    Only if both of the following conditions are met:

    A) If you are ABSOLUTELY certain of the type you’re trying to initialize. That means you know exactly what it is and what constructors are available. This includes any template parameters of that type. So if you’re in some template code, and you have a `vector`, you can’t use {}.

    B) If the constructor you’re trying to call is reachable with {}. This ties into A; since you know the type, and you know the arguments, you can therefore know if the horrible {} ambiguity rules hide the constructor you want to call or not. If they don’t hide it, you can use {}. If they do, you can’t; you have to use ().

    Personally, I love uniform initialization, as a concept. But the ambiguity rules are so terrible that you can’t really trust it. If I were to give a professional opinion, I would say that, until the standards committee fixes them (and no formal proposals have been provided that do so. Though I have worked on a preliminary one: https://groups.google.com/a/isocpp.org/forum/#!searchin/std-proposals/More$20uniform$20initialization/std-proposals/hfDHD0GyfqI/xwTMNI276-kJ), there is only one time you should ever use it:

    To resolve the Most Vexing Parse. That is, case `1.b` above: `widget w()`. That’s the only time you can trust it to work, and even then, you can only use it if A and B are satisfied. Otherwise, you have to use extra parentheses like we do now.

    I would love to be able to recommend you always use {}. Or even that you should use it whenever possible. But I cannot in good conscience do so. Not until the standard is fixed.

  19. Brace initialization binding tighter to initializer_list initialization is, I consider, an unfortunate language wart. (Please do correct me if I am mistaken.)

    Specifically, looking at Guru Question 2 example(b), that will create a vector with 2 int values 10,20. If you wanted to initialize 10 elements with the value 20, you *must* use regular bracket initialization (Question 2 example(a) notation).

    Much like leaving out make_unique meant we could not teach novices to not use “new” any more, this brace initialization decision means we cannot teach “always initialize with {}”. Instead we have to consider the implications of each initialization and peer inside class constructor details (e.g. I know this class has blah constructor, but hmm does it also have an initializer_list constructor? I’ll just stick to using brackets, oops vexing parse).

    What makes this seem more unfortunate, is that I think living with the reverse decision would have been totally fine:

    // Not true in real C++, alternative reality
    vector<int> a{ 10, 20 }; // a.size() == 10
    vector<int> b{{ 10, 20 }}; // b.size() == 2
    vector<int> c = { 10, 20 }; // c.size() == 2
    vector<int> d = vector<int>{ 10, 20 }; // d.size() == 10

    What it would also mean, is that auto brace initialization would be less awkward:

    // Not true in real C++, alternative reality
    auto a{1}; // a is int
    auto b{{1}}; // b is initializer_list

    Thoughts? Am I wrong, is there some reason for which initializer_list being the default is a great choice?

  20. Just an on-the-spot non-pro opinion:

    1.

    * a,b,c — no real difference;
    * d, e — no real difference, but:
    * if widget is a struct (or a class with public fields), e can initialize its fields directly whilst d needs a constructor;
    * implicit-casting of x will not work in d;
    * if widget has an initializer-list-based constructor, it can be called with a one-element list in e, but not in d;
    * f, g — the same differences between ’em as for d and e, but they first create an object as in d and e, and after that move-construct it into a new object of class widget, the other difference is that f can result in an operator-based implicit cast rather than calling the single-arg constructor;
    * h — same as f (?), but cannot call operator-based cast as there is no cast happening anyhow, i — same as g (?);

    2. a creates a vector of type int, size 10, filled with 10 twenties; b creates a vector of type int, size 2, with elements 10 and 20.
    3. The main benefit is a uniform initialization manner for sequences, pairs, tuples and structures (it has its flaws, but still). This can help template programming a lot as it makes types more interchangeable. You can replace a tuple-based proof-of-concept code with a struct-based (giving it names) or class-constructor-based (giving it everything else) in no time if every piece of your initialization is using curly braces only.
    4. As far as I understand it, it is better to initialize objects with {} if you really mean “take this pile’o’data and put it there” and use () when you create more complex objects whose initialization is not about data, but about more complex things (like vector’s size-based constructor above). To put it simply, you should use {} to create std::pairs, but it’s not a good thing to use when initializing your SettingsManagerFactoryVisitor object with its 12 runtime polymorphic Strategy classes.
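
The interchangeability described in point 3 can be sketched like this. `named_entry` is a hypothetical struct invented for the example; the point is only that the initializer `{42, "answer"}` stays the same while the type behind it changes.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical struct that gives names to the two fields.
struct named_entry {
    int id;
    std::string name;
};

inline int id_from_pair() {
    std::pair<int, std::string> entry{42, "answer"};
    return entry.first;
}

inline int id_from_tuple() {
    std::tuple<int, std::string> entry{42, "answer"};
    return std::get<0>(entry);
}

inline int id_from_struct() {
    named_entry entry{42, "answer"};  // aggregate initialization
    return entry.id;
}

inline void uniform_usage() {
    assert(id_from_pair() == 42);
    assert(id_from_tuple() == 42);
    assert(id_from_struct() == 42);
}
```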

  21. 1.a) Creates a widget calling the default constructor
    1.b) A function receiving void and returning a Widget
    1.c) Creates a widget calling the default constructor
    1.d) Creates a widget calling a constructor receiving X
    1.e) Creates a widget calling a constructor with an initializer list parameter
    1.f) Creates a widget calling the copy-assignment constructor
    1.g) Creates a widget calling a constructor with an initializer list parameter
    1.h) Creates a variable of the same type of X, and copy X to it
    1.i) Creates a variable of type Widget and calls the move-assignment constructor with an initializer list parameter

    2.a) Creates a vector with 10 elements, all of value 20
    2.b) Creates a vector with 2 elements, 10 and 20

    3. It’s uniform, and avoids ambiguities.

    4. When you do not want to use an initializer list constructor. (Like the question 2.a, if you really want to create 10 elements of value 20 then it’s best to use “()” instead of writing ten “20” inside braces)

  22. I’m not an expert but let’s see if my current knowledge matches the standard:

    1)
    a) w is a default-initialized widget;
    b) w is a free function returning a widget;
    c) w is a default-initialized widget;
    d) w is a widget constructed using x, which has to be a type compatible with the constructor argument, with implicit cast allowed;
    e) w is a widget constructed using x, which has to be exactly the type of the constructor argument, with no implicit cast allowed, OR if widget has an initializer-list constructor, it will be constructed using it.
    f) w is a widget constructed using x exactly like in d)
    g) this one I’m not sure, I guess it depends on the available widget constructors. It might be either
    1. w is a widget constructed using a constructor taking an initializer list as argument
    2. w is a widget constructed using a copy constructor using a temporary widget constructed using x as non-implicitly-cast constructor argument;
    3. same as 2. but the compiler is allowed to not construct the temporary, which would make it the same as d)
    h) w is of the same type as x and is constructed by copying x’s value;
    i) w is a widget constructed by constructor with argument taking exactly the type of x (with no implicit cast allowed);

    Damn I’m not totally confident in the g) case even if I use it sometime in my current projects.

    2)
    a) v1 is a vector of 10 int all initialized with the value 20;
    b) v2 is a vector of 2 elements: v2[0] == 10 && v2[1] == 20;

    This is because vector has an initializer-list constructor;

    3) Mainly to force type checking on constructor arguments as no implicit cast is allowed (between float and int for example) using this syntax.

    4) If the type doesn’t have an initializer-list constructor, then just use {}; otherwise, use {} until you want to call a specific constructor in which case use ();

    I can’t wait for the answers because the more I think about some cases, the more doubts I’m getting….

  23. Junior Guru
    1.
    a) initializes widget with the default widget ctor. If widget is a typedef to something like an int, then widget is uninitialized.
    b) This is a function prototype, and is a close relative of the “most vexing parse”.
    c) I think this is the same as a), except if widget is a primitive type. In that case, widget will be zero initialized.
    d) initializes widget with a one arg constructor taking x.
    e) this could be two different things, depending on widget’s definition. It could either be the same as d, or it could use widget’s initializer list constructor, if it has one. Even for the “same as d” case, you get slightly better behavior in the face of float -> int conversions.
    f) This is copy initialization. Widget(x) is invoked, and the temporary is moved or copied to the target ( w ). In release mode, this move / copy is likely elided.
    g) same as e
    h) copy initializes a w, with the same type as x
    i) copy initialization, and w is going to be of type x

    Guru
    2.
    a) Creates a vector of size 10, with each element set to 20
    b) Creates a vector of size 2, with element 0 set to 10 and element 1 set to 20
    3. {} helps in generic code. {} makes it easier to initialize containers. {} avoids narrowing conversions. {} ensures that everything gets initialized, even primitives.
    4. Use {} when you can, use () when you must?
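
Two of the benefits listed in point 3 are easy to sketch: empty braces give built-in types a defined value, and braces reject narrowing conversions. The `point` struct and the function names are assumptions made up for this example.

```cpp
#include <cassert>

// Hypothetical aggregate with no constructors and no default member values.
struct point {
    int x;
    int y;
};

// Empty braces value-initialize, so the int is zero rather than indeterminate.
inline int zero_int() {
    int i{};
    return i;
}

// The same holds for every member of an aggregate.
inline int sum_point() {
    point p{};
    return p.x + p.y;
}

// Parentheses allow the double to be truncated silently.
inline int truncated() {
    int t(3.7);
    return t;
}

// int bad{3.7};  // would not compile: narrowing from double to int

inline void brace_benefits_usage() {
    assert(zero_int() == 0);
    assert(sum_point() == 0);
    assert(truncated() == 3);
}
```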

  24. The main difference between (a, b, d, f, h) and (c, e, g, i) is that the former group is supported by Visual Studio and the latter is not.
    I know I’m trolling a little here, but there is no news since November about a more conforming Visual C++ and it’s been said to keep knocking until the door opens… :-)
