GotW #93: Auto Variables, Part 2

Why prefer declaring variables using auto? Let us count some of the reasons why…

 

Problem

JG Question

1. In the following code, what actual or potential pitfalls exist in each labeled piece of code? Which of these pitfalls would using auto variable declarations fix, and why or why not?

// (a)
void traverser( const vector<int>& v ) {
for( vector<int>::iterator i = begin(v); i != end(v); i += 2 )
// ...
}

// (b)
vector<int> v1(5);
vector<int> v2 = 5;

// (c)
gadget get_gadget();
// ...
widget w = get_gadget();

// (d)
function<void(vector<int>)> get_size
= [](vector<int> x) { return x.size(); };

Guru Question

2. Same question, subtler examples: In the following code, what actual or potential pitfalls exist in each labeled piece of code? Which of these pitfalls would using auto variable declarations fix, and why or why not?

// (a)
widget w;

// (b)
vector<string> v;
int size = v.size();

// (c) x and y are of some built-in integral type
int total = x + y;

// (d) x and y are of some built-in integral type
int diff = x - y;
if(diff < 0) { /*...*/ }

// (e)
int i = f(1,2,3) * 42.0;

12 thoughts on “GotW #93: Auto Variables, Part 2

  1. // Note: posting _once_ again (I *really* wish there was an “edit” option at this point), feel free to remove my previous comments (and this note), was hoping to avoid the hassle with URL formatting, obviously didn’t work; also fixed a few typos and a bug; sorry for any inconvenience this might have caused :-)

    1.(a) We can generalize the code from using `int` to other value_types stored in std::vector.

    // On a side note: I usually prefer to use `it` for iterators and `i` for indexes for easy disambiguation.

    However, since we’re relying on the RandomAccessIterator concept when we do `it += 2`, we can’t generalize beyond the types which model it (as long as we’re talking about the Standard), `std::vector`, `std::array`. and `std::deque` (an old C-style array would work, too).
    While we could replace `it += 2` by `std::advance(it, 2)`, this would have performance implications (no longer guaranteed O(1) on this op).

    Overall, it’s worth remembering Scott Meyers’ advice from “Effective STL”: “Item 2: Beware the illusion of container-independent code.”

    However, given that there are multiple types modeling RandomAccessContainer for which `traverse` makes sense, there’s a case for making our `traverse` a bit more generic:

    template <typename RandomAccessContainer>
    void traverser( const RandomAccessContainer & v ) {
        using std::begin;
        using std::end;
        for( auto it = begin(v); it != end(v); it += 2 )
    

    First, the error part (easily tested for 1-sized vector): since we’re traversing with non-unit stride, we have a potential out of bounds error here (stepping too far). First, we have to change the test condition (`it` relies on being `RandomAccessIterator` for this type of traversal anyway, so that’s fine; see the aforementioned “Item 2”):

    for( auto it = begin(v); it < end(v); it += 2 )

    That’s not the end of it — advancing an iterator past the end is actually undefined behavior.
    Credit goes to James McNellis for a solution in this case:
    http://stackoverflow.com/questions/9387569/non-unit-iterator-stride-with-non-random-access-iterators

    See also:
    http://stackoverflow.com/questions/1057724/what-if-i-increment-an-iterator-by-2-when-it-points-onto-the-last-element-of-a-v

    Now, let me explain the using declarations `using std::begin` and `using std::end`.
    Note, that while `begin(v)` will work for `std::` containers thanks to ADL, it will not work for C-style arrays.
    At the same time, simply replacing it with `std::begin` would be a bad idea, since it would prevent the user from applying `traverse` to user-defined containers (which obviously don’t live in `std::`).
    Hence, analogously to what we do with `using std::swap` and `swap` in the copy-and-swap idiom, we pull in the `std` namespace into consideration with the using declaration, and keep using unqualified `begin`.
    Analogous considerations apply to `end`.

    Given all these issues, I’d actually *strongly* consider using indexes here — note that we won’t lose any genericity, since we’re relying on having a `RandomAccessContainer` anyway — again, to repeat Scott Meyers’ advice from “Effective STL”: “Item 2: Beware the illusion of container-independent code.”

    Hence, after further consideration, I think I might possibly go against the grain here and suggest the following instead (together with including `cstddef`):

    for( std::size_t i = 0; i < 0; i += 2 )

    If one insists on using iterators here, I think we have to go Boost and use `boost::strided_range` / `boost::adaptors::strided`.
    Example: http://www.boost.org/doc/libs/release/libs/range/doc/html/range/reference/adaptors/reference/strided.html

    // using C++14 polymorphic lambda
    
    #include <boost/range/adaptor/strided.hpp>
    #include <boost/range/algorithm/for_each.hpp> 
    
    template <typename RandomAccessContainer>
    void traverser( const RandomAccessContainer & input ) {
        using namespace boost::adaptors;
        boost::for_each(
            input | strided(2),
            [](auto value) { /* std::cout << value; */ });
    }
    

    To avoid unreadeable compiler diagnostics if our input cointainer’s type doesn’t model `RandomAccessContainer` concept:
    – with concepts lite in C++14 we’d constrain on `RandomAccessContainer` (and/or `RandomAccessIterator`) explicitly, instead of just typing out a name.
    – in C++11 we could add `static_assert` relying on the `iterator_category` being a `std::random_access_iterator_tag` and providing a readable error message otherwise.

    1.(b)
    Simply use uniform initialization:

    vector<int> v1{5};

    See also “Intuitive interface — Part I” on “Andrzej’s C++ blog” :-)
    http://akrzemi1.wordpress.com/2013/06/05/intuitive-interface-part-i/

    1.(c)
    Here we see a case where the code using a hard-coded type has become more brittle — it’s not robust to changes in the return-type, which could lead to an unexpected object slicing.
    Hence:

    auto w { get_gadget() };

    On a side note, perhaps it’s also a good idea to keep the standard convention of naming the factories “make”:

    auto w { make_gadget() };

    1.(d)

    `std::function` implies type-erasure and performance drop due to run-time operations, which is probably not what we want.
    Lambda types are not specified, so to keep using these directly we _have_ to use auto:

    auto get_size = [](vector<int> x) { return x.size(); };

    It’s worth noting that beyond being unspecified, type names are also unique, so `decltype` wouldn’t work either.
    See “Auto – A Necessary Evil?” by Roger Orr in the ACCU’s Overload Journal #115, June 2013.
    [PDF] http://accu.org/var/uploads/journals/Overload115.pdf

    On a side note, note that (for similar reasons) we’d also prefer to use `auto` if our right-hand side object was a forwarding call wrapper returned by `std::bind` — so, it’s not just needed for lambdas :-)

    2.(a)

    Can’t really see an application for `auto` here, I think `auto w { widget() }` might be going somewhat overboard (it’s not even shorter to type, don’t see any clarity gain).
    // In either case, we’re relying on the `widget` modeling the `DefaultConstructible` concept.

    2.(b)

    Problem #1: using an `int` for a variable storing the size is a horrid crime against humanity! :D

    I would’ve used `std::size_t` from `cstddef`, but I can see a case where `auto` could also work here.
    Not to be too optimistic/complacent, however, there’s a downside to using `auto` in a similar context:

    // WRONG: using `int` for size is bad and you should feel bad
    for (int iWRONG = 0; iWRONG < v.size(); i += 2)
    

    now, let’s say we’d try to use `auto`:

    // STILL WRONG: using `int` for size is bad and you should feel bad
    for (auto iWRONG = 0; iWRONG < v.size(); i += 2)
    

    The comment still stands, since the type of literal `0` is `int`!
    Hence, here we’d have to use a specific type:

    for (std::size_t i = 0; i < v.size(); i += 2)
    

    2.(c)

    Hard-coding `int` ignores the integral promotion and the “usual arithmetic conversions” (UAC) rules.
    The proper result-type is given by the type trait `std::common_type`, with `decltype`-obtained types of `x` and `y`.
    Given that it’s probably too much typing and explaining ;], let’s just go with `auto` here:

    auto total = x + y;

    2.(d)

    This could get tricky, since the UAC could imply a promotion-to-unsigned, making the later predicate `diff < 0` trivially always-false. To make this even more fun, it's actually implementation defined whether, say, `long int` or `unsigned int` "wins" in the promotion battle :-)

    See Stephan T. Lavavej – Core C++, 7, Usual Arithmetic Conversions and Template Metaprogramming:
    http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-/Stephan-T-Lavavej-Core-C-7-of-n

    See also:
    http://kanooth.com/blog/2010/08/cpp-templates-usual-arithmetic-conversions.html

    This won't do either:

    if (x < y) { /*...*/ }

    Compliant Solution from the CERT's INT02-CPP (which also explains how the above code is not robust to the integral promotion and UAC rules):
    https://www.securecoding.cert.org/confluence/display/cplusplus/INT02-CPP.+Understand+integer+conversion+rules

    Given the C++11's type traits, we can modify the above solution so that the argument to be given to `static_cast` is obtained using `std::make_signed`:
    http://en.cppreference.com/w/cpp/types/make_signed

    2.(e)

    Well, here the intent is unclear, since the truncation from `double` (implied by the `42,0` literal) to `int` could possibly have been desired — in this case, however, an explicit `static_cast` would be in order (for clarity).

    If the truncation is undesired, we'd be better off with the following:

    auto i = f(1,2,3) * 42.0;
  2. Sigh… one more typo :-) This time, in the code in “1.(a)”, which instead of

    for( std::size_t i = 0; i < 0; i += 2 )

    should say

    for( std::size_t i = 0; i < v.size(); i += 2 )

    Giving how easy it is to make typos in this context, I’m becoming more and more inclined toward recommending the Boost.Range solution here :-)

    Let me use this opportunity to make one more note on the genericity — the Boost.Range solution in fact only requires a type that is a model of Single Pass Range. This corresponds to an iterator type being a model of Single Pass Iterator (which has even less requirements than a Forward Traversal iterator, let alone a Random Access Iterator).

    One could ask then, isn’t requiring a RandomAccessContainer an over-specification?
    Well, this is similar to considering whether we should replace `it += 2` with `std::advance(it, 2)`.
    After thinking about it, I’d say that there are no good *general* answers here, since it depends on the use case; the trade-offs to consider are as follows:
    – either: clear-cut complexity guarantees (`it += 2` is either O(1) or doesn’t compile if we keep the requirement),
    – or: increased level of genericity (dropping the requirement; the cost of `std::advance(it, 2)` depends on the iterator category, which depends on the container).

    If we go down the road of increased genericity, we’d have to use a `SinglePassRange` concept instead (or `SinglePassContainer` / `SinglePassIterator` where applicable) and carefully document the performance impact of having types modeling a different levels of refinement of this concept.

  3. Come to think of it, I’d also replace the lambda in 1.(a) with the following (still C++14):

    [](const auto & value) { /* std::cout << value; */ }

    or, if the C++14 polymorphic lambdas support universal references (which they hopefully do), simply:

    [](auto && value) { /* std::cout << value; */ }

    (for automatic const-correctness with less typing; and to handle a certain nasty proxy case ;]).

  4. Again, assuming C++14 with polymorphic lambdas supporting universal references, let’s change the code in 1.(d) to:

    auto get_size = [](auto && x) { return x.size(); };

    (for the reasons see my previous comment).

  5. 1. In

    // (c)
    gadget get_gadget();
    // ...
    widget w = get_gadget();
    

    changing the variable declaration to

    auto w = get_gadget();
    

    could be less safe. In the first case, we get a compile error unless we have an automatic conversion from gadget to widget. If we have one, there is no error, but presumably then w is a functioning widget. If we write w.Foo(), we know we’re using widget::Foo(). Either there’s a compiler error or there are no surprises.

    In the second case, the type of w is gadget, but we don’t realize that. We’re presumably assuming the type of w is widget. As long as we don’t call any function that doesn’t have a counterpart of the same name in gadget, we’re not going to get a compile error, and we’re going to be mistaken in what we’re calling. If we write w.Foo(), we’re calling gadget::Foo(), which may do something unexpected if we’re unlucky.

  6. David: it’s not necessarily less safe, since the following assumption may not hold (it is not guaranteed): “presumably then `w` is a functioning `widget`.”
    If the `gadget` & `widget` types are in the IS-A relationship, we’re facing the so-called slicing problem, see:
    http://stackoverflow.com/questions/274626/what-is-the-slicing-problem-in-c

    That being said, your “As long,,,” explanation also applies to the object slicing problem and is a nice description :-)

  7. 1(a) The code is incorrect, since we are trying to get a non-const iterator to a const container. By using auto we would have avoided this programming error; the expected (and safe) const_iterator type would have been chosen.

  8. That’s a good catch, afrantzis!

    After thinking about it, this makes me tend to prefer the index-based solution (with std::size_t) or even better Boost.Range one (it’s better and IMO is to be preferred since it’s less error-prone and more expressive, analogously to how using STL algorithms is less error-prone & more expressive than hand-coded loops in general) — since, in addition to auto, I think we should also use std::cbegin and std::cend (from C++14) instead of their non-const counterparts.

  9. Correction: no, std::cbegin or std::cend won’t be necessary here, since the argument’s type is already const and thus the right, const-correct overloads of std::begin and std::end shall be chosen.

  10. @mttpd: Thanks for your comment. I’ve learnt a lot by reading it ;-)
    To come back on David’s suggestion, instead of:
    auto w = get_gadget();
    I would have used:
    auto w = widget { get_gadget() };
    making then explicit that I want w to be a widget object. This is similar to what have been done in 2a.1 from https://herbsutter.com/2013/05/22/gotw-5-solution-overriding-virtual-functions/
    However, I don’t have a good example why it might be important to force w to be of type widget instead of gadget. As you said, if gadget IS-A widget, we are still safe in using the gadget object with anything that wants a widget. The only reason could be if there is no implicit conversion from gadget to widget but then the original code won’t compile.

  11. @mttpd:

    // STILL WRONG: using `int` for size is bad and you should feel bad
    for (auto iWRONG = 0; iWRONG < v.size(); i += 2)

    How about this:

     auto i = 0u;
  12. @Unsigned Int:

    (How odd name you got :) )

    Come on, it’s the textbook case of the new hammer. Not everything is a nail – you don’t have to use auto for everything. auto is for the case of storing the result of a function/expression, with the appropriate type – you don’t want to do that here, rather you want to define a cycle variable, of a given type (size_t), initialized to zero.

    Auto is for when you want a variable of a ‘meh, idontcare’ type. If you want a given type, declare with the given type.

Comments are closed.