GotW #3: Using the Standard Library (or, Temporaries Revisited) (3/10)

Effective reuse is an important part of good software engineering. To demonstrate how much better off you can be by using standard library algorithms instead of handcrafting your own, let’s reconsider the previous question to demonstrate how many of the problems could have been avoided by simply reusing what’s already available in the standard library.

 

Problem

JG Question

1. What is the most widely used C++ library?

Guru Question

2. How many of the pitfalls in GotW #2 could have been avoided in the first place, if only the programmer had replaced the explicit iterator-based for loop with:

(a) a range-based for loop?

(b) a standard library algorithm call?

Demonstrate. (Note: As with GotW #2, don’t change the semantics of the function, even though they could be improved.)

To recap, here is the mostly-fixed function:

string find_addr( const list<employee>& emps, const string& name ) {
for( auto i = begin(emps); i != end(emps); ++i ) {
if( i->name() == name ) {
return i->addr;
}
}
return "";
}

35 thoughts on “GotW #3: Using the Standard Library (or, Temporaries Revisited) (3/10)

  1. Ouch. Sorry. “Let’s learn the lession from ambiguous expressions in “if” and “for” statements, and require “{}” in situations where there’s no benefit to allowing them to be elided.”

  2. I really hope that ‘expression lambdas’ don’t get added to a future version of C++. Consider:

    int i = 1, j =1;
    auto f = [](auto &i, auto &j) i++; j++;
    

    Unless we’re going to adopt a policy of “never let a lambda parameter name shadow a variable in scope”, then expression lambda syntax is just a trap for the unwary. Let’s learn the lesson from ambiguous expressions in

    if

    and

    for

    statements, and require

    {}

    in situations where there’s no benefit to allowing them to be elided.

    Adding (more) syntactical ambiguity to the language for the ‘benefit’ of typing two characters fewer is not a net win, in my opinion.

  3. @Olaf van der Spek: The other guys already told about find_if being self documenting: To discover that this code is doing a search, you only need to read the function name in the find_if case, but for range for, you need to check the loop body, and if the condition is more complex than this, you might need to take more than 0.1s to discover that it is a search. It’s also less error prone (no silently compiling mistakes e.g. if( name = emp.name() ) ).

    And if you are into unit testing, you would also like that find_if version has only 2 branches (I’m not really into testing STL implementation), while the range-for version has 4 (depending on how you define branches), so you need more test cases for 100% code coverage.

  4. @Bret Kuhns: C++14 does have generic lambdas, meaning “auto” type parameters that generate a templated function call operator. However, other parts of lambda syntax refinements, including potentially allowing expression lambdas like [](auto& e) ++e, , are still under active discussion.

  5. I seriously hope that Steve is just woefully uninformed, and we will in fact be receiving VC++ updates prior to VS.Next. That assertion is completely ridiculous (see VS2008 vs. VS2008 SP1 for a direct contradiction).

  6. I totally agree with Bret Kuhns that find_if is clearer than a raw loop. A raw loop is an implementation of the algorithm, not it’s name. We have abstractions to help us understand our code, let’s user them.

  7. http://blogs.msdn.com/b/bharry/archive/2013/05/08/some-thoughts-on-a-comment-about-vs-2012-3.aspx
    Steve Teixeira, MSFT 13 May 2013 12:17 PM #
    @Tom Kirby-Green, C++11 support remains a very high priority. However, we are targeting the next VS release for additional C++11 support (you’ve already seen the beginnings of this in the CTP). It is problematic for us to add new language features in VS Updates because doing so creates a situation whereas two VS2012 developers may not be able to share code with one another due to different levels of language support in different point-releases.

    —–

    How disappointing.. Sigh.

  8. @Olaf I said information density, not functionality. There are lots of things you can do to pack more functionality in less code, and the majority of those techniques are a bad idea because they reduce readability. Whereas STL algorithms give the reader more information in a tight space than imperative for/while loops.

  9. @bkuhns The for variant packs the same functionality into less code, so IMO that one wins the dense contest.
    However, the find_if variant is less lines.

  10. @PietroV:
    >I know, i call the employee’s constructor, but it’s more readable (I think).

    You don’t know whether employee::employee(std::string) or employee::operator==(…) exist, and even if they do, it is rather unlikely that two employees are only identified by their name and find() will most-likely return end().

  11. @Olaf What about if we’re skimming through code, trying to get a grasp on what it does. Let’s say my eyes catch this line:

    for (auto& e : emps) {

    All this tells me is that you’re looking through `emps`. If I instead see this line:

    auto it = find_if(emps, [&](auto& e) e.name == name );

    Then, well… the story’s over. I see exactly what it’s doing. Even if I don’t read the lambda, I know you’re looking for something in `emps` and whatever’s found, if any, is stored in `it`. I know this from reading only part of one line of code.

    The advantage of using STL algorithms is about information density, IMO.

  12. Well may be this sound strange for some peoples here but STL is not used all the time.
    In some projects I was working on STL usage was a bad sign and was not really permitted.
    Boost look like a great library but it is much too big and many times slow (compiler time and runtime).

    IMHO range based for loop is shorter and better as find_if.
    And less dependencies on STL :)

    string find_addr( const vector<employee>& emps, const  u16string& name ) {
        for(const auto& i : emps ) {    if( i.name() == name)   return i.addr;   }
        return "";
    }
    
  13. @Olaf
    range-based-for

    for (auto& e : emps) {
        if (name == e.name()) {
            return e.addr();
        }
    }
    return {};
    

    vs stl-range-find + generic lambda + N3560 section 2.2:

    auto it = find_if(emps, [&](auto& e) e.name == name );
    if (it != emps.end())
        return it->addr();
    return {};
    

    Not convinced :)

  14. @bilbothegravatar Also, the Range Study Group is targeting a TS after C++14. They fixed some wording for C++14 to remove some roadblocks for the TS, hopefully making it all the easier to adopt in C++17. Judging by your gravatar, I’ll assume you know to use boost.range for now ;-)

  15. @bilbothegravatar Herb mentioned the GotW’s are covering C++14 now. That lambda syntax is a new feature of C++14. It uses both generic/polymorphic lambdas (note the const auto& parameter), and the abbreviated syntax that makes the curly braces and return statement optional (the return becomes assumed).

  16. Comparing the boiler plate needed for std::find_if vs. the code needed to write this with a range-based-for loop IMHO clearly shows that we need range algorithms (not in C++14, right?) in the standard and (at least) auto lambdas (in C++14, right?).

    And, btw. [CODE][&](const auto& e) e.name() == name;[/CODE] … being able to get rid of the curly braces for one-line lambdas would also be nice. Is this actually in C++14 of a typo of @BretKuhns?

  17. (Don’t you hate this editor??? That’s what I call an arrogant interface :P – ) – Let’s try again…

    std::optional<std::string> find_addr( const list<employee>& emps, const std::string& name )
    {
        std::optional<std::string> result;
        find_if(begin(emps), end(emps), [&result, &name](const auto& employee) {
            if(employee.name() == name){
                result = employee.addr;
                return true;
             }
            else
               return false; } );
        return result;
    }
    

    usage:

    auto result = find_addr(employees, lookup_name);
    if(result)
    std::cout << "Employee's address found: " << *result << std::endl;
    else
    std::cout << "Sorry. Employee not found." << std::endl;
    
  18. Using C++14 this function should return a std::optional:

    [CODE]
    std::optional find_addr( const list& emps, const string& name )
    {
        std::optional result;
        find_if(begin(emps), end(emps), [&result, &name](const auto& employee) {
            if(employee.name() == name){
                result = employee.addr;
             return true;
    }
    else
    return false;
        } );
        return result;
    }
    [/CODE]

    usage:

    [CODE]
    auto result = find_addr(employees, lookup_name);
    if(result)
    std::cout << "Employee's address found: " << *result << std::endl;
    else
    std::cout << "Sorry. Employee not found." << std::endl;
    [/CODE]

  19. 1. What is the most widely used C++ library? The Standard Library, hopefully followed by Boost.

    2. a) See other answers, the right thing to do is to use find_if.
    2. b) Quoting from my answer to the previous question:

    Taking the parameters by const reference and using boost::find_if solves all the “sought” problems (it’s just a range-based wrapper to the STL call):

    auto find_addr(const list& emps, const string& name) {
    const auto pos = find_if(emps, [&](const auto& e){ return e.name == name; });
    return pos != end(emps) ? pos->addr : string();
    }

    In code review I would probably complain that with the find_addr function you get an empty string if:
    – the employee does not exist, _and_
    – the employee exists but its adress has not been set yet (e.g. addr was default constructed to an empty string).

    If the function is required I’d recommend using boost::optional for the return type. However I would _strongly_ recommend that the best way to improve this function is not to write it in the first place. Is just a one-liner call to find_if. And by using find_if directly the semantics of not found are crystal clear.

    @Gabriel Garcia used std::optional up there. I still think the best thing is not to write this function at all.

  20. For fun I’ll try something new :)

    string find_addr( const list<employee>& emps, const string& name ) 
    {
        string result;
        find_if(begin(emps), end(emps), [&result, &name](const auto& employee) 
        { 
            bool found = employee.name() == name; 
            if(found)
                result = employee.addr;
            return found;
        } );
        return result;
    }
    
  21. > What would these people say about the range-version std::find?

    I’m a part of those people (I’m Mikhail Belyaev with wordpress messing my name for every second comment) and I would say it rocks. I even have my own set of range-based algorithms in my current project (as well as a function that wraps any pair of iterators into a range-for-enabled structure). Too bad we don’t have ’em in the stl yet.
    On the other hand, some complex loopy operations will still be expressed with range-for much better.

  22. C++14 + Concepts Lite:

    template <Container Cont>
        requires std::is_same<typename Value_type<Cont>, employee>
    std::optional<string> find_addr(const Cont &cont, const string &name) {
        for (const auto &cur : cont) {
            if (cur.name() == name) {
                return cur.addr;
            }
        }
        return std::nullopt;
    }
    

    (I may be missing a thing or two, regarding Concepts Lite correctness)

  23. > What would these people say about the range-version std::find?

    I’m not part of those people, but I’d really love to have all algorithms range version, because it would both combine simplicity (whitch means mostly relyability) and full expressivity !

  24. I wonder why do people prefer range-for over std::find/find_if. Is it because the ‘redundant’ begin(), end() pair? What would these people say about the range-version std::find? It should be roughly

    auto it = find(emps, employee(name));
  25. 1) STL

    2a) range-for:

    string find_addr( list const& emps, string const& name ) {
        for( auto const& emp : emps )
            if( emp.name() == name )
                return emp.addr;
        return "";
    }
    

    2b) stl algo:
    I’ll try a different approach..

    string find_addr( list const& emps, string const& name ) {
        auto found = find(begin(emps), end(emps), employee(name) );
        if( found != end(emps) )
            return found->addr;
        return "";
    }
    

    I know, i call the employee’s constructor, but it’s more readable (I think).
    I still prefer the range-for version.

  26. I would just add a niggling note that since we use end(emps) multiple times, but it won’t change, it wouldn’t hurt to store that once & only once:

    string find_addr(const list& emps, const string& name )
    {
    auto notfound = end(emps);
    auto location = find_if(begin(emps), notfound, [&name](const auto& emp)
    {
    return emp.name() == name;
    });

    if (location != notfound)
    return location->addr;

    return “”;
    }

  27. 1. STL is the best.
    2. STL Algoritms + lambda are the best.

    I won’t give the range for based loop (uh, need an abbreviation for this :D) since is simple and already given by everyone here.

    I will rather try to point something about the lambda+algo one :

    string find_addr(const list<employee>& emps, const string& name ) 
    {
        auto empIt = find_if(begin(emps), end(emps), [&name](const auto& emp)
        {
            return emp.name() == name;
        });
    
        if(empIt != end(emps))
            return empIt ->addr;
    
        return "";
    }
    

    Since we write for C++14, we can use auto. This is cool because it simplifies a generic code. I personnaly always try to avoid repeating a type.

    In C++11 (because I can’t use C++14 in production code of course), I have to write this :

    string find_addr(const list<employee>& emps, const string& name ) 
    {
        auto empIt = find_if(begin(emps), end(emps), [&name](const decltype(emps)::value_type& emp)
        {
            return emp.name() == name;
        });
    
        if(empIt != end(emps))
            return empIt->addr;
    
        return "";
    }
    

    But even if I’m restricting to STL like containers in the C++11 version, it is highly preferable than repeating the type of what is stored in emps (perhaps const decltype(*begin(emps))& would be better ? I think it lacks of clarity but it would have the great advantage of working on any kind of container, STL like or not).

    With this, I am able to change the container type, or even the type in the container (as long as it has a name() method and an addr member attribute) from one and only one point (here the function declaration) without changing at all the meaning and the validity of the code.
    This increases code reusability.

    For the difference between RBFL and lambda+algo :
    * RBFL is simpler : no extra variables, no extra code. But it needs either a comment or a longer read time later.
    * lambda+algo is more expressive : the code itself says it wants to *find* the *emp* *whose name* is *name* and *return* *its addr* *if you found one* or *”” as the default value*. Could you even be more expressive on what this code does in a comment poorly paraphrasing the code ?

  28. Maybe this isn’t appropriate but using Linq in C++, it can be done in one line without much clutter:

     
    string find_addr(const list<employee>& emps, const string& name ) 
    {
        return LINQ(from(emp, emps) where(emp.name() == name)) | linq::first_or_default;
    }
    
  29. 1. If you take the question like “what library is used in most C++ projects”, it’s probably iostreams. If you take it like “what library’s functions are called most of all”, it’s STL.
    2. As I’ve already proposed range-for-based loop, here it is again:

    string find_addr(const list<employee>& emps, const string& name ) {
        for( const auto& emp : emps)
            if( emp.name() == name ) return emp.addr;
        return "";
    }
    

    for_each:

    string find_addr(const list<employee>& emps, const string& name ) {
        auto where = 
               std::find_if(cbegin(emps), cend(emps), [&name](const employee& emp) { return emp.name() == name; });
        return (where == cend(emps))? "" : where->addr;
    }
    

    What do we get from both?
    1. We do not need to memoize end(emps);
    2. No possible ++ issues (hey, nobody can guarantee he never wrote postfix ++ accidentally);
    3. We’ve got and employee& inside the body rather than some_fancy_iterator.

    What do we get from range-for?
    It soo much more clear what’s happening with less code. Not all algorithm combinations can be expressed with this loop though.

  30. string find_addr( const list<employee>& emps, const string& name ) {
        for( const auto& i : emps ) {
            if( i.name() == name ) {
                return i.addr;
            }
        }
        return "";
    }
    string find_addr( const list<employee>& emps, const string& name ) {
        auto iter = find_if(cbegin(emps), cend(emps), [&](const auto& e) e.name() == name );
        return iter != cend(emps) 
             ? iter->addr
             : "";
    }
  31. string find_addr( const list<employee>& emps, const string& name ) {
        for( auto& i : emps ) {
            if( i.name() == name )
                return i.addr;
        }
        return "";
    }
    
    string find_addr( const list<employee>& emps, const string& name ) {
    	auto i = find_if(begin(emps), end(emps), [&](const employee& a) { return a.name() == name; } );
    	return i == end(emps) ? "" : i->addr;
    }
    

    Using find_if is quite verbose, the for variant is simpler. I’d still return employee* though. :p

Comments are closed.