Why no container-based algorithms?

A few minutes ago, a colleague on another team asked:

I really enjoyed your talk on Modern C++ from the Build conference, and have a quick question: Could there be a simpler syntax – something like:

foreach(collection, lambda_function) // or some other syntactic name for “foreach”

which would expand to

for_each(begin(collection), end(collection), lambda_function)

Same for find_if, etc.

This was considered and is desirable. In today’s C++ it’s easy to do for some algorithms, but not others. The main problem is overloading, which needs better template support (i.e., C++0x concepts which were proposed but didn’t make it) and/or adding enable_if.

Briefly, the basic problem is that you already have predicate overloads for some algorithms:

template<typename Iter>
void sort( Iter, Iter ); // 1

template<typename Iter, typename Pred>
void sort( Iter, Iter, Pred ); // 2

So far, so good. But what we want to add is:

template<typename Container>
void sort( Container& ); // 3

template<typename Container, typename Pred>
void sort( Container&, Pred ); // 4

And 4 is difficult to distinguish cleanly from 1 today. Both are match-anything function templates taking two parameters, and 1 would be preferred by the language when the argument types are identical, but in common cases you encounter unfortunate effects. For example, Howard Hinnant points out that if you try to call sort with (say) an iterator and a const_iterator, if we had only 1 you’d get a reasonably clear error message, but now because 4 is able to match different parameter types the compiler will instead try to invoke 4, which isn’t at all close to the original intent, and you’ll get a deeply strange template error message somewhere in the bowels of 4’s implementation because it wasn’t expecting anything like an iterator for either parameter.

Of course, not all algorithms have this issue where there are already overloads with different numbers of parameters. I’m hopeful that the standard library will get range-based overloads of all standard algorithms that are enable_if’d to avoid the problem or can use concepts if those make it into a future standard.

Here’s what the enable_if workaround might look like (I haven’t actually tried this though):

template<typename Iter>
typename enable_if< !CallableWithBeginAndEnd< Iter >::value, void >::type
sort( Iter, Iter ); // 1

template<typename Iter, typename Pred>
void sort( Iter, Iter, Pred ); // 2

template<typename Container>
void sort( Container& ); // 3

template<typename Container, typename Pred>
typename enable_if< CallableWithBeginAndEnd<Container>::value, void >::type
sort( Container&, Pred ); // 4

35 thoughts on “Why no container-based algorithms?

  1. @Gevorg: Duh. Fixed. That’s what I get for throwing in edits on the fly without compiling — gcc would have merrily reported “error: need ‘typename’ before ‘std::enableif< …" etc.

  2. Herb, now the ‘typename’ keywords aren’t where they should be.
    Instead of enable_if< typename CallableWithBeginAndEnd<Container>::value, void >::type
    it should’ve been typename enable_if< CallableWithBeginAndEnd<Container>::value, void >::type
    Similarly for disable_if. Well we don’t have disable_if in std so I guess it is assumed the user has written one himself. Alternatively, boost::disable_if_c, or better, boost::disable_if and omitting ::value part.

  3. Jon said: Why not change the name of the functions that take a collection? Eg sort_col()

    Herb said: You could, but then you lose the ability to overload which can be desirable.

    My question: why would you want to be able to overload in this particular instance? Sort with two iterators – you want to maybe sort only half of the collection. Sort a container – sort them all. Different intents, different names.

    A better function name would be sort_all().
    Then the meaning of that line of code becomes clearer with even less code to read. And if you accidentally pass it two iterators then it can tell you.

  4. Isn’t the new C++11 range-based for supposed to cover this use case? The only significant difference between this proposal and range-based for is that what would be the function block of the lambda statement in this proposal becomes the range-based for’s statement block.

  5. Compared to what it costs to sort a container, it’s not a lot you have to code. So it does hinder you to sort at every corner. code length correlates with costs. a good thing, isn’t it?

  6. Is iterators must go replacement(aka STL with ranges) avaliable from boost? I tried reading documentation for boost::range but it is a bit inconclusive :)

  7. Cool, Ill check it out, but I will probably never use it in production because I have phobia of nonboost or nonstd libraries. :)

  8. As Tomas says, enable_if is your friend here.

    template<typename Container, typename Pred>
    disable_if< is_same<Container, Pred>::value, void>::type
    sort( Container&, Pred );

  9. I did this at my last company – I wrote templates called things like for_all etc that took in the collection as a parameter rather than a couple of parameters which specified iterators.
    John

  10. You may want to checkout zenXml on sourceforge. It’s intent is to provide smallest implementations for both UTF8 and XML conversions possible. But two caveats: It’s C++11 only (needs gcc >= 4.5.2, VS2010). Similarity to my name are *not* accidental! :)

  11. Tnx, Just to be clear my question here was why it isnt in the standard, on SO was how to work around it not being in the standard.

  12. Well I presume Unicode in not as glorious as core language features or fancy libs like atomic, mutex and thread that depend on those so people didnt care. :( I wish that there was some “wishfund” for boost where people could donate some small amount of cash for libs that they want implemented. I’m not rich, but I would gladly donate some small amount(25$) for a decent Unicode boost lib. And 5k people donating should be a decent incentive for basic functionality… Also I presume people would like better XML support… I know that this probably wont happen because I get the feeling that most of the open source people feel that money is evil. :)

  13. Given that enable_if is now part of the standard what exactly speaks against using this construct in standard library? Are we waiting for new version of concepts?

  14. And if it weren’t for backwards compatibility, I’d say rename the iterator-based ones. It seems that using begin and end iterators is by far the most common usage.

    Too bad that decision wasn’t made way back when.

  15. There are two simple and robust solutions to the issue which don’t involve meta-programming here: a) put the range versions in another namespace, like boost does; b) name them differently, since code that deals with iterators and ranges has to treat them differently anyway. Using meta-programming to solve it is mentally satisfying, but it is not obvious how it makes the user code shorter or cleaner (again, since code that deals with iterators and ranges has to treat them differently anyway).

  16. Actually not hard. Implementing utf8/utf16/utf32 conversions as templates took myself only 190 lines of code.
    I don’t know if these fundamental algorithms are part of C++11, but it would be quite a shame if not. Were in 2011 and Unicode is everwhere…

  17. Well Im not an expert but how hard is to make utf8/16/32 strings that are “easily”(number of code lines, not easy in fast ) convertable to eachother and that are easily fprintable(aka you can specify how you want file to be encoded,)

  18. > And 4 is indistinguishable from 1 today
    Why not throw some template metaprogramming at it? It’s not hard to come up with a solution that distinguishes both cases at compile time:
    Simply put, a container could be identified by having “begin(), end()” members, while an iterator via iterator_traits.
    I’ve done something similar with an XML serialization library, that detects all STL and non-STL types and “does the right thing”.

    I’ll acknowledge, there are other considerations, style maybe, but feasibility? Hey I’d be glad to implement it! :) And while at it, I’d implement a member remove_if(Predicate p) for all container types, not just std::list).

  19. Of course all the angle brackets were eaten, the first func had templates class A, class B, and the second one had class A. And obviously an stdio.h was included.

  20. It’s not exactly clear for me why 1) and 4) are indistinguisable. 1) has the same argument types while 4) has different argument types, so the only way ambiguity would arise if the container and the predicate had the same type, which doesn’t seem realistic. Perhaps there’s some technical issue with template instantiation and function overloading I don’t realize that would cause issues in some cases, but this seems to work:

    #include
    template
    void func(A a, B b) {
    printf(“Heterogeneous\n”);
    }

    template
    void func(A a, A a2) {
    printf(“Homogeneous\n”);
    }

    int main()
    {
    func(1, ‘a’);
    func(1, 2);
    }

  21. Thanks Herb, this looks much better. One last catch if I may; it’s missing typename before enable_ifs. Thanks for the article by the way, now I can link people to it next time they ask me why no container overloads for std:: algorithms :)

  22. Good catches, thanks. As for the !, I was writing down the idea quickly and couldn’t remember offhand if boolean logic would work, but I think you’re right that it should. Updated.

  23. Herb, don’t you need (1) enable_if’d instead of (2) ?
    (2) takes three parameters so already doesn’t conflict with other overloads.
    Also, don’t you need to pass Iter to NotCallableWithNonmemberBeginAndEnd instead of Container (no type named Container there) ?
    Somehow we don’t have std::disable_if like we do in boost, but why not use !CallableWithNonmemberBeginAndEnd::value for (1) to avoid the need of writing another traits class?
    Thank you.

  24. +1 for Howard Hinnant’s answer. Of course I went ahead and posted my variadic template solution.

  25. There are many, *many* different things that could be called “unicode support.” What exactly are you looking for?

    The fullest version of Unicode support means being able to classify codepoints (asking if codepoint X is a combining character, or whatever), combine codepoints into graphemes, unicode normalization, and many, many other things. These are not simple things to do.

    Do note that the vast majority of things that were added to C++11’s standard library were either existing practice (ie: Boost) or slight modifications thereof. Virtually nothing was brought into the standard library that was not already implemented, or something that was not associated with C++11-specific language features (emplace, etc).

    It’s one thing to come up with a language feature out of whole cloth. It’s quite a bigger challenge to do that for a library. Especially for something that is as big and complex as “Unicode support.”

    Currently, the closest thing to existing practice for full-on Unicode support in C++ land is ICU. And that’s not a very standard C++-friendly library. It has its own locales, it has its own strings (not templated), it has its own way of doing everything. It’s a Java library that is implemented in C++.

    In short, a C++ standard Unicode library would have to be written and tested *before* being incorporated into the standard. As nice as Unicode support would be, the last thing you want in the standard is *bad* Unicode support.

  26. Herb, why there isnt a better unicode support in c++11. I might be misinformed, but from I see support is almost nonexistant except u16/32strings and matching char types.
    Also sorry to bother you again but I think you might have missed my original question…
    this really bugs me: Why you dont make include guards obsolete and repetitive inclusion explicit.
    Another small iritating thing. :why cant i write array my3Darray;

    BTW nice article, I knew before that concepts would enable this, but never know the “deep” reason

  27. You could, but then you lose the ability to overload which can be desirable. For one thing, we’d then have to teach multiple function names — that’s admittedly not the end of the world, but it does make things uglier with for_each vs. for_each_coll or something like that. It just looks like a workaround.

Comments are closed.