A quick poll about order of evaluation…

Consider this program fragment:

std::vector<int> v = { 0, 0 };
int i = 0;
v[i++] = i++;
std::cout << v[0] << v[1] << endl;

My question is not what it might print under today’s C++ rules. The third line runs afoul of two different categories of undefined and unspecified behavior.

Rather, my question is what you would like the result to be. Please let me know.

116 thoughts on “A quick poll about order of evaluation…”

  1. What is wrong with you people? How could it possibly be 01? That would mean i++ evaluates to 1 two consecutive times! This had better not happen regardless of the order of evaluation!
    :D
    Consider v[i++] = v[i++] = i++; // <-- how should this evaluate?

  2. If I didn’t know any better, I would assume the following based off of the OOO.

    v[i++] = i++; // Just evaluating this expression.

    v[i++] // place address of v[0] on the stack, increment i (i == 1);
    i++ // load i (i == 1) on the stack, increment i (i == 2);
    = // pop the value 1 from the stack, pop address of v[0] from the
    // stack, assign v[0] the value of 1;

    I would expect an output of 10.

  3. I was only thinking yesterday that everyone knows what multiple uses of

    i++

    should do, so why don’t they define it properly in the standard? I figured maybe compilers have better opportunity to optimize code where multiple increments are not used or something like that. After seeing this survey I realize that people’s ideas of what it should do vary quite a lot. I went for

    10

    on the basis that i is incremented after it is used each time. Now I see that is not as obvious as I thought it would be.

  4. It seems that some form of extra control for postincrement/postdecrement is needed in the expression. I propose using redundant parentheses around an expression whenever we want the accumulated increment/decrement side effect to take immediate effect at the point of use of the redundant parentheses. The syntaxes are:

    (expr);

    (expr)++;

    (expr)--;

    E.g.

    int f(int, int, int){ return i; }

    int i=0, V[3] = {}; //initial condition

    V[i++] = i++; => V[0]=0, V[1]=0, V[2]=0; Final value of i is 2; no ambiguity now

    V[(i)] = i--; => V[0]=0, V[1]=2, V[2]=0; Final value of i is 1

    i=2; //re-init

    V[(i)++] = V[(i)++] = V[i--] = i--; => V[0]=2, V[1]=2, V[2]=2; Final value of i is 2.

    i=f(i=0, i++, (i)++); => f(0,0,1) and returns 1; But final value of i is 2;

    i=f(i=0, i++, i++); => f(0,0,0) and returns 0; But final value of i is 2

    If expr itself needs redundant parentheses, then an extra redundant parentheses can be added and so on. E.g.

    (expr ? (expr1) : (expr2))++

    which is different from

    (expr ? expr1 : expr2)++

    , the latter of which is a normal postincrement/postdecrement

    If there are no accumulated increments/decrements at the point where a forced postincrement/postdecrement is used, then the redundant parentheses really are redundant. E.g.

    (i)++; //similar to i++

    (i)++ + (i)++; //the left redundant-parentheses is redundant, but the right redundant-parentheses is not

    I also propose operands be evaluated according to their associativity, and postfix-expression be evaluated left-to-right, as proposed in N4228, and forced/unforced postincrement/postdecrement both yield lvalue.

    --expr++; //ok now

    ++expr--; //ok now

  5. I think the output should be 02, because:

    std::vector<int> v = { 0, 0 };
    int i = 0;
    v[i++]// i is 1 now.
     = i++ ; //i is 2 now.
    std::cout << 
    v[0] //0,because v[0] was 0 and its value hasn't changed
    << v[1]// 2,because it became 2.
     << endl;
    

    I think that’s the answer. Thanks.

  6. (In the following, = is used for assignment, and == for equality.)

    If we were to evaluate left to right:
    before => v[0] == 0, v[1] == 0, i == 0
    v[i++] = RHS => v[0] = RHS, i == 1
    v[0] = i++ => v[0] = 1, i == 2
    after => v[0] == 1, v[1] == 0, i == 2

    If we were to evaluate right to left:
    before => v[0] == 0, v[1] == 0, i == 0
    LHS = i++ => RHS == 0, i == 1
    v[i++] = 0 => v[1] = 0, i == 2
    after => v[0] == 0, v[1] == 0, i == 2

    If there were an order, left to right would be more natural, but I could cope with either. If there is a reason for not specifying an order (I can’t see any particular use for this syntax), then give me a compilation error. My only preference would be: don’t give me ‘unspecified’.
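
    The two traces above can be spelled out with one side effect per statement, so that nothing depends on how a single expression is sequenced. This is only a sketch: show, left_to_right and right_to_left are names made up for illustration, not anything from the post.

    ```cpp
    #include <string>
    #include <vector>

    // Hypothetical helper: format v[0] and v[1] the way the poll prints them.
    std::string show(const std::vector<int>& v) {
        return std::to_string(v[0]) + std::to_string(v[1]);
    }

    // Left to right: v[i++] is evaluated before the right-hand i++.
    std::string left_to_right() {
        std::vector<int> v = {0, 0};
        int i = 0;
        int& lhs = v[i];   // v[i++] refers to v[0] ...
        i += 1;            // ... then i == 1
        int rhs = i;       // i++ yields 1 ...
        i += 1;            // ... then i == 2
        lhs = rhs;         // v[0] = 1
        return show(v);
    }

    // Right to left: the right-hand i++ is evaluated before v[i++].
    std::string right_to_left() {
        std::vector<int> v = {0, 0};
        int i = 0;
        int rhs = i;       // i++ yields 0 ...
        i += 1;            // ... then i == 1
        int& lhs = v[i];   // v[i++] refers to v[1] ...
        i += 1;            // ... then i == 2
        lhs = rhs;         // v[1] = 0
        return show(v);
    }
    ```

    Calling left_to_right() gives "10" and right_to_left() gives "00", matching the "after" lines of the two traces.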

  7. Code should be evaluated left to right when piped (multiple times), reduced on both sides before assigned from the right side and evaluated one line after the other in order. Therefore the correct behavior should be:

    std::vector<int> v = { 0, 0 };
    int i = 0;

    There is nothing suspicious here, yet.

    v[i++] = i++;

    This should make

    v[1] = 1;

    or

    v = { 0, 1 };

    And then

    std::cout << v[0] << v[1] << endl;

    would print “01” to the console.

  8. Suppose we were doing this with a user-defined type, so instead of

    v[i++]=i++;
    

    we had

    v.operator[](i.operator++(0)).operator=(i.operator++(0));
    

    In this case, we have unspecified behavior (feel free to correct me if I’m wrong) because the order of function calls isn’t completely specified. There’s advice to make UDTs that do arithmetic work like ints, so it seems reasonable to me to make the ints work like UDTs here. That’s why I voted for “unspecified”.

    I don’t want to specify a behavior here, because that’s more stuff I’d have to learn to read the code. Ideally, I’d like code that requires complicated rules to understand, and can easily be written less ambiguously, to be a genuine error. C++ has enough readability problems, for a variety of reasons.

    I’ve become less fond of undefined behavior since gcc started optimizing it out so aggressively. In cases like accessing outside the bounds of an array, I think it appropriate, since the likely outcomes (exception, wrong answer) are sufficiently different from one another. In this case, we’re going to get some reasonably valid result under any reasonable interpretation of what the computer is likely to do, so unspecified.
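
    The user-defined-type analogy can be tried out directly. The sketch below assumes a made-up Counter type with only a postfix ++; because both increments are now function calls they cannot interleave, but under the rules discussed here the implementation is still free to run either one first.

    ```cpp
    #include <string>
    #include <vector>

    // Hypothetical user-defined counter; only postfix ++ is provided.
    struct Counter {
        int n = 0;
        int operator++(int) {   // postfix: return the old value, then bump
            int old = n;
            ++n;
            return old;
        }
    };

    // Both increments are function calls, so they cannot overlap, but
    // which of the two runs first is not fixed by this code.
    std::string udt_result() {
        std::vector<int> v = {0, 0};
        Counter i;
        v[i++] = i++;   // either v[0] = 1 or v[1] = 0
        return std::to_string(v[0]) + std::to_string(v[1]);
    }
    ```

    Depending on which call happens first, udt_result() returns "10" or "00"; no other result is possible with this type.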

  9. What is going to become of this poll anyway? Will it sound the death knell for N4228?

  10. I’m a UB man (it’ll learn people not to write this stuff!) followed by a 0 1 man, because I expect an RHS expression to be calculated before the LHS storage location is calculated. I particularly expect this with right-to-left associativity.
    Which raises the question of what I would expect in a similar code-snippet using the equality operator (left-to-right associativity). I should expect the LHS post-increment to be evaluated before the RHS post-increment, I suppose, but you know what? I’ve just consulted my C++ back-brain, and I don’t.
    And please don’t hurt my brain by mutating the question to, say, the comma operator. I don’t even know whether this is defined behaviour or not.
    I guess what I’m saying is that, if this monstrosity has to be allowed for some reason, then it should be an exceptionally clear and well-defined reason. If it’s strictly based on operator associativity, I could learn to live with that.
    (Luckily, the scope resolution operator doesn’t come into play here!)

  11. I voted for 10 because I think the left-hand expression should be evaluated first. The command a = b translates in my mind to a.operator=(b), and it’s more natural to know the type of “a” (and the operators it contains) before we actually evaluate and pass arguments to the call.

    However, there is also this case:

    a = b = c = d

    Which is currently interpreted as a = (b = (c = d)). That means the right-hand side is evaluated first. So, on second thought, C/C++ programmers might be more used to right-hand evaluation happening before left-hand.
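
    Grouping and evaluation order are separate questions, and a small tracing sketch can show the difference. The helper traced and the struct ChainResult are hypothetical names introduced only for this illustration: right-associativity guarantees how the assignments nest, while the order in which the four operands are looked up is whatever the implementation chooses.

    ```cpp
    #include <string>
    #include <vector>

    // Hypothetical tracing helper: records when an operand is evaluated
    // and returns the variable itself so it can be assigned to.
    int& traced(int& x, const std::string& name, std::vector<std::string>& log) {
        log.push_back(name);
        return x;
    }

    struct ChainResult {
        int a, b, c, d;
        std::vector<std::string> order;  // order in which operands were evaluated
    };

    ChainResult chain() {
        int a = 1, b = 2, c = 3, d = 4;
        std::vector<std::string> log;
        // Parsed as a = (b = (c = d)) because = is right-associative.
        traced(a, "a", log) = traced(b, "b", log) = traced(c, "c", log) = traced(d, "d", log);
        return {a, b, c, d, log};
    }
    ```

    After chain() all four values equal 4 regardless of what order ends up in the log, which is the part that associativity alone does not decide.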

  12. My vote goes for the 00 result.

    I haven’t tested current results, but “in my mind” that is an “assignment” and that involves two different expressions. As I would do if I faced a blackboard with “a = b + 1” I would first focus on the right side of the assignment (“b + 1”) so I would solve that first FULLY as the first expression (as if it were a function call). Then I’d assign that value to the left expression (whatever it is).

    Adding post-increments to the equation, I “would” define their behavior to be applied AFTER the computation has been performed and BEFORE the computed value is used, in a left to right manner (or, again in my mind “expression context”)

    int a = 4;
    int b = (a++ + a++);

    I would expect “9” as a result (4+5), leaving variable “a” as 6 before assigning the result to “b”.

    In case of several successive expressions, like in:

    int a = 4;
    int b = func(a++ + a++, a++);

    I would expect the two expressions being evaluated left to right and the function invoked with arguments “9,6”.
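
    Written out step by step, the behavior I am describing would look roughly like this. It is a sketch of the semantics I would like, not of what any compiler does today; post_inc, Args and the body of func are made up for the example.

    ```cpp
    // Hypothetical stand-in for func; it just reports the arguments it received.
    struct Args { int first; int second; };
    Args func(int x, int y) { return {x, y}; }

    // Hypothetical helper spelling out "use the value, then increment at once".
    int post_inc(int& a) {
        int old = a;
        a = a + 1;
        return old;
    }

    int sum_example() {
        int a = 4;
        int left = post_inc(a);    // 4, a becomes 5
        int right = post_inc(a);   // 5, a becomes 6
        return left + right;       // 9
    }

    Args call_example() {
        int a = 4;
        int left = post_inc(a);    // 4, a becomes 5
        int right = post_inc(a);   // 5, a becomes 6
        int first = left + right;  // 9
        int second = post_inc(a);  // 6, a becomes 7
        return func(first, second);
    }
    ```

    Here sum_example() returns 9 and call_example() passes 9 and 6 to func, which are the values I said I would expect.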

    Turning back to the original example:

    std::vector<int> v = { 0, 0 };
    int i = 0;
    v[i++] = i++;
    std::cout << v[0] << v[1] << endl;

    I'd expect the right-hand expression to yield the value "0", and i to be post-incremented before the variable is used again in the left-hand expression, so:

    int i = 0;
    const auto result = i;
    i = i + 1;
    v[i] = result;
    i = i + 1;

    Having said that, an issue arises… it would not be the same to write

    v[i++] = i++;

    as

    void assignFunc(int& a, const int& b)
    {
        a = b;
    }
    assignFunc(v[i++], i++);

    In any case, for such an obvious "double post increment" scenario a compiler warning/error would be nice, although I understand that performing alias detection could be a real nightmare.

    Just my two cents!

  13. I’m still voting for compiler diagnostic. In that specific case (i++ twice) this is obvious.

    For more devious constructs where the compiler can’t conclude (i++ vs j++ with names so defined that there could be an alias, like because of templates for instance, or thru pointers with no restrict in effect so that we don’t know it’s the same pointer), I would still vote for a diagnostic.

    If that code is so convoluted that the compiler can’t figure out it’s safe, it doesn’t belong in a valid program. We’re not talking normal Turing machine rules. We’re talking a specific case where it’s perfectly okay to answer “this is safe” or “I don’t know, so disallow it”.

    In the worst case, you end up splitting the instruction and adding a ;. That’s much better than hard-to-diagnose bugs.

    Forcing an order of evaluation is wrong. It puts more constraints on the compiler itself, which already has a hard time doing a good job on modern architectures.

  14. This should definitely be a compile error. Not even a warning, but a hard error. This code is undefined behavior, and every compiler we know of is able to analyse local code and figure out double i++ in one single instruction.

  15. Taking into consideration the following:
    u[i++] = v[i++] = i++;
    and the fact that operator++ could be user defined, and the fact that operator= associates right-to-left, it makes sense to expand it as:

    tmp = i
    i = i+1
    
    v[i] = tmp
    i = i+1
    
    u[i] = v[i]
    i = i+1
    
  16. int i = 0;
    v[i++] = i++;

    Herb, by the way, after looking a bit closer, the postfix looks correct by logic, because the vector, being an internal system-based feature, is close to the language itself, a distance away from something coming into a database, externally, by way of string or integer. Anything assigned to it by prefix would make sense, by standard. Features, usually being closer to the first things on the compiler’s list, after the common data types and such, look like they would shovel the similar for the postfix, being lower on the compiler’s list, by structural logic. That makes some sense, yet it is a complexity-based implementation. An old saying followed this: “When the approach to a problem set needs a workaround.” That’s where I’ve seen this kind of code, in my own. But if the rest of the program’s code doesn’t reflect this level of complexity, then you could probably have someone make the whole program easier, by way of resource management.

  17. int i = 0;
    v[i++] = i++;

    There’s nothing here that seems much afoul. One would just be dealing with two logic layers of abstraction. Usually if you see code like that in a file, the person has the potential to max out the program’s capabilities at different open points.

    Code like that, read on a daily code review basis, requires a set of desk glasses.

    I wrote code like that in school projects where I exceeded the level of abstraction common to the programming exercises by complexity of 2.

    I guess if the level of abstraction were common, we’d be done with part of the main scientific problem sets of the past 40 years or so, only if expertise becomes a rational thing in the areas of the senior sciences, rather than complementary research and result handling.

  18. I’d like this to be (in the standard’s terms) “ill-formed (no diagnostic required)”. That’s almost the status quo (as far as I can tell it’s answer 5).

    You cannot do any real justice among the “good” answers, only force one of them, or keep the status quo of UB. And I don’t even care here if it’s the launching-missiles or just the you-can-get-any-answer kind of UB; both of them are equally wrong. (BTW, I would rather call the latter “implementation specified” than “unspecified” behavior, and keep the UB term for potential missile launches.)

    The whole “v[i++] = i++;” thing is over-engineered anyway; programmers should write easily understandable code, not this. It is in no way worse to write “v[i] = i; i+=2;” instead (or whatever version you actually wanted). It’s also better from a teachability POV: one shouldn’t have to waste time teaching or learning stuff that one is not supposed to write anyway.

    Make expressions with ambiguous sequence point ordering an error; that will enforce better code and be straightforward to teach. I added the “no diagnostic required” part so that current compilers are immediately conformant with the language change, but to still encourage future compiler versions to emit an error (or warning; some of them already do).

  19. 10 (if my in-mind compiler served me right), i.e. left to right. This really boils down to “what is easier to teach to a beginner”, in which case it seems pretty obvious to me. C++ is read by people left to right, so I would believe that that is also likely going to be what people expect is going to happen. “But this part here is placed BEFORE that, then why isn’t it running in that order?”

    I also think that Sean Parent’s point about what other languages do is a very valid one.

  20. “12” should be the answer.
    Think about Ms. Liskov:

    /* following the Liskov idea of substitution... */
    
    #include <vector>
    #include <iostream>
    using namespace std;
    
    int pincr (int& x, const char & who)
    {
        cout << "   f: " << x << " who: " << &who << endl;    
        return ++x;
    }
    
    struct IntBox {
        IntBox () : i(0) {}
        int i;
        int operator++(int x) {++i; return i;}
    };
    
    int main()
    {
        vector<int> v = { 0, 0 };
        int i = 0;
    
        v[i++] = i++;
        cout << "herb´s: " << v[0] << v[1] << endl;
    
        i = 0;
        v[pincr(i,*"LHS")] = pincr(i,*"RHS") ;
        cout << "funcIncr:" << v[0] << v[1] << endl;
    
        IntBox I;
        v[I++] = I++;
        cout << "operator++: " << v[0] << v[1] << endl;
        
        i = 0;
        auto lam = [&](){return i++;};
        v[lam()] = lam();
        cout << "two lambdas: " << v[0] << v[1] << endl;
    
    }
    
    
  21. @Michael (Post-)incrementing as the last operation of the full expression would, among other things, break existing code, since as things stand now, side effects of arguments to a function must be complete before the function is called.

  22. Do we have something like f(g(i++),i++); here? I like 00 because it can be phrased as a simple rule – if several sub-expressions of a single statement contain post increments then incrementing is to be performed as the very last operation.

    10 or 01 would require hard to memorize rules…

  23. I vote for 10 as I prefer left-to-right evaluation, which goes the same way that a human reads the code. But if this can cause compilers to miss optimization opportunities, I would go with unspecified. The poll is somewhat misleading, as 00 represents two different behaviors – v[0]=0 and v[1]=0. In addition, I can’t understand how the result 01 got so many votes; why would anybody expect that?

  24. Is there a value in letting different compilers deal with things like this their own ways rather than to standardize a particular behavior? Personally I prefer things to be standardized as much as possible.

  25. So does C#. It’s not like Herb doesn’t know this already. As always with C++ if we don’t make things at least a wee bit complicated, we’d get withdrawal symptoms.

  26. (really? – strips links without allowing me to edit?) Search stackoverflow for “What are the rules for evaluation order in Java?”

  27. Commenting on own post – Initially I read 00 to imply no writes – as noted by others it can imply r to l evaluation. Remainder of my comment stands. I don’t have an issue with getting rid of undefined behavior, but it should be done with an argument from data. I have no data in support of either side, and in this case I think speculating about what the compiler might (or might not) do is just that. As for what the evaluation order should be, I’d look at other languages before creating a new set of rules – it sounds like people are trying to apply precedence rules to subexpression evaluation ordering – see the answer here .

  28. @Matt C, the compiler won’t know in general which of f_1(i) or f_2(i) will be more complicated to evaluate, especially since this can change depending on conditionals and run time state.

    I would consider it far more important to achieve consistent behavior, rather than to reach for possibly better performance in some cases at the cost of potentially inconsistent behavior.

    In particular, I would not want it to be the case that the behavior changes if I choose to first create an explicit named reference to the destination location of an assignment.

    int & foo = expression, e.g. involving v[f_1()]
    foo = f_2();
    // potentially more operations involving foo
    

    The only way to ensure consistency of behavior with or without a named reference is to evaluate the LHS of the assignment first, just as you first posed. Otherwise, seemingly equivalent code could behave differently.
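
    To make the named-reference form concrete, here is a sketch in which f_1 and f_2 are stand-ins I made up: each returns the current value of a shared counter and then increments it. Once the destination is bound to a name in its own statement, the left-hand side is necessarily evaluated first.

    ```cpp
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for f_1 and f_2: each returns the current
    // value of the shared counter and then increments it.
    int f_1(int& i) { return i++; }
    int f_2(int& i) { return i++; }

    std::string with_named_reference() {
        std::vector<int> v = {0, 0};
        int i = 0;
        int& foo = v[f_1(i)];  // destination chosen first: v[0], i becomes 1
        foo = f_2(i);          // value computed second: 1, i becomes 2
        return std::to_string(v[0]) + std::to_string(v[1]);
    }
    ```

    with_named_reference() always yields "10", so a one-line v[f_1(i)] = f_2(i) would only be guaranteed to match it if the LHS were required to go first.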

  29. So when I looked at the question, I immediately thought, “I would expect this to execute left-to-right”.

    v[i++] => lhs = v[0], i = 1.
    i++ => rhs = 1, i = 2.
    lhs = rhs => v[0] = 1.

    So that gives the “10” result.

    But, that means that the reference to v[0] has to be held somewhere. Either in a register, or (possibly expensively) pushed onto the stack. Now, the right hand side of an expression is usually *far* more complicated than the left, and putting that register out of use could be detrimental to performance.

    And so I’d rather leave it unspecified so that the compiler can do whatever it works out to be fastest, because whatever rule takes place here doesn’t *just* affect aliasing problems, but all assignments. Consider also:

    int complicated_1(int &);
    int complicated_2(int &);
    
    v[complicated_1(i)] = complicated_2(i);
    
  30. @Joseph Mansfield: I don’t understand the C++ community’s infatuation with leaving things unspecified _even_ when given an opportunity to specify a particular outcome. In this particular case, I would like to ask you (and just about every commenter watching this thread and maybe Herb himself), what would be technical argument against specifying a strict left-to-right order of evaluation? (leaving aside the question of programmer expectations for the moment)

  31. From a first look I would expect it to write 0 to v[1], so {0,0}.

    Why?
    – Right side is executed first (the value has to be generated before you can assign it).
    – i++ returns 0, i is now 1
    – After this the array is accessed i++ returns 1, i is now 2
    – So basically it should be v[1] = 0;

    But ….
    If you rewrite it to use more than one line there are multiple possibilities.

    int value = i++;
    int& target = v[i++];
    target = value;
    
    int& target = v[i++];
    int value = i++;
    target = value;
    

    As of this I guess undefined is the correct way to go to be “consistent” with different implementation variants.

  32. I would expect it to write 0 to v[1], so {0,0}.

    Why?
    – Right side is executed first (the value has to be generated before you can assign it).
    – i++ returns 0, i is now 1
    – After this the array is accessed i++ returns 1, i is now 2
    – So basically it should be v[1] = 0;

    But ….
    If you rewrite it to use more than one line there are multiple possibilities.

    int value = i++;
    int& target = v[i++];
    target = value;

    int& target = v[i++];
    int value = i++;
    target = value;

    As of this I guess undefined is the correct way to go to be “consistent”.

  33. I voted for unspecified for two reasons:

    1. I do not like the idea of enforcing further sequencing rules on expressions. The current sequenced-before rules are intuitive and map nicely to the syntax tree.

    2. It is clear that there really is some known set of side effects that might arise from this expression. I can’t see any reason for a compiler to manipulate this in any way that would result in some outcome that is not expected. An execution path with multiple known outcomes is “unspecified behaviour”. Undefined behaviour is just too loose.

  34. @Evan I’m utterly baffled why you would want a statement in a programming language, which is written for no other purpose than asking a computer to produce a desired result, to have an ambiguous interpretation. It’s all fun and games until someone loses an eye.

  35. @Ben Craig

    > I would prefer: “Unspecified (one of the first three options, but could be different on different compilers, or even different runs of the same compiler)”.

    I would want something even looser: the compiler is not required to pick the same evaluation for different occurrences of the same construct even in the same run, or for the same occurrences in different contexts (in the case of different template instantiations or different copies of an inlined function).

  36. @bdsoftware:

    > Thus, the only two possible outputs would be 00 and 10. i don’t see how 01 could ever happen.

    That’s because you’re not thinking creatively enough. Consider just x = y++. One way of implementing this is morally equivalent to

    y += 1
    x = y - 1

    Now, I’m not sure why a compiler would do this, but I can imagine that one would in certain weird situations. (Now, that’s not true of just a source transformation with no knowledge of the implementation because there are overflow issues, but that doesn’t apply to code generators, which know the behavior under overflow.)

    Similarly, one way of compiling v[j++] = i++ is the following:

    i += 1
    j += 1
    v[j-1] = i-1

    If i and j are the same, you have

    i += 2
    v[i - 1] = i - 1   //v[1]=1
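
    The same lowering can be written as an ordinary function taking both counters by reference, which makes the aliased case easy to try. The function name store and the two wrappers are hypothetical; the body is just the three statements above.

    ```cpp
    #include <string>
    #include <vector>

    // One possible lowering of v[j++] = i++: bump both counters first,
    // then compensate with -1 when using them.
    void store(std::vector<int>& v, int& i, int& j) {
        i += 1;
        j += 1;
        v[j - 1] = i - 1;
    }

    std::string distinct_counters() {
        std::vector<int> v = {0, 0};
        int i = 7, j = 0;
        store(v, i, j);   // v[0] = 7, same as v[j++] = i++ with separate variables
        return std::to_string(v[0]) + std::to_string(v[1]);
    }

    std::string aliased_counters() {
        std::vector<int> v = {0, 0};
        int i = 0;
        store(v, i, i);   // i reaches 2 before either use, so v[1] = 1
        return std::to_string(v[0]) + std::to_string(v[1]);
    }
    ```

    With separate variables the lowering is indistinguishable from the source ("70" here), but with i and j aliased it produces "01".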

  37. It seems simplest to have the above work similar to the way *pDest++=*pSrc++; works

  38. Just to point at the elephant in the room:

    It seems like this should be an issue for the C committee (WG14), not the C++ committee (WG21).

    At the very least, WG21 should get a commitment from WG14 not to resolve the issue in an incompatible way. It might be (grudgingly) acceptable for it to be well-formed in one and UD/US in the other, but if the same expression yielded different results in C vs. C++, it would be a disaster.

  39. Fun poll! I like these kinds of discussions and there are some interesting replies as well. Feel free to post more of these :)

  40. if we transform it into regular functions:

    assign(nth_elem(v,post_inc(i)), post_inc(i));

    Looking at it that way, I don’t see any reason that this case should be treated any way other than how it already is: unspecified.

    Just because it involves operators doesn’t mean we should suddenly start introducing new ad hoc rules about how it should be evaluated.

  41. There is an obvious solution here not presented in the poll options: deprecate the ++ and -- operators.

    I’m only being half-facetious here. Is it that much more convenient to write i++ instead of i += 1? What is the real benefit to having these weird operators?

  42. >> The main problem with unspecified or undefined behavior is that very often, the behavior you get is the
    >> behavior you expect, and so you never notice that your code has a problem. Some time later, a new compiler
    >> or platform comes along that sees the code differently, and all of a sudden the latent bug mysteriously
    >> appears.

    This is already evident in the map example where m[0] = *m.begin(); works just fine with VC++ but fails with compilers from other platforms like gcc. See Tino Didriksen’s post above.

  43. The primary (only?) purpose of a programming language is to communicate to the computer a set of actions that you want it to take. Thus there is no benefit to having a language with ambiguous interpretations. The main problem with unspecified or undefined behavior is that very often, the behavior you get is the behavior you expect, and so you never notice that your code has a problem. Some time later, a new compiler or platform comes along that sees the code differently, and all of a sudden the latent bug mysteriously appears. I have always maintained that the optimization opportunities afforded by unspecified evaluation order are trivial compared to the errors introduced into code by people who don’t understand its nuances. (Even the most expert C++ users often fall into this class.)

  44. I would like to get a compiler error. There is just too much stuff happening on that single line. Everyone will have different expectations depending on background and experience as of what the right behavior should be. It will break a lot of code tho. But in the case of “v[i++] = i++;” it will only break code that is already broken. In the case of “v[i++] = j++;” where j is a reference to i and the compiler cannot prove it… well… leave it as undefined behavior. You might suggest that compilers should provide a diagnostic saying “there is too much stuff going on on that line, if j and i alias the same memory, the behavior will be undefined”. Any compiler can do that.

  45. Amazing the different views! No agreement here, and barely a preference in one direction or another. My mind, POP!

  46. I favor 10 because I want strict left-to-right evaluation for all expressions (even for assignments, although they would remain right-associative). Strict left-to-right has the advantage of being simple to explain and understand. Even if you think assignments should evaluate the right side first, once someone tells you that they don’t, you remember it, and from then on you can unambiguously interpret expressions. So v[i++] = i++ evaluates as

    pv   = &v
    pil   = &i
    pvi  = &(*pv).operator[](*pil)  // &v[0]
    *pil += 1  // i = 1
    pir  = &i
    vir  = *pir  // 1
    *pir += 1  // i = 2
    *pvi = vir  // v[0] = 1
    
  47. Herb – FWIW, I want the output to be 1 0 and pretty much would like to +1 Sean Parent’s comment above that if you are going to nail down an order its better you have it going one way (left-to-right).

  48. So I thought it through, and my gut said, “I would like to have this read left to right.” So, v[i++] is evaluated, cool. that’s a reference to v[0], and i=1. Then the right hand is evaluated, temp = 1, i = 2. Then the assignment v[0] = temp occurs, leaving v[0] = 1, v[1] = 0, which prints 10. This I clicked.

    And then it occurred to me: the right hand side of an expression is usually far more complicated than the left. Do I *really* want the simple, left hand side to be evaluated first and then have the resultant address either hug a register that the complicated expression could use, or have it (perhaps expensively) occupy a stack slot until the assignment is complete? I really don’t think I do.

  49. My vote is 10. I’d expect the LHS of = to be evaluated before RHS, and I have had to fix code that assumed this was already true because VC++ does it that way.

    For example, given an std::map, map[0] = *map.begin(); works fine on VC++ where LHS comes first, but segfaults on g++ where RHS comes first.
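
    A way to write this that does not depend on the compiler is to split the two steps. The sketch below assumes the map starts out empty and that the intent is to copy the first element's value into the slot for key 0; both are my reading of the one-liner, and the function name is made up.

    ```cpp
    #include <map>

    // Assumption: the map is empty and the goal is to copy the first
    // element's value into the slot for key 0. Separating the two steps
    // fixes the order: the slot is created before begin() is called.
    std::map<int, int> lhs_first_on_empty_map() {
        std::map<int, int> m;        // starts empty
        int& slot = m[0];            // inserts key 0 with a value-initialized int (0)
        slot = m.begin()->second;    // begin() is valid: it is the element just inserted
        return m;
    }
    ```

    Done this way the map ends up with the single entry {0, 0} on every compiler, whereas evaluating m.begin() first on an empty map would dereference end().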

  50. Toby Speight: Disable NoScript for the poll. ;-)

    I’m all for sequencing the base expression of a member call before its arguments evaluation (i.e. in a.f().g(x(), y()), f() is guaranteed to be called before x() or y()). But on the poll, I voted for UB, because I just don’t think sequencing around assignment (and especially for increment/decrement) is in any way useful.

  51. Being pedantic, I would expect a compiler error on ‘endl’, since other tokens in ‘std’ are qualified – suggesting that ‘using namespace std’ has not been declared.

    Otherwise, UB with a compiler warning and a stern letter to whoever committed the code.

    It would be nice if the standard could specify where and when the increment takes place, but if it was that easy…

  52. I tried to vote, but the “Vote” button didn’t actually submit. (and that after editing down my “other” to fit…)

    It seems reasonable to require that multiple assignments between sequence points shouldn’t cause the legendary nasal demons; any real implementation will assign /something/ to i, and use /some/ value of i. In this case, that might result in an out of range index into v, and whatever undefined behaviour that results in, but the assignments to i should give an unspecified arithmetic value to i, and no other effect.

    I think it would be a mistake to impose one specific required result on the compiler, as current processors with multiple arithmetic units may well evaluate the increments concurrently. Forcing a sequence point when values /might/ be aliased harms generation of efficient code.

    The reason for saying “an unspecified arithmetic value” is for cases where the variable is larger than the native size of the processor, so that two (or more) stores are required, and one may only partially overwrite the other.

    Like everyone else, I would like my compilers to issue a warning when they think multiple assignment to a single variable is likely between sequence points.

    All the above also goes for things such as y = f(++x, x--); and the like, of course (to counter those who suggest that assignment operators ought to be sequence points).

  53. Herb, unfortunately I filled the poll based on the question in your WordPress e-mail notification, which contains the erroneous v[2] and directly links to the polldaddy site, so there was no hint about the update. My answer to the updated question would have been slightly different. Hope only few people fall into this trap…

  54. At first I voted 00, thinking it’d make sense to evaluate RHS first, then LHS (or first rvalue, then lvalue)
    Then I asked myself what would I expect if exceptions are thrown? Should LHS-exception or RHS-exception be thrown first?
    I’d assume that it makes more sense the LHS throws first, because you wouldn’t need to evaluate RHS if there is no valid lvalue to assign the result to (especially if there are side effects on RHS).
    Thus 10 would make more sense

  55. The moment you allow multiple post-increments, code ceases to make any sense.

    #include <cstdio>

    int subtract(int a, int b) { return a-b; }
    
    int main() { 
      int n=0;
      int& i=n;
      int& j=n;
    
      if (i++ > j++) {
        printf("%d\n", subtract(i++, j++) ); // 0? 1? -1?
      }
      if (i++ <= j++) {
        printf("%d\n", subtract(j++, i++) ); // 0? 1? -1?
      }
    }
    

    Neither left-to-right nor right-to-left gives straightforward, intuitive, behavior, and if behavior is legal but unspecified – then it’s ridiculously easy to get to undefined behavior using the results.
    That being the case, I’m not seeing much utility in defining behavior for this (even, or maybe especially, unspecified among legal options) if it just results in OTHER undefined behavior which’ll be even harder to debug.

  56. 00. If the r-value is evaluated first, a temp variable can be inserted between evaluation and assignment without changing the meaning of either part. I.e. (v[i++] = i++;) would be equivalent to (int t = i++; v[i++] = t;)

  57. Herb, I think you should reset poll results and start again. Yesterday evening (my time) I voted “Other – out_of_range exception” when I saw v[2] being accessed. But now, after you’ve fixed the typo I voted differently.

  58. In my opinion 00 is the most reasonable (i.e.

    v[1] = 0;

    ). If I erased my years of C++ experience I think I would hope/expect that the right-hand side of = is evaluated before the left-hand side.

  59. Code like this shouldn’t be written and relied upon in the first place, so I think this specific case shouldn’t be taken into consideration while defining sequencing rules. Based on other cases and general idea “UB is bad”, evaluation should probably be fully sequenced from left to right. This gives the answer 10 for this particular contrived example. And +1 to Sean, we need data and measurements first of all – how accounting of relatively rare aliasing cases prevents optimization in much more common (~99%?) non-aliasing cases.

  60. To me the “00” option seems the most understandable. Both i++:es will be evaluated after the semicolon, so i should be = 2 after the full statement, but remain 0 inside it.

  61. First I voted for 00, but after some thinking I believe the more logical choice would be 10. Basically, if we treat = a little bit like , or ; that would mean we first do everything on the left-hand side, then everything on the right-hand side, and then the assignment. Just like ( ( v[i++] ) = ( i++ ) );.

    I tried to dig into the C++ standard, but it didn’t go well. So instead I will use a simpler explanation: if the post-increment operator (X++) is making a copy of X, then increasing X and returning a reference to the copy of X (I wonder if that won’t make some mess), then it means that everything is happening in one go. So the RHS i++ (from the example) is already modified by the LHS v[i++].

    But I am no expert so probably whatever I wrote here is just some misunderstanding from my side. ;) Sorry!

  62. Herb, your preference of left to right in most cases but right to left in the case of assignment seems to be extremely confusing to me.

    Consider:

    #include <string>
    
    struct X
    {
      X& operator=(const std::string&);
      X operator+(const std::string&);
    };
    
    int main()
    {
      auto x = X{};
    
      x.operator=("hello"); // right to left?
      x.operator+("world"); // left to right?
    }
    
    

    Why would the order of evaluation be different between those two operators?

  63. Quite frankly, what I expect is for this to not pass a code review. It’s hard to parse as a human, that’s grounds enough for staying 20ft away. If I had to pick one of your options, I’d go with undefined (unspecified => not portable => bad times going between Windows and UNIX), but I’d rather this be a compiler error, or at least a warning in the cases that are less difficult to detect.

  64. I am all for a compiler error.

    The easier it is to use the language, the better. And one of the things I have noticed, that newcomers take a long time to learn, are all the corner cases in C++. Thus, less is better, and the more the compiler helps you avoid them, the better.

  65. I chose “Other: Unspecified, but not necessarily one of the given options.” My thinking was that I didn’t want to give the compiler license to do unlimited program transforms based on this, but I also didn’t think it mattered what the particular results were, and even a crash would be fine. Although after a bit more thought I’ve started thinking that maybe unlimited program transformations are fine, so perhaps now I would vote ‘undefined behavior.’

    For those who want this to be a compiler error, I think that’s certainly valuable, but I’m not sure it’s reasonable to put into the standard some definition of ‘the simple cases that can be handled in practice’. Compilers should certainly offer to diagnose whatever they can. I don’t think VC++ does currently, and they should certainly look at adding a diagnostic for this. For gcc you can already do -Werror=sequence-point, and this will catch the simple cases (for some definition of ‘simple cases’).

  66. Take this with a grain of salt as I’m not a professional programmer. However, I saw this post just before getting ready to go to work and couldn’t get it out of my mind. In any event, if we assume a compiler would actually compile this code without error, I would expect v[0] = 1, v[1] = 0. I actually voted for this as the acceptable result.

    Why? Mainly because I expect a post-increment to happen immediately after the value is used. That would mean the RHS is already “1” when it is assigned to the array. If it isn’t, then we are implying that there is an alias of i hanging around someplace or that evaluation did not take place as expected.

    Speaking of evaluation, one of the biggest problems I have with C++ is the lack of rigid rules for evaluation of a statement.

    Now the question then becomes whether my view here makes sense. Maybe as C++ is now it doesn’t, but the evolution of C++ needs to move towards less Undefined Behaviour. If we can’t force rational evaluation then we need to demand that compilers generate an error or warning.

    In any event this is certainly an interesting discussion.

  67. I went with 10. I expected the behaviour to be something equivalent to this:

    std::vector<int> v = { 0, 0 };
    int i = 0;

    int& x = v.operator[](i);
    i++;

    x = i;
    i++;

    std::cout << v[0] << v[1] << std::endl;

  68. I voted UB because of v[2] but I noticed it was a typo. I bet some people made the same mistake so the results might not be reliable.

  69. If the RHS expression of an assignment is considered to be evaluated first, then one could not always safely and transparently create and use a reference to the destination. In order to have the option of equivalently using either a reference to the evaluated destination (which should be evaluated only once) or the original expression, that requires that the location referred to by the LHS expression of the assignment is determined first.

    The alternative of evaluating the RHS of an assignment first (before the LHS) would mean that there could be many cases of subtle behavior changes when one instead makes and uses a reference to the evaluated destination of an assignment. This would be especially possible whenever the destination location is determined by evaluating a function or nontrivial expression. The potential interactions might be missed.

    Increasing the potential for subtle behavior changes due to a seemingly equivalent use of a destination reference would not be a good outcome.

  70. I would like it to be UB. UB is a more persuasive argument to tell people not to do line 3.

    Increment deserves its own semi-colon. That way, average programmers like me and most of us over the world do not need to care what the order of evaluation is. Even if the behaviour became defined, embedding an increment inside a statement should be considered less teachable and readable.

    If the votes turn out to be evenly matched between the four choices of defined behaviour in the poll, that can be evidence that the order of evaluation is simply not intuitive.

  71. I’d expect the compiler to evaluate the expression from right to left and then evaluate the i++ at the right (= 1) first, and then assign the result to v[i++] (which turns out to be v[1] due to the post increment).

    my 2¢

  72. @Marco: No worries, there are lots of inputs and examples (including both usability and performance examples) that need to be considered in this question. This particular poll was prompted by some assertions I encountered of the form “of course people would expect the answer to be X” and so I thought I’d ask to get a rough data point on one narrow corner of the OOE issue.

    Like any single example, this one may be imperfect (here I could have made it simpler by removing postfix ++ and initializing values to something else like 9 to be able to distinguish original 0’s vs written 0’s). However, not only the votes but also the reasons in the comments (and in the “Other” write-in votes) have interesting and useful information. They specifically have shed useful light on the question of “do people expect/want OOE in assignment expressions to be left-to-right, right-to-left, some other specified order, unspecified, or undefined?” in the context of an example that did more than just that. Also, both in this poll and related offline discussions over the past 24 hours, there have been opinions expressed I hadn’t seen expressed before, and that too was useful.

    FWIW, my own preference is still toward right-to-left evaluation for (possibly compound) assignment, left-to-right for most (not necessarily all) other things, if performance experiments show the perf difference to be truly in the noise using modern platforms and compilers. There are a lot of corollaries to making assignment right-to-left, however, including symmetry between operators vs. function calls. It’s a complex question that deserves far more than a single example, but this was a key example that helped shed light on people’s expectations in one specific case.

    I may occasionally post a few more like this. Thanks for the useful opinions and feedback! Further comments always welcome.

  73. Make it a compiler error when the compiler can prove `i` aliases `i` (as opposed to `i` and `j` being function arguments, for example), otherwise keep it UB.

    To be honest, writing a statement like that with multiple side-effects looks like over-cleverness to me, and if the programmer is overly clever he or she better actually be.

  74. I find it curious that “v[i++] = i++” can produce a different result compared to the equivalent statement rewritten as function calls, e.g. for exposition: “assign(v.at(inc(i)), inc(i));”. Perhaps the rules should at least be tightened to enforce the same result. Then it would be just a question of argument evaluation order.

    #include <vector>
    #include <iostream>
     
    using namespace std;
     
    auto main() -> int
    {
      // Test 1:
      {
        auto v = vector<int>{9, 9};
        auto i = 0;
        v[i++] = i++;
        cout << v[0] << v[1] << endl; // VS 2013 Debug: "19", Release: "09"
      }
     
      // Test 2:
      {
        auto assign = [](int& lv, int rv) { return lv = rv, lv; };
        auto inc = [](int& lv) { auto t = lv; ++lv; return t; };
     
        auto v = vector<int>{9, 9};
        auto i = 0;
        assign(v.at(inc(i)), inc(i));
        cout << v[0] << v[1] << endl; // VS 2013 Debug: "90", Release: "90"
      }
      return 0;
    }
    
  75. I would like it to be the same as in:

    std::vector<std::vector<int>> v(2);
    int i = 0;
    v[i++] = std::vector<int>{i++};

    Consider the last line in alternate form:

    v.operator[](i++).operator=(std::vector<int>{i++});

    I would expect `v == { {1}, {} }`. By analogy, I answered “10” to your question.

  76. 00 isn’t an option because the act of reading i in the subexpression i++ is the result of applying the operator so it isn’t possible to “defer” the increments until after the statement is executed. Surprisingly 00 has the most votes. To see how this breaks – try and implement it with your own operator++().
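
    A minimal sketch of the experiment suggested above, assuming a hypothetical `Counter` type that is not part of the original discussion: a user-defined postfix `operator++` has to copy the old value, increment, and return the copy inside a single function call, so the increment cannot be postponed to the end of the full expression.

    ```cpp
    // Hypothetical wrapper type, used only to illustrate postfix ++.
    struct Counter {
        int value = 0;

        // Postfix increment: save the old state, bump the counter, return the old state.
        // By the time this function returns, the increment has already happened.
        Counter operator++(int) {
            Counter old = *this;
            ++value;
            return old;
        }
    };

    // Returns the value produced by c++ and leaves c incremented.
    int post_increment_result(Counter& c) {
        return (c++).value;
    }
    ```

    Calling `post_increment_result` twice on the same `Counter` yields the old value each time while the counter itself has already moved on, which is why a "read 0 twice, increment later" reading cannot be expressed with an ordinary operator function.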

    The Java and C# rules are that subexpressions are evaluated left to right. I see no reason to invent a different rule for C++ so if it had to be defined, the answer would be 10.
    As you point out, the broader problem is one of aliasing. Because modifying an lvalue twice without a sequence point is UB, the compiler is free to assume that two such operations do not alias and can reorder subexpression evaluation accordingly.

    Are we only concerned with local side effects from order of evaluation or all side effects? Exactly what are you considering here?

    If you want to change it, make an argument based on data rather than polls. With current compilers, what is the performance penalty for left to right evaluation when aliasing (or side effects?) cannot be determined? What do the worst case and the average look like? Is there any way for a developer to recover the performance (perhaps look at issues with current extensions to specify that pointers don’t alias)? How many real problems are caused by the current rules?

    Without data, I’d vote to leave it alone.

  77. The 00 answer does not let us distinguish between two distinct possibilities.
    Suppose the first line was: std::vector<int> v = { 7, 7 };
    Then both 70 and 07 are possible outputs.
    I voted for 00 in the poll because I would like the output to be 70 in my version of the code, not because I wanted 07.

  78. My feeling would be that “result_expression” should be computed first in “variable_expression = result_expression”.
    - If I move result_expression into a function, I don’t want the variable to be computed differently.
    - If I create a temporary variable before assigning to variable_expression, I again don’t want a different value.

    So v[i++] = i++; should be the same as auto temp = i++; v[i++] = temp;

  79. Herb, this poll can be maliciously skewed by voting multiple times using incognito mode (private browsing in Internet Explorer). So, I hope you won’t present the results of this poll to the C++ committee and use it to steer the direction of C++.

    Thanks

  80. The code is very poor style. If we can’t categorically make it ill-formed/illegal (I actually don’t have a problem with falling back to unspecified/specified behaviour in complex/intractable aliasing cases), I would go according to the rule of rhs before lhs in an assignment. Within the RHS expression, it should probably be left-to-right.

  81. My feeling is that the right hand side of the equal sign would evaluate first, then the left hand side. So i++ would create a temporary copy with the value zero, then it would increment i to one. Then the left hand side would create another temporary copy (before incrementing i) of the value of i, which would now be one. Then that temporary int would be passed to the subscript operator of the vector along with the right hand side temporary int with the value zero. Which would result in the vector with the values {0,0}. This seems most natural to me.

    In my opinion, the use of post increment operators is not only a premature optimization, it is an anti-optimization. Sure the compiler would probably be smart enough to optimize this in the case of ints, but for more complex objects the post increment could be arbitrarily expensive.

  82. Keep it legal and unspecified, let UBSan work it out. Btw, Visual Studio should have a built-in UBSan as well.

  83. I am torn on this, on one hand I don’t want changes that will hobble optimization opportunities but after reading N4228:

    http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4228.pdf

    I can also see how unspecified behavior can be hard to catch even for experienced developers. Understanding how the second example in the paper breaks took a surprising amount of work and language lawyering and ended up being one of the larger SO answers I ever wrote, and that does not seem a desirable state either.

  84. While we’re on the subject, I vote for int(4.2e97) to be unspecified, rather than undefined.

  85. Instead of: “Unspecified (one of the first three options, but could be different on different compilers)”
    I would prefer: “Unspecified (one of the first three options, but could be different on different compilers, or even different runs of the same compiler)”.

    This gives compilers a lot of leeway to choose the ordering that makes sense given the surrounding context. If you choose the ‘unspecified’ option as written, then the compiler probably has to be consistent from run to run, removing some flexibility.

    I don’t like the behavior to be undefined. That allows the compiler to do evil things, like optimizing out the ‘cout’ call. GCC has done things I would consider ‘user hostile’, by optimizing out integer overflow checks that relied on common, but technically undefined behavior. I would rather not give that degree of freedom to the compiler.
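
    To illustrate the kind of overflow check mentioned above (a sketch only; `add_would_overflow` is a hypothetical helper name): testing `a + b < a` after the fact relies on signed overflow, which is undefined, so an optimizer may remove the test. Comparing against the limits before adding stays within defined behavior.

    ```cpp
    #include <limits>

    // Returns true if a + b would overflow int, without ever performing the
    // overflowing addition. The pattern `if (a + b < a)` (for b > 0) depends on
    // signed overflow, which is undefined behavior, so a compiler is allowed to
    // assume it never happens and drop the test.
    bool add_would_overflow(int a, int b) {
        if (b > 0) {
            return a > std::numeric_limits<int>::max() - b;
        }
        return a < std::numeric_limits<int>::min() - b;
    }
    ```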

  86. The general question is whether to evaluate the destination before or after evaluating the source value being assigned, e.g. v[ f() ] = x() + y() - z(); etc. Do you first evaluate f() or first evaluate x() + y() - z()?

    I am inclined toward evaluating the *destination first*, and then the source afterward. When done in this order, one could define a reference to the destination before doing the calculations and this revision would not change the outcome. It would remain equivalent.

    Having the option of defining a reference to the destination (without changing the outcome) then also allows breaking the operations into multiple distinct steps, e.g.

    int & foo = ...one time determination of destination including i++ or f() or whatever...
    foo = x();
    ...
    foo += y();
    ...
    foo -= z();
    

    Performing the calculation in steps also allows interrogating either the partial result along the way or other conditions. Perhaps some parts of the source calculation would be optional.

    If one is going to break up the calculation of the source value, one might certainly prefer to precalculate the destination once and once only and save that result as a reference that can be used repeatedly without recalculating or changing the destination location.

    For consistency and predictability, that implies that the destination location should be determined *before* calculating the source value(s) that are assigned.
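
    A small sketch of the stepwise form described above. The `Trace` type and its members are hypothetical stand-ins for f(), x(), y() and z(); they only log their names so the call order can be inspected.

    ```cpp
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for f(), x(), y(), z(); each appends its name to a
    // log so the order of the calls can be checked afterwards.
    struct Trace {
        std::string log;
        std::size_t f() { log += 'f'; return 1; }
        int x() { log += 'x'; return 10; }
        int y() { log += 'y'; return 5; }
        int z() { log += 'z'; return 3; }
    };

    // Destination first: the index is computed exactly once, then the source
    // value is built up in separate, fully sequenced statements.
    int assign_in_steps(std::vector<int>& v, Trace& t) {
        int& dest = v[t.f()];  // one-time determination of the destination
        dest = t.x();
        dest += t.y();
        dest -= t.z();
        return dest;
    }
    ```

    Because every step is its own statement, the order f, x, y, z is guaranteed here regardless of how the single-expression form would be evaluated.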

  87. I really hate the idea of making that code legal. The best answer would certainly be “ill-formed”, which is almost certainly halting-complete in the general case, which is why I am a firm believer in keeping this undefined. Basically most of the time the status-quo is almost perfect with just one place, where I would still like to have something else:

    template<typename... Args> void ignore_arguments(const Args&...){}
    
    template<typename... Args>
    void fun(const Args&... args) {
        // other_fun(args)...; // please add support for this
        ignore_arguments(other_fun(args)...); // there should be some way to do basically this with defined order of evaluation (recursion really makes for ugly traces)
    }
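
    One way to get a defined order without recursion, sketched under assumptions (`other_fun` and the `call_log` it writes to are hypothetical placeholders): the initializers of a braced-init-list are evaluated strictly left to right, so expanding the pack inside one fixes the call order.

    ```cpp
    #include <string>

    // Hypothetical stand-in for other_fun: records the order in which it is called.
    std::string call_log;
    void other_fun(int x) { call_log += std::to_string(x); }

    template <typename... Args>
    void fun(const Args&... args) {
        // Elements of a braced-init-list are evaluated in order, left to right.
        // The leading 0 keeps the array non-empty when the pack is empty, and the
        // void cast guards against overloaded comma operators.
        int dummy[] = {0, (static_cast<void>(other_fun(args)), 0)...};
        static_cast<void>(dummy);  // silence unused-variable warnings
    }
    ```

    In C++17 a fold expression over the comma operator, `(other_fun(args), ...);`, expresses the same thing more directly.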
    
  88. OK, with the spurious v[2] out of the picture:

    It seems to me there are 3 possible ways the assignment may resolve:

    v[0] = 0;     // both post-increments performed after full expression
    v[1] = 0;     // RHS's post increment performed before evaluation of LHS
    v[0]  = 1;    // LHS's post increment performed before evaluation of RHS
    

    Thus, the only two possible outputs would be 00 and 10. I don’t see how 01 could ever happen.

    So I’d pick “Unspecified, could be one of the first two options”, which isn’t one of your choices.
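
    A sketch of the three resolutions listed above, written as fully sequenced code so that each one has defined behaviour. The function names are made up for illustration, and the vector starts as { 9, 9 } rather than { 0, 0 } so that written zeros can be told apart from initial values.

    ```cpp
    #include <string>
    #include <vector>

    // Formats the two elements the way the poll prints them, e.g. "19".
    std::string show(const std::vector<int>& v) {
        return std::to_string(v[0]) + std::to_string(v[1]);
    }

    // Both increments happen after the assignment: v[0] = 0.
    std::string increments_deferred() {
        std::vector<int> v = {9, 9};
        int i = 0;
        v[i] = i;
        i += 2;
        return show(v);
    }

    // RHS i++ is evaluated before the LHS index: v[1] = 0.
    std::string rhs_first() {
        std::vector<int> v = {9, 9};
        int i = 0;
        int value = i++;
        v[i++] = value;
        return show(v);
    }

    // LHS i++ is evaluated before the RHS value: v[0] = 1.
    std::string lhs_first() {
        std::vector<int> v = {9, 9};
        int i = 0;
        int& dest = v[i++];
        dest = i++;
        return show(v);
    }
    ```

    With the 9s in place the three variants give 09, 90 and 19; with the original zeros the first two collapse to 00 and the third gives 10.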

  89. Vector should have 0 at index 0 and 1. Accessing index 2 throws an out-of-bounds exception.

  90. > Unless there’s a compelling use case for it to be legal, why not just make it a compiler error?

    It might be non-trivial to detect such cases. Then we would have to make it Ill-formed with no diagnostic required, which is essentially equivalent to UB.

  91. As much as I appreciate the status quo of C++ – the language of choice for “pay what you use” and “you know what you are doing”, I don’t see why many people say, in all seriousness, they think that undefined is better than unspecified.

    Undefined, as in “nasal demons” is /bad/ for C++. It’s /great/ for compilers. In the end, it’s premature optimization all over (remember the “make it correct, then make it fast” mantra?). Only with UB they can’t be blamed: it’s the compiler doing it, which makes it fine, of course.

    As we make it unspecified, compilers still get loads of leeway – to get optimal performance on each target platform, whilst removing the unwelcome demons.

    Simple deal. Perhaps C++ could use an `unsafe { … }` construct in which compilers get the full roam (or a `checked { … }`-like thing for the inverse). Mmm.

  92. Is the v[2] included to emphasize that this is not meant to be judged according to C++’s current rules?

  93. Ehh, you are indexing element 2, but the vector only has 2 elements (index 0 and 1). I hope std::vector will throw an exception and that this is well-defined behavior :-).

  94. Unless there’s a compelling use case for it to be legal, why not just make it a compiler error? It doesn’t matter what rule is chosen, it’s going to confuse a significant number of programmers. If not, I guess go with “undefined” and strongly encourage compilers to smack anyone who writes code like that with a wet fish.

  95. Herb, Did you mean to include “v[2]” in the output statement? Your options only show two characters… and v[2] should DEFINITELY be UB. -leor

    At 03:47 PM 12/1/2014, Sutter’s Mill did say:
    > Herb Sutter posted: “Consider this program fragment: std::vector<int> v = { 0, 0 }; int i = 0; v[i++] = i++; std::cout << v[0] << v[1] << v[2] << endl; My question is not what it might print under today’s C++ rules. The third line runs afoul of t”

Comments are closed.