Unnecessary and/or temporary objects are frequent culprits that can throw all your hard work—and your program’s performance—right out the window. How can you spot them and avoid them?
Problem
JG Question
1. What is a temporary object?
Guru Question
2. You are doing a code review. A programmer has written the following function, which uses unnecessary temporary or extra objects in at least three places. How many can you identify, and how should the programmer fix them?
string find_addr( list<employee> emps, string name ) {
    for( auto i = begin(emps); i != end(emps); i++ ) {
        if( *i == name ) {
            return i->addr;
        }
    }
    return "";
}
Do not change the operational semantics of this function, even though they could be improved.
Solution
1. What is a temporary object?
Informally, a temporary object is an unnamed object that you can’t take the address of. A temporary is often created as an intermediate value during the evaluation of an expression, such as an object created by returning a value from a function, performing an implicit conversion, or throwing an exception. We usually call a temporary object an “rvalue,” so named because it can appear on the “r”ight hand side of an assignment. Here are some simple examples:
widget f(); // f returns a temporary widget object
auto a = 0, b = 1;
auto c = a + b; // "a+b" creates a temporary int object
In contrast, in the same code we have objects like a and c that do each have a name and a memory address. Such an object is usually called an “lvalue,” because it can appear on the “l”eft hand side of an assignment.
That’s a simplification of the truth, but it’s generally all you need to know. More precisely, C++ now has five categories of values, but distinguishing them is primarily useful for writing down the language specification, and you can mostly ignore them and just think about “rvalues” for temporary objects without names and whose addresses can’t be taken, and “lvalues” for non-temporary objects that have names and whose addresses can be taken.
2. How many unnecessary temporary objects can you identify, and how should the programmer fix them?
Believe it or not, this short function harbors three obvious cases of unnecessary temporaries or extra copies of objects, two subtler ones, and three red herrings.
The parameters are passed by value.
The most obvious extra copies are buried in the function signature itself:
string find_addr( list<employee> emps, string name )
The parameters should be passed by const& (that is, const list<employee>& and const string&, respectively) instead of by value. Pass-by-value forces the compiler to make a complete copy of both objects, which can be expensive and, here, is completely unnecessary.
Guideline: Prefer passing a read-only parameter by const& if you are only going to read from it (not make a copy of it).
Pedantic note: Yes, with pass-by-value, if the caller passed a temporary list or string argument then it could be moved from rather than copied. But I’m deliberately saying “forces the compiler to make a complete copy” here because no caller is realistically going to be passing a temporary list to find_addr, except by mistake.
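To make the cost of the by-value signature concrete, here is a minimal sketch that counts copies. The employee type, the employee_copies counter, and the count_by_value / count_by_ref helpers are hypothetical stand-ins invented for this illustration; they are not part of the problem's code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for the article's employee type. It counts its own
// copies so the cost of each parameter-passing style is observable.
int employee_copies = 0;

struct employee {
    std::string name;
    std::string addr;

    employee(std::string n, std::string a)
        : name(std::move(n)), addr(std::move(a)) {}
    employee(const employee& other)
        : name(other.name), addr(other.addr) { ++employee_copies; }
};

// By value: every call with an lvalue argument copies the whole list.
std::size_t count_by_value(std::list<employee> emps) { return emps.size(); }

// By const&: the function reads the caller's list in place.
std::size_t count_by_ref(const std::list<employee>& emps) { return emps.size(); }

// Builds a two-element list, makes one call, and reports how many
// employee copies that single call performed.
int copies_for_one_call(bool by_value) {
    std::list<employee> emps;
    emps.emplace_back("Ann", "1 Main St");   // constructed in place, no copy
    emps.emplace_back("Bob", "2 Side St");
    employee_copies = 0;
    if (by_value) count_by_value(emps); else count_by_ref(emps);
    return employee_copies;
}

void demo_parameter_copies() {
    assert(copies_for_one_call(true) == 2);    // one copy per list element
    assert(copies_for_one_call(false) == 0);   // nothing copied
}
```

The counter only tracks employee objects; the by-value call also allocates a list node per element and copies both strings inside each employee.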
Non-issue: Initializing with “=”.
Next we come to the first red herring, in the for loop’s initialization:
for( auto i = begin(emps); /*...*/ )
You might be tempted to say that this code should prefer to be spelled auto i(begin(emps)) rather than auto i = begin(emps), on the grounds that the = syntax incurs an extra temporary object, even if it might be optimized away. After all, as we saw in GotW #1, usually that extra = means the two-step “convert to a temporary then copy/move” of copy-initialization—but recall that doesn’t apply when using auto like this. Why?
Remember that auto always deduces the exact type of the initializer expression, minus top-level const and & which don’t matter for conversions, and so… presto! there cannot be any need for a conversion and we directly construct i.
So there is no difference between auto i(begin(emps)) and auto i = begin(emps). Which syntax you choose is up to you, but it depends only on taste, not on temporaries or any other performance or semantic difference.
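As a quick check of that claim, the following sketch declares an iterator both ways and asks the compiler to confirm the types are identical. The function name and the list of ints are assumptions made for the example.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>

// Returns true when both spellings yield iterators to the same element.
bool both_spellings_agree() {
    std::list<int> values{1, 2, 3};

    auto i = std::begin(values);   // "=" syntax
    auto j(std::begin(values));    // "()" syntax

    // auto deduces the initializer's exact type in both cases, so there is
    // nothing to convert and therefore no conversion temporary.
    static_assert(std::is_same<decltype(i), std::list<int>::iterator>::value,
                  "i has exactly the type begin() returns");
    static_assert(std::is_same<decltype(i), decltype(j)>::value,
                  "both spellings declare the same type");

    return i == j;
}

void demo_auto_spellings() {
    assert(both_spellings_agree());
}
```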
Guideline: Prefer declaring variables using auto. Among other reasons to do so, it naturally guarantees zero extra temporaries due to implicit conversions.
The end of the range is recalculated on each loop iteration.
Another potential avoidable temporary occurs in the for loop’s termination condition:
for( /*...*/ ; i != end(emps); /*...*/ )
For most containers, including list, calling end() returns a temporary object that must be constructed and destroyed, even though the value will not change.
Normally when a value will not change, instead of recomputing it (and reconstructing and redestroying it) on every loop iteration, we would want to compute the value only once, store it in a local object, and reuse it.
Guideline: Prefer precomputing values that won’t change, instead of recreating objects unnecessarily.
However, a caution is in order: In practice, for simple inline functions like list<T>::end(), particularly when used in a loop, compilers routinely notice that the value won’t change and hoist the call out of the loop without you having to do it yourself. So I actually don’t recommend hoisting the end calculation here: it would make the code slightly more complex, and making code more complex in the name of efficiency, without data showing that it’s actually needed, is the definition of premature optimization. Clarity comes first:
Definition: Premature optimization is when you make code more complex in the name of efficiency without data that it’s actually needed.
Guideline: Write for clarity and correctness first. Don’t optimize prematurely, before you have profiler data proving the optimization is needed, especially for calls to short, simple inline functions that compilers can normally handle for you.
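For comparison, here is roughly what the manually hoisted loop would look like next to the plain one. This is only a sketch to show the extra clutter, not a recommendation; sum_plain and sum_hoisted are hypothetical names, and a list of ints stands in for the employee list.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Plain form: end(values) appears in the loop condition.
int sum_plain(const std::list<int>& values) {
    int total = 0;
    for (auto i = std::begin(values); i != std::end(values); ++i) {
        total += *i;
    }
    return total;
}

// Hoisted form: the end iterator is computed once and stored in e.
// Same result, but one more name for the reader to track.
int sum_hoisted(const std::list<int>& values) {
    int total = 0;
    for (auto i = std::begin(values), e = std::end(values); i != e; ++i) {
        total += *i;
    }
    return total;
}

void demo_hoisting() {
    assert(sum_plain(std::list<int>{1, 2, 3}) == 6);
    assert(sum_hoisted(std::list<int>{1, 2, 3}) == 6);
}
```

Whether the two forms differ in speed is something only a profiler on your compiler and settings can tell you.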
The iterator increment uses postincrement.
Next, consider the way we increment i in the for loop:
for( /*...*/ ; i++ )
This temporary is more subtle, but it’s easy to understand once you remember how preincrement and postincrement differ. Postincrement is usually less efficient than preincrement because it has to remember and return its original value.
Postincrement for a class T should normally be implemented using the canonical form as follows:
T T::operator++(int) {
    auto old = *this; // remember our original value
    ++*this;          // always implement postincr in terms of preincr
    return old;       // return our original value
}
Now it’s easy to see why postincrement is less efficient than preincrement: Postincrement has to do all the same work as preincrement, but in addition it also has to construct and return another object containing the original value.
Guideline: For consistency, always implement postincrement in terms of preincrement, otherwise your users will get surprising (and often unpleasant) results.
In the problem’s code, the original value is never used, and so there’s no reason to use postincrement. Preincrement should be used instead. Although the difference is unlikely to matter for a built-in type or a simple iterator type, where the compiler can often optimize away the extra unneeded work for you, it’s still a good habit not to ask for more than you need.
Guideline: Prefer preincrement. Only use postincrement if you’re going to use the original value.
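The extra object is easy to observe with a small instrumented type. In this sketch, counter and counter_copies are hypothetical names; the postincrement follows the canonical form described above.

```cpp
#include <cassert>

int counter_copies = 0;

// Hypothetical iterator-like type that counts how often it is copied.
struct counter {
    int value = 0;

    counter() = default;
    counter(const counter& other) : value(other.value) { ++counter_copies; }

    counter& operator++() {          // preincrement: modify in place
        ++value;
        return *this;
    }
    counter operator++(int) {        // postincrement: canonical form
        auto old = *this;            // copy of the original value
        ++*this;
        return old;                  // may cost a second copy if not elided
    }
};

int copies_from_preincrement() {
    counter c;
    counter_copies = 0;
    ++c;
    return counter_copies;
}

int copies_from_postincrement() {
    counter c;
    counter_copies = 0;
    c++;                             // the returned old value is discarded
    return counter_copies;
}

void demo_increment_copies() {
    assert(copies_from_preincrement() == 0);
    assert(copies_from_postincrement() >= 1);
}
```

Because the copy constructor here has a visible side effect, the compiler cannot silently drop the copy of the old value, which is the situation a nontrivial iterator type can put you in.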
“But wait, you’re being inconsistent!” I can just hear someone saying. “That’s premature optimization. You said that compilers can hoist the end() call out of the loop, and it’s just as easy for a compiler to optimize away this postincrement temporary.”
That’s true, but it doesn’t imply premature optimization. Preferring ++i does not mean writing more complex code in the name of performance before you can prove it’s needed—++i is not more complex than i++, so it’s not as if you need performance data to justify using it! Rather, preferring ++i is avoiding premature pessimization, which means avoiding writing equivalently complex code that needlessly asks for extra work that it’s just going to ignore anyway.
Definition: Premature pessimization is when you write code that is slower than it needs to be, usually by asking for unnecessary extra work, when equivalently complex code would be faster and should just naturally flow out of your fingers.
The comparison might use an implicit conversion.
Next, we come to this:
if( *i == name )
The employee class isn’t shown in the problem, but we can deduce a few things about it. For this code to work, employee likely must have either a conversion to string or a converting constructor taking a string. Both cases create a temporary object, invoking either operator== for strings or operator== for employees. (A temporary is avoided only if there happens to be an operator== that takes one of each, or if employee has a conversion to a reference, that is, string&.)
Guideline: Watch out for hidden temporaries created by implicit conversions. One good way to avoid this is to make constructors and conversion operators explicit by default unless implicit conversions are really desirable.
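The sketch below shows one way the hidden temporary can arise and one way to avoid it. Both classes are hypothetical: the real employee class isn't shown, so loose_employee (implicit converting constructor plus employee-to-employee comparison) and strict_employee (explicit constructor plus a mixed comparison) are assumptions made purely to illustrate the two situations.

```cpp
#include <cassert>
#include <string>
#include <utility>

int employees_constructed = 0;

// Assumed design 1: implicit constructor, comparison between two employees.
struct loose_employee {
    std::string name;
    loose_employee(std::string n) : name(std::move(n)) { ++employees_constructed; }
};
bool operator==(const loose_employee& a, const loose_employee& b) {
    return a.name == b.name;
}

// Assumed design 2: explicit constructor, comparison that takes one of each.
struct strict_employee {
    std::string name;
    explicit strict_employee(std::string n) : name(std::move(n)) { ++employees_constructed; }
};
bool operator==(const strict_employee& e, const std::string& s) {
    return e.name == s;
}

// Counts the employee objects created just to evaluate e == name.
int temporaries_with_implicit_conversion() {
    loose_employee e("Ann");
    const std::string name = "Ann";
    employees_constructed = 0;
    bool same = (e == name);         // name is converted to a temporary employee
    return same ? employees_constructed : -1;
}

int temporaries_with_mixed_comparison() {
    strict_employee e("Ann");
    const std::string name = "Ann";
    employees_constructed = 0;
    bool same = (e == name);         // compares directly, nothing constructed
    return same ? employees_constructed : -1;
}

void demo_hidden_conversion() {
    assert(temporaries_with_implicit_conversion() == 1);
    assert(temporaries_with_mixed_comparison() == 0);
}
```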
Probably a non-issue: return "".
return "";
Here we unavoidably create a temporary (unless we change the return type, but we shouldn’t; see below), but the question is: Is there a better way?
As written, return ""; calls the string constructor that takes a const char*, and if the string implementation you’re using either (a) is smart enough to check for the case where it’s being passed an empty string, or (b) uses the small string optimization (SSO) that stores strings up to a certain size directly within the string object instead of on the heap, no heap allocation will happen.
Indeed, every string implementation I checked is smart enough not to perform an allocation here, which is maximally efficient for string, and so in practice there’s nothing to optimize. But what alternatives do we have? Let’s consider two.
First, you might consider re-spelling this as return ""s; which is new in C++14. That essentially relies on the same implementation smarts to check for empty strings or to use SSO, just in a different function: the literal operator""s.
Second, you might consider re-spelling this as return { };. On implementations that are both non-smart and non-SSO, this might have a slight advantage over the others because it invokes the default constructor, and so even the most naïve implementation is likely not to do an allocation since clearly no value is needed.
In summary, there’s no difference in practice among returning "", ""s, or { }; use whichever you prefer for stylistic reasons. If your string implementation is either smart or uses SSO, which covers all implementations I know of, there’s exactly zero allocation difference.
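Spelled out as three tiny functions (hypothetical names, C++14 or later for the s suffix), the alternatives look like this; each returns an empty string, and the sketch makes no claim about which one allocates on any particular implementation.

```cpp
#include <cassert>
#include <string>

// Constructs the result from a const char* pointing at an empty literal.
std::string empty_from_literal() { return ""; }

// Uses the standard string literal operator to build the result.
std::string empty_from_suffix() {
    using namespace std::string_literals;
    return ""s;
}

// Value-initializes the result, which calls the default constructor.
std::string empty_from_braces() { return {}; }

void demo_empty_returns() {
    assert(empty_from_literal().empty());
    assert(empty_from_suffix().empty());
    assert(empty_from_braces().empty());
}
```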
Note: SSO is a wonderful optimization for avoiding allocation overhead and contention, and every modern string ought to use it. If your string implementation doesn’t use SSO (as of this writing, I’m looking at you, libstdc++), write to your standard library implementer—it really should.
Non-issue: Multiple returns.
return i->addr;
return "";
This was a second subtle red herring, designed to lure in errant disciples of the “single-entry/single-exit” (SE/SE) persuasion.
In the past, I’ve heard some people argue that it’s better to declare a local string object to hold the return value and have a single return statement that returns that string, such as writing string ret; … ret = i->addr; break; … return ret;. The idea, they say, is that this will help the optimizer perform the ‘named return value optimization.’
The truth is that whether single-return will improve or degrade performance can depend greatly on your actual code and compiler. In this case, the problem is that creating a single local string object and then assigning it would mean calling string’s default constructor and then possibly its assignment operator, instead of just a single constructor as in our original code. “But,” you ask, “how expensive could a plain old string default constructor be?” Well, here’s how the “two-return” version performed on one popular compiler last time I tried it:
- with optimizations disabled: two-return 5% faster than a “return value” string object
- with aggressive optimizations: two-return 40% faster than a “return value” string object
Note what this means: Not only did the single-return version generate slower code on this particular compiler on this particular day, but the slowdown was greater with optimizations turned on. In other words, a single-return version didn’t assist optimization, but actively interfered with it by making the code more complex.
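For reference, here is a sketch of the two shapes being compared. The single-return variant is one plausible reading of the abbreviated string ret; … return ret; pattern quoted above, and the aggregate employee with public name and addr members is an assumption made to keep the example self-contained. It shows structure only and makes no timing claim.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical employee type with just the two members this sketch needs.
struct employee {
    std::string name;
    std::string addr;
};

// Two returns: the result string is constructed once, at whichever return runs.
std::string find_addr_two_returns(const std::list<employee>& emps,
                                  const std::string& name) {
    for (auto i = std::begin(emps); i != std::end(emps); ++i) {
        if (i->name == name) {
            return i->addr;
        }
    }
    return "";
}

// Single return: ret is default-constructed first, then possibly assigned.
std::string find_addr_single_return(const std::list<employee>& emps,
                                    const std::string& name) {
    std::string ret;
    for (auto i = std::begin(emps); i != std::end(emps); ++i) {
        if (i->name == name) {
            ret = i->addr;
            break;
        }
    }
    return ret;
}

void demo_return_styles() {
    const std::list<employee> staff{{"Ann", "1 Main St"}, {"Bob", "2 Side St"}};
    assert(find_addr_two_returns(staff, "Bob") == find_addr_single_return(staff, "Bob"));
    assert(find_addr_two_returns(staff, "Zed") == find_addr_single_return(staff, "Zed"));
}
```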
In general, note that SE/SE is an obsolete idea and has always been wrong. “Single entry,” or the idea that functions should always be entered in one place (at their start) and not with goto jumps from the caller’s code directly to random places inside the function body, was and is an immensely valuable advance in computer science. It’s what made libraries possible, because it meant you could package up a function and reuse it and the function would always know its starting state, where it begins, regardless of the calling code. “Single exit,” on the other hand, got unfairly popular on the basis of optimization (‘if there’s a single return the compiler can perform return value optimization better’—see counterexample above) and symmetry (‘if single entry is good, single exit must be good too’) but that is wrong because the reasons don’t hold in reverse—allowing a caller to jump in is bad because it’s not under the function’s control, but allowing the function itself to return early when it knows it’s done is perfectly fine and fully under the function’s control. To put the final nail in the coffin, note that “single exit” has always been a fiction in any language that has exceptions, because you can get an early exceptional return from any point where you call something that could throw an exception.
Non-issue: Return by value.
Which brings us to the third red herring:
string find_addr( /*...*/ )
Because C++ naturally enables move semantics for returned values like this string object, there’s usually little to be gained by trying to avoid the temporary when you return by value. For example, if the caller writes auto address = find_addr( mylist, "Marvin the Robot" );, there will be at most a cheap move (not a deep copy) of the returned temporary into address, and compilers are allowed to optimize away even that cheap move and construct the result into address directly.
But what if you did feel tempted to try to avoid a temporary in all return cases by returning a string& instead of string? Here’s one way you might try doing it that avoids the pitfall of returning a dangling reference to a local or temporary object:
const string& find_addr( /* ... */ ) {
    for( /* ... */ ) {
        if( /* found */ ) {
            return i->addr;
        }
    }
    static const string empty;
    return empty;
}
To demonstrate why this is brittle, here’s an extra question:
For the above function, write the documentation for how long the returned reference is valid.
Go ahead, we’ll wait.
Done? Okay, let’s consider: If the object is found, we are returning a reference to a string inside an employee object inside the list, and so the reference itself is only valid for the lifetime of said employee object inside the list. So we might try something like this (assuming an empty address is not valid for any employee):
“If the returned string is nonempty, then the reference is valid until the next time you modify the employee object for which this is the address, including if you remove that employee from the list.”
Those are very brittle semantics, not least because the first (but far from only) problem that immediately arises is that the caller has no idea which employee that is—not only doesn’t he have a pointer or reference to the right employee object, but he may not even be able to easily figure out which one it is if two employees could have the same address. Second, calling code can be notoriously forgetful and careless about the lifetimes of the returned reference, as in the following code which compiles just fine:
auto& a = find_addr( emps, "John Doe" ); // yay, avoided temporary!
emps.clear();
cout << a; // oops
When the calling code does something like this and uses a reference beyond its lifetime, the bug will typically be intermittent and very difficult to diagnose. Indeed, one of the most common mistakes programmers make with the standard library is to use iterators after they are no longer valid, which is pretty much the same thing as using a reference beyond its lifetime; see GotW #18 for details about the accidental use of invalid iterators.
Summary
There are some other optimization opportunities. Ignoring these for now, here is one possible corrected version of find_addr which fixes the unnecessary temporaries. To avoid a possible conversion in the employee/string comparison, we’ll assume there’s something like an employee::name() function and that .name() == name has equivalent semantics.
Note another reason to prefer declaring local variables with auto: Because the list<employee> parameter is now const, calling begin and end returns a different type (const_iterators rather than iterators), but auto naturally deduces the right thing so you don’t have to remember to make that change in your code.
string find_addr( const list<employee>& emps, const string& name ) {
    for( auto i = begin(emps); i != end(emps); ++i ) {
        if( i->name() == name ) {
            return i->addr;
        }
    }
    return "";
}
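To try the corrected function, you need some employee type to compile it against. The class below is a hypothetical minimal one, assuming only what the corrected code uses: a name() accessor and a public addr member. The static_assert checks the point about auto picking up const_iterator once the list parameter is const.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical employee class; the real one is not shown in the article.
class employee {
    std::string name_;
public:
    std::string addr;

    employee(std::string n, std::string a)
        : name_(std::move(n)), addr(std::move(a)) {}
    const std::string& name() const { return name_; }
};

// begin() on a const list yields a const_iterator, which auto deduces for us.
static_assert(
    std::is_same<decltype(std::begin(std::declval<const std::list<employee>&>())),
                 std::list<employee>::const_iterator>::value,
    "iterating a const list uses const_iterator");

std::string find_addr(const std::list<employee>& emps, const std::string& name) {
    for (auto i = std::begin(emps); i != std::end(emps); ++i) {
        if (i->name() == name) {
            return i->addr;
        }
    }
    return "";
}

void demo_find_addr() {
    const std::list<employee> staff{{"Ann", "1 Main St"}, {"Bob", "2 Side St"}};
    assert(find_addr(staff, "Ann") == "1 Main St");
    assert(find_addr(staff, "Zed").empty());   // not found: empty string
}
```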
Acknowledgments
Thanks in particular to the following for their feedback to improve this article: “litb1,” Daan Nusman, “Adrian,” Michael Marcin, Ville Voutilainen, Rick Yorgason, “kkoehne,” and Olaf van der Spek.