Cringe not: Vectors are guaranteed to be contiguous

Andy Koenig is the expert’s expert, and I rarely disagree with him. And, well, when I do disagree I’m invariably wrong… but there’s a first time for everything, so I’ll take my chances one more time.

I completely agree with the overall sentiment of Andy’s blog entry today:

I spend a fair amount of time reading (and sometimes responding to) questions in the C++ newsgroups. Every once in a while, someone asks a question that makes me cringe.

What makes a question cringe-worthy?

Usually it is a question that implies that the person asking it is trying to do something inappropriate.

Asking how to violate programming-language abstractions is similar: If you have to ask, you probably shouldn’t be doing it.

Amen! Bravo! Absolutely correct. Great stuff. Except that the example in question isn’t violating an abstraction:

For example, I just saw one such question: Are the elements of a std::vector contiguous? Here is why that question made me cringe.

Every C++ container is part of an abstraction that includes several companion iterators. The normal way of accessing a container’s elements is through such an iterator.

I can think of only one reason why one should care whether the elements of a vector are in continuous memory, and that is if you intend to use pointers, rather than iterators, to access those elements. Doing so, of course, violates the abstraction.

There is nothing wrong per se with violating abstractions: As Robert Dewar told me more years ago than I care to remember, some programs are poorly designed on purpose. However, there is something wrong with violating abstractions when you know so little of the data structures used to implement those abstractions that you have to ask strangers on Usenet about your proposed violation. To put it more bluntly: If you have to ask whether vector elements are contiguous, you probably should not be trying to make use of that knowledge.

The reason this analysis isn’t quite fair is that contiguity is in fact part of the vector abstraction. It’s so important, in fact, that when it was discovered that the C++98 standard didn’t completely guarantee contiguity, the C++03 standard was amended to explicitly add the guarantee.

Why is it so important that vectors be contiguous? Because that’s what you need to guarantee that a vector is layout-compatible with a C array, and therefore we have no reason not to use vector as a superior and type-safe alternative to arrays even when we need to exchange data with C code. So vector is our gateway to other languages and most operating systems features, whose lingua franca is the venerable C array.

And it’s not just vector: The TR1 and C++0x std::array, which implements fixed-size arrays, is also guaranteed to be contiguous for the same reasons. (std::array is available in Boost and, ahem, the VC++ TR1 implementation we shipped today.)

So why do people continually ask whether the elements of a std::vector (or std::array) are stored contiguously? The most likely reason is that they want to know if they can cough up pointers to the internals to share the data, either to read or to write, with other code that deals in C arrays. That’s a valid use, and one important enough to guarantee in the standard.

24 thoughts on “Cringe not: Vectors are guaranteed to be contiguous

  1. if i wanted to keep a vector my_matrix, using my_matrix.push_back((T *)malloc(sizeof(T)*1024), and index the matrix as a 1-dimensional vector from my_matrix[0], would it be contiguous? see here: https://gist.github.com/1172090 – it doesn’t work, though this is what I had in mind when reading vectors are guaranteed contiguous. i see now that &my_matrix[0] is what is contiguous, so i could not grow a vector using more vectors, but only single values.

  2. Storing single bits but having multiple GB’s of RAM is nonsense in most cases; be aware that those containers have to perform bit manipulations for representating each bit as bool (byte). I would use this container type on machines which have insufficient RAM. Machines like that have most likely also a low clocked processor. And if you have a high access rate (in a loop, etc.), then those containers will produce a proportionally high overhead…

    Enough RAM => don’t use it (use something like ‘vector’ instead)

  3. OpenGL example:
    Mmh, what about having a vector (or an other container if needed) for managing a bunch of display list ID’s (GLuints), in combination with .reserve() and then call all of them in one shot with glCallLists(vector.size() * sizeof(GLuint),GL_UNSIGNED_INT,(const GLvoid *) vectorStartPointer) ? IMHO, sth like that could make sense…

  4. All of the STL containers have their abstractions and their implementations. The abstraction is conveyed through the iterator – all containers support iteration for accessing, inserting, deleting, etc. Any function that uses iterators should, in theory, generate the same result on all containers, although performance may vary.

    The containers also have their concrete implementations. Set will only hold unique instances. Vector is contiguous. List supports quick insertion and removal along its length.

    Then, there is the grey area, where implementation and abstraction blend. The fact is, if we could embrace only the abstraction of containers, we would only have one: std::container.

    It is very rare that any code is completely abstract when it comes to stl containers – and I would posit that it is entirely possible that code that fully embraces abstraction can miss the opportunity to use the strength of any given container to address a problem domain.

    That being said, we must all guard against trying to be too clever.

  5. Having to deal with a C API, what would be the recommended way NOT to “break” the vector abstraction?
    Maybe something like the following:

    template T* c_array_from(const std::vector& V){
    T* pNew = new T[v.size()];
    std::copy(v.begin(), v.end(), pNew);
    return pNew;
    }

    and leave to the caller the duty to delete[] the pointer?

    ps: btw Herb this is my first post to your blog. Thanks for all your work on C++!

  6. Contiguous doesn’t guarantee that it isn’t packed. std::vector&ltbool&gt packs the bits such that (&vec[0])[1] is not equivalent to vec[1]. In the implementation I’m using it allocates in blocks of 32 and stores one bool per bit.

  7. @Gianni, its not the compiler which would mostly guarantee memory contiguity for vectors/arrays but the design of the data structure itself. Although, one could quote pathological cases using _alloca which may change with each compiler version.

    But I doubt if such is the case with vectors. Besides, you could always have a peek inside .

  8. Hi Herb,
    great post!

    Does STL implementation in Visual C++ 2005 guarantee std::vector contiguity?
    Do you think that could be change in next releases of Microsoft C++ compilers?

    Thank you!

  9. Sorry but I must disagree with AkiRoss comment

    First: C++ vectors are not slow. See for example the Stepanov benchmark code that compare the performance of raw C arrays with vectors and others STL containers for various operations. The main difference of speed between C-arrays and vectors is due to initialising all elements of a vector while C-array are just raw reserved memory. Access (both read and write) is about the same for vector as for C-array. Peoples should test it before claiming something is too slow.

    Second: one might want performance but also safety and be trying to choose a sensible compromise between the two. C-arrays might be fast but they are unsafe and in fact, they become wasteful when one doesn’t know in advance how much data is going to be stored in it. (The old: “Oh well, let’s make it big enough for the biggest amount we think we’ll ever need”)

    Third: one might be writing a C++ application but need to interface with some C functions. The contiguous buffer property allow this to be done simply and efficiently.

    Lastly, it is not “Does this particular library implement std::vector this way?”. It is the C++ standard. If a vector is not contiguous, it is not a std::vector. One can use &v[0] safely one all C++ standard std::vectors.

  10. I have-to add, memory layout is also fairly critical if you’re wanting to write code that is cache sensitive. The continuous memory layout of std::vector means it can be used in these circumstances, the non-contiguous memory of other collections means they aren’t.

  11. I don’t understand: C++ introduces abstraction, good. But why use abstraction when you obviously don’t need to?
    I mean, if you need performances, just use plain old C arrays! They are as fast as light, why bothering with slow abstraction and C++ library?
    If you need a std::vector, because you don’t care about performances, but you’ve to exchange data with a C program, still this is the wrong question to ask.
    To me, Andy Koenig has absolutely right: abstraction is needed in some circumstances and if you need to break it, just don’t use it, because you’re doing it wrong.
    To be more clear: if you want to know about contiguity of data in sdl::vector, there is something wrong (apart being just curious, of course), because you want to break the abstraction.
    If you need to break it, you can’t go and ask implementation details, because this is abstraction no more! If you care about implementation details because you need to handle data at low level, you need to ask specific questions, like: “this particular library does implement a std::vector in this way?” or “may i use it for exchanging data using pointers?”. Or just create your own class for partially-hide data.

    My two cents.
    Nice post, anyway :)

  12. Herb,

    I partly agree with you. Do you consider it as a good code if we use pointers instead of iterator to iterate through the vector in C++? Why you don’t think it breaks the abstraction if we use pointers to iterate through a vector in C++? There is a vector class and it have abstraction to iterate through each item. Hence if we are using a pointer to iterate through the vector we don’t use the exposed iterator and thereby breaks the abstraction. And you mentioned it is “OK” to use pointers to traverse through vector when we share the vector to a C program and it is purely “C”. But I guess whenever we use pointers to iterate through the vector in “C++” it becomes a breaking of abstraction because there is another effective alternate.

    But the question “Are the elements of a std::vector contiguous?” may not be a cringing one if the question is being explained in – C compatibility point of view/performance point of view.

    Thank you,
    Amal

  13. You didn’t list another (to me) obvious reason for wanting to know if a vector is contiguous in memory – performance.

    Now we all know that premature optimisation is bad, but if the developer was trying to investigate peformance improvements over his existing code that uses maybe set or list or somesuch then the question is quite a reasonable one.

    Only being able to think of one reason someone may care if vectors are contiguous in memory suggests not much thought was applied at all.

  14. Wow, it cut everything between < and > chars. Hope that this will go right:

    template <class T>
    T* c_array_from( std::vector<T>& vec )
    {
    return &vec[0];
    }

    Anyway, if it don’t, I think that everyone has got idea.

  15. Martin: how about this:

    template
    T* c_array_from( std::vector& vec )
    {
    return &vec[0];
    }

    Usage: some_c_function( c_array_from( vec_of_ints ) );

    Anyway, I still like your idea more, for two reasons:

    1. Member function looks better to me here :)
    2. Symmetry with basic_string’s c_str().

  16. One problem I see with using std::vector as an array gateway is that to access the C-array we have to write “&v[0]” which might leave someone reading the code puzzling as to what’s going on … I really wish they’d added a vector::c_array() function …

  17. “What is with std::string?”

    Short answer: In practice, on every implementation I know, std::string’s contents are contiguous and null-terminated. In draft C++0x, std::string contents are required to be contiguous and there is a proposal to also require them to be null-terminated.

    Longer answer follows:

    There are three things to consider: 1. What the C++09/03 standard and the current draft of C++0x guarantee. 2. What implementations actually do. 3. What draft C++0x might additionally guarantee that’s still under discussion.

    1. In current ISO C++ (C++98 and C++03), std::string is not fully required to store its data contiguously. You’re supposed to use str.c_str() to get a pointer to a contiguous and null-terminated string (albeit a read-only string). However, current ISO C++ does require &str[0] to cough up a pointer to contiguous string data (but not necessarily null-terminated!), so there wasn’t much leeway for implementers to have non-contiguous strings, anyway. For C++0x we have already adopted the guarantee that std::string contents must indeed be stored contiguously. For details, see http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530 .

    2. Every std::string implementation I know of is in fact contiguous and null-terminates its buffer. So, although it isn’t formally guaranteed, in practice you can probably get away with calling &str[0] to get a pointer to a contiguous and null-terminated string. (But to be safe, you should still use str.c_str().)

    3. Still-breaking news: There is an active proposal to tighten this up further in C++0x and require null-termination and possibly ban copy-on-write implementations, for concurrency-related reasons. Here’s the paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html . I think that one or both of the proposals in this paper is likely to be adopted, but we’ll see at the next meeting or two.

    (Thanks to Peter Dimov for pointing me to the wording in the draft.)

  18. Herb,

    This is an excellent point. I think as developers we are constantly struggling with the trade-off of writing elegant code that respects abstract ideals and “getting work done” in the messy world of cross language compatibility and legacy code. We should all pause a bit to empathize before “cringing” and passing judgments on our fellow developers “questionable” code.

    One problem that is hard to manage in a language like C++ is conveying to those who come after you that you are engaging in a necessary evil (violating an abstraction for a good reason) rather than being lazy or demonstrating ignorance of proper idioms. Comments help but all too often comments are ignored. I’ve seen “questionable code” propagate like a virus from places where it was justified to places where the proper idioms could have been used.

    Such is life, I guess, when you work in a language that gives you more than enough rope to hang yourself.

    -Sal

Comments are closed.