In the last post, we discussed the cost of using STL vector to module size. Now let’s take a look at how STL vector manages its data. Dia2Dump (Source code available in Microsoft Visual Studio 8\Dia SDK\Samples\Dia2Dump directory) shows the following internal structure of vector<short:>
Class vector<short> derives from _Vector_val<short>, which in turn derives from _Container_base. _Vector_val has one member variable, _Alval, which is of the type allocator<short>. Also allocator<short> does not have any member variable; its still occupies 1-byte space. Due to alignment, the cost is rounded up to 4-bytes. Class vector<short> itself has three member variables, all pointers to short. So the total size of a vector<short> is 16-bytes for 32-bit machines, in release build, and 32-bytes for 64-bit machines. The first member variable _Alval is never been actually accessed in release build binary code, not initialized, not referenced. For 32-bit machine, 4 extra bytes are added for debug build.
Class vector<T> manages its dynamic storage using three pointers, _Myfirst points to the first allocated slot, _Mylast points to the next free slot, and _Myend points after the last allocated slot. All three are initialized to null pointers. _Myfirst and _Myend change when the vector’s dynamic buffer is allocated or reallocated, their difference is the capacity of the vector. _Mylast changed when new elements are added to the vector. When it reaches _Myend, the vector is full. The difference between _Mylast and _Myfirst is the number of elements in the vector.
When not enough space is available in vector to add new elements, allocator<short> is used to allocate new space. Older elements are moved to the new space, empty space is filled with blank, new elements are injected into place, and then finally the old storage space is freed. To avoid large number of reallocations, STL vector tries to grow by 50% each time.
The following piece of code tests how STL vector grows when push_back is called repetitively:
The code above measures the size of vector<int> itself and how it manages its dynamic storage. It prints out one line after each time a re-allocation occurs (vector<int>::capacity() changes). Here is the output:
From the output, we can see STL vector grows by 50%: 9 + 9/2 = 13, 13 + 13/2 = 19, 19 + 19/2 = 28, … So on average, 25% more space than what’s needed could be not utilitied; while in worst case is 50%. Each time when STL vector needs to grow, it allocates the new space first before freeing the old space; realloc is not called. So the older storage is not reused even when realloc is possible. This can generate lots of heap fragmentation. To avoid these problems, application should call vector::resize or vector::reserve to change how dynamic storage is managed. If push_back is used to grow a vector one element at a time, there will be log(N)/log(1.5) times of memory re-allocation.
Here is a summary of STL vector’s data size cost:
PS: The actual command to use Dia2Dump to dump type information is (notice there needs to be a space between two ‘>’s):