Designing Data Structures for Longevity

One of the more problematic areas of long-lived software is the versioning and updating of shared structures. As improvements come to a package, it's inevitable that you'll have to add more fields to a structure. Your structure today may contain two pointers, but tomorrow needs to contain three. If you ship all the bits at the same time, then there's never any problem - everything just gets recompiled and the right sizes are reserved on various stacks, and accesses to those fields don't accidentally overlap existing slots. If you don't ship all at the same time, or you persist data structures out to disk, you're eventually going to run into a problem. Here's an example:

struct Foo {
    void *pvSomething;
    void *pvSomethingElse;
};
void Bar(const Foo *pcFoo);
void Zot() {
    Foo f = { NULL, NULL };
    Bar(&f);
}

In this v0 of the structure layout, the compiler reserved two pointer-sized slots on the stack and initialized them both to zero. Let's further say that Bar and Zot live in different modules. Now, assume that something comes along and you decide to add another pointer-sized field to your structure. After rebuilding the module containing Bar, you try testing and notice "random" access violations when reading that new field. What's wrong? With a little debugging of Bar, you notice that the new field holds a random value when coming from Zot. You've just been bitten by structure versioning!

The problem here is that since you didn't recompile Zot against the new structure header, the compiler's baked-in two-pointer structure is smaller than the three-pointer structure the new Bar expects. That third pointer, when read in Bar, actually reads whatever sits on the stack just past "f" in Zot, often the return address. How do you solve this problem? What can be done to avoid it in the future?

First off, it's important to recognize that this problem happens with any layout-dependent object, from classes to structures to vtables. If you change the layout of an object in a way that changes its behavior for existing producers/consumers, you must provide a versioning mechanism. In the world of vtables, the typical answer is that new functions are added to the end of the vtable: existing callers continue to call vtable[x], new callers call vtable[y], and as long as x < y all is good (see the sketch below). Structures, on the other hand, don't do as well here. Often, structures are embedded in persisted storage, in an array, or in other places that are hard to version. The case of being passed on the stack (or from the heap!) is a simplification of this problem.
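Here's what that vtable convention looks like in miniature (a minimal sketch; the Widget type and its functions are illustrative, not part of the example above):

struct WidgetVtable {
    /* v1 slots: existing callers index these, so they must never move */
    void (*Draw)(void *self);
    void (*Resize)(void *self, int width, int height);
    /* v2 slot: appended at the end, so only newly-built callers reach for it */
    void (*Animate)(void *self);
};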

Solution 1 - Add a Version field

This is the simplest, but possibly least useful answer over time. In each structure, have the first field be called Version, or something similar. In your shared headers, create #define values starting at 1 and moving forward as you revise the structure. Generators of structure instances specify the version number that they are generating, and consumers of the structure change how they interpret the structure based on the version found.
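A minimal sketch of this approach for the Foo example (the version constants are illustrative):

#define FOO_VERSION_1 (1)
#define FOO_VERSION_2 (2)
struct Foo {
    unsigned long Version;
    void *pvSomething;
    void *pvSomethingElse;  /* added in version 2 */
};
void Bar(const Foo *pf) {
    /* Only interpret members that existed in the version the producer claims */
    if (pf->Version >= FOO_VERSION_2) {
        /* Safe to read pf->pvSomethingElse */
    }
}
void Zot() {
    Foo f = { FOO_VERSION_2, NULL, NULL };
    Bar(&f);
}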

As an upside, as long as this version field is consistently maintained, you may completely shift the layout of the structure over time - remove some fields, add some fields, retype some, whatever strikes your fancy. The version number makes sure the structure is correctly interpreted by whichever code consumes it.

The downside to this approach is that sometime, somewhere, you're going to forget to update the version number. Additionally, you'll have to maintain all those old structure definitions you created over time so that consumers can know what "version 3" really looked like. This approach is attractive because of its simplicity, but the fact that it requires manual intervention to stay correct suggests that there's more to be done.

Solution 2 - Add a Size field

Aside from explicitly versioning the structure, you can get some help with this problem from the compiler itself. To do this, add a field called Size to the top of all structures whose size may shift over time. Initializations of these structures must also change somewhat, as they must now specify the size. You can do this like so:

struct Foo {
    unsigned long Size;
    void *pvSomething;
    void *pvSomethingElse;
};
void Zot() {
    Foo f = { sizeof(f) };
}

This has the lovely side effect of zero-initializing the remaining members of the Foo instance. The callee changes like so, with the help of a few handy macros:

#define FIELD_OFFSET(TType, Field) ((size_t)(&(((TType*)NULL)->Field)))
#define FIELD_SIZE(TType, Field) (sizeof(((TType*)NULL)->Field))
#define SIZE_THROUGH_FIELD(TType, Field) (FIELD_OFFSET(TType, Field) + FIELD_SIZE(TType, Field))
void Bar(const Foo *pf) {
    if (pf->Size >= SIZE_THROUGH_FIELD(Foo, pvSomething)) {
        /* Use pf->pvSomething */
    }
    if (pf->Size >= SIZE_THROUGH_FIELD(Foo, pvSomethingElse)) {
        /* Use pf->pvSomethingElse */
    }
}

So when the Foo structure gets another two pointers, the "old" code in Zot continues to set the size to what it knew about (which on 32-bit machines would have been 12), and the "new" code in Bar (which contains extra uses of SIZE_THROUGH_FIELD) does the Right Thing when it detects that the structure passed is only 12 bytes long rather than the 16 or 20 bytes that it might need. In this manner, Bar is able to adjust its behavior when it finds old clients, and even newer clients can pass a smaller size if they only need to set the earlier members.
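To make that concrete, here's a sketch of what the grown structure and the new checks might look like (pvThirdThing and pvFourthThing are illustrative names):

struct Foo {
    unsigned long Size;
    void *pvSomething;
    void *pvSomethingElse;
    void *pvThirdThing;    /* new in this revision */
    void *pvFourthThing;   /* new in this revision */
};
void Bar(const Foo *pf) {
    if (pf->Size >= SIZE_THROUGH_FIELD(Foo, pvSomething)) {
        /* Use pf->pvSomething */
    }
    if (pf->Size >= SIZE_THROUGH_FIELD(Foo, pvThirdThing)) {
        /* Only reached when the caller was built against the new layout */
    }
}

An old Zot still writes { sizeof(f) } against the old definition, so pf->Size arrives as 12 and the new checks simply skip the members that aren't there.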

Solution 3 - Use a Flags field

As a hybrid of #1 and #2, adding a field containing flags is another good answer, but it doesn't directly answer the problem of size and layout changes over time. As you add members to a structure, add a #define for the member's validity. When that member is set and available, OR its flag into the Flags member. Consumers of the structure must look at the Flags member before inspecting other members of the structure. As "old" generators won't know about new flags, and don't set those new fields, you're guaranteed that new consumers of old structures won't behave poorly. Here's an example:

#define FOO_FLAGS_PVSOMETHING_VALID (0x00000001)
#define FOO_FLAGS_PVSOMETHINGELSE_VALID (0x00000002)
struct Foo {
    unsigned long Flags;
    void *pvSomething;
    void *pvSomethingElse;
};
void Bar(const Foo *pf);
void Zot() {
    Foo f = { 0 };
    f.pvSomething = /* ? */;
    f.Flags |= FOO_FLAGS_PVSOMETHING_VALID;
    Bar(&f);
}
void Bar(const Foo *pf) {
    if (pf->Flags & FOO_FLAGS_PVSOMETHING_VALID) {
        /* Consume pf->pvSomething */
    }
    if (pf->Flags & FOO_FLAGS_PVSOMETHINGELSE_VALID) {
        /* Consume pf->pvSomethingElse */
    }
}

When you add another field to Foo, you also create #define FOO_FLAGS_ANOTHERFIELD_VALID (0x00000004). Bar changes to check that flag before using the AnotherField member. Even though Zot didn't know about AnotherField, since it didn't set that new flag, there's no harm done when an old-sized structure is passed to a new-sized-reading function. The downside with this approach is that you can typically only have 32 fields. And, if you steal flag values for other purposes (changing the meaning of a member when a second flag is set), you've got even fewer. This method says nothing about the actual layout of the data structure, and requires that new fields are always added at the end.
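A sketch of that next revision (AnotherField being an illustrative name):

#define FOO_FLAGS_ANOTHERFIELD_VALID (0x00000004)
struct Foo {
    unsigned long Flags;
    void *pvSomething;
    void *pvSomethingElse;
    int AnotherField;   /* new member, appended at the end */
};
void Bar(const Foo *pf) {
    if (pf->Flags & FOO_FLAGS_ANOTHERFIELD_VALID) {
        /* Only consume pf->AnotherField when a new-style caller set this bit */
    }
}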

Conclusions

Versioning structures is hard, but you must do it if you plan to independently service two components that share an object (structure or otherwise). The easy answer of setting a version member in the structure keeps things simple on the client side, but makes it much harder on the consumer side, where every old structure definition must be kept around. Adding a "valid through" size field makes the code obviously tied to the size of the structure, but requires some help from the compiler and some macro magic on the consumer's side to interpret the results. Flags indicating which fields are set and valid is probably the easiest, but pushes that "what is valid" work onto the caller - they have to remember to set the correct bit when they want the callee to interpret a field.

What's the optimal solution? Take both #2 and #3 together. Have the first field be the valid size in bytes, because you get that for free from the compiler; saying Foo f = { sizeof(f) }; will just be natural after a while. The second field contains flags, indicating which of the possibly-valid fields should be interpreted. Not all data types have a handy "not valid" value like NULL for pointers. How do you indicate that an int is invalid and should not be read? If you use the Flags approach, you can know this - if the flag for the field isn't set, then don't read it.
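Put together, a minimal sketch might look like this (the Count member and its flag are illustrative, and SIZE_THROUGH_FIELD is the macro from above):

#define FOO_FLAGS_COUNT_VALID (0x00000001)
struct Foo {
    unsigned long Size;   /* set from sizeof(f); says which members exist at all */
    unsigned long Flags;  /* says which existing members are actually meaningful */
    void *pvSomething;
    int Count;            /* no natural "invalid" value, so it gets a flag */
};
void Bar(const Foo *pf);
void Zot() {
    Foo f = { sizeof(f) };   /* Size set for free, everything else zeroed */
    f.Count = 42;
    f.Flags |= FOO_FLAGS_COUNT_VALID;
    Bar(&f);
}
void Bar(const Foo *pf) {
    if (pf->Size >= SIZE_THROUGH_FIELD(Foo, Count) &&
        (pf->Flags & FOO_FLAGS_COUNT_VALID)) {
        /* Safe to read pf->Count */
    }
}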

Future-proofing your data structures should be done as part of the initial design. Once the first set of compiled bits has walked out the door, it's too late to introduce these mechanisms. Design with serviceability in mind and you won't have to worry about structure-layout skew.