Alignment (part 2): Packing

Yesterday, I wrote a bit about how the C compiler determines the alignment of structures.  I left with an example of an MS-DOS structure that didn't follow the alignment rules.

So how do you deal with this?

The first question that comes to mind is "What does the language standard say about this behavior?"  Well, this morning I chased down what appears to be an online version of the C standard.  In section 6.5.2.1, you find:

[#11] Each non-bit-field member of a structure or union
       object is aligned in an implementation-defined manner
       appropriate to its type.

Hmm. The standard says that structure alignment is implementation-defined, which isn't much help.

It turns out that Microsoft's C compiler defines a series of #pragmas that let the developer specify the structure alignment; as I implied yesterday, they're called #pragma pack.  Essentially, #pragma pack lets the developer override the compiler's default rules for structure alignment.  If you say #pragma pack(4), you're saying "if the natural alignment of a member of this structure is greater than 4, treat it as 4 instead".  Similarly, #pragma pack(1) says "treat all the members of this structure as having a natural alignment of 1 byte".  And #pragma pack(16) says "if the natural alignment of a member is greater than 16, treat it as 16 instead", which is essentially a NOP on current architectures.  You can specify the packing for an entire source file with the -Zp compiler command line switch, but that's usually hideous overkill.  Instead, most people just use #pragma pack(n) around their structure definitions.
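One way to read these rules: under #pragma pack(n), each member is aligned to the smaller of its natural alignment and n. Here's a rough sketch of that arithmetic in C. The helper names (effective_alignment, align_up, layout) are made up for illustration, a pack value of 0 is my own convention for "no #pragma pack in effect", and the sketch assumes every member's natural alignment equals its size (true for the scalar types used in these examples on x86). It only models the rule; it isn't how the compiler is actually implemented.

```c
#include <stddef.h>  /* size_t */

/* Alignment a member gets under #pragma pack(pack): the smaller of its
   natural alignment and the pack value. pack == 0 means "no pragma"
   (a convention made up for this sketch). */
size_t effective_alignment(size_t natural, size_t pack)
{
    return (pack != 0 && pack < natural) ? pack : natural;
}

/* Round offset up to the next multiple of align. */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Fill offsets[i] for each member and return the total structure size,
   including tail padding. Assumes natural alignment == member size. */
size_t layout(const size_t *sizes, size_t count, size_t pack, size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t align = effective_alignment(sizes[i], pack);
        offset = align_up(offset, align);   /* insert padding if needed */
        offsets[i] = offset;
        offset += sizes[i];
        if (align > max_align) {
            max_align = align;
        }
    }
    /* The structure itself is padded to its strictest member alignment. */
    return align_up(offset, max_align);
}
```

For six members of 4, 1, 2, 1, 4 and 4 bytes, a pack value of 1 yields offsets 0, 4, 5, 7, 8 and 12 with a total size of 16, while a pack value of 4 yields offsets 0, 4, 6, 8, 12 and 16 with a total size of 20.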

So let's look at the examples from yesterday and see what #pragma pack does to the structure.

First, here's struct A:

struct A
{
    int _FieldA1;
    char _FieldA2;
    short _FieldA3;
    char _FieldA4;
    long _FieldA5;
    void *_FieldA6;
};

Let's consider what happens with struct A when compiled with #pragma pack(1), #pragma pack(4) and #pragma pack(16):

Field Name  Field Size       Offset (default)  Offset (#pragma pack(1))  Offset (#pragma pack(4))  Offset (#pragma pack(16))
_FieldA1    4                0                 0                         0                         0
_FieldA2    1                4                 4                         4                         4
_FieldA3    2                6                 5                         6                         6
_FieldA4    1                8                 7                         8                         8
_FieldA5    4                12                8                         12                        12
_FieldA6    4 (8 on 64 bit)  16                12                        16                        16

So if you specify #pragma pack(1), you get the tightest possible packing for data.
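If you'd rather check than trust a table, offsetof will tell you what your compiler actually did. Here's a minimal sketch, assuming a compiler that supports #pragma pack(push, n) and #pragma pack(pop) (MSVC, GCC and Clang all do). The names A_default, A_pack1, A_pack4 and print_offsets are made up for this example. The numbers should match the table on a 32-bit Windows build; on other platforms they can differ because long and void * may be 8 bytes.

```c
#include <stddef.h>  /* offsetof */
#include <stdio.h>   /* printf */

/* struct A with the compiler's default alignment rules. */
struct A_default {
    int   _FieldA1;
    char  _FieldA2;
    short _FieldA3;
    char  _FieldA4;
    long  _FieldA5;
    void *_FieldA6;
};

/* The same members, packed on 1-byte boundaries. */
#pragma pack(push, 1)
struct A_pack1 {
    int   _FieldA1;
    char  _FieldA2;
    short _FieldA3;
    char  _FieldA4;
    long  _FieldA5;
    void *_FieldA6;
};
#pragma pack(pop)

/* The same members, with member alignment capped at 4 bytes. */
#pragma pack(push, 4)
struct A_pack4 {
    int   _FieldA1;
    char  _FieldA2;
    short _FieldA3;
    char  _FieldA4;
    long  _FieldA5;
    void *_FieldA6;
};
#pragma pack(pop)

/* Print one row: field name, then its offset under default, pack(1), pack(4). */
#define PRINT_ROW(field)                                      \
    printf("%-9s %2u %2u %2u\n", #field,                      \
           (unsigned)offsetof(struct A_default, field),       \
           (unsigned)offsetof(struct A_pack1, field),         \
           (unsigned)offsetof(struct A_pack4, field))

void print_offsets(void)
{
    PRINT_ROW(_FieldA1);
    PRINT_ROW(_FieldA2);
    PRINT_ROW(_FieldA3);
    PRINT_ROW(_FieldA4);
    PRINT_ROW(_FieldA5);
    PRINT_ROW(_FieldA6);
}
```

Calling print_offsets() from main prints one line per field. Whatever the platform, the pack(1) offsets are simply the running sum of the preceding field sizes, which is what "tightest possible packing" means.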

Now why on earth might you want to specify #pragma pack?  It turns out that for the vast majority of applications it doesn't matter.  For example, most of the Win32 API set is defined without packing (there are some exceptions, like some of the messages for common controls).  More importantly, you shouldn't ever have to specify a structure packing before including any of the Windows header files: the Windows header files are supposed to be built to work regardless of any external compiler switches or flags.  This is why you see all the "#include <pshpack2.h>" lines, etc., in the header files; they ensure that the structure packing rules are locked in regardless of the -Zp switch.

But there IS one situation where this becomes important.  As I mentioned yesterday, some operating systems (like MS-DOS) have their data structures built with #pragma pack(1).  And some networking protocols (such as the CIFS protocol) are defined with packed structures.  So if you define structures to interact with those protocols, you need to use #pragma pack(1) to ensure that your structure definition lines up with the layout of the data on the wire.
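To make that concrete, here's a sketch of a packed wire header. The WireHeader layout and the parse_header helper are invented for illustration; this is not the real CIFS header, and it assumes the compiler supports #pragma pack(push, 1). It also ignores byte order, which a real protocol implementation would have to handle.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint8_t, uint16_t, uint32_t */
#include <string.h>  /* memcpy */

#pragma pack(push, 1)
struct WireHeader {      /* hypothetical layout, not a real protocol */
    uint8_t  command;    /* offset 0 */
    uint32_t length;     /* offset 1 (would be 4 with default alignment) */
    uint16_t flags;      /* offset 5 (would be 8 with default alignment) */
};
#pragma pack(pop)

/* Copy the raw bytes of a received header into the packed structure.
   memcpy avoids casting an arbitrarily aligned buffer pointer to a
   structure pointer. Multi-byte fields keep whatever byte order they
   had on the wire. */
struct WireHeader parse_header(const unsigned char *buffer)
{
    struct WireHeader header;
    memcpy(&header, buffer, sizeof header);
    return header;
}
```

With the pragma in place the structure is exactly 7 bytes, matching the 1 + 4 + 2 bytes on the wire. Without it, the compiler would typically insert 3 bytes of padding after command and 2 more at the end, and the structure would no longer line up with the incoming data.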

One other thing to keep in mind. The C language contract for allocators (malloc, free, etc) is:

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly freed or reallocated).

What this means is that a C/C++ allocator must return an address that is a multiple of at least 8 (the largest natural alignment of any native type). In reality, most allocators return memory aligned on a 16- or 32-byte boundary to take advantage of cache line effects.
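You can poke at this yourself. Here's a small sketch; is_aligned and malloc_is_aligned are hypothetical helpers written for this example, and it assumes uintptr_t is available (it's technically optional in the C standard, though mainstream compilers provide it).

```c
#include <stddef.h>  /* size_t, NULL */
#include <stdint.h>  /* uintptr_t */
#include <stdlib.h>  /* malloc, free */

/* Returns nonzero if pointer is a multiple of alignment. */
int is_aligned(const void *pointer, size_t alignment)
{
    return ((uintptr_t)pointer % alignment) == 0;
}

/* Allocates a block of the given size and reports whether its address
   is a multiple of alignment. Returns 1 if aligned, 0 if not, and -1
   if the allocation itself failed. */
int malloc_is_aligned(size_t bytes, size_t alignment)
{
    void *block = malloc(bytes);
    int result;

    if (block == NULL) {
        return -1;
    }
    result = is_aligned(block, alignment);
    free(block);
    return result;
}
```

Calling malloc_is_aligned(64, 8) should return 1 on any allocator that honors the contract quoted above. Trying 16 or 32 as the alignment is a quick way to see how generous a particular allocator is, though a single allocation only tells you about that one block.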

Edit: Corrected rules for #pragma pack(16)