Alignment (part 1)

I got an email the other day from someone (who will remain nameless) complaining about the fact that some of the NT structures had to be declared with #pragma pack:

Consider file WinBase.h from the Platform SDK, containing the following
declaration:
> typedef struct _WIN32_FIND_DATAA {
> DWORD dwFileAttributes;
> FILETIME ftCreationTime;
[...] 

By this person's logic, since the ftCreationTime field in the WIN32_FIND_DATAA structure was an 8 byte FILETIME structure, the ftCreationTime should be at offset 8 from the start of the structure, but he discovered that it was at offset 4 from the start.

And he was convinced that either there was something incorrect in the documentation for structure packing in Windows or that there had to be a hidden #pragma pack directive (which changes the default structure packing) in the Windows headers.

I realized after reading this article that many people have forgotten the lessons learned from the early days of MS-DOS.  This kind of stuff was known to every developer who worked in C and coded to MS-DOS.  You see, back in the MS-DOS days, memory was king.  So DOS packed all of its data structures as tightly as possible to save memory.

And if you attempted to make MS-DOS system calls from C, you invariably ran into packing issues.  For example, the MS-DOS get country data (which returned internationalization information) returned a structure that contained:

Offset Length Description
0x00 2 Date format
0x02 5 Currency Symbol (ASCIZ string)
0x07 2 Thousands separator (ASCIZ string)
0x09 2 Decimal separator (ASCIZ string)
0x0b 2 Date separator (ASCIZ string)
0x0d 2 Time separator (ASCIZ string)
0x0f 1 Bit Field
0x10 1 Currency places
0x11 1 Time format
0x12 4 Case-map call address (DWORD)
0x16 2 Data-list separator (ASCIZ string)
0x18 10 Reserved

If I was to represent this as a C structure, the naive representation would be:

 struct INTL_DATA{  WORD _DateFormat;  CHAR _CurrencySymbol[5];  CHAR _ThousandsSeparator[2];  CHAR _DecimalSeparator[2];  CHAR _DateSeparator[2];  CHAR _TimeSeparator[2];  BYTE _Padding;  BYTE _CurrencyPlaces;  BYTE _TimeFormat;  LPVOID _CaseMapCallAddress;  BYTE _DataListSeparator[2];  BYTE _Reserved[10];};

The problem is that this structure definition wouldn't work.

You see, the compiler has a fairly straightforward set of rules defining how structures are aligned, and this structure violates them.

The compilers rules are:  In general, data is aligned on it's "natural" boundary.

What's that mean?  "Natural" boundary?  What on earth is that thing?

Typically, the "natural" alignment of data is based on its size.  A 1 byte field can be located at any address in memory.  A 2 byte field (a short) should only appear at even addresses in memory.  A 4 byte field (a long) should only appear at multiples of 4 bytes in memory.  And an 8 byte field (a longlong) should only appear at multiples of 8 bytes in memory.

A simple example goes a long way towards explaining this:

struct A{   int _FieldA1;   char _FieldA2;   short _FieldA3;   char _FieldA4;   long _FieldA5;   void *_FieldA6;};Consider what the compiler's going to do with a variable of type struct A.  The first thing it does is to lay the structure out in "memory":

Field Index Field Name Field Size
0 _FieldA1 4
1 _FieldA2 1
2 _FieldA3 2
3 _FieldA4 1
4 _FieldA5 4
5 _FieldA6 4 (8 on 64 bit)

The next thing it does it to assign an offset for each field:

Field Index Field Name Field Size Field Offset
0 _FieldA1 4 0
1 _FieldA2 1 4
2 _FieldA3 2 6
3 _FieldA4 1 8
4 _FieldA5 4 12
5 _FieldA6 4 (8 on 64 bit) 16

This is where the "natural" alignment comes to play - The natural alignment for _FieldA3 is 2 bytes (it's a word), so it gets put at offset 6 - there's some empty space left in the structure between _FieldA2 and _FieldA3 (and between _FieldA4 and _FieldA5).  The overall "sizeof" struct A is 20 bytes (24 on 64 bit platforms).

For a given structure, the alignment of the structure is determined by the worst case field within the structure.  So in struct A's case, the alignment of the structure is either 4 bytes (on 32bit platforms) or 8 bytes (on 64bit platforms).

When the compiler lays out nested structures, it just follows its rules recursively.  Consider this structure:

struct B{   short _FieldB1;   struct A _FieldB2;   int _FieldB3;   struct A _FieldB4; }

Field Index Field Name Field Size
0 _FieldB1 2
1 _FieldB2.FieldA1 4
2 _FieldB2.FieldA2 1
3 _FieldB2.FIeldA3 2
4 _FieldB2.FieldA4 1
5 _FieldB2.FieldA5 4
6 _FieldB2.FieldA6 4 (8 on 64bit)
7 _FieldB3 4
8 _FieldB4.FieldA1 4
9 _FieldB4.FieldA2 1
10 _FieldB4.FIeldA3 2
11 _FieldB4.FieldA4 1
12 _FieldB4.FieldA5 4
13 _FieldB4.FieldA6 4 (8 on 64bit)

And again, the compiler then assigns offsets to the fields:

Field Index Field Name Field Size Field Offset
0 _FieldB1 2 0
1 _FieldB2._FieldA1 4 4+0=4 (8+0=8 on 64 bit)
2 _FieldB2._FieldA2 1 4+4=8 (8+4=12 on 64 bit)
3 _FieldB2._FIeldA3 2 4+6=10 (8+6=14 on 64 bit)
4 _FieldB2._FieldA4 1 4+8=12 (8+8=16 on 64 bit)
5 _FieldB2._FieldA5 4 4+12=16 (8+12=20 on 64 bit)
6 _FieldB2._FieldA6 4 (8 on 64bit) 4+16=20 (8+16=24 on 64 bit)
7 _FieldB3 4 24 (32 on 64 bit)
8 _FieldB4._FieldA1 4 28 (40 on 64 bit)
9 _FieldB4._FieldA2 1 28+4=32 (40+4=44 on 64 bit)
10 _FieldB4._FIeldA3 2 28+6=34 (40+6=46 on 64 bit)
11 _FieldB4._FieldA4 1 28+8=36 (40+8=48 on 64 bit)
12 _FieldB4._FieldA5 4 28+12=40 (40+12=52 on 64 bit)
13 _FieldB4._FieldA6 4 (8 on 64bit) 28+16=44 (40+16=56 on 64 bit)

Note that _FieldB2._FieldA1 has different offsets depending on whether it's 32bit or 64bit - this is because of the rule I mentioned earlier - _FieldB2 is aligned to the natural alignment of struct A, which is 4 bytes on 32 bit platforms and 8 bytes on 64 bit platforms.

It's also interesting to see what happens with a slight rearrangement of the fields in struct A:

struct A{   int _FieldA1;   char _FieldA2;   char _FieldA4;   short _FieldA3;   long _FieldA5;   void *_FieldA6;};

All I did was to move _FieldA4 up next to _FieldA2.  But this change dramatically changed what happens when the structure is laid out in memory:

Field Index Field Name Field Size Field Offset
0 _FieldA1 4 0
1 _FieldA2 1 4
3 _FieldA4 1 5
2 _FieldA3 2 6
4 _FieldA5 4 8
5 _FieldA6 4 (8 on 64 bit) 12 (16 on 64 bit)

So on 32bit platforms, by just moving one field, the structure's memory footprint shrunk by 4 bytes!  Note that the size of the data contained in the structure didn't change at all - all that was changed was the packing of the data in the structure.  Also note that the total size of the structure didn't change on 64 bit platforms - this is because _FieldA3 pushes the alignment off on _FieldA6. 

So let's go back to the original question about WIN32_FIND_DATAA.  The structure in winbase.h is:

typedef struct _WIN32_FIND_DATAA {DWORD dwFileAttributes;FILETIME ftCreationTime;   :How does this get laid out in memory?

Lets run through the exercise above.

A FILETIME is:

typedef struct _FILETIME {    DWORD dwLowDateTime;    DWORD dwHighDateTime;} FILETIME, *PFILETIME, *LPFILETIME;

So, continuing as before:

Field Index Field Name Field Size Field Offset
0 dwFileAttributes 4 0
1 ftCreationTime.dwLowDateTime 4 4
2 ftCreationTime.dwHighDateTime 4 8
:     : : :

And the mystery of my reader is solved - ftCreationTime is aligned at offset 4 because it only has offset 4 member variables.

And finally, lets consider the INTL_DATA structure mentioned above - why did I say that it didn't work?

Field Index Field Name Field Size Field Offset
0 _DateFormat 2 0x0
1 _CurrencySymbol 5 0x2
2 _ThousandsSeparator 2 0x7
3 _DecimalSeparator 2 0x9
4 _DateSeparator 2 11
5 _TimeSeparator 2 13
6 _Padding 1 15
7 _CurrencyPlaces 1 16
8 _TimeFormat 1 17
9 _CaseMapCallAddress 4 20
10 _DataListSeparator 2 24
11 _Reserved 10 26

We're just fine up until we get to the _CaseMapCallAddress field.  That one's supposed to be at offset 18 according to the MS-DOS documentation, but it's at offset 20 in the C structure!  This is because the natural alignment of a far pointer to a function was 32bits even on Win16.

Tomorrow, I'll write about how this was resolved for MS-DOS clients (and refine the packing algorithm mentioned above)

Most of the content in this post was already been posted in GrantRi's blog post Alignment from last summer, it's an excellent reference to the topic.  In addition, Raymond Chen wrote about the FILETIME issue here.