What structure packing do the Windows SDK header files expect?


/Zp8

In words, integral types and pointers up to eight bytes in size are stored at their natural alignment. Larger types are stored at eight-byte alignment.

Type                                                        Alignment
BYTE, char, UCHAR                                           1-byte alignment
WORD, WCHAR, short, USHORT                                  2-byte alignment
DWORD, int, long, UINT, ULONG, float, 32-bit pointer        4-byte alignment
QWORD, __int64, unsigned __int64, double, 64-bit pointer    8-byte alignment

In other words (saying it a third time), let T be an integral or pointer type. If sizeof(T) ≤ 8, then T is aligned at a sizeof(T)-byte boundary. If sizeof(T) ≥ 8, then T is aligned at an 8-byte boundary.
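
The rule can be checked with a small C++ sketch. The `Sample` struct and the helper functions are hypothetical names made up for illustration, and fixed-width types stand in for BYTE, WORD, DWORD and QWORD so that the sizes do not depend on the platform. `#pragma pack(push, 8)` is assumed here to give the same packing as compiling the declaration with /Zp8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration; it is not an SDK type.
#pragma pack(push, 8)   // assumed equivalent to /Zp8 for this declaration
struct Sample {
    std::uint8_t  b;   // 1-byte type: any offset
    std::uint16_t w;   // 2-byte type: even offset
    std::uint32_t d;   // 4-byte type: offset is a multiple of 4
    std::uint64_t q;   // 8-byte type: offset is a multiple of 8
};
#pragma pack(pop)

// Returns true when every member sits at a multiple of its own size.
inline bool sample_is_naturally_aligned() {
    return offsetof(Sample, b) % sizeof(std::uint8_t)  == 0
        && offsetof(Sample, w) % sizeof(std::uint16_t) == 0
        && offsetof(Sample, d) % sizeof(std::uint32_t) == 0
        && offsetof(Sample, q) % sizeof(std::uint64_t) == 0;
}

inline void check_sample_layout() {
    assert(sample_is_naturally_aligned());
    assert(sizeof(Sample) % 8 == 0);   // size is rounded up to the strictest member
}
```

Calling `check_sample_layout()` from a test is enough to confirm that the compiler inserted one byte of padding after `b` and none anywhere else.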

Comments (32)
  1. Tom says:

    That’s rather interesting.  I always assumed Windows used 4 byte alignment for all integral types of 4 bytes or less so that the fields would be properly aligned to the bus for performance reasons.  Of course, I may have simply extrapolated that from the fact that a BOOL is actually a DWORD in disguise (a DWORD is awfully big for information encoded in a single bit).

  2. Alexander Grigoriev says:

    Tom,

    BOOL is ‘int’, not DWORD. Since Win16. BOOLEAN, on the other hand, is a byte.

  3. Eric says:

    int…DWORD…whatever…They’re the same size.

  5. Vasili Zaitsev says:

    There is a weird layout bug in MSVC where the presence of virtual functions screws up the layout. Observe the layout in memory of the following struct with and without the comment on double.

    class Y

    {

       virtual void f(); //4bytes for vfptr

       int x;   //4bytes

    //    double y;     //8bytes

    };

    It should be [4(vfptr)|4(a)] and [4(vfptr)|4(a)|8(b)]

    instead it’s [4(vfptr)|4-Padding|4(a)|4-padding|8(b)]

  6. Vasili Zaitsev says:

    meh,

    :%s/(a)/(x)/g

    :%s/(b)/(y)/g
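
One way to look at the layout question in comments 5 and 6 without guessing is to ask the compiler for the size and alignment of both variants. This is only a sketch: `YWithoutDouble`, `YWithDouble`, `LayoutInfo` and `layout_of` are hypothetical names, and the exact numbers depend on the compiler and target, so the checks below assert only properties that hold for any conforming layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two variants of class Y in the comment above.
class YWithoutDouble {
public:
    virtual void f() {}
    int x = 0;
};

class YWithDouble {
public:
    virtual void f() {}
    int x = 0;
    double y = 0.0;
};

struct LayoutInfo {
    std::size_t size;
    std::size_t alignment;
};

// Reports what the compiler actually chose for T.
template <class T>
LayoutInfo layout_of() {
    return { sizeof(T), alignof(T) };
}

inline void check_double_raises_alignment() {
    // A class is at least as strictly aligned as its most-aligned member,
    // and its size is always a multiple of its alignment.
    assert(layout_of<YWithDouble>().alignment >= alignof(double));
    assert(layout_of<YWithDouble>().size % layout_of<YWithDouble>().alignment == 0);
    assert(layout_of<YWithDouble>().size > layout_of<YWithoutDouble>().size);
}
```

Printing `layout_of<YWithDouble>().size` on the compiler in question shows whether the extra padding is really there. Adding the `double` raises the alignment of the whole class, which is one plausible reason for padding to appear in front of `x` as well as after it.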

  7. John says:

    Vasili:  It looks like the standard leaves the layout up to the implementation, so it’s not really a bug.

  8. Jules says:

    Tom: "I always assumed Windows used 4 byte alignment for all integral types of 4 bytes or less so that the fields would be properly aligned to the bus for performance reasons."

    I don’t see how such alignment would improve performance.  If I need to get two bytes from a dword, why does it matter if it’s in the top or bottom pair?

    AIUI, the only case where alignment causes extra delays is when the misalignment means the processor needs to fetch additional words of memory (word being 64 bits on most modern machines). Which this arrangement completely avoids.

    Eric:  "int…DWORD…whatever…They’re the same size."

    not in win16 they aren’t.

  9. Dave says:

    My Grandpa told me stories about win16.

    I think it goes back to when he fought the Kaiser.

  10. a DWORD is awfully big for information encoded in a single bit

    Ah, but you’re forgetting parity… and error correction… and 29 other things I can’t think of at the moment.

  11. Sebastian Redl says:

    and 29 other things I can’t think of at the moment.

    FILE_NOT_FOUND

  12. porter says:

    … and does it match the default packing of ‘midl’ & ‘mktyplib’?

  13. Jens says:

    Eric:  "int…DWORD…whatever…They’re the same size."

    Actually, no. We recently (2007) had an application fail horribly because it was ported from 32-bit Windows to 64-bit Unix. When sizeof(int) suddenly became 8 instead of 4, doing a read(handle, &version_number, sizeof(int)) gave us file version numbers about eight billions larger than expected.

    If you REALLY want 32-bit values, use int32_t instead of int. Please.

  14. Worf says:

    Funny, I always thought the default alignment would be 16 bytes, a "paragraph". This is because of the x86, of course, supporting it implicitly with the segment:offset method.

    I wonder if the alignment is 32-bits on ARM (Windows Embedded CE or Windows Mobile) – ARM, like most architectures, cannot access unaligned memory (throws exception).

  15. Mike Dimmick says:

    Worf, segment:offset is extremely ancient real mode x86. In protected mode, as soon as you turn on segmented addressing, the segment is no longer used directly but the relative address of a segment descriptor in one of the local or global descriptor tables. That descriptor can have any base address – a full 32 bits is available to describe the base address. The Intel manual says they should be aligned to 16-byte boundaries but it is not required.

    In 32-bit mode, the offset can address any part of the 4GB physical address space. With paging enabled, the segment base address plus the offset calculate a virtual address; this is translated to a physical address via the Translation Look-aside Buffer (TLB) and page tables. If in Physical Address Extension (PAE) mode, the page table points to (architecturally) 64-bit physical addresses, although only 36 bits are implemented on 32-bit x86 processors. (I believe 48 bits are available on x64, even in 32-bit mode.) Windows now generally runs in PAE mode even if less than 4GB is fitted, because the No Execute/eXecute Disable feature is implemented only in PAE mode, as there was no space for the new bit in the 32-bit Page Table Entry. It’s the top bit of the 64-bit physical address in the extended PTE used in PAE mode.

    Segments are almost entirely unused in Windows, with the single code segment set to a base of 0 and limit of 4GB, SS DS and ES all pointing to a data segment, also with base 0 and limit 4GB, and FS pointing to a data segment whose base is the address of the thread environment block (TEB). This allows code to reference e.g. the exception handler chain and thread-local storage via the FS segment – the system sets FS correctly for each new thread.

    x64 processors have vestigial segment support in 64-bit "long mode". ES, DS and SS segments are completely unused and some of the opcodes which reference them were reused for the new features. They are implicitly 0-based. Limits are no longer used. CS must still point to a code segment. FS and GS can still be used for offset addressing.

    Alignment is an issue on all processors; x86 will not (by default) raise exceptions for misalignment, but it will take substantially longer to retrieve misaligned data. ARM will always raise exceptions as it has no compatibility behaviour. Default alignment on ARM is the same as on x86, 8 bytes.

    You can still compile ‘misaligned’ code on ARM using packing pragmas or options, but the code balloons in size as it has to detect the current alignment and perform the correct fixup for every possible misalignment. Don’t do it unless you need to for compatibility with a previously-created data file.
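
The packing-pragma case described above might look like the following sketch. `PackedRecord` is a hypothetical file-format record and `load_u32_unaligned` is a hypothetical helper. Going through `memcpy` is one common way to read a misaligned field portably, because it leaves the choice of instructions to the compiler. It is not necessarily what any particular ARM compiler emits for packed structures.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical on-disk record, packed to 1-byte boundaries for file compatibility.
#pragma pack(push, 1)
struct PackedRecord {
    std::uint8_t  tag;
    std::uint32_t value;   // lands at offset 1, so it is misaligned
};
#pragma pack(pop)

// Reads a 32-bit value from an address of any alignment. memcpy lets the
// compiler pick whatever instruction sequence is legal on the target.
inline std::uint32_t load_u32_unaligned(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

inline void check_packed_record() {
    static_assert(sizeof(PackedRecord) == 5, "pack(1) removes the padding");
    unsigned char raw[sizeof(PackedRecord)] = {};
    const std::uint32_t expected = 0x12345678u;
    std::memcpy(raw + 1, &expected, sizeof(expected));   // value field at offset 1
    assert(load_u32_unaligned(raw + 1) == expected);
}
```

Without the pragma the same struct would normally occupy 8 bytes, with three bytes of padding after `tag`.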

  16. wolf550e says:

    SSE types require 16-byte alignment. I guess they aren’t used in the Windows SDK? How about directX or something where they are used?

    [Yeah, how about DirectX? Check out d3dx8math.h. (Hey, you can do the research yourself. Don’t make me do it.) -Raymond]
  17. Austin Donnelly says:

    Can someone give me an example T such that sizeof(T) >= 8 bytes.  Remember, T must also be either an integral or pointer type.

    I can’t think of any.

  18. configurator says:

    @Austin Donnelly: int128, of course.

    You didn’t say it has to already exist.

  19. GregM says:

    "Can someone give me an example T such that sizeof(T) >= 8 bytes."

    Raymond gave several examples.  I assume you actually meant sizeof(T) > 8 bytes.

    long double, if it were actually implemented on Windows as more precise than double, would fit sizeof(T) > 8 bytes.

    "Remember, T must also be either an integral or pointer type."

    Why does it need to be an integral or pointer type?  Structures can contain other structures.

  20. Tim Smith says:

    @Vasili

    I was just working on a packing problem with the same basic layout as your layout.  I didn’t see that problem.  It packed as expected.  

    Is your example more complicated in reality?

  21. Neil says:

    The last time I used an ARM it didn’t raise an exception for an unaligned read, but only the bottom byte was guaranteed to be correct as all it did was read the aligned value and rotate it by the appropriate multiple of 8 bits.

  22. laonianren says:

    I haven’t programmed ARM since 1992, so this may well be out-of-date.

    The bottom two bits of an address aren’t given to the memory controller, so reads are implicitly 4-byte aligned.  These two bits are multiplied by 8 and used to rotate the 32-bit value that’s been read.  If you also set the B (byte) flag on your instruction the top 24 bits of the value are set to zero.  This simulates byte-size reads.  If you don’t set the B flag you do get the full 32 bits (contrary to what Neil said), but they’ve been rotated.  If you need to read then rotate a value you can save an instruction.  Writing works similarly.

    ARM also supports multi-word reads and writes.  These are slightly quicker if 16-byte aligned, but work with any alignment.
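
The fetch-and-rotate behaviour described in this comment can be simulated in a few lines of C++. This is a sketch of the description above, not a model of any specific ARM core. The function names are hypothetical, little-endian memory is assumed, and the rotation is assumed to be to the right so that the addressed byte ends up in the low 8 bits, which is what the byte-read description implies.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value right by n bits (n is reduced modulo 32).
inline std::uint32_t rotate_right(std::uint32_t value, unsigned n) {
    n &= 31u;
    return n == 0 ? value : (value >> n) | (value << (32u - n));
}

// Simulated word load: the low two address bits are dropped for the fetch,
// then used (times 8) as a rotate count. Little-endian memory is assumed.
inline std::uint32_t simulated_word_load(const std::uint8_t* memory, std::uint32_t address) {
    const std::uint32_t base = address & ~std::uint32_t{3};
    const std::uint32_t word =
          static_cast<std::uint32_t>(memory[base])
        | static_cast<std::uint32_t>(memory[base + 1]) << 8
        | static_cast<std::uint32_t>(memory[base + 2]) << 16
        | static_cast<std::uint32_t>(memory[base + 3]) << 24;
    return rotate_right(word, 8u * (address & 3u));
}

// Simulated byte load: same fetch and rotate, then the top 24 bits are cleared.
inline std::uint32_t simulated_byte_load(const std::uint8_t* memory, std::uint32_t address) {
    return simulated_word_load(memory, address) & 0xFFu;
}

inline void check_simulated_loads() {
    const std::uint8_t memory[4] = {0x11, 0x22, 0x33, 0x44};
    assert(simulated_word_load(memory, 0) == 0x44332211u);
    assert(simulated_word_load(memory, 1) == 0x11443322u);  // rotated, not shifted
    assert(simulated_byte_load(memory, 1) == 0x22u);        // the addressed byte
}
```

The misaligned word load returns all four bytes of the aligned word, just in a different order, which matches the point that only the bottom byte is what a naive caller would expect.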

  23. mikeb says:

    @Vasili Zaitsev: why do you consider that a bug? I’m sure it’s laid out that way so that 2 classes that are the same except for one having a vptr and one not having a vptr get laid out in the same way. Also, remember that the location of the vptr is not specified (actually, the compiler is free to not use one at all if it wants to implement virtual method dispatch some other way).  I think that there may even be some situations where MSVC doesn’t put the vptr at the start of the object (though I’m not really sure about that).  See http://blogs.msdn.com/oldnewthing/archive/2006/01/20/515327.aspx.

  24. ChrisR says:

    @ Austin Donnelly:

    Well, if you really meant >= 8, here’s an example using VC7.1 (pointer to member of a class derived using multiple inheritance).

    class Base1

    {

    public:

      virtual int DoSomething()

      {

         return 1;

      }

      int x;

    };

    class Base2

    {

    public:

      virtual int DoSomethingElse()

      {

         return 1;

      }

      int y;

    };

    class Derived : public Base1, public Base2

    {

    public:

      virtual int DoSomething()

      {

         return 0;

      }

      virtual int DoSomethingElse()

      {

         return 0;

      }

    };

    typedef int (Derived::* PFN_Derived)();

    PFN_Derived pfnFunc = &Derived::DoSomething;

    sizeof( pfnFunc ) == 8


    For further reading:

    http://blogs.msdn.com/oldnewthing/archive/2004/02/09/70002.aspx

  25. mikeb says:

    How’s this for sizeof(T) > 8:

    class A {};

    class B {};

    class VirtD: public virtual A, public virtual B {

    public:

       virtual int Dfunc() { return 5; };

    };

    typedef int (VirtD::* Derived_mfp)();

    int main()

    {

       VirtD virtd;

       Derived_mfp mfp = &VirtD::Dfunc;

       printf( "sizeof( mfp) == %d\n", (int)sizeof( mfp));

    }

    Displays: sizeof( mfp) == 12

    See http://www.codeproject.com/KB/cpp/FastDelegate.aspx for more fun with member function pointers.
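
The two examples above can be folded into one sketch that reports the size of a pointer to member function for single, multiple and virtual inheritance. All class and function names here are hypothetical. The sizes are specific to the compiler and target: the 8 and 12 quoted above are what the commenters report for 32-bit MSVC, while other ABIs may use the same size for all three. The checks therefore assert only an ordering that holds on the common ABIs, not specific values.

```cpp
#include <cassert>
#include <cstddef>

struct Single { int f() { return 0; } };

struct Left  { virtual ~Left()  = default; virtual int l() { return 1; } };
struct Right { virtual ~Right() = default; virtual int r() { return 2; } };
struct Multiple : Left, Right { int m() { return 3; } };

struct VBase {};
struct Virtual : virtual VBase { virtual int v() { return 4; } };

struct MemberPointerSizes {
    std::size_t single;
    std::size_t multiple;
    std::size_t virtual_inheritance;
};

// Collects sizeof for a pointer to member function of each kind of class.
inline MemberPointerSizes member_pointer_sizes() {
    return { sizeof(int (Single::*)()),
             sizeof(int (Multiple::*)()),
             sizeof(int (Virtual::*)()) };
}

inline void check_member_pointer_sizes() {
    const MemberPointerSizes s = member_pointer_sizes();
    // The pointer may need an adjustor or a vbtable index, so it can only grow
    // as the inheritance gets more complicated.
    assert(s.single >= sizeof(void*));
    assert(s.multiple >= s.single);
    assert(s.virtual_inheritance >= s.multiple);
}
```

Printing the three fields of `member_pointer_sizes()` on a given compiler shows which of them exceed 8 bytes there.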

  26. porter says:

    > it was ported from 32-bit Windows to 64-bit Unix. When sizeof(int) suddenly became 8 instead of 4

    Unlikely, 64 bit UNIXes use the LP64 model and still have int as 32 bit, but have long as 64bit.

  27. mikeb says:

    > Unlikely, 64 bit UNIXes use the LP64 model and still have int as 32 bit, but have long as 64bit.

    Hmm – I thought that there was at least one major Unix variant that used the ILP64 model. In any case, maybe he meant that sizeof(long) suddenly became 8 instead of 4, and that was the cause of the bug?

  28. Worf says:

    FYI, modern ARM cores require types to be aligned appropriately, but you can do 8/16 bit accesses without a data abort. Unaligned accesses always throw a data abort, so you must access/shift in software.

  29. steelbytes says:

    DrawTextW and TextOutW are sensitive to memory alignment when drawing to a DIBSection

    http://louis.steelbytes.com/DrawTextBug.html

    [The pointer is declared as LPCWSTR (pointer to WCHAR) not LPUWSTR (pointer to unaligned WCHAR). Passing an unaligned pointer to a function that expects an aligned pointer is a violation of the ground rules of programming and therefore all bets are off. -Raymond]
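
A caller that is unsure about a buffer can test the pointer before passing it on. The helper below is a hypothetical sketch, not an SDK function. `char16_t` is used as a portable stand-in for WCHAR, on the assumption that both are 2-byte types with 2-byte alignment.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when p satisfies the alignment requirement of T.
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

inline void check_alignment_helper() {
    alignas(8) unsigned char buffer[16] = {};
    assert(is_aligned_for<char16_t>(buffer));        // even address
    assert(!is_aligned_for<char16_t>(buffer + 1));   // odd address
}
```

If the check fails, copying the text into a properly aligned buffer before the call keeps the caller within the function's contract.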
  30. porter says:

    > read(handle, &version_number, sizeof(int))

    He should also have used "sizeof(version_number)" rather than sizeof some type that may or may not be what version_number actually is.
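
Both pieces of advice from this thread, a fixed-width type and `sizeof` applied to the variable rather than to a type name, fit into one short sketch. `read_version` is a hypothetical helper that copies from a memory buffer instead of calling `read`, and it assumes the file was written with the same byte order as the machine reading it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a 32-bit version number out of a raw buffer.
// Returns false when the buffer is too short.
inline bool read_version(const unsigned char* data, std::size_t size,
                         std::int32_t& version_number) {
    // sizeof is applied to the variable, so it follows the variable's type.
    if (size < sizeof(version_number)) {
        return false;
    }
    std::memcpy(&version_number, data, sizeof(version_number));
    return true;
}

inline void check_read_version() {
    unsigned char file_bytes[8] = {};
    const std::int32_t written = 7;
    std::memcpy(file_bytes, &written, sizeof(written));

    std::int32_t version_number = 0;
    assert(read_version(file_bytes, sizeof(file_bytes), version_number));
    assert(version_number == 7);
}
```

Because `std::int32_t` is exactly four bytes on every platform that provides it, the number of bytes consumed no longer changes when the code is rebuilt for a different data model.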

  31. Steven says:

    I got caught out with this recently when compiling code using Bluetooth Winsock calls (on Windows Mobile). Most of the Windows include files have the packing set at the top of the file but the ws2bth.h file doesn’t.

    On Windows the ws2bth.h is actually packed to 1-byte boundaries, which doesn’t work so well on ARM processors, so it appears the packing was removed but not explicitly set to 8. This caused problems of course because for some reason the application I was compiling was aligning on 4 byte boundaries.

    Funnily enough if you look at ws2bth.h for Windows Mobile 5.0 you will see this lonesome comment:

    // Turn 1 byte packing of structures on
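
The failure mode described here, a header that silently inherits the includer's packing, can be reproduced in miniature. Both structs are hypothetical. The outer `#pragma pack(push, 4)` simulates a project built with 4-byte packing, and the inner push/pop pair shows how a header can pin its own declarations to 8-byte packing and then restore the previous setting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 4)          // simulates a project built with 4-byte packing

struct AmbientPacking {        // inherits whatever packing is in effect
    std::uint8_t  flag;
    std::uint64_t address;
};

#pragma pack(push, 8)          // what a header can do to protect its own types
struct ExplicitPacking {
    std::uint8_t  flag;
    std::uint64_t address;
};
#pragma pack(pop)              // restore the includer's setting

#pragma pack(pop)

inline void check_packing_difference() {
    // Under 4-byte packing the 64-bit member is placed at offset 4.
    assert(offsetof(AmbientPacking, address) == 4);
    // With explicit 8-byte packing it is never placed earlier than that.
    assert(offsetof(ExplicitPacking, address) >= offsetof(AmbientPacking, address));
}
```

On targets where a 64-bit integer has 8-byte natural alignment, the two structs end up with different sizes, so code built with one setting misreads data laid out with the other.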

  32. Yuhong Bao says:

    [The pointer is declared as LPCWSTR (pointer to WCHAR) not LPUWSTR (pointer to unaligned WCHAR). Passing an unaligned pointer to a function that expects an aligned pointer is a violation of the ground rules of programming and therefore all bets are off. -Raymond]

    BTW, on alignment, why doesn’t NT set CR0.AM by default? I am not expecting any OS to set EFLAGS.AC by default for compatibility reasons, but OSes should at least set CR0.AM so that apps that do set EFLAGS.AC will actually get the alignment check exceptions. It would help a lot when testing for alignment issues.
