If you’re going to write your own allocator, you need to respect the MEMORY_ALLOCATION_ALIGNMENT

This time, I'm not going to set up a story. I'm just going to go straight to the punch line.

A customer overrode the new operator in order to add additional instrumentation. Something like this:

struct EXTRASTUFF
{
    DWORD Awesome1;
    DWORD Awesome2;
};

// error checking elided for expository purposes
void *operator new(size_t n)
{
  EXTRASTUFF *extra = (EXTRASTUFF)malloc(sizeof(EXTRASTUFF) + n);
  extra->Awesome1 = get_awesome_1();
  extra->Awesome2 = get_awesome_2();
  return ((BYTE *)extra) + sizeof(EXTRASTUFF);
}

// use your imagination to implement
// operators new[], delete, and delete[]

This worked out okay on 32-bit systems because in 32-bit Windows, MEMORY_ALLOCATION_ALIGNMENT is 8, and sizeof(EXTRASTUFF) is also 8. If you start with a value that is a multiple of 8, then add 8 to it, the result is still a multiple of 8, so the pointer returned by the custom operator new remains properly aligned.

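But on 64-bit systems, things went awry. On 64-bit systems, MEMORY_ALLOCATION_ALIGNMENT is 16, but sizeof(EXTRASTUFF) is still 8. If you start with a value that is a multiple of 16 and add 8 to it, the result is never a multiple of 16. As a result, the custom operator new handed out guaranteed-misaligned memory.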
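The arithmetic for both cases can be checked at compile time. What follows is only a portable sketch: it does not include any Windows headers, `kHeaderSize` is a hypothetical stand-in for sizeof(EXTRASTUFF), the addresses are made up, and the alignments 8 and 16 are the two values of MEMORY_ALLOCATION_ALIGNMENT mentioned above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for sizeof(EXTRASTUFF): two 4-byte DWORDs.
constexpr std::size_t kHeaderSize = 8;

// True if address is a multiple of alignment (alignment must be a power of 2).
constexpr bool is_aligned(std::uintptr_t address, std::size_t alignment)
{
    return (address & (alignment - 1)) == 0;
}

// The address the custom operator new hands out, given what malloc returned.
constexpr std::uintptr_t user_address(std::uintptr_t malloc_result)
{
    return malloc_result + kHeaderSize;
}

// 32-bit case: malloc returns a multiple of 8, and adding 8 keeps it one.
static_assert(is_aligned(user_address(0x1000), 8), "multiple of 8 plus 8");
static_assert(is_aligned(user_address(0x1008), 8), "multiple of 8 plus 8");

// 64-bit case: malloc returns a multiple of 16, and adding 8 never gives one.
static_assert(!is_aligned(user_address(0x1000), 16), "multiple of 16 plus 8");
static_assert(!is_aligned(user_address(0x1010), 16), "multiple of 16 plus 8");
```

The same `is_aligned` test is the sort of check you could assert on in a debug build before handing a pointer to code that has an alignment requirement.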

The misalignment went undetected for a long time, but the sleeping bug finally woke up when somebody allocated a structure that contained an SLIST_ENTRY. As we saw earlier, the SLIST_ENTRY really does need to be aligned according to the MEMORY_ALLOCATION_ALIGNMENT, especially on 64-bit systems, because 64-bit Windows takes advantage of the extra "guaranteed to be zero" bits that 16-byte alignment gives you. If your SLIST_ENTRY is not 16-byte aligned, then those "guaranteed to be zero" bits are not actually zero, and then the algorithm breaks down.
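To make the "guaranteed to be zero" bits concrete, here is a minimal sketch of the general trick of stashing a small value in the low four bits of a 16-byte-aligned address. This is an illustration only, not the actual SLIST implementation; the names `pack`, `unpack_address`, `unpack_tag`, and `round_trips` are invented for the example, and the addresses are made up.

```cpp
#include <cstdint>

// Illustration only: this is NOT how SLIST is actually implemented.
// A 16-byte-aligned address has zeros in its low 4 bits.
constexpr std::uintptr_t kTagMask = 15;

// Store a small tag in the low bits, assuming they are zero to begin with.
constexpr std::uintptr_t pack(std::uintptr_t address, std::uintptr_t tag)
{
    return address | (tag & kTagMask);
}

// Recover the address by clearing the low bits.
constexpr std::uintptr_t unpack_address(std::uintptr_t packed)
{
    return packed & ~kTagMask;
}

// Recover the tag by keeping only the low bits.
constexpr std::uintptr_t unpack_tag(std::uintptr_t packed)
{
    return packed & kTagMask;
}

// True if both the address and the tag survive packing and unpacking.
constexpr bool round_trips(std::uintptr_t address, std::uintptr_t tag)
{
    return unpack_address(pack(address, tag)) == address &&
           unpack_tag(pack(address, tag)) == tag;
}

// A 16-byte-aligned address survives the round trip.
static_assert(round_trips(0x1000, 5), "aligned address round-trips");

// An address that is only 8-byte aligned does not: its low bits were
// never zero, so they get mixed up with the tag.
static_assert(!round_trips(0x1008, 5), "misaligned address is corrupted");
```

With the misaligned address, unpacking yields 0x1000 instead of 0x1008, so whatever follows that pointer ends up reading and writing the wrong memory.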

Result: Memory corruption and eventually a crash.
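One way to avoid the problem is to make the size of the header a multiple of the allocator's alignment, so that adding it to the pointer does not disturb the alignment. The following is a portable sketch under stated assumptions, not the customer's actual fix: `kAllocAlignment` uses alignof(std::max_align_t) as a stand-in for MEMORY_ALLOCATION_ALIGNMENT, the helpers `instrumented_alloc` and `instrumented_free` are hypothetical and stand in for the overridden operators, and the instrumentation values are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: a portable stand-in for MEMORY_ALLOCATION_ALIGNMENT.
// malloc returns memory suitably aligned for std::max_align_t.
constexpr std::size_t kAllocAlignment = alignof(std::max_align_t);

// alignas pads the structure so that its size is a multiple of
// kAllocAlignment.
struct alignas(kAllocAlignment) ExtraStuff
{
    std::uint32_t awesome1;
    std::uint32_t awesome2;
};

static_assert(sizeof(ExtraStuff) % kAllocAlignment == 0,
              "header must not disturb the allocator's alignment");

// Hypothetical helper playing the role of the custom operator new.
inline void *instrumented_alloc(std::size_t n)
{
    void *raw = std::malloc(sizeof(ExtraStuff) + n);
    if (raw == nullptr) return nullptr;
    assert(reinterpret_cast<std::uintptr_t>(raw) % kAllocAlignment == 0);

    ExtraStuff *extra = static_cast<ExtraStuff *>(raw);
    extra->awesome1 = 0; // placeholder for get_awesome_1()
    extra->awesome2 = 0; // placeholder for get_awesome_2()
    return extra + 1;    // first byte past the header
}

// Hypothetical helper playing the role of the custom operator delete.
inline void instrumented_free(void *p)
{
    if (p != nullptr) std::free(static_cast<ExtraStuff *>(p) - 1);
}
```

The cost is some padding in every allocation. In exchange, the pointer handed to the caller has the same alignment as the one malloc returned.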

Comments (22)
  1. God, I love managed code. (And I recognize C to also be a valid language choice).

  2. Joshua says:

    I fail to understand how misaligned changes the "guaranteed to be zero" so it is not.

    I do, however, see the problem when certain 64 bit instructions are no longer atomic when misaligned (I forget which ones offhand).

  3. Z.T. says:

    Can that SLIST code have the equivalent of "assert((this & 15) == 0)"? Asserts are better than documentation.

    [I just checked. The assertion is there. Of course, assertions are enabled only in chk builds… -Raymond]
  4. Z.T. says:

    @Joshua: the pointer itself has zeroes in its 4 least significant bits, because it points to a 16 byte aligned address. Now you can stuff things (like count/length/size) into the pointer itself, and dereference through "*(((uintptr_t)p)&~((uintptr_t)15))". See en.wikipedia.org/…/Tagged_pointer

  5. Rodrigo says:

    I'd really like to see the correct implementation of this.

  6. Nobody says:

    This is one reason I would prefer to work on a CPU where all misaligned access faulted.  Less room for ambiguity if your thing faults all the time.

    (Although, I'd argue that altering the low bits of a pointer as metadata is kind of asking for this type of bug.  The only place where I would not frown on this in a code review is where hardware interfaces require it.  x86 page tables come to mind.)

  7. AsmGuru62 says:

    The correct one would be that one (hopefully):

    struct EXTRASTUFF
    {
       DWORD_PTR Awesome1;
       DWORD_PTR Awesome2;
    };

    [You're trying too hard. Just use DECLSPEC_ALIGN(MEMORY_ALLOCATION_ALIGNMENT). -Raymond]
  8. Joshua says:

    [You're trying too hard. Just use DECLSPEC_ALIGN(MEMORY_ALLOCATION_ALIGNMENT). -Raymond]

    The K&R C handbook essentially says to do it this way. (It actually said to union with a type that was the same size as the alignment.)

  9. James says:

    I'm confused again.

    Aren't you supposed to use _aligned_malloc for lists ?

    So using new was on its own a bug when allocating 'SLIST_ENTRY', isn't that correct ?

    // Initialize the list header to a MEMORY_ALLOCATION_ALIGNMENT boundary.


    From: msdn.microsoft.com/…/ms686962(v=vs.85).aspx

  10. Evan says:


    (1) From Raymond's description, the list was a member of another structure so that wouldn't have worked anyway

    (2) There's no such requirement that I can find on the page you linked or on a couple I followed from there. All it says is that the SLIST_HEADER must be aligned at MEMORY_ALLOCATION_ALIGNMENT. _aligned_malloc() is one way to achieve that (which is why they refer you to it) but it's by no means the only one.

  11. James says:


    I have never understood the alignment thing.

    I never understood the rules for when it is required and when it is merely recommended.

    Is it the pointer that needs to be on a boundary, or the position of the SLIST_HEADER within the parent structure, or does the whole parent structure need to be aligned? All of the above?

    Been programming for over 10 years. Never needed to learn it.

    This makes me feel uneducated again.

    I have to put it in the todo list.

  12. Yuhong Bao says:

    [I just checked. The assertion is there. Of course, assertions are enabled only in chk builds… -Raymond]

    I think CMPXCHG16B will itself fault if the structure is not 16-byte aligned.

  13. Evan says:


    If I have some "struct S { … SLIST_ENTRY list; … };" and an object "struct S s;", technically speaking it's only the 's.list' field that needs that alignment. In other words, "&s + offsetof(S, list)" must be a multiple of MEMORY_ALLOCATION_ALIGNMENT, but nothing unusual is imposed on 's' itself. (Of course, other members may have other alignment needs.)

    But in practice this also means that "&s" needs to be aligned to that boundary as well, and enough padding added (perhaps none) to get 'offsetof(S, list)' out to be a multiple of MEMORY_ALLOCATION_ALIGNMENT as well. (Suppose MEMORY_ALLOCATION_ALIGNMENT is 16 and offsetof(S, list) is 8. To get 's.list' aligned on a 16-byte boundary, 's' would have to be aligned on an 8 byte boundary but NOT a 16 byte boundary. An alignment requirement like that is… strange, to say the least.)

    (At least, this is my understanding. I'm not sure I've ever really had to deal with anything with unusual alignment characteristics.)

  14. SimonRev says:

    @Rodrigo — here is my quick approach:

    struct EXTRASTUFF
    {
       DWORD Awesome1;
       DWORD Awesome2;
    };

    // error checking elided for expository purposes
    void *operator new(size_t n)
    {
     EXTRASTUFF *extra = (EXTRASTUFF *)malloc(MEMORY_ALLOCATION_ALIGNMENT + n);
     extra->Awesome1 = get_awesome_1();
     extra->Awesome2 = get_awesome_2();
     return ((BYTE *)extra) + MEMORY_ALLOCATION_ALIGNMENT;
    }

    But there are probably better ones

  15. Someone says:


    In C, the largest alignment requirement of a struct member becomes the alignment requirement of the whole struct. In order of declaration, each struct member is placed by the compiler at the next offset that meets its individual alignment requirement.

    For sizeof(double) == 8,

    struct S {char a; double b; char c;};

    is laid out like

    struct S {char a; char pad1[7]; double b; char c; char pad2[7]; };

    with an alignment requirement of 8 and a size of 24.

    The size of a struct is always a multiple of its alignment requirement. As a consequence, arrays in C don't need to add space between elements to keep the elements aligned.

  16. Anonymous Coward says:

    Can someone explain to me why the 32-bit alignment is 8? You'd expect it to be 4.

  17. Rick C says:

    Anonymous:  Follow the link in Raymond's "you're trying too hard" comment.

  18. Someone says:

    sizeof(double) = 8. An allocator has to support structs containing all base types of the programming language. If the largest primitive type in C(++) requires 8, then the allocator must return memory aligned to 8.

    But MEMORY_ALLOCATION_ALIGNMENT is defined in WinNT.h, not in the C header files. There may be machine instructions with strange alignment requirements, for example alignment to a physical page boundary, but is there something in 64-bit C++ itself which requires an alignment of 16? Does the C runtime really need to respect MEMORY_ALLOCATION_ALIGNMENT from WinNT.h?

  19. Matt says:


    For 32-bit, the memory alignment is 64-bit in order to give aligned read access to doubles, which improves the performance of SSE instructions (which require 8-byte alignment) and lockcmpxchg32 instructions.

    For 64-bit, the memory alignment is 128-bit in order to give aligned read access to MMX instructions and lockcmpxchg64 instructions

    These two requirements allow the compiler to assume best case performance when doing certain instructions such as MMX (which are used in lots of weird places, like some optimised memsets and memcpys). Instead of testing whether to do the fast (aligned) instruction or the slow instruction with a test and jump (which in practice kills the performance benefit, and with the lockcmpxchg instructions also loses you cross-processor atomicity), Microsoft simply states that such pointers must be aligned on 64-bit and 128-bit respectively.

    In practice, since the paradigm of (FOO*)malloc(sizeof(FOO)) is so common, malloc needs to return a size that means that given a FOO, it is likely that the output pointer matches the alignment of FOO. As described above, this is either 64-bit or 128-bit alignment. If you need bigger alignment for weird reasons (CONTEXT_AMD64 is a good example; it needs 128-bit alignment even on WOW64), you just have to call _aligned_malloc() or carefully position your structure WITHIN the malloc'ed region, instead of just at the front of it.

  20. James says:

    (Nitpick: The EXTRASTUFF cast on the malloc result actually should be to EXTRASTUFF*.)

  21. 640k says:

    This wouldn't have been a problem if segment allocation code from DOS had been reused.

  22. Jon says:

    Is there any way to detect/crash on misaligned access as soon as it occurs, say in debug builds?
