The alignment declaration specifier is in bytes, not bits


Explicit object alignment is not something most people worry about when writing code, which means that when you decide to worry about it, you may be a bit rusty on how the declarations work. (After all, if it’s something you worried about all the time, then you wouldn’t have trouble remembering how to do it!)

I was looking at some customer code, and there was a class that had a data member with an explicit alignment declaration.

class Whatever {
    ...
    __declspec(align(32)) LONG m_lSomething; // Must be DWORD-aligned to make writes atomic
    ...
};

I pointed out that the comment didn’t match the code. The comment says that the variable needs to be DWORD-aligned (which in Windows-speak means aligned on a 32-bit boundary), but the code aligns it on a 32-byte boundary, eight times as generous as required. On the other hand, maybe they really did want the member aligned on a 32-byte boundary (say to put it on its own cache line).

Turns out that in this case, the comment was correct and the code was wrong. To force a variable to align on a DWORD boundary, you want to say __declspec(align(4)). Save yourself a bunch of unnecessary padding bytes.

But in fact, in this case, the customer was simply trying too hard. The code was compiled with default alignment, which aligns integer types on their natural boundaries anyway. The compiler was going to align the variable even if you didn’t specify anything.

[Raymond is currently away; this message was pre-recorded.]

Comments (18)
  1. W says:

    IMHO it's a good idea to specify the alignment explicitly if you rely on it.

  2. Pierre B. says:

    Frankly, I'm more worried about the comment claiming that aligning the address makes the write atomic. Maybe they meant it is a necessary, but not sufficient, requirement for the writes to be atomic. At least they didn't declare it volatile, so they /might/ have known what they were doing and actually written access functions in hand-coded assembly.

    [Access to the variable was done via InterlockedXxx. -Raymond]
  3. asdbsd says:

    Still, there are different compilers and nobody can guarantee that Microsoft doesn't change their default alignment one day… or that someone for whatever reason doesn't set it for the whole project. Better safe than sorry.

  4. Dan Bugglin says:

    @W And similarly, when you don't rely on it, it's best to rely on the compiler to optimize instead of potentially creating a deoptimization yourself.

  5. Alex Grigoriev says:

    On the other hand, I used align(64) to align a whole part of my structure to a cache line, to improve per-processor cache locality. In this case, the whole structure also has to be declared with at least align(64).

  6. Krishty says:

    It's a pity __declspec(align(#)) accepts literals only.

    __declspec(align(sizeof(DWORD))) would have been helpful here, and it would also simplify alignment in template code. But now we'll have to wait for C++0x's alignas().

  7. Agile says:

    And that's why you shouldn't write comments in code. At all. Hopefully they are at least valid when the code is written, but later on, when the code is changed, comments are usually not updated.

  8. ms blog 0.001 gamma says:

    The blog software used here is SLOW and BUGGY.

  9. Adam Rosenfield says:

    I prefer macros that allow you to write platform-neutral code for alignment, e.g.:

    #if defined(_MSC_VER)
    #define ALIGNED(n) __declspec(align(n))
    #elif defined(__GNUC__)
    #define ALIGNED(n) __attribute__((aligned(n)))
    #elif defined(SOME_OTHER_COMPILER)
    // etc.
    #endif

    LONG ALIGNED(32) m_lSomething;

  10. Mike Dimmick says:

    @asdbsd: no compiler vendor would do that. x86 is the weirdo: other processor families will not even load a value from a misaligned address, you get an alignment fault. On x86 (and x64) it will *work*, but it will take much longer as the processor tries it, faults internally, and does the fixup. It may turn out to be implemented in microcode ROM rather than in native micro-ops, which is a lot slower.

    On IA-64, Windows can automatically handle your alignment faults if you opt in, but handling the exception is really slow.

    If you really want it to misalign data, you can use #pragma pack (or /Zp) to pack in the data tighter. If you're that obsessed with the sizes, though, you'll reorganize your structures to put the bytes next to each other, the shorts, and the longs (and the floating-point values), so they can all nicely align without the compiler adding extra padding bytes anyway. Compilers for processors that don't handle misalignment will generate code that loads each part into a separate register and does the shifting and ORing to load it properly, and likewise for stores, but of course this cannot be atomic. It usually wastes some registers too, and bloats the code.

    Porting code from the desktop, or from old DOS handhelds, to Windows CE can often be fun if the structures were misaligned. Choose between compatible structures, if you need to load something from a common file format, and pay the penalty of the bloated code, or remove the packing, but make sure each use of each structure is under the same packing value, or you get very weird results when different bits of code think the member you're trying to access is at different offsets.

  12. peterchen says:

    @Agile: Because "Huh? WTF?" is better than "Oh, there's a difference between code and comment, let me check"?

  13. Ben Hutchings says:

    @Mike Dimmick: Misalignment is cheap on x86, in fact it may even be free if the access doesn't cross a cache line boundary. Some of the RISC architectures made misalignment very expensive originally but generally changed that later.

  14. Gabest says:

    Trying to align class members is pointless anyway, if new isn't overridden and returns a base address with lower granularity. Thinking of __m128 members, for example.

  15. Worf says:

    x86 is one of the few architectures where misalignment is allowed. Most others do one of two things – they access aligned (e.g., a 32-bit access zeros the least significant 2 bits), or cause an interrupt. Either way, one gets you odd results, the other a crash.

  16. GWO says:

    There are some alignment issues between some Windows compilers. If you build a DLL with GCC/G++/MinGW (and turn on extremely aggressive optimisation and SSE-vectorisation) and dynamically load it from a VB application, you can sometimes get misaligned stack errors; i.e., the optimisation on the DLL assumes that the stack is 16-byte aligned, suitable for SSE instructions. Under Windows, this won't always be the case. There is a compiler flag to drop this assumption.

  17. Neil says:

    ARM presumably saved on transistors by making its int and byte reads follow almost the same code i.e. read aligned int, rotate, transfer int/byte to register, but it also meant that a short-aligned short read could be emulated with a (possibly unaligned) int read and a mask. (I understand that current processors now include short reads and writes.)

    VC++7.1 doesn't check stack alignment either so for instance libpixman will crash on an unaligned SSE access if you compile using standard debug settings.

  18. Joseph Koss says:

    I can confirm (I do a lot of x86 assembly language programming) that on 32-bit x86, misaligned dword reads are normally as efficient as aligned dword reads.

    As someone else mentioned, one of the cases where this isn't true is when the read straddles a cache line, but not mentioned is that this penalty may still be better than the alternative (increasing your structure size reduces the density of useful data in the cache). This is also something that compilers just don't get "right" unless "right" happens to be align-everything (because that's what most x86 compilers do by default).

    Normally the align-everything rule is optimal. I personally use explicit alignment quite a bit (and align everything!), but that's normally because I drop down to assembler often enough that I find myself frequently having the structure defined twice: once in the HLL and once again as a masm struct. Any time you are doing mixed-language stuff you should be explicit.

Comments are closed.