Alignment

I've seen this come up in conversations a few times so I thought others might be interested in it too.

First off, I think that because the large majority of us use x86-based processors, we've forgotten that one day in our CS or EE class when they talked about alignment. So technically this should just be a refresher course. The reason it needs refreshing is that x86 is letting its market dominance slip, and more and more of us want our apps to run on other processors. Also remember that most of this doesn't apply to managed code, but some of it does, so don't stop reading just because you use C# and/or VB and never use pointers.

From here on out I'm going to use ARM-based chips as my example just because I know the most about them.

What is Alignment?

The easiest way to think of it is that all of your computer's memory is just one big array of a given type (starting at address 0), so every time you read or write memory, you're reading or writing an element of that array. For 32-bit integers, this means the low-order 2 bits of a properly aligned address are always zero, so you never need them. For 64-bit integers, the low-order 3 bits are always zero. The reason for this lies all the way down at the CPU level.
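
To make the "low-order bits are zero" idea concrete, here is a minimal C sketch (C11, assuming alignments are powers of two, as they are on mainstream platforms). `is_aligned` is a hypothetical helper name, not a standard function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p is a multiple of align (align must be a power of two).
   For align == 4 this tests that the low-order 2 bits of the address are zero;
   for align == 8, the low-order 3 bits. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```

For example, `is_aligned(p, 4)` checks exactly the condition a 32-bit load needs.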

Many (maybe even all) CPUs offer instructions to load different-sized values from a memory address into a register. On ARM-based chips these are LDR (load word, 4 bytes; a DWORD in Windows terms), LDRH (load halfword, 2 bytes), and LDRB (load byte). The ARM CPU further requires that the address used for a load be properly aligned for the data being loaded. For DWORDs, the address must be 4-byte aligned (the low-order 2 bits must be 0); for WORDs, it must be 2-byte aligned (the low-order bit must be 0); and so on. If the address is not properly aligned, the CPU raises an exception (similar to how floating-point exceptions or divide-by-zero occur). My best guess as to why is that they want to make sure all 4 bytes of a DWORD are in the same page, or it has something to do with the memory bus. Consider loading a DWORD from 0xFFFFFFFE: byte 0 is at 0xFFFFFFFE, byte 1 is at 0xFFFFFFFF, byte 2 is at 0x00000000, and byte 3 is at 0x00000001!
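
That wraparound example generalizes: a misaligned access can span two pages (or two bus-width chunks), while an aligned one can't, as long as the page size is a multiple of the access size. The sketch below checks whether an access straddles such a boundary, assuming a 32-bit address space and a hypothetical 4 KB page; `straddles` is a hypothetical name. It illustrates the guess above rather than proving why ARM made the choice:

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if a `size`-byte access starting at `addr` touches two
   different `boundary`-sized blocks (for example, two pages). uint32_t
   arithmetic wraps modulo 2^32, modelling a 32-bit address space. */
int straddles(uint32_t addr, uint32_t size, uint32_t boundary)
{
    assert(size >= 1 && boundary >= 1);
    uint32_t last = addr + (size - 1); /* may wrap past 0xFFFFFFFF */
    return (addr / boundary) != (last / boundary);
}
```

`straddles(0xFFFFFFFEu, 4, 4096)` is true because the last byte wraps around to 0x00000001, while any 4-byte-aligned address gives false for a 4096-byte boundary.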

Why is Alignment Important?

The short answer is that if you violate alignment, your application will get an exception if you run it on just about any CPU other than x86. This includes Pocket PCs, a ton of Windows CE devices, IA64-based machines, etc. Now for the interesting part: even on x86, alignment matters. Several MMX, SSE, and SSE2 instructions perform significantly better if the memory addresses involved are properly aligned, and some of them, such as MOVAPS, will even raise an exception if their memory operand isn't 16-byte aligned!
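
Here is a minimal sketch of choosing between SSE's aligned and unaligned loads, assuming an x86 compiler that exposes the `<xmmintrin.h>` intrinsics (with a plain scalar fallback elsewhere) and a C11 library that provides `aligned_alloc` (MSVC's CRT uses `_aligned_malloc` instead). `sum4` and `alloc_floats16` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE__) || defined(_M_X64)
#define HAVE_SSE 1
#include <xmmintrin.h>
#endif

/* Adds up four consecutive floats. On SSE builds, takes the aligned-load path
   only when p is 16-byte aligned, because the aligned load (MOVAPS) faults
   on a misaligned address. */
float sum4(const float *p)
{
#ifdef HAVE_SSE
    __m128 v = (((uintptr_t)p & 15u) == 0)
                   ? _mm_load_ps(p)    /* aligned load */
                   : _mm_loadu_ps(p);  /* unaligned load: safe, possibly slower */
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    return p[0] + p[1] + p[2] + p[3];
#endif
}

/* Allocates n floats starting on a 16-byte boundary. The byte count is rounded
   up to a multiple of the alignment, as C11 aligned_alloc expects.
   Release the result with free(). */
float *alloc_floats16(size_t n)
{
    size_t bytes = (n * sizeof(float) + 15u) & ~(size_t)15u;
    float *p = aligned_alloc(16, bytes);
    assert(p == NULL || ((uintptr_t)p & 15u) == 0);
    return p;
}
```

In real code you'd normally guarantee alignment up front (as `alloc_floats16` does) rather than test every pointer; the runtime check is only there to show both paths.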

So how do you 'align' everything?

The easiest way is to just not fool with the compiler/OS. The compiler automatically lays out locals and parameters at properly aligned addresses, and the same goes for malloc, realloc, new, etc. The problem arises when you allocate one thing but then treat it like something else. The simple case is that you allocate an array of 2 shorts (16-bit integers) and then cast it to a pointer to an int (32-bit integer). Here you are playing Russian roulette with a 50% chance of getting an alignment fault, because any even address is properly aligned for shorts, but only half of those addresses are properly aligned for ints. Some memory allocators try to help by always aligning everything to the maximum alignment, but that wastes memory needlessly.
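
If you really must reinterpret the two shorts as an int, one portable option (not from the original post, but standard C) is to copy the bytes with memcpy instead of casting the pointer. memcpy has no alignment requirement, and it also sidesteps C's strict-aliasing rules. A minimal sketch, assuming 16-bit short and 32-bit int (checked at compile time); `shorts_as_int` is a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/* Reinterprets the bytes of two shorts as one int without dereferencing a
   possibly misaligned int pointer. The risky version would be
   `return *(const int *)s;`, which can fault if s is only 2-byte aligned. */
int shorts_as_int(const short s[2])
{
    static_assert(sizeof(int) == 2 * sizeof(short),
                  "sketch assumes 16-bit short and 32-bit int");
    int value;
    memcpy(&value, s, sizeof value); /* compiler picks a safe access sequence */
    return value;
}
```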

If you do need to treat something as 2 different types, create a union so the compiler knows about it. The compiler will then always align it for the strictest alignment required by any of the union's members.
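
A minimal C11 sketch of that union approach; the union and function names are hypothetical, and the sketch assumes 16-bit short and 32-bit int:

```c
#include <assert.h>
#include <stdalign.h>

/* Declaring both views in a union tells the compiler the storage is used as
   both types, so it aligns the union for its strictest member (int). */
union shorts_or_int {
    short s[2];
    int   i;
};

static_assert(alignof(union shorts_or_int) >= alignof(int),
              "union is aligned for its strictest member");
static_assert(sizeof(int) == 2 * sizeof(short),
              "sketch assumes 16-bit short and 32-bit int");

/* Writes through one member and reads through the other. The byte order of
   the result depends on the CPU's endianness. */
int int_from_shorts(short first, short second)
{
    union shorts_or_int u;
    u.s[0] = first;
    u.s[1] = second;
    return u.i;
}
```

Reading a different member than the one last written is allowed in C (the bytes are reinterpreted); C++ is stricter about this, so there memcpy is the safer choice.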

The other big way to misalign something is to override the compiler's default layout mechanism. In C# this is done using LayoutKind.Explicit; in C++ it's done by anything that changes the default packing size (#pragma pack and related). When you do this, the compiler/OS will still properly align the start of the struct, but you are responsible for making sure that the interior members are properly aligned. Example:

struct MisAligned {
    byte someValue;
    int someOtherValue;
};

By default, someValue will be at offset 0 and someOtherValue will be at offset 4. Yes, that 'wastes' 3 bytes for each MisAligned struct, but that often isn't a big deal. If you change the packing size, say down to 1, someOtherValue will be at offset 1. Since the start of the struct is aligned, someOtherValue will then always be at an odd address and thus always cause an alignment fault on an alignment-sensitive CPU!
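
You can watch this happen with offsetof. A minimal sketch, using unsigned char in place of byte (C has no byte type) and the `#pragma pack(push, 1)` / `pop` form, which MSVC, GCC, and Clang all accept:

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: the compiler pads after someValue so that someOtherValue
   starts on an int-aligned offset (4 on typical 32- and 64-bit targets). */
struct Natural {
    unsigned char someValue;
    int           someOtherValue;
};

/* Packing size forced down to 1: no padding, so someOtherValue lands at
   offset 1 and is misaligned whenever the struct itself starts aligned. */
#pragma pack(push, 1)
struct Packed {
    unsigned char someValue;
    int           someOtherValue;
};
#pragma pack(pop)

static_assert(offsetof(struct Packed, someOtherValue) == 1,
              "pack(1) removes the padding");
```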

So the rule is: don't mess with how the compiler lays out fields in a struct. If you need to pack more data into less space, rearrange the fields so the compiler doesn't have to waste space on padding. The easiest way to do this is to always order your fields by size, from largest to smallest. For C++ this applies to classes and structs. For managed code (C# and VB) this really only applies to structs, which default to LayoutKind.Sequential. Classes default to LayoutKind.Auto, which means the runtime is free to reorder the fields within the type, and the runtime currently orders them to minimize padding.
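
A quick C illustration of the largest-to-smallest rule; the struct names are hypothetical, and the exact sizes depend on the target ABI:

```c
#include <assert.h>

/* Declared in arbitrary order: padding is needed after a (so d is aligned)
   and at the end after s (so arrays of the struct stay aligned). */
struct Unordered {
    char   a;
    double d;
    short  s;
};

/* Same fields, largest to smallest: the only padding left is at the end. */
struct Ordered {
    double d;
    short  s;
    char   a;
};

static_assert(sizeof(struct Ordered) <= sizeof(struct Unordered),
              "reordering never needs more padding here");
```

On a typical 64-bit target, Unordered is 24 bytes and Ordered is 16: the same three fields with 8 fewer bytes of padding.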

What if I have to have unaligned data?

Well, needless to say, there are situations where data is going to be unaligned. The most common case is reading data off disk or the network where it has been tightly packed. In C++ you simply declare the pointer using the UNALIGNED macro (as in int UNALIGNED * pui). Then, when the compiler generates code to dereference that pointer, it generates code that does the right thing and doesn't cause an alignment fault, even if the address is not properly aligned.
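
UNALIGNED is specific to Microsoft's compilers. A portable way to get the same effect, swapped in here rather than taken from the post, is to memcpy each misaligned field out of the raw buffer into an aligned variable. A minimal sketch for a hypothetical tightly packed 7-byte record (1-byte tag, 4-byte length at offset 1, 2-byte id at offset 5), assuming the data was written in the host's byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record, as it looks once unpacked into aligned fields. */
struct record {
    uint8_t  tag;
    uint32_t length;
    uint16_t id;
};

/* With MSVC one might instead write:
     uint32_t UNALIGNED *plen = (uint32_t UNALIGNED *)(buf + 1);
   Here memcpy copies the misaligned bytes into properly aligned fields.
   Returns 0 on success, -1 if the buffer is too short. */
int parse_record(const unsigned char *buf, size_t len, struct record *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 7)
        return -1;
    out->tag = buf[0];
    memcpy(&out->length, buf + 1, sizeof out->length); /* offset 1: misaligned */
    memcpy(&out->id, buf + 5, sizeof out->id);         /* offset 5: misaligned */
    return 0;
}
```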

At this point one might ask: why aren't all pointers marked as UNALIGNED? The short answer is that it's slower. Some CPUs help by having unaligned variants of all load and store instructions. These variants are generally slower than their aligned counterparts, but not too bad (if you want to find out how bad, find a Pocket PC and run some tests, but remember that this is entirely CPU dependent). Other CPUs force the compiler to generate a byte-by-byte load or store, possibly using up several registers in the process (sort of like a memcpy between a register and memory). I think this is the slowest, but again, it is entirely architecture dependent.

UNALIGNED is a macro so that your code is more portable: on alignment-sensitive architectures it is defined as __unaligned, and otherwise it is defined as nothing, so it is simply ignored.

MSIL took a different approach. Instead of decorating a pointer, it modifies the actual load or store opcode with the unaligned. opcode prefix. This prefix instructs the JIT/runtime to generate the proper code, just as the native compiler does for the UNALIGNED macro. Unfortunately, the C# language designers chose not to complicate the language to expose this. So if you want to load or store unaligned data in C#, you'll have to use something else (there was talk of adding helper routines to mscorlib.dll to do unaligned loads and stores), or manually do the load or store byte by byte.


Hope you enjoyed reading this, and possibly learned a few things.

--Grant