If you need anything other than natural alignment, you have to ask for it


If you need variables to be aligned a particular way, you need to ask for it.

Let's say I have the following code:

void fn() 
{ 
 int a; 
 char b; 
 long c; 
 char d[10];
} 

What would the alignment of the starting addresses of a, b, c and d be?

What would the alignment be if the memory were allocated on heap?

If this alignment varies for different data types within the same translation unit, is there a way to force uniform alignment for all types?

If you need a particular alignment, you have to ask for it. By default, all you can count on is that variables are aligned according to their natural requirements.

First, of course, there is no guarantee that local variables even reside on the stack. The optimizer may very well decide that particular local variables can reside in registers, in which case they have no alignment at all!
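To see what "natural requirements" means for the types in the question, you can ask the compiler directly. Here is a minimal sketch, assuming a C++11 compiler. The constant names and the is_aligned helper are made up for illustration, and the numbers for int and long depend on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Natural alignment of each type from the question, as reported by the
// compiler. The values for int and long are implementation-defined.
constexpr std::size_t align_of_int    = alignof(int);
constexpr std::size_t align_of_char   = alignof(char);
constexpr std::size_t align_of_long   = alignof(long);
constexpr std::size_t align_of_char10 = alignof(char[10]);

static_assert(align_of_char == 1, "char is always byte-aligned");
static_assert(align_of_char10 == 1, "an array takes the alignment of its element type");

// Hypothetical helper: true if the address p is a multiple of alignment.
// The alignment must be a nonzero power of two.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Calling is_aligned(&a, alignof(int)) on a local variable should always come back true, but that is all you get: nothing promises that a char lands on a four-byte boundary.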

There are a few ways to force a particular alignment. The one that fits the C language standard is to use a union:

union char_with_int_alignment {
 char ch;
 int Alignment;
} u;

Given this union, you can say u.ch to obtain a character whose alignment is suitable for an integer.
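For comparison, here is a sketch of the union next to the alignas specifier that C++11 added for the same purpose. It assumes a C++11 compiler; the names char_aligned_as_int and check_union_alignment are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

union char_with_int_alignment {
    char ch;
    int Alignment;   // never read; present only to raise the alignment
};

// Since C++11 the same request can be written directly.
struct alignas(int) char_aligned_as_int {
    char ch;
};

static_assert(alignof(char_with_int_alignment) >= alignof(int),
              "the union is at least as aligned as its int member");
static_assert(alignof(char_aligned_as_int) == alignof(int),
              "alignas(int) gives the struct the alignment of int");

// Runtime check that the char inside each wrapper sits on an int boundary.
inline void check_union_alignment()
{
    char_with_int_alignment u;
    u.ch = 'x';
    assert(reinterpret_cast<std::uintptr_t>(&u.ch) % alignof(int) == 0);

    char_aligned_as_int v;
    v.ch = 'y';
    assert(reinterpret_cast<std::uintptr_t>(&v.ch) % alignof(int) == 0);
}
```

Both forms can only raise the alignment. Neither one can pack things tighter than the natural layout.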

The Visual C++ compiler supports a declaration specifier to override the default alignment of a variable.

typedef struct __declspec(align(16)) _M128 {
    unsigned __int64 Low;
    __int64 High;
} M128, *PM128;

This structure consists of two eight-byte members. Without the __declspec(align(#)) directive, the alignment of this structure would be 8-byte, since that is the alignment of the members with the most restrictive alignment. (Both unsigned __int64 and __int64 are naturally 8-byte-aligned.) But with the directive, the alignment is expanded to 16 bytes, which is more restrictive than what the structure normally would be. This particular structure is declared with more restrictive alignment because it is intended to be used to hold 128-bit values that will be used by the 128-bit XMM registers.
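If you are not using Visual C++, the standard alignas specifier expresses the same request. The following is only a sketch under that assumption: M128Portable, M128Natural and check_m128_alignment are made-up names, and the fixed-width types stand in for unsigned __int64 and __int64.

```cpp
#include <cassert>
#include <cstdint>

// C++11 spelling of a 16-byte-aligned pair of 64-bit values.
struct alignas(16) M128Portable {
    std::uint64_t Low;
    std::int64_t  High;
};

// The same members without the alignment request, for comparison.
struct M128Natural {
    std::uint64_t Low;
    std::int64_t  High;
};

static_assert(sizeof(M128Portable) == 16, "two eight-byte members");
static_assert(alignof(M128Portable) == 16, "alignment raised by alignas");
static_assert(alignof(M128Natural) <= 8, "natural alignment comes from the members");

// Runtime check: a local variable of the over-aligned type honors the request.
inline void check_m128_alignment()
{
    M128Portable value = {1, 2};
    assert(reinterpret_cast<std::uintptr_t>(&value) % 16 == 0);
    assert(value.Low == 1 && value.High == 2);
}
```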

A third way to force alignment with the Visual C++ compiler is to use the #pragma pack(#) directive. (There is also a "push" variation of this pragma which remembers the previous ambient alignment, which can be restored by a "pop" directive. And the /Zp# option allows you to specify this packing from the compiler command line.) This directive specifies that members can be placed at alignments suitable for #-byte objects rather than their natural alignment requirements, if the natural alignment is more restrictive. For example, if you set the pack alignment to 2, then all objects that are bigger than two bytes will be aligned as if they were two-byte objects. This can cause 32-bit values and 64-bit values to become mis-aligned; it is assumed that you know what you're doing and can compensate accordingly.
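Here is a small sketch of the pack-alignment-of-2 case, using the push and pop variations. It assumes a compiler that accepts #pragma pack(push, n) (Visual C++, GCC and Clang all do) and a target where a 32-bit integer is naturally four-byte aligned. The structure and function names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 2)      // remember the ambient packing, then set it to 2
struct packed_to_2 {
    std::uint8_t  b;       // offset 0
    std::uint32_t dw;      // offset 2: placed as if it were a two-byte object
};
#pragma pack(pop)          // restore the ambient packing

struct natural_layout {
    std::uint8_t  b;       // offset 0
    std::uint32_t dw;      // offset 4 when uint32_t is four-byte aligned
};

// Runtime check of the layouts described in the comments above.
inline void check_pack_2_layout()
{
    assert(offsetof(packed_to_2, dw) == 2);
    assert(sizeof(packed_to_2) == 6);
    assert(offsetof(natural_layout, dw) == alignof(std::uint32_t));
    assert(sizeof(natural_layout) > sizeof(packed_to_2));
}
```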

For example, consider this structure whose natural alignment has been altered:

#pragma pack(1)
struct misaligned_members {
 WORD w;
 DWORD dw;
 BYTE b;
};

Given this structure, you cannot pass the address of the dw member to a function that expects a pointer to a DWORD, since the ground rules for programming specify that all pointers must be aligned unless unaligned pointers are explicitly permitted.

void ExpectsAlignedPointer(DWORD *pdw);
void UnalignedPointerOkay(UNALIGNED DWORD *pdw);

misaligned_members s;
ExpectsAlignedPointer(&s.dw); // wrong
UnalignedPointerOkay(&s.dw);  // okay
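UNALIGNED comes from the Windows headers. Where it is not available, one common alternative (not the technique shown above) is to copy the bytes with memcpy, so that no possibly-misaligned DWORD pointer is ever formed. This is a sketch under that assumption: the fixed-width types stand in for WORD, DWORD and BYTE, and load_dw, store_dw and demo_round_trip are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct misaligned_members {
    std::uint16_t w;    // stands in for WORD
    std::uint32_t dw;   // stands in for DWORD, at offset 2
    std::uint8_t  b;    // stands in for BYTE
};
#pragma pack(pop)

// Read the dw member by copying its bytes into an aligned local.
inline std::uint32_t load_dw(const misaligned_members& s)
{
    std::uint32_t value;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&s);
    std::memcpy(&value, bytes + offsetof(misaligned_members, dw), sizeof(value));
    return value;
}

// Write the dw member by copying bytes out of an aligned local.
inline void store_dw(misaligned_members& s, std::uint32_t value)
{
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&s);
    std::memcpy(bytes + offsetof(misaligned_members, dw), &value, sizeof(value));
}

inline void demo_round_trip()
{
    misaligned_members s = {};
    store_dw(s, 0x12345678u);
    assert(load_dw(s) == 0x12345678u);
}
```

Accessing s.dw by name is also fine, because the compiler knows the member is packed and generates suitable code. The trouble starts only when the address escapes as a plain pointer.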

What about the member w? Is it aligned or not? Well, it depends.

If you allocate a single structure on the heap, then the w member is aligned, since heap allocations are always aligned in a manner suitable for any fundamental data type. (I vaguely recall some possible weirdness with 10-byte floating point values, but that's not relevant to the topic at hand.)

misaligned_members *p = (misaligned_members)
    HeapAllocate(hheap, 0, sizeof(misaligned_members));

Given this code fragment, the member p->w is aligned since the entire structure is suitably aligned, and therefore so too is w. If you allocated an array, however, things are different.

misaligned_members *p = (misaligned_members)
    HeapAllocate(hheap, 0, 2*sizeof(misaligned_members));

In this code fragment, p[1].w is not aligned because the entire misaligned_members structure is 2+4+1=7 bytes in size since the packing is set to 1. Therefore, the second structure begins at an unaligned offset relative to the start of the array.
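The arithmetic can be checked at compile time. The following sketch assumes a compiler that accepts #pragma pack(push, 1) and a target where a 16-bit integer is naturally two-byte aligned. offset_of_w and demo_array_stride are made-up helpers, and the fixed-width types stand in for WORD, DWORD and BYTE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct misaligned_members {
    std::uint16_t w;
    std::uint32_t dw;
    std::uint8_t  b;
};
#pragma pack(pop)

static_assert(sizeof(misaligned_members) == 7, "2 + 4 + 1 with no padding");

// Byte offset of element index's w member from the start of the array.
constexpr std::size_t offset_of_w(std::size_t index)
{
    return index * sizeof(misaligned_members) + offsetof(misaligned_members, w);
}

static_assert(offset_of_w(0) % alignof(std::uint16_t) == 0,
              "p[0].w is aligned whenever the allocation is");
static_assert(offset_of_w(1) == 7, "p[1] starts seven bytes in");
static_assert(offset_of_w(1) % alignof(std::uint16_t) != 0,
              "so p[1].w sits at an odd offset");

// The same fact observed at run time on a two-element array.
inline void demo_array_stride()
{
    misaligned_members pair[2] = {};
    const unsigned char* first  = reinterpret_cast<const unsigned char*>(&pair[0]);
    const unsigned char* second = reinterpret_cast<const unsigned char*>(&pair[1]);
    assert(second - first == 7);
}
```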

One final issue is the expectations for alignment when using header files provided by an outside component. If you are writing a header file that will be consumed by others, and you require special alignment, you need to say so explicitly in your header file, because you don't control the code that will be including your header file. Furthermore, if your header file changes any compiler settings, you need to restore them before your header file is complete. If you don't follow this rule, then you create the situation where a program stops working merely because it changes the order in which it includes seemingly-unrelated header files.

// this code works
#include <foo.h>
#include <bar.h>

// this code doesn't
#include <bar.h>
#include <foo.h>

The problem was that bar.h changed the default structure alignment and failed to return it to the original value before it was over. As a result, in the second case, the structure alignment for the foo.h header file got "infected" and no longer matched the structure alignment used by the foo library.

You can imagine an analogous scenario where deleting a header file can cause a program to stop working.

Therefore, if you're writing a header file that will be used by others, and you require nonstandard alignment for your structures, you should use this pattern to change the default alignment:

#include <pshpack1.h> // change alignment to 1
... stuff that assumes byte packing ...
#include <poppack.h>  // return to original alignment

In this way, you "leave things the way you found them" and avoid the mysterious infection scenarios described above.
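As far as I know, on compilers that support it these two headers come down to a #pragma pack(push, 1) and a #pragma pack(pop). Here is a sketch of the same pattern written out by hand, assuming a compiler that accepts those pragmas and a target where a 32-bit integer is more than byte-aligned. The header name wire_format.h and all the structure and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// --- contents of a hypothetical well-behaved header, wire_format.h ---
#pragma pack(push, 1)        // save the includer's packing, then switch to 1
struct wire_record {
    std::uint8_t  tag;       // offset 0
    std::uint32_t length;    // offset 1, no padding
};
#pragma pack(pop)            // put the includer's packing back
// --- end of wire_format.h ---

// Code that comes after the header sees the packing it had before.
struct declared_afterward {
    std::uint8_t  tag;
    std::uint32_t length;
};

// Runtime check that the packing did not leak out of the header.
inline void check_no_infection()
{
    assert(sizeof(wire_record) == 5);
    assert(alignof(declared_afterward) == alignof(std::uint32_t));
    assert(sizeof(declared_afterward) > sizeof(wire_record));
}
```

If the pop were missing, declared_afterward would silently become five bytes too, which is exactly the infection described above.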

Comments (27)
  1. nathan_works says:

    Help me out, since I’m not creative enough to envision non-academic scenarios where such things are needed.. About all I can think of was back to Database class where we mangled a disk simulator to write B-tree database structures to the fake disk, and read them back into arbitrary data records. In that scenario, no data was ever aligned, and it was pretty obnoxious. Outside of that, I’ve never dealt with data alignment or needing to have it aligned in any commercial software I’ve worked on. So why would someone want or need to know this ?

  2. Csaboka says:

    Well, x86 CPUs are quite forgiving about memory alignment, so you can get away with ignoring it altogether. (The code will run more slowly when accessing incorrectly aligned data, but it will still be correct.) Other CPUs can raise an exception or do other Bad Things when you try to access misaligned data, and then correct alignment becomes vital.

    In other words, if you ignore alignment issues, you can write code that seems to be perfect ANSI C, and runs on PCs, but dies spectacularly when running it on something else…

  3. njkayaker says:

    I have a binary file format that I needed to support (years ago) across different versions of Windows (16bit and 32bit). Since the default alignments are different for 16bit and 32bit programs, I was able to force the alignment to be the same in all environments.

    Basically, my case was being able to pass structured data between different computers.

    I think the need for this is fairly rare.

  4. Anonymous says:

    As njkayaker says, anything involving structured data might need specific alignment, if only for speed.  This is especially important for using network protocols efficiently.  The Win32 API is built on structures, some of which have unnatural alignment.

    3D code and high-speed stuff that depends on the cache will require careful alignment.  In managed languages it matters less, but if you’re writing in C++, you should always know at least roughly what your memory looks like.

  5. B says:

    This may mark me as a newbie to this territory, but why not just stick with union any time you need alignment? It’s standard, and doesn’t have the many caveats that you attach to the other methods.

  6. Bobby says:

    Most major backup formats use packed data structures on tape or in backup files.

  7. Frank Schwab says:

    B –

    I almost never see structure packing used when a program runs on a platform, and is "self contained".  I see it quite a bit when two programs, running on different computers, are trying to communicate using a shared set of structures.  Anonymous above touched on the same point with Network structures.

    ‘C’ is very vague about structure packing.  For example, on the embedded RISC processor I’m using now, the compiler likes to align everything on 32 bit boundaries; so a structure like:

    struct
    {
        unsigned char a;
        unsigned char b;
    } c;

    takes 8 bytes of memory.  If I pass this structure across a communications port (say, TCP/IP across Ethernet, or even USB) to another computer, the second computer may not correctly read the data.  If, for example, the compiler on the second computer aligns the two elements on byte boundaries, the structure would only take up 2 bytes on the second computer, and the likelihood of successfully transferring information between the two is low.

  8. Steve Nuchia says:

    I’ve done intense systems-level programming in C and C++ for over two decades.  In that time I’ve found about a dozen compiler bugs, used goto in production code perhaps three or four times, and had to step outside the portable parts of the languages to specify data alignment twice.

    On the other hand, I’ve had to know what the compiler was doing about alignment pretty much every day.  And just because I was able to use the portable portions of the language to get the alignment I needed doesn’t mean the code was portable — ifdefs are frequently needed.

    Back in the day the Unix programming community was infected by the "all the world is a VAX" disease.  Programmers who had the luxury of working only on the VAX believed ints were 32 bits and if p is null then *p is zero, to name just two of the symptoms of this disease.  Today it is the PC that forms the basis for what most programmers think they know about portability.

    This is another one of those posts from Mr. Chen that is incredibly valuable if you happen to need it but completely useless otherwise.  If you think it is useless that’s just because the scope of your work has not yet included the need for it.  Think and learn.

    Word to the wise: in the brave new multicore world alignment and allocation issues will become one of the two or three determining factors of program performance.  This is really very different from anything most programmers have experienced before, and to fix the problems post-hoc will be very hard.

  9. vijairaj says:

    When it comes to packing, I would prefer gcc’s way of doing it – struct {…} __attribute__ ((packed)). This method allows you to specify the alignment requirement as an attribute of the structure definition itself, rather than as a separate #pragma pack push and pop. I have seen code in many places where a header file has #pragma pack(push) at the top and a pop at the end. When an unaware user copies one of these packed structures, the results are scary.

  10. benkaras says:

    Vijairaj: Read Raymond’s post again.  Look for the paragraph where he shows __declspec(align(16)).

    B: Unions cannot pack things tighter than they otherwise would be

    Frank hit it on the head.  If you have shared data, you often must be explicit about the alignment.  It can get really hairy when you have a 32bit program writing data that will be read by a 64bit program.  That’s when alignment matters most.

    It is also rather important to pack persisted data since smaller reads make for faster reads.  

    Finally, in Windows shell namespace extensions, it is a common practice to pack IDLists since they take up memory and are sometimes persisted.

  11. Anonymous says:

    Forcing the alignment is not enough for reading/writing a binary file format (or passing data to/from another computer). You also have to worry about endianness (for instance, TCP/IP is big-endian, while x86 is little-endian).

  12. rbirkby says:

    So as you’re a shell guy, whose bright idea was it to specify SHCOLUMNINFO as 1-byte alignment? Was the expectation that there would be so many columns in a details view that the saving would be worthwhile? Really? Even when Explorer was first written for Cairo back in 1991?

  13. Igor Levicki says:

    What pisses me off is the inability to get aligned memory back from new[] operator.

    Say you have a class:

    class Foo
    {
    public:
        __m128 vec;

        Foo(void)
        {
            vec = _mm_setzero_ps();
        }
    };

    When you do this:

    Foo *p = new Foo[5];

    It will crash because the MOVAPS in the constructor expects 16-byte aligned memory.

    To fix the problem, you have to overload new[] and delete[] and use _mm_malloc() and _mm_free() even though compiler could align the memory for you — I believe it has enough information about the Foo object size and alignment requirements at compile time.

  14. njkayaker says:

    "This may mark me as a newbie to this territory, but why not just stick with union any time you need alignment? It’s standard, and doesn’t have the many caveats that you attach to the other methods."

    Unions allow different types to -share- the same memory by forcing an alignment that works for all the types. (Note that heap memory is typically aligned for all types: It’s union for all.)

    The "packing" stuff mucks with the gaps between different memory blocks in a structure (it has nothing to do with sharing).

  15. scorpion007 says:

    Shouldn’t you be casting to (misaligned_members *) instead of (misaligned_members)?

    And isn’t the function HeapAlloc(), not HeapAllocate()?

    But other than that, thanks for the fantastic information :)

    [There may very well be errors. I assume you’re smart enough to be able to fix them on your own. -Raymond]
  16. KJK::Hyperion says:

    Automatic (stack) allocation doesn’t guarantee the correct alignment, only "good enough" alignment, i.e., given "type stack_variable;", "(INT_PTR)&stack_variable % TYPE_ALIGNMENT(type)" might not be zero. This is an issue if half your code is in user mode and the other half in kernel mode, and you need to call ProbeForRead/ProbeForWrite on one such object (they will raise STATUS_DATATYPE_MISALIGNMENT). Setting packing to a larger value won’t help. The safest solution is to declare "type" as "DECLSPEC_ALIGN(MEMORY_ALLOCATION_ALIGNMENT) type"

  17. Gabest says:

    I ran into a compiler bug around the time when openmp was new in vc, it didn’t align declspec’ed stack variables correctly.

  18. Mike Dimmick says:

    Pragma pack was far more common on DOS-based handheld computers, because packing could actually make the difference between the customer’s data fitting into RAM and not fitting. This counted for both running program memory and ‘disk’ storage, for these devices didn’t have fixed disks, only RAM disks of up to 4MB, but RAM was very expensive on these units!

    On Windows CE on ARM processors you want to avoid it like the plague because it massively bloats your code, as the processor itself will generate misalignment faults, so the code has to include every possible pattern for misalignment.

    In a very few cases when porting code from the old devices to the new, we want the on-‘disk’ structures to be the same, and we’ll leave the packing pragma in. Otherwise, it comes out.

  19. Stewart says:

    B, the other problem with only ever using union is that it only provides a way to increase the packing – you can’t reduce it, so say you have the following struct

    struct { short a; short b; int c; } a;

    By default on WIN32 using VC++ a and b will be packed into four bytes (IIRC) and c will follow so sizeof(a) will be 8 bytes. If you wanted a to occupy four bytes, this could be done with a union (this is most often done when you have a one element array at the end of a structure which is really of variable length).

    Unfortunately, if you have this

    struct { short a; int b; } a;

    a will be two bytes followed by two bytes of padding to make b dword aligned (again on win32 with VC++) – there is no way that I know of with a union to tell the compiler that actually you want the size of this structure to be six bytes.

  20. Allan says:

    Are there cases where misalignment of data on Windows/x86 causes slower execution?

    Most docs I searched were either for RISC architecture, or were somewhat imprecise with the concept.

    For example, the following link:

    http://msdn2.microsoft.com/en-us/library/aa984851(vs.71).aspx

    Speaks about performance issue on 386 and RISC processors, but nothing on 486 or pentiums. Are these last processors immune to the speed degradation induced by misaligned data?

    Thanks!

    [You need to improve your search engine skills. I searched for site:intel.com misaligned and found lots of information, including this page which says that a misaligned access causes a six- to twelve-cycle stall. -Raymond]
  21. Igor Levicki says:

    "Are there cases where misalignement of data on Windows/x86 causes slower execution?"

    There are also cases where it causes NO execution. For SIMD data types if you access unaligned memory with MOVAPS/MOVDQA (instead of MOVUPS/MOVDQU) you get a crash.

    MOVUPS/MOVDQU in combination with unaligned memory can result in 50% slower execution time compared to properly aligned data and MOVAPS/MOVDQA.

    Always align your data in memory, especially for SIMD data types!

    Also, if you have two threads accessing variables which reside next to each other in memory, make sure you align those variables to the cache line boundary (i.e. so that they end up in different cache lines) in order to prevent false sharing, which carries a significant performance penalty.

  22. Igor Levicki says:

    Here — Intel® 64 and IA-32 Architectures Optimization Reference Manual:

    http://developer.intel.com/design/processor/manuals/248966.pdf

    You may find that x86 is not as forgiving when it comes to alignment after all, it is just that some of you have chosen to ignore it.

  23. Allan says:

    > You need to improve your search engine skills.

    Thanks. I’m reading the page right now. At first glance, I feel like, ahem, I’m reading Japanese…

    :-p

    Anyway, this is the way we learn, so…

    :-)

    > You may find that x86 is not as forgiving when it comes to alignment after all, it is just that some of you have chosen to ignore it.

    Thanks for the link to the PDF.

    I will forward the info to the people who *did* make misalignment-related decisions years before I joined the team.

    ;-p

  24. Miral says:

    I know the feeling.  I work with a large codebase that’s pretty much entirely been #pragma pack(1)’d — largely because it’s been inherited from way back in the DOS ages and it has a lot of structures representing on-disk record layouts.  I’ve been itching to get rid of it (or at least reduce the scope a bit) for a while now but haven’t been able to get approval.

    Interestingly it all still runs just fine even with the misalignments — and this is for a soft-real-time control system, so performance is important…

  25. Ian says:

    I got badly bitten by alignment issues on an embedded platform. The X86 version ran without problems, but on the PPC version I was getting the occasional wrong value for a floating point number.

    It turned out to be a bug on the logic board. Reading a misaligned float on that particular processor was supposed to raise an exception, but the exception was being lost and the program was silently continuing with zero instead of the proper value.

    Discovering it was an alignment issue took a bit of psychic debugging to start with. Fixing it was the next problem.

    I couldn’t change the alignment so I had to write a macro to copy the misaligned data (as a bit-pattern) into an aligned buffer and then read it from there as a float.

    Finding every instance where the program read a float that could possibly be misaligned was an experience I would prefer not to have to repeat. It is not out of the question that there are one or two instances left in there, and I might never know.

  26. Mike Dimmick says:

    There are still a few pages on MSDN which refer to ‘386’ as a general term for 32-bit x86 processors. For 386, generally, read ‘386 and later’ unless specifically noted otherwise.

    Another thing you need to watch out for is misaligning anything you plan to use with interlocked instructions. If you do, x86 processors will still perform the requested operation (add, exchange, compare) BUT it won’t be properly interlocked – it’ll do two read/write cycles to the two aligned locations with the LOCK line asserted, but the intra-CPU bus doesn’t have any way to associate the two operations as being atomic.

    I’m not sure but I think Raymond’s example link is actually talking about how the Intel Fortran compiler is using SSE instructions (that would fit with the 16-byte = 128-bit alignment), although it could be related to cache lines (the processor *actually* asks for data from main memory in cache-line chunks, to take advantage of the ‘burst mode’ sequential output of all RAM modules since EDO, unless performing interlocked accesses or other non-combining reads or writes). I’m not sure how big current cache lines are, I think it’s 64 bytes.

    Speaking of cache lines, here’s something interesting I found relating to Core 2: apparently regular unaligned reads are not too bad, but you get a perf hit if they cross cache line boundaries, and an even bigger one crossing page boundaries. Source: http://forums.xkcd.com/viewtopic.php?f=11&t=15337. Because cache line boundaries and page boundaries are (large) multiples of 8 bytes, if your structures are aligned you’ll be OK, or at least you won’t hit this particular problem.

    A lot of programmers have a mental model of the computer that is twenty years out of date. To understand the modern machine, I suggest watching Herb Sutter’s presentation to the Northwest C++ Users’ Group, "Machine Architecture: Things Your Programming Language Never Told You" at http://www.nwcpp.org/Meetings/2007/09.html.

  27. anonymous coward says:

    Raymond,

    you state:

    "There are a few ways to force a particular alignment. The one that fits the C language standard is to use a union:

    union char_with_int_alignment {
     char ch;
     int Alignment;
    } u;

    Given this union, you can say u.ch to obtain a character whose alignment is suitable for an integer."

    The C language standard does not state (or even know) anything about alignment. In fact, the way a union is described, a compiler is even free to treat a "union" like a "struct", and still be a C compiler.

    Another option for a compiler might be to put the ch at ((char*)&Alignment)[1] or so, and still be according to the C standard. (No, I am not arguing if this would make sense in most cases, but the compiler is indeed free to do so).

    The fine print: There is NO way to do this totally portable. Of course, experience tells us that on most compilers, this will indeed work.
