Importance of alignment even on x86 machines, part 2


The various Interlocked functions (InterlockedIncrement, and so on) require that the variable being updated be properly aligned, even on x86, a platform where the CPU silently fixes up unaligned memory accesses.

If you pass an unaligned pointer to one of the Interlocked functions, the operation will still succeed, but the result won't be atomic. Another processor may see a partially-completed update.

This is a particularly insidious bug since it happens only on multiprocessor machines under very tight timing conditions. You will be hard-pressed to reproduce this in the laboratory.

(A commenter stole my thunder and remarked on it yesterday.)

Moral of the story: Same as yesterday. Mind your alignment.
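
One way to mind your alignment is to check it before the pointer ever reaches an Interlocked function. The following is a minimal sketch in portable C rather than anything from the Win32 headers: the helper names is_aligned and ok_for_interlocked32 and the two record structures are hypothetical, invented here for illustration. It also shows how byte packing can quietly push a counter off its natural boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a Win32 API: nonzero when p lies on an
   `alignment`-byte boundary. `alignment` must be a power of two. */
static int is_aligned(const volatile void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical guard to call (or assert on) before handing a 32-bit
   counter to an Interlocked function: the address must be a multiple
   of the operand size. */
static int ok_for_interlocked32(const volatile int32_t *p)
{
    return is_aligned(p, sizeof *p);
}

/* With byte packing, the counter lands at offset 1, so it is misaligned
   whenever the structure itself starts on a 4-byte boundary. */
#pragma pack(push, 1)
struct packed_record {
    char    tag;
    int32_t counter;
};
#pragma pack(pop)

/* With default packing, the compiler pads after `tag` so that the
   counter sits on its natural boundary. */
struct natural_record {
    char    tag;
    int32_t counter;
};
```

In a debug build, something like assert(ok_for_interlocked32(&rec.counter)) placed just before the interlocked call would catch the packed layout on a single-processor development machine, long before the timing-dependent failure has a chance to show up.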

Comments (7)
  1. Dan Shechter says:

    Actually, this is not entirely true.

    Locked operations (such as Interlocked*) assert the LOCK# signal on the bus.

    According to Intel’s documentation, IA-32 System Programming Guide, section 7.1.2.2:

    "The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand."

    In addition, for PERFORMANCE reasons:

    "… it is recommended that locked accesses be aligned on their natural boundaries for better system performance:

    • Any boundary for an 8-bit access (locked or otherwise).
    • 16-bit boundary for locked word accesses.
    • 32-bit boundary for locked doubleword access.
    • 64-bit boundary for locked quadword access."

  2. Larry Osterman says:

    This may be true for the most recent IA-32 processors, but it certainly was NOT true in the past. There have been very real bugs (even as far back as the 8088) where unaligned use of the LOCK instruction caused inconsistent results.

  3. Jordan Russell says:

    FYI, I checked an old Pentium Pro manual and it includes the same statement.

    But one has to wonder whether it holds true on non-Intel x86 processors. I wouldn’t count on it…

  4. Phaeron says:

    I wouldn’t worry about any current non-Intel x86 CPUs. The clones have to maintain a very high level of compatibility or else lots of software, particularly OS kernels, will break. This includes handling a page fault which causes the CPU to attempt to invoke a fault handler from an interrupt vector that is on a non-resident page….

    Digging around a bit on Google Groups, however, I found an article that describes systems that don’t support unaligned locked transactions (http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=7ulma9%24sg%241%40nnrp1.deja.com). It seems that even if the CPU supports such transactions, older multiprocessor glue logic may not. The article also says that such systems aren’t very compatible in general, though.

  5. Mack says:

    I thought the times when people would derogatorily call AMD’s CPUs "clones" of Intel designs were past…

  6. Norman Diamond says:

    8/30/2004 10:28 AM Larry Osterman

    > There have been very real bugs (even as far back as the 8088) where unaligned use of the LOCK instruction caused inconsistent results.

    Something is odd here.

    The 8086 did 16-bit bus accesses and it’s easy to imagine how unaligned addresses might not mix well with the LOCK prefix. (It’s also easy to imagine that Intel might have already considered and decided to make it work, so this guess is just a guess.)

    The 8088 did 8-bit bus accesses so a 16-bit operand required 2 bus accesses. There was no meaning to 16-bit alignment.

    8/30/2004 1:33 PM Phaeron

    > I wouldn’t worry about any current non-Intel x86 CPUs.

    Neither did one of my co-workers about 3 years ago. His code accessed fields of .BMP structures in place as defined, even though they were unaligned. I don’t remember whether Windows CE recovered after killing the application or whether Windows CE hung. Anyway, I tried to teach him about alignment and the memcpy() function. It turned out that my boss had told him that all the fields were aligned and he shouldn’t worry about unaligned accesses. Oops.

    Personally I think it’s better if the hardware does handle unaligned operands automatically. Applications don’t always get to specify the data layout, and graceful degradation of hardware performance is still tons faster than calling memcpy().

  7. Cooney says:

    Mack:

    > I thought the times when people would derogatorily call AMD’s CPUs "clones" of Intel designs are past…

    Well, they are clones, just not design clones. Whether you like it or not Intel still calls the shots on x86-32 behavior. AMD may manage to grab the wheel for x86-64, but only time will tell.
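
The memcpy() approach to unaligned .BMP fields that comes up in comment 6 can be sketched roughly as follows. This is an illustration in portable C, not the co-worker's code: the helper load_u32_unaligned and the two offset constants are hypothetical names introduced here. The offsets assume the standard 14-byte BITMAPFILEHEADER layout, in which the 32-bit bfSize and bfOffBits fields start at byte offsets 2 and 10.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not
   be 4-byte aligned. memcpy copies byte by byte as far as the language
   is concerned, so the compiler must emit code that is legal on the
   target. The result is in host byte order, which matches the
   little-endian .BMP fields only on a little-endian host. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte offsets of the 32-bit fields in the 14-byte BITMAPFILEHEADER.
   Neither is a multiple of 4. */
enum { BMP_OFF_BFSIZE = 2, BMP_OFF_BFOFFBITS = 10 };
```

Given a buffer holding the start of a .BMP file, load_u32_unaligned(buffer + BMP_OFF_BFSIZE) fetches the file size without ever dereferencing a misaligned 32-bit pointer. Many compilers reduce a fixed-size memcpy like this to a single load on hardware that tolerates unaligned access, so the cost is often smaller than the function call suggests.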
