The Alpha AXP, part 13: On treating a 64-bit processor as if it were a 32-bit processor


The Alpha AXP is natively a 64-bit processor, but Windows NT for Alpha AXP treated it as a 32-bit processor. How is that possible?

The Alpha AXP registers are full 64-bit registers, and you are welcome to use all 64 bits to perform your computations, in the same way that you are welcome to use the 32-bit registers of the x86 even when the processor is in 16-bit mode.

The 32-bit-ness of the operating system comes into play when it comes to memory access. Although the address space is 64 bits, Windows NT uses only 4GB of it. It splits the memory down the middle, with the lower 2GB of memory at addresses 0x00000000`00000000 through 0x00000000`7FFFFFFF, and the upper 2GB of memory at addresses 0xFFFFFFFF`80000000 through 0xFFFFFFFF`FFFFFFFF.

It is no accident that this split exactly matches the Alpha AXP's canonical format for 32-bit integers. It means that the code can truncate all addresses to the lower 32 bits, and then sign-extend the values to recover the original pointers. Sign-extending 32-bit values to 64-bit values is a common operation on the Alpha AXP, and it generally happens "for free".

We saw earlier that there is some weirdness in calculating constants near the 2GB boundary, and Windows NT finesses the problem by simply declaring those addresses off-limits.

The fact that the processor is still a 64-bit processor means that you can carve out memory and stick it in the address space that is otherwise going to waste. That's what Very Large Memory (VLM) does. You can use functions like VirtualAllocVlm to allocate memory, with the additional note that, "Hey, like, you can put this memory anywhere in the 64-bit address space. You don't have to limit yourself to the 32-bit-compatible portion of the address space."

Of course, your code had to remember to preserve all 64 bits of these addresses, and to use the full 64-bit value as a memory address. I don't know if any compilers supported 64-bit pointers on Alpha AXP, but if they did, then you had 8 terabytes of address space to play with.

Comments (20)
  1. Pietro Gagliardi (andlabs) says:

    That splitting the memory down the middle near the beginning of the article reminds me of how the word-indirect addressing modes work on the 68000 family of processors: sign-extending a 16-bit word to a 32-bit long to give you access to either the lowest possible or highest possible 32KB of the processor’s address space. On the Sega Genesis, the game cartridge is mapped to the bottom of memory (it’s the boot ROM) and the 64KB of general purpose RAM is mapped to the top, so games will often access the top of RAM using the word-indirect modes; in disassemblies, this comes up as something along the lines of move.w ($FFFFFEDC).w,d0. Using the same for the low 32KB of ROM (for instance, in function-call instructions) is also common, but I’m not sure how common… (I have also seen games use other similar “tricks” to save bytes in function calls, though.)

  2. pc says:

    So, Windows programs compiled for Alpha could allocate and use anything within the full 64-bit space, but then could only make OS calls with pointers within the 32-bit-compatible space? Somehow that surprises me, though it makes some amount of sense as a transition-to-64-bit strategy, if the only 64-bit stuff is within individual programs.

    Was this also true when using 32-bit Windows on the Itanium and/or AMD64 processors? Or were they different enough that they had to be more “all or nothing”?

    1. The Alpha does not have a 32-bit mode. It is all 64-bit mode all the time. Windows NT had to set up 64-bit page tables and a 64-bit address space, but voluntarily chose to use only the 32-bit part of the address space. Since it was a voluntary restriction, you could also voluntarily choose to use some of it after all.

      On the other hand, x86-64 is a separate mode from x86-32. If the processor is in 32-bit mode, then you set up 32-bit page tables and a 32-bit address space, and 64-bit addresses simply don’t exist. Indeed, x86-64 and x86-32 encode instructions differently, so you can’t even execute the same instructions.

      1. Ben Voigt says:

        But people can and do (on non-Microsoft OS) run x86_64 processors in “long mode” and then use 32-bit addresses. See https://en.wikipedia.org/wiki/X32_ABI

        This seems essentially the same as what Windows NT did on the Alpha.

        1. Yup. But 32-bit Windows on x86-64 does not do that. 32-bit Windows uses the processor in x86-32 mode. (Because if it used the processor in x86-64 mode with 32-bit pointers, then it wouldn’t run on 32-bit CPUs.)

        2. Yuhong Bao says:

          This reminds me of /LARGEADDRESSAWARE:NO supported in Windows.

          1. Joshua says:

            Which, surprisingly, works on 64-bit Windows programs last I checked. I think these days the system DLLs are loaded above the 2GB barrier anyway, but other than that the flag forces stuff below 2GB, so you really could do 32-bit pointers.

        3. Jan Ringoš says:

          It would be amazing to have a version of Windows with the X32 ABI and all its benefits. Of course, one would need to recompile all the software, akin to the ARM version of Windows (before CHPE, that is), but I could live with that, even if it came only in a Server Core flavor.

          1. poizan42 says:

            You don’t need kernel support for that. It’s not like 64-bit NT even has any 32-bit system calls. You would probably want to add some “FAR” pointer support to your compiler, though, to call the normal 64-bit API libraries. You could really get away with calling only ntdll and user32, which never allocate memory for you, so you just always pass in a 32-bit address (but Raymond may send an assassin to kill you for such blatant use of an undocumented interface). Alternatively, you could make the process a normal 32-bit process and then switch in and out of 64-bit code with a far jump to 0x23/0x33 to call 32-bit APIs, if you can live with the extra performance penalty. That is also undocumented, but at least it’s not using any APIs that might be unstable. In that case there are probably also some interesting questions about registering unwind data properly if your PE is a 32-bit PE.

  3. Yuhong Bao says:

    I believe that it was Visual C++ 5.0 that added __ptr64 support, which PVOID64 etc. in turn use.

  4. Marvy says:

    I think I’m missing something obvious. Why just 8TB? That’s only 43 bits. Who stole the rest?

    1. haltiamreptar says:

      The original Alpha only broke out 43 of the 64 address lines, and early CPUs were required to keep the unimplemented bits zero.

      1. Fabian Giesen says:

        Address lines are for physical addresses; this is about virtual address space. The Alpha 21064 “only” had 34 physical address lines (16GB of physical memory max), 21164 had 40-bit physical (1TB), not sure if this ever got increased later; the 43-bit virtual address space limit is for other reasons.

    2. Fabian Giesen says:

      64-bit processors have 64-bit pointers but generally don’t allow arbitrary 64-bit values as addresses. Instead, they usually have some subset of the address space that can be mapped, and addresses outside that range are always considered invalid.

      This is because extra address bits have a real cost in hardware. These are extra bits that need extra wires to be passed around the chip; they make some important caches (mostly TLBs) larger and slower because they now need to accept, store and compare larger addresses; and having larger addresses makes the virtual->physical address translation more involved.

      So all existing implementations of 64-bit architectures don’t actually do full 64-bit addresses, they do somewhat less, and increase them as memory sizes go up. The idea being that while it’s really hard to increase pointer sizes (since it causes all kinds of binary compatibility problems), increasing virtual address space size while leaving pointer size the same requires some changes to the OS but is almost completely invisible to app code, as long as apps don’t try to be clever and bit-pack other values in pointers, because they know real addresses are only a subset of the entire 64-bit space.

      Current x86-64 CPUs have a 48-bit address space (an extension to 57-bit has recently been specified). 64-bit ARM (AArch64) can’t currently do more than 56 bits (and has a mode where the top 8 bits are explicitly ignored by the CPU and can be used by the app for pointer tagging).

      x86-64 uses a tree-structured page table scheme (implemented in hardware) where the smallest page size is 4 kilobytes (that takes care of the bottom 12 bits) and subsequent translation levels add 9 bits each (512 entries because each level of translation is effectively an array of pointers, and 512 entries * 8 bytes/pointer = 4096 bytes neatly fits inside a single page). First-level gives us 12+9=21 bits of address space, then 30, 39, and 48 (the current limit) for a 4-level tree walk. (That’s also why the extension is to the odd-sounding 57 instead of 56 bits; that’s one more tree level.)

      Alpha also typically used a tree-structured page table (“software”/firmware, implemented in PALcode) with 8KB pages and subsequent levels adding 10 bits each (again, 1024 entries times 8 bytes equals 8K, so that individual translation tables are exactly one page). One tree level gives us 13+10=23 bits, then 33, and finally 43 bits for a 3-level tree walk. Had Alpha stuck around long enough to get extended for another round, it likely would have gotten a 53-bit address space next, and then 63 after that.
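      The level arithmetic in the two paragraphs above can be restated in a few lines of C (the figures simply recompute the comment's own numbers; nothing here is from the architecture manuals):

      ```c
      #include <assert.h>

      /* Virtual-address bits covered by an n-level tree walk:
         page-offset bits plus index bits contributed per level. */
      static int va_bits(int page_shift, int index_bits, int levels)
      {
          return page_shift + index_bits * levels;
      }

      int main(void)
      {
          /* x86-64: 4KB pages (12 offset bits), 9 index bits per level. */
          assert(va_bits(12, 9, 4) == 48); /* current 4-level limit */
          assert(va_bits(12, 9, 5) == 57); /* the 5-level extension */

          /* Alpha: 8KB pages (13 offset bits), 10 index bits per level. */
          assert(va_bits(13, 10, 3) == 43); /* the 43-bit limit */
          assert(va_bits(13, 10, 4) == 53); /* a hypothetical 4th level */
          return 0;
      }
      ```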

      1. Yuhong Bao says:

        I believe technically the physical/virtual address space is supposed to depend on the page size; it’s just that the vast majority of implementations only support the 8K page size. Of course, even with SList based on 64-bit cmpxchg, 44-bit virtual addressing was possible, and the Alpha used LL/SC instead, so you didn’t need the sequence number, only the depth.

      2. Marvy says:

        Well that was a very thorough answer. Thank you.

  5. Neil says:

    Surely if the processor had generally given you zero-extending for free (instead of sign-extending), then that would work in much the same way, the only difference being where you needed to map the pages?

    1. It would, but it would also create issues if you wanted to expand the virtual space in use.

      With sign extension and the Windows NT memory map for Alpha, a future Alpha 64-bit Windows continues to split memory down the middle – 0x00000000`00000000 to 0x7FFFFFFF`FFFFFFFF is “user memory” and 0x80000000`00000000 through 0xFFFFFFFF`FFFFFFFF is “OS memory”.

      With zero extension, you have awkwardness in the first 4GB – 0x00000000`00000000 to 0x00000000`7FFFFFFF is user memory, 0x00000000`80000000 to 0x00000000`FFFFFFFF is OS memory, then you’re back to user memory until (say) 0x80000000`7FFFFFFF, then OS memory again. The boundaries are more awkward in this scheme, and it’s harder for the OS to grow from 2GB user address space to larger address spaces.

      FWIW, this is why x86-64 canonical addresses are sign-extended (and the CPU is expected to fault if, on a processor implementing 64−N bits of virtual address space, the top N+1 bits are not all identical); it avoids the messy split when you want to extend the virtual address space.
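      The canonical-address rule amounts to "sign-extending the low implemented bits must reproduce the full address." A minimal sketch in C, assuming the common 48-bit implemented width (the helper name and sample addresses are made up for illustration):

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* An address is canonical when bits 63 down to (va_bits - 1) are all
         copies of the top implemented bit, i.e. sign-extending the low
         va_bits bits reproduces the original address. */
      static bool is_canonical(uint64_t addr, int va_bits)
      {
          int64_t top = (int64_t)(addr << (64 - va_bits)); /* keep low bits */
          return (uint64_t)(top >> (64 - va_bits)) == addr; /* sign-extend back */
      }

      int main(void)
      {
          assert(is_canonical(0x00007FFFFFFFFFFFull, 48));  /* top of lower half */
          assert(is_canonical(0xFFFF800000000000ull, 48));  /* bottom of upper half */
          assert(!is_canonical(0x0000800000000000ull, 48)); /* inside the hole */
          assert(!is_canonical(0x00FF000000000000ull, 48)); /* inside the hole */
          return 0;
      }
      ```

      The same predicate with va_bits = 32 describes the Windows NT memory map for Alpha from the article: the two usable 2GB ranges are exactly the 32-bit-canonical addresses.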

  6. Tinker says:

    The architecture series deserve their own tag. Just sayin’.

    1. Joshua says:

      Seconded

Comments are closed.
