Why does the x86 have so few registers?


One of the comments to my discussion of 16-bit calling conventions wondered why the 8086 had so few registers.

The 8086 was a 16-bit version of the even older 8080 processor, which had six 8-bit registers, named A, B, C, D, E, H, and L. The registers could be used in pairs to produce three 16-bit pseudo-registers, BC, DE, and HL. What's more, you could put a 16-bit address into the HL register and use the pseudo-register "M" to dereference it. So, for example, you could write "MOV B, M" and this meant to load the 8-bit value pointed to by the HL register pair into the B register.

The 8086 took these 8080 registers and mapped them sort of like this:

  • A -> AL
  • H -> BH, L -> BL; HL -> BX; M -> [BX]
  • B -> CH, C -> CL; BC -> CX
  • D -> DH, E -> DL; DE -> DX

This is why the 8086 instruction set can only dereference through the [BX] register and not the [CX] or [DX] registers: On the original 8080, you could not dereference through [BC] or [DE], only through M=[HL].

This much so far is pretty official. The instruction set for the 8086 was chosen to be upwardly-compatible with the 8080, so as to facilitate machine translation of existing 8-bit code to this new 16-bit processor. Even the MS-DOS function calls were designed so as to facilitate machine translation.
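To make the machine-translation idea concrete, here is a toy sketch in Python that applies the register mapping above to a single 8080 MOV instruction. It is only an illustration: the names REG_MAP and translate_mov are made up for this example, it handles nothing but register-to-register and register-to-memory MOV, and it does not claim to resemble any real translator.

```python
# Toy illustration only: NOT an actual 8080-to-8086 translator.
# REG_MAP and translate_mov are hypothetical names invented for this sketch.
REG_MAP = {
    "A": "AL",
    "H": "BH", "L": "BL",
    "B": "CH", "C": "CL",
    "D": "DH", "E": "DL",
    "M": "[BX]",  # 8080 pseudo-register M (memory at HL) becomes [BX]
}

def translate_mov(line: str) -> str:
    """Translate one 8080 'MOV dst, src' line into 8086 syntax."""
    mnemonic, operands = line.strip().split(None, 1)
    if mnemonic.upper() != "MOV":
        raise ValueError("this sketch only understands MOV")
    dst, src = (op.strip().upper() for op in operands.split(","))
    if dst == "M" and src == "M":
        # The 8080 has no memory-to-memory MOV.
        raise ValueError("MOV M, M is not a valid 8080 instruction")
    return f"MOV {REG_MAP[dst]}, {REG_MAP[src]}"

print(translate_mov("MOV B, M"))  # MOV CH, [BX]
```

The example from earlier, "MOV B, M", comes out as "MOV CH, [BX]": load the byte pointed to by BX (the old HL) into CH (the old B).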

What about the SI and DI registers? I suspect they were inspired by the IX and IY registers available on the Z-80, a competitor to the 8080 which took the 8080 instruction set and extended it with more registers. The Z-80 allowed you to dereference through [IX] and [IY], so the 8086 lets you dereference through [SI] and [DI].

And what about the BP register? I suspect that was invented on the fly in order to facilitate stack-based parameter passing. Notice that the BP register is the only 8086 register that defaults to the SS segment register and which can be used to access memory directly.
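As a rough way to see which registers can be dereferenced and which segment each one defaults to, here is a small Python sketch of the eight 16-bit memory operand forms selectable by the r/m field of an 8086 instruction. The table contents follow the standard 8086 addressing modes; the names RM_FORMS, addressable_registers, and default_segment are invented for this example, and displacements and segment override prefixes are ignored.

```python
# The eight 16-bit memory operand forms chosen by the 3-bit r/m field,
# each paired with the segment register it uses by default.
# RM_FORMS and the helper names below are hypothetical, for illustration only.
RM_FORMS = [
    (("BX", "SI"), "DS"),  # r/m = 000
    (("BX", "DI"), "DS"),  # r/m = 001
    (("BP", "SI"), "SS"),  # r/m = 010
    (("BP", "DI"), "SS"),  # r/m = 011
    (("SI",), "DS"),       # r/m = 100
    (("DI",), "DS"),       # r/m = 101
    (("BP",), "SS"),       # r/m = 110 (with mod = 00 this slot is a bare 16-bit address)
    (("BX",), "DS"),       # r/m = 111
]

def addressable_registers():
    """Return the registers that can appear inside [...] on the 8086."""
    return sorted({reg for regs, _ in RM_FORMS for reg in regs})

def default_segment(*regs):
    """Return the default segment for an operand such as [BP+SI],
    or None if the 8086 has no such addressing form."""
    wanted = set(regs)
    for form, segment in RM_FORMS:
        if set(form) == wanted:
            return segment
    return None

print(addressable_registers())  # ['BP', 'BX', 'DI', 'SI']
print(default_segment("BX"))    # DS
print(default_segment("BP"))    # SS
print(default_segment("CX"))    # None
```

Only BX, SI, DI, and BP ever show up inside the brackets, and every form that involves BP defaults to SS, which is what makes BP convenient for reaching parameters on the stack.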

Why not add even more registers, like today's processors with their palette of 16 or even 128 registers? Why limit the 8086 to only eight registers (AX, BX, CX, DX, SI, DI, BP, SP)? Well, that was then and this is now. At that time, processors did not have lots of registers. The 68000 had a whopping sixteen registers, but if you look more closely, only half of them were general purpose arithmetic registers; the other half were used only for accessing memory.

Comments (23)
  1. Henk Devos says:

    My guess is that this long history (going back to the 4-bit 4004) must have a negative impact on the design of the processor.

    Maybe it would have been better to emulate older processors in software at certain stages, instead of building this emulation into the processor itself and staying 100% compatible.

    Would it be fair to say that if 8-bit and 16-bit compatibility did not exist, there would be more free space on the chip and consequently more registers?

  2. Eric Wilson says:

    Side note: When it says above, "At that time, processors did not have lots of registers," the reason is not any sort of hidden agenda against more registers, or any belief that more registers were unneeded. It was purely a decision based on the cost of transistors at the time. Remember, these processors were built at a time when 29,000 transistors was all you could fit on the die:)

    Unfortunately, Intel has never been able to reconcile the desire for more registers with the desire for backwards compatibility (they very wisely, it turns out, stuck with compatibility). That being said, internally every modern CPU from Intel uses register renaming so that the lack of registers has only a small effect on performance, if any.

  3. James Curran says:

    I’m not sure if transistor count alone can explain the lack of registers on the 8086. The Z-8000 (Zilog’s successor to the Z-80, and competitor of the 8086), had 16 16-bit registers, all general purpose (in addition to the program control registers: IP, SP etc).

    (OTOH, fat lot of good that did for Zilog, who seems to have disappeared in the fog of time….)

  4. S N says:

    I thought about it too. But then I remembered the two instructions that exist and are used a lot.

    PUSHA (PUSH ALL)

    POPA (POP ALL)

    If you include the new registers as part of the above two instructions, then think about what would happen to the performance of the existing applications. They would be crawling rather than speeding.

  5. Raymond Chen says:

    For compatibility reasons of course you can’t change the registers pushed and popped by pusha/popa. So if new registers are added you’d need a new instruction to push and pop those new registers (like pushad and popad when the registers got extended to 32 bits).

  6. Peter Lund/firefly@diku.dk says:

    pusha/popa were introduced in 80188/80186. The original 8088/8086 didn’t have them.

    What baffles me about the x86 is not the low number of registers or how they are all somewhat less than general purpose or how unorthogonal it is (I like optimizing assembler code :) ).

    No, what baffles me is one tiny little piece of orthogonality in all the chaos: the SP register is treated as a GPR. The instructions that operate on registers that aren’t fixed have three bits to encode the register they operate on (two-address instructions have two of those fields, of course). Why is SP one of those eight registers that can be encoded there? Wouldn’t it have made sense to special-case SP so you could move between BP and SP, you could add/subtract from SP, you could move between SP and AX and you could load SS and SP together from memory (the 8088/8086 couldn’t do that — this is your cue to talk about microcode bugs, Raymond ;) ). Why on earth can you negate SP, xor it with dx, shift it, rotate it, etc?

  7. Eric Wilson says:

    Absolutely. A sure recipe for disaster is to have an OS allocate some (fixed) amount of space for the register set (let's say 256 bytes for argument's sake), then have the CPU instruction PUSHA write an extra 16 bytes:)

    Can you say, "buffer overrun"? I knew you could:)

  8. Raymond Chen says:

    Nowadays it doesn’t matter — efficient memory caching takes the place of many, many registers.

  9. Raymond Chen says:

    (note: the above "Raymond Chen" comment was not from me; must be somebody else with the same name as me, or somebody trying to fool people)

  10. asdf says:

    This is giving me bad flashbacks of gameboy assembly programming. Make it stop Raymond, make it stop.

  11. Russ C. says:

    I’ve just got a real nasty urge to dig out Turbo C++ 3.0 and start emitting :)

  12. At one level you can’t answer this kind of question in hindsight, knowing what we know now. There’s so much context in decisions like this: what kinds of programs are people writing, how complex are compilers and can they really take advantage of having a larger number of registers (most couldn’t).

    Aside from all that, here’s a practical issue: if you’re running a pre-emptive multitasking operating system, every context switch requires swapping all the register values, which can take a long time (measured in CPU cycles). RISC processors have lots of registers; CISC processors tend to have much fewer. That makes RISC much better for tight, non-preempted data-intensive loops, and CISC better for essentially everything else.

  13. Peter Lund/firefly@diku.dk says:

    Eric Wilson: the original ARM chip from the middle of the eighties used around 27000 transistors and had 32-bit registers, a barrel shifter, a 32-bit ALU, and around 20 registers (16 registers you could access, one of them being the PC, another would be designated as SP; some of these had "shadows" for fast interrupt processing). The 68K had 68000 transistors and was slower. So it was not just a question of transistor budgets but also a question of design philosophies.

    8088/8086 date from the mid to late seventies, when the transistor budgets were *much* smaller. And then there was the backwards compatibility thing, as Raymond says.

  14. Raymond Chen says:

    (Curiously nobody has yet noticed my fantastic inability to count. A,B,C,D,E,H,L = seven registers, not six.)

  15. asdf says:

    Gotta love off by one errors.

  16. Mark Hurd says:

    The 8-bit 6502 (Vic20, Apple II, etc) only had 4: A IX IY SP. That is 1 general purpose, 2 index registers, and a stack pointer.

    It had smaller, faster opcodes to access the zero page 0x0000-0x00FF, and the stack was between 0x01FF-0x0100 (going down).

    Considering it ran at 1MHz, it is always pleasing to compare the quality and performance of the programs (especially games) written for it with the IBM-PC programs of the same period.

  17. Raymond Chen says:

    Ah the 6502. Actually I believe the registers were called A, X, Y, and S. Maybe it depends whose mnemonics you were using. I’ve heard the 6502 described as "a 256-register processor [zero page] except that you had to write your program in microcode."

  18. Mark Hurd says:

    Yes, I used names similar to the ones already mentioned to help people understand their purpose.

    If I’d come across the C compiler for the Commodore 64/128 when it was produced I may not have yet made the transition to Windows PCs for hobby computing. (That is, while I was using Unix and Windows at work, often it was for stuff that I couldn’t bring home anyway, so my Commodore 128 was my home computer well into the beginnings of the Internet.) Unfortunately, I only found a C64/128 C compiler after I owned a PC, and the speed difference was too much to try to keep using it.

    There’s probably a few features of the C64 that still aren’t provided in today’s systems, but there are two that are available that I’d like to know why they’re not made easily accessible:

    – sprites: That is like the mouse cursor, (and menus?) that hardware handles rather than software keeping the background clean

    – ‘computer generated’ sound: The C64 emulators use DirectX to manufacture the SID chip sounds but I’m surprised there hasn’t been an API as simple as the SID chip available. I’ve guessed it’s because of licencing issues and non-standardisation of sound cards, but I’d like to know.

    I’ve not needed either of these for work, of course, so I’m ignorant of the current possibilities, but I’m as interested in any historical stories as available APIs.

  19. Raymond Chen says:

    Windows used sprites (hardware cursors) but eventually sprites were ditched because…

    1. sprites were monochrome only (everybody likes color cursors)

    2. you can’t do alphablend dropshadow stuff on sprites (new for Windows 2000)

    It’s possible that these restrictions have since been lifted; those were the reasons at the time.

  20. Patrik Weibull says:

    Pushing all registers onto the stack is fast, and this is what you do between each time slice in a multitasking system. So, with respect to multitasking, the x86 with its few registers is quite good.

  21. Peter Jacobi says:

    Anybody still looking here?

    IMHO there was another important reason to

    – keep the register count low

    – make the registers special purpose

    For the sake of a small instruction size!

    Every added register, every new use for an old register, is likely to cause trouble in how to encode the new instructions. Look at the Z80 and later the evolution of the x86. It’s all ugly "prefix" stuff.

Comments are closed.
