The PowerPC 600 series, part 1: Introduction


The PowerPC is a RISC processor architecture which grew out of IBM's POWER architecture. Windows NT support was introduced in Windows NT 3.51, and it didn't last long; the last version to support it was Windows NT 4.0. Despite not being supported by the flagship operating system, it continued to be supported by Windows CE, and a later version of the PowerPC was chosen as the processor for the Xbox 360.

As with all the processor retrospective series, I'm going to focus on how Windows NT used the PowerPC in user mode because the original audience for all of these discussions was user-mode developers trying to get up to speed debugging their programs on PowerPC.

The PowerPC 600 series started out as a 32-bit processor, with 64-bit support arriving in the 620. The earliest record I can find (not that I looked very hard) shows Windows NT supporting the 603 and 604 processors. I guess this makes sense, because Wikipedia says that the 603 was the first processor to support the full PowerPC instruction set. The 603 could complete a maximum of two instructions per cycle; the 604 could do up to four. The 603 did not have a dynamic branch predictor, but the 604 did. Both could forward arithmetic operations into the next arithmetic operation, so consecutive integer arithmetic operations did not stall, even if the second depended on the result of the first.

The PowerPC 600 series processors are natively big-endian, with an option for little-endian operation. Windows NT uses the processor in 32-bit little-endian mode.¹ Even though the processor can be put into little-endian mode, this affects only how bytes are swapped when they are read from or written to memory; the instructions themselves still operate in a big-endian way. Among other things, the bits in a register are numbered from most-significant to least-significant: Bit 0 is the high-order bit, and bit 31 is the low-order bit.
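If the backwards bit numbering makes your head hurt, the conversion is a single subtraction. Here's a minimal sketch in Python that turns a PowerPC-style bit number into the conventional mask. The function names are mine, made up for illustration; they don't come from any toolchain.

```python
def ibm_to_conventional(bit, width=32):
    """Convert PowerPC bit numbering (0 = most significant) to the
    conventional numbering (0 = least significant)."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return width - 1 - bit


def ibm_bit_mask(bit, width=32):
    """Return the mask that selects one bit, given its PowerPC bit number."""
    return 1 << ibm_to_conventional(bit, width)


if __name__ == "__main__":
    print(hex(ibm_bit_mask(0)))   # high-order bit
    print(hex(ibm_bit_mask(31)))  # low-order bit
```

So when the documentation talks about "bit 0" of a 32-bit register, it means the bit that most of us would call bit 31.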

The PowerPC has 32 integer registers, each 32 bits wide. They are officially named GPR0 through GPR31, but the assembler just calls them 0 through 31. This is ridiculously confusing,² so nobody uses the purely numeric names. People call them r0 through r31. (Some assemblers call them r.0 through r.31.)

Register    | Mnemonic | Meaning           | Preserved?  | Notes
----------- | -------- | ----------------- | ----------- | -----
gpr0        | r0       |                   | No          | Of limited use
gpr1        | r1       | stack pointer     | Yes         | Includes 232-byte negative red zone
gpr2        | r2       | table of contents | Yes, mostly | Access to global variables
gpr3…gpr10  | r3…r10   | argument          | No          | On function entry, contains function parameters
gpr11       | r11      | temporary         | No          | For function glue
gpr12       | r12      | temporary         | No          | Prologue and epilogue helper
gpr13       | r13      | read-only         | Yes         | TEB
gpr14…gpr31 | r14…r31  | saved             | Yes         |

Note that this does not exactly line up with the PowerPC register conventions for other platforms. (Many other platforms assign special meanings to gpr11 through gpr13.)

The stack must be kept on an 8-byte boundary. There is a large red zone of 232 bytes at negative offsets from the stack pointer. We'll see the importance of this when we look at function prologues.
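To make those two numbers concrete, here's a rough sketch in Python that uses the 8-byte alignment and the 232-byte red zone described above. The helper names are hypothetical, and I'm assuming the red zone covers the 232 bytes immediately below the stack pointer.

```python
STACK_ALIGNMENT = 8   # stack alignment, from the text
RED_ZONE_SIZE = 232   # red zone size in bytes, from the text


def align_frame_size(size):
    """Round a requested frame size up to the next multiple of 8."""
    return (size + STACK_ALIGNMENT - 1) & ~(STACK_ALIGNMENT - 1)


def in_red_zone(sp, address):
    """True if address lies in the red zone just below stack pointer sp.

    Assumption: the red zone is the range [sp - 232, sp).
    """
    return sp - RED_ZONE_SIZE <= address < sp


if __name__ == "__main__":
    print(align_frame_size(57))                     # rounds up to 64
    print(in_red_zone(0x1000, 0x1000 - 232))        # lowest byte of the zone
    print(in_red_zone(0x1000, 0x1000 - 233))        # one byte too far
```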

The function return value is placed in r3.

The r0 register is of limited use because many instructions cannot use a source of r0. We'll see more about that later.

We'll learn about the table of contents, function glue, and epilogue/prologue helpers later when we cover Windows NT software conventions.

In addition to the general-purpose integer registers, there are a number of special-purpose 32-bit integer registers. There are only nineteen of these special-purpose registers, but the numbers range from spr1 to spr1013. (The number space is very sparsely populated, but I guess they reserved room for adding more registers in the future.) These are the ones you're likely to see in user-mode code:

Register | Mnemonic | Meaning       | Preserved? | Notes
-------- | -------- | ------------- | ---------- | -----
spr1     | xer      | Status bits   | No         | Integer exception register
spr8     | lr       | link register | No         | On function entry, contains return address
spr9     | ctr      | counter       | No         | Dedicated counter or jump target
fpscr    | fpscr    | Status bits   | ?          | Floating point status and control register

I've never had to deal with floating point on the PowerPC, so I don't know what parts of fpscr need to be preserved and what parts don't.

We'll learn more about the other special registers as the need arises.

Remember how the Itanium, MIPS, and Alpha don't have a flags register? Well, the PowerPC scoffs at them. "Flags register? You say you want a flags register? I've got your flags register right here. In fact, I've got eight sets of flags registers." They are named cr0 through cr7, each four bits wide. (The "cr" stands for condition register.) The pseudo-register cr can be used to treat them as one giant 32-bit register.³ Remember that the PowerPC is a big-endian processor, so cr0 occupies the most significant four bits of cr, and cr7 occupies the least significant four bits.

Condition register cr0 is the implicit target of integer operations, and condition register cr1 is the implicit target of floating point operations. I don't know which condition registers must be preserved across calls, because I've never found any code that needed to.
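Since cr0 lives at the top of cr and cr7 at the bottom, pulling one field out of the combined 32-bit value is a shift and a mask. Here's a minimal sketch in Python; the function names are invented for illustration and aren't part of any debugger or toolchain.

```python
def cr_field(cr, n):
    """Extract the 4-bit field cr<n> from the 32-bit cr value.

    cr0 is the most significant nibble, cr7 the least significant.
    """
    if not 0 <= n <= 7:
        raise ValueError("condition register field must be 0..7")
    shift = 28 - 4 * n
    return (cr >> shift) & 0xF


def set_cr_field(cr, n, value):
    """Return cr with field cr<n> replaced by the low 4 bits of value."""
    if not 0 <= n <= 7:
        raise ValueError("condition register field must be 0..7")
    shift = 28 - 4 * n
    cleared = cr & ~(0xF << shift) & 0xFFFFFFFF
    return cleared | ((value & 0xF) << shift)


if __name__ == "__main__":
    cr = 0x12345678
    print(cr_field(cr, 0))               # top nibble
    print(cr_field(cr, 7))               # bottom nibble
    print(hex(set_cr_field(0, 0, 0xF)))  # sets only the top nibble
```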

The PowerPC also has 32 floating-point double-precision registers, officially named FPR0 through FPR31.

Register    | Mnemonic | Preserved? | Notes
----------- | -------- | ---------- | -----
fpr0        | f0       | No         | Temporary
fpr1…fpr13  | f1…f13   | No         | Function parameters
fpr14…fpr31 | f14…f31  | Yes        |

As for instruction encoding, each instruction is 32 bits wide and must be aligned on a four-byte boundary. The instruction whose encoding is 0x00000000 is reserved as an invalid instruction, so trying to execute a page of zeros will instantly fault.

The general syntax for multi-operand opcodes is

    opcode  destination, source1, source2, source3...

with the notable exception of store instructions, which put the source register on the left and the address destination on the right.

The architectural terms for operand sizes are byte, halfword (2 bytes), word (4 bytes), doubleword (8 bytes), and quadword (16 bytes). In 32-bit operation, the largest unit that can be operated on directly is the word.

In opcode names, the word arithmetic is used to emphasize that the operands are treated as signed (usually abbreviated a), and the words logical (l) and unsigned (u) or sometimes zero-extended (z) are used to emphasize that the operands are treated as unsigned. I guess they couldn't make up their mind what to call unsigned operations, so they chose one at random each time they needed one. Note further that these conventions are not uniformly applied, so stay alert.

The processor maintains the fiction that every instruction is retired completely before the next one starts. Consequently, there are no architectural branch delay slots or load delay slots. It also means that when an exception is raised, all instructions preceding the exception have run to completion, and no instructions after the exception will appear to have started.

Internally, the processor may perform operations out of order or in parallel or speculatively, and it may introduce stalls if your dependencies are too close together, but the processor does its best to hide this from the code being executed.

There are two notable exceptions to the principle of sequential operation:

  • Floating point exceptions in imprecise mode can be delayed beyond the instruction that triggered the exception.
  • Self-modifying code requires special instructions to evict the old instructions out of the I-cache.

Both reads and writes to memory can be reordered, and reads can be speculated. Storing a value may partly succeed before raising an exception. (For example, an unaligned store that crosses into an invalid page may write to the valid page and then take an exception on the invalid page.)
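Whether a given store is at risk of this partial-success behavior comes down to whether it straddles a page boundary. Here's a small sketch in Python that checks for that. The 4 KB page size is my assumption for illustration, and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, in bytes


def pages_touched(address, size):
    """Return the list of page numbers covered by an access of
    `size` bytes starting at `address`."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = address // PAGE_SIZE
    last = (address + size - 1) // PAGE_SIZE
    return list(range(first, last + 1))


def crosses_page(address, size):
    """True if the access spans more than one page."""
    return len(pages_touched(address, size)) > 1


if __name__ == "__main__":
    print(crosses_page(0x1FFC, 4))   # aligned word, stays in one page
    print(crosses_page(0x1FFE, 4))   # unaligned word, straddles two pages
    print(pages_touched(0x1FFE, 4))
```

An aligned word store can never straddle a page, which is one more reason to keep your data aligned.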

Okay, that's enough background. We'll pick up next time by taking a closer look at those condition registers.

¹ When the processor is in 32-bit mode, you can still execute 64-bit instructions. However, since Windows NT did not require a 64-bit capable version of the PowerPC processor, PowerPC programs for Windows NT had to perform runtime detection of 64-bit support and run either a 32-bit friendly version of the code or a 64-bit version of the code. In practice, nobody did this. They just stuck to 32-bit code. (Even though you could use 64-bit instructions in 32-bit mode, the ABI preserves only the least-significant 32 bits of saved registers.)

² The designers of the PowerPC assembly language appear to be dedicated to making their instruction set as confusing as possible by making the assembly language be just barely more readable than machine code. For example, to say "Decrement the counter, and branch if the result is zero and the eq flag is set in cr3", they want you to write

    bc  10, 14, destination

Because obviously 10 means "decrement counter and branch if the result is zero and the specified flag is set", and naturally 14 means "the eq flag in cr3."

The Windows disassembler substitutes names for some (but not all) of these magic numbers at disassembly so you don't have to remember all the codes.
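The flag number, at least, is mechanical to decode: it is a bit index into the combined cr register, four bits per field. Here's a minimal sketch in Python. The function name is made up, and the order of the flags within each field (lt, gt, eq, so) comes from the PowerPC architecture rather than from anything covered so far in this article.

```python
# Order of the four flags inside each condition register field,
# from most significant bit to least significant bit.
CR_FLAG_NAMES = ("lt", "gt", "eq", "so")


def decode_flag_number(bi):
    """Split a 0..31 condition bit number into (field name, flag name)."""
    if not 0 <= bi <= 31:
        raise ValueError("condition bit number must be 0..31")
    field, flag = divmod(bi, 4)
    return f"cr{field}", CR_FLAG_NAMES[flag]


if __name__ == "__main__":
    print(decode_flag_number(14))  # ('cr3', 'eq')
```

In other words, 14 is 4 × 3 + 2: field 3, flag 2.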

³ You might think, "Who's to say which is the real register and which is the pseudo-register? You could equivalently think of cr as the real register, and the cr# registers as pseudo-registers!" Perhaps so, but the processor can execute operations on different cr# registers in parallel. If cr were the real register, then you would expect multiple operations on different cr# registers to be dependent on each other since they are all operating on cr.

Comments (19)
  1. Looking forward to the rest of this series!

  2. Vas Crabb says:

    “Integer exception register”? It’s always “fixed point exception register” in the IBM docs. Also “fixed point unit” rather than “integer unit”, etc.

    1. The column is called “Notes”, not “Official name”. The purpose of these notes is to provide just enough information so you can debug problems, not to provide official documentation. If I called it “Fixed point exception register” it would be less obvious what that means.

      1. Darran Rowe says:

        Fixed point normally implies decimal numbers which are not whole, which use a format that does not let the decimal point move. For example a 32:32 format, where 32 bits are used for the portion left of the decimal point, and 32 bits are used for the portion right of the decimal point.
        While you can see integer numbers as a special case of fixed point numbers where 0 bits are used for the portion right of the decimal point. Fixed point doesn’t normally imply integer.

        1. While true, the terminology is so rarely used that it would take most people a few minutes for that to click (I know when I read Vas’ comment I initially read it as ‘floating point’, and even then didn’t make the connection until I read yours, thinking it was some kind of instruction-level IBM-specific term). Integer is a term everyone uses and is immediately obvious, and that’s ideal for blogs.

        2. Antonio Rodríguez says:

          Well, the “fixed point unit” expression has an historical root. Back in the early 50s, all computations were made in integer units, using integer binary arithmetic. As most of these computers were oriented to scientific applications, they needed to manipulate real numbers in some way, so they used integers scaled by a constant factor. Note that these were binary integers, not decimal ones.

          When IBM introduced their first computer with floating point unit (the IBM 704) in 1954, the integer unit was called “fixed point unit”, in contrast to the newer “floating point unit”. That denomination was carried all along IBM’s mainframe family until the POWER architecture, from where it was inherited by the PowerPC.

          Now, this blog is called The Old New Thing. So don’t sue me for telling stories older than most readers (including myself!).

        3. Julien Oster says:

          A bit off topic, but maybe still interesting (it kind of fits the theme of this blog): The notation I usually deal with would be 64:32 for what you mean, i.e. 64bit word size, 32bit of that as fraction size. This notation allows for seemingly counter-intuitive things of a fraction size bigger than the (total!) word size, or even negative fraction sizes.

          Fraction sizes bigger than the word size itself are useful when the magnitude of the numbers you want to be able to represent are smaller than 1, e.g. if you only want to represent numbers from 0 to 0.125 (not including), then the first three fractional bits (0.5, 0.25, 0.125) are always zero, and you can e.g. use 16:19 for 16bit precision of numbers between 0 and 0.125.

          Negative fraction sizes are useful when you have big numbers that you don’t care about the full precision that the usually necessary integer word size would give you, e.g. 8:-1 allows you to represent numbers between 0 and 512 instead of 0 to 256 that 8bit would usually give you, but only the even numbers. With 8:-2, you’re up to 0-1024, but you lose the distinction between the first two lower bits.

          MATLAB uses this fixed point notation, and at least some FPGA environments use it as well when constructing e.g. fixed point filters.

          In that notation, 32:32 would allow representation of 2^32 different numbers between 0 and 1 (exclusive).

  3. Yukkuri says:

    Sweet, another ISA series!

  4. Roman says:

    Which SPR number is fpscr?

    1. Trick question. fpscr is not a SPR at all!

  5. DWalker07 says:

    “Here we go again”. A reference to a recent movie, perhaps? :-)

    1. Brian_EE says:

      Or perhaps a Whitesnake song from the late 1980’s.

  6. cheong00 says:

    I wonder, since there is a Mac development division inside Microsoft (I still remember running IE5 on an old iMac), have you heard of anyone trying to run NT4 on a pre-OSX Mac?

    1. Vas Crabb says:

      Won’t work. On traditional PowerPC (including the 600 series and 7400 series) you can’t run a big endian OS on a little endian motherboard and vice versa. WinNT and Solaris for PowerPC run little endian, MacOS runs big endian, and AIX is available for either.

      The reason is that traditional PowerPC in little endian mode doesn’t actually do little endian memory access within (64-bit) doublewords. It just twiddles the low three bits of the address so that as long as you only do aligned accesses for anything bigger than bytes (halfword/word/doubleword) you’ll see little endian semantics (incidentally the MAME emulator uses the same trick when emulating a little endian guest on a big endian host and vice versa). On a little endian motherboard, you wire the 64-bit data bus with the byte order inverted, and as long as software keeps accesses aligned, it all works.

      I say “traditional” because some 750 derivatives and the newer POWER chips that implement PPC64LE can actually run in a true little endian mode. But that didn’t happen until long after Windows NT stopped running on PowerPC.

    2. James Sutherland says:

      I don’t know about NT4, but I seem to recall the Xbox OS is NT-based, so it definitely ran on PowerMac G5s for a while – they were the first developer kits! https://www.journaldulapin.com/2015/01/25/this-power-mac-g5-is-almost-a-xbox-360/

  7. Mike says:

    Raymond wrote “Among other things, the bits in a register are numbered from most-significant to least-significant: Bit 0 is the high-order bit, and bit 31 is the low-order bit. ”

    Is this for real??? At the assembly/machine-code level? How on earth did this get handled at the ‘C’ level? Once RAM (a 32-bit word) is moved into the CPU, I would have expected bit 0 to BE bit 0 (2^0) and bit 31 to BE bit 31 (2^31). Anything else would be nuts. To then reverse bit ordering in bytes seems just plain insane! Please elaborate.

    1. You can’t address individual bits in C. Taking the address of a bitfield is illegal.

      Bit ordering only matters for certain assembly instructions, where the processor docs say that this instruction performs operation X on bits Y-Z. It’s the compiler’s job to understand the CPU’s bit ordering and translate the C code to the instructions with the correct bit numbers in them.

    2. Richard says:

      I believe bit numbering in C is an implementation defined behavior.

      Is bit 0 the left-most bit or the right-most bit? Little endian says bit 0 is the right most bit, while big endian says bit 0 is the left most bit.

      Which “endianness” is better may depend on what you have been exposed to. All PC (Intel/AMD) processors are little endian. But it hasn’t always been that way. Most of the systems I used when I started programming (at the time, in assembly) were big endian.

      See “https://en.wikipedia.org/wiki/Endianness”.

      Useless trivia: network byte order is big endian! So when your PC sends a TCP packet to another PC, the multi-byte fields in the headers get flipped on the sending computer, then get flipped again on the receiving computer.

      1. Vas Crabb says:

        Endianness generally refers to byte ordering for multi-byte values in RAM. Numbering of bits within a value is separate. The Motorola 68000 family uses big endian byte ordering (address of a word is address of its most significant byte, more significant bytes at lower memory addresses), but it numbers bits within a value starting at 0 for least significant bit and increasing towards more significant bits. It doesn’t really make a difference for C since bit instructions can’t be generated directly. The compiler can generate them when you test against a one-bit mask, set/clear/invert an individual bit, etc. and it knows how the target CPU numbers bits.
