The PowerPC 600 series, part 3: Arithmetic


Before we start with arithmetic, we need to have a talk about carry.

The PowerPC uses true carry for both addition and subtraction. This is different from the x86 family of processors, for which the carry flag is actually a borrow bit when used in subtraction. You can read more about the difference on Wikipedia. There are some instructions which perform a combined addition and subtraction, and in that case, the only sane choice is to use true carry. (If you had chosen carry as borrow, then it wouldn't be clear whether the final carry bit represented the carry from the addition or the borrow from subtraction.)

To emphasize the fact that the PowerPC uses true carry, I will rewrite all subtractions as additions, taking advantage of the two's complement identity

    -x = ~x + 1
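
To make the difference concrete, here is a rough Python model of a 32-bit subtraction done both ways. This is only a sketch for illustration: the helper names `sub_true_carry` and `sub_borrow` are made up for this example, and nothing here is a processor or assembler feature.

```python
MASK32 = 0xFFFFFFFF

def sub_true_carry(a, b):
    """Compute a - b as a + ~b + 1; return (result, carry out of bit 31)."""
    total = a + (~b & MASK32) + 1
    return total & MASK32, total >> 32

def sub_borrow(a, b):
    """x86-style view: the flag is set when the subtraction had to borrow."""
    return (a - b) & MASK32, 1 if a < b else 0

# 5 - 3 needs no borrow: true carry is 1, borrow is 0.
print(sub_true_carry(5, 3), sub_borrow(5, 3))
# 3 - 5 borrows: true carry is 0, borrow is 1.
print(sub_true_carry(3, 5), sub_borrow(3, 5))
```

In this model the two flags are always opposites of each other, while the 32-bit results agree.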

Okay, now we can do some arithmetic. Let's start with addition.

    add     rd, ra, rb      ; rd = ra + rb
    add.    rd, ra, rb      ; rd = ra + rb, update cr0
    addo    rd, ra, rb      ; rd = ra + rb, update         XER overflow bits
    addo.   rd, ra, rb      ; rd = ra + rb, update cr0 and XER overflow bits

These instructions add two source registers and optionally update the xer register to capture any possible overflow (by appending an o), and also optionally update the cr0 register to reflect the sign of the result and any summary overflow (by appending a period).
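
Here is a minimal Python sketch of what the fully-suffixed `addo.` does, as I understand the flags: the overflow bit records signed overflow, the summary overflow bit is sticky, and cr0 gets the sign of the (truncated) result plus a copy of summary overflow. The function name `addo_dot` and the tuple layout are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def addo_dot(ra, rb, so=0):
    """Model addo.: returns (rd, ov, so, cr0) with cr0 as (lt, gt, eq, so)."""
    rd = (ra + rb) & MASK32
    # Signed overflow: the true sum does not fit in 32 signed bits.
    ov = int(to_signed(ra) + to_signed(rb) != to_signed(rd))
    so = so | ov  # summary overflow is sticky
    s = to_signed(rd)
    cr0 = (int(s < 0), int(s > 0), int(s == 0), so)
    return rd, ov, so, cr0

# 0x7FFFFFFF + 1 wraps to a negative value and reports overflow.
print(addo_dot(0x7FFFFFFF, 1))
```

Dropping the `o` corresponds to ignoring `ov` and `so`, and dropping the period corresponds to ignoring `cr0`.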

I don't know what they were thinking, using an easily-overlooked mark of punctuation to carry important information.

There is also a version of the above instruction that takes a signed 16-bit immediate:

    addi    rd, ra/0, imm16 ; rd = ra/0 + (int16_t)imm16

Note that this variant does not accept o or . suffixes.

The ra/0 notation means "This can be any general purpose register, but if you ask for r0, you actually get the constant zero." The register r0 is weird like that. Sometimes it stands for itself, but sometimes it reads as zero. As a result, the r0 register isn't used much.

The assembler lets you write r0 through r31 as synonyms for the integers 0 through 31, so the following are equivalent:

    add     r3, r0, r4      ; r3 = r0 + r4
    add      3,  0,  4      ; r3 = r0 + r4
    add     r3, r0,  4      ; r3 = r0 + r4

This can get very confusing. That last example sure looks like you're setting r3 to r0 plus 4, but it's not. The 4 is in a position where a register is expected, so it actually means r4.

Similarly, you might think you're adding an immediate to r0 when you write

    addi    r3, r0, 256     ; r3 = r0 + 256, right?

but nope, the value of 0 as the second operand to addi is interpreted as the constant zero, not register number zero.
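
A small Python model may help here. It is only a sketch: the register file is a plain list, and `addi` and `sign_extend16` are illustrative helpers rather than anything from a real toolchain.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate field."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def addi(regs, rd, ra, imm16):
    """Model addi rd, ra/0, imm16: register number 0 in the ra slot reads as zero."""
    base = 0 if ra == 0 else regs[ra]
    regs[rd] = (base + sign_extend16(imm16)) & MASK32

regs = [0] * 32
regs[0] = 1000
regs[4] = 1000
addi(regs, 3, 0, 256)   # addi r3, r0, 256
print(regs[3])          # 256, not 1256
addi(regs, 3, 4, 256)   # addi r3, r4, 256
print(regs[3])          # 1256
```

The value sitting in r0 never enters the calculation in the first call, even though the instruction names it.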

Fortunately, the Windows disassembler always calls registers by their mnemonic rather than by number.

Wait, we're not done with addition yet.

    ; add and set carry
    addc    rd, ra, rb      ; rd = ra + rb, update carry
    addc.   rd, ra, rb      ; rd = ra + rb, update carry and cr0
    addco   rd, ra, rb      ; rd = ra + rb, update carry         and XER overflow bits
    addco.  rd, ra, rb      ; rd = ra + rb, update carry and cr0 and XER overflow bits

The "add and set carry" instructions act like the corresponding regular add instructions, except that they also update the carry bit in xer based on whether a carry propagated out of the highest-order bit.

    ; add extended
    adde    rd, ra, rb      ; rd = ra + rb + carry, update carry
    adde.   rd, ra, rb      ; rd = ra + rb + carry, update carry and cr0
    addeo   rd, ra, rb      ; rd = ra + rb + carry, update carry         and XER overflow bits
    addeo.  rd, ra, rb      ; rd = ra + rb + carry, update carry and cr0 and XER overflow bits

The "add extended" instructions act like the corresponding "add and set carry" instructions, except that they also add 1 if the carry bit was set. This makes multiword addition convenient.
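
As an illustration of the multiword case, here is a Python sketch of a 64-bit addition built from one `addc` and one `adde`. The functions model only the result and the carry bit, and the names are chosen for this example.

```python
MASK32 = 0xFFFFFFFF

def addc(ra, rb):
    """Model addc: returns (rd, carry)."""
    total = ra + rb
    return total & MASK32, total >> 32

def adde(ra, rb, carry):
    """Model adde: returns (rd, carry)."""
    total = ra + rb + carry
    return total & MASK32, total >> 32

def add64(a, b):
    """Add two 64-bit values held as 32-bit halves."""
    lo, carry = addc(a & MASK32, b & MASK32)    # addc lo, alo, blo
    hi, carry = adde(a >> 32, b >> 32, carry)   # adde hi, ahi, bhi
    return (hi << 32) | lo

print(hex(add64(0x00000001FFFFFFFF, 1)))  # 0x200000000
```

Wider integers follow the same pattern: one `addc` for the lowest word, then an `adde` for each word above it.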

    ; add minus one extended
    addme   rd, ra          ; rd = ra + carry + ~0, update carry
    addme.  rd, ra          ; rd = ra + carry + ~0, update carry and cr0
    addmeo  rd, ra          ; rd = ra + carry + ~0, update carry         and XER overflow bits
    addmeo. rd, ra          ; rd = ra + carry + ~0, update carry and cr0 and XER overflow bits

The "add minus one extended" instruction is like "add extended" except that the second parameter is hard-coded to −1. I wrote ~0 instead of −1 to emphasize that we are using true carry. (This is the combined addition-and-subtraction instruction I alluded to at the top of the article. It adds carry and then subtracts one.) Added: As commenter Neil noted below, through the magic of true carry, this is the same as "subtract zero extended", which makes it handy for multiword arithmetic.
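
To see why that works, here is a Python sketch that subtracts an unsigned 32-bit value from a 64-bit value: `subfc` handles the low word, and `addme` applies the borrow to the high word. The helper names and the splitting into halves are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF

def subfc(ra, rb):
    """Model subfc: rd = ~ra + rb + 1 (that is, rb - ra); returns (rd, carry)."""
    total = (~ra & MASK32) + rb + 1
    return total & MASK32, total >> 32

def addme(ra, carry):
    """Model addme: rd = ra + carry + ~0; returns (rd, carry)."""
    total = ra + carry + MASK32
    return total & MASK32, total >> 32

def sub64_32(a, b):
    """Subtract an unsigned 32-bit b from a 64-bit a held as two halves."""
    lo, carry = subfc(b, a & MASK32)    # subfc lo, b, alo
    hi, carry = addme(a >> 32, carry)   # addme hi, ahi
    return (hi << 32) | lo

print(hex(sub64_32(0x100000000, 1)))  # 0xffffffff
```

When the low word does not borrow, the carry is 1 and it cancels the −1, leaving the high word unchanged. When it does borrow, the carry is 0 and the high word drops by one.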

    ; add zero extended
    addze   rd, ra          ; rd = ra + carry, update carry
    addze.  rd, ra          ; rd = ra + carry, update carry and cr0
    addzeo  rd, ra          ; rd = ra + carry, update carry         and XER overflow bits
    addzeo. rd, ra          ; rd = ra + carry, update carry and cr0 and XER overflow bits

The "add zero extended" instruction is like "add extended" except that the second parameter is hard-coded to zero.
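
One plausible use, sketched in Python with made-up helper names, is adding an unsigned 32-bit value to a 64-bit value: the low words are added with `addc`, and `addze` propagates the carry into the high word.

```python
MASK32 = 0xFFFFFFFF

def addc(ra, rb):
    """Model addc: returns (rd, carry)."""
    total = ra + rb
    return total & MASK32, total >> 32

def addze(ra, carry):
    """Model addze: rd = ra + carry; returns (rd, carry)."""
    total = ra + carry
    return total & MASK32, total >> 32

def add64_32(a, b):
    """Add an unsigned 32-bit b to a 64-bit a held as two halves."""
    lo, carry = addc(a & MASK32, b)     # addc  lo, alo, b
    hi, carry = addze(a >> 32, carry)   # addze hi, ahi
    return (hi << 32) | lo

print(hex(add64_32(0xFFFFFFFF, 1)))  # 0x100000000
```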

And then there are some instructions that take signed 16-bit immediates:

    ; add immediate shifted
    addis   rd, ra/0, imm16  ; rd = ra/0 + (imm16 << 16)

    ; add immediate and set carry
    addic   rd, ra, imm16    ; rd = ra + (int16_t)imm16, update carry

    ; add immediate and set carry and update cr0
    addic.  rd, ra, imm16    ; rd = ra + (int16_t)imm16, update carry and cr0
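
One consequence of the sign extension is worth sketching. If you pair `addis` with `addi` to add a full 32-bit constant, the upper half has to be adjusted whenever the lower half has its top bit set. The Python below is a minimal model under that assumption; the helpers take register values directly (ignoring the ra/0 rule) and `add_const32` is a hypothetical name for this example.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate field."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def addi(ra, imm16):
    """Model addi on a register value: ra + (int16_t)imm16."""
    return (ra + sign_extend16(imm16)) & MASK32

def addis(ra, imm16):
    """Model addis on a register value: ra + (imm16 << 16)."""
    return (ra + ((imm16 & 0xFFFF) << 16)) & MASK32

def add_const32(ra, value):
    """Add an arbitrary 32-bit constant with addis followed by addi."""
    lo = value & 0xFFFF
    # addi sign-extends lo, so bump the upper half when bit 15 of lo is set.
    hi = ((value + 0x8000) >> 16) & 0xFFFF
    return addi(addis(ra, hi), lo)

print(hex(add_const32(0, 0x12348000)))  # 0x12348000
```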

Phew, that was addition. There are also subtraction instructions, which should look mostly familiar now that you've seen addition.

    ; subtract from
    subf    rd, ra, rb      ; rd = ~ra + rb + 1
    subf.   rd, ra, rb      ; rd = ~ra + rb + 1, update cr0
    subfo   rd, ra, rb      ; rd = ~ra + rb + 1, update         XER overflow bits
    subfo.  rd, ra, rb      ; rd = ~ra + rb + 1, update cr0 and XER overflow bits

    ; subtract from and set carry
    subfc   rd, ra, rb      ; rd = ~ra + rb + 1, update carry
    subfc.  rd, ra, rb      ; rd = ~ra + rb + 1, update carry and cr0
    subfco  rd, ra, rb      ; rd = ~ra + rb + 1, update carry         and XER overflow bits
    subfco. rd, ra, rb      ; rd = ~ra + rb + 1, update carry and cr0 and XER overflow bits

    ; subtract from extended
    subfe    rd, ra, rb     ; rd = ~ra + rb + carry, update carry
    subfe.   rd, ra, rb     ; rd = ~ra + rb + carry, update carry and cr0
    subfeo   rd, ra, rb     ; rd = ~ra + rb + carry, update carry         and XER overflow bits
    subfeo.  rd, ra, rb     ; rd = ~ra + rb + carry, update carry and cr0 and XER overflow bits

    ; subtract from minus one extended
    subfme   rd, ra         ; rd = ~ra + carry + ~0, update carry
    subfme.  rd, ra         ; rd = ~ra + carry + ~0, update carry and cr0
    subfmeo  rd, ra         ; rd = ~ra + carry + ~0, update carry         and XER overflow bits
    subfmeo. rd, ra         ; rd = ~ra + carry + ~0, update carry and cr0 and XER overflow bits

    ; subtract from zero extended
    subfze   rd, ra         ; rd = ~ra + carry, update carry
    subfze.  rd, ra         ; rd = ~ra + carry, update carry and cr0
    subfzeo  rd, ra         ; rd = ~ra + carry, update carry         and XER overflow bits
    subfzeo. rd, ra         ; rd = ~ra + carry, update carry and cr0 and XER overflow bits

    ; subtract from immediate and set carry
    subfic  rd, ra, imm16   ; rd = ~ra + (int16_t)imm16 + 1, update carry

Note that the instruction is "subtract from", not "subtract". The second operand is subtracted from the third operand; in other words, the two operands are backwards. Fortunately, the assembler provides a family of synthetic instructions that simply swap the last two operands:

    subf    rd, rb, ra      ; sub  rd, ra, rb
    ; similarly "sub.", "subo", and "subo.".

    subfc   rd, rb, ra      ; subc rd, ra, rb
    ; similarly "subc.", "subco", and "subco.".
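
The backward operand order is easy to get wrong, so here is a Python sketch of a 64-bit subtraction a − b built from `subfc` and `subfe`. Note that the subtrahend goes in the middle position. The helper names are made up for this example, and only the result and the carry bit are modeled.

```python
MASK32 = 0xFFFFFFFF

def subfc(ra, rb):
    """Model subfc: rd = ~ra + rb + 1 (rb - ra); returns (rd, carry)."""
    total = (~ra & MASK32) + rb + 1
    return total & MASK32, total >> 32

def subfe(ra, rb, carry):
    """Model subfe: rd = ~ra + rb + carry; returns (rd, carry)."""
    total = (~ra & MASK32) + rb + carry
    return total & MASK32, total >> 32

def sub64(a, b):
    """Compute a - b on 64-bit values held as 32-bit halves."""
    lo, carry = subfc(b & MASK32, a & MASK32)     # subfc lo, blo, alo
    hi, carry = subfe(b >> 32, a >> 32, carry)    # subfe hi, bhi, ahi
    return (hi << 32) | lo

print(hex(sub64(0x100000000, 1)))  # 0xffffffff
```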

A second problem is that there is no subfis to subtract a shifted immediate, nor is there subfic. to update flags after subtracting from an immediate. But the assembler can synthesize those too:

    addi    rd, ra/0, -imm16 ; subi   rd, ra/0, imm16
    addis   rd, ra/0, -imm16 ; subis  rd, ra/0, imm16
    addic   rd, ra, -imm16   ; subic  rd, ra, imm16
    addic.  rd, ra, -imm16   ; subic. rd, ra, imm16

PowerPC's use of true carry allows this trick to work while still preserving the semantics of carry and overflow.
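
Here is a rough Python check of that claim for `subic`, comparing "add the negated immediate" against a reference subtraction computed as ra + ~imm + 1. The function names are invented for this sketch, and immediates are passed already sign-extended. In this model the two agree on both result and carry for every nonzero immediate. An immediate of zero is the one case where the carries differ, because adding zero never carries.

```python
MASK32 = 0xFFFFFFFF

def addic(ra, imm):
    """Model addic with an already sign-extended immediate; returns (rd, carry)."""
    total = ra + (imm & MASK32)
    return total & MASK32, total >> 32

def sub_true_carry(ra, imm):
    """Reference: ra - imm computed as ra + ~imm + 1; returns (rd, carry)."""
    total = ra + (~imm & MASK32) + 1
    return total & MASK32, total >> 32

def subic(ra, imm):
    """The synthetic subic rd, ra, imm is just addic rd, ra, -imm."""
    return addic(ra, -imm)

# Both forms report carry = 1 (no borrow) for 5 - 3.
print(subic(5, 3), sub_true_carry(5, 3))
```

With a borrow-style flag, the addition and the subtraction would set the flag in opposite senses, so the substitution would not be transparent.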

We wrap up with multiplication and division.

    ; multiply low immediate
    mulli   rd, ra, imm16    ; rd = (int32_t)ra * (int16_t)imm16

    ; multiply low word
    mullw   rd, ra, rb       ; rd = (int32_t)ra * (int32_t)rb
    ; also "mullw.", "mullwo", and "mullwo.".

    ; multiply high word
    mulhw   rd, ra, rb       ; rd = ((int32_t)ra * (int32_t)rb) >> 32
    ; also "mulhw."

    ; multiply high word unsigned
    mulhwu  rd, ra, rb       ; rd = ((uint32_t)ra * (uint32_t)rb) >> 32
    ; also "mulhwu."

The "multiply low" instructions perform the multiplication and return the low-order 32 bits. The "multiply high" instructions return the high-order 32 bits.
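
In Python terms, where integers are unbounded, the three flavors can be sketched like this. The function names mirror the mnemonics but are just models for illustration, and they ignore the cr0 and overflow variants.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def mullw(ra, rb):
    """Low 32 bits of the product (same bits for signed and unsigned)."""
    return (to_signed(ra) * to_signed(rb)) & MASK32

def mulhw(ra, rb):
    """High 32 bits of the signed 64-bit product."""
    return ((to_signed(ra) * to_signed(rb)) >> 32) & MASK32

def mulhwu(ra, rb):
    """High 32 bits of the unsigned 64-bit product."""
    return (ra * rb) >> 32

# 0xFFFFFFFF is -1 when signed but 4294967295 when unsigned.
print(hex(mullw(0xFFFFFFFF, 2)), hex(mulhw(0xFFFFFFFF, 2)), hex(mulhwu(0xFFFFFFFF, 2)))
```

The low word comes out the same either way, which is presumably why there is no unsigned version of `mullw`. Only the high word depends on signedness.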

Finally, we have division:

    ; divide word
    divw    rd, ra, rb       ; rd = (int32_t)ra ÷ (int32_t)rb
    ; also "divw.", "divwo", and "divwo.".

    ; divide word unsigned
    divwu   rd, ra, rb       ; rd = (uint32_t)ra ÷ (uint32_t)rb
    ; also "divwu.", "divwuo", and "divwuo.".

If you try to divide by zero or (for divw) if you try to divide 0x80000000 by −1, then the results are garbage, and if you used the o version of the instruction, then the overflow flag is set. No trap is generated. (If you didn't use the o version, then you get no indication that anything went wrong. You just get garbage.)
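
A Python sketch of the signed case follows. It is only a model: it returns `None` to stand in for the garbage result, and the name `divwo` and the return shape are assumptions made for this example. Outside the two error cases, it assumes the quotient is truncated toward zero.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def divwo(ra, rb):
    """Model divwo: returns (rd, ov); rd is None where the result is garbage."""
    a, b = to_signed(ra), to_signed(rb)
    if b == 0 or (a == -0x80000000 and b == -1):
        return None, 1
    q = abs(a) // abs(b)   # truncate toward zero
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32, 0

print(divwo(7, 0))                    # (None, 1)
print(divwo(0x80000000, 0xFFFFFFFF))  # (None, 1)
```

If you use plain `divw` and care about these cases, you have to test the operands yourself before dividing.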

There is no modulus instruction. If you want to get the remainder, take the quotient, multiply it by the divisor, and subtract it from the dividend.
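
Spelled out as a Python sketch, with each step annotated with the instruction it stands for. The helpers are simplified models written for this example, and they assume the division is one of the well-defined cases.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def divw(ra, rb):
    """Signed divide, truncating toward zero (well-defined inputs only)."""
    a, b = to_signed(ra), to_signed(rb)
    q = abs(a) // abs(b)
    return (-q if (a < 0) != (b < 0) else q) & MASK32

def mullw(ra, rb):
    """Low 32 bits of the product."""
    return (to_signed(ra) * to_signed(rb)) & MASK32

def subf(ra, rb):
    """rd = ~ra + rb + 1, that is, rb - ra."""
    return ((~ra & MASK32) + rb + 1) & MASK32

def remainder(ra, rb):
    q = divw(ra, rb)      # divw  rq, ra, rb
    p = mullw(q, rb)      # mullw rq, rq, rb
    return subf(p, ra)    # subf  rd, rq, ra  (rd = ra - rq)

print(remainder(7, 2))               # 1
print(hex(remainder(0xFFFFFFF9, 2))) # 0xffffffff, which is -1
```

Because the quotient truncates toward zero, the remainder takes the sign of the dividend.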

Okay, that was arithmetic. Next up are the bitwise logical operators and combining arithmetic and logical operators to load constants.

Bonus snark: For a reduced instruction set computer, it sure has an awful lot of instructions. And we haven't even gotten to control flow yet.

Comments (22)

  1. Antonio Rodríguez says:

    In the mid-to-late 90s we entered the post-RISC epoch. RISC architectures got more complex while CISC ones migrated to RISC back ends with CISC-to-RISC translation units as front ends (if you read documentation about the Pentium Pro or any newer x86/x64 architecture, make a mental s/micro-op/RISC instruction/). And both introduced advanced techniques such as out-of-order execution, register renaming, branch prediction and speculative execution at about the same time. In the end, the biggest architectural difference between both families is the presence (or absence) of the translation unit.

    1. kantos says:

      Actually not even that, many ‘RISC’ machines have translation units so they can run different internal microarchitectures.

      1. Tanveer Badar says:

        All right, they might have translation units but the intention is completely different. In an x86-like CISC architecture, the translation unit’s job is to punch out simpler micro-instructions which are easier to implement, pipeline, etc. In processors like MIPS, the translation unit provides architectural emulation and sometimes bug-for-bug compatibility.

  2. Yukkuri says:

    If they’re going to do that with r0 they should have gone all the way and hardwired r0 to 0

    1. HiTechHiTouch says:

      The IBM way is/was to pass function return values in R0. Thus R0 was kept accessible for holding stuff.

      The traditional rule is that when address arithmetic (calculation) is involved, R0=constant zero=omitted from the calculation. Zero in the register field of instructions that directly use the ALU (arithmetic, logical, compare, etc.) refer to the register contents. In other instructions, a zero in the register field could be used for signaling instruction variations.

      Note that I’m speaking about architectures previous to PowerPC which influenced the designers.

      1. Peeter Joot says:

        A commenter wrote: “The IBM way is/was to pass function return values in R0.”

        That isn’t the case on AIX. function return values (and the first param) are always in r3.

  3. Tanveer Badar says:

    Raymond, reduced in “RISC” means reduced complexity not reduced number of instructions. If that were the case, most CISC ISA would be called RISC instead.

    I hope your bonus chatter was intentional and you are checking how attentive your readers are.

  4. Evan says:

    > The second operand is subtracted from the third operand; in other words, the two operands are backwards. [and other quotes from previous articles]

    Sometimes I wonder if people who design assembly languages deliberately make them awful because it makes them feel more like a Real Programmer.

    <flamebait> See also: AT&T syntax for x86 </flamebait> :-)

    1. Fabian Giesen says:

      The “subtract from” thing is actually there for a reason, namely the immediate variants! Note that “addi” has a signed immediate. There’s no need to have a “subtract immediate” instruction because to say subtract 123 from r0, you can just add -123 to it.

      However, the variant you can’t do with a signed “add immediate” is ” – register”. Which is, conveniently, exactly what “subfi” does. For register-register variants, it doesn’t really matter which convention you choose (regular sub is “a + ~b + 1”, subtract-from is “~a + b + 1”), but IBM picked the one that makes the immediate forms more useful, while also keeping the instruction encoding regular. PPC is one of relatively few architectures that let you do this in one instruction; on most, you need two instructions (either a load immediate followed by a register-register subtract, or a negate followed by an add-immediate). The only other example I can think of off the top of my head is 32-bit ARM which has “rsb” (reverse subtract). (The 64-bit A64 encoding removes it.)

      1. Evan says:

        I’m not talking about the actual operations that are accessible or what the processor does, and don’t dispute that something like “subfi” is useful in a semantic sense. I’m just talking about the concrete assembler syntax.

        In the case of the sub instructions, the question would be why the ‘subf’ form are the “native” instructions while the ‘sub’ forms are the synthetic instructions. But that’s just the most recent example. Yesterday there was the “only a processor’s mother could love” comment about crand 4*cr3+eq, 4*cr2+lt, 4*cr6+gt vs crand 14, 8, 25, but even the first of those is pretty terrible compared to something with a more “native” syntactic support in the assembler. Even the fact that, in “native” syntax, you can’t tell immediates from register names without knowing what the instruction does because “r3” is just written “3”.

        And you may say this doesn’t matter because you can just write the asm in the way you want with macros and using the sub synthetic instructions instead of the “native” subf, but I think that this is a woefully incomplete solution. It means that all code you might want to read should be written that way, but in reality that won’t be true. I’m probably biased by my own uses of assembly, but nearly all ways *I* use it are reading what is produced either by -S to the compiler or by a disassembler. And guess what it will produce? Not sub, but subf. GNU objdump does apparently produce by default the alias-based things like 4*cr3+eq, and uses “r” prefixes for registers; Raymond earlier wrote that the Windows disassembler substitutes some, but not all, of those symbolic names. (And like I said, even “4*cr3+eq” is pretty awful, especially if the assembler would accept a mistaken “cr3+eq” thing.) Apparently neither GCC nor Clang do either of these things with -S however, by default. (I didn’t play around with options.)

        You might argue that it’s closer to the actual instruction encoding, but I don’t think this is a great argument either. Again I’m probably biased by my own uses of assembly, but I (i) use asm a lot, especially if you include reading, and (ii) almost never care about the actual physical encoding in machine code. I suspect that’s pretty typical.

        1. Vas Crabb says:

          PowerPC is regular enough that you can disassemble a lot of it mentally (provided you can “see” bitfields in your head). You just need to remember the fields are 6-5-5-5-10-1 for most instructions. The three 5-bit fields line up with the three instruction operands. This gives you a bit of inconsistency like stores having the source register (officially the “target register”) on the left, and the “subtract from” syntax, but it makes dealing with the machine code by hand easier. Contrast this with SPARC where the order of operands in the assembly language differs from the order of fields in the instruction encoding. PowerPC is easier to deal with if you need to actually deal with the instruction encoding.

      2. Fabian Giesen says:

        This got a bit mangled due to careless markup on my side. It was intended to read “immediate – register”, but I foolishly put the word “immediate” in angle brackets. :)

      3. Someone says:

        “The “subtract from” thing is actually there for a reason.”

        The assembler syntax (in this case, the order of source and destination of the OP) should be consistent, regardless of how the bits in the actual instruction are assigned.

        For an extreme example of instruction encoding, see 4.6.12 T4, an ARM Thumb-2 instruction: http://read.pudn.com/downloads159/doc/709030/Thumb-2SupplementReferenceManual.pdf (taken from https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html). It’s the job of the assembler to put it all together in the right form and order.

  5. bbanelli says:

    Well, except “reduced” doesn’t really imply bare number of instructions, rather their complexity (or simplicity, that is), right?

    1. Wear says:

      One would assume that implementing 1 instruction is simpler than implementing 2 which in turn means that reduced complexity would imply reduced number of instructions.

      Also it’s Bonus snark which would imply it’s not serious.

  6. Vas Crabb says:

    Ah, the tricksy r0. The Apple assembler actually required you to use r0, r1, r2, etc. when referring to registers, and 0 for constant zero. It would throw an error if you wrote r0 when a zero in that field represents constant zero, and vice versa, to help avoid confusion. On the other hand, the IBM assembler doesn’t accept r0, r1, r2, etc. and requires you to just use numbers everywhere. You could make macros, but that wouldn’t get you the safety benefit.

    1. DWalker07 says:

      “The IBM assembler doesn’t accept r0, r1, etc.”

      Which IBM assembler are you talking about? I wrote a lot of code in IBM S/370 and S/390 assembler (Assembler G, Assembler H, etc.) for many years. We absolutely used R0, R1, etc. in our code.

      1. DWalker07 says:

        Of course, you might have been talking about “the IBM assembler for PowerPC”. In which case you may well be right.

        1. Vas Crabb says:

          Yeah, I was talking about the IBM POWER/PowerPC assembler for AIX on RS/6000 (and similarly the Apple PowerPC assembler) back in the ’90s. It’s a bit of a weird implementation choice given they already allowed rN syntax on S/390 etc.

  7. Neil says:

    Subtract from? Are these guys looking through microcode-coloured glasses?

    I tried to work out what the difference between “add minus one extended” and a hypothetical “subtract zero extended” (which is the operation I’d want to use) but I can’t tell whether the carry semantics are the same.

    1. Vas Crabb says:

      Yes, that’s the whole point of it. It’s to apply the borrow when you want to subtract a 32-bit value from a 64-bit value.

  8. Someone says:

    Even today, is there any compiler that can use that many instruction variations for such simple operations as integer addition/subtraction?

    Also: Did they cut the more useful features because they wasted too many opcodes on every thinkable variation of things like that?
