The Alpha AXP has no corresponding trap variant for arithmetic carry. So how would you detect carry?¹

Answer: The same way you detect carry in C, or pretty much any other programming language that doesn't support carry.

To detect carry during addition, you check whether the sum is less than either addend. If the sum is less than one addend, then it will also be less than the other addend, so use whichever addend is most convenient.

    ; Rc = Ra + Rb, with Rd receiving carry
    ; Assumes Rc is not the same as Ra
    ADDx    Ra, Rb, Rc      ; Rc = Ra + Rb
    CMPULT  Rc, Ra, Rd      ; Rd = carry (sum < addend)

    ; Rc = Ra + Rb, with Rd receiving carry
    ; Assumes Rc is not the same as Rb
    ADDx    Ra, Rb, Rc      ; Rc = Ra + Rb
    CMPULT  Rc, Rb, Rd      ; Rd = carry (sum < addend)

    ; Rc = Rc + Rc, with Rd receiving carry
    ; Assumes Rd is distinct from Rc
    BIS     Rc, Rc, Rd      ; Rd = Rc
    ADDx    Rc, Rc, Rc      ; Rc = Rc + Rc
    CMPULT  Rc, Rd, Rd      ; Rd = carry (sum < original value)

The last case is where the output overwrites both inputs,
so we have to stash one of the inputs in `Rd`
so we can compare it to the result afterwards.
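For comparison, here is a minimal C sketch of the same addition check. The function names (`add_with_carry`, `double_with_carry`, `demo_add_with_carry`) are made up for illustration, and the sketch assumes 64-bit unsigned operands, for which C defines wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Adds a and b modulo 2^64 and reports the carry out of bit 63. */
uint64_t add_with_carry(uint64_t a, uint64_t b, unsigned *carry)
{
    uint64_t sum = a + b;   /* unsigned addition wraps; this is well-defined */
    *carry = sum < a;       /* carry happened iff the sum is below an addend */
    return sum;
}

/* In-place doubling: the input is overwritten, so stash it first. */
uint64_t double_with_carry(uint64_t x, unsigned *carry)
{
    uint64_t saved = x;     /* plays the role of the scratch register */
    x += x;
    *carry = x < saved;
    return x;
}

/* Hypothetical usage: a few spot checks. */
void demo_add_with_carry(void)
{
    unsigned c;
    assert(add_with_carry(1, 2, &c) == 3 && c == 0);
    assert(add_with_carry(UINT64_MAX, 1, &c) == 0 && c == 1);
    assert(double_with_carry(UINT64_C(1) << 63, &c) == 0 && c == 1);
}
```

As in the assembly, the only subtlety is keeping one original addend alive until after the comparison.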

To detect borrow during subtraction, you check whether the subtrahend is greater than the minuend.

    ; Rc = Ra - Rb, with Rd receiving borrow
    ; Assumes Rd is distinct from both inputs
    CMPULT  Ra, Rb, Rd      ; Rd = borrow
    SUBx    Ra, Rb, Rc      ; Rc = Ra - Rb
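A rough C equivalent of the borrow check, again with an illustrative function name (`sub_with_borrow`) and assuming 64-bit unsigned operands:

```c
#include <assert.h>
#include <stdint.h>

/* Subtracts b from a modulo 2^64 and reports whether a borrow occurred. */
uint64_t sub_with_borrow(uint64_t a, uint64_t b, unsigned *borrow)
{
    *borrow = a < b;    /* decided from the inputs alone, before the subtraction */
    return a - b;       /* unsigned subtraction wraps; this is well-defined */
}

/* Hypothetical usage: a few spot checks. */
void demo_sub_with_borrow(void)
{
    unsigned br;
    assert(sub_with_borrow(5, 3, &br) == 2 && br == 0);
    assert(sub_with_borrow(3, 5, &br) == UINT64_MAX - 1 && br == 1);
}
```

Unlike the addition case, the borrow depends only on the two inputs, so it can be computed before the result exists.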

To detect carry during multiplication, you capture the upper bits of the extended result.

    ; Rc = Ra *U Rb, with Rd receiving carry; 32-bit multiply
    ZAPNOT  Ra, #15, Ra     ; zero-extend Ra from 32 to 64 bits
    ZAPNOT  Rb, #15, Rb     ; zero-extend Rb from 32 to 64 bits
    MULQ    Ra, Rb, Rc      ; Rc = Ra *U Rb (64-bit multiply)
    SRA     Rc, #32, Rd     ; Rd = excess to carry forward
    ADDL    Rc, zero, Rc    ; Convert Rc to canonical form

    ; Rc = Ra *U Rb, with Rd receiving carry; 64-bit multiply
    ; Assumes Rd is distinct from both inputs
    UMULH   Ra, Rb, Rd      ; Rd = excess to carry forward
    MULQ    Ra, Rb, Rc      ; Rc = Ra *U Rb (64-bit multiply)
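The same idea can be sketched in C. The names `mul32_with_carry` and `mul64_high` are invented for this example. Standard C has no direct counterpart to `UMULH`, so the 64-bit version below builds the upper half from 32-bit partial products; treat it as one portable way to get that value, not as what the hardware does.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit multiply: widen to 64 bits, multiply, then split the product. */
uint32_t mul32_with_carry(uint32_t a, uint32_t b, uint32_t *carry)
{
    uint64_t product = (uint64_t)a * b;   /* cannot overflow 64 bits */
    *carry = (uint32_t)(product >> 32);   /* upper half is the excess */
    return (uint32_t)product;             /* lower half is the result */
}

/* Upper 64 bits of the 128-bit product a * b. */
uint64_t mul64_high(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Middle column: three values below 2^32 each, so no 64-bit overflow. */
    uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

/* Hypothetical usage: a few spot checks. */
void demo_mul_with_carry(void)
{
    uint32_t c;
    assert(mul32_with_carry(0x10000u, 0x10000u, &c) == 0 && c == 1);
    assert(mul64_high(UINT64_C(1) << 32, UINT64_C(1) << 32) == 1);
    assert(mul64_high(UINT64_MAX, UINT64_MAX) == UINT64_MAX - 1);
}
```

The low 64 bits of the 64-bit product are simply `a * b` in unsigned arithmetic, which corresponds to the `MULQ` step.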

In the subtraction and multiplication sequences above,
you can elide the final instruction if `Rd`
is identical to `Rc`.
(In other words, if you care only about the carry and not the arithmetic
result.)

**Exercise**:
Why did I sometimes calculate `Rd` early
and sometimes late?

**Exercise 2**:
Why didn't I have to convert `Rd` to canonical form
at the end of the 32-bit multiply?

¹ The Itanium processor also doesn't have a flags register, but nobody seemed to be upset that it didn't provide a way to detect arithmetic carry or overflow.

While I understand why Intel went the way they did originally, having a flags register is really convenient for assembly programmers. I’m kinda surprised they didn’t introduce three-operand versions of the common instructions to support this kind of logic to remove the dependencies.

On the 6502, two of the four arithmetic bits of the status register can be determined statically by looking at the result (S, the sign flag, which copies bit 7; and Z, the zero flag, which is set if the result is zero). The other two, V and C, are carry flags (which get the same value for additions, and opposite values for subtraction) and can be derived from the result using the algorithms explained in this article. For logical and shift/rotate operations, all four bits can be determined statically by looking at the result or the operands (Z is set if the result is zero; C gets the discarded bit in shifts/rotates, zero otherwise; S gets a copy of bit 7 of the result; and V, strangely, gets a copy of bit 6). If you look at other CISC processors (like the 8086 or the Z80), you’ll find that they handle things roughly the same way (except for the queer bit V, of course).

With that in mind, having arithmetic flags is convenient (it makes the code more compact), but completely unnecessary if you can do conditional branches on the status of an arbitrary register. It’s natural that CISC processors choose to use them, and RISC processors choose to go without them.

Most RISC processors choose to have an integer condition code register though. The trick is they generally provide instruction variants that don’t update the flags register, so you can avoid interlocks unless you actually care about the flags from a particular operation. PowerPC has eight integer condition code registers so you can pre-calculate several comparison results. PowerPC also doesn’t keep integer carry in the condition code registers; it’s stored in the fixed-point exception register (XER), and very few instructions actually set it. You need to use a special addition instruction if you want to generate carry for use in an extended-precision operation.

The classic RISCs don’t really agree much on that particular point. MIPS and Alpha forgo condition codes entirely. SPARC and ARM have a single condition code, but require you to opt in on instructions to update it. PA-RISC has multiple sets of carry/borrow bits in a status register but no condition code register (and various types of conditional moves, conditional branches based on comparisons, arithmetic-then-branch instructions, and skipping instructions based on conditions). PowerPC has eight 4-bit condition code registers with carry/borrow handled separately as you note. PPC also throws in boolean operations on condition registers, which are not as useful as one would hope, because most programming languages require short-circuit evaluation of conditionals. For example, “if (x && widget->field == 1)” can’t generally use PPC “crand”, because “widget” might be an invalid pointer when x is false.

Architecturally, not having any condition code register at all is definitely simplest.

For in-order implementations, having a few sets of results (like PA-RISC’s multiple carries or PPC’s eight condition registers) gives compilers more flexibility in instruction scheduling. The flip side is that this complicates out-of-order implementations, which now need to perform register renaming on the condition codes/flags as well. It’s easier to rename a single condition code register with dedicated logic than it is to keep track of several of them.

Instructions that produce condition codes and always write all flags or none of them are fairly easy to handle. What sucks for out-of-order is when some instructions update only some flag bits. 32-bit ARM is a case in point: some instructions write NZCV (e.g. ADDS), some NZC (ANDS with shifted register operand), some NZ (ANDS with immediate operand). This requires either executing all such instructions in order and merging flags, or renaming NZ, C, and V separately. That’s relatively complicated behavior for an architecture generally considered to be RISC. (It’s also fixed in the A64 ISA.)

Exercise:

Does the chosen calculation order improve performance on the 21064?

Exercise 2:

The right arithmetic shift (SRA) replicates the sign bit across the upper 32 bits.

Nope, nothing to do with 21064. It’s a correctness issue, and all the information you need should be covered in earlier installments.

In C and C++, signed overflow is undefined. That means that you shouldn’t test if a+b<a to detect overflow. The compiler is allowed to optimize that to b<0. If Alpha had unsigned arithmetic, then a comparison to C would be more appropriate.

All of the operations are on unsigned integers.

`a + b < a` doesn’t detect overflow if the operands are signed; it detects whether `b` is negative *or* the addition overflowed. This article is about carry using unsigned operands, not overflow on signed operands.
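To make the distinction in this exchange concrete, here is a small C sketch. The function names are invented for the example. The signed version uses one common approach, testing against `INT_MAX` and `INT_MIN` before adding so that no overflowing signed addition is ever executed. That is a different technique from the one in the article, shown only for contrast.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned carry: wraparound is defined, so comparing after the add is fine. */
bool unsigned_add_carries(unsigned a, unsigned b)
{
    return a + b < a;
}

/* Signed overflow: decide before adding, using only non-overflowing math. */
bool signed_add_overflows(int a, int b)
{
    if (b > 0) return a > INT_MAX - b;   /* would exceed INT_MAX */
    if (b < 0) return a < INT_MIN - b;   /* would fall below INT_MIN */
    return false;                        /* adding zero never overflows */
}

/* Hypothetical usage: a few spot checks. */
void demo_carry_vs_overflow(void)
{
    assert(unsigned_add_carries(UINT_MAX, 1u));
    assert(!unsigned_add_carries(1u, 2u));
    assert(signed_add_overflows(INT_MAX, 1));
    assert(!signed_add_overflows(-1, 1));
}
```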

No one has gotten the exercises yet, I see….

David Bremner got exercise 2. The answer to the first exercise is “Because Rc might be identical to Ra or Rb. We need to calculate carry after the registers containing the values we need are produced, but before they get overwritten.”