The Alpha AXP, part 2: Integer calculations

Here are some of the integer computational operations available on the Alpha AXP. I'm going to cover only the instructions used in general-purpose programming, since that's the sort of thing you're most likely to encounter when debugging everyday application code. In particular, I'm not going to cover the various multimedia instructions.

Integer arithmetic operations come in two flavors, one that operates on the full 64-bit register, and another that operates on the least significant 32 bits of the value. As I noted earlier, the general rule in the Alpha AXP is that if the result of an operation is a 32-bit value and the destination is a register, then the value is sign-extended to a 64-bit value. This means that if you use the 32-bit versions of these instructions, the results will be sign-extended to 64-bit values.

The general notation for calculations is to provide the source operands first, and the destination operand last.

    ADDQ    Ra, Rb, Rc  ; Rc = Ra + Rb
    ADDQ    Ra, #b, Rc  ; Rc = Ra + b

    ADDL    Ra, Rb, Rc  ; Rc = (int64_t)((int32_t)Ra + (int32_t)Rb)
    ADDL    Ra, #b, Rc  ; Rc = (int64_t)((int32_t)Ra +           b)

The ADD instruction has four variants. The 64-bit versions add the two source values and puts the result in the destination Rc register. The 32-bit versions add the least significant 32-bit values in the source registers, calculates a 32-bit result, and then sign extends that result to a 64-bit value, putting the final result in the Rc register. You can add two registers, or you can add a register and a small constant in the range 0 to 255.

In the future, I'm going to write x to mean "L or Q", and Rb/#b to mean "a register (Rb) or a small constant in the range 0 to 255."

    SUBx    Ra, Rb/#b, Rc  ; Rc = Ra - Rb
    MULx    Ra, Rb/#b, Rc  ; Rc = Ra * Rb
    UMULH   Ra, Rb/#b, Rc  ; Rc = (Ra *U Rb) >> 64

The SUB instructions perform subtraction, and the MUL instructions perform multiplication. The UMULH instruction performs a 64×64 unsigned multiplication, and stores the high 64 bits of the 128-bit intermediate result. (If you want the low 64 bits, then use the regular MULQ instruction.)

Note that there is no integer division operation. There are three common workarounds:

  • Use a helper function.
  • If dividing by a constant n, you may be able to use the UMULH instruction to multiply by (2⁶⁴÷n) and then extract the high 64 bits (which means to divide by 2⁶⁴).
  • Convert both values to floating point, perform a floating point division, and then convert the result back to an integer.

So hopefully you don't do a lot of integer division.

    S4ADDx  Ra, Rb/#b, Rc  ; Rc = Ra * 4 + Rb/#b
    S8ADDx  Ra, Rb/#b, Rc  ; Rc = Ra * 8 + Rb/#b

    S4SUBx  Ra, Rb/#b, Rc  ; Rc = Ra * 4 - Rb/#b
    S8SUBx  Ra, Rb/#b, Rc  ; Rc = Ra * 8 - Rb/#b

The scaled addition and subtraction instructions multiply Ra by 4 or 8 before adding or subtracting Rb/#b. These are commonly used to calculate effective addresses as part of an array indexing operation.

Next come the bit-twiddling instructions. Note that these instructions always operate on full 64-bit registers. (But if both inputs are in canonical form, then so too will the result.)

    AND     Ra, Rb/#b, Rc  ; Rc = Ra &  Rb/#b
    BIS     Ra, Rb/#b, Rc  ; Rc = Ra |  Rb/#b "bit set"
    XOR     Ra, Rb/#b, Rc  ; Rc = Ra ^  Rb/#b
    BIC     Ra, Rb/#b, Rc  ; Rc = Ra & ~Rb/#b "bit clear"
    ORNOT   Ra, Rb/#b, Rc  ; Rc = Ra | ~Rb/#b
    EQV     Ra, Rb/#b, Rc  ; Rc = Ra ^ ~Rb/#b "bit equivalence"

Officially, the C in BIC stands for "complement", but I find it easier to remember if I pretend that it stands for "clear", because it clears the bits in Ra as selected by Rb/#b. For example,

    BIC     t0, #3, t2     ; clear bottom two bits of t0

This takes the value in t0, clears the bottom two bits (#3), and puts the result into t2.

The EQV and ORNOT instructions are not widely used, but I included them for completeness.

There are three bit-shifting instructions.

    SLL     Ra, Rb/#b, Rc  ; Rc =           Ra << (Rb/#b % 64)
    SRL     Ra, Rb/#b, Rc  ; Rc = (uint64_t)Ra >> (Rb/#b % 64)
    SRA     Ra, Rb/#b, Rc  ; Rc = ( int64_t)Ra >> (Rb/#b % 64)

The right-shift has two variants, depending on whether you want the shifted value to be zero-filled (unsigned, or logical shift) or sign-filled (signed, or arithmetic shift). Note that there are no 32-bit versions of the bit shifting instructions. They always operate on the full 64-bit register.

There are some rarely-used computation instructions that I'm not going to go into, like "count number of leading zero bits" and all the multimedia instructions. There are also some other computation instructions that are closely related to other functions of the processor, so I'll defer those to the appropriate section. Next time, we'll look at memory access, including the computation instructions tailored to support memory operations.

Bonus chatter: There are a number of idioms that let you express other concepts in terms of the instructions above.

    BIS     zero, zero, zero    ; NOP (writes to zero are ignored)
    BIS     zero, zero, Rc      ; Set Rc to zero
    ADDL    zero, #b, Rc        ; Set Rc to a small constant
    SUBL    zero, #b, Rc        ; Set Rc to a small negative constant
    BIS     Ra, Ra, Rc          ; Copy Ra to Rc
    BIS     zero, Ra, Rc        ; Copy Ra to Rc
    BIS     Ra, zero, Rc        ; Copy Ra to Rc
    SUBx    zero, Ra, Rc        ; Rc = -Ra
    ORNOTx  zero, Ra, Rc        ; Rc = ~Ra
    ADDL    zero, Rb, Rc        ; Rc = (int64_t)(int32_t)Rb

Note that I gave three ways to copy one register to another. The first is the one recommended by DEC. The second is the one the Microsoft compiler generates. Windows NT requires that copying registers in function prologues and epilogues must be performed with one of the three given formats in order for the instruction to be unwindable.

I showed idioms for loading small positive and negative constants, but we'll see next time that there's something that works for medium-sized constants.

The last idiom is an important one because it forces a 32-bit value into canonical form. This is useful when there isn't a 32-bit version of the instruction you want, such as a shift instruction.

Comments (11)
  1. kantos says:

    odd that the compiler generates a form that DEC didn’t recommend. I know they go out of their way to do that sort of optimization on x86

  2. David Bremner says:

    Integer division by a constant can be converted to a reciprocal multiplication and a correction step (usually a shift or add). Division by Invariant Integers using Multiplication is a widely cited paper on this topic.

    1. Scarlet Manuka says:

      Presumably that would be the part in the article where he says:
      • If dividing by a constant n, you may be able to use the UMULH instruction to multiply by (2⁶⁴÷n) and then extract the high 64 bits (which means to divide by 2⁶⁴).

  3. Matteo Italia says:

    Ouch, the destination operand on the right!

  4. Joshua says:

    If it weren’t for the fact I can’t get a modern HTTPS stack I would consider running NT4 on Alpha for a web server and laugh as memory corruption exploits that should yield SYSTEM don’t because their shellcode is for the wrong processor.

  5. Simon Clarkstone says:

    Those logical operations look rather like a bitwise-logic unit that can switch between AND, OR, and XOR, plus an optional inversion on the right operand. That plus the choice of which way round the operands are gives you all of the bitwise operations except nand and nor.

    1. I like your observation so much I reordered the table to reflect it.

    2. laonianren says:

      The instruction encoding supports this. The function codes (using Raymond’s ordering) are:

      0x00 AND
      0x20 BIS
      0x40 XOR
      0x08 BIC
      0x28 ORNOT
      0x48 EQV

  6. cheong00 says:

    I wonder that since all registers are 64 bit, will there be exception/interrupt thrown when you add 2 32-bit values with ADDL and result in carry/overflow.

    If not, I think the “count number of leading zero bits” may in fact have some use.

  7. Thiago Macieira says:

    How does the compiler produce 32-bit logical right shifts? If loading 32-bit value from memory will sign extend the value to 64-bit, the upper half might contain 1 bits. Then, using SRL would shift in 1s instead of zeroes.

    To perform that, the compiler would need to zero out the upper half first. How does it do that optimally?

    I can think only of a very inefficient way with the instructions presented so far (zero a register, shift logically left 32, then AND with the actual value).

Comments are closed.

Skip to main content