The Alpha AXP, part 5: Conditional operations and control flow

The Alpha AXP has no flags register. Conditional operations are performed based on the current value of a general-purpose register. The conditions available on the Alpha AXP are the following:

EQ  if zero
NE  if not zero
GE  if signed greater than or equal to zero
GT  if signed greater than zero
LE  if signed less than or equal to zero
LT  if signed less than zero
LBC  if low bit clear (if even)
LBS  if low bit set (if odd)

In the discussion below, the abbreviation cc represents one of the above condition codes.

The conditional move instructions test a source register against a condition, and if the condition is true, the destination register receives the second source.

    CMOVcc  Ra, Rb/#b, Rc   ; if Ra meets condition, then Rc = Rb/#b
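
As an illustration, a conditional move gives a branch-free absolute value. This assumes the usual SUBQ subtract instruction (Rc = Ra - Rb). First negate the value into a scratch register, then copy the negated value back only if the original was negative. The Python below models that two-instruction sequence on 64-bit values. The register choices and the helper names `cmovlt` and `abs64` are hypothetical.

```python
MASK64 = (1 << 64) - 1

def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def cmovlt(ra: int, rb: int, rc: int) -> int:
    """CMOVLT Ra, Rb, Rc: Rc receives Rb only if Ra is negative (signed); returns new Rc."""
    return rb & MASK64 if to_signed64(ra) < 0 else rc & MASK64

def abs64(a0: int) -> int:
    """Model of:
        SUBQ    zero, a0, t0    ; t0 = -a0
        CMOVLT  a0, t0, a0      ; if a0 < 0 then a0 = t0
    """
    a0 &= MASK64
    t0 = (0 - a0) & MASK64
    return cmovlt(a0, t0, a0)
```

As with any two's-complement absolute value, the most negative 64-bit value maps to itself.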

You can also generate booleans from conditions. Note that the set of conditions here is not the same as the standard set of conditions above!

    CMPEQ   Ra, Rb/#b, Rc   ; Rc = (Ra == Rb/#b)
    CMPLT   Ra, Rb/#b, Rc   ; Rc = (Ra < Rb/#b) signed comparison
    CMPLE   Ra, Rb/#b, Rc   ; Rc = (Ra ≤ Rb/#b) signed comparison
    CMPULT  Ra, Rb/#b, Rc   ; Rc = (Ra < Rb/#b) unsigned comparison
    CMPULE  Ra, Rb/#b, Rc   ; Rc = (Ra ≤ Rb/#b) unsigned comparison

These comparison operators produce values of exactly 0 or 1, according to the result of the comparison, and the comparison is against the full 64-bit register value.
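
The following Python sketch models the five comparisons on 64-bit values. The function names are just the lowercased mnemonics, chosen for illustration. The sketch shows why the signed and unsigned forms differ. A register containing all one bits is -1 to CMPLT, but it is the largest possible value to CMPULT.

```python
MASK64 = (1 << 64) - 1

def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def cmpeq(ra: int, rb: int) -> int:
    return int((ra & MASK64) == (rb & MASK64))

def cmplt(ra: int, rb: int) -> int:   # signed
    return int(to_signed64(ra) < to_signed64(rb))

def cmple(ra: int, rb: int) -> int:   # signed
    return int(to_signed64(ra) <= to_signed64(rb))

def cmpult(ra: int, rb: int) -> int:  # unsigned
    return int((ra & MASK64) < (rb & MASK64))

def cmpule(ra: int, rb: int) -> int:  # unsigned
    return int((ra & MASK64) <= (rb & MASK64))

ALL_ONES = MASK64  # -1 as signed, 2**64 - 1 as unsigned
```

There is no greater-than form. When both operands are registers, a > b can be computed as CMPLT with the operands swapped, as in `cmplt(b, a)`. That trick does not work with a literal, because the literal can appear only in the second position.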

Conditional jump instructions provide a condition and a register, as well as a jump target.

    Bcc     Ra, destination

where cc is one of the condition codes above. The instruction tests the specified register against the condition, and if true, control is transferred to the destination. The test is against the full 64-bit register value, and the destination is encoded as a 21-bit value, in units of instructions (4 bytes), which provides a reach of ±4MB.
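
The ±4MB figure falls out of the arithmetic. The target is computed relative to the updated PC, which is the address of the branch plus 4. The sign-extended 21-bit field, multiplied by 4, is added to that value. Here is a quick Python check of the calculation. `branch_target` is a made-up helper, and addresses are treated as plain integers with no 64-bit wraparound.

```python
def branch_target(pc: int, disp21: int) -> int:
    """Target of a Bcc/BR/BSR at address pc, given the raw 21-bit displacement field."""
    if not 0 <= disp21 < (1 << 21):
        raise ValueError("displacement field must fit in 21 bits")
    if disp21 & (1 << 20):        # sign-extend the 21-bit field
        disp21 -= 1 << 21
    return pc + 4 + 4 * disp21    # relative to the updated PC, in 4-byte units

MAX_FORWARD = 4 * ((1 << 20) - 1)   # bytes beyond the updated PC
MAX_BACKWARD = 4 * (1 << 20)        # bytes before the updated PC
```

A field of all one bits (-1) branches to the branch instruction itself.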

Conditional branches backward are predicted taken. Conditional branches forward are predicted not taken.
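
A backward conditional branch usually closes a loop. Under this rule it is mispredicted only once per loop, when the loop exits. The toy Python below counts mispredictions under just the static rule stated above. It is not a model of any particular Alpha chip, and the (displacement, taken) event format is an assumption for illustration.

```python
from typing import Iterable, Tuple

def predict_taken(displacement: int) -> bool:
    """Static rule: backward (negative displacement) predicted taken, forward predicted not taken."""
    return displacement < 0

def count_mispredictions(events: Iterable[Tuple[int, bool]]) -> int:
    """events: pairs of (signed branch displacement, whether the branch was actually taken)."""
    return sum(1 for disp, taken in events if predict_taken(disp) != taken)

# A loop that runs 10 times: the backward branch at the bottom is taken 9 times,
# then falls through once when the loop ends.
loop_branch = [(-5, True)] * 9 + [(-5, False)]
# An error check whose forward branch is never taken in this run.
error_check = [(3, False)] * 10
```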

There are two types of unconditional branches. They are functionally the same but have different consequences for the return address predictor.

    BR      Ra, destination ; not expected to return
    BSR     Ra, destination ; expected to return

These instructions store the address of the subsequent instruction (the return address) in the Ra register and then transfer to the destination. The BR instruction does not push the return address onto the return address predictor stack; the BSR instruction does.

The BR instruction is typically used with zero as the register to receive the return address, since the value is almost always thrown away. (Recall that there is a special exemption for branch instructions to the usual rule that instructions which write to zero can be optimized away.)

The Win32 calling convention dictates that the ra register holds the return address on entry to a function.

There are four indirect jump instructions which are all functionally equivalent but differ in their effect on the return address predictor.

    JMP     Ra, (Rb), hint16    ; not expected to return
    JSR     Ra, (Rb), hint16    ; expected to return
    RET     Ra, (Rb), hint16    ; end of function
    JSR_CO  Ra, (Rb), hint16    ; coroutine

The Ra register receives the return address, typically zero in the case of JMP and RET, and conventionally ra in the case of JSR. As you have probably guessed, JMP has no effect on the return address predictor, JSR pushes the return address onto the predictor stack, and RET pops the return address off of the predictor stack and predicts a transfer to the popped value. The weird guy is JSR_CO, which replaces the return address at the top of the predictor stack with the new return address and predicts a transfer to the old value.
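
Here is a toy Python model of the predictor stack behavior just described. It treats BR like JMP and BSR like JSR, as explained earlier. The stack depth of 8 and the class and method names are assumptions for illustration; real implementations have their own fixed depth. The method returns a predicted target only when the stack supplies one.

```python
from typing import List, Optional

class ReturnAddressPredictor:
    """Toy model of a return-address predictor stack."""

    def __init__(self, depth: int = 8) -> None:
        self.depth = depth              # hypothetical depth
        self.stack: List[int] = []

    def _push(self, address: int) -> None:
        self.stack.append(address)
        if len(self.stack) > self.depth:
            self.stack.pop(0)           # the oldest entry is lost

    def on_jump(self, kind: str, return_address: int = 0) -> Optional[int]:
        """Update the stack for one jump; return the stack's predicted target, if any."""
        if kind in ("JMP", "BR"):
            return None                 # predictor untouched
        if kind in ("JSR", "BSR"):
            self._push(return_address)  # remember where the callee should return
            return None
        if kind == "RET":
            return self.stack.pop() if self.stack else None
        if kind == "JSR_CO":
            old = self.stack.pop() if self.stack else None
            self._push(return_address)  # replace the top entry with the new return address
            return old                  # and predict a transfer to the old one
        raise ValueError(f"unknown jump kind: {kind}")
```

In this model a JSR_CO leaves the stack depth unchanged. Two routines can therefore hand control back and forth without the stack growing.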

The official name of JSR_CO is JSR_COROUTINE, but it doesn't really matter because I have never seen JSR_CO in practice.

For the JMP and JSR instructions, the "hint" is a static prediction of the low 16 bits of the value in Rb.

The RET and JSR_CO instructions don't need a hint because they have their own return address predictor. However, DEC recommends that the hint for a RET instruction be 1 for a return from a procedure, and 0 otherwise. We'll see more about this another day.

The Microsoft compiler doesn't generate hints; it just sets the hint to zero. Profile-guided optimization didn't come to Visual C++ until after support for the Alpha AXP was dropped, but if it were still in support, I'm assuming that profile-guided optimization would have filled in the hint.

Non-virtual calls will look generally like this:

    ; Put the parameters in a0 through a5
    ; by whatever means appropriate.
    ; Excess parameters go on the stack.
    ; (Not shown here.)
    BIS     zero, s1, a0    ; copied from another register
    LDL     a1, 32(sp)      ; loaded from memory
    ADDL    zero, #1, a2    ; calculated in place

    BSR     ra, destination ; call the other function
    ; result is in the v0 register

Virtual calls load the destination from the target's vtable:

    ; Put the parameters in a0 through a5
    ; by whatever means appropriate.
    ; Excess parameters go on the stack.
    ; (Not shown here.)
    ; "this" goes into a0.
    BIS     zero, s1, a0    ; copied from another register
    LDL     a1, 32(sp)      ; loaded from memory
    ADDL    zero, #1, a2    ; calculated in place

    LDL     t0, (a0)        ; load vtable
    LDL     t0, 8(t0)       ; load function from vtable
    JSR     ra, (t0)        ; call the function pointer
    ; result is in the v0 register

Calls to exported functions are indirect through a global variable, which means we need to get the address of that global.

    ; Put the parameters in a0 through a5
    ; by whatever means appropriate.
    ; Excess parameters go on the stack.
    ; (Not shown here.)
    BIS     zero, s1, a0    ; copied from another register
    LDL     a1, 32(sp)      ; loaded from memory
    ADDL    zero, #1, a2    ; calculated in place

    LDAH    t0, xxxx(zero)  ; 64KB block where global variable resides
    LDL     t0, yyyy(t0)    ; load the global variable
    JSR     ra, (t0)        ; call the function pointer
    ; result is in the v0 register
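
The xxxx and yyyy values take a little care. LDAH adds its sign-extended displacement multiplied by 65536 to the base register. LDL then adds its own sign-extended displacement. Because yyyy is sign-extended, xxxx cannot simply be the top 16 bits of the address. When bit 15 of the address is set, yyyy acts as a negative offset, so xxxx must be one larger to compensate. The Python sketch below computes the two fields for a 32-bit address and checks the result. The helper names are hypothetical, and addresses are reduced to 32 bits to match Win32 pointers.

```python
from typing import Tuple

MASK32 = 0xFFFF_FFFF

def sext16(field: int) -> int:
    """Sign-extend a 16-bit displacement field."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def split_address(addr: int) -> Tuple[int, int]:
    """Raw 16-bit fields (xxxx, yyyy) for  LDAH t0, xxxx(zero)  /  LDL t0, yyyy(t0)."""
    addr &= MASK32
    low = addr & 0xFFFF
    high = ((addr + 0x8000) >> 16) & 0xFFFF   # rounds up when bit 15 of addr is set
    return high, low

def effective_address(high: int, low: int) -> int:
    """Address the LDL reads, reduced to 32 bits."""
    t0 = sext16(high) << 16                   # LDAH: zero + SEXT(xxxx) * 65536
    return (t0 + sext16(low)) & MASK32        # LDL:  t0 + SEXT(yyyy)
```

For example, 0x1234ABCD splits into xxxx = 0x1235 and yyyy = 0xABCD. The LDL displacement is then -0x5433, which brings 0x12350000 back down to the target.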

The above examples use the LDL instruction, which loads a register from memory. We'll learn more about memory access next time.

Comments (21)

  1. DWalker07 says:

    I found this interesting, and in hindsight, probably obvious: “Conditional branches backward are predicted taken. Conditional branches forward are predicted not taken.” Something I never really thought about before, but I’m sure it’s true.

    1. David-T says:

      True for the Alpha AXP (presumably), definitely not true for modern CISC x86(-64) processors, which have ingenious branch prediction mechanisms.

    2. It’s a fairly standard “starter algorithm.”

    3. Antonio Rodríguez says:

      And I guess it’s one that works surprisingly well given its simplicity: most (if not all) backward jumps are at the end of loops, which are expected to run more than once (thus, it’s more probable that the branch is taken). But forward jumps are usually at conditional statements, which, if written properly*, should have the general case at the “then” and the corner case at the “else”, so most times they won’t be taken (you see, the conditional branch at an “if” is taken to skip the “then” and go to the “else” or continue outside).

      * IMHO, for readability of code, conditionals should be written that way. I’d say that static analysis in the compiler could sort them that way, but static analysis wasn’t so advanced in the 90s, and that time machine is still in the works…

      1. Karellen says:


        I think a majority of the conditionals I write are of the form “if (failure) return;” or “if (failure) break;” or “if (failure) continue;”, where most of the time the failure is unexpected, and the corner case is the “then”, while the general case is the (implicit) “else”.

        And a large proportion of what’s left are “if (config_A) then… else if (config_B) then…” where the general case is unpredictable, and somewhat evenly matched between different setups, but will always be the same for any specific setup.

      2. Antonio Rodríguez says:

        When I cited the preferred order for thens and elses, I was talking about the situations where there is a general case and a corner case. Of course, there is no use in trying to order an if with no probable outcome (or one without an else!). But whenever I use a chain of ifs to test for several conditions, I try to sort them in order of probability, putting the most probable ones first (the code is easier to understand, and in the long run, you have to test fewer conditions to find the right case). The only exception is when testing for a case is expensive: then I try to put cheaper tests first (one of the few cases when I favor performance over style nowadays).

        Anyway, when writing in a high-level language, I try to keep in mind how it may get translated into machine instructions and use that knowledge to write more efficient code. I wonder how many programmers do that nowadays.

        1. Karellen says:

          I’m not sure I agree:

              if (!success) return;
              …

          is equivalent to:

              if (!success) {
                  return;
              } else {
                  …
              }

          …it’s just that in this case the ‘else’ is explicit rather than implicit. Note that this is also equivalent to:

              if (success) {
                  …
              }

          You can do a similar thing with “continue”, by reversing the test and nesting the remainder of the loop by one level.

          You were saying that, if success is the common case, you’d pick version 3, because the common case is part of the “then” rather than the “else”. A number of my coworkers also prefer it, for reasons unknown. I, on the other hand, prefer version 1.

          1. Antonio Rodríguez says:

            In this case, version 3 has the advantage that it lets you have a single exit point in the function, apart from the if optimization. Maybe your coworkers use it because of that.

            I prefer to only have exit points in two places: at the very beginning of the function, during parameter and condition validation (fail fast before doing any work!), and a single “work done” point at the last line, returning the value stored in a variable called “res” which is set previously at whichever execution path the function takes. It helps understanding the working of the function, and the single exit point lets you set a breakpoint you are sure will be hit.

          2. Kevin says:

            Of course, an optimizing compiler with knowledge of the target architecture’s branch predictor could convert any of these to the “right” conditional branch arrangement. Ideally, you write what makes sense to you, and the compiler deals with it.

          3. Karellen says:

            I’ve never understood the reasoning behind the “single exit point”. What is up with that?

            Whereas having the code that makes up the actual intended logic of your function be only one or two indents in throughout the entire function body – that reads so much more cleanly to me than having pieces get gradually more and more indented, followed by a long string of “}” lines at the end. I find it makes your source control changesets much cleaner on average, too.

          4. J says:

            Having a single exit point makes it easier to ensure that any temporary resources used during the function have been returned to their prior state. It’s mostly a hangover from when languages didn’t have mechanisms to handle that for you, and all the cleanup code had to be hand-written every time.

            It’s worth noting though, that the compiler isn’t necessarily going to put your blocks in the order you wrote them in the program. For example, given the following code:

            SomeTypeWithADestructor foo;
            if (shouldReturnEarly()) return;
            // … more code

            There are kind of two ways the compiler could arrange things:
            – A block which invokes foo’s destructor, does anything else that needs to be done in the function epilogue, and returns. The if statement is a forwards conditional jump over this block if shouldReturnEarly() returns false.
            – A forwards conditional jump that jumps to the part of the normal epilogue where foo’s destructor is called.

            Since the function epilogue is largely the same for the early-return and normal-return cases, I would generally expect that the compiler would merge them and implement the early-return just as a conditional forward jump to the right part of the epilogue.

          5. JDG says:

            My personal approach is generally to put the shorter of the two code blocks (“then” vs “else”) first, unless it turns out really awkward. I like having the “else” clause as close as possible, visually, to the root “if” statement; it makes it harder to overlook.

          6. smf says:


            If you are putting so much code in your if {} block that you can no longer see the else, then you should be refactoring the code into functional methods. It depends on what the code is trying to do; people jump through hoops to get a single exit and then have to make up other rules which tangle the code further. You can certainly create spaghetti code without touching goto.

          7. Someone says:

            “the single exit point lets you set a breakpoint you are sure will be hit.”

            In .NET as well as Delphi (at least in Debug builds) this is a non-issue. The compiler transforms early exits into jumps to the real exit, where the stack frame gets discarded. Just set a breakpoint at the last “}” or “end;” or “End Sub”.

            “a single exit point makes it easier to ensure that any temporary resources”

            Use try/finally or “using” for each and every locally allocated resource. Any language without such constructs is a pain.

        2. smf says:


          Yeah, I don’t understand always using a single exit point. There are times when a single exit point is preferable and times when other patterns are preferable. The code should be clear enough to identify where the exit points are.

          If you’re creating dense code with lots of ? and code jammed together so that you can miss the return statement, then you’re probably missing a whole lot more than the return when you’re reading as well.

          I try to make my code read like English. My brain thanks me for that, as it learnt English first.

      3. DWalker07 says:

        Yes, I was thinking about the branch at the end of a loop.

  2. Pierre B. says:

    I have to say I’ve started to hate the assembler design: arithmetic ops are in src1, src2, dst order, yet for some other ops like LDL, the dst is the first operand. (Same for branches, where the register receiving the return address is put first.)

    The brain has to constantly switch from dst-last to dst-first. Why didn’t they design stuff like LDL to have dst last?

    1. Peter Doubleday says:

      Be thankful it’s not infix …

    2. Antonio Rodríguez says:

      I guess it mimics the physical layout of the encoded instruction. But I think you are right: it’s inconsistent, and makes reading and writing code a lot more difficult and error prone. Luckily, nowadays most code is generated by compilers, and you only have to look at it when debugging: last time I wrote production code in assembly was exactly 20 years ago (August 1997). That’s without taking into account a couple of games I have developed for the Apple II in my spare time, of course, but that’s an entirely different story…

  3. Peter Lund says:

    LDGP? Pseudo instructions in general?

  4. Joshua says:

    So I am finding quite a bit of what I thought I knew to be now quite undermined.

    In particular, the Alpha processor is somehow a superscalar processor that can’t read one memory address per instruction cycle. This is strange in its own right (it must read one memory address per instruction cycle to execute an instruction) for any processor that has separate L1 and L2 caches.

    I did some experiments on an alpha simulator that said that 2 instruction cycles per memory read was a fine amount and lowering it to one yielded hardly any gains, but now I expect that this is because all the compiled alpha code I had to run on the alpha simulator was designed for that. I’m pretty sure now that doing the same on an x86 would yield worse results.

    In addition, their silliness in using bzip2-0.0.1 for the simulation baseline probably took everything out of it. I suspect it never even got out of the getc() loop to the compression algorithm before the simulation terminated for length. Boo.

    I’ve been kind of pampered as of late with the parallax spin processor and its 496 general purpose registers though. On that processor taking 3 instruction cycles to read or write memory is plenty fast enough.
