The MIPS R4000, part 5: Memory access (aligned)

The MIPS R4000 has one addressing mode: Register indirect with displacement.

    LW      rd, disp16(rs)  ; rd = *( int32_t*)(rs + disp16)
    LH      rd, disp16(rs)  ; rd = *( int16_t*)(rs + disp16)
    LHU     rd, disp16(rs)  ; rd = *(uint16_t*)(rs + disp16)
    LB      rd, disp16(rs)  ; rd = *(  int8_t*)(rs + disp16)
    LBU     rd, disp16(rs)  ; rd = *( uint8_t*)(rs + disp16)

The load instructions load an aligned word, halfword, or byte from the address specified by adding the 16-bit signed displacement to the source register (known as the "base register").¹ By convention, the displacement can be omitted, in which case it is taken to be zero.

The plain versions of these instructions sign-extend to a 32-bit value; the U versions zero-extend.
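To make the difference between the plain and U versions concrete, here is a minimal C sketch of what ends up in the 32-bit destination register. The `model_*` helper names are hypothetical, invented for this illustration, and the alignment assertions merely stand in for the alignment fault the processor would raise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not from the article): each returns the 32-bit
   pattern that the corresponding load would leave in the register. */

static uint32_t model_lb(const void *p)      /* LB: sign-extend a byte */
{
    int8_t v;
    memcpy(&v, p, sizeof v);
    return (uint32_t)(int32_t)v;
}

static uint32_t model_lbu(const void *p)     /* LBU: zero-extend a byte */
{
    uint8_t v;
    memcpy(&v, p, sizeof v);
    return (uint32_t)v;
}

static uint32_t model_lh(const void *p)      /* LH: sign-extend a halfword */
{
    int16_t v;
    assert((uintptr_t)p % 2 == 0);           /* stand-in for the alignment fault */
    memcpy(&v, p, sizeof v);
    return (uint32_t)(int32_t)v;
}

static uint32_t model_lhu(const void *p)     /* LHU: zero-extend a halfword */
{
    uint16_t v;
    assert((uintptr_t)p % 2 == 0);           /* stand-in for the alignment fault */
    memcpy(&v, p, sizeof v);
    return (uint32_t)v;
}
```

Given a byte in memory holding 0x80, `model_lb` produces 0xFFFFFF80 and `model_lbu` produces 0x00000080. There is no LWU in this list because a 32-bit load already fills the entire 32-bit register.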

There are corresponding aligned store instructions.

    SW      rs, disp16(rd)  ; *( int32_t*)(rd + disp16) = (int32_t)rs
    SH      rs, disp16(rd)  ; *( int16_t*)(rd + disp16) = (int16_t)rs
    SB      rs, disp16(rd)  ; *(  int8_t*)(rd + disp16) = ( int8_t)rs

In all cases, if the effective address turns out not to be suitably aligned, an alignment fault occurs. Windows NT handles the alignment fault by loading the value using the unaligned memory access instructions (which we'll see next time), and then resuming execution. The overhead of the emulation swamps the cost of having done it correctly in the first place, so if you know that the address may be unaligned, then you are far better off using the unaligned memory access instructions instead of having the kernel fix it up for you.

The assembler emulates absolute addressing with the help of the at assembler temporary register. For example, the pseudo-instruction

    LW      rd, global_variable

loads an aligned word from a global variable.

Let A be the address of the global variable, and let

  • YYYY = (int16_t)(A & 0xFFFF) and
  • XXXX = (A − YYYY) >> 16

Then the assembler generates the following two instructions:

    LUI     at, XXXX
    LW      rd, YYYY(at)

Note that if the bottom 16 bits of the address are 0x8000 or greater, then YYYY is negative, and XXXX will be one greater than the upper 16 bits of the address.
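The split can be sketched in C as follows. The `hi_lo` struct and the `split_address` function are hypothetical names made up for this illustration; this is a model of the arithmetic, not what any particular assembler does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the two 16-bit immediates derived from an address. */
typedef struct {
    uint16_t hi;   /* XXXX, the immediate for LUI */
    int16_t  lo;   /* YYYY, the signed displacement for LW */
} hi_lo;

static hi_lo split_address(uint32_t a)
{
    hi_lo r;

    /* YYYY = (int16_t)(A & 0xFFFF), written out to avoid relying on
       implementation-defined narrowing conversions. */
    int32_t lo = (int32_t)(a & 0xFFFF);
    if (lo >= 0x8000) {
        lo -= 0x10000;
    }

    /* XXXX = (A - YYYY) >> 16, using wraparound 32-bit arithmetic. */
    r.hi = (uint16_t)((a - (uint32_t)lo) >> 16);
    r.lo = (int16_t)lo;

    /* LUI followed by the displacement add must reproduce the address. */
    assert(((uint32_t)r.hi << 16) + (uint32_t)lo == a);
    return r;
}
```

For the address 0x12345678, this yields XXXX = 0x1234 and YYYY = 0x5678. For 0x1234ABCD, the low half is 0x8000 or greater, so YYYY is negative (0xABCD as a signed 16-bit value) and XXXX is bumped to 0x1235 to compensate.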

Another pseudo-instruction is

    LW      rd, imm32(rs)

You may want to do this if indexing a global array. A straightforward implementation of the pseudo-instruction would be

    LUI     at, XXXX        ; load high part
    ADDIU   at, at, YYYY    ; add in the low part
    ADDU    at, at, rs      ; add in the byte offset
    LW      rd, (at)        ; load the word

but this can be shortened by one instruction by merging the fixed offset YYYY into the displacement of the effective address calculation in the LW. The result is

    LUI     at, XXXX
    ADDU    at, at, rs
    LW      rd, YYYY(at)

While the assembler emulation is convenient, it may not be the most efficient. If you are accessing the global variable more than once, or if you are accessing more than one variable within the same 64KB region, you can share the LUI instruction among them.

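For example, suppose global1 and global2 are close enough to each other that they have the same XXXX, meaning that both lie within the 64KB window that a signed 16-bit displacement can reach from XXXX << 16.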

    ; lazy version of global2 = global1 + 1
    LW      r1, global1
    ADDIU   r1, r1, 1
    SW      r1, global2

This expands to

    LUI     at, XXXX
    LW      r1, YYYY(at)
    ADDIU   r1, r1, 1
    LUI     at, XXXX
    SW      r1, ZZZZ(at)

You can factor out the XXXX into a register that you reuse for the entire section of code.

    ; sneakier version of global2 = global1 + 1
    LUI     r2, XXXX
    LW      r1, YYYY(r2)
    ADDIU   r1, r1, 1
    SW      r1, ZZZZ(r2)
    ; can keep using r2 to access other variables in the block
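Whether two variables can share a LUI comes down to whether they produce the same XXXX. Here is a minimal C sketch of that test; the helper names `lui_immediate`, `can_share_lui`, and `displacement_from` are hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* XXXX for an address. Adding 0x8000 before shifting is equivalent to
   (A - YYYY) >> 16 with YYYY taken as a signed 16-bit value. */
static uint16_t lui_immediate(uint32_t a)
{
    return (uint16_t)((a + 0x8000u) >> 16);
}

/* Two addresses can share one LUI exactly when their XXXX values match. */
static int can_share_lui(uint32_t a, uint32_t b)
{
    return lui_immediate(a) == lui_immediate(b);
}

/* The displacement to use with a shared base of XXXX << 16.
   The assertion fires if the address is out of reach of that base. */
static int16_t displacement_from(uint16_t xxxx, uint32_t a)
{
    uint32_t diff = a - ((uint32_t)xxxx << 16);
    assert(diff <= 0x7FFFu || diff >= 0xFFFF8000u);
    return (diff <= 0x7FFFu) ? (int16_t)diff
                             : (int16_t)-(int32_t)(0u - diff);
}
```

Because the displacement is signed, the window covered by one LUI runs from 0x8000 bytes below XXXX << 16 to 0x7FFF bytes above it. In this model, 0x1000FFF0 and 0x10010004 can share a LUI of 0x1001 even though their upper 16 bits differ, whereas 0x10017FF0 and 0x10018004 cannot, even though their upper 16 bits are the same.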

In theory, you could even store constants in your data segment, but since loading a 32-bit constant takes only two instructions at most, you probably won't bother.

Next time, we'll look at unaligned access.

¹ In earlier versions of the MIPS architecture, there was a load delay slot: The value retrieved by a load instruction was not available until two instructions later.

We saw last time that the MIPS architecture supports forwarding of arithmetic computations. Why can't it forward memory access?

The memory stage comes after the execute stage. This means that the result of a memory load in the memory stage cannot be forwarded into the execute stage of the next instruction, because the memory stage of the first instruction takes place at the same time as the execute stage of the second instruction. The earliest the result of the load can be consumed is two instructions later.

That means that in the sequence

    LW      r1, (r2)        ; load word from r2 into r1
    ADDIU   r3, r1, 1       ; r3 = r1 + 1

the ADDIU instruction operates on the old value of r1,² not the value that was loaded from memory. If you want to add 1 to the value loaded from memory, you need to insert some other instruction in the load delay slot:

    LW      r1, (r2)        ; load word from r2 into r1
    NOP                     ; load delay slot
    ADDIU   r3, r1, 1       ; r3 = r1 + 1

The load delay slot was removed in the MIPS II architecture, and the R4000, which implements MIPS III, inherits that change. On the R4000, if you try to access the value of a register immediately after loading it, the processor stalls until the value becomes ready. Sure, the stall is bad, but it's better than running ahead with the wrong value!

² This is true only if no hardware interrupt occurred. If an interrupt occurred, then the load would complete during the kernel transition, and then when the kernel resumed execution, the ADDIU would operate on the loaded value after all. Therefore, the value of the destination register of a load instruction should be treated as garbage until the load delay clears.

Comments (9)
  1. 12BitSlab says:

    In some ways, it reminds me of Base/Displacement addressing on S/360. S/360 used that scheme because any 32 bit address could be represented in 16 bits. The first four bits represented a 32 bit register and the remaining 12 bits was the added displacement from the base.

    When S/360 was designed, memory was the most expensive component.

    Of course, we didn’t have to worry about any load delays. I imagine that could cause more than a couple of hard to find bugs.

  2. laonianren says:

    LW t0,8(t1)

    Worst syntax ever! I wonder if it was inspired by C arrays where you can write buffer[8] as 8[buffer].

    1. Well, you might think it’s the worst, but it’s also very popular among multiple assembly languages. (And I prefer it over syntax like LW t0, t1, 8.)

      1. laonianren says:

        So I did some reading. I didn’t realise this style of address offset is actually very common being x86 AT&T syntax. I still don’t like it though!

        Having done 6502 before ARM I found ARM-style LW t0,[t1,#8] somewhat natural in its use of #. And LW t0,[t1,t2] is still moderately readable because it’s obvious which bits are getting added together. Though I think Intel got it right: use a plus sign to represent addition.

    2. Matteo Italia says:

      Clearly you haven’t seen AT&T syntax for x86 memory operands.

    3. dandan says:

      More like the C syntax was inspired by assembly. You have your history quite backwards.

      Of course, the correct syntax would be (lw t0 (+ t1 8)).

    4. Falcon says:

      68k assembly had similar syntax:

      MOVE.L 8(a0), d0

    5. Simon Clarkstone says:

      This reminds me of reading a paper on the world’s first CAD program, Sketchpad. The author spent a few pages describing what we now call structs and C’s -> operator, because they were unfamiliar to programmers at the time. C’s -> operator is now thought of as “get struct pointer, add field offset, fetch/store at the resulting address”, but in Sketchpad was implemented using addressing modes normally used to index arrays at static addresses with registers (the sort of arrays that Fortran etc programs would use lots of).

      Nowadays, if you want the equivalent of C’s -> operator then “field_offset(pointer_register)” is a weird syntax, but if you want to index an array at a static address using a register then “array_start(index_register)” is normal Algol-descendant syntax, especially because you’d be using a named constant for array_start.

      1. Joker_vD says:

        Oh, those antiquated relics from the Heroic (or, as some would call it, Dark) Age of the CS! They belong in the museum, but alas, one doesn’t simply change the assembler syntax.
