The MIPS R4000 has one addressing mode: Register indirect with displacement.
LW rd, disp16(rs) ; rd = *( int32_t*)(rs + disp16)
LH rd, disp16(rs) ; rd = *( int16_t*)(rs + disp16)
LHU rd, disp16(rs) ; rd = *(uint16_t*)(rs + disp16)
LB rd, disp16(rs) ; rd = *( int8_t*)(rs + disp16)
LBU rd, disp16(rs) ; rd = *( uint8_t*)(rs + disp16)
The load instructions load an aligned word, halfword, or byte from the address specified by adding the 16-bit signed displacement to the source register (known as the "base register").¹ By convention, the displacement can be omitted, in which case it is taken to be zero.
The plain versions of these instructions sign-extend to a 32-bit value;
the U versions zero-extend.
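To make the difference between the plain and U versions concrete, here is a minimal C sketch of what lands in the 32-bit destination register, given the raw bits fetched from memory. The helper names are hypothetical and exist only for illustration; this models the extension step, not the memory access itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each takes the raw bits fetched from memory and
   returns the 32-bit value that ends up in the destination register. */

/* LH: sign-extend a 16-bit value. Flipping the sign bit and then
   subtracting its weight avoids implementation-defined narrowing casts. */
static uint32_t extend_lh(uint16_t raw)
{
    return (uint32_t)(((int32_t)raw ^ 0x8000) - 0x8000);
}

/* LHU: zero-extend a 16-bit value. */
static uint32_t extend_lhu(uint16_t raw)
{
    return raw;
}

/* LB: sign-extend an 8-bit value. */
static uint32_t extend_lb(uint8_t raw)
{
    return (uint32_t)(((int32_t)raw ^ 0x80) - 0x80);
}

/* LBU: zero-extend an 8-bit value. */
static uint32_t extend_lbu(uint8_t raw)
{
    return raw;
}

/* The two forms differ only when the top bit of the raw value is set. */
void check_extension_examples(void)
{
    assert(extend_lh(0x8001) == 0xFFFF8001u);   /* top bit set: ones fill */
    assert(extend_lhu(0x8001) == 0x00008001u);  /* top bit set: zero fill */
    assert(extend_lh(0x7FFF) == extend_lhu(0x7FFF));
    assert(extend_lb(0xFF) == 0xFFFFFFFFu);
    assert(extend_lbu(0xFF) == 0x000000FFu);
}
```

There is no LWU in this list because a 32-bit load already fills the entire 32-bit register, so there is nothing to extend.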
There are corresponding aligned store instructions.
SW rs, disp16(rd) ; *( int32_t*)(rd + disp16) = (int32_t)rs
SH rs, disp16(rd) ; *( int16_t*)(rd + disp16) = (int16_t)rs
SB rs, disp16(rd) ; *( int8_t*)(rd + disp16) = ( int8_t)rs
In all cases, if the effective address turns out not to be suitably aligned, an alignment fault occurs. Windows NT handles the alignment fault by loading the value using the unaligned memory access instructions (which we'll see next time), and then resuming execution. The overhead of the emulation swamps the cost of having done it correctly in the first place, so if you know that the address may be unaligned, then you are far better off using the unaligned memory access instructions instead of having the kernel fix it up for you.
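As a rough illustration of the rule, the following C sketch computes the effective address and tests whether it is suitably aligned for a given access size. The function names are hypothetical, and the sketch assumes 32-bit addresses that wrap around on overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective address: base register plus the sign-extended 16-bit
   displacement, wrapping modulo 2^32 (an assumption of this sketch). */
static uint32_t effective_address(uint32_t base, int16_t disp)
{
    return base + (uint32_t)(int32_t)disp;
}

/* An access of `size` bytes (1, 2, or 4) is suitably aligned when the
   effective address is a multiple of the size. */
static bool is_suitably_aligned(uint32_t ea, uint32_t size)
{
    assert(size == 1 || size == 2 || size == 4);
    return (ea & (size - 1)) == 0;
}
```

For example, a base of 0x1000 with a displacement of -2 gives an effective address of 0x0FFE, which is fine for a halfword access but would fault on a word access. Byte accesses are always aligned.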
The assembler emulates absolute addressing with the help of the at assembler temporary register. For example, the pseudo-instruction
LW rd, global_variable
loads an aligned word from a global variable.
Let A be the address of the global variable, and let
YYYY = (int16_t)(A & 0xFFFF) and XXXX = (A − YYYY) >> 16
Then the assembler generates the following two instructions:
LUI at, XXXX
LW rd, YYYY(at)
Note that if the bottom 16 bits of the address are 0x8000 or greater, then YYYY is negative, and XXXX is one greater than the upper 16 bits of the address.
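Here is a small C sketch of that calculation, in case the off-by-one in XXXX looks suspicious. The type and function names are hypothetical; the arithmetic follows the definitions of XXXX and YYYY given above and assumes 32-bit addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two halves the assembler generates. */
typedef struct {
    uint16_t hi;  /* XXXX: the immediate for LUI */
    int16_t  lo;  /* YYYY: the signed displacement for LW */
} split_address;

/* What LUI followed by the displacement add computes:
   (XXXX << 16) + sign-extended YYYY, modulo 2^32. */
static uint32_t recombine(split_address s)
{
    return ((uint32_t)s.hi << 16) + (uint32_t)(int32_t)s.lo;
}

/* Split A into XXXX and YYYY as described in the text. */
static split_address split_for_lui(uint32_t a)
{
    split_address s;

    /* YYYY = low 16 bits of A, reinterpreted as a signed value. */
    int32_t lo = (int32_t)(a & 0xFFFF);
    if (lo >= 0x8000) {
        lo -= 0x10000;
    }
    s.lo = (int16_t)lo;

    /* XXXX = (A - YYYY) >> 16. Subtracting a negative YYYY is what
       bumps XXXX up by one. */
    s.hi = (uint16_t)((a - (uint32_t)lo) >> 16);

    /* The two halves must add back up to the original address. */
    assert(recombine(s) == a);
    return s;
}
```

For the address 0x12348000, this produces YYYY = -0x8000 and XXXX = 0x1235, and indeed 0x12350000 - 0x8000 = 0x12348000. For 0x12347FFF, the low half stays positive and XXXX is the plain upper half, 0x1234.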
Another pseudo-instruction is
LW rd, imm32(rs)
You may want to do this if indexing a global array. A straightforward implementation of the pseudo-instruction would be
LUI at, XXXX ; load high part
ADDIU at, at, YYYY ; add in the low part
ADDU at, at, rs ; add in the byte offset
LW rd, (at) ; load the word
but this can be shortened by an instruction
by merging the fixed offset YYYY into the displacement
of the effective address calculation in the LW.
The result is
LUI at, XXXX
ADDU at, at, rs
LW rd, YYYY(at)
While the assembler emulation is convenient, it may not be the most efficient. If you are accessing the global variable more than once, or if you are accessing more than one variable within the same 64KB region, you can share the LUI instruction among them.
For example, suppose global1 and global2 reside in the same 64KB block of memory.
; lazy version of global2 = global1 + 1
LW r1, global1
ADDIU r1, r1, 1
SW r1, global2
This expands to
LUI at, XXXX
LW r1, YYYY(at)
ADDIU r1, r1, 1
LUI at, XXXX
SW r1, ZZZZ(at)
You can factor out the XXXX into a register
that you reuse for the entire section of code.
; sneakier version of global2 = global1 + 1
LUI r2, XXXX
LW r1, YYYY(r2)
ADDIU r1, r1, 1
SW r1, ZZZZ(r2)
; can keep using r2 to access other variables in the block
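If you want to check whether two addresses can share a single LUI, the test is simply whether they produce the same XXXX. The C sketch below does that, using hypothetical helper names and assuming the signed split described earlier. One consequence of the signed displacement is that the shareable window is the 64KB range centered on XXXX << 16, so two addresses on opposite sides of a 0x8000 boundary get different XXXX values even though their upper 16 bits match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XXXX for address a: adding 0x8000 before shifting is equivalent to
   (A - YYYY) >> 16 with a signed YYYY. */
static uint16_t hi_part(uint32_t a)
{
    return (uint16_t)((a + 0x8000u) >> 16);
}

/* YYYY for address a: the low 16 bits reinterpreted as signed. */
static int16_t lo_part(uint32_t a)
{
    int32_t low = (int32_t)(a & 0xFFFF);
    return (int16_t)(low >= 0x8000 ? low - 0x10000 : low);
}

/* Two addresses can be reached from one LUI if their XXXX values match. */
static bool can_share_lui(uint32_t a, uint32_t b)
{
    return hi_part(a) == hi_part(b);
}

/* Displacement to use with a base register already loaded with
   shared_hi << 16. The address must lie in that register's window. */
static int16_t displacement_from(uint16_t shared_hi, uint32_t a)
{
    assert(hi_part(a) == shared_hi);
    return lo_part(a);
}
```

With these definitions, 0x10008000 and 0x10017FFF can share a LUI of 0x1001, whereas 0x10007FFF and 0x10008000 cannot, despite being adjacent bytes.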
In theory, you could even store constants in your data segment, but since loading a 32-bit constant takes only two instructions at most, you probably won't bother.
Next time, we'll look at unaligned access.
¹ In earlier versions of the MIPS architecture, there was a load delay slot: The value retrieved by a load instruction was not available until two instructions later.
We saw last time that the MIPS architecture supports forwarding of arithmetic computations. Why can't it forward memory access?
The memory stage comes after the execute stage. This means that the result of a memory load in the memory stage cannot be forwarded into the execute stage of the next instruction, because the memory stage of the first instruction takes place at the same time as the execute stage of the second instruction. The earliest the result of the load can be consumed is two instructions later.
That means that in the sequence
LW r1, (r2) ; load word from r2 into r1
ADDIU r3, r1, 1 ; r3 = r1 + 1
the ADDIU instruction operates on the old value of r1,² not the value that was loaded from memory.
If you want to add 1 to the value loaded from memory, you need to insert some other instruction in the load delay slot:
LW r1, (r2) ; load word from r2 into r1
NOP ; load delay slot
ADDIU r3, r1, 1 ; r3 = r1 + 1
The MIPS III architecture removed the load delay slot. On the R4000, if you try to access the value of a register immediately after loading it, the processor stalls until the value becomes ready. Sure, the stall is bad, but it's better than running ahead with the wrong value!
² This is true only if no hardware interrupt occurred. If an interrupt occurred, then the load would complete during the kernel transition, and then when the kernel resumed execution, the ADDIU would operate on the loaded value after all. Therefore, the value of the destination register of a load instruction should be treated as garbage until the load delay clears.
In some ways, it reminds me of Base/Displacement addressing on S/360. S/360 used that scheme because any 32 bit address could be represented in 16 bits. The first four bits represented a 32 bit register and the remaining 12 bits was the added displacement from the base.
When S/360 was designed, memory was the most expensive component.
Of course, we didn’t have to worry about any load delays. I imagine that could cause more than a couple of hard to find bugs.
LW t0,8(t1)
Worst syntax ever! I wonder if it was inspired by C arrays where you can write buffer[8] as 8[buffer].
Well, you might think it’s the worst, but it’s also very popular among multiple assembly languages. (And I prefer it over syntax like LW t0, t1, 8.)
So I did some reading. I didn’t realise this style of address offset is actually very common being x86 AT&T syntax. I still don’t like it though!
Having done 6502 before ARM I found ARM-style LW t0,[t1,#8] somewhat natural in its use of #. And LW t0,[t1,t2] is still moderately readable because it’s obvious which bits are getting added together. Though I think Intel got it right: use a plus sign to represent addition.
Clearly you haven’t seen AT&T syntax for x86 memory operands.
More like the C syntax was inspired by assembly. You have your history quite backwards.
Of course, the correct syntax would be (lw t0 (+ t1 8)).
68k assembly had similar syntax:
MOVE.L 8(a0), d0
This reminds me of reading a paper on the world’s first CAD program, Sketchpad. The author spent a few pages describing what we now call structs and C’s -> operator, because they were unfamiliar to programmers at the time. C’s -> operator is now thought of as “get struct pointer, add field offset, fetch/store at the resulting address”, but in Sketchpad it was implemented using addressing modes normally used to index arrays at static addresses with registers (the sort of arrays that Fortran etc programs would use lots of).
Nowadays, if you want the equivalent of C’s -> operator then “field_offset(pointer_register)” is a weird syntax, but if you want to index an array at a static address using a register then “array_start(index_register)” is normal Algol-descendant syntax, especially because you’d be using a named constant for array_start.
Oh, those antiquated relics from the Heroic (or, as some would call it, Dark) Age of the CS! They belong in the museum, but alas, one doesn’t simply change the assembler syntax.