The MIPS R4000 has the usual collection of arithmetic operations, but the mnemonics are confusingly named. The general notation for arithmetic operations is
OP destination, source1, source2
with the destination register on the left and the source register or registers on the right.
Okay, here goes. We start with addition and subtraction.
ADD rd, rs, rt ; rd = rs + rt, trap on overflow
ADDU rd, rs, rt ; rd = rs + rt, no trap on overflow
SUB rd, rs, rt ; rd = rs - rt, trap on overflow
SUBU rd, rs, rt ; rd = rs - rt, no trap on overflow
The ADD and SUB instructions perform
addition and subtraction and raise a trap if a signed overflow occurs.
The ADDU and SUBU instructions do the
same thing, but without the overflow trap.
The U suffix officially means "unsigned",
but this is confusing because the addition can be performed on both
signed and unsigned values, thanks to two's complement.
The real issue is whether an overflow trap is raised.
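To make the distinction concrete, here is a minimal Python sketch that models 32-bit registers as unsigned integers. The function names, and the use of `OverflowError` to stand in for the trap, are assumptions for illustration and not anything the architecture defines.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x

def addu(rs, rt):
    """ADDU: wrap the sum to 32 bits; never traps."""
    return (rs + rt) & MASK32

def add(rs, rt):
    """ADD: same result bits as ADDU, but signed overflow raises
    (the exception is a stand-in for the processor's overflow trap)."""
    exact = to_signed32(rs) + to_signed32(rt)
    if not -0x80000000 <= exact <= 0x7FFFFFFF:
        raise OverflowError("integer overflow trap")
    return exact & MASK32
```

For example, `addu(0x7FFFFFFF, 1)` returns `0x80000000`, whereas `add(0x7FFFFFFF, 1)` raises. On the other hand, `add(0xFFFFFFFF, 1)` returns 0 without complaint, because as signed values that is -1 + 1; an unsigned carry out is not what ADD checks.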
There are also versions of the addition instructions that accept a 16-bit signed immediate as a second addend:
ADDI rd, rs, imm16 ; rd = rs + (int16_t)imm16, trap on overflow
ADDIU rd, rs, imm16 ; rd = rs + (int16_t)imm16, no trap on overflow
Note that the U is double-confusing here,
because even though the U officially stands for
"unsigned",
the immediate value is treated as signed,
and the addition is suitable for both signed and unsigned values.
There are no corresponding SUBI or SUBIU
instructions,
but they can be synthesized:
ADDI rd, rs, -imm16 ; SUBI rd, rs, imm16
ADDIU rd, rs, -imm16 ; SUBIU rd, rs, imm16
(Of course, this doesn't work if the value you want to subtract is −32768, but hey, it mostly works.)
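The synthesis and its one failure case can be sketched in Python. The helper names here (`sign_extend16`, `addiu`, `subiu`) are hypothetical, and registers are modeled as plain integers holding 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend the low 16 bits to an int in [-32768, 32767]."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def addiu(rs, imm16):
    """ADDIU: add the sign-extended immediate, wrapping to 32 bits."""
    return (rs + sign_extend16(imm16)) & MASK32

def subiu(rs, value):
    """Hypothetical SUBIU: encode -value as the immediate of an ADDIU."""
    if not -32768 <= -value <= 32767:
        raise ValueError("negated value does not fit in a signed 16-bit immediate")
    return addiu(rs, -value & 0xFFFF)
```

Here `subiu(100, 1)` returns 99, and subtracting 32768 works because -32768 fits in the immediate. Calling `subiu(100, -32768)` raises, since the negation +32768 is one past the largest signed 16-bit value.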
The next group of instructions is the bitwise operations. These never trap.¹
AND rd, rs, rt ; rd = rs & rt
ANDI rd, rs, imm16 ; rd = rs & (uint16_t)imm16
OR rd, rs, rt ; rd = rs | rt
ORI rd, rs, imm16 ; rd = rs | (uint16_t)imm16
XOR rd, rs, rt ; rd = rs ^ rt
XORI rd, rs, imm16 ; rd = rs ^ (uint16_t)imm16
NOR rd, rs, rt ; rd = ~(rs | rt)
Note the inconsistency: The addition instructions treat the immediate as a signed 16-bit value (and sign-extend it to a 32-bit value), but the bitwise logical operations treat it as an unsigned 16-bit value (and zero-extend it to a 32-bit value). Stay alert!
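A small Python sketch shows how the same 16 immediate bits turn into different 32-bit operands depending on the instruction. The function names are assumptions chosen to mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Copy bit 15 into bits 16..31, as the addition instructions do."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16

def zero_extend16(imm16):
    """Clear bits 16..31, as the bitwise logical instructions do."""
    return imm16 & 0xFFFF

def addiu(rs, imm16):
    return (rs + sign_extend16(imm16)) & MASK32

def ori(rs, imm16):
    return (rs | zero_extend16(imm16)) & MASK32
```

With an immediate of `0x8000`, `addiu(0, 0x8000)` produces `0xFFFF8000` but `ori(0, 0x8000)` produces `0x00008000`.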
The last group of instructions for today is the shift instructions. These also never trap.
SLL rd, rs, imm5 ; rd = rs << imm5
SLLV rd, rs, rt ; rd = rs << (rt % 32)
SRL rd, rs, imm5 ; rd = rs >>U imm5
SRLV rd, rs, rt ; rd = rs >>U (rt % 32)
SRA rd, rs, imm5 ; rd = rs >> imm5
SRAV rd, rs, rt ; rd = rs >> (rt % 32)
The mnemonics stand for "shift left logical",
"shift right logical"
and "shift right arithmetic".
The V suffix stands for "variable",
and indicates that the shift amount comes from a register
rather than an immediate.
Yup, that's another inconsistency.
Following the pattern of the addition and bitwise logical groups,
these instructions should have been named
SLL for shifting by an amount specified by a register
and
SLLI for shifting by an amount specified by an
immediate.
Go figure.
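The shift semantics can be modeled in a few lines of Python. This is a sketch rather than an emulator: the function names mirror the mnemonics, registers are assumed to be plain integers holding 32-bit patterns, and the `& 31` models the fact that only the low five bits of the shift amount are used.

```python
MASK32 = 0xFFFFFFFF

def sll(rs, shamt):
    """Shift left logical: bits shifted past bit 31 are discarded."""
    return (rs << (shamt & 31)) & MASK32

def srl(rs, shamt):
    """Shift right logical: zeros are shifted in from the top."""
    return (rs & MASK32) >> (shamt & 31)

def sra(rs, shamt):
    """Shift right arithmetic: copies of the sign bit are shifted in."""
    rs &= MASK32
    signed = rs - 0x100000000 if rs & 0x80000000 else rs
    return (signed >> (shamt & 31)) & MASK32

# The V forms compute the same thing; the amount just comes from a register.
def sllv(rs, rt): return sll(rs, rt)
def srlv(rs, rt): return srl(rs, rt)
def srav(rs, rt): return sra(rs, rt)
```

For instance, `srl(0x80000000, 4)` is `0x08000000` while `sra(0x80000000, 4)` is `0xF8000000`, and `sllv(1, 33)` is 2 because 33 % 32 is 1.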
There are no built-in sign-extension or zero-extension instructions. You can get zero-extension in one instruction by explicitly masking out the upper bytes:
; zero extend byte to word
ANDI rd, rs, 0xFF ; rd = ( uint8_t)rs
; zero extend halfword to word
ANDI rd, rs, 0xFFFF ; rd = (uint16_t)rs
Sign extension requires two instructions.
; sign extend byte to word
SLL rd, rs, 24 ; rd = rs << 24
SRA rd, rd, 24 ; rd = (int32_t)rd >> 24
; sign extend halfword to word
SLL rd, rs, 16 ; rd = rs << 16
SRA rd, rd, 16 ; rd = (int32_t)rd >> 16
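As a sanity check on the shift-left-then-shift-right-arithmetic trick, here is a short Python model. The helper names are hypothetical, and registers are treated as integers holding 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def sll(rs, shamt):
    """Shift left logical within 32 bits."""
    return (rs << shamt) & MASK32

def sra(rs, shamt):
    """Shift right arithmetic within 32 bits."""
    rs &= MASK32
    signed = rs - 0x100000000 if rs & 0x80000000 else rs
    return (signed >> shamt) & MASK32

def sign_extend_byte(rs):
    """SLL by 24 moves bit 7 into the sign position; SRA by 24 smears it back down."""
    return sra(sll(rs, 24), 24)

def sign_extend_halfword(rs):
    """Same idea with bit 15 and a shift of 16."""
    return sra(sll(rs, 16), 16)
```

Whatever was in the upper bits is discarded by the left shift, so `sign_extend_byte(0x12345680)` gives `0xFFFFFF80` and `sign_extend_byte(0x1234567F)` gives `0x0000007F`.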
And I'm going to mention these instructions here because I can't find a good place to put them:
SYSCALL imm20 ; system call
BREAK imm20 ; breakpoint
Both instructions trap into the kernel. The system call instruction is intended to be used to make operating system calls; the breakpoint instruction is intended to be used for software breakpoints. Both instructions carry a 20-bit immediate payload that can be used for whatever purpose the operating system chooses.
Here are some more instructions you can synthesize from the official instructions:
SUB rd, zero, rs ; NEG rd, rs
SUBU rd, zero, rs ; NEGU rd, rs
ADDU rd, zero, rs ; MOVE rd, rs
OR rd, zero, rs ; MOVE rd, rs
NOR rd, zero, rs ; NOT rd, rs
SLL zero, zero, 0 ; NOP
SLL zero, zero, 1 ; SSNOP
There are many possible ways of synthesizing a MOVE
instruction,
but in order to be able to unwind exceptions,
Windows NT requires that register motion in the prologue or
epilogue of a function
must take one of the two forms given above.
Similarly, there are many ways of performing a NOP.
Basically, any non-trapping 32-bit
computation that targets the zero
register is functionally a nop,
but the two above are treated specially by the processor.
- NOP = SLL zero, zero, 0 is special-cased by the processor as a nop that can be optimized out entirely. Use it when you need to pad out some code for space.
- SSNOP = SLL zero, zero, 1 is special-cased by the processor as a nop that must be issued, and it will not be simultaneously issued with any other instruction. Use it when you need to pad out some code for time. (The SS stands for "super-scalar".)
The encoding of SLL zero, zero, 0 happens to be
0x00000000,
which I'm sure is not a coincidence.
I'm not convinced that it's a good idea, though.
I would have chosen 0x00000000
to be the encoding of a breakpoint or invalid instruction.
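To see why the encoding comes out as all zeros, here is a sketch of an SLL encoder in Python. The field layout is the standard MIPS register-format encoding as documented in the architecture manuals, not something spelled out in the text above, and `encode_sll` is a hypothetical helper name. Note that the encoding places the source register in the field the manuals call rt, and the field they call rs is unused and zero.

```python
ZERO = 0  # register number of the zero register

def encode_sll(rd, src, shamt):
    """Encode SLL rd, src, shamt as a 32-bit instruction word.

    Layout, high bits to low:
      opcode(6) | unused(5) | src(5) | rd(5) | shamt(5) | funct(6)
    """
    for field in (rd, src, shamt):
        if not 0 <= field <= 31:
            raise ValueError("register numbers and shift amounts are 5 bits")
    opcode = 0  # SPECIAL: the operation is selected by funct
    funct = 0   # SLL
    return (opcode << 26) | (src << 16) | (rd << 11) | (shamt << 6) | funct
```

Since the opcode, the function code, and the zero register are all numbered 0, `encode_sll(ZERO, ZERO, 0)` is `0x00000000`. The SSNOP differs only in the shift amount field, so `encode_sll(ZERO, ZERO, 1)` is `0x00000040`.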
Okay, those are the 32-bit computation instructions. Next time, we'll look at multiplication, division, and the temperamental HI and LO registers.
¹
Alas, there is no
NORI instruction.
You think I'm joking, but I'm not.
Be patient.
In the 6502, the BRK instruction’s opcode is 0x00 ($00 in 6502 notation). This has the side effect of dropping into the debugger whenever there is a jump to a position in memory that doesn’t contain code (0 is pretty common among data or uninitialized memory, so there is a big chance that the processor will hit a zero pretty soon if something goes bad).
Just be careful: SLL 0, 0, 0 might not work properly on some processors: https://apebox.org/wordpress/linux/545
Having all-bits-zero be a valid instruction is definitely not a good idea… having it be NOP is worse. It’s easy to have a bug in your virtual memory system that results in getting pages of zeros where you intended to have code; if that happens, and you jump to that page, the processor silently executes the NOPs. Then it either (if you’re lucky) runs off the end of that chunk of memory and triggers a fatal trap, or (otherwise) gets to a non-zeroed page, where it will continue executing in some entirely unrelated code. When that code then crashes it can be very difficult to figure out what happened or how you managed to get a clearly impossible call stack.
Another not-good idea is how some microcontrollers such as ARM Cortex-M have the vector table at 0x0000. Having the null pointer be a mapped address is bad enough; having it be a mapped address that always contains lots of valid pointers is even worse. It’s easy to end up in a very confusing state when calling e.g. a virtual function on a null pointer to a C++ object.
The x86 has that in real mode. I’m 2 steps from the guy who formatted his hd by it.
x86 real mode has the same issue, with the IVT living at address 0:0.
In fact, if I’m not totally mistaken, there were special hacks in Win95 (using expand-down segments) to protect it from misbehaving applications.
0x00 was the NOP instruction on the Z80 as well (the first processor I ever got down to machine code with). Of course, not many people were putting a virtual memory system onto one of those.
I had a Z80-based S-100 box (made by Digitex) that sported 256KB of RAM and a 20MB hard drive. It was also a multi-user system, so it’s more than likely that it wasn’t using virtual memory as we know it now, but just switching between static pages on context switches.
The Motorola 68k had both a 16-bit NOP with a specific code, and a 0x00000000 instruction that decoded to “ORI.b #0, D0” (or in C terms, “D0 |= 0”), which is, in fact, also a no-op.
Close to a no-op, but not quite – it affected condition codes.
Well, thinking about it, it might not be a complete no-op: It’s possible it sets the status bits (zero, negative etc.) according to the D0 register.
> The U suffix officially means “unsigned”, but this is confusing because the addition can be performed on both signed and unsigned values, thanks to twos complement. The real issue is whether an overflow trap is raised.
Not that this makes them a good name, but I wonder if the names come from C. C signed int overflow is undefined behavior, meaning a trap is an acceptable response (and I actually think is a /good/ response, though I’m sure that’s controversial), so the trapping ADD/SUB instructions can be used. Unsigned overflow* /is/ defined though, so it can’t trap — or rather, it could, but then it would just have to continue execution, so that’s slow — and you have to use the non-trapping ADDU/SUBU instructions.
I don’t know what MIPS compilers actually do, and Compiler Explorer doesn’t seem to have a MIPS option. It’d be interesting to know.
* Technically, the C and C++ standards say that, because it’s defined, unsigned arithmetic /can’t/ overflow, because overflow is UB. But I’m not convinced anyone uses “overflow” in that sense who doesn’t read the C/C++ standards for fun. :-)
I’ve definitely seen C++ code that overflows signed integers on purpose. In an emulator for the sound chip of the SNES, the calculations are done in 16-bit 2’s complement, and to get the original wraparound behavior (needed to get the wind sound in Chrono Trigger), you have to do essentially this:
int32_t value = adpcm_decode();
value = value << 16 >> 16;
No you don’t “have to do” any such things. Undefined behavior is exactly that: undefined. Writing the correct code is a negligible extra amount of work and means your code will actually work.