Limiting the bottom byte of an XMM register and clearing the other bytes


Suppose you have a value in an XMM register and you want to limit the bottom byte so that it does not exceed a particular value N, and set all the other bytes to zero. (Yes, I needed to do this.)

One way to do this is to apply the two steps in sequence:

; value to truncate/limit is in xmm0

; First, zero out the top 15 bytes
    pslldq  xmm0, 15
    psrldq  xmm0, 15

; Now limit the bottom byte to N
; (bytes 1-3 of eax may be junk; that's harmless, since those bytes of xmm0 are now zero)
    mov     al, N
    movd    xmm1, eax
    pminub  xmm0, xmm1
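For comparison, here is a sketch of the same two steps written with SSE2 intrinsics. It assumes an x86 or x86-64 compiler with SSE2 enabled; the names limit_low_byte_two_step and example_two_step are hypothetical, and the byte values in the example are arbitrary test data.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_slli_si128, _mm_srli_si128, _mm_min_epu8 */
#include <stdint.h>

/* Hypothetical helper: the two-step approach, one intrinsic per instruction. */
static __m128i limit_low_byte_two_step(__m128i x, uint8_t n)
{
    /* pslldq/psrldq by 15 bytes: byte 0 travels to byte 15 and back,
       and zeros are shifted in everywhere else. */
    x = _mm_slli_si128(x, 15);
    x = _mm_srli_si128(x, 15);

    /* pminub: unsigned byte-wise minimum against a register holding n
       in byte 0. The upper bytes of x are already zero, so they stay zero. */
    return _mm_min_epu8(x, _mm_cvtsi32_si128(n));
}

/* Example: bottom byte 0xF0 is clamped to 0x80; every other byte is cleared. */
static void example_two_step(void)
{
    uint8_t in[16], out[16];
    for (int i = 0; i < 16; i++) in[i] = (uint8_t)(0xF0 + i);

    __m128i r = limit_low_byte_two_step(
        _mm_loadu_si128((const __m128i *)in), 0x80);
    _mm_storeu_si128((__m128i *)out, r);

    assert(out[0] == 0x80);
    for (int i = 1; i < 16; i++) assert(out[i] == 0);
}
```

Each intrinsic maps onto one of the instructions above, so the sketch costs the same four operations (two shifts, one mask load, one min) as the assembly.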

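But you can do it all in one step by realizing that min(x, 0) = 0 for all unsigned values x: put N in the bottom byte of a second register and zero in the other 15 bytes, then take the unsigned byte-wise minimum.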

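; value to truncate/limit is in xmm0
    mov     eax, N          ; zero-extends, so bytes 1-3 of eax are 0
    movd    xmm1, eax       ; xmm1 = N in byte 0, zero in bytes 1-15
    pminub  xmm0, xmm1      ; unsigned byte-wise minimum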

In pictures:

    xmm0        xmm1        xmm0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     ?    min    0     =     0
     x    min    N     =     min(x, N)
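Each row of the picture is an independent unsigned byte comparison, so the whole thing can be modeled with a plain C loop, no SSE required. This is a sketch; pminub_model, make_mask, and example_picture are hypothetical names, and the byte values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of pminub: out[i] = min(a[i], b[i]) as unsigned bytes. */
static void pminub_model(uint8_t out[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = a[i] < b[i] ? a[i] : b[i];
}

/* The xmm1 column: N in byte 0, zero in bytes 1..15
   (what mov eax, N followed by movd xmm1, eax produces). */
static void make_mask(uint8_t mask[16], uint8_t n)
{
    for (int i = 0; i < 16; i++) mask[i] = 0;
    mask[0] = n;
}

static void example_picture(void)
{
    uint8_t x[16], mask[16], out[16];
    for (int i = 0; i < 16; i++) x[i] = (uint8_t)(200 + i); /* the "?" bytes */
    make_mask(mask, 100);                                     /* N = 100 */
    pminub_model(out, x, mask);

    assert(out[0] == 100);                            /* min(200, 100) */
    for (int i = 1; i < 16; i++) assert(out[i] == 0); /* min(?, 0) == 0 */
}
```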

In intrinsics:

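#include <emmintrin.h> // SSE2: _mm_min_epu8 (PMINUB), _mm_cvtsi32_si128
#include <stdint.h>    // uint8_t

__m128i min_low_byte_and_set_upper_bytes_to_zero(__m128i x, uint8_t N)
{
 // unsigned byte min, matching pminub; N is zero-extended into byte 0
 return _mm_min_epu8(x, _mm_cvtsi32_si128(N));
}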
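One way to convince yourself the one-step trick is safe is to compare it against the two-step version for every combination of bottom byte and N. This sketch assumes SSE2 and uses hypothetical function names. It fills the upper 15 bytes with values of 0x80 and above on purpose: those bytes are negative when read as signed, so the check relies on the unsigned minimum _mm_min_epu8 (PMINUB). A signed byte minimum such as _mm_min_epi8 (SSE4.1) would leave them unchanged, since signed min(-1, 0) = -1.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_min_epu8, _mm_slli_si128, _mm_srli_si128 */
#include <stdint.h>
#include <string.h>

/* One-step version: unsigned byte-wise min against {n, 0, 0, ..., 0}. */
static __m128i limit_one_step(__m128i x, uint8_t n)
{
    return _mm_min_epu8(x, _mm_cvtsi32_si128(n));
}

/* Two-step reference: clear the top 15 bytes, then clamp byte 0. */
static __m128i limit_two_step(__m128i x, uint8_t n)
{
    x = _mm_srli_si128(_mm_slli_si128(x, 15), 15);
    return _mm_min_epu8(x, _mm_cvtsi32_si128(n));
}

/* Compare both versions for every (bottom byte, n) pair. The upper
   15 bytes are all >= 0x80, i.e. negative if read as signed bytes. */
static void check_one_step_matches_two_step(void)
{
    uint8_t in[16], a[16], b[16];
    for (int i = 1; i < 16; i++) in[i] = (uint8_t)(0x80 + 7 * i);

    for (int low = 0; low < 256; low++) {
        in[0] = (uint8_t)low;
        __m128i x = _mm_loadu_si128((const __m128i *)in);
        for (int n = 0; n < 256; n++) {
            _mm_storeu_si128((__m128i *)a, limit_one_step(x, (uint8_t)n));
            _mm_storeu_si128((__m128i *)b, limit_two_step(x, (uint8_t)n));
            assert(memcmp(a, b, sizeof a) == 0);    /* same 16 bytes */
            assert(a[0] == (low < n ? low : n));    /* byte 0 = min(low, n) */
        }
    }
}
```

The exhaustive loop is only 65,536 iterations, so it runs essentially instantly; it covers every possible bottom byte and N, though only one fixed pattern for the upper bytes.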
Comments (9)
  1. Matt says:

    Raymond – what were you working on that needed all of this MMX stuff?

    [Writing a CPU emulator, y'know, just for fun. -Raymond]
  2. Joshua says:

    [Writing a CPU emulator, y'know, just for fun. -Raymond]

    Well OK then.

  3. Kevin says:

    One of my coworkers is implementing a filesystem in node.js, just for fun.

    I should probably find something more geeky to work on in my spare time…

  4. sdfsdfsdfasdfasdfasdfasdfasdf says:

    [Writing a CPU emulator, y'know, just for fun. -Raymond]

    Oooh, I can speculate on which CPU it is. 3.2 GHz PowerPC Tri-Core Xenon. For Xbox 360 game emulation on XB1.

    You go, Raymond. Show those CPUs who's the systems engineer.

  5. @sdf keyboard masher:

    The tricky part of console emulation isn't emulating the CPU. It's emulating the GPU and the timing of CPU/GPU/RAM communication, which games can depend on very critically in order to eke out every last bit of performance possible. That's typically why most systems that provide full backwards compatibility include hardware from the previous generation (which even then can still break a few of the more sensitive games), and why high-level software emulation often provides only partial compatibility with previous-gen titles without game-specific fixes.

  6. Zan Lynx' says:

    Yeah. It is pretty surprising how much CPU is needed to emulate even really old hardware like the C64 and Atari systems. The processor is easy. But then there's the sound and video chips, and all of their very precise timing behavior. Emulators even need to include things like the CRT beam scan position.

    In that regard, current game consoles are pretty far away from "the metal".  Pff, they don't even need to use a timing loop to flip sprite positions every 30 scan lines. Which was broken if your CPU wasn't running at exactly 1 MHz.

  7. I thought it was 1.22 MHz?  And that's assuming you're using NTSC monitors.  PAL was a whole other story and required a completely different CPU.

  8. John Barton says:

    In the case of the Atari 800, the clock frequency was 1/2 of colorburst or about 1.79 MHz.

  9. Skyborne says:

    I never thought emulation was too interesting (how hard can it be to translate some opcodes?) but then I ran across this grand experiment: andrewkelley.me/…/jamulator.html  The author sets out to write a recompiler for NES games to run natively…

