Integer signum in SSE


The signum function is defined as follows:

signum(x) =  −1  if x < 0
signum(x) =   0  if x = 0
signum(x) =  +1  if x > 0

There are a couple of ways of calculating this with SSE integer instructions.

One way is to convert the C idiom

int signum(int x) { return (x > 0) - (x < 0); }

The SSE translation of this is mostly straightforward. The quirk is that the SSE comparison functions return −1 to indicate true, whereas C uses +1 to represent true. But this is easy to take into account:

(x > 0)  =  −pcmpgt(x, 0)
(x < 0)  =  −pcmpgt(0, x)

Substituting this into the original signum function, we get

signum(x) =  (x > 0)  −  (x < 0)
          =  −pcmpgt(x, 0)  −  (−pcmpgt(0, x))
          =  −pcmpgt(x, 0)  +  pcmpgt(0, x)
          =  pcmpgt(0, x)  −  pcmpgt(x, 0)
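As a sanity check on the algebra, the following sketch models a single 16-bit lane in plain C. The helpers model_pcmpgt16 and model_signum16 are hypothetical names for this illustration only: model_pcmpgt16 mimics one lane of pcmpgtw by returning −1 for true and 0 for false, and the check confirms that the last line of the derivation agrees with the C idiom for every 16-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of pcmpgtw:
   all bits set (-1) when a > b, otherwise 0. Not an SSE intrinsic. */
static int16_t model_pcmpgt16(int16_t a, int16_t b)
{
    return (int16_t)(a > b ? -1 : 0);
}

/* Last line of the derivation: signum(x) = pcmpgt(0, x) - pcmpgt(x, 0). */
static int16_t model_signum16(int16_t x)
{
    return (int16_t)(model_pcmpgt16(0, x) - model_pcmpgt16(x, 0));
}

/* Compare the model against the C idiom (x > 0) - (x < 0)
   for every 16-bit input. */
static void check_model_signum16(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        int16_t x = (int16_t)v;
        assert(model_signum16(x) == (x > 0) - (x < 0));
    }
}
```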

In assembly:

        ; assume x is in xmm0

        pxor    xmm1, xmm1 ; xmm1 = 0
        pxor    xmm2, xmm2 ; xmm2 = 0
        pcmpgtw xmm1, xmm0 ; xmm1 = pcmpgt(0, x)
        pcmpgtw xmm0, xmm2 ; xmm0 = pcmpgt(x, 0)
        psubw   xmm1, xmm0 ; xmm1 = pcmpgt(0, x) - pcmpgt(x, 0) = signum
        ; answer is in xmm1

With intrinsics:

#include <emmintrin.h> // SSE2 intrinsics

__m128i signum16(__m128i x)
{
    return _mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), x),
                         _mm_cmpgt_epi16(x, _mm_setzero_si128()));
}
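Here is a minimal self-contained test harness, assuming an x86-64 compiler where SSE2 is always available. It repeats the intrinsic version under the hypothetical name signum16_cmp and feeds every 16-bit value through it, eight lanes at a time, comparing each lane with the C idiom.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Compare-and-subtract signum for eight 16-bit lanes. */
static __m128i signum16_cmp(__m128i x)
{
    return _mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), x),
                         _mm_cmpgt_epi16(x, _mm_setzero_si128()));
}

/* Run all 65536 values through signum16_cmp, eight lanes per call,
   and check each lane against (x > 0) - (x < 0). */
static void check_signum16_cmp(void)
{
    int16_t in[8], out[8];
    for (int32_t base = INT16_MIN; base <= INT16_MAX; base += 8) {
        for (int i = 0; i < 8; i++)
            in[i] = (int16_t)(base + i);
        __m128i x = _mm_loadu_si128((const __m128i *)in);
        _mm_storeu_si128((__m128i *)out, signum16_cmp(x));
        for (int i = 0; i < 8; i++)
            assert(out[i] == (in[i] > 0) - (in[i] < 0));
    }
}
```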

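This pattern extends mutatis mutandis to signum8 and signum32, both of which need only SSE2. signum64 follows the same pattern but requires SSE4.2, because the 64-bit comparison pcmpgtq (_mm_cmpgt_epi64) was not introduced until then.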
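A sketch of the 8-bit and 32-bit variants, assuming x86-64 with SSE2; the names signum8 and signum32 are chosen to match the text and are otherwise assumptions. Only the lane width of the compare and subtract intrinsics changes.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* signum for sixteen 8-bit lanes: pcmpgt(0, x) - pcmpgt(x, 0) */
static __m128i signum8(__m128i x)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_sub_epi8(_mm_cmpgt_epi8(zero, x), _mm_cmpgt_epi8(x, zero));
}

/* signum for four 32-bit lanes: same formula, wider lanes */
static __m128i signum32(__m128i x)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_sub_epi32(_mm_cmpgt_epi32(zero, x), _mm_cmpgt_epi32(x, zero));
}

static void check_signum8_32(void)
{
    /* Every 8-bit value, sixteen lanes at a time. */
    int8_t in8[16], out8[16];
    for (int base = INT8_MIN; base <= INT8_MAX; base += 16) {
        for (int i = 0; i < 16; i++)
            in8[i] = (int8_t)(base + i);
        _mm_storeu_si128((__m128i *)out8,
                         signum8(_mm_loadu_si128((const __m128i *)in8)));
        for (int i = 0; i < 16; i++)
            assert(out8[i] == (in8[i] > 0) - (in8[i] < 0));
    }

    /* A few 32-bit edge cases, including INT32_MIN and INT32_MAX. */
    int32_t in32[4] = { INT32_MIN, -1, 0, INT32_MAX };
    int32_t out32[4];
    _mm_storeu_si128((__m128i *)out32,
                     signum32(_mm_loadu_si128((const __m128i *)in32)));
    for (int i = 0; i < 4; i++)
        assert(out32[i] == (in32[i] > 0) - (in32[i] < 0));
}
```

A signum64 would have the same shape with _mm_cmpgt_epi64 and _mm_sub_epi64; it is left out of this sketch because _mm_cmpgt_epi64 needs SSE4.2.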

Another solution is to use the signed minimum and maximum opcodes, using the formula

signum(x) = min(max(x, −1), +1)
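The clamp works because every negative integer is at most −1 and every positive integer is at least +1: the max pulls negative values up to −1, the min pulls positive values down to +1, and zero passes through both unchanged. A scalar sketch of the formula, using the hypothetical name clamp_signum:

```c
#include <assert.h>
#include <stdint.h>

/* signum(x) = min(max(x, -1), +1) in plain C */
static int clamp_signum(int x)
{
    int lo = x > -1 ? x : -1;    /* max(x, -1) */
    return lo < 1 ? lo : 1;      /* min(max(x, -1), +1) */
}

/* Check the clamp against (x > 0) - (x < 0) over all 16-bit values. */
static void check_clamp_signum16(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
        assert(clamp_signum((int)v) == (v > 0) - (v < 0));
}
```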

In assembly:

        ; assume x is in xmm0

        pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
        pmaxsw  xmm0, xmm1
        psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
        pminsw  xmm0, xmm1
        ; answer is in xmm0

With intrinsics:

#include <emmintrin.h> // SSE2 intrinsics

__m128i signum16(__m128i x)
{
    // alternatively: minusones = _mm_set1_epi16(-1);
    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                        _mm_setzero_si128());
    x = _mm_max_epi16(x, minusones);

    // alternatively: ones = _mm_set1_epi16(1);
    __m128i ones = _mm_srli_epi16(minusones, 15);
    x = _mm_min_epi16(x, ones);

    return x;
}
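A test sketch for the min/max version, again assuming x86-64 with SSE2; signum16_minmax is a hypothetical name for the function. Note that the logical shift uses _mm_srli_epi16, which takes a plain integer count; _mm_srl_epi16 expects its count in an __m128i.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

static __m128i signum16_minmax(__m128i x)
{
    /* All bits set in every lane: -1 as a signed 16-bit value. */
    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                        _mm_setzero_si128());
    x = _mm_max_epi16(x, minusones);               /* pmaxsw */
    __m128i ones = _mm_srli_epi16(minusones, 15);  /* 0xFFFF >> 15 = 1 */
    return _mm_min_epi16(x, ones);                 /* pminsw */
}

/* Exhaustive check over all 16-bit inputs, eight lanes at a time. */
static void check_signum16_minmax(void)
{
    int16_t in[8], out[8];
    for (int32_t base = INT16_MIN; base <= INT16_MAX; base += 8) {
        for (int i = 0; i < 8; i++)
            in[i] = (int16_t)(base + i);
        _mm_storeu_si128((__m128i *)out,
                         signum16_minmax(_mm_loadu_si128((const __m128i *)in)));
        for (int i = 0; i < 8; i++)
            assert(out[i] == (in[i] > 0) - (in[i] < 0));
    }
}
```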

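The catch here is that SSE2 supports signed minimum and maximum only for 16-bit lanes; for 8-bit and 32-bit lanes you need to bump up to SSE4.1 (pminsb/pmaxsb and pminsd/pmaxsd). But if you're going to require a newer instruction set anyway, you may as well use the psign instruction, which arrived even earlier, in SSSE3. In assembly: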

        ; assume x is in xmm0

        pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
        psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
        psignw  xmm1, xmm0 ; apply sign of x to xmm1
        ; answer is in xmm1

With intrinsics:

#include <tmmintrin.h> // SSSE3 intrinsics; compile with SSSE3 enabled (e.g. -mssse3)

__m128i signum16(__m128i x)
{
    // alternatively: ones = _mm_set1_epi16(1);
    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                        _mm_setzero_si128());
    __m128i ones = _mm_srli_epi16(minusones, 15);
    return _mm_sign_epi16(ones, x);
}

The psign instruction applies the sign of its second argument to its first argument. We load the first argument with the value +1 in all lanes, then apply the sign of x, which negates the value where the corresponding lane of x is negative, sets it to zero where that lane is zero, and leaves it alone where that lane is positive.
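A sketch of the psign approach for 16-bit and 32-bit lanes, assuming GCC or Clang on x86-64. The target("ssse3") attribute enables SSSE3 for just these functions (compiling the whole file with -mssse3 is the alternative), and the GCC/Clang builtin __builtin_cpu_supports skips the check on a CPU without SSSE3. The names signum16_psign and signum32_psign are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_sign_epi16, _mm_sign_epi32 */

/* +1 in every lane, then take the sign of x: gives -1, 0, or +1. */
static __attribute__((target("ssse3"))) __m128i signum16_psign(__m128i x)
{
    return _mm_sign_epi16(_mm_set1_epi16(1), x);   /* psignw */
}

static __attribute__((target("ssse3"))) __m128i signum32_psign(__m128i x)
{
    return _mm_sign_epi32(_mm_set1_epi32(1), x);   /* psignd */
}

static void check_signum_psign(void)
{
    if (!__builtin_cpu_supports("ssse3"))
        return;  /* psign is unavailable on this CPU */

    int16_t in16[8] = { INT16_MIN, -300, -1, 0, 1, 2, 300, INT16_MAX };
    int16_t out16[8];
    _mm_storeu_si128((__m128i *)out16,
                     signum16_psign(_mm_loadu_si128((const __m128i *)in16)));
    for (int i = 0; i < 8; i++)
        assert(out16[i] == (in16[i] > 0) - (in16[i] < 0));

    int32_t in32[4] = { INT32_MIN, -1, 0, INT32_MAX };
    int32_t out32[4];
    _mm_storeu_si128((__m128i *)out32,
                     signum32_psign(_mm_loadu_si128((const __m128i *)in32)));
    for (int i = 0; i < 4; i++)
        assert(out32[i] == (in32[i] > 0) - (in32[i] < 0));
}
```

psign comes only in byte, word, and doubleword forms, so there is no 64-bit psign; signum64 still needs the comparison approach.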

Comments (4)
  1. That was actually fun to read. :)

  2. S.T. says:

    Very interesting indeed.  I cannot but help pointing out, though, that it is "mutatis mutandis" (ablative absolute) and not "mutatus mutandis".

  3. The listed assembly code doesn't do what you intended: you meant to return pcmpgt(0,x)−pcmpgt(x,0), which is xmm1-xmm0 using your register allocation, but in fact you return xmm0-xmm1.  Also, SSE2 does signum8, signum16 and signum32 only; you need SSE4 for 'pcmpgtq'.

  4. Thanks for this article - it is quite interesting
