The signum function is defined as follows:

$$
\operatorname{signum}(x) =
\begin{cases}
-1 & \text{if } x < 0 \\
0 & \text{if } x = 0 \\
+1 & \text{if } x > 0
\end{cases}
$$

There are a couple of ways of calculating this in SSE integers.

One way is to convert the C idiom

    int signum(int x) { return (x > 0) - (x < 0); }

The SSE translation of this is mostly straightforward.
The quirk is that the SSE comparison functions return −1
to indicate `true`,
whereas C uses +1 to represent `true`.
But this is easy to take into account:

$$
\begin{aligned}
x > 0 &\iff -\operatorname{pcmpgt}(x, 0) \\
x < 0 &\iff -\operatorname{pcmpgt}(0, x)
\end{aligned}
$$

Substituting this into the original `signum` function,
we get

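$$
\begin{aligned}
\operatorname{signum}(x) &= (x > 0) - (x < 0) \\
&= -\operatorname{pcmpgt}(x, 0) - \bigl(-\operatorname{pcmpgt}(0, x)\bigr) \\
&= -\operatorname{pcmpgt}(x, 0) + \operatorname{pcmpgt}(0, x) \\
&= \operatorname{pcmpgt}(0, x) - \operatorname{pcmpgt}(x, 0)
\end{aligned}
$$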
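To sanity-check the rearrangement, here is a minimal scalar sketch that models a single 16-bit lane. The names `pcmpgt_lane`, `signum_c`, `signum_lane`, and `forms_agree` are made up for illustration; `pcmpgt_lane` only imitates the "−1 means true" convention and is not a real intrinsic.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for one 16-bit lane of pcmpgtw:
   returns -1 (all bits set) when a > b, and 0 otherwise. */
int16_t pcmpgt_lane(int16_t a, int16_t b)
{
    return (a > b) ? -1 : 0;
}

/* The C idiom from the text, for reference. */
int signum_c(int x)
{
    return (x > 0) - (x < 0);
}

/* The rearranged form: pcmpgt(0, x) - pcmpgt(x, 0). */
int signum_lane(int16_t x)
{
    return pcmpgt_lane(0, x) - pcmpgt_lane(x, 0);
}

/* Returns 1 if both forms agree on every 16-bit input, 0 otherwise. */
int forms_agree(void)
{
    for (int x = INT16_MIN; x <= INT16_MAX; ++x) {
        if (signum_lane((int16_t)x) != signum_c(x)) {
            return 0;
        }
    }
    return 1;
}
```

Calling `forms_agree()` walks all 65536 lane values, which is cheap enough to run as a quick check before trusting the vector version.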

In assembly:

    ; assume x is in xmm0
    pxor    xmm1, xmm1
    pxor    xmm2, xmm2
    pcmpgtw xmm1, xmm0 ; xmm1 = pcmpgt(0, x)
    pcmpgtw xmm0, xmm2 ; xmm0 = pcmpgt(x, 0)
    psubw   xmm1, xmm0 ; xmm1 = pcmpgt(0, x) - pcmpgt(x, 0) = signum
    ; answer is in xmm1

With intrinsics:

    __m128i signum16(__m128i x)
    {
        return _mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), x),
                             _mm_cmpgt_epi16(x, _mm_setzero_si128()));
    }

This pattern extends *mutatis mutandis* to
`signum8`, `signum32`, and `signum64`
(the 64-bit comparison `pcmpgtq` requires SSE4.2).
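As an illustration of how the pattern carries over, here is a sketch of the 32-bit variant using only SSE2 intrinsics. The helper `signum32_array4` is a hypothetical convenience wrapper added here so the function can be exercised on a plain array; it is not part of the original article.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Same shape as signum16, with the 32-bit compare and subtract. */
__m128i signum32(__m128i x)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_sub_epi32(_mm_cmpgt_epi32(zero, x),
                         _mm_cmpgt_epi32(x, zero));
}

/* Hypothetical helper: apply signum32 to four ints,
   using unaligned loads and stores so any int32_t[4] works. */
void signum32_array4(const int32_t in[4], int32_t out[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, signum32(x));
}
```

The 8-bit version would swap in `_mm_cmpgt_epi8` and `_mm_sub_epi8` in the same way. A 64-bit version would need `_mm_cmpgt_epi64`, which is an SSE4.2 intrinsic.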

Another solution is to use the signed minimum and maximum opcodes, using the formula

$$
\operatorname{signum}(x) = \min(\max(x, -1), +1)
$$
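The formula is just a clamp of `x` to the range [−1, +1]. A scalar sketch of one 16-bit lane makes that easy to see; the names `max16`, `min16`, and `signum_clamp` are illustrative stand-ins for the per-lane behavior of signed max and min, not real intrinsics.

```c
#include <stdint.h>

/* Scalar stand-ins for one lane of signed 16-bit max and min. */
int16_t max16(int16_t a, int16_t b) { return a > b ? a : b; }
int16_t min16(int16_t a, int16_t b) { return a < b ? a : b; }

/* Clamp x to [-1, +1]:
   every negative value is raised to -1 by the max,
   every positive value is lowered to +1 by the min,
   and zero passes through both unchanged. */
int16_t signum_clamp(int16_t x)
{
    return min16(max16(x, -1), 1);
}
```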

In assembly:

    ; assume x is in xmm0
    pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
    pmaxsw  xmm0, xmm1 ; xmm0 = max(x, -1)
    psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
    pminsw  xmm0, xmm1 ; xmm0 = min(max(x, -1), +1)
    ; answer is in xmm0

With intrinsics:

    __m128i signum16(__m128i x)
    {
        // alternatively: minusones = _mm_set1_epi16(-1);
        __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                            _mm_setzero_si128());
        x = _mm_max_epi16(x, minusones);

        // alternatively: ones = _mm_set1_epi16(1);
        // (immediate shift count, so _mm_srli_epi16 rather than _mm_srl_epi16)
        __m128i ones = _mm_srli_epi16(minusones, 15);
        x = _mm_min_epi16(x, ones);

        return x;
    }

The catch here is that
SSE2 supports only 16-bit signed minimum and maximum;
to get the 8-bit and 32-bit versions, you need to bump up to SSE4.1.
But if you're going to do that, you may as well use the
`psign` instruction, which has been available since SSSE3.
In assembly:

    ; assume x is in xmm0
    pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
    psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
    psignw  xmm1, xmm0 ; apply sign of x to xmm1
    ; answer is in xmm1

With intrinsics:

    __m128i signum16(__m128i x)
    {
        // alternatively: ones = _mm_set1_epi16(1);
        __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                            _mm_setzero_si128());
        // immediate shift count, so _mm_srli_epi16 rather than _mm_srl_epi16
        __m128i ones = _mm_srli_epi16(minusones, 15);
        return _mm_sign_epi16(ones, x);
    }

The `psign` instruction applies the sign of its second
argument to its first argument.
We load up the first argument
with the value `+1` in all lanes,
then apply the sign of `x`,
which negates the value if the corresponding lane of `x`
is negative,
sets the value to zero if the lane is zero,
and leaves it alone if the corresponding lane is positive.
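The three cases can be written out as a scalar sketch of a single 16-bit lane. The function names `psign_lane` and `signum_psign` are hypothetical and exist only to mirror the description above; the real work is done by `psignw` / `_mm_sign_epi16` on eight lanes at once.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of psignw:
   the result is a, adjusted according to the sign of b. */
int16_t psign_lane(int16_t a, int16_t b)
{
    if (b < 0) {
        return (int16_t)-a;  /* b negative: negate a */
    }
    if (b == 0) {
        return 0;            /* b zero: result is zero */
    }
    return a;                /* b positive: a unchanged */
}

/* signum falls out by feeding +1 as the first argument. */
int16_t signum_psign(int16_t x)
{
    return psign_lane(1, x);
}
```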

That was actually fun to read. :)

Very interesting indeed. I cannot but help pointing out, though, that it is "mutatis mutandis" (ablative absolute) and not "mutatus mutandis".

The listed assembly code doesn't do what you intended: you meant to return pcmpgt(0,x)−pcmpgt(x,0), which is xmm1-xmm0 using your register allocation, but in fact you return xmm0-xmm1. Also, SSE2 does signum8, signum16 and signum32 only; you need SSE4 for 'pcmpgtq'.

Thanks for this article – it is quite interesting