This is an intellectual exercise: when shifting a 32-bit unsigned integer left in C++, how can we efficiently detect whether the calculation overflows?

Here is the function prototype. shl_overflow returns true if v << cl overflows. We assume that cl is between 0 and 31, that sizeof(unsigned long) == 4, and that sizeof(unsigned long long) == 8.

bool shl_overflow(unsigned long v, int cl);

The most natural way to implement this function is to widen v to a 64-bit integer, shift it, and check whether anything landed in the upper 32 bits:

bool shl_overflow(unsigned long v, int cl)
{
    unsigned long long vl = v;
    return (vl << cl >> 32) != 0;
}
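As a rough illustration, the same idea can be written without relying on the sizes of unsigned long and unsigned long long by using the fixed-width types from <cstdint>. The name shl_overflow_wide is hypothetical, and cl is still assumed to be between 0 and 31.

```cpp
#include <cstdint>

// Portable sketch of the widening approach (hypothetical name).
// Assumes 0 <= cl <= 31, as in the text.
constexpr bool shl_overflow_wide(std::uint32_t v, int cl)
{
    // Widen to 64 bits, shift, then look at the upper 32 bits only.
    return ((static_cast<std::uint64_t>(v) << cl) >> 32) != 0;
}
```

For example, shl_overflow_wide(0x80000000u, 1) is true, while shl_overflow_wide(1u, 31) is false.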

Now, let’s dig into the assembly world. We’ll limit the discussion to x86.

mov eax, DWORD PTR _v$[esp-4]
mov ecx, DWORD PTR _cl$[esp-4]
xor edx, edx
call __allshl
xor eax, eax
or eax, edx
jne overflow

The implementation has to use three specific registers: eax, edx and ecx. And there is an expensive external function call.

If you step into __allshl in the debugger, you will find that it uses shld to shift the 64-bit integer. VC provides intrinsics that map to CPU instructions. For example, __ll_lshift maps to shld.

Because the high dword of vl is 0, we can simplify our code:

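#include <intrin.h>  // __ll_lshift

bool shl_overflow(unsigned long v, int cl)
{
    // 64-bit left shift; the compiler can emit shld instead of calling __allshl.
    unsigned long long vl = __ll_lshift(v, cl);
    return (static_cast<unsigned long>(vl >> 32)) != 0;
}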

The assembly looks like:

mov eax, DWORD PTR _v$[esp-4]
mov ecx, DWORD PTR _cl$[esp-4]
xor edx, edx
shld edx, eax, cl
test edx, edx
jne overflow

Much better now.
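To see what the shld step computes, here is a small portable sketch. With edx cleared beforehand, "shld edx, eax, cl" leaves the top cl bits of v in the low bits of edx. The names shld_high and shl_overflow_shld are hypothetical and only model that instruction; cl is assumed to be between 0 and 31.

```cpp
#include <cstdint>

// Hypothetical model of "shld edx, eax, cl" with edx == 0 on entry:
// the top cl bits of v end up in the low cl bits of the result.
// cl == 0 is special-cased because shifting a 32-bit value right by 32
// is undefined behaviour in C++ (the instruction itself leaves edx unchanged).
constexpr std::uint32_t shld_high(std::uint32_t v, int cl)
{
    return cl == 0 ? 0u : (v >> (32 - cl));
}

// The shift overflows exactly when any of those bits is set.
constexpr bool shl_overflow_shld(std::uint32_t v, int cl)
{
    return shld_high(v, cl) != 0;
}
```

For instance, shld_high(0xF0000000u, 4) yields 0xF, so shifting 0xF0000000 left by 4 overflows.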

Another approach is based on bit representation.

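#include <intrin.h>  // _BitScanForward
#include <stdlib.h>  // _rotl

bool shl_overflow(unsigned long v, int cl)
{
    // Move the most significant cl bits down to the least significant cl bits.
    v = _rotl(v, cl);
    unsigned long index;
    // Overflow if the lowest set bit falls inside those cl bits.
    return _BitScanForward(&index, v) ? index < static_cast<unsigned long>(cl) : false;
}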

The idea is simple. If v << cl overflows, the most significant cl bits of v must contain at least one 1 bit.

There are two ways to test that.

1. Find the index of the most significant set bit of v (a reverse scan) and test it against 32 – cl. However, we have to handle the case when cl = 0.

2. Rotate v left by cl bits first, so that the most significant cl bits become the least significant cl bits. Then we can scan from the least significant bit and test the index against cl directly.

Note that the scan may fail if v is 0, in which case there is no overflow. The second way is simpler and more efficient.
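The two tests can be sketched in portable C++ as follows. This is only an illustration under assumptions: highest_set_bit, lowest_set_bit, rotl32, shl_overflow_msb and shl_overflow_rot are hypothetical names, the scans are written as plain loops rather than with the bsr/bsf instructions, and C++14 constexpr is assumed. The first test compares the index of the highest set bit with 32 - cl; the second rotates first and compares the index of the lowest set bit with cl.

```cpp
#include <cstdint>

// Index of the highest set bit, or -1 when v == 0 (stands in for a reverse scan).
constexpr int highest_set_bit(std::uint32_t v)
{
    int index = -1;
    while (v != 0) {
        v >>= 1;
        ++index;
    }
    return index;
}

// Index of the lowest set bit, or -1 when v == 0 (stands in for a forward scan).
constexpr int lowest_set_bit(std::uint32_t v)
{
    if (v == 0) {
        return -1;
    }
    int index = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++index;
    }
    return index;
}

// Rotate left by cl bits; cl is assumed to be in [0, 31].
constexpr std::uint32_t rotl32(std::uint32_t v, int cl)
{
    return (v << cl) | (v >> ((32 - cl) & 31));
}

// Way 1: the highest set bit lies inside the top cl bits.
constexpr bool shl_overflow_msb(std::uint32_t v, int cl)
{
    return highest_set_bit(v) >= 32 - cl;
}

// Way 2: after rotating, the lowest set bit lies inside the low cl bits.
constexpr bool shl_overflow_rot(std::uint32_t v, int cl)
{
    const int index = lowest_set_bit(rotl32(v, cl));
    return index >= 0 && index < cl;
}
```

Both functions agree on inputs such as (0x80000000, 1), which overflows, and (0x40000000, 1), which does not.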

The assembly looks like:

mov ecx, DWORD PTR _cl$[esp-4]
mov eax, DWORD PTR _v$[esp-4]
rol eax, cl
bsf eax, eax
je notoverflow
cmp eax, ecx
jl overflow

It only uses two registers. It can also be extended to handle 64-bit shifts, where widening to a larger integer type is not an option. One drawback is an extra conditional jump. (The extra jump can be replaced by "cmovz eax, ecx", but there is no way to ask the compiler to generate that.)
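A possible 64-bit extension of the rotate-and-scan idea might look like the sketch below. It is written in portable C++ rather than with compiler intrinsics; rotl64, lowest_set_bit64 and shl_overflow64 are hypothetical names, cl is assumed to be between 0 and 63, and C++14 constexpr is assumed.

```cpp
#include <cstdint>

// Rotate left by cl bits; cl is assumed to be in [0, 63].
constexpr std::uint64_t rotl64(std::uint64_t v, int cl)
{
    return (v << cl) | (v >> ((64 - cl) & 63));
}

// Index of the lowest set bit, or -1 when v == 0 (stands in for a forward scan).
constexpr int lowest_set_bit64(std::uint64_t v)
{
    if (v == 0) {
        return -1;
    }
    int index = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++index;
    }
    return index;
}

// v << cl overflows when the rotated value has a set bit among its low cl bits.
constexpr bool shl_overflow64(std::uint64_t v, int cl)
{
    const int index = lowest_set_bit64(rotl64(v, cl));
    return index >= 0 && index < cl;
}
```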