The case of the inconsistent right shift results…

One of our testers just filed a bug against something I’m working on.  They reported that code which calculated 1130149156 >> -05701653 generated different results on 32-bit and 64-bit operating systems.  On 32-bit machines it reported 0, but on 64-bit machines it reported 0x21a.

I realized that I could produce a simple reproduction for the scenario to dig into it a bit deeper:

#include <iostream>
#include <tchar.h>   // _tmain and _TCHAR (Visual C++)

int _tmain(int argc, _TCHAR* argv[])
{
    __int64 shift = 0x435cb524;
    __int64 amount = 0x55;   // 85 decimal
    __int64 result = shift >> amount;
    std::cout << shift << " >> " << amount << " = " << result << std::endl;
    return 0;
}

That’s pretty straightforward and it *does* reproduce the behavior.  On x86 it reports 0 and on x64 it reports 0x21a.  I can understand the x86 result (you’re shifting right by more bits than the value holds, so everything shifts off the end and you get 0) but not the x64 one.  What’s going on?
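As an aside, here is a small sketch of how the tester's original expression lines up with the constants in the repro. It rests on an assumption that is not stated in the bug report: that the leading zero makes 05701653 an octal literal. Under that assumption, the low byte of the negated value works out to 0x55. The names `tester_value`, `tester_amount` and `low_byte` are made up for this illustration.

```cpp
// Assumption: 05701653 is an octal literal (because of the leading zero).
constexpr int tester_value  = 1130149156;
constexpr int tester_amount = -05701653;

// Converting a negative int to unsigned char is well defined: it keeps
// the value modulo 256, i.e. the low byte.
constexpr unsigned char low_byte = static_cast<unsigned char>(tester_amount);

static_assert(tester_value == 0x435cb524, "same value as 'shift' in the repro");
static_assert(low_byte == 0x55, "same value as 'amount' in the repro");
```

Both checks are evaluated at compile time, so the snippet only builds if the two constants really do match the ones in the repro.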

Well, for starters I asked our C language folks.  I know I’m shifting by more than the processor word size (85), but the results should be the same, right?

Well no.  The immediate answer I got was:

From C++ 03, 5.8/1: The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

Ok.  It’s undefined behavior.  But that doesn’t really explain the difference.  When in doubt, let’s go to the assembly….

 000000013F5215D3  mov         rax,qword ptr [amount]  
000000013F5215D8  movzx       ecx,al  
000000013F5215DB  mov         rax,qword ptr [shift]  
000000013F5215E0  sar         rax,cl  
000000013F5215E3  mov         qword ptr [result],rax  
000000013F5215E8  mov         rdx,qword ptr [shift] 

The relevant instruction is the sar at 000000013F5215E0.  It’s doing a shift arithmetic right of “shift” by the count in cl, which holds the low byte of “amount”.

What about the x86 version?

 00CC14CA  mov         ecx,dword ptr [amount]  
00CC14CD  mov         eax,dword ptr [shift]  
00CC14D0  mov         edx,dword ptr [ebp-8]  
00CC14D3  call        @ILT+85(__allshr) (0CC105Ah)  
00CC14D8  mov         dword ptr [result],eax  
00CC14DB  mov         dword ptr [ebp-28h],edx  

Now that’s interesting.  The x64 version is using a processor shift instruction (sar), but on 32-bit machines it’s calling a C runtime library function (__allshr).  And the one that’s weird is the x64 version.

While I don’t have an x64 processor manual, I *do* have a 286 processor manual from back in the day (I have all sorts of stuff in my office).  And in my 80286 manual, I found:

“If a shift count greater than 31 is attempted, only the bottom five bits of the shift count are used. (the iAPX 86 uses all eight bits of the shift count.)”

A co-worker gave me the current text:

The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.

So the mystery is now solved.  On x64, the processor only considers the low 6 bits of the shift count.  The low 6 bits of 0x55 are 0x15, or 21, and 0x435cb524 >> 21 is 0x21a.
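The arithmetic can be checked with a small model of what the 64-bit sar does to its count. This is only a sketch: `sar64_model` is a hypothetical name, and the 0x3f mask is taken from the processor text quoted above rather than from anything the compiler emits.

```cpp
#include <cstdint>

// Models a 64-bit SAR: only the low 6 bits of the count are used.
// Because the masked count is always in 0..63, the C++ shift below is
// well defined. (For negative values, >> is implementation-defined
// before C++20; the value in this example is positive.)
constexpr std::int64_t sar64_model(std::int64_t value, unsigned count) {
    return value >> (count & 0x3f);
}

static_assert((0x55 & 0x3f) == 0x15, "low six bits of 0x55 are 0x15 (21)");
static_assert(sar64_model(0x435cb524, 0x55) == 0x21a, "matches the x64 result");
```

The static_asserts run at compile time, so a successful build confirms that masking 0x55 to 21 and shifting gives 0x21a.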

One could argue that this is a bug in the __allshr function on x86 but you really can’t argue with “the behavior is undefined”.  Both scenarios are doing the “right thing”.  That’s the beauty of the “behavior is undefined” wording.  The compiler would be perfectly within spec if it decided to reformat my hard drive when it encountered this (although I’m happy it doesn’t :)).

Now our feature crew just needs to figure out how best to resolve the bug.
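One possible direction, offered purely as a sketch and not as what the feature crew chose, is to range-check the count before shifting so that the undefined case never reaches the compiler. The helper name `shift_right_checked` and its policy (treat any count outside 0..63 as "everything shifted out") are assumptions made for this example; the policy happens to give 0 for the repro, which is what the x86 build reported.

```cpp
#include <cstdint>

// Hypothetical helper: gives the same answer on every platform.
// Counts outside 0..63 are treated as shifting every bit out, leaving
// only the sign fill (0 for non-negative values, -1 for negative ones).
// In-range shifts of negative values are implementation-defined before
// C++20 (arithmetic on the compilers discussed here).
constexpr std::int64_t shift_right_checked(std::int64_t value, std::int64_t count) {
    return (count < 0 || count >= 64)
        ? (value < 0 ? std::int64_t(-1) : std::int64_t(0))
        : (value >> count);
}

static_assert(shift_right_checked(0x435cb524, 0x55) == 0, "out-of-range count");
static_assert(shift_right_checked(0x435cb524, 21) == 0x21a, "in-range count");
```

Whether out-of-range counts should produce the sign fill, an error, or a masked shift is a design decision; the point is only that the choice is made explicitly in one place instead of being left to the hardware or the runtime library.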