Debugging a crash due to heavy floating point operations

There are several scenarios where a program, including a .NET application, performs a huge number of floating point operations. In .NET applications, adding floating point numbers, calling Double.Equals(), and so on are good examples of floating point operations.

Generally, floating point operations are carried out with the help of the FPCW (Floating Point Control Word) register, which controls precision, rounding, and which floating point exceptions are masked. Managed code is unlikely to touch this register directly, but the underlying native code reads FPU status and performs FPU control operations with its help.
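To make the register a little less abstract, here is a minimal sketch that decodes an x87 control word value into its fields. The bit layout (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11) is the standard x87 one; the function name decode_fpcw and the returned dictionary shape are just illustrative choices for this post.

```python
# Exception mask bits of the x87 FPU control word (bits 0-5).
# A set bit means the exception is masked (it will not be raised).
EXCEPTION_MASKS = [
    (0x01, "invalid operation"),
    (0x02, "denormal operand"),
    (0x04, "divide by zero"),
    (0x08, "overflow"),
    (0x10, "underflow"),
    (0x20, "precision (inexact)"),
]

# Precision control field (bits 8-9).
PRECISION = {
    0b00: "24-bit (single)",
    0b01: "reserved",
    0b10: "53-bit (double)",
    0b11: "64-bit (extended)",
}

# Rounding control field (bits 10-11).
ROUNDING = {
    0b00: "nearest",
    0b01: "down",
    0b10: "up",
    0b11: "toward zero",
}


def decode_fpcw(value):
    """Split an FPCW value (hypothetical helper) into readable fields."""
    return {
        "masked": [name for bit, name in EXCEPTION_MASKS if value & bit],
        "unmasked": [name for bit, name in EXCEPTION_MASKS if not value & bit],
        "precision": PRECISION[(value >> 8) & 0b11],
        "rounding": ROUNDING[(value >> 10) & 0b11],
    }
```

Calling decode_fpcw(0x027F) and decode_fpcw(0x007F) on the two values that appear in the debugger output later in this post shows that both have all six exceptions masked and round to nearest; they differ only in precision control (53-bit versus 24-bit).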

Sometimes we see exceptions in managed code because the contents of the FPCW register were altered and not reset appropriately. There can be various reasons why the value was not properly reset.

Here, let’s see how we can drill down to the root cause of such exceptions when the FPCW register is altered by native code and managed code becomes the victim.

I am going to use WinDbg for this diagnosis. Let’s say you have a managed application (an executable) that you know does a lot of floating point operations and that usually crashes with System.Double exceptions or other floating point exceptions (such as divide by zero).

1) Let’s first launch our managed application from WinDbg: File -> Open Executable

2) Navigate to the executable and launch it. The debugger breaks in and waits for permission to proceed.

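3) At this point we configure the debugger to print the FPCW of every thread each time a module is loaded. We run the command sxe -c "~*e r@fpcw;g" ld, which breaks on every module load, dumps the FPCW for all threads, and then resumes execution.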

4) Hit “g” and let the executable proceed with execution.

5) The application runs for a while and then breaks into the debugger on a floating point exception (code c0000090 is STATUS_FLOAT_INVALID_OPERATION), as follows:

fpcw=0000027f

fpcw=0000027f

ModLoad: 0d9f0000 0da19000 E:\Problem\Problematic.dll <--

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000007f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

fpcw=0000027f

(d80.5c8): Unknown exception - code c0000090 (first chance)

(d80.5c8): Unknown exception - code c0000090 (first chance)

(d80.5c8): Unknown exception - code c0000090 (first chance)

(d80.5c8): Unknown exception - code c0000090 (first chance)

(d80.5c8): Unknown exception - code c0000090 (first chance)

(d80.5c8): Unknown exception - code c0000090 (first chance)

:::::::::::::::::CRASH::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

eax=000000c0 ebx=7744bd5c ecx=00000ca0 edx=0000000c esi=04da6cc4 edi=77434e40

eip=77a470b4 esp=07e1fdb4 ebp=07e1fde0 iopl=0 nv up ei pl zr na pe nc

cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

ntdll!KiFastSystemCallRet:

77a470b4 c3 ret

 

 

6) Near the end of the output, just before the crash, look for the point where the FPCW value changes (here from 0000027f to 0000007f) and for the last module that was loaded before that change. In our case, it is E:\Problem\Problematic.dll

7) It is quite likely that this is an unmanaged user-mode component used by the application, and that it was the last one to touch the FPCW before the crash.

8) The problem can be solved by removing the above component and then testing the application again.

9) This scenario occurs only occasionally, not in every floating point crash, but when it does, checking the FPCW register and the changes made to it can help.
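When the debugger output is long, scanning it by eye for the one changed value is tedious. As a rough sketch, assuming you have saved the WinDbg output to text and that it contains only the fpcw=... and ModLoad: lines shown above, the following hypothetical helper (find_fpcw_changes is a name made up for this post) reports every FPCW value that differs from the first one seen, together with the most recently loaded module. It only automates the heuristic from step 6; the module it reports is a suspect, not a proven culprit.

```python
import re

# Matches lines such as "fpcw=0000027f".
FPCW_RE = re.compile(r"fpcw=([0-9a-fA-F]+)")
# Matches lines such as "ModLoad: 0d9f0000 0da19000 E:\path\some.dll".
MODLOAD_RE = re.compile(r"ModLoad:\s+\S+\s+\S+\s+(.+)")


def find_fpcw_changes(log_text):
    """Return (fpcw_value, last_loaded_module) for each deviating FPCW line.

    The first FPCW value in the log is treated as the baseline.
    """
    baseline = None
    last_module = None
    changes = []
    for line in log_text.splitlines():
        mod = MODLOAD_RE.search(line)
        if mod:
            path = mod.group(1).strip()
            # Drop a trailing "<--" annotation like the one in this post.
            if path.endswith("<--"):
                path = path[:-3].rstrip()
            last_module = path
            continue
        reg = FPCW_RE.search(line)
        if not reg:
            continue
        value = int(reg.group(1), 16)
        if baseline is None:
            baseline = value
        elif value != baseline:
            changes.append((value, last_module))
    return changes
```

Run against the output shown in step 5, this would flag the single 0000007f entry and pair it with E:\Problem\Problematic.dll, which is the same conclusion reached by reading the log manually.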

 

Happy Debugging!!