How did the invalid floating point operand exception get raised when I disabled it?


Last time, we learned about the dangers of uninitialized floating point variables but left with a puzzle: Why wasn't this caught during internal testing?

I dropped a hint when I described how SNaNs work: You have to ask the processor to raise an exception when it encounters a signaling NaN, and the program disabled that exception. Why was an exception being raised when it had been disabled?

The clue to the cause was that the customer that was encountering the crash reported that it tended to happen after they printed a report. It turns out that the customer's printer driver was re-enabling the invalid operand exception in its DLL_PROCESS_ATTACH handler. Since the exception was enabled, the SNaN exception, which was previously masked, was now live, and it crashed the program.
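To make "masked" concrete, here is a minimal sketch in portable C++ rather than anything Windows-specific. It assumes the process is running with the default floating point environment, in which every exception is masked, and the name demo_masked_invalid is made up for illustration. It triggers the invalid operation exception with 0.0 / 0.0 instead of an SNaN, but the masking behavior should be the same: the operation quietly yields a NaN and leaves a status flag behind instead of trapping.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical demo: perform an invalid operation while the invalid
// operation exception is masked (assumed to be the default state).
inline void demo_masked_invalid() {
    std::feclearexcept(FE_ALL_EXCEPT);      // start with clean status flags

    // volatile keeps the compiler from folding the division at compile time.
    volatile double zero = 0.0;
    volatile double result = zero / zero;   // invalid operation

    assert(std::isnan(result));                  // a NaN came back
    assert(std::fetestexcept(FE_INVALID) != 0);  // flag recorded, no trap
}
```

If some other code had unmasked the invalid operation exception first, the same division would trap rather than return, which is what happened to the customer.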

I've also seen DLLs change the floating point rounding state in their DLL_PROCESS_ATTACH handler. This behavior can be traced back to old versions of the C runtime library which reset the floating point state as part of their DLL_PROCESS_ATTACH; this behavior was corrected as long ago as 2002 (possibly even earlier; I don't know for sure). Obviously that printer driver was even older. Good luck convincing the vendor to fix a bug in a driver for a printer they most likely don't even manufacture any more. If anything, they'll probably just treat it as incentive for you to buy a new printer.

When you load external code into your process, you implicitly trust that the code won't screw you up. This is just another example of how a DLL can inadvertently screw you up.

Sidebar

One might argue that the LoadLibrary function should save the floating point state before loading a library and restore it afterwards. This is an easy suggestion to make in retrospect. Writing software would be so much easier if people would just extend the courtesy of coming up with a comprehensive list of "bugs applications will have that you should protect against" before you design the platform. That way, when a new class of application bugs is found, and they say "You should've protected against this!", you can point to the list and say, "Nuh, uh, you didn't put it on the list. You had your chance."

As a mental exercise for yourself: Come up with a list of "all the bugs that the LoadLibrary function should protect against" and how the LoadLibrary function would go about doing it.
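Nothing stops an application from applying the save-and-restore idea itself around the loads it controls. The following is a rough sketch in portable C++ using <cfenv>, not a description of anything LoadLibrary does. FpEnvGuard, load_rude_library, load_preserving_fp_env, and demo_guard are hypothetical names, and load_rude_library merely simulates a misbehaving DLL by changing the rounding mode. What exactly fegetenv captures is up to the implementation; on common x86 implementations it typically includes the exception masks as well as the rounding mode.

```cpp
#include <cassert>
#include <cfenv>

// RAII guard: captures the calling thread's floating point environment
// on construction and restores it when the guard goes out of scope.
class FpEnvGuard {
public:
    FpEnvGuard() { std::fegetenv(&saved_); }
    ~FpEnvGuard() { std::fesetenv(&saved_); }
    FpEnvGuard(const FpEnvGuard&) = delete;
    FpEnvGuard& operator=(const FpEnvGuard&) = delete;
private:
    std::fenv_t saved_;
};

// Hypothetical stand-in for a DLL whose initialization code tampers with
// the floating point state. Here it changes the rounding mode.
inline void load_rude_library() { std::fesetround(FE_TOWARDZERO); }

// Hypothetical wrapper: run the loader, then undo whatever it did to the
// floating point environment.
template <typename Loader>
void load_preserving_fp_env(Loader loader) {
    FpEnvGuard guard;
    loader();
}   // guard's destructor restores the saved environment here

// Demonstration: the rounding mode is the same before and after the load.
inline void demo_guard() {
    const int before = std::fegetround();
    load_preserving_fp_env(load_rude_library);
    assert(std::fegetround() == before);
}
```

This only covers loads the application performs directly. A DLL that gets loaded behind your back, say by the print dialog, or that changes the state again on a later call, slips right past it.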

Comments (23)
  1. Anonymous says:

    LoadLibrary shouldn’t have to protect against anything; it’s not its job.

    The problem lies with a design where loading untrusted (meaning potentially buggy, not "wanting to steal your secrets") libraries in your process state is not only common, but expected.

  2. Nathan_works says:

    Would a white list be better than a black list? Trying to word it right, something like: be aware that these things can be changed when a library is loaded. Though I guess both lists might be very long. I’ll admit, changing the FP exceptions never would have crossed my mind.

  3. Michiel says:

    At the very least, give the application some control. I didn’t like Dell’s crapware to start with, but the wxVault.dll which they force on every application infects all applications.

    Hindsight is not having a complete list of bugs that LoadLibrary should fix. Hindsight is not having the printer driver in-process in the first place. Just define a nice COM interface. For backcompat, load the old DLL-based drivers in a COM wrapper. That way, only that one executable needs to worry about such pesky drivers.

  4. CPG says:

    For me dlls that make calls to SetUnhandledExceptionFilter removing my minidump reporter really annoy me.  There are certain global aspects like exceptions and their handlers I wish could be protected.  We ended up using Detours to find out who was taking over the filter.  In one case it was a Xerox printer driver; in another it was a video driver.

  5. Tom says:

    Bob:  Looks like Anthony Wieser in yesterday’s post is the winner!  Rod Roddy, tell him what prize he’s won today!

    Rod: Anthony Wieser, pack your bags, because you’re going to faaaabulous Acapulco!  You and a special guest will be spending four days and three nights in the glamorous, five-star Hotel Paradiso!  You’ll be dining in the lap of luxury at Restaurant d’Ennui, and having your cares massaged away at the Bango di Fango Spa.

  6. TEHb says:

    DLLs written in Borland C++ love to enable FP exceptions. Recently we had the same bug.

  7. bcthanks says:

    "Good luck convincing the vendor to fix a bug in a driver for a printer they most likely don’t even manufacture any more."

    This is the exact experience that led Richard Stallman to create the GNU General Public License 20 years ago: if the vendor won’t fix their bugs, then at least give the customer a chance to fix them.

  8. Cereal says:

    Write an exception handler to catch floating point exceptions, disable them and resume execution?

  9. John says:

    This problem, like most, will not be solved until somebody figures out a way to punch somebody in the face over the Internet.

  10. Dmitry Kolosov says:

    1) Thou shalt initialize your variables.

    2) Thou shalt not disable exceptions. Catch them!

    Can’t wait to see the installment about the dangers of dereferencing the NULL pointer…

  11. Triangle says:

    This is why implementing state that should be local to a piece of code as global state is a bad idea.

  12. Mike Dimmick says:

    But protecting the state in LoadLibrary is not enough. Presumably there’s a reason why they wanted to handle floating point that way. You have to intercept every call to the DLL and change the flags back to the way that the DLL wants them. The problem then becomes callbacks – do you set the state back to the way the caller wanted it? How do you know whether an arbitrary parameter to a function is a function pointer? How does Windows even know how many parameters a function takes and how deep to look, in the general case?

    Basically I don’t think there’s a viable workaround here.

  13. This reminds me of an old crashing bug I fixed a very long time ago. There was this tiny snippet of code somewhere which didn’t protect against division by zero. The exception for dividing by zero wasn’t handled because the flags were set so that dividing by zero wasn’t a fault.

    I talked with some other engineers, and they had also experienced this "printer driver sometimes leaves CPU registers in undetermined state" issue before.

    In addition to checking the divide by zero, my fix added a check which was put somewhere global to assert (in debug builds) that the CPU registers were correct, and it would then fix them up so that the chances of this problem affecting other parts of the application were negated.

    The amusing thing is that after fixing this bug, I was given several bugs over the years for this assertion firing.

  14. ConradHex says:

    This article (and the predecessor) were a revelation to me. I’m pretty sure I encountered this exact bug on a PC game I was working on, 5 or 6 years ago.

    The game worked perfectly, except on one machine that was offsite. There it would crash, and when I hooked up a debugger (had to drive like 2 hours to the site) it was getting a floating-point exception. We ended up having to put all kinds of special code in to make sure we weren’t ever dividing by zero, if I recall correctly. (And we had a LOT of floating-point math code.)

  15. Alexandre Grigoriev says:

    Then start an ephemeral thread and have it call LoadLibrary. Any FP interrupt mask changes will be gone with the thread.

  16. Miral says:

    I’m with Michiel.  Printer drivers should have been loaded out-of-process, so that they can’t interfere with the app.

    (It’s actually really scary when you look at the list of DLLs being attached to any given process.  It’s a wonder with all that third-party code hooking into all sorts of weird places that the programs work at all.  Oh wait, a lot of them don’t.)

  17. Yuhong Bao says:

    More on this:

    http://www.virtualdub.org/blog/pivot/entry.php?id=53

    BTW, loading shell32.dll by itself loads a lot of DLLs, and displaying a dialog like Open/Save loads more, including shell extensions.

  18. Yuhong Bao says:

    "For me dlls that make calls to SetUnhandledExceptionFilter removing my minidump reporter really annoy me. "

    Especially when they do not chain back to the previous handler. There is an article on Nynaeve talking about this, but that seems a little overblown. It can be a security hole, yes, but they are talking about a general problem that is not limited to this.

  19. no one in particular says:

    Wow – Raymond Chen as Open-Source (-Driver) evangelist!

    I had never dreamed that!

    "Good luck convincing the vendor to fix a bug in a driver for a printer they most likely don’t even manufacture any more. If anything, they’ll probably just treat it as incentive for you to buy a new printer."

  20. ack says:

    As for ConradHex, this was a revelation for me too.

    I didn’t know that so many DLLs change the FPU mask bits.

    Thanks Yuhong Bao for

    http://www.virtualdub.org/blog/pivot/entry.php?id=53

    which has a nice summary (I believe written by Avery Lee, the author of VirtualDub). Another good summary is linked in comments there, the Gimp related Bugzilla entry:

    http://bugzilla.gnome.org/show_bug.cgi?id=316645

    (see Comment #14 from Raphaël Quinet)

    I’m now inspired to test how a few of my apps interact with a few of the printer drivers.

    Cereal, on the virtualdub link you can read the following (search for Phaeron – 14 06 05 – 00:30):

    "(…) You cannot simply recover by remasking the FPU exceptions and restarting the faulting instruction. The problem is that the x87 FPU doesn’t signal the interrupt until the next floating-point instruction, at which point necessary information to retry the instruction is irretrievably lost. Take this instruction sequence for example:

    FDIV DWORD PTR [EAX]

    XOR EAX, EAX

    FSTP DWORD PTR [EDX]

    A divide-by-zero error here will actually result in the FSTP instruction faulting, not the FDIV."

  21. Dean Harding says:

    "Raymond Chen as Open-Source (-Driver) evangelist!"

    If that surprises you, I suggest you grep the Linux kernel’s CREDITS file for Raymond’s name…

  22. Todd Laney says:

    I remember this exact problem caused the first patch for Flight Simulator 95.  FS95 was crashing out in the field because of a floating point exception. We never could repro it in house, so I added code to our exception handler to reset the FP exception state to what it should have been and continue.  That fixed the problems.

    On a side note, the exception handlers for FS95 needed to be in a DLL; they did not work right inside the app.  The DLL name we used was DIVTRAP.DLL; if you have an old version of Flight Simulator, you will have this file.  I can’t remember what the reason was, but I bet it would make a good post, Raymond.

  23. bw says:

    Raymond, is there any documentation on what should be saved between API calls (library loading) [except ESP EBP ESI EDI EBX, direction flag]?

    I’ve been looking for this information for a long time. For example, is it legal to modify MMX/SSE2 register state between function calls?
