Why aren’t compatibility workarounds disabled when a debugger is attached?


Ken Hagan wonders why compatibility workarounds aren't simply disabled when a debugger is attached.

As I noted earlier, many compatibility workarounds are actually quicker than the code that detects whether the workaround would be needed. Suppose a function was originally written like this:

BOOL IsZoomed(HWND hwnd)
{
  return GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED;
}

Now suppose you find a compatibility problem with some applications that expect the IsZoomed function to return exactly TRUE or FALSE. You then change the function to something like this:

BOOL IsZoomed(HWND hwnd)
{
  return (GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED) != 0;
}
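
Here is a minimal sketch of the kind of caller that trips over the original version. It is not the real Windows code: the typedef and constants are stand-ins so the sketch builds without windows.h, the value given to WS_MAXIMIZED is an assumption for illustration, and the function names IsZoomedRaw, IsZoomedNormalized, and CallerThinksMaximized are made up.

```c
#include <assert.h>

/* Stand-ins for the Win32 definitions so the sketch builds anywhere. */
typedef int BOOL;
#define TRUE 1
#define FALSE 0
#define WS_MAXIMIZED 0x01000000L /* assumed value, for illustration only */

/* Original form: returns 0 or the raw style bit. */
BOOL IsZoomedRaw(long style)
{
  return (BOOL)(style & WS_MAXIMIZED);
}

/* Fixed form: returns exactly TRUE or FALSE. */
BOOL IsZoomedNormalized(long style)
{
  return (style & WS_MAXIMIZED) != 0;
}

/* The kind of caller that breaks: it compares the result against TRUE
   instead of testing it for nonzero. */
int CallerThinksMaximized(BOOL result)
{
  return result == TRUE;
}
```

With the raw version, a maximized window produces a nonzero value that is not equal to TRUE, so the comparing caller concludes the window is not maximized. With the normalized version, the same caller gets the answer it expects.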

Now, we add code to enable the compatibility workaround only if the application is on the list of known applications which need this workaround:

BOOL IsZoomed(HWND hwnd)
{
  if (GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED) {
    if (IsApplicationCompatibilityWorkaroundRequired(ISZOOMED_TRUEFALSE)) {
      return TRUE;
    } else {
      return WS_MAXIMIZED;
    }
  } else {
    return FALSE;
  }
}

What was a simple flag test now includes a check to see whether an application compatibility workaround is required. These checks are not cheap: the compatibility infrastructure needs to look up the currently-running application in the compatibility database, check that the version of the application that is running is the one the compatibility workaround is needed for (which could involve reading the file version resource or looking for other identifying clues), and then return either the compatible answer (TRUE) or the answer that the original simple one-line function would have produced.
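
To make that cost concrete, here is a rough sketch of what a gated check might involve. Everything in it is hypothetical: the table layout, the entries, the function names WorkaroundRequired and IsZoomedGated, and the g_lookupSteps counter are all invented for illustration, and the real compatibility infrastructure is certainly more involved than a linear scan over a two-entry table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the Win32 definitions so the sketch builds anywhere. */
typedef int BOOL;
#define TRUE 1
#define FALSE 0
#define WS_MAXIMIZED 0x01000000L /* assumed value, for illustration only */

/* Hypothetical database entry: one application at one specific version. */
struct CompatEntry {
  const char *exeName;
  unsigned major;
  unsigned minor;
};

/* Hypothetical list of applications that need the TRUE/FALSE answer. */
static const struct CompatEntry g_needsTrueFalse[] = {
  { "oldapp.exe", 1, 0 },
  { "legacy.exe", 2, 3 },
};

/* Counts how many table entries have been examined so far. */
unsigned g_lookupSteps = 0;

/* Look up the running application and confirm the version matches. */
BOOL WorkaroundRequired(const char *exeName, unsigned major, unsigned minor)
{
  size_t i;
  size_t count = sizeof(g_needsTrueFalse) / sizeof(g_needsTrueFalse[0]);
  for (i = 0; i < count; i++) {
    g_lookupSteps++;
    if (strcmp(g_needsTrueFalse[i].exeName, exeName) == 0 &&
        g_needsTrueFalse[i].major == major &&
        g_needsTrueFalse[i].minor == minor) {
      return TRUE;
    }
  }
  return FALSE;
}

/* Gated version: pays for a lookup every time the window is maximized. */
BOOL IsZoomedGated(long style, const char *exeName,
                   unsigned major, unsigned minor)
{
  if (style & WS_MAXIMIZED) {
    if (WorkaroundRequired(exeName, major, minor)) {
      return TRUE;
    }
    return (BOOL)WS_MAXIMIZED;
  }
  return FALSE;
}
```

Even in this toy form, every call on a maximized window performs string comparisons and version checks, whereas the unconditional fix is a single comparison against zero.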

So not only is the function slower (having to do a compatibility check), it also looks really stupid.

Oh wait, now we also have to stick in a debugger check:

BOOL IsZoomed(HWND hwnd)
{
  if (GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED) {
    if (!IsDebuggerPresent() &&
        IsApplicationCompatibilityWorkaroundRequired(ISZOOMED_TRUEFALSE)) {
      return TRUE;
    } else {
      return WS_MAXIMIZED;
    }
  } else {
    return FALSE;
  }
}

And then people complain that Windows is slow and bloated: A simple one-line function ballooned into ten lines.

Another reason why these compatibility workarounds are left intact when a debugger is running is that changing program behavior based on whether a debugger is attached would keep application vendors from debugging the problem they set out to debug, because attaching the debugger would suddenly inject all sorts of new problems.

Suppose you support Program X, and you get a report of a security vulnerability in your program. You run the program under the debugger, and when you run the alleged exploit code, you find that the program doesn't behave the same as it does when the debugger is not attached. Some compatibility workaround that is active when your program runs normally is being suppressed, and the change in behavior alters your program enough that the alleged security exploit doesn't behave quite the same.

When run outside the debugger, the program crashes, but when run under the debugger, the program displays a strange error message but manages to keep from crashing. Congratulations, you introduced a Heisenbug.

And then you say, "There's something wrong with the debugger. It must be a bug in Windows."

Pre-emptive Yuhong Bao comment: The heap manager switches to an alternate algorithm if it detects a debugger, and the CloseHandle function raises an exception if running under the debugger.

Comments (22)
  1. Marquess says:

    My first reaction was "Why should you?" After all, as a certain Raymond Chen once blogged: Do not change program semantics in the debug build. (Which is pretty much the same issue)

  2. Ken Hagan says:

    @Marquess: Because if you don’t, you end up with an eco-system stuffed full of programs that require workarounds, making Windows slow and bloated, and programmers who never find out that they have bugs they ought to fix.

    The point about using the debugger as a switch is that the affected people are the original guilty party AND in a position to do something about it.

    For similar reasons, one might suggest that all app-compat workarounds should squawk through OutputDebugString each time they are used. Again, this will really only be an issue for the very people who might be able to do something about it.

  3. DriverDude says:

    Aren’t compatibility workarounds created after the buggy app shipped? Which means the people in the best position to fix it – the original author or company – have essentially stopped supporting or caring about their product. So how does it help to disable the workarounds when a debugger is running?

  4. Marquess says:

    Right. App-compat workarounds are only for programs that are already in the wild, and they only target a specific version. Patch the app, and you’d better patch the app-compat issue as well.

  5. Duran says:

    I’m totally with you.  It should be an exceedingly rare event that program behavior should be changed under a debugger.  It should raise eyebrows and provoke discussion.  

  6. pete.d says:

    Sorry Microsoft (and Raymond).  As you already know, you’re in a no-win situation.  Fine-tune Windows to accommodate all the crappy software that’s been written and is out there being used, and you wind up with bloat.  Everyone complains that Windows is slow and bloated.

    On the other hand, fail to accommodate all the crappy software that’s been written and is out there being used, and you wind up with applications running on Windows that are crappy and don’t work correctly.  But no one will believe that it’s the fault of the crappy software.  They still say it’s Windows that’s doing screwy things.

    Bonus unfair criticism: try to point out that it’s the fault of the crappy software and not Windows, the Windows haters will all say "well, it’s still Windows’ fault for having a hard-to-use API".  As if Cocoa is some walk in the park.

    On the bright side, bloat is harder and harder to notice, as computers get faster and faster (okay, so CPU speeds have fallen away in favor of multi-core…but one can see similar gains anyway) and memory and storage is cheaper and cheaper.  The compatibility-hack approach probably is the right "you lose" choice to make, out of an impossible situation.  :)

  7. osexpert says:

    "These checks are not cheap, because the compatibility infrastructure needs to look up the currently-running application…" Why isn’t this check performed once on startup based on file name, hash, version etc.?

    And are compat hacks really added directly in production code like that?? Wouldn’t it be possible/better to link/forward hacked methods to some hack dll, e.g. kernel32.hacks.dll (detours)?

  8. osexpert says:

    If you read this article it claims windows does exactly what I suggested:

    http://msdn.microsoft.com/en-us/library/bb432182%28VS.85%29.aspx

    “With Appfix, hooks are installed for APIs called by the components of the application. These hooks point to stub functions that can be called instead of the system functions (also known as shimming).”

    [For app-specific fixes that break other apps, then yes that’s what happens. But if a compatibility fix causes no harm to other apps, then there’s no point adding the switch statement. -Raymond]
  9. Alexandre Grigoriev says:

    I’ll repeat my suggestion: only expose new features to the executables that are marked with the current OS version, while disabling kludge-type workarounds for such processes. What that means for an ISV is: if you want these new goodies, get your stuff together and fix the shit. If an ISV can’t be bothered fixing it, or is defunct, then that software should not have access to any new GUI look, or kernel functionality. That will also reduce compatibility issues.

  10. Gabe says:

    Alexandre: Are you suggesting that I have to recompile every EXE on my box in order to use a shell extension that uses new kernel functionality? That’s insane! Can you imagine the support hassles?

  11. Mike Dimmick says:

    Darn! meant to add: Alexandre’s plan falls apart in the plug-in model. If a new version of IE is marked with the current OS version, you should not disable ‘workarounds’ that target problems in Flash Player.

  12. Mike Dimmick says:

    Gabe: or even recompile every shell extension in order to use Windows Explorer.

  13. Dale says:

    Larry Osterman raised the case of the debug heap being different, and causing a heisenbug, here:

    http://blogs.msdn.com/larryosterman/archive/2008/09/03/anatomy-of-a-heisenbug.aspx

  14. MItaly says:

    Exactly like the heisenbug I’m after these days. I must thank you, Raymond and Dale (and, indirectly, Yuhong Bao :P ), for that information: I was wondering exactly why an app runs smoothly in the debug build, and in the release build when started by the debugger, but crashes when run normally.

  15. Nathan Tuggy says:

    It occurs to me that you know you’re being a persistent annoyance if Raymond Chen starts aiming his Pre-emptive Comments at you… spare me from this fate!

  16. Tuesday says:

    Erm… Don’t reinvent the wheel with every new OS version? Just leave "old" things alone and implement glasses and shakes as new interfaces? These workarounds wouldn’t be necessary if the OS hadn’t changed. So who is to blame?

  17. Mike Dimmick says:

    Alexandre, as Raymond *just explained*, the code path for the workaround is shorter than testing whether to apply the workaround.

    I wouldn’t necessarily call it a workaround, in fact. It’s really tightening up the specification of the function; in Raymond’s example the function goes from returning 0 or WS_MAXIMIZED to returning 0 or (whatever the compiler chooses for boolean ‘true’).

    Today I found that in implementing a new feature, I could finally add a feature to our porting library (it implements an obsolete interface from DOS-based devices on Windows CE). In doing so I discovered that various applications we’ve shipped that only used the CE version have misused that API, so I had to code it so that the behaviour would be reasonable for those apps.

  18. Drak says:

    @Tuesday: Ah, so you advocate leaving bugs in the OS that you know about, and can fix with a one-liner (Raymond’s first function)? Let’s never bugfix anything anymore :P

    @Mike Dimmick: Raymond’s fix actually returns TRUE or FALSE, not TRUE or 0 :)

  19. Dhericean says:

    I know it’s a bit of an aside but I felt I needed to share a very strange Heisenberg moment I had (almost 25 years ago).

    I was a postgrad solid state chemist, building my own computer-controlled Raman Spectrometer.  The spectrum was scanned by driving a stepper motor which turned a holographic grating.  We had bought the controller boards for the motor and were communicating with them through an IEEE-488 interface from APL (the GPIB drivers were the only non-trivial assembler I ever wrote and are a story on their own >.< ).

    The computer was communicating with the stepper motor controller fine, retrieving status and setting values, but the motor was not moving.  So I got an oscilloscope and put it across the outputs to check that the correct wave form was being produced.  However when I put the oscilloscope on, the motor started to move; and when I took it off, it stopped again.  As a non-electronic engineer I was left wondering how I was going to explain that I needed an oscilloscope permanently connected to my spectrometer.

    A more experienced hand diagnosed a floating ground on the low voltage side of the transformer (the wiring diagram provided did not show it being commoned) which was being grounded through the oscilloscope, and managed to rescue the instrument.  But for a moment I really wondered if I had proved Heisenberg.

  20. Vilx- says:

    Heh, I got hit by the heap manager too one day. I was writing software for a computing olympiad. My program took as a parameter another program (.exe file) and ran it in a controlled environment – well, as much as I could control it. I set limits for memory and CPU time, limited the security tokens, and monitored it continuously to see if it was sleep()ing too much. I also attached as a debugger in order to provide as much information about a crash as possible. One contest entry then managed to hit this heap manager behaviour. It was completing the task almost instantly when run as standalone, but when run under my controller software it took more than 10 seconds of solid CPU time (the contest time limit was 1 second). Turns out that the program was allocating 10,000 vectors or something and each one of those was a separate memory allocation by the heap manager. Under the debugger this took way more time. :)

  21. fahadsadah says:

    @Alexandre:

    Imagine this was done with Vista. There would be thousands of support calls: "X works with Aero, Y doesn’t". People blame Micros~1 for creating a buggy and inconsistent OS.

    Imagine it was done with 7. There would be thousands of support calls: "X gives previews, Y doesn’t". People blame Micros~1 for creating a buggy and inconsistent OS.

    It’s not as bad if it’s done with API calls only, though

  22. fahadsadah says:

    To the above, apply sed expression s/gives previews/works with Aero Peek/

    OT: Alternatively, the sed expression can be written as: "segives previewseworks with Aero Peeke"
