The danger of making the chk build stricter is that nobody will run it


Our old pal Norman Diamond suggested that Windows should go beyond merely detecting dubious behavior on debug builds and should kill the application when it misbehaves.

The thing is, if you make an operating system so strict that the slightest misstep results in lightning bolts from the sky, then nobody will run it.

Back in the days of 16-bit Windows, as today, there were two builds: the so-called retail build, which had assertions disabled, and the so-called debug build, which had assertions enabled and broke into the debugger if an application did something suspicious. (These correspond to today's terms free and checked, respectively.)

Now, the Windows development team is big on self-hosting. After all, if you are writing the operating system, you should be running it, too. What's more, it was common to self-host the debug version of the operating system, since that's the one with the extra checks and assertions that help you flush out the bugs.

As it happens, the defect tracking system we used back in the day triggered a lot of these assertions. As I recall, refreshing a query resulted in about 50 parameter validation errors caught and reported by Windows. This made using the defect tracking system very cumbersome because you had to babysit the debugger and hit "i" (for ignore) 50 times each time you refreshed a query.

(As I noted in my talk at Reflections|Projections 2009, the great thing about defect tracking systems is that you will hate every single one you use. Sure, the new defect tracking system may have some new features and be easier to use and run faster, but all that does is delay the point at which you begin hating it.)

If Windows had taken the stance that the slightest error resulted in the death of the application, then it would have been impossible for a member of the Windows development team to run the defect tracking system program itself, because once it hit the first of those 50 parameter validation error reports, the program would have been killed, and the defect tracking system would have been rendered useless.

Remember, don't change program semantics in the debug build. That just creates Heisenbugs.
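
To make that rule concrete, here is a minimal sketch of parameter validation that behaves the same in both builds and merely reports more in the debug one. Everything in it is hypothetical and for illustration only: the Validator class, the set_window_title function, and the error messages are assumptions, not actual Windows code.

```python
from dataclasses import dataclass, field


@dataclass
class Validator:
    """Hypothetical parameter validator; not a real Windows API."""
    debug: bool = False  # True models the debug build, False the retail build
    reports: list = field(default_factory=list)

    def check(self, condition: bool, message: str) -> bool:
        # Only the debug build records a diagnostic for the bad parameter.
        if not condition and self.debug:
            self.reports.append(message)
        # Both builds hand the caller the same answer.
        return condition


def set_window_title(validator: Validator, handle: int, title) -> bool:
    """Hypothetical API call that fails the same way in both builds."""
    if not validator.check(handle > 0, "invalid handle"):
        return False
    if not validator.check(isinstance(title, str), "title is not a string"):
        return False
    return True


retail = Validator(debug=False)
debug = Validator(debug=True)

# The same bad call is made against each build.
retail_result = set_window_title(retail, 0, "Bugs")
debug_result = set_window_title(debug, 0, "Bugs")
```

The bad call fails in both builds, and the only difference is that the debug build has something in its reports list. A "kill on first error" policy would instead make the program's fate depend on which build it happened to be running on.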

I remember that at one point the Windows team asked the people who supported the defect tracking system, "Hey, your program has a lot of problems that are being reported by the Windows debug build. Can you take a look at it?"

The response from the defect tracking system support team was somewhat ironic: "Sorry, we don't support running the defect tracking system on a debug build of Windows. We found that the debug version of Windows breaks into the debugger too much."

Comments (31)
  1. Ian says:

    This is a variation on this law: If you want to make people do the right thing then you have to make it easier to do the right thing than the wrong thing.

  2. Thomas says:

    Wait. Why didn't anyone (possibly higher up) exert some pressure on them to support the debug build?

  3. Ben Voigt [Visual C++ MVP] says:

    If programs which triggered assertions in the checked build were denied Windows Logo status, then Windows Logo might actually be a benefit to consumers, and running the checked build for day-to-day development might not be so painful.

    [You'd be surprised how angry ISVs get when the logo requirements are made stricter. Make it too strict, and nobody will bother getting logo status. -Raymond]
  4. Gabe says:

    I don't understand what problem killing the misbehaving program would solve in the first place. How would you debug the program that got killed?

  5. Joshua says:

    [You'd be surprised how angry ISVs get when the logo requirements are made stricter. Make it too strict, and nobody will bother getting logo status. -Raymond]

    You got that right. Between UAC and MSI we see no reason to try anymore. UAC was fixed years later by not installing to Program Files. MSI cannot be fixed until MSI is made re-entrant, by this I mean invoke EXE that invokes MSI from inside MSI.

  6. benjamin says:

    Maybe I'm missing something here, but if the Defect Tracking system was throwing asserts, isn't the solution to, you know, make it stop doing that?

    Preferably by fixing the suspicious operations?

    [See final sentence. -Raymond]
  7. Thomas says:

    UAC was fixed years later by not installing to Program Files.

    This sounds like your program has a serious limited-user-account bug.

  8. Crescens2k says:

    This sounds like your program has a serious limited-user-account bug.

    Well, this is mainly down to Microsoft suddenly asserting the limited user account like they did. It wasn't too unexpected because people had gotten too used to running with Admin rights most of the time, but also if they had gradually introduced this kind of thing then most people wouldn't have started to use it anyway.

  9. David Walker says:

    About dogfooding: There's a philosophical argument in favor of making things as painful for employees of the company that produces the product or service as they are for its customers.

    For example, when I worked for an airline way back when, I enjoyed being able to fly almost anywhere, standby, for pretty cheap (it wasn't free, and it was more than just the taxes on the airfare).  Of course, if the seats were all full of paying passengers, you were not allowed on the plane.  And missing work on Monday because you're stuck in a different city was frowned upon by the bosses.

    So we were insulated from the price of airfare.  On the other hand, when I heard that many mid and high-level executives of car companies are pretty much insulated from the pain of mechanical defects, that seemed like it takes away the incentives for them to build cars that don't break.  I understand that they are (or were) able to park their cars at work, and have company mechanics work on the cars right there.  No fuss, no scheduling with a garage, no bother to get a ride to work and back when you dropped your car off, no expensive car repairs.  Somehow, I think that car company executives SHOULD have to dogfood the car repair process.

    If the defect tracking system was a commercial product, then it SHOULD be forced to work on a debug build — maybe an edict from Bill G would have done it.  If it was not a commercial product, then … who knows?

  10. David Walker says:

    About logo requirements:  I hate, hate the programs (there are a few) that still want to install themselves into the root of the C Drive, and can't handle blanks in path names.  We have only had blanks in path names for, what, 16 years now…

  11. Joshua says:

    @Thomas: no, it worked just fine as limited user in XP and in Vista with UAC off. We never could isolate the cause.

  12. David Walker says:

    Sorry for the multiple posts… but  arrrgh: "Microsoft suddenly asserting the limited user account like they did".  Well, your company was warned in plenty of time to fix the issue.  If a program can't be bothered to install into Program Files, I probably won't buy it.  What other things does it fail on?  Non-US English operating systems?  Systems that don't have a C drive, or where Windows runs off of the D or E drive?  

  13. JJJ says:

    When I was a new developer, I once took the initiative and ran the product we were developing through Application Verifier and the program crashed.  I wrote up a bug report with a stack trace and all that and asked the senior developer of that module to take a look.  The response was, "Come on, we can't support running under every development tool that Microsoft releases."

    I then took the initiative and debugged and fixed the problem myself.  This experience was one of the defining moments of my early career, when I discovered that senior developers are not necessarily any good at writing software.

  14. metafonzie says:

    It's kind of obvious but I think it's kind of cool that Apple has built a reputation amongst its customers that they can rely on the OS to be "stable" and "correct" and weed out bad apps. If the users of Windows had a Microsoft-controlled positive experience of tested software from the moment they bought the PC, the users would sort of get the idea that "Oh I installed application X and my computer started crashing more" rather than "App not working .. probably some weird Windows stuff.. blame Microsoft". This would transfer the balance of power back to Microsoft in forcing ISVs into creating more robust apps as customers now have a reliable signal in detecting crappy applications. Even so, I'm not fully convinced that Microsoft's management values positive customer experiences over making more money.

  15. Billy O'Neal says:

    @Metafonzie:

    If you think there are no bad apps on OSX, I believe you have lived a sheltered use of OSX. And that's not because of enforcement on the part of the OS. (It's more that OSX postdates Windows by about 10 years and therefore has a little less legacy cruft to carry around — and NO 16 bit cruft to carry around)

  16. jader3rd says:

    This reminds me of something John Robbins said in a class. He couldn't believe that the Exchange team would crash processes for asserts, instead of having a popup and breaking into the debugger. The result of doing that though, is that the Exchange team runs debug builds all of the time, and since the crash data goes to a central repository they can track the issues.

    On the other hand, some people do some crazy double checking/self testing in debug builds, and it does not make sense to run another team's test passes (regularly) on top of those because everything really slows down. Those should be special "test" builds.

    I prefer crashing on asserts. It gets problems fixed.

  17. Joshua says:

    I second jader3rd.

    If you need to turn assertions off to function you've got problems.

  18. Nick says:

    @Metafonzie:

    I think what you're trying to say, when translated from marketing-speak, is that Apple has trained their users not to expect their programs to function after an OS upgrade.  An Apple user has been trained to say "I guess I better spend the $29.99 to buy the upgrade!"  On the other hand, Windows users expect VisiCalc and SimCity 1 to function on Windows 7.  It's not a case of "blame Microsoft" but rather "tell friends that installing Windows 7 breaks everything."

    Expecting too much backwards-compatibility is silly, but assuming that it's normal for an OS upgrade to break half the applications on your computer is even worse.

  19. frymaster says:

    @Metafonzie

    'I think it's kind of cool that Apple has built a reputation amongst its customers that they can rely on the OS to be "stable"'

    quite the opposite.  Apple are infamous for causing programs to fail when osx is updated, and not only the badly-behaving ones

  20. 640k says:

    @Thomas: Wait. Why didn't anyone (possibly higher up) exert some pressure on them to support the debug build?

    As I understand, the defect tracking system was integrated in the debug build (self hosted) and it wasn't compatible with itself? Seen as a whole, this system was buggy. You can say it was defect ;)

    And yes, exert pressure is exactly what should have been done. You don't let incompatibilities between two internal apps/systems out the door, or even out of the team. The right solution is to FIX IT. No one should be forced to work with crappy systems where you have to press a button 50 times only to be able to start working.

    [What do you mean "compatible with itself"? The defect tracking system is just an app written by some group outside the Windows division. There's no integration here. -Raymond]
  21. Dave says:

    if they had gradually introduced this kind of thing then most people wouldn't have started to use it anyway

    How would you "gradually introduce" UAC?  Only turn it on on Tuesdays?

    (I'm quite serious here, it'd be interesting to see people's ideas on how you'd phase in something like this).

  22. Worf says:

    Actually, Mac users had the same problems since 1984. The deal was, Apple simply said that what was in the API books was it. What was documented was that. If you need something not in the APIs, you're SOL.

    Of course, this wasn't good enough, so devs started doing the same things Raymond gets headaches about. And Apple has a tendency to muck around with internals. A lot. So the next release (major or minor), boom, those apps break. Just like they do on Windows.

    Except Apple doesn't worry about it, with no enterprise customers to support and regular customers who want the new features and keep their systems relatively updated, forcing devs to either not do those tricks or keep up to date.

    Heck, one known set of hacks (haxies, using the input manager) basically always breaks because they're doing stuff that really messes with the internals.

    On iOS, Apple does static API scans on apps, trying to pre-empt the use of undocumented or private APIs so they would work on the next OS rev. (You can always do runtime links… but really if you're doing that…).

  23. metafonzie says:

    quite the opposite.  Apple are infamous for causing programs to fail when osx is updated, and not only the badly-behaving ones

    What I meant is that Apple usually doesn't coddle developers and ISVs (i.e. no Developers, Developers, Developers) and thus won't support broken programs. IMO users who go out and buy the upgrade version are a (vocal) minority.

    Except Apple doesn't worry about it, with no enterprise customers to support and regular customers who want the new features and keep their systems relatively updated, forcing devs to either not do those tricks or keep up to date.

    I was speaking about the consumer side of things. Obviously on the enterprise side the OS would never be installed w/o proper testing of LOB apps.

  24. Thomas says:

    What I meant is that Apple usually doesn't coddle developers and ISVs (i.e. no Developers, Developers, Developers) and thus won't support broken programs.

    That might help explain their market share.

  25. Adam V says:

    @David Walker:

    Somehow, I think that car company executives SHOULD have to dogfood the car repair process.

    You do that by charging them for the repair & car rental costs. Obviously it'll be a drop in the bucket for them, but when December rolls around and they see their YTD charges for repairs, they may look at each other and say "This has got to stop. If this is what it's costing us then it's taking a huge chunk out of the pockets of our customers."

  26. 640k says:

    [You'd be surprised how angry ISVs get when the logo requirements are made stricter. Make it too strict, and nobody will bother getting logo status. -Raymond]

    That's because logo certification is currently used (by ms) as a marketing feature, not as a technical proof of quality. For example, the requirements for vista were ridiculous because its purpose was to boost vista sales.

  27. Michael says:

    "The defect tracking system is just an app written by some group outside the Windows division. There's no integration here."

    Why is this accepted as okay? Why can't the Windows division fix the bugs in their code? It seems like this post is crying out for either better employees, better integration, or both. Yet I don't see you acknowledging either of these things as necessary.

    [So the Windows team should take one of its developers away from fixing bugs in Windows to go over to the team that is responsible for the defect tracking tool, learn how the program works, and fix a bug in it. I don't know why you're going on and on about integration since this is not integration. It's just application compatibility. -Raymond]
  28. Cheong says:

    Actually, I think a lot of debuggers have an option that lets you select which types of exceptions/errors to ignore. Maybe also by specific process? It's hard to imagine why they didn't add any of these features after your team complained to them.

  29. Neil says:

    Perhaps if they had started developing using checked builds they would have found and fixed the code generating the assertions as it was written, instead of having to go back afterwards.

  30. Gabe says:

    Cheong: Nobody has mentioned a debugger. Raymond was talking about running a defect tracking system (a program used to *report* bugs, not fix them) on a "debug" build of the OS.

    Maybe this shouldn't surprise me at all, but I'm amazed that such a clearly-written article can be so badly misinterpreted by so many people! "Integration"? "Debugger"? Where are these concepts coming from?

  31. Joe says:

    The response of the defect tracking system support team is unbelievably unprofessional. Why not descend to their level? Use shims to replicate the behaviour of the checked version of Windows and reply to any bug queries with an unhelpful "this is an error in your code" response.

