Bug Psychology

Fixing bugs is hard.

For the purposes of this posting, I’m talking about those really “crisp” bugs -- those flaws which are entirely due to a failure on the developer’s part to correctly implement some mechanistic calculation or ensure some postcondition is met. I’m not talking about oops, we just found out that the product name sounds like a rude word in Urdu, or the specification wasn’t quite right so we changed it or the code wasn’t adequately robust in the face of a buggy caller. I mean those bugs where you were asked to compute some value and you just plain get the result wrong for some valid inputs.

Let me give you an example.

The first bug I ever fixed at Microsoft as a full-time employee was one of those. To understand the context of the bug, start by reading this post from the early days of FAIC, and then come back.

Welcome back, I hope you enjoyed that little trip down memory lane as much as I did.

Now that you understand how a VT_DATE is stored, that explains this bizarre behaviour in VBScript:

print DateDiff("h", #12/31/1899 18:00#, #12/30/1899 6:00#) / 24
print DateDiff("h", #12/31/1899 18:00#, #12/29/1899 6:00#) / 24

This prints –1.5 and –2.5, as you’d expect. There’s a day and a half between 6 AM December 30th and 6 PM December 31st, and two and a half days between the other two dates. This is perfectly understandable. But if you just subtract the dates:

print #12/31/1899 18:00# - #12/30/1899 6:00#
print #12/31/1899 18:00# - #12/29/1899 6:00#

You get 1.5 and 3, not 1.5 and 2.5. Because of the bizarre date format that VT_DATE chooses, when you convert dates to numbers, you cannot safely subtract them if they straddle the magic zero date. That’s why you need the helpful “DateDiff”, “DateAdd” and so on, methods.

The bug I was assigned was that testing had discovered a particular pair of dates which DateDiff was not subtracting correctly. I took a look at the source code for one of the helper methods that DateDiff used to do one of the computations it needed along the way. To my fresh-out-of-college eyes it looked something like this:

if (frob(x) > 0 && blarg(y)) return x – y;
else if (frob(x) < blarg(y) && blah_blah(x) > 0 || blah_de_blah_blah_blah(x,y)) return frob(x) – x + y + 1;
else if…

There were seven such cases.

My urge was to dive right in and add an eighth special case that fixed the bug. But my ability to get it right in the face of all this complexity concerned me. It seemed like this was an awfully complicated function already for what it was trying to do.

I researched the history of the code a bit and discovered that in fact variations on this bug had been entered… seven times. Each special case in the code corresponded to a particular bug that had been “fixed”, a term I use guardedly in this case. A great many of those “fixes” had actually introduced new bugs, regressing existing correct behaviour, which then in turn were “fixed” by adding special cases on top of the broken special cases that had been added to “fix” previous bugs.

I decided that this coding horror would end here. I deleted all the code (all seven lines of it! I was bold!) and started over.

Deep breath.

Spec the code requirements first. Then design the code to meet the spec. Then write the code to the design.


  • Input: two doubles representing dates in VT_DATE format.
  • VT_DATE format: signed integer portion of double is number of days since 12/30/1899, unsigned fractional part is portion of day gone by. For example: –1.75 = 12/29/1899, 6 PM.
  • Output: double containing number of days, possibly fractional, between two dates.  Differences due to daylight savings time, and so on, to be ignored.

Design strategy:

  • Problem: Some doubles cannot simply be subtracted because negative dates are not absolute offsets from epoch time
  • Therefore, convert all dates to a more sensible date format which can be simply subtracted.


double DateDiffHelper(double vtdate1, double vtdate2)
  return SensibleDate(vtdate2) – SensibleDate(vtdate1);
double SensibleDate(double vtdate)
  // negative dates like –2.75 mean “go back two days, then forward .75 days”:
  // Transform that into –1.25, meaning “go back 1.25 days”.
  return DatePart(vtdate) + TimePart(vtdate);

I already had helper methods DatePart and TimePart, so I was done. The new code was shorter, far more readable, generated smaller, faster machine code and most important, was clearly correct. No special cases; no bugs.

It’s not that my coworkers were dummies. Far from it. These were smart people. But computer geek psychology is such that it is very easy to narrow-focus on the immediately wrong thing, and try to tweak it until it does the right thing.

When faced with these sorts of “crisp” bugs, I try to restrain myself from diving right in. Rather, I try to psychoanalyze the person – who is, of course, usually my past self – who caused the bug. I ask myself “how was the person who wrote the buggy code fooled into thinking it was correct?” Did they not have a clear specification of what the method was supposed to do? Was it misleading? Did they have a clear plan for how to proceed? If so, where did it go wrong?

If there never was either a spec or a plan, then for all you know the whole thing might only be working by sheer accident. There could be any number of design flaws in the thing that just haven’t come to light yet. Editing such a beast means adding unknown to unknown. which seldom leads to good results. Sometimes coming up with a new spec, a new plan and scrapping an existing bug farm is the best way to proceed.

For many years after that, I would ask how to implement DateDiffHelper as my technical question for fresh-out-of-college candidates that I was interviewing for the scripting dev team. I reasoned that if that was the sort of problem I was given on my first day in the office, then maybe that would be a reasonable question to ask a candidate.

When you ask the same question over and over again, you really get to see the massive difference in aptitude between candidates. I had some candidates who just picked up a marker, wrote a solution straight out on the board, wrote down the test cases they’d use to verify it, mentally ran a few of the tests in their head, and then we’d have another half hour to chat about the weather. And I had some candidates who tried earnestly to write the version using special cases, despite my specifically telling them “you might consider transforming this bad format into something more pleasant to work with”, after they got stuck on the third special case. I’d point out a bug and immediately they’d write down code for another special case, rather than stopping to think about the fact that they’d just written buggy code three times already and told me it was correct three times.

Comments (30)
  1. jcoehoorn says:

    Looks like this has been on your mind a lot recently: it’s basically a longer version of something I saw you write on StackOverflow recently:


    Indeed, that was the inspiration. — Eric

  2. Jonathan says:

    Of course, you are still using cases. It’s just that by focusing on one VT_DATE at a time to translate it into your sensible format, you only have to deal with 2 cases, rather than the many cases that come from the combined properties of 2 VT_DATEs, apparently so many they have not been successfully enumerated.

    But why were you using the horrible VT_DATE format in the first place?

    VBScript uses that format to be compatible with VB. VB uses that format to be compatible with OLE Automation. OLE Automation uses that format to be compatible with Excel. Excel uses that format to be compatible with Lotus 1-2-3. With hindsight, the right thing to do would have been to say “Lotus chose a lousy way to represent dates, let’s write a translation tool that turns their goofy date format into something sensible and industry-standard”. But we did not do so when we had the chance and we’re stuck with it now. — Eric (PS: As a historical note, VB and OLEAUT were owned by the same team, so it is perhaps not quite fair to say that VB does it that way “because of OLEAUT”. They both evolved that behaviour at the same time.)

  3. h t says:

    That’s amazing that the Lotus bug dictated the new Day One, unbelievable! I just read about a 1000 line Switch block the other day. So the case by case approach is very much alive and cultivated.

  4. DRBlaise says:

    Why even have the unnecessary (vtdate >= 0.0) check when everything can fall into 1 case:

    return DatePart(vtdate) + TimePart(vtdate);

    Good point; my way is a potentially premature optimization. I’ll fix it. Though in fact, given that probably 99.99% of dates we see are positive, and given how tight the rest of the code was already, it was actually probably a reasonable optimization to take the early out. I don’t recall the details; this was 13 years ago after all. – Eric

  5. Grant says:

    The Lotus bit reminds me of this story: http://www.astrodigital.org/space/stshorse.html

    Finding a way to replace useful but flawed code seems to be one of Microsoft’s greatest challenges.

  6. configurator says:

    So, do these candidates that do *exactly the same as your smart coworkers* fail immediately or do they have a chance?

    When I specifically tell the candidate “this problem is difficult to solve by cases, consider transforming the data into a better format for arithmetic”, and they keep on trying to solve it by cases, no hire. When I write a bunch of interesting test cases on the board and they write me code that does not pass two of the four test cases, and they tell me it is correct, no hire. Heck, I’ve had candidates who were unable to write the code that computes the absolute value of the difference between two doubles. Those people will not be successful on a compiler team. No hire. — Eric 

  7. Faraz says:

    “we just found out that the product name sounds like a rude word in Urdu”

    with Urdu as my native language, I would love to hear this story.

    Regrettably, there’s no story there. I just made that up as characteristic of the sorts of geopolitical issues we sometimes run into. Usually we don’t have too much difficulty with geopolitical issues on the programming language tools teams, but anyone who, say, displays a map of the world as part of their user interface is required to ensure that the map is carefully designed to not implicitly “take sides” in geopolitical disputes. I’m sure you can think of examples. — Eric

  8. Denis says:

    "…immediately they’d write down code for another special case, rather than stopping to think…"

    …they are just trying to SERVE!!! How many times have we all been told that SOFTWARE DEVELOPMENT IS A SERVICE INDUSTRY? 🙂 Some people just confuse "service" with "subservience" or — worse — "servility". So, I’ve writtent some buggy code?! Sorry, Master! Right away, Master! Whatever you say, Master! Here is another "if-else," Master…

    A sad state of affairs, I’d say… 🙁

    @configurator: I’d probably ask them to 1)relax, 2)shape up a bit, and 3)try again, calmly. Then I’d decide whether they deserve a chance or not…

  9. Stuart Dootson says:

    I’m sure I remember Code Complete containing a warning against the root cause of all these VT_DATE issues – combining two pieces of data into one data item. The VT_DATE format would be better formulated as a tuple or struct containing (days_offset, offset_in_day), but instead those two parts have been shoehorned into a single double, leading to the mistaken assumptions about how arithmetic works with a VT_DATE because its representation is a familiar type.

  10. Ben says:

    Off topic, but any news of Wes Dyer? Very quiet on his blog…

    Wes drops by the C# language design committee meeting every now and then but I haven’t heard much from him either lately. He seemed to be very well when last I saw him though. — Eric

  11. Avi Harush says:

    Great post. I always thought that a large IF is 90% a result of a bad design.

  12. Steve Owens says:

    I really like this line:

      I ask myself "how was the person who wrote the buggy code fooled into thinking it was correct?"

    When you comes across things you wrote that are clearly not working right, especially when someone else is looking over your shoulder, it’s not always easy to admit you goofed.  (OTOH, it does seem to be rather easy to point out that someone else goofed.)

    No matter who goofed, this is a good time to start with some assumptions:  

     – We are all smart people (at least somewhere on the upper end of that continuum)

     – We all really try to do the right thing (i.e. not deliberately mis-coding)

    The comment from @Denis points out how knee-jerk we become when encountering errant code.  When you step back from the quick fix and assume that someone reasonably intelligent was trying their best to do the right thing, you have to say "How did they (I) get fooled?"

    When you can stop and do this, you might find out something.  It could be a bad spec or the need to be compatible with some legacy feature.  Or, it could be they never stopped to analyze the underlying issue, like you are now doing.

    Either way, you’ll be ahead of the game if you ask the question.

  13. TheCPUWizard says:

    Of course the Chevy Chase approach to Jane Curtain during Weekend Update on SNL (in the 1970’s) is an alternative approach to dealing with peoples mistakes [You should hear some of the names I have called myself].

  14. if (frob(x) > 0 && blarg(y)) return x – y;

    else if (frob(x) < blarg(y) && blah_blah(x) > 0 || blah_de_blah_blah_blah(x,y)) return frob(x) – x + y + 1;

    else if…

    Ha ha! So true.

  15. Stefan Wenig says:

    // negative dates like –2.75 mean “go back two days, then forward .75 days”

    Sweet. Did you ever consider making VT_DATE turing-complete?

    Lol. That almost made me spit-take my Diet Dr. Pepper. — Eric


  16. bob says:

    There are bugs for which there are no fixes.  Accept it.  Software is just a flawed thought process made real.  But that doesnt mean all bugs can be fixed.  Because underneath the hood of that bug, there are 700 other bugs.

    Just try and stop whining about other coders.  Fact is software can never ever be bug free. Ever.  


  17. Исправлять баги трудно. В контексте этой статьи я говорю о достаточно «явных» багах – тех изъянах, которые

  18. Joren says:


    Most implementations of Hello World are pretty bug free. It’s also perfectly possible to structure your code so that it’s provably correct. Read some of Edsger Dijkstra’s works if you wish to be convinced of that.

    I might be able to agree that sufficiently complex software is astronomically unlikely to be bug free  (for a sufficiently vague definition of ‘sufficiently complex’).

  19. TheCPUWizard says:

    I have to disagree with Bob’s comment of "There are bugs for which there are no fixes".

    There is ALWAYS a fic os some type of for any Bug.

    As an example, (going back to the mid 1980’s) there was one 4GL RDBMS company which provided a new release of their product. Certain functionallity which worked great in the previous version was now crippled. Their response "It is not a bug, it is a feature with a negative performance impact".

    Thus the bug was fixed.

  20. Greg says:

    Essentially, code modification usually can be done to refactor existing code to be  simpler, better or more robust.  Also, improve the inline comments with notes about the cases handled by the code.

    Continual application of this slow 0.01% improvement will turn a large system from a mess into much less of a mess within a year or two.  Six Sigma.

  21. I remember a similar bug from the days of Visual C 4.1 (mid-90s):

    We were coding a healthcare system and old patients, born before 12/30/1899, got their happy birthday greetings etc. on the wrong day when we upgraded from VC 4.0 to VC 4.1. I think it was a datediff() flavored function that had the same bug as you mention VBScript had. It got fixed in VC 4.2.

    BTW, there are so many facets re time and dates and computers: I remember in the 80’s when Microsoft acquired Lattice C and later released their own Microsoft C 5 and later version 6, great products. One of the time functions had a safeguard so that when run on a PC without a hardware clock, it would instead return a date from 1955. And I always wondered if that was Bill Gates birthday…

    Rgrds Henry

  22. Denis says:

    @Steve Owens, I totally agree with your assumptions. What I was trying to point out (not very successfully, a bug to my queue 🙂 ) is that here we have a management problem that results in a dodgy code. Or a corporate culture problem, if you will.

    There are some developers who really are over-sensitive about their code and believe in a sort of Samurai code (as in "code of honour") demanding that whoever let a bug slip should perform a harakiri. But most people are twitchy simply because they are worried that, if they are noticed to have stuffed up a number of times, they will be singled out for the next staff reduction, with all that implies.

    The remedy to that, I believe, is to promote a culture of fixing the problem rather than the blame: find out what’s stuffed up and fix it. In light of that, the question you propose  — "How did they (I) get fooled?" — may be rephrased into "How did it get stuffed up?" WHO exactly goofed is less of an issue here, because  — and this is an additional assumption to those you propose — EVERYONE can, and DOES, get goofed, stuffs up, or whatever is your local dialect expression for it. 🙂

    A goog team leader/manager should be able to drive home the idea that there is no shame in admitting a mistake and putting your back into fixing it; persisting in your erroneous ways is much, MUCH more pathetic, whether it is done out of misplaced pride, or for political (politi-corporate, I mean) reasons of never, ever, wanting to be seen as someone capable of making a mistake.

    And the way to achieve that is to lead by example, I recon…

  23. Gabe says:

    Grant, as interesting as the story is about how the standard railway gauge is derived from ancient Roman standards, it’s purely made-up. http://en.wikipedia.org/wiki/Standard_gauge says why: there were many gauges used in the old days of railroads. Since old railways (often used to deliver coal from mines) were horse-driven, it’s not surprising that the gauges were mostly around 5 feet, about the width of two horses. Therefore it shouldn’t be surprising that ancient Roman wheel ruts were also about the width of two horses. The similarities are only a coincidence.

    The current standard gauge only came about as a convergence of many different standards, not because of a direct lineage to Roman chariots.

  24. Tanveer Badar says:

    Interesting read. I was going to ask the same question as Faraz asked. 🙂

  25. Welcome to the 49th edition of Community Convergence. The big excitment of late has been the recent release

  26. Jim Lyon says:

    My favorite maxim to apply in situations like these is that, by the time you’re done, the code should look like the author knew what he was doing from the beginning.

  27. Rich Kuzsma says:

    Consider the product out there that depends on the original, buggy implementation. Your crisp, clean solution breaks backwards compatibility.

  28. argatxa says:

    @Denis.   when fixing bugs, it is very helpful to know who did it so you know the style of thinking of whoever do it.    

    Trying to figure out why somebody was doing something on a certain way it’s way more difficult if you don’t have background info (like "this guy always follows the happy path, be way of uncheked previous errors")

    The "How did he/she get fooled on this" is easier to answer then.

  29. remrow says:

    Thanks for this post. I the add date and diffdate are really useful.

  30. Kemp says:

    Gabe and Grant, just to ruin the old classic a bit more… Not only does the gauge only have a very vague relation to Roman chariots, the parts of the shuttle were never transported through a tunnel like that described in the myth. Who comes up with these things?

Comments are closed.

Skip to main content