Sometimes the bug isn’t apparent until late in the game


I didn't debug it personally, but I know the people who did. During Windows XP development, a bug arrived on a computer game that crashed only after you got to one of the higher levels.

After many saved and restored games, the problem was finally identified.

The program does its video work in an offscreen buffer and transfers it to the screen when it's done. When it draws text with a shadow, it first draws the text in black, offset down one and right one pixel, then draws it again in the foreground color.

So far so good.

Except that it didn't check whether moving down and right one pixel was going to go beyond the end of the screen buffer.

That's why it took until one of the higher levels before the bug manifested itself. Not until then did you accomplish a mission whose name contained a lowercase letter with a descender! Shifting the descender down one pixel caused the bottom row of pixels in the character to extend past the video buffer and start corrupting memory.
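The missing check can be sketched in a few lines of C. This is only an illustration, not the game's actual code: the `Surface` type, the function names, and the one-byte-per-pixel, tightly packed layout are all assumptions made for the example. The point is that the shadow pass writes at (x+1, y+1), so each write has to be tested against the buffer's edges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offscreen buffer: width * height pixels, one byte per pixel,
   rows stored back to back (no extra stride). */
typedef struct {
    uint8_t *pixels;
    int width;
    int height;
} Surface;

/* Write one pixel if it lies inside the buffer.
   Returns 1 if written, 0 if the pixel was clipped. */
int put_pixel_clipped(Surface *s, int x, int y, uint8_t color)
{
    assert(s != NULL && s->pixels != NULL);
    if (x < 0 || y < 0 || x >= s->width || y >= s->height)
        return 0;
    s->pixels[(size_t)y * (size_t)s->width + (size_t)x] = color;
    return 1;
}

/* Draw a gw-by-gh glyph mask (nonzero = ink) at (x, y) with a drop shadow.
   Returns how many pixels were clipped instead of written. */
int draw_glyph_with_shadow(Surface *s, const uint8_t *glyph, int gw, int gh,
                           int x, int y, uint8_t shadow, uint8_t fg)
{
    int clipped = 0;

    /* Pass 1: the shadow, offset down one and right one pixel. */
    for (int row = 0; row < gh; row++)
        for (int col = 0; col < gw; col++)
            if (glyph[row * gw + col])
                clipped += !put_pixel_clipped(s, x + col + 1, y + row + 1, shadow);

    /* Pass 2: the text itself, in the foreground color. */
    for (int row = 0; row < gh; row++)
        for (int col = 0; col < gw; col++)
            if (glyph[row * gw + col])
                clipped += !put_pixel_clipped(s, x + col, y + row, fg);

    return clipped;
}
```

Without the bounds test, a glyph whose ink reaches the bottom row of the buffer would have its shadow written one row further down, which is past the end of the allocation.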

Once the problem was identified, fixing it was comparatively easy. The application compatibility team has a bag of tricks, and one of them is called "HeapPadAllocation". This particular compatibility fix adds padding to every heap allocation so that when a program overruns a heap buffer, all that gets corrupted is the padding. Enable that fix for the bad program (specifying the amount of padding necessary, in this case, one row's worth of pixels), and run through the game again. No crash this time.
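The idea behind padding can be sketched as follows. This is not how the HeapPadAllocation fix is implemented; `padded_alloc` and `shadow_overrun_bytes` are hypothetical names, and the example assumes a tightly packed buffer with no extra stride between rows.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes a one-pixel down-and-right shift can reach past the end of a
   tightly packed buffer: a full row plus one pixel in the worst case. */
size_t shadow_overrun_bytes(size_t width, size_t bytes_per_pixel)
{
    return (width + 1) * bytes_per_pixel;
}

/* Hypothetical padded allocator: hands out `size` usable bytes followed by
   `pad` zeroed bytes of slack. Release with free(). */
void *padded_alloc(size_t size, size_t pad)
{
    if (size > SIZE_MAX - pad)
        return NULL;            /* size + pad would overflow */
    return calloc(1, size + pad);
}
```

With the slack in place, a stray write that lands just past the end of the buffer hits bytes that nobody else owns, so nothing important is corrupted. Under the assumptions above, the worst case (ink in the bottom-right corner of the buffer) reaches one row plus one pixel past the end.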

What made this interesting to me was that you had to play the game for hours before the bug finally surfaced.

Comments (42)
  1. Richard says:

    It amazes me sometimes that Microsoft put so much time and money into, essentially, fixing bugs in other people’s software. I can appreciate the business logic, but doesn’t it get infuriating after a while?

  2. AC says:

    I assume the company who made the game died? If it did not, you could just send them a bug report and be over with it.

  3. Rob says:

    You mentioned that the application compatibility team has a bag of tricks. What other tricks do they often use?

  4. Raymond Chen says:

    AC: The company is still in business, but the game market is very different from the productivity market. For most games, the game comes out, it sells for three months, maybe six if it’s lucky, and then it’s over. I remember calling these companies to report their bugs and they simply didn’t care. It’s possible that they didn’t even have the source code any more.

    Rob: Check out the Application Compatibility Toolkit.

  5. Luc Cluitmans says:

    About ‘HeapPadAllocation’: is that ‘trick’ from the ‘bag’ available to us, mere mortals, too?

    It sounds like a tool that could be useful in a weird bug I encountered recently, involving P/Invoke code to access a DLL written in C. The problem was only visible in Release builds, and only visible on some of the machines I tested it on. However, to be useful it would need to work with the allocator used by the P/invoked DLL (I have the DLL’s source code), but also on the allocations performed by the .NET framework marshalling code (obviously I don’t have the source code for that). Or are there other tools available for debugging Marshaller allocations?

  6. Merle says:

    Not completely relevant, but that’s actually a good technique for discouraging cracking.

    Accept about 100 times as many registration codes as are really valid (using gray codes and the like to ensure that the actual good codes don’t get mistyped as bogus "good" codes).

    But very late into the game, do additional registration checks... and slowly start corrupting things, making the game harder, hiding necessary components, etc.

    And no kidding about the lifespan of games. Unless it’s a subscription-based game, they stop caring about bugs after mere months, if that long. By then they’ve sold all the copies to hardcore gamers that they can, and the reviews are all out. If it was a great seller, just put out a sequel, don’t patch the original.

    It was not that way even a decade ago, although the signs were there.

  7. Brendan says:

    I’ve got good news and bad news Merle, the concept you described has already been invented and used.

    I believe the technology was called Fade. In short, if the game suspected it was pirated or cracked, the usability of the game would diminish over time. Enemies would become stronger, your weapons weaker and more inaccurate, and the overall intelligence of characters would plummet.

    The rate of decay would be such that the player would become hooked on the game and unable to complete it without purchasing a legitimate copy, effectively turning the pirated game into a long demo.

  8. Mike says:

    On a semi-related note, I recently discovered that a lot of my old PC games (circa win95) won’t install on my win2000 laptop. I can only guess this is something to do with copy-protection…I worked on one of the titles so I know it used Laserlok, a particularly flaky protection system that involved manufacturing borderline-broken CDs.

    I’d imagine that systems like this would fall apart on newer hardware/OSes. Do you guys make any attempt to crutch them up?

  9. Raymond Chen says:

    Seems the rate of people asking questions that have already been answered is on the upswing. http://weblogs.asp.net/oldnewthing/archive/2004/06/04/148427.aspx

  10. Merle says:

    Brendan: oh, I know. Sorry; I wasn’t trying to present it as a "new" idea.

    Did not know about Fade per se, but I’ve seen it done in a lot of smaller games. The real trick is not to annoy potential customers: if they play a cracked copy and decide "this game isn’t worth it", they might not go out and buy it.

    If they are one of those people who buy games after trying them. Sadly, it’s a small subset…

  11. Peter says:

    Hey, it’s not that bad a job at all, as long as you enjoy playing the game :)

    BTW doesn’t such debugging violate the common "do not dare to reverse engineer, disassemble or even start the software" license?

  12. Adrian says:

    If the bottom right pixel of the descender was in the bottom right pixel of the screen, then your extra allocation would have to be one scanline plus one more pixel to keep the drop-shadow from overrunning the buffer.

  13. Dhericean says:

    Mike:

    I find that Nt Compatible (http://www.ntcompatible.com/) is a good place for getting information and help on running old games. A particularly fun one was Thief on Win2K, which required a parameter to force an install as it saw NT and decided it didn’t have a high enough version of DirectX.

  14. Why did it take hours to find the bug? Was it a hard bug to nail down even with a proper test case, or did it take hours to get to the level?

    Didn’t the game have cheats to reach the top levels or get a high score?

    Don’t you have HeapCheck there?

    Sounds to me like an urban legend.

  15. Raymond Chen says:

    It took hours to get to that level, and then more hours of investigation to figure out what was going on.

    Remember, it’s easy to figure out the answer once you already know the answer.

    You can have all the tools in the world but until you know which one to use you’re not going to get anywhere. Sure you can turn on heap checking, but first you have to realize that the problem was a heap overflow.

    And even once you figure that out, you need to determine what’s causing the overrun. Maybe the overrun was caused by a messy timing problem. Maybe because some API is returning failure when it used to return success. And then you have to figure out what the correct fix is. What is the correct amount of padding?

  16. Ben Cooke says:

    I really wish application and game authors wouldn’t "probe by version". By this, I mean looking for a specific version of something to determine if your product can run.

    It’s far better in many cases to "probe by capability"; Does this system have XYZ DLL and does that DLL have this function that I need?

    Of course, in order to do this you need to be aware of what key functions indicate the presence of the API you need, but with Microsoft’s policy of backward compatibility this should theoretically be the best approach since the old APIs aren’t going to go away.

    I’ve had a few different apps which would refuse to install on my Windows 2000 system because they "require Windows XP", but the component from Windows XP that it needs is actually installed on my Windows 2000 system from an update from Microsoft. Microsoft itself is guilty of this, in fact, but I suspect many times when this is done it is for marketing rather than for technical reasons.

  17. Scott says:

    Raymond, I think you’ve just reached the point at which the number of back entries is too long for an idle visitor to read.

  18. Raymond Chen says:

    There are dangers to "probe by capability"; I’ll add it to the topics list.

    If someone asks a question that I’ve already answered, should I answer it again? Or just hope that they find the old answer?

  19. Stuart says:

    I am pretty certain that MS does deliberately stop applications from being installed on old versions of Windows, even when they are perfectly compatible. One example is Media Player 7. If you can find the original version 7.0 (oldversion.com), then it WILL install on Windows 95 as long as you have IE 5, although it does display a warning about compatibility. 7.1 refuses to install on Win95. I suspect this was down to marketing, especially as 7.0 is no longer available from Microsoft. Other culprits I suspect are MSN Messenger 6 and probably IE 6.

  20. Tony Cox [MS] says:

    Games very frequently do version probes on drivers to figure out what to do. And with good reason. The problem is that drivers are buggy, and they can be buggy in very subtle ways that aren’t immediately apparent. 3D graphics drivers are particularly prone to this, because 3D graphics is so complex.

    Now, you might say that if the driver is buggy, that’s the driver’s problem. The software should just let the driver misbehave (crash, render the wrong thing, whatever), and that will act as a forcing function for the vendor to fix their driver. This is a nice theory, with two main drawbacks:

    Firstly, driver problems can often look just like application problems. If the driver goofs up rendering your title, but several other titles look fine, even if the problem really is a bug in the driver, people will blame your software.

    Secondly, the vendors (at least the major ones) do fix their drivers! The problem is that users don’t install the fixes. This is especially true for more family-oriented software designed to run on older machines. Not all of the customers have internet access. Many of them are terrified of messing with anything on their computer.

    So, instead of having dissatisfied customers, we (the games studios) just detect known bad driver versions, and work around the bugs. It doesn’t really make me feel very good, but it keeps our customers happy.

  21. Pavel Lebedinsky says:

    "Or are there other tools available for debugging Marshaller allocations?"

    Yes. The best tool for debugging p/invoke marshalling problems is full pageheap. It is actually the reverse of HeapPadAllocation – instead of ignoring heap overruns, it tries to catch them as soon as they happen.

    For details, see http://www.osr.com/ddk/ddtools/gflags_4n77.htm, "Enable full page heap verification" topic.

    You can get the gflags.exe tool mentioned in this article here: http://www.microsoft.com/whdc/DevTools/Debugging/default.mspx

    You can also google for "customer debug probes" – this is a .NET equivalent of AppVerifier and I think they have some pinvoke related checks too.

  22. rolofft says:

    This sounds like the one-pixel bug in The Bug (http://www.amazon.com/exec/obidos/tg/detail/-/0385508603/102-4446662-8907327?v=glance) that ended up killing the programmer.

  23. Barry says:

    Hmm, not sure what game that might be, but Rainbow Islands on the original Playstation: get all the diamonds in order on the first four worlds (plus the ‘bonus diamond’ at end of level), get onto world five kill the first bad guy on the first level, and turn him into a red diamond, bobs-yer-uncle (or Robert is your mother’s brother, if you prefer), and your Playstation crashes. Takes about 40 minutes. Very irritating.

  24. Merle says:

    There could also be bugs that only surface after hours of continuous play that might not exist if you simply restore a saved game.

    Something on level 2 corrupts a bit of code that nothing touches until level 18… but if you restored the game, you would not have played through level 2, soo…

  25. James Summerlin says:

    Raymond,

    Do not waste your precious time answering old questions that are being asked again unless, in your incredible judgement, you deem it worthy.

    Otherwise, everyone else can do what I do with your kick-ass blog: Go through and read it all.

    James

  26. Merle says:

    Stuart: good to know about Media Player, thanks. I’ve been stuck on 6.4 on my w95 box for ages now, unable to play any .wmv files. (of course, 7.0 might not do that either, but…)

    Agreed on "probe by version". Hey, I just upgraded a SQL Server 7 to 2000, and it forcibly installed an *older* version of MDAC (2.6). Very annoying.

  27. I’m an avid reader, but not an avid poster, that being said…

    Until Scott provides .Text with a search (or index) feature, it’s nearly impossible to find relevant information on someone’s blog. You have 522 posts on your blog…that’s a lot of content to just simply sit down and read for an answer to a question. I’ve read a good portion of your blog, but would be hard pressed to simply know which post to read for an answer to a question that’s already been covered. I appreciate it when you provide links to previous questions as it may point to content I haven’t read before. At the very least, it may be a refresher on something I forgot I read. Also, as the author of this blog, you have a better feel for what questions you’ve already answered in the past. Just my .000002.

    Thanks,

    jayson

  28. J. Edward Sanchez says:

    This blog is indexed by Google, so you can search it using something like this:

    http://www.google.com/search?q=site%3Aweblogs.asp.net+%22Raymond+Chen%22+%7Egame+%7Einstall

    The first hit returned by the above query is the page Raymond linked to.

    In case anyone is wondering, the tilde operator is used to specify that synonyms are acceptable. For more information on advanced Google searches, see:

    http://www.google.com/help/refinesearch.html

  29. Ryan says:

    "I am pretty certian that MS does deliberately stop applications from being installed on old versions of windows, even when they are perfectly compatible."

    Don’t forget sometimes the test teams just say "no." Shipping in 9 languages, 2 OS’s, and 4 deployment configs creates a matrix… Test automation helps here, but testers can also make the mistake (just like some devs) of writing the test code on Win2003 and finding out it doesn’t work on Win2K because some feature wasn’t there or was buggy.

  30. Anonymous Coward says:

    Somewhat back on topic, I have this amusing issue with a game. I like to play Rise of Nations (published by Microsoft) on my Windows XP Professional (by Microsoft). I recently reinstalled XP and ever since then Rise likes to occasionally crash. It dutifully does all that crash reporting stuff (to Microsoft). However about 90% of the time the game is left in full screen mode so I can’t interact with the desktop or the task manager (by Microsoft :-). I have a dual monitor setup so I can still interact with Windows on the second monitor, but almost all dialogs etc come up on the first monitor where I can’t interact with them. Usually I have to resort to the reset button even though Windows is working fine.

    Some workarounds are to run task manager before starting the game and ensure it is on the second screen and close. It also seems like hibernating the machine clears the full screen game window. Sometimes Windows Key – L works and I can switch user and kill stuff from there. It doesn’t really help that the games going fullscreen do all sorts of funky stuff to the display sizes and where windows remember they were.

  31. Mack says:

    Oooohh, fullscreen games are teh fun! Especially when:

    * the game crashes and you need the task manager (I have only one monitor…)

    * you play in a LAN and your personal firewall wants confirmation that the game is allowed to open a port

    * you forgot to close ICQ

  32. James Summerlin says:

    Blizzard games are best. The quality is high and they support their games long after they have been out for 3 months.

    James

  33. KG says:

    Now we know why Microsoft developers don’t get enough time to fix their own bugs :-)

    Just curious. Do you really have to go so far to ensure compatibility with poorly written apps, especially when the particular bug surfaces after almost what? an hour? Despite your best attempts at making third-party software run bug free, there could be other things in the game (unreported bugs) that could spoil the end-user experience. So, are you folks going to take the responsibility for making them work perfectly? By doing all these, aren’t you encouraging 3rd party developers to get away with their shoddy work? Most important of all, shouldn’t your resources be best put to use for solving your own bugs? Otherwise, other developers will have to do the same thing – fix Microsoft’s Bug instead of their own :-)

  34. Norman Diamond says:

    9/14/2004 12:32 AM KG

    > Otherwise, other developers will have to do the
    > same thing – fix Microsoft’s Bug instead of
    > their own :-)

    Yeah we have to but we can’t, because of something we’re missing.

    Actually some of us can but I don’t think I can any more, it’s been decades since I did that kind of disassembling.

  35. Joe [MS] says:

    I remember an app we made work on W2K… it was in the COM code. To activate a COM object, you get the class factory, passing in a CLSID. Then you call CreateInstance() on the returned IClassFactory. One app kept crashing. Without symbols or source, we debugged it and found out that what this app was doing was storing the address of the stack variable (the clsid) in a global and then referencing it when ::CreateInstance() was called, assuming that the CLSID hadn’t fallen out of scope in the caller’s stack. We had reworked how CoCreateInstance() works from NT4 to W2K and this assumption was no longer valid.

    Yes, we made the app work.

  36. Joshua says:

    It’s delightful to read about all these DLL issues, long considered solved problems by the rest of the software world. Ha! DLLs! Adding heap padding to everything! That’s the kind of thing we did in school when a final project was due, hoping the instructor wouldn’t notice, or in Windows, I guess.

    Nice blog, though. Love the spam graph and the gory microsoft-programming details.

  37. Jouni Osmala says:

    Duh. Your blog makes me sad.

    I’ve been using Linux since I got well over 6 blue screens per day for a week in Y2K. Now, after reading your blog, MY wishful thinking of Wine becoming fully compatible with all the Windows software has become just that: wishful thinking, no hope anymore. Think about hunting those bugs without access to your tools, nor Windows sources of a version that works…

    2ND thing.

    What I’ve realized is that a good programming platform outright refuses every attempt at accessing anything that it’s not supposed to access. Fortunately for Microsoft, Windows didn’t do it from the beginning. Why fortunately? Well, it makes an open-source Windows clone outright impossible to make compatible with all the software on the Microsoft platform.

  38. Jim says:

    If you are (as I think you must be) an open source enthusiast then this kind of thing isn’t so bad to hear – it at least puts forward the argument for having source code available – it would have been better to fix the game than to make a special case in the OS.

  39. As the smell of smoke has permeated my apartment as a result of the fire on Mount Coot-tha just down the road, I can’t sleep. As a result, I have elected to use my time more efficiently and have responded to some emails and finally got a chance to do some reading. Thanks to Westy for a series of links that have proved most informative. Firstly, the research conducted on internet deprivation was fascinating for its identification of the reliance many now place on these technologies. *looks at self* Hmm. A few years back I might have felt a tad reliant on internet connectivity. Lately though I think I could well go without it for a fortnight. I’d just hate to come back to my email!!! And this link to a discussion of bug fixing for Microsoft operating systems explains just how hard it is to find all the various nuances of program functionality in updating system activities. At least it’s good to know there are those out there who are dedicated enough to gaming to fix some problems! :) And finally this link to…

Comments are closed.