The fun days of app-compat testing

Raymond Chen describes his past experience testing a randomly chosen third-party app with Windows 95. Back then we did not have an app-compat team, and non-critical applications were simply checked manually on a voluntary basis.

I remember being part of a similar exercise back when Windows 2000 shipped. Many of us were assigned all sorts of games and applications. I don't remember what I got, but I found it pretty funny that my manager got Barbie Magic Genie Adventure (if I remember correctly). Well, as members of the Windows Core OS team we had to do all sorts of funny things. But I think he found some USB problems back then, and later some leaks in the application, which surfaced because we had changed the allocation pattern in the runtime library. In the end, we had to adjust our heap allocator so that some of these applications would not crash.

This is one of the weird problems that appear when you ship a platform. To a certain degree, you have to accommodate bugs in the applications that use your platform. For example, I remember an app that did not bother to check whether realloc() had moved the buffer to a new address after a resize. The app simply assumed that realloc resized the buffer in place and kept using the old pointer. The bug was hidden at ship time because the "freed" memory from the old buffer stayed unused for a while, so you would not see any access violations (AVs). But a different allocation algorithm in the OS (one that moved the block more frequently) would cause AVs in this application.
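
To make the realloc bug concrete, here is a minimal sketch in C. The buffer_t type and the buffer_resize function are hypothetical names invented for this illustration, not code from the application in question. The comment shows the broken pattern; the function shows the safe one, which always adopts the pointer that realloc returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, for illustration only. */
typedef struct {
    char  *data;
    size_t size;
} buffer_t;

/*
 * The buggy pattern looks roughly like this:
 *
 *     realloc(buf->data, new_size);   // return value ignored
 *     buf->size = new_size;           // buf->data may now dangle
 *
 * It appears to work as long as the allocator resizes in place, or as
 * long as nobody reuses the old block. The safe pattern keeps the old
 * pointer until realloc succeeds and then adopts the returned pointer.
 */
int buffer_resize(buffer_t *buf, size_t new_size)
{
    assert(buf != NULL && new_size > 0);

    char *moved = realloc(buf->data, new_size);
    if (moved == NULL) {
        return -1;      /* old block is still valid and still owned by buf */
    }
    buf->data = moved;  /* may or may not equal the old address */
    buf->size = new_size;
    return 0;
}
```

Whether the block actually moves depends on the allocator and on the sizes involved, which is exactly why this kind of bug can stay hidden for years.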

The frustrating thing was that customers are the ones affected in the end. And worse, to them it looks as if it's Microsoft's fault: "App X worked with Windows 95, so why is it AV-ing on Windows 2000?" Such AVs would only push the customer back to the previous Windows version, while leaving the actual problem unfixed. I also remember another member of the Windows Core team (Gary Kimura - one of the most senior guys there) being quite blunt whenever he heard that we had to "fix" our allocator to accommodate others.

Today the situation is somewhat different. Starting with XP SP2 and Windows Server 2003 (SP1), the allocator behavior is intentionally hardened, for one reason - security. Think about it - when you have a buffer overrun, you don't want silent memory corruption. You want an AV as soon as possible. An innovative addition to our memory allocator was to surround allocated blocks with "guard buffers" that are checked when the memory is freed. If a guard buffer has been overwritten, then we have just detected a buffer overrun/underrun, and we terminate the application. If an application has hidden bugs that rely on the old allocator behavior, it can be instructed to run in compatibility mode.
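
The guard-buffer idea can be sketched in a few lines of C on top of malloc. To be clear, this is not how the Windows heap implements it. The names (guarded_alloc, guarded_is_intact, guarded_free), the 16-byte guard size, and the 0xAB fill pattern are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All three values are assumptions chosen for this sketch. */
#define HEADER_SIZE 16    /* holds the payload size; keeps payload aligned */
#define GUARD_SIZE  16    /* guard bytes on each side of the payload */
#define GUARD_BYTE  0xAB  /* fill pattern written into the guards */

/* Layout: [header: size][front guard][payload ...][back guard] */
void *guarded_alloc(size_t size)
{
    const size_t overhead = HEADER_SIZE + 2 * GUARD_SIZE;
    if (size > SIZE_MAX - overhead) {
        return NULL;  /* total size would overflow */
    }
    unsigned char *raw = malloc(overhead + size);
    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &size, sizeof size);
    memset(raw + HEADER_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(raw + HEADER_SIZE + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + HEADER_SIZE + GUARD_SIZE;
}

/* Returns 1 if both guards still hold the fill pattern, 0 otherwise. */
int guarded_is_intact(const void *ptr)
{
    const unsigned char *payload = ptr;
    const unsigned char *front = payload - GUARD_SIZE;
    size_t size;

    /* Check the front guard first: if it is damaged, the header
       (and therefore the stored size) cannot be trusted either. */
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_BYTE) return 0;          /* underrun */
    }
    memcpy(&size, front - HEADER_SIZE, sizeof size);
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (payload[size + i] != GUARD_BYTE) return 0; /* overrun */
    }
    return 1;
}

/* Verifies the guards at free time and terminates on corruption. */
void guarded_free(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    if (!guarded_is_intact(ptr)) {
        fprintf(stderr, "heap guard corrupted, terminating\n");
        abort();
    }
    free((unsigned char *)ptr - GUARD_SIZE - HEADER_SIZE);
}
```

Note the trade-off in this sketch: the overrun is only noticed at free time, not at the moment of the bad write, and every allocation pays for the extra header and guard bytes.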