Dr. Watson: Please send in your error report

I bet you’ve seen dialogs similar to this:

We at Microsoft refer to this dialog/technology as Dr. Watson (after Sherlock Holmes’s famous assistant).

Before Watson, computer users would get the BSOD (the Blue Screen of Death), which would just say that an error had occurred and that Windows would have to shut down. Microsoft had no idea how often these errors occurred or how to solve them.

When you choose to send Watson errors to Microsoft, a small amount of information is sent, from which we can (hopefully) figure out the cause of the problem and possibly even point the user to a solution. This information includes things like what program was running, what modules (DLLs, or Dynamic Link Libraries) were loaded for that process, and the contents of the CPU registers.

If the error occurs in a third party product, we can forward that information to the third party.

I’ve received several Watson dialogs. After I choose Send, many times I’ve been pointed to a web site with updated device drivers that can cure the problem.

Windows XP, and in fact all versions of Windows since 3.11, run applications in protected address mode. This means that an errant application (or program) cannot disturb the memory of another running application. These applications run in what’s called User Mode. The processor enforces this application protection.

However, certain programs run in a non-protected mode called Kernel Mode, in which a program can do nasty things like corrupt the memory of other running programs. Kernel Mode also allows the program to access hardware directly and perform other operations that User Mode programs are not allowed to do. (One of my old computers had a HALT instruction… nasty nasty.)

This separation between User Mode and Kernel Mode made operating systems much more resilient to errant applications. However, Device Drivers run in Kernel Mode, and thus have full power over the memory of the machine. If an error occurs in a Kernel Mode program, it’s likely that the machine needs to shut down to prevent any additional damage. If the error is in User Mode, then just closing that process is sufficient to resume.
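Here is a small sketch of the User Mode half of that story, assuming nothing more than a Python interpreter. The helper name run_child and the variable parent_data are made up for this illustration, and it has nothing to do with how Watson itself is implemented. A child process aborts abruptly; the operating system cleans it up, and the parent carries on with its own memory untouched.

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Run a snippet of Python in a separate process and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,  # hide the child's crash message
    )
    return result.returncode


# Data that lives in the parent's own address space.
parent_data = [1, 2, 3]

# The child terminates abnormally, the way a crashing application would.
exit_code = run_child("import os; os.abort()")

# A nonzero exit code means the child died abnormally. The exact value
# depends on the operating system, so none is shown here.
print("child exit code:", exit_code)
print("parent data still intact:", parent_data)
```

A Kernel Mode failure has no such safety net, which is why a bad driver can take the whole machine down.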

I attended a talk by BillG a while ago in which he said that something like 80% of all Watson-reported errors were due to device driver issues, particularly video drivers. This implies that the errors were due to third party device driver authors, but perhaps it also means that it’s not easy to write a device driver that is bug free.

In the old days, a user could run only a single program at a time. I could run a word processor. If I wanted to run another program, I would exit the word processor and start the other program. These monolithic programs had full control of the entire computer: any error would be the fault of the program’s author. However, this also meant that the author would have to handle all printers, input devices (like mice), output to the screen, and reading and writing files on the disk. Operating Systems were developed to help programs handle many of these tasks. They also helped unify the way things were done on the computer: if all programs wrote to the disk/printer/screen in the same way and handled the keyboard/mouse/pen the same way, then the user would get a consistent user interface.

I wrote a cartoon animation program around 1982 on an original 4.77 MHz IBM PC (it’s reincarnated in Visual FoxPro: start the Task Pane, Solution Sample, Forms, Form Graphics, Line Animation). You could draw a single picture, save it, draw another picture, save it, then choose to animate, which would morph one picture to the other using interpolation.
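For the curious, the morphing idea can be sketched in a few lines. This is not the original 1982 code or the FoxPro sample, just a minimal illustration that assumes each picture is a list of (x, y) points, both pictures have the same number of points, and the interpolation is linear. The names lerp, morph and frames are made up for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a when t is 0 and b when t is 1."""
    return a + (b - a) * t


def morph(start: List[Point], end: List[Point], t: float) -> List[Point]:
    """Blend two pictures point by point; t runs from 0.0 to 1.0."""
    if len(start) != len(end):
        raise ValueError("both pictures need the same number of points")
    return [
        (lerp(x0, x1, t), lerp(y0, y1, t))
        for (x0, y0), (x1, y1) in zip(start, end)
    ]


def frames(start: List[Point], end: List[Point], steps: int) -> List[List[Point]]:
    """Return steps + 1 frames, from the first picture to the second."""
    return [morph(start, end, i / steps) for i in range(steps + 1)]


# Halfway between a flat line and a tilted, raised line.
flat = [(0, 0), (10, 0)]
tilted = [(0, 10), (10, 20)]
print(morph(flat, tilted, 0.5))  # [(0.0, 5.0), (10.0, 10.0)]
```

Drawing each frame in turn gives the animation; more steps give smoother motion.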

I wanted to add a mouse interface to the program, so I bought a product called Visi-On from VisiCorp (the VisiCalc company), which came with a mouse. I had to decode the signals sent by the mouse using an oscilloscope to figure out how it communicated with the PC. Then I had to write my own pull-down mouse menus and define the mouse behavior.

Try this: click down, then release on any menu from any application in Windows. Now move the mouse around the screen, away from the menu. Should the menu disappear when the mouse goes off the menu? Try clicking down on a menu item. Don’t release the mouse. Should the menu item be chosen? What if you then drag the mouse off the item before you release?

Nowadays, a typical running application can have dozens of modules loaded in its process space. For example, I just started Microsoft Word. I have Visual Studio installed, so I can bring up Task Manager, right click on the WinWord process, and choose Debug. VS starts up and I can choose Debug->Windows->Modules to see that there are over 100 modules loaded!

The interactions, interfaces and assumptions made between these modules are not trivial. With multiple authors and possibly ambiguous specifications, the complexity skyrockets.

Sending in these errors is very beneficial to software users and Microsoft. Given these error reports, we can see actual computer problems experienced by the user and fix real world issues.