Bluescreen Debugging for Dummies: Prologue

A guide with that title could probably help 90% of the developers at Microsoft, to be honest. Kernel-mode debugging is sometimes equated with black magic by devs who spend most of their time in the friendly (and deterministic) world of user mode.

An analogy I like is to compare kernel mode to cutthroat corporate life, and user mode to bucolic academic life. In user mode, there are rules that you aren’t allowed to break, and they are enforced from the outside by the faculty (the OS). Trying to break them doesn’t take down the whole university, only the one member who tried. You only hurt yourself.

On the other hand, kernel mode has a set of agreed-upon rules, but they’re not nearly as strongly enforced... results are more important, after all! You’re expected to follow them. You can cheat if you want (and even get away with it for a while), but when you screw up, chances are you’ll take down the whole enterprise.

You can’t take anything for granted in kernel mode, because there can be any number of kids playing in the same sandbox. Someone can take your CPU away, take your memory away, dump garbage on your data, and not even call you in the morning. Paradoxically, while it’s more critical here than anywhere else that everyone follows the rules, it’s not in our interest to strongly enforce them. Too much error and behavior checking here could slow the system to an unusable crawl. So we give driver writers the power to destroy worlds, and hope they use it for good and not evil.

As anyone who has used Windows NT and up knows, this doesn’t always go well.

In coming entries, I’ll cover some of the basics of how to open and analyze memory dump files, so you at least have a starting place when you get a blue screen on one of your systems. I’ll move on to more advanced topics if there’s interest.