Bluescreen Debugging for Dummies: Prologue

That could probably help 90% of the developers at Microsoft, to be honest.  Kernel mode debugging is sometimes equated with black magic by devs who spend most of their time in the highly friendly (and deterministic) world of user mode.

An analogy I like is to compare kernel mode to cutthroat corporate life, and user mode to bucolic academic life.  In user mode, there are rules that you can’t break, enforced from the outside by the faculty (the OS).  Breaking the rules doesn’t take down the whole university, only one member.  You only hurt yourself.

On the other hand, kernel mode has a set of agreed-upon rules, but they’re not nearly as strongly enforced… results are more important, after all!  You’re expected to follow them, but you can cheat if you want (and even get away with it for a while); when you screw up, though, chances are you’ll take down the whole enterprise.

You can’t take anything for granted in kernel mode, because there can be any number of kids playing in the same sandbox.  Someone can take your CPU away, take your memory away, dump garbage on your data, and not even call you in the morning.  Paradoxically, while it’s more critical here than anywhere else that everyone follows the rules, it’s not in our interest to strongly enforce them.  Too much error and behavior checking here could slow the system down to an unusable crawl.  So we give driver writers the power to destroy worlds, and hope they use it for good and not evil.

As anyone who has used Windows NT or later knows, this doesn’t always go well.

In coming entries, I’ll cover some of the basics of how to open and analyze memory dump files, so you can at least feel like you have a starting place when you get a blue screen on one of your systems.  I’ll move on to more advanced topics if there’s interest.

Comments (3)

  1. Waters says:

    Looking forward to some more articles. Here’s a recent interesting stack trace from dump that happened on shutdown; who’s to blame: mcafee (nai) or sysinternals (filem)? 🙂

  2. Ah, see, that’s one of those dumps that automatically falls out of the "Dummies" category. You need to really dig into the assembly to get an idea of who did what to whom.

    My first suggestion would be to see if there are updates available for either or both drivers. 🙂

  3. khkim says:

    Any Tom, Dick and Harry can say that (update a driver). Please don’t say that as a debugging engineer. I also want to know who is to blame for the root cause, FILEM or naiavf5x.

    What causes the double fault (stack overflow), and how does it happen?

    The real attitude of a debugging engineer is to find the root cause by debugging, not to suggest updating the driver. Of course, for customers, a prompt response is more important.

    But we are still thirsty for the technical stuff.