Deciphering Windows safe boot and last-known good configurations

Yesterday I was working with one of our developers on an issue related to the Enhanced Write Filter (EWF) in XP Embedded. I installed the EWF driver on one of my test machines. It is an upper filter driver that has to be loaded at boot time, and it is flagged so that the system stops with a blue screen of death (BSOD) with error code 0x7B if it fails to initialize correctly. I had gotten my machine into a state where I got a BSOD 0x7B every time I tried to boot my XP Embedded OS, so I wanted to boot into my XP Professional partition and try to fix it from there. I learned a couple of interesting tricks that made this much easier. Some of you probably already know them, but I wanted to share them for those who don't.

There is a set of registry keys that describes the drivers and services installed on a Windows NT-based OS (such as Windows NT 4.0, Windows 2000, Windows XP, or Windows Server 2003). If you are booted into an OS, the keys under HKLM\System\CurrentControlSet describe the currently running OS. CurrentControlSet is not a separate copy of this data. It is a link to one of the keys named HKLM\System\ControlSet###, where ### is a three-digit number such as 001 or 002.
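To make the link concrete, here is a minimal Python sketch (my own illustration, not an official API) that turns a CurrentControlSet path into the ControlSet### path it points at, given the Current value from HKLM\System\Select. It assumes the usual three-digit, zero-padded key names. The helper names and the Ewf service key used in the example are illustrative assumptions.

```python
import re


def control_set_key(number: int) -> str:
    """Return the key name for a control set number, e.g. 1 -> 'ControlSet001'."""
    if not 1 <= number <= 999:
        raise ValueError(f"unexpected control set number: {number}")
    return f"ControlSet{number:03d}"


def resolve_current_control_set(path: str, current: int) -> str:
    """Replace the CurrentControlSet component of a registry path with the
    ControlSet### key it links to. Matching is case-insensitive, like the registry."""
    return re.sub(
        r"(?i)\\CurrentControlSet(?=\\|$)",
        # A function replacement avoids backslash-escape processing in re.sub.
        lambda _m: "\\" + control_set_key(current),
        path,
    )


# Illustrative only: "Ewf" is assumed to be the EWF driver's service key name.
print(resolve_current_control_set(r"HKLM\System\CurrentControlSet\Services\Ewf", current=2))
# prints HKLM\System\ControlSet002\Services\Ewf
```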

So the next question is: what if I see multiple ControlSet### keys on my machine? How do I figure out what each one of those is? If you look at HKLM\System\Select, you will see several DWORD values:

  1. Current - this is the number of the control set used by the currently booted OS. For example, if Current is 1, then CurrentControlSet and ControlSet001 are two names for the same data, and a change made through one is reflected in the other.
  2. Default - this is the number of the control set that the OS will use the next time it boots normally (that is, unless you choose the last known good configuration). If you are booted into an OS, this value is normally the same as the Current value.
  3. Failed - if the OS was previously started using the last known good configuration (for example, after a BSOD during boot), this is the number of the control set that was set aside as failed. Its ControlSet### key is kept under HKLM\System to allow debugging and troubleshooting. A value of 0 means there is no failed control set.
  4. LastKnownGood - this is the number of the control set that will be used if you press F8 during startup and then choose the last known good configuration.
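If you have several ControlSet### keys and want to see at a glance which role each one plays, a small Python sketch like the following can help. roles_by_control_set and read_select_values are hypothetical helpers of my own. The second one uses Python's standard winreg module, so it only works on Windows, and the example values are made up rather than taken from a real machine.

```python
from typing import Dict, List

SELECT_VALUES = ("Current", "Default", "Failed", "LastKnownGood")


def roles_by_control_set(select: Dict[str, int]) -> Dict[str, List[str]]:
    """Map each ControlSet### key name to the Select values that point at it.
    A value of 0 (e.g. Failed when nothing has failed) means 'no control set'."""
    roles: Dict[str, List[str]] = {}
    for name in SELECT_VALUES:
        number = select.get(name, 0)
        if number:
            roles.setdefault(f"ControlSet{number:03d}", []).append(name)
    return roles


def read_select_values() -> Dict[str, int]:
    """Read the DWORDs under HKLM\\SYSTEM\\Select of the running OS (Windows only)."""
    import winreg  # standard library, but only available on Windows

    values: Dict[str, int] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"SYSTEM\Select") as key:
        for name in SELECT_VALUES:
            values[name], _value_type = winreg.QueryValueEx(key, name)
    return values


# Made-up example values, not read from a real machine.
example = {"Current": 1, "Default": 1, "Failed": 0, "LastKnownGood": 2}
print(roles_by_control_set(example))
# prints {'ControlSet001': ['Current', 'Default'], 'ControlSet002': ['LastKnownGood']}
```

On a Windows machine you could pass the result of read_select_values() to roles_by_control_set() instead of the made-up dictionary.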

Until I talked to our dev, I didn't know what the difference was between the ControlSet### keys. I booted to my XP Pro OS and loaded the system registry hive from my broken XP Embedded OS using the strategy I described here (except that I loaded system instead of system.sav, since my image had already gone through the First Boot Agent). Then I went to ControlSet001 and removed all of the information about the EWF driver that was somehow causing the BSOD 0x7B. However, when I booted back to the XP Embedded partition, I still got the same BSOD, because ControlSet001 was not the control set my XP Embedded image was booting from.

Once I figured out how to use the values under Select to map each ControlSet### key to the control set the OS actually uses, I was able to remove the driver information from the correct key, and my XP Embedded image booted normally again.
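For the offline case, the same mapping can be sketched in Python. This assumes the XP Embedded system hive has been loaded with something like reg load HKLM\XPE_SYSTEM followed by the path to that OS's system hive file. The XPE_SYSTEM mount name, the Ewf service key, and the Select values below are hypothetical. Because a hive loaded this way has no CurrentControlSet link (that link is created at boot for the running OS), the sketch follows the Default value, which is the control set the offline OS should use on its next normal boot.

```python
from typing import Dict


def offline_control_set_path(mount_point: str, select: Dict[str, int], subkey: str) -> str:
    """Return the path of `subkey` inside the control set that an offline OS
    will use on its next normal boot, by following Select\\Default."""
    default = select.get("Default", 0)
    if not default:
        raise ValueError("Select\\Default is missing or 0; no control set to pick")
    return f"{mount_point}\\ControlSet{default:03d}\\{subkey}"


# Hypothetical values: the XPe hive mounted as HKLM\XPE_SYSTEM, with the Select
# values read from HKLM\XPE_SYSTEM\Select (e.g. with reg query ... /v Default).
select = {"Current": 2, "Default": 2, "Failed": 1, "LastKnownGood": 3}
print(offline_control_set_path(r"HKLM\XPE_SYSTEM", select, r"Services\Ewf"))
# prints HKLM\XPE_SYSTEM\ControlSet002\Services\Ewf
```

Which entries to remove under that path depends on how the driver was registered, so treat the result as the place to start looking rather than as a list of things to delete.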

Note that my example above involved troubleshooting an XP Embedded OS image, but the principles of tracking drivers and services through HKLM\System\CurrentControlSet and ControlSet### apply the same way to all NT-based operating systems, not just embedded ones.