Who is restarting my server?

Hello, this is Omer. I recently came across a case where a customer reported that they could not boot into safe mode using their custom image. Whenever they booted into safe mode, the machine would get to the logon screen, wait for 5 seconds, and then reboot regardless of any user input. Nothing was being logged in the event logs either, so it was very strange.

At first it looked like the machine was going through a power cycle, since the shutdown was so quick (we would not see the usual shutdown messages like “Shutting down Services”, etc.). I thought maybe there was some issue with the hardware, but the customer reported that they had the same issue on every machine, regardless of the hardware vendor.

To figure this out, I attached a kernel debugger to the machine and broke in to make sure the connection was good. I then let the machine go, and it got to the logon screen. Sure enough, after 5 seconds the machine rebooted. I expected to hit some kind of exception that would make the debugger break in, but nothing of the sort happened. The only message I got was the following:

Shutdown occurred at (Fri Jun 26 17:27:12.714 2009 (GMT-7))...unloading all symbol tables.

Very strange! The OS disconnected the debugger gracefully. I did a quick source code review and found that one of the places where we disconnect the debugger is in the system shutdown path. Maybe the OS was shutting down gracefully, but since it happened so fast, it looked like a power cycle. To test my theory, I put a breakpoint on nt!NtShutdownSystem to see whether it was being called and, if so, to find the caller. I rebooted the machine and let it rip.

nt!NtShutdownSystem()
nt!KiSystemServiceCopyEnd()+0x13
ntdll!ZwShutdownSystem(void)+0xa
services!ScRevertToLastKnownGood()+0x1af
services!ScStartMarkedServices()+0x154
services!ScStartServiceAndDependencies()+0x43d
services!ScAutoStartServices()+0x225
services!SvcctrlMain()+0xa75
services!main()+0x31
services!__mainCRTStartup()+0x13d
kernel32!BaseThreadInitThunk()+0xd
ntdll!RtlUserThreadStart()+0x1d

Voila! Services.exe is shutting down the system. Probably some service is failing to start, which is then somehow causing the machine to shut down. From the stack, I was able to figure out which service was not starting. Based on the service record, it was a third-party remote assistance service.

But how could the failure of a non-critical service to start cause the Service Control Manager to reboot the machine? And what is that frame about reverting to last known good (services!ScRevertToLastKnownGood()+0x1af) doing on the stack?

Looking at the service record, I found that the SCM had returned error code 0x43c, which translates to ERROR_NOT_SAFEBOOT_SERVICE (This service cannot be started in Safe Mode). Also, the ErrorControl value for this service was set to 0x2, which means that if the service fails to start, the system reverts to the last known good configuration and reboots. However, if the system is already using last known good, it should just continue the boot process and log the error.
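As a quick sanity check on that translation, here is a minimal Python sketch that converts the hex code to the decimal value used in winerror.h and looks it up. The KNOWN_ERRORS table and the describe helper are hypothetical names made up for this illustration, and the table deliberately contains only the one code discussed here.

```python
# Illustrative lookup only: it covers the single error code discussed in this post.
# KNOWN_ERRORS and describe() are hypothetical helpers, not part of any Windows API.
KNOWN_ERRORS = {
    1084: ("ERROR_NOT_SAFEBOOT_SERVICE",
           "This service cannot be started in Safe Mode"),
}


def describe(code):
    """Return (decimal value, symbolic name, message) for a Win32 error code."""
    name, message = KNOWN_ERRORS.get(code, ("<not in this table>", ""))
    return code, name, message


# 0x43c is the value found in the service record.
print(describe(0x43C))
```

On a Windows machine, running net helpmsg 1084 at a command prompt should show the same message text.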

Error Control Level   Meaning

0x3 (Critical)        Fail the attempted system startup. If the startup is
                      not using the LastKnownGood control set, switch to
                      LastKnownGood. If the startup attempt is using
                      LastKnownGood, run a bug-check routine.

0x2 (Severe)          If the startup is not using the LastKnownGood control
                      set, switch to LastKnownGood. If the startup attempt
                      is using LastKnownGood, continue on in case of error.

0x1 (Normal)          If the driver fails to load or initialize, startup
                      should proceed, but display a warning.

0x0 (Ignore)          If the driver fails to load or initialize, startup
                      proceeds. No warning is displayed.
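The table above can be read as a small decision function. The Python sketch below is only a model of the documented behavior, not the Service Control Manager's actual code; the names ErrorControl and on_start_failure are made up for this illustration.

```python
from enum import IntEnum


class ErrorControl(IntEnum):
    """ErrorControl levels as listed in the table (names are illustrative)."""
    IGNORE = 0x0
    NORMAL = 0x1
    SEVERE = 0x2
    CRITICAL = 0x3


def on_start_failure(level, using_last_known_good):
    """Model what happens when a service or driver fails to start.

    This follows the table in the post; it is a sketch of the documented
    behavior, not the real SCM implementation.
    """
    level = ErrorControl(level)
    if level == ErrorControl.IGNORE:
        return "continue startup, no warning"
    if level == ErrorControl.NORMAL:
        return "continue startup, display a warning"
    # SEVERE and CRITICAL both switch to LastKnownGood if it is not in use yet.
    if not using_last_known_good:
        return "switch to LastKnownGood and restart"
    if level == ErrorControl.SEVERE:
        return "continue startup despite the error"
    return "bug-check"


# The case from this post: ErrorControl 0x2, not yet booted from LastKnownGood.
print(on_start_failure(0x2, using_last_known_good=False))
```

With ErrorControl set to 0x2 and a normal (non-LastKnownGood) boot, the model takes the "switch to LastKnownGood and restart" branch, which matches the ScRevertToLastKnownGood frame on the stack.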

 

Because the service’s ErrorControl value is set to 0x2, the machine would revert to the last known good configuration and silently reboot. I booted the machine normally, and changed the ErrorControl value in the registry.

I also had to change the value in the other ControlSets, since they were identical to the current control set. This also explains why the machine kept rebooting every time: the value in the Last Known Good configuration was also set incorrectly.
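For anyone who needs to make the same change across several control sets, here is a hedged Python sketch that only builds the reg add command lines so they can be reviewed before being run from an elevated prompt. The service name ExampleRemoteAssist, the control set names, and the new value 0x1 (Normal) are all assumptions for illustration; the actual service name and the value I chose are not given here.

```python
def reg_commands(service, control_sets, value):
    """Build 'reg add' command lines that set ErrorControl in each control set.

    Nothing is executed here; the strings are meant to be reviewed first.
    """
    commands = []
    for control_set in control_sets:
        key = "\\".join(["HKLM", "SYSTEM", control_set, "Services", service])
        commands.append(
            f'reg add "{key}" /v ErrorControl /t REG_DWORD /d {value} /f'
        )
    return commands


# Hypothetical inputs: the service name, control set list and new value
# are placeholders, not values taken from the customer's machine.
for line in reg_commands(
    "ExampleRemoteAssist",
    ["CurrentControlSet", "ControlSet001", "ControlSet002"],
    1,
):
    print(line)
```

Which ControlSetNNN keys exist varies from machine to machine, so check under HKLM\SYSTEM before relying on the list above.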

I rebooted the machine and was able to boot into safe mode normally. Hence, the mystery of the silent reboots was solved.
