System Won’t Power Down


Hi All. Recently I had a Windows 2000 case where the machine wouldn’t shut down. After initiating the shutdown process, we saw the user get logged off, and on the console we watched the services shut down. The final “Windows is shutting down…” message was displayed on the screen, and then the screen would go blank. At that point we expected the machine to power down, but it continued running. Even after waiting 10 minutes the machine stayed powered up. The same behavior was exhibited after a restart as well. We tried the same thing in safe mode, but the machine still wouldn’t shut down.


We setup the machine for a full memory dump, and crashed the box once the screen went blank. The customer uploaded the dump, and I started digging.


The first thing I noticed was a bunch of processes still running. I expected csrss.exe, smss.exe, winlogon.exe and the System process , but to my surprise I found other processes such as svchost.exe and services.exe still lingering. Why were they still running?


 


0: kd> !process 0 0


**** NT ACTIVE PROCESS DUMP ****


PROCESS f95ed480  SessionId: 0  Cid: 0008    Peb: 00000000  ParentCid: 0000


    DirBase: 00030000  ObjectTable: f95edf68  TableSize:  62.


    Image: System


 


PROCESS f94de520  SessionId: 0  Cid: 00fc    Peb: 7ffff000  ParentCid: 0008


    DirBase: 2465c000  ObjectTable: f94def08  TableSize:  34.


    Image: SMSS.EXE


 


PROCESS f94b01c0  SessionId: 0  Cid: 0118    Peb: 7ffff000  ParentCid: 00fc


    DirBase: 26515000  ObjectTable: f94b29e8  TableSize: 103.


    Image: CSRSS.EXE


 


PROCESS f94a5d60  SessionId: 0  Cid: 0130    Peb: 7ffff000  ParentCid: 00fc


    DirBase: 2671a000  ObjectTable: f94b18a8  TableSize: 114.


    Image: WINLOGON.EXE


 


PROCESS f9495a60  SessionId: 0  Cid: 014c    Peb: 7ffff000  ParentCid: 0130


    DirBase: 26814000  ObjectTable: f94b2848  TableSize: 204.


    Image: SERVICES.EXE


 


PROCESS f9494860  SessionId: 0  Cid: 0158    Peb: bffdf000  ParentCid: 0130


    DirBase: 2681d000  ObjectTable: f9499608  TableSize: 247.


    Image: LSASS.EXE


 


PROCESS f9476500  SessionId: 0  Cid: 01e8    Peb: 7ffff000  ParentCid: 014c


    DirBase: 272a1000  ObjectTable: f94833a8  TableSize: 136.


    Image: svchost.exe


 


 


Let’s find out.


How is Shutdown Performed


At first I thought some service was stuck and not shutting down, causing the entire system to stall. I pulled out my copy of Windows Internals and thumbed through the Startup and Shutdown chapter. I found that to initiate a shutdown, Csrss sends a shutdown message to Winlogon. Winlogon then calls ExitWindowsEx, which in turn sends a message to each session’s Csrss. Inside each session, Csrss then sends messages to all the threads that own a Windows message loop to shut down. If the threads do not respond, Csrss waits for a timeout specified by HKCU\Control Panel\Desktop\HungAppTimeout. Once all the threads that own windows exit, Csrss then terminates the processes as well. If Csrss finds a console application, it sends it a message to shut down. It will wait for a timeout specified by HKCU\Control Panel\Desktop\WaitToKillAppTimeout, and then displays a message on the desktop.


If you want Windows to terminate the processes immediately after timeout period, you can set the HKCU\Control Panel\Desktop\AutoEndTasks to 1.


At this point all the processes in the interactive user process have been shut down. Csrss in session 0 then sends messages to the Service Control Manager(SCM) to shut down all the services. When a service is registered, it also specifies a wait hint. During shutdown, this hint is used by the SCM to wait for services to terminate. Prior to Vista, Csrss waits for the SCM to shut down, but if that does not happen by the timeout value in HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTimeout, Csrss would just kill the SCM and proceed with the shutdown. Since this could lead to services not completing their shutdown routines, going forward in Vista and beyond Windows now implements preshutdown notification routines. Services that want to get these notifications can register themselves by using the SetServiceStatus API.


Once Csrss has finished its pass notifying system processes that the system is shutting down, Winlogon in session 0 finishes the shutdown process by calling the NtShutdownSystem. Note, many system processes such as Csrss, Smss, WinInit, Services, Lsass are still running when the machine actually powers down. They are ready to be shut down, but the processes still exist. NtShutdownSystem calls PoSetSystemPowerState to shut down all the drivers and the rest of the executive subsystems.


One of the things that PoSetSystemPowerState does is call the I/O manager to send notifications to all the drivers that have requested the shutdown notification. Once the drivers receive these notifications, they can then clean up and perform any special steps that might be needed for their devices. Other subsystems also perform clean up, and the page file is cleared by the Memory Manager if that option was turned on. The I/O manager is called a second time, and then the file system drivers are notified that the system is shutting down. The System actually shuts down in the power manager. The power manager tells the BIOS to turn the power off to the machine.


The Memory Dump Analysis


I started by looking at the Winlogon process in session 0 and found the main thread had called NtShutdownSystem as we expected. I always start with this thread when troubleshooting shutdown problems.


 


0: kd> !process f94a5d60 17


PROCESS f94a5d60  SessionId: 0  Cid: 0130    Peb: 7ffff000  ParentCid: 00fc


    DirBase: 2671a000  ObjectTable: f94b18a8  TableSize: 114.


    Image: WINLOGON.EXE


    VadRoot f93eb748 Clone 0 Private 515. Modified 396. Locked 0.


    DeviceMap f95c8448


    Token                             e246cdf0


    ElapsedTime                        0:34:42.0937


    UserTime                          0:00:00.0109


    KernelTime                        0:00:01.0015


    QuotaPoolUsage[PagedPool]         30112


    QuotaPoolUsage[NonPagedPool]      31264


    Working Set Sizes (now,min,max)  (1172, 50, 345) (4688KB, 200KB, 1380KB)


    PeakWorkingSetSize                1866


    VirtualSize                       29 Mb


    PeakVirtualSize                   31 Mb


    PageFaultCount                    3603


    MemoryPriority                    FOREGROUND


    BasePriority                      13


    CommitCharge                      888


 


        THREAD f94a5ae0  Cid 130.114  Teb: 7fffe000  Win32Thread: e002e328 WAIT: (Suspended) KernelMode Non-Alertable


            f93c2010  SynchronizationEvent


            f94a5bc8  NotificationTimer


        Not impersonating


        Owning Process f94a5d60


        Wait Start TickCount    138381        Elapsed Ticks: 842


        Context Switch Count    1914                   LargeStack


        UserTime                  0:00:00.0078


        KernelTime                0:00:00.0875


        Start Address winlogon!_setargv (0x0100ae2c)


        Stack Init f1d00000 Current f1cffa60 Base f1d00000 Limit f1cfb000 Call 0


        Priority 15 BasePriority 15 PriorityDecrement 0 DecrementCount 0


 


        ChildEBP RetAddr  Args to Child


        f1cffa78 dd42d893 dd0655c0 f93c2008 dd0654f0 nt!KiSwapThread+0x1b1


        f1cffaa0 dd490be9 f93c2010 00000005 00000000 nt!KeWaitForSingleObject+0x1a3


        f1cffae0 dd4908c5 f93c2000 00000001 00000001 nt!PopWaitForSystemPowerIrp+0xf2


        f1cffb04 dd4903db 00000000 f1cffbf4 f1cffc78 nt!PopSetDevicesSystemState+0xfb


        f1cffbe0 dd468389 00000005 00000004 c0000004 nt!NtSetSystemPowerState+0x329


        f1cffbe0 dd432197 00000005 00000004 c0000004 nt!_KiSystemService+0xc9


        f1cffc64 dd490122 00000005 00000004 c0000004 nt!ZwSetSystemPowerState+0xb


        f1cffd48 dd494db4 00000005 00000004 c0000004 nt!NtSetSystemPowerState+0x70


        f1cffd58 dd468389 00000001 00000000 00000000 nt!NtShutdownSystem+0x2e


        f1cffd58 77f88e07 00000001 00000000 00000000 nt!_KiSystemService+0xc9


        0006fe90 01014c3d 00000001 00000001 000767b8 ntdll!NtShutdownSystem+0xb


        0006fea8 01014f6c 00000060 0000000b 0000000b winlogon!ShutdownMachine+0x165


        0006ff04 0100e20f 000767b8 0000000b 5ffa0000 winlogon!Logoff+0x216


        0006ff20 01007e8c 000767b8 00000005 000735dc winlogon!MainLoop+0x1fb


        0006ff58 0100af70 00071fc8 00000000 000735dc winlogon!WinMain+0x37a


        0006fff4 00000000 7ffff000 000000c8 00000100 winlogon!WinMainCRTStartup+0x156


 


Looks like the I/O Manager has sent shutdown notifications to all the devices and this winlogon thread is waiting for it to complete. Dumping out the IRP, I found it had been completed, but for some reason the completion routine had never been called. That is why we are waiting forever. Digging further, it looked like someone had messed up the current IO stack location on the IRP, which resulted in the IRP completion routine never being called.


Let’s take a look at the IRP.


Based on the thread stack you can see Winlogon sent an IRP down to Plug and Play and is waiting for the action to complete. Since there is no obvious way to get a pointer to the IRP from the stack we’ll need to do some digging. We see that nt!PopWaitForSystemPowerIrp is waiting for some type of dispatch object to signal. According to MSDN the first parameter passed to KeWaitForSingleObject() is the object. Here I’m dumping the raw memory for the Object parameter.


0: kd> dc f93c2010


f93c2010  00040001 00000000 f94a5b4c f94a5b4c  ……..L[J.L[J.


f93c2020  00000000 f94a5ae0 00000000 00000063  …..ZJ…..c…


f93c2030  00000028 00000005 dd47bd20 f93c203c  (……. .G.< <.


f93c2040  f93c203c f93c2044 f93c2044 f938fea8  < <.D <.D <…8.


f93c2050  00000000 00000000 01010000 00000100  …………….


f93c2060  f946c860 f946c860 f93c23ac 00000000  `.F.`.F..#<…..


f93c2070  00000000 f93c2360 f93c2360 f93c207c  ….`#<.`#<.| <.


f93c2080  f93c207c f93c2084 f93c2084 f93c208c  | <.. <.. <.. <.


 


0: kd> !pool f93c2010 2


*f93c2000 size:  420 previous size:    0  (Allocated) *PDss


 


You may ask yourself, “are any of these values in the PDss pool pointers to IRP pool blocks?”  If you wanted to know that, you could iterate over each one of them passing them to !pool or !address.  That sounds tedious to do manually.  But thankfully, the debugger has some nice command tokens such as .foreach that make this a breeze.  If you want to pass each of the above values to !pool, you can perform the following in the debugger:


0: kd>  .foreach /pS 1 /ps 1 ( value { dp /c 1 f93c2000 L 420/@$ptrsize } ) { .if( value != 0 ) {.printf “**** %p ****\n”, ${value}; !pool ${value} 0x2} }


 


**** 21018000 ****


21018000: Unable to get contents of pool block


**** 73734450 ****


73734000: Unable to get contents of pool block


**** 00000002 ****


00000000: Unable to get contents of pool block


**** 00000006 ****


00000000: Unable to get contents of pool block


.


. (I truncated this output for the blog. There were many more entries)


.


**** f946c860 ****


**** f946c860 ****


*f946c800 size:  1e0 previous size:   40  (Allocated) *Irp


**** f946c860 ****


*f946c800 size:  1e0 previous size:   40  (Allocated) *Irp


**** f93c23ac ****


 


**** f93c207c ****


*f93c2000 size:  420 previous size:    0  (Allocated) *PDss


 


Bingo! There is a pointer to an IRP block in our PDss pool. Let’s take a look at f946c800. I’m dumping it’s pool.


 


        0: kd> !pool f946c800


 f946c000 size:   80 previous size:    0  (Allocated)  MmCa


 f946c080 size:   40 previous size:   80  (Free)       ….


 f946c0c0 size:   40 previous size:   40  (Allocated)  Ntfn


 f946c100 size:   a0 previous size:   40  (Allocated)  File (Protected)


 f946c1a0 size:   40 previous size:   a0  (Allocated)  Ntfr.


.


.


.


*f946c800 size:  1e0 previous size:   40  (Allocated) *Irp


 


Here you see the IRP tag in this raw memory output. The first two DWORDs contain the POOL_HEADER.


 


0: kd> dc f946c800


f946c800  0f018002 20707249 01d80006 00000000  ….Irp ……..


f946c810  00000000 00000000 f946c818 f946c818  ……….F…F.


f946c820  c00000bb 00000000 0b0a0000 04000000  …………….


f946c830  00000000 00000000 00000000 00000000  …………….


f946c840  00000000 00000000 00000000 00000000  …………….


f946c850  00000000 00000000 00000000 00000000  …………….


f946c860  f93c2060 f93c2060 f946c9e0 00000000  ` <.` <…F…..


f946c870  00000000 00000000 00000000 00000000  …………….


 


Since the pool header is 8 bytes long we need to add 8 to f946c800 to skip over the header and get to the actual IRP.


 


0: kd> ? f946c800 + 8


Evaluate expression: -112801784 = f946c808


 


0: kd> !irp f946c808


Irp is active with 10 stacks 11 is current (= 0xf946c9e0)


 No Mdl Thread 00000000:  Irp is completed. 


     cmd  flg cl Device   File     Completion-Context


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [  0, 0]   0  0 00000000 00000000 00000000-00000000   


 


                    Args: 00000000 00000000 00000000 00000000


 [ 16, 2]   0 e1 f9533ba0 00000000 dd491028-f93c2354 Success Error Cancel pending


              \Driver\Diskperf    nt!PopCompleteSystemPowerIrp


                    Args: 00000000 00000000 00000006 00000005


 


The Major function is 16 (IRP_MJ_POWER) which synchs up with the power management activity we saw in the stack. You can find all the major function codes for IRPs in the standard header files such as WDM.H or NTDDK.H available with the WDK. There is an excellent chance this is the IRP Winlogon sent down to Plug and Play. Now that we have the IRP in question with the IO Stack locations we see that the pending status is set.


 

                     Args: 00000000 00000000 00000000 00000000


 [ 16, 2]   0 e1 f9533ba0 00000000 dd491028-f93c2354 Success Error Cancel pending


              \Driver\Diskperf    nt!PopCompleteSystemPowerIrp


                     Args: 00000000 00000000 00000006 00000005


 


 


It appears to be stuck in \Driver\Diskperf which is the device stack for the device associated with Diskperf.  At this point I determined it was most likely due to one of the devices in it’s devstack.


0: kd> !devstack f9533ba0


  !DevObj   !DrvObj            !DevExt   ObjectName


  f94d5660  \FileSystem\Foo      f94d5718 


  f95338a0  \Driver\VSP          f9533958 


> f9533ba0  \Driver\Diskperf     f9533c58 


  f9539150  \Driver\Ftdisk       f9539208  HarddiskVolume1


!DevNode f953aba8 :


  DeviceInst is “STORAGE\Volume\1&30a96598&0&Signature80Offset7E00Length2730C00”


  ServiceName is “DiskPerf”


 


Doing a bing search for the drivers in the list above (and shutdown), I found a file system filter driver known to cause this issue in the past. Updating to the latest version of the driver fixed the issue, and the machine would shutdown normally.


BTW – For more information on how Windows performs a shutdown check out this past article from Ntdebugging – http://blogs.msdn.com/ntdebugging/archive/2007/06/09/how-windows-shuts-down.aspx


 





Share this post :





 

Comments (2)

  1. Keith says:

    HI,

      May I ask how you crash it to generate dump while machine no response?

    Thanks.

    [ Hello, Try this article http://support.microsoft.com/kb/244139
     
    Thanks, Ron ]
  2. OzgurDunyam says:

    Thank you for the information your provide.

    <a title=sesli href=http://www.ozgurdunyam.com/ >sesli</a>