Help! My Server is Shutting Down for No Apparent Reason


Hello, Rob here with the GES team, with a nugget to pass on to you. I recently worked an issue where a Windows server rebooted intermittently for no apparent reason. The Windows System Event log yielded no clues other than this Event ID 6008:


 


Log Name:      System.evt
Source:        EventLog
Date:          25-8-2008 19:06:58
Event ID:      6008
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      A2A000001
Description: The previous system shutdown at 6:54:04 PM on 8/25/2008 was unexpected.
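
If you have many of these events to sift through, it can help to pull the shutdown time out of the description text. The following is a minimal Python sketch, assuming the English-language wording and US date format shown in the event above; the function name parse_unexpected_shutdown is hypothetical and not part of any Windows tool.

```python
import re
from datetime import datetime

# Matches the English Event ID 6008 description quoted above.
# Assumption: US-style date (month/day/year) and a 12-hour clock.
_PATTERN = re.compile(
    r"The previous system shutdown at (?P<time>\d{1,2}:\d{2}:\d{2} [AP]M) "
    r"on (?P<date>\d{1,2}/\d{1,2}/\d{4}) was unexpected\."
)


def parse_unexpected_shutdown(description):
    """Return the shutdown time as a datetime, or None if the text does not match."""
    match = _PATTERN.search(description)
    if match is None:
        return None
    return datetime.strptime(
        f"{match['date']} {match['time']}", "%m/%d/%Y %I:%M:%S %p"
    )


if __name__ == "__main__":
    text = "The previous system shutdown at 6:54:04 PM on 8/25/2008 was unexpected."
    print(parse_unexpected_shutdown(text))
```

On localized systems the wording and date format differ, so the pattern would need adjusting.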


 


There were no other symptoms or patterns to which the unexpected shutdown could be related. The shutdown could occur at any time of day. Eventually we attached a debugger to see if we could catch anything, but this wasn’t successful. Next we looked at the manufacturer’s error-logging mechanism and found this piece of information:


 


An Unrecoverable System Error has occurred (Error code 0x0000002D, 0x00000000)


 


Note: each vendor has its own way of handling error codes. We noticed a one-to-one relationship between the vendor error above and the Event ID 6008 messages in the Windows System Event log, so we engaged the hardware vendor, who determined that this code indicated an error on the PCI bus. They also informed us that this kind of error asserts an NMI (non-maskable interrupt) on the bus.
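
A one-to-one relationship like this can be checked by pairing timestamps from the two logs. The sketch below is only an illustration: it assumes you have already extracted both sets of timestamps as Python datetime values, and the function name pair_events, the five-minute tolerance, and the vendor timestamp in the example are all hypothetical.

```python
from datetime import datetime, timedelta


def pair_events(shutdowns, vendor_errors, tolerance=timedelta(minutes=5)):
    """Pair each shutdown time with the closest unused vendor error within tolerance.

    Returns (pairs, unmatched_shutdowns, unmatched_vendor_errors). A clean
    one-to-one relationship leaves both unmatched lists empty.
    """
    remaining = sorted(vendor_errors)
    pairs, unmatched = [], []
    for shutdown in sorted(shutdowns):
        candidates = [v for v in remaining if abs(v - shutdown) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda v: abs(v - shutdown))
            remaining.remove(best)  # each vendor error is used at most once
            pairs.append((shutdown, best))
        else:
            unmatched.append(shutdown)
    return pairs, unmatched, remaining


if __name__ == "__main__":
    # The shutdown time comes from the 6008 event; the vendor time is made up.
    shutdowns = [datetime(2008, 8, 25, 18, 54, 4)]
    vendor_errors = [datetime(2008, 8, 25, 18, 54, 10)]
    print(pair_events(shutdowns, vendor_errors))
```

The tolerance matters because the two logs are written by different components whose clocks may not agree exactly.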


 


To narrow down which component was causing the error, we set the NMICrashDump DWORD value under the following key in the registry:


 


HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl


 


This is described in detail in the article, “927069 How to generate a complete crash dump file or a kernel crash dump file by using an NMI on a Windows-based system”


http://support.microsoft.com/default.aspx?scid=kb;EN-US;927069


 


This registry value causes the machine to bugcheck with a STOP 0x80 (NMI_HARDWARE_FAILURE) when Windows detects an NMI, producing a dump file or, if a debugger is attached, breaking into the debugger.
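
For reference, the registry change can be captured in a .reg file. Here is a minimal Python sketch that builds the file text; it assumes the value data of 1 given in the KB article, and the function name nmi_crash_dump_reg is hypothetical. Check the article for the exact steps on your Windows version (including whether a restart is needed) before importing anything.

```python
def nmi_crash_dump_reg(enabled=True):
    """Build .reg file text that sets the NMICrashDump DWORD under CrashControl."""
    key = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
    value = 1 if enabled else 0  # assumption: 1 enables the behavior, per the KB article
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{key}]\r\n"
        f'"NMICrashDump"=dword:{value:08x}\r\n'
    )


if __name__ == "__main__":
    # Print the text; save it as a .reg file and review it before importing.
    print(nmi_crash_dump_reg())
```

Generating the text rather than writing to the registry directly keeps the sketch safe to run on any machine.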


 


After setting this registry value we hooked up the debugger again and waited. After a while we got lucky: the debugger intercepted a STOP 0x80!


 


At that point I ran “!pci 0x102 ff” to get an overview of the various PCI devices and their respective states. The !pci output included the following (VendorID and DeviceID have been removed):


 


PCI Configuration Space (Segment:0000 Bus:00 Device:1e Function:00)
Common Header:
    00: VendorID       <vendor>
    02: DeviceID       <device>
    04: Command        0147 IOSpaceEn MemSpaceEn BusInitiate PERREn SERREn
    06: Status         4010 CapList SERR
    08: RevisionID     d9
    09: ProgIF         01 Subtractive
    0a: SubClass       04 PCI-PCI Bridge
    0b: BaseClass      06 Bridge Device
    0c: CacheLineSize  0000
    0d: LatencyTimer   00
    0e: HeaderType     01
    0f: BIST           00
    10: BAR0           00000000
    14: BAR1           00000000
    18: PriBusNum      00
    19: SecBusNum      01
    1a: SubBusNum      01
    1b: SecLatencyTmr  20
    1c: IOBase         20
    1d: IOLimit        30
    1e: SecStatus      6280 FB2BCapable InitiatorAbort SERR DEVSELTiming:1
    20: MemBase        f7e0
    22: MemLimit       f7f0
    24: PrefMemBase    d801
    26: PrefMemLimit   dff1
    28: PrefBaseHi     00000000
    2c: PrefLimitHi    00000000
    30: IOBaseHi       0000
    32: IOLimitHi      0000
    34: CapPtr         50
    38: ROMBAR         00000000
    3c: IntLine        ff
    3d: IntPin         00
    3e: BridgeCtrl     000b PERRREnable SERREnable VGAEnable


 


We couldn’t have gone much further without the vendor’s assistance. They informed us that the SERR flag in the Status register indicates that a PCI System Error occurred on this PCI-PCI bridge. At this point I had enough conclusive data to pass my findings to the hardware vendor for full collaboration on the problem, and they continued investigating the issue.
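
To see how the flags in the !pci output map to the raw register values, here is a small Python sketch that decodes a 16-bit PCI status word. The bit positions follow the standard PCI status register layout as I understand it; the flag names are illustrative labels that only partly match the debugger's own (for example, the debugger prints bit 13 as InitiatorAbort), and decode_status is a hypothetical helper, not a debugger command.

```python
# (mask, label) pairs for the single-bit flags of the PCI status register.
# Labels are illustrative; "CapList", "FB2BCapable" and "SERR" mirror the
# names in the !pci output above.
STATUS_FLAGS = [
    (0x0008, "InterruptStatus"),
    (0x0010, "CapList"),
    (0x0020, "66MHzCapable"),
    (0x0080, "FB2BCapable"),
    (0x0100, "MasterDataParityError"),
    (0x0800, "SignaledTargetAbort"),
    (0x1000, "ReceivedTargetAbort"),
    (0x2000, "ReceivedMasterAbort"),
    (0x4000, "SERR"),
    (0x8000, "DetectedParityError"),
]


def decode_status(value):
    """Return (list of set flag labels, DEVSEL timing field) for a status word."""
    flags = [label for mask, label in STATUS_FLAGS if value & mask]
    devsel_timing = (value >> 9) & 0x3  # bits 9-10
    return flags, devsel_timing


if __name__ == "__main__":
    print(decode_status(0x4010))  # Status from the output above
    print(decode_status(0x6280))  # SecStatus from the output above
```

In the bridge's secondary status register the low bits are reserved and bit 14 is read as a received rather than signaled system error, so treat the labels there as approximate.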


 


It should be noted that a hardware problem is not the only possible cause of an Event ID 6008. A quick search of the Microsoft Knowledge Base turns up other things that can cause this event ID to appear in the Windows System log.


Comments (3)

  1. zizebra says:

    Thanks for giving this info. I didn't know about this one, and it's critical.

  2. Dennis Janson says:

    I am getting the same error message. My server rebooted itself 5 times in 2 days and cannot figure out why.