SQL Server 2008 does not start after SP1 with ETW enabled

In case you have just deployed SQL Server 2008 SP1 on a IA64 machine and SQL Server doesn’t start check if you have ETW tracing using MSSQLSERVER provider enabled.

SQL Server error log last lines are listed below and no further info is displayed after resource database build info:

SQL Server error log:

(…)
2009-09-18 11:48:45.28 spid6s SQL Trace ID 1 was started by login "sa".
2009-09-18 11:48:45.28 spid6s Starting up database 'mssqlsystemresource'.
2009-09-18 11:48:45.34 spid6s The resource database build version is 10.00.2531. This is an informational message only. No user action is required.

(…)

No messages are logged on event viewer and process needs to be killed manually using task manager. Capturing a memory dump shows at least 2 threads with similar callstack:

(…)

Child-SP Child-BSP RetAddr Call Site
(…)

00000000`287ed940 00000000`287f0c58 00000000`01125610 sqlservr!EventInternal<Spinlock<149,1,0> >::Wait+0x250
00000000`287ed940 00000000`287f0be0 00000000`01867d20 sqlservr!FInitDCOM+0x260
00000000`287ed9b0 00000000`287f0bb0 00000000`05116d50 sqlservr!CreateInfoSetReaderExNoCheck+0x50
00000000`287ed9d0 00000000`287f0b78 00000000`05118080 sqlservr!CreateInfoSetReaderEx+0x50
00000000`287eda00 00000000`287f0b10 00000000`05a68230 sqlservr!CTraceETW::FindTemplate+0x670
00000000`287eda70 00000000`287f0a98 00000000`76b0b040 sqlservr!CTraceETW::ControlCallback+0x310
00000000`287edbd0 00000000`287f0a38 00000000`76b16800 ntdll!RegisterGuidsApiCallback+0x140
00000000`287edc30 00000000`287f09b8 00000000`76b16960 ntdll!EtwpUpdateEnableInfoAndCallback+0x180
00000000`287edc70 00000000`287f0958 00000000`76b1ae50 ntdll!EtwpRegisterProvider+0x120
00000000`287edd20 00000000`287f0918 00000000`76b135e0 ntdll!EtwNotificationRegister+0x90
00000000`287edd20 00000000`287f0898 00000000`05a68d80 ntdll!EtwRegisterTraceGuidsW+0x140
00000000`287edda0 00000000`287f0820 00000000`05a6ae80 sqlservr!CTraceETW::Startup+0x620
00000000`287ef9b0 00000000`287f0808 00000000`019b7dc0 sqlservr!CTraceETW::StartupTask+0x20
(…)

 

(…)
Child-SP Child-BSP RetAddr Call Site
00000000`164ae3c0 00000000`164b2300 00000000`76b37940 ntdll!RtlpWaitOnCriticalSection+0x3e0
00000000`164ae410 00000000`164b22c0 00000000`76b1adf0 ntdll!RtlEnterCriticalSection+0x180
00000000`164ae410 00000000`164b2280 00000000`76b135e0 ntdll!EtwNotificationRegister+0x30
00000000`164ae410 00000000`164b2200 000006fa`fd503b60 ntdll!EtwRegisterTraceGuidsW+0x140
00000000`164ae490 00000000`164b2190 000006fa`fdc57ee0 shell32!DllMain+0x180
00000000`164ae4b0 00000000`164b2138 000006fa`fdc582d0 shell32!__DllMainCRTStartup+0x220
00000000`164ae5d0 00000000`164b2108 00000000`76b247c0 shell32!_DllMainCRTStartup+0x50
00000000`164ae5d0 00000000`164b2020 00000000`76b280c0 ntdll!LdrpRunInitializeRoutines+0x440
00000000`164ae740 00000000`164b1f68 00000000`76b27300 ntdll!LdrpLoadDll+0x620
00000000`164ae9d0 00000000`164b1ed0 00000000`76a0fed0 ntdll!LdrLoadDll+0x1e0
00000000`164aec40 00000000`164b1e48 000006fa`fed061f0 kernel32!LoadLibraryExW+0x670
00000000`164aecb0 00000000`164b1de8 000006fa`fed05ca0 ole32!CClassCache::CDllPathEntry::LoadDll+0x190
00000000`164aecb0 00000000`164b1d60 000006fa`fed03f00 ole32!CClassCache::CDllPathEntry::Create_rl+0xe0
00000000`164aece0 00000000`164b1ce0 000006fa`fed0a350 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0x1e0
00000000`164aef20 00000000`164b1c60 000006fa`fed09ad0 ole32!CClassCache::GetClassObjectActivator+0x4d0
00000000`164aef50 00000000`164b1c08 000006fa`fed0d6e0 ole32!CClassCache::GetClassObject+0x90
00000000`164aef90 00000000`164b1be8 000006fa`fed0f850 ole32!CCGetClassObject+0x40
00000000`164aef90 00000000`164b1b20 000006fa`feeb8ea0 ole32!CServerContextActivator::GetClassObject+0x470
00000000`164af010 00000000`164b1ab8 000006fa`fed12280 ole32!ActivationPropertiesIn::DelegateGetClassObject+0x440
00000000`164af040 00000000`164b1a78 000006fa`fed110d0 ole32!CApartmentActivator::GetClassObject+0xe0
00000000`164af050 00000000`164b1a30 000006fa`fed10ff0 ole32!CProcessActivator::GCOCallback+0xb0
00000000`164af060 00000000`164b19d8 000006fa`fed10cd0 ole32!CProcessActivator::AttemptActivation+0x70
00000000`164af060 00000000`164b1950 000006fa`fed10a00 ole32!CProcessActivator::ActivateByContext+0x110
00000000`164af070 00000000`164b1918 000006fa`feeb8ea0 ole32!CProcessActivator::GetClassObject+0xe0
00000000`164af080 00000000`164b18b0 000006fa`fed0edb0 ole32!ActivationPropertiesIn::DelegateGetClassObject+0x440
00000000`164af0b0 00000000`164b1840 000006fa`feeb8ea0 ole32!CClientContextActivator::GetClassObject+0x1f0
00000000`164af300 00000000`164b17d0 000006fa`fed155c0 ole32!ActivationPropertiesIn::DelegateGetClassObject+0x440
00000000`164af330 00000000`164b16c8 000006fa`fecfdf30 ole32!ICoGetClassObject+0x9c0
00000000`164afd10 00000000`164b1600 000006fa`fecfe910 ole32!CComActivator::DoGetClassObject+0x2d0
00000000`164afd20 00000000`164b15c0 00000000`01126000 ole32!CoGetClassObject+0x50
00000000`164afd20 00000000`164b15a0 00000000`0112bd40 sqlservr!GetLuxorClassFactory+0x70
00000000`164afd30 00000000`164b1560 00000000`04297a80 sqlservr!LoadDCOM+0x1e0
00000000`164afd40 00000000`164b1520 00000000`0612e2d0 sqlservr!StartUp::WaitForTempdbLocked+0x120
00000000`164afd50 00000000`164b13c0 00000000`019b7dc0 sqlservr!initCM+0x610
(….)

0:026> !locks
CritSec ntdll!LdrpLoaderLock+0 at 0000000076d18e20
WaiterWoken No
LockCount 3
RecursionCount 2
OwningThread 1064
EntryCount 0
ContentionCount 8
*** Locked
CritSec ntdll!EtwProvCritSect+0 at 0000000076d1d940
WaiterWoken No
LockCount 1
RecursionCount 1
OwningThread 120c à thread 26
EntryCount 0
ContentionCount 1
*** Locked

So basically second thread that is responsible to initialize DCom and signal it but that was locked because ETW logging critical section was owned by second thread. Second thread acquired ETW critical section but was waiting for DCom to be signaled. Basically a deadlock scenario between these 2 threads.

There are 2 possible workarounds, first is to disable ETW tracing and enabling it after SQL has been started however in case SQL restarts or needs to failover that procedure needs to be manually applied.

Another workaround would be to delete or rename the ETW configuration file (etwcnf.xml) located under the SQL Server instance BINN folder. This should disable SQL Server ETW trace. The ETW callback will fail before reaching the point where it waits for DCOM initialization. You should see the following error in SQL Server errorlog. SQL Server should start normally though.

2009-09-22 10:38:29.03 Server      Error: 8352, Severity: 16, State: 1.
2009-09-22 10:38:29.03 Server      Cannot find the requested trace template: id = 1.

This problem should be addressed on SQL Server 2008 SP1 Cumulative Update 5