Where did the SQL Server Instance disappear? The clue may be in the WMI logs!

We recently worked with a customer who ran into an interesting situation. This problem deals with SQL Server 2005 Service Pack 3 setup.

Normally, when you launch the SQL 2005 SP3 setup and you reach the screen which shows the components for which you can apply the service pack, you will get a list of all the product components. For a server with one default instance of SQL Server Database Services installed, the list will appear as shown below.

image

 

In this customer’s scenario, there were 2 servers which did not list all the components. Their setup screen looked like the following:

image

Notice that the 3 components highlighted from the previous screen is missing in this screen. Because of this situation, they cannot apply the SQL 2005 SP3 to these 3 components on these servers. The components [database services, integration service and client] were working properly. Only when setup attempts to enumerate the installed components, it was unable to get the complete list.

We started looking at their setup logs and did not find any errors or warning that would indicate any problem. Next we started looking at how the setup enumerates the installed components and qualify them for the upgrade to this service pack. We verified that the instance is listed properly in the registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL. If this registry key does not contain the correct information, then you can encounter this problem. That was not the issue for them.

When parsing through the Event logs we noticed the following entries appearing at regular intervals:

image

Even though this event log entry did not have direct correlation to the setup attempts or timing, it gave a vital clue. Something was wrong with the WMI infrastructure on this machine. We know that setup uses WMI heavily to perform the discovery and enumeration of the installed instances and components. So we turned our attention to WMI logging.

Here is the relevant snippet from wbemcore.log that showed the results of the WMI calls from the SQL Server setup program: 

(Mon May 29 14:31:01 2010.759152890) : GetUserDefaultLCID failed, restorting to system verion(Mon May 29 14:31:01 2010.759153015) : CALL CWbemNamespace::OpenNamespace

   BSTR NsPath = cimv2

   long lFlags = 0

   IWbemContext* pCtx = 0x0

   IWbemServices **pNewContext = 0x270F058

(Mon May 29 14:31:01 2010.759153062) : STARTING a main queue thread 2592 for a total of 1

(Mon May 29 14:31:02 2010.759153484) : GetUserDefaultLCID failed, restorting to system verion(Mon May 29 14:31:02 2010.759153500) : CALL CWbemNamespace::OpenNamespace

BSTR NsPath = default

long lFlags = 0

IWbemContext* pCtx = 0x0

IWbemServices **pNewContext = 0x279F058

(Mon May 29 14:31:02 2010.759153531) : Error 80041002 occured executing request for root\default

This snippet informs us that connections to root\cimv2 namespace was successful but we encountered a failure when connecting to root\default namespace.

Next we used the WBEMTEST.EXE tool [located @ C:\WINDOWS\SYSTEM32\WBEM\] to isolate this to be a clear WMI problem. When we attempted to connect to the root\default namespace, we got the same error we observed from the wmi logs.

image

Basically this error code corresponds to (WBEM_E_NOT_FOUND) Object cannot be found. Why does SQL setup uses the WMI namespace root\default? Setup uses the StdRegProv class in WMI. The StdRegProv class provides EnumValues method to query values from registry. The StdRegProv class is available in the root\default namespace.

So the next step was to rebuild the corrupted WMI namespace. We worked with our Windows support team and used the following commands to rebuild the namespace:

In c:\windows\system32\wbem

"for /f %s in ('dir /b /s *.dll') do regsvr32 /s %s"

Then from the root of the drive, run

"for /f %s in ('dir /s /b *.mof *.mfl') do mofcomp %s"

After this the customer needed to perform a reboot and then we were able to connect to the WMI namespace, setup was able to enumerate all components and apply the service pack. If you are doing this procedure on your own, it would be a good idea to perform a system backup to make sure you can restore system components in case of a problem.

 During this investigation we also found out that in the recent Operating Systems, WMI logging is done in a much different way. For more details, refer to Tracing WMI Activity.

Thanks

Suresh B. Kandoth

Senior Escalation Engineer – SQL Server