I recently worked on an issue where one of my customers had successfully finished patching their SQL 2005 instances on a 3 node cluster. However the Build number was not getting updated on one of the nodes (NODE3) even after successfully completing the SP4 installation
SQL version On problem node: Database Services (INSTANCEX) SP3 2005.090.4340.00 x86 ENTERPRISE
Expected version: SQL Server Database Service 2005(INSTANCEX) ENU SP4 2005.090.5000.00 x86 ENTERPRISE
OS version: Microsoft Windows Server 2003 family, Enterprise Edition Service Pack 2 (Build 3790) x86
Number of Nodes : 3 (NODE1, NODE2, NODE3)
Note: With SQL Server 2005 unlike the higher versions, the patch is applied to the active node of the cluster and it is the patch that will go off to all passive nodes and will apply the patch files remotely
Patching of the passive node is achieved by remotely creating a scheduled task
With SQL 2005 the patch does everything so once it's finished either all nodes are patched, or none are patched. If there is an issue during the patching phase, then the entire patch needs to be rolled back on all nodes. Therefore if you have 3 node cluster and the last node fails to patch, the patch will be rolled back on all nodes.
- However in our case the patch was not rolled back on the nodes even though NODE3 wasn't patched correctly because the summary log still reports a success!
- We collected the setup logs on the NODE3 and also the successfully patched nodes to make detailed analysis to check if features to be updated have been skipped for any reason
- .We paused the NODE2 and NODE3 from the Cluster Administrator and ran the SP4 setup on the NODE3
We see that the setup succeeded (but completes in less than 30 seconds) showing us it may not be actually patching in the background
Product Installation Status
Product : SQL Server Database Services 2005 (INSTANCEX)
Product Version (Previous): 5000
Product Version (Final) : 5324
Status : Reboot Required
Log File : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\SQL9_Hotfix_KB2716427_sqlrun_sql.msp.log
Error Number : 3010
Error Description :
Product : Database Services (INSTANCEX)
Product Version (Previous): 4340
Product Version (Final) : 5000
Status : Success
Log File :
Error Number : 0
Error Description :
During the very beginning of the Service Pack setup process, the existing SQL Server services are discovered via WMI queries by the means of Wmiprvse.exeprocess; without the proper permissions, the process will fail to discover the installed components
- That led us to suspect "Is the Service pack setup even able to enumerate/discover the SQL instances?
Here is how we investigated that:
- We know that the SQL Server Configuration Manager tool also uses the same WMI queries to iterate through the services collection
- When these tools generate the WMI queries, the SQL Server Web-Based Enterprise Management (WBEM) provider (Sqlmgmprovider.dll) is loaded into the Wmiprvse.exe process. Then, the SQL Server WBEM provider extracts and processes the information about the services of every instance of SQL Server 2005
- So we decided to check if the SQL configuration manager on the problem node was able to enlist the services which would give us an idea if these WMI queries are working as expected
We open the SQL Server Configuration Manager, the right hand pane had the following error "The server threw an exception. [0x80010105]".
- On the NODE2 and NODE1 we do not see these issues.
- We also performed a small test to check WMI connectivity
SELECT * FROM RegServices
We see that in the Problem Node3, even after waiting for a long time we do not see any instances enumerated.
When we try the same on the NODE1 and NODE2 for the same namespace we see all the instances are being enumerated:
Connected to other namespace (e.g. root\cimv2, SELECT * FROM Win32_Service), it is all working fine
It appears that only the SQL namespace is having issue.
- However we had check the Setup logs to confirm if the Service pack setup has actually succeeded for the Database Engine Services, or if it was skipped due to any possible WMI issues?
We can see that setup able to enumerate all the details but weren't sure why it completes in ten seconds (Below is a snippet of setup log generated on NODE3 when the other two nodes were paused):
12/06/2014 13:10:24.272 Instance Details: INSTANCEX
12/06/2014 13:10:44.149 Enumerating passive cluster nodes
12/06/2014 13:10:44.149 Hotfix package completed
We do not see it installing any files or creating a Log file!
++12/06/2014 06:56:02.818 Installing target: NODE2
- So now we had a fair idea that the Service Pack setup is completing without actually patching the instance, because we possibly have issues discovering the installed components by the WMI queries
- We now had to fix the SQL Server Configuration Manager, issue which was throwing the error "The server threw an exception. [0x80010105]".
We have a Public KB article that fixes the SQL configuration manager issue we are seeing:
To work around this problem, open a command prompt, type the following command, and then press ENTER:
What finally resolved it for us!
- We came across http://support.microsoft.com/kb/2849344 , which speaks of the same "The server threw an exception [0x80010105]" error when the configuration manager is started.
- However the KB article is applicable to Windows Server 2012 & SQL server 2008 R2 where the problem occurs because SQL Server 2008 R2 internally uses a deprecated 32-bit Windows .dll file (Msclus.dll). When Configuration Manager is started after the Cluster Automation feature is added, the system discovers that the necessary .dll file is missing, and this triggers the error
The work around provided by the KB is as follows
Note The 32-bit version of the MSClus.dll can be found in the "%windir%\syswow64" directory.
Register this .dll file by running the regsvr32 command from an elevated command prompt:
Click Start, type command promptin the Search programs and files box.
UnderPrograms, right-click Command Prompt, and then clickRun as administrator.
At the command prompt, type the following command, and then press ENTER:
The current customer's environment is Windows 2003. However we decided to give the above workaround a shot as we were also in a clustered environment.
After registering the above MSClus.dll ! Voila…the configuration manager no longer throws the exception
We patched the NODE3 once again, which took slightly longer this time.
Rebooted the node and checked the SQL Server build which now reports 090.5000.00 correctly J