SQL Server 2005 Resource Takes a Long Time to Come Online on a Windows Cluster, with "Resource Manager Creation Failed" Reported in the Errorlog

Recently we faced an issue in which a clustered instance of SQL Server 2005 was taking a long time to come online on a Windows Server 2008 cluster.

We checked the SQL Server Errorlog and found the following:

2009-10-05 10:58:27.41 Server Attempting to initialize Microsoft Distributed Transaction Coordinator (MS DTC). This is an informational message only. No user action is required.

2009-10-05 10:59:54.39 Server Resource Manager Creation Failed: 0x8004d01c(XACT_E_CONNECTION_DOWN)

The error code 0x8004d01c (XACT_E_CONNECTION_DOWN) means "A connection with the transaction manager was lost."
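To see where a code like 0x8004d01c comes from, it can help to split it into the standard HRESULT fields: a severity bit, a 13-bit facility and a 16-bit code. The PowerShell sketch below does that arithmetic. Split-HResult is a hypothetical helper name of our own, not a built-in cmdlet. For 0x8004d01c it should report severity 1 (failure), facility 4 (FACILITY_ITF, the facility for interface-specific codes) and code 0xD01C.

```powershell
# Sketch: split an HRESULT into severity, facility and code.
# Split-HResult is a hypothetical helper; the bit layout is the standard HRESULT format.
function Split-HResult {
    param([Parameter(Mandatory)][string]$Hex)   # e.g. '0x8004d01c'

    # Strip an optional 0x prefix and parse as Int64 so the high (severity) bit stays positive
    $value = [Convert]::ToInt64(($Hex -replace '^0[xX]', ''), 16)

    [pscustomobject]@{
        HResult  = '0x{0:X8}' -f $value
        Severity = ($value -shr 31) -band 1        # 1 = failure
        Facility = ($value -shr 16) -band 0x1FFF   # 4 = FACILITY_ITF
        Code     = $value -band 0xFFFF             # facility-specific error number
    }
}

Split-HResult '0x8004d01c'
```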

 

From the Errorlog we can see that about 87 seconds passed between the attempt to initialize MS DTC (10:58:27.41) and the "Resource Manager Creation Failed" message (10:59:54.39). So the delay appears to occur while SQL Server communicates with the MSDTC resource. We checked and confirmed that MSDTC was installed as a clustered resource, which is a requirement for SQL Server 2005 installed on a cluster.
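If you want to measure this gap on your own instance instead of reading it by eye, a small PowerShell sketch like the one below can do it. The two sample lines are copied from the Errorlog excerpt above. The helper name Get-LogTime is hypothetical. The sketch assumes each Errorlog line starts with a 22-character timestamp in the yyyy-MM-dd HH:mm:ss.ff format shown there. In practice you would feed it the lines of your own ERRORLOG file, for example with Get-Content.

```powershell
# Sketch: measure the time between the MS DTC initialization message and the failure message.
# Sample lines are taken from the Errorlog excerpt in this post.
$log = @(
    '2009-10-05 10:58:27.41 Server Attempting to initialize Microsoft Distributed Transaction Coordinator (MS DTC). This is an informational message only. No user action is required.'
    '2009-10-05 10:59:54.39 Server Resource Manager Creation Failed: 0x8004d01c(XACT_E_CONNECTION_DOWN)'
)

# Hypothetical helper: parse the leading 22-character timestamp of an Errorlog line
function Get-LogTime([string]$Line) {
    [datetime]::ParseExact($Line.Substring(0, 22), 'yyyy-MM-dd HH:mm:ss.ff',
        [System.Globalization.CultureInfo]::InvariantCulture)
}

# Pick the first line of each kind
$start = $log | Where-Object { $_ -like '*Attempting to initialize Microsoft Distributed Transaction Coordinator*' } | Select-Object -First 1
$fail  = $log | Where-Object { $_ -like '*Resource Manager Creation Failed*' } | Select-Object -First 1

# TimeSpan between the two messages
$delay = (Get-LogTime $fail) - (Get-LogTime $start)
'MS DTC initialization took {0:N2} seconds' -f $delay.TotalSeconds
```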

 

We checked the Event Viewer logs and found the following warning reported by MSDTC:

MSDTC encountered an error (HR=0x80000171) while attempting to establish a secure connection with system HOU150W8UCSQL1D.

From the above error message, it appears that the SQL Server service was failing to communicate with the clustered MSDTC because MSDTC could not establish a connection using mutual authentication.

We found that Network DTC Access, which allows DTC to communicate over the network, was not enabled. This prevented SQL Server from communicating with the clustered instance of MSDTC.

For more information on what Network DTC Access is used for, refer to https://support.microsoft.com/kb/899191. This article also mentions that mutual authentication cannot be used in a clustered environment.

To resolve this, we enabled Network DTC Access with no authentication as follows. You can also follow the steps in KB https://support.microsoft.com/kb/817064.

a) Start the Component Services administrative tool. To do this, click Start, click Run, type dcomcnfg.exe, and then click OK.

b) In the console tree of the Component Services administrative tool, expand Component Services, expand Computers, right-click My Computer, and then click Properties.

c) Click the MSDTC tab, and then click Security Configuration.

d) Select the Network DTC Access check box, and then select No Authentication Required.
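To confirm which authentication option is in effect without opening the dialog, you can read the corresponding registry values. The sketch below is based on our reading of KB 899191. It assumes that the Security Configuration choice is stored as the AllowOnlySecureRpcCalls, FallbackToUnsecureRPCIfNecessary and TurnOffRpcSecurity values under HKLM\SOFTWARE\Microsoft\MSDTC, with the mapping shown in the switch. Verify both against that article before relying on them. Get-DtcAuthenticationMode is a hypothetical helper name. A clustered DTC resource may keep its own settings outside this key, so treat the registry read only as a quick check of the local DTC.

```powershell
# Sketch: translate the MSDTC RPC security registry values into the dialog option they represent.
# Value names and mapping are assumptions based on KB 899191; verify them there.
function Get-DtcAuthenticationMode {
    param([int]$AllowOnlySecureRpcCalls, [int]$FallbackToUnsecureRPCIfNecessary, [int]$TurnOffRpcSecurity)

    switch ("$AllowOnlySecureRpcCalls$FallbackToUnsecureRPCIfNecessary$TurnOffRpcSecurity") {
        '100'   { 'Mutual Authentication Required' }
        '010'   { 'Incoming Caller Authentication Required' }
        '001'   { 'No Authentication Required' }
        default { 'Unrecognized combination' }
    }
}

# Quick check of the local (non-clustered) DTC settings, if the key exists on this machine
$key = 'HKLM:\SOFTWARE\Microsoft\MSDTC'
if (Test-Path $key) {
    $v = Get-ItemProperty -Path $key
    Get-DtcAuthenticationMode $v.AllowOnlySecureRpcCalls $v.FallbackToUnsecureRPCIfNecessary $v.TurnOffRpcSecurity
}
```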

 

However, we observed that after making this change, SQL Server came online very quickly when the SQL Server and MSDTC resources were on the same node of the cluster. When they were on different nodes, SQL Server still took a long time to come online.

We saw the following message in the Application log at that time:

MS DTC is unable to communicate with MS DTC on a remote system. MS DTC on the primary system established an RPC binding with MS DTC on the secondary system. However, the secondary system did not create the reverse RPC binding to the primary MS DTC system before the timeout period expired. Please ensure that there is network connectivity between the two systems. Error Specifics:

We disabled the firewall on both nodes of the cluster and found that SQL Server came online quickly regardless of which node it was on.

However, we did not want to leave the firewall disabled, so we looked for a way to keep it enabled while allowing exceptions for the ports used by DTC and RPC.

We added the program "C:\Windows\System32\msdtc.exe" to the Windows Firewall exception list, as described in KB https://support.microsoft.com/kb/899191, but that alone did not help.
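For reference, here is one way to add that program exception from an elevated prompt with netsh advfirewall. This is a minimal sketch. The rule name is our own hypothetical choice, and you would run it on each node of the cluster.

```powershell
# Sketch: allow inbound connections to the MSDTC executable through Windows Firewall.
# Run elevated on each cluster node; the rule name is a hypothetical choice.
netsh advfirewall firewall add rule name="MSDTC program exception" dir=in action=allow program="C:\Windows\System32\msdtc.exe" enable=yes
```

As described above, in our case this exception alone was not enough.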

 

We then followed https://support.microsoft.com/?id=250367 and set the registry to restrict dynamic RPC port allocation to the range 5000-5200 (201 ports), because on a cluster, dynamic RPC allocation requires at least 200 ports to perform certain operations.
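The registry change can be scripted roughly as follows. This is a sketch that assumes the RPC "Internet" settings described in the Microsoft KB articles on configuring RPC dynamic port allocation for firewalls. In that scheme, a Ports multi-string value plus the PortsInternetAvailable and UseInternetPorts string values go under HKLM\SOFTWARE\Microsoft\Rpc\Internet. Check the names and types against the KB article above before applying the script. Run it elevated on every node, then restart each node so RPC picks up the new range.

```powershell
# Sketch: restrict dynamic RPC ports to 5000-5200 on this node (run elevated, then restart the node).
# Key and value names are assumptions based on the RPC firewall KB articles; verify before use.
$rpcKey = 'HKLM:\SOFTWARE\Microsoft\Rpc\Internet'

# Create the key if it does not exist yet
if (-not (Test-Path $rpcKey)) {
    New-Item -Path $rpcKey -Force | Out-Null
}

# Ports is a multi-string value; each entry is a single port or a port range
New-ItemProperty -Path $rpcKey -Name 'Ports' -PropertyType MultiString -Value @('5000-5200') -Force | Out-Null
New-ItemProperty -Path $rpcKey -Name 'PortsInternetAvailable' -PropertyType String -Value 'Y' -Force | Out-Null
New-ItemProperty -Path $rpcKey -Name 'UseInternetPorts' -PropertyType String -Value 'Y' -Force | Out-Null
```

Restricting RPC to this range does not open the ports, so the firewall still has to allow the same range.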

Windows 7 and Windows Server 2008 R2 let you create an inbound firewall rule for a range of ports. On Windows Server 2008, which is what we had, we could not specify a port range. So we used the following PowerShell script, run from an elevated prompt, to build a comma-separated list of every port in the range and add it to a single inbound rule:

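# Build the comma-separated list "5000,5001,...,5200" (201 ports)
$Range1=[system.string]::Join(",",(5000..5200))
# Add a single inbound rule that allows TCP on every port in the list
netsh advfirewall firewall add rule name="Ports 5000-5200" dir=in protocol=tcp localport=$Range1 action=allow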

After we created the firewall exception with the above script, the MSDTC issue was resolved. SQL Server came online without delay on both nodes of the cluster, regardless of where the MSDTC resource was running.

Parikshit Savjani
SE, Microsoft SQL Server

Reviewed By

Sudarshan Narsimhan
Technical Lead, Microsoft SQL Server