Unable to failover a Named instance of SQL server 2005 in cluster or unable to bring a Named Instance of SQL server 2005 Online

The problem is that we have a Named instance of SQL Server 2005 in a 2 node cluster. While trying to bring the SQL Services online we get these error messages in the Application Event Log.

Application Event Log:  

Event Type: Error

Event Source: MSSQL$SQL2k5

Event Category: (3)

Event ID: 19019

Date: 7/17/2009

Time: 7:12:00 AM

User: N/A

Computer: BLRS2R17-3

Description:

[sqsrvres] ODBC sqldriverconnect failed

 

Event Type: Error

Event Source: MSSQL$SQL2k5

Event Category: (3)

Event ID: 19019

Date: 7/17/2009

Time: 7:12:00 AM

User: N/A

Computer: BLRS2R17-3

Description:

[sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = ffffffff; message = [Microsoft][SQL Native Client]SQL Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF].

 

Event Type: Error

Event Source: MSSQL$SQL2k5

Event Category: (3)

Event ID: 19019

Date: 7/17/2009

Time: 7:12:00 AM

User: N/A

Computer: BLRS2R17-3

Description:

[sqsrvres] ODBC sqldriverconnect failed

 

Event Type: Error

Event Source: MSSQL$SQL2k5

Event Category: (3)

Event ID: 19019

Date: 7/17/2009

Time: 7:12:00 AM

User: N/A

Computer: BLRS2R17-3

Description:

[sqsrvres] checkODBCConnectError: sqlstate = HYT00; native error = 0; message = [Microsoft][SQL Native Client]Login timeout expired

Cause:

The cluster service is not able to Connect with Clustered Named instance of SQL Server. SQL Browser even though running is not actually listening for requests.

 Note: The SQL Server Browser program runs as a Windows service. SQL Server Browser listens for incoming requests for Microsoft SQL Server resources and provides information about SQL Server instances installed on the computer.

Resolution:

We started the SQL Server Browser Service with the -c parameter from command prompt it gave us the below response.

F:\Program Files (x86)\Microsoft SQL Server\90\Shared>sqlbrowser -c

SQLBrowser: starting up in console mode

We checked the same in a working cluster and found that when a browser is started it should have been like below,

F:\Program Files (x86)\Microsoft SQL Server\90\Shared>sqlbrowser -c

SQLBrowser: starting up in console mode

SQLBrowser: starting up SSRP redirection service

SQLBrowser is successfully listening on 0.0.0.0[1434]

[3728]: Waiting for next request...

[3728]: Received request...

[3768]: Waiting for next request...

[0532]: Waiting for next request...

[4612]: Waiting for next request...

[3728]: Waiting for next request...

[3768]: Received request...

[3768]: Waiting for next request...

We found that the SsrpListener key was 0

Path Of The Registry SsrpListener Key :

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Microsoft SQL Server\90\SQL Browser (x64 Machines)

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\SQL Browser (x86 Machines)

Note: The SsrpListener registry value allows or disallows the SQL Browser to expose Engine instance discovery information, So if the value is 0 it disallows the SQL Browser to expose Engine instance discovery.

After we changed value to 1 and restarted the browser service we were getting the response as expected.

We stopped the browser service from the command prompt and then started it from services console and then we were able to bring the SQL Server Services online.

Another Workaround for this issue is to manually create a TCP alias with port number in all the nodes. In that case we need to make sure that SQL is not set to work with dynamic ports.

More Information:
In x64 bit machines you may get into another issue. Once after you change the SsrpListener value to 1 and then when you try to start the browser service through command prompt you may get this response.

C:\Program Files (x86)\Microsoft SQL Server\90\Shared>sqlbrowser -c

SQLBrowser: starting up in console mode

SQLBrowser: starting up SSRP redirection service

SQLBrowser is successfully listening on 0.0.0.0[1434]

SQLBrowser: failed starting SSRP redirection services -- shutting down.

This is caused by the entry in the Wow6432Node registry pertaining to this installation eventhough this was a 64 bit installation. This happens only in x64 machines.

You can remove the checkpoint and then delete this key from both the nodes and then add the checkpoint back.

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Microsoft SQL Server\MSSQL.2\MSSQLServer

If registry checkpoint is not removed properly before deleting this key you may get this error message continously in the Event Logs.

Cluster service could not write to a file (C:\DOCUME~1\SVC~1.WES\LOCALS~1\Temp\CLS2AF.tmp). The disk may be low on disk space, or some other serious condition exists.

Ashwin Menon

SE, Microsoft Sql Server

 

Reviewed By
Karthick Krishnamurthy
Technical Lead, Microsoft SQL Server