Error: 18056 State: 29 The client was unable to reuse a session. The failure ID is 29 after upgrading SQL 2000 to SQL 2008 SP2

I want to share this troubleshooting scenario since it may help you:

After upgrading SQL Server 2000 SP4 to SQL Server 2008 SP2, SQL Server experienced performance problems and many a times, connection failures

SQL Server errorlog shows this message during the problem period:

Error: 18056, Severity: 20, State: 29.

The client was unable to reuse a session with SPID xxx, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

I was thinking that this is the same issue as described in https://support.microsoft.com/kb/2543687 but as per this KB, issue is currently being investigated and scenario described in the KB is not matching the issue I'm troubleshooting.

There is also a fix released for SQL Server 2005 through https://support.microsoft.com/kb/937745 and again it is not applicable here.

Started looking into the data collected through PSSDiag (https://diagmanager.codeplex.com/) and looked into wait stats along with other information. Wait stats showed this:

wait_type

waiting_tasks_count

wait_time_ms

THREADPOOL

262896

470928424

LCK_M_U

331303

335798620

LCK_M_IX

346

214673671

ASYNC_NETWORK_IO

39886463

93412983

PAGEIOLATCH_SH

3274030

49266449

Threadpool worried me since this is not expected to be the Topper.

Looked into sp_configure:

name minimum maximum config_value run_value

max worker threads 128 32767 128 128

As per https://msdn.microsoft.com/en-us/library/ms187024(v=SQL.100).aspx, Upgrading an instance of the SQL Server 2000 Database Engine to a newer version retains the configuration value for max worker threads.

SQL Server 2008 instance with 8 processors will create 576 worker threads but in sp_configure it is hardcoded to 128 threads which caused contention in this case.

So we then changed this value to

EXEC sp_configure 'max worker threads',0

RECONFIGURE WITH  OVERRIDE

and restarted instance, now we longer have the 18056 error.

More information:

THREADPOOL waits only occur when a task is waiting to get assigned to a worker thread. This wait means server is out of worker threads and can’t bind the new connection request to a worker thread.  In most cases the cause of THREADPOOL waits is that existing workers created by SQL Server are tied up with a long blocking chain or running long running pre-emptive work such as extended stored procedures. So when the connection (login) cannot get a worker thread, sp_reset_connection command timed out and it recorded the Error: 18056, Severity: 20, State: 29. The client was unable to reuse a session with SPID 350, which had been reset for connection pooling. The failure ID is 29.

For other 18056 State numbers, please refer this blog for the description https://blogs.msdn.com/b/psssql/archive/2010/08/03/how-it-works-error-18056-the-client-was-unable-to-reuse-a-session-with-spid-which-had-been-reset-for-connection-pooling.aspx

Also, the 18056 error could be noise in certian scenarios, refer https://blogs.msdn.com/b/psssql/archive/2010/05/05/error-18056-can-be-unwanted-noise-in-certain-scenarios.aspx for more information