SqlClient Timeouts Revealed

Just some notes from recent discussions internal and external of how timeouts work with SqlClient.  First a classification of where timeouts can occur with SqlClient:

  1. When attempting to get a connection from the SqlClient connection pool
  2. When attempting to create a new fresh connection to server (not getting one from pool)
  3. When sending a command explicitly to the server
  4. When sending commands with "context connection=true"
  5. When sending commands implicitly (under the covers) to the server
  6. When executing asynchronous commands (BeginExecute...)
  7. When waiting for attention acknowledgement from server (special rare case)
  8. When sending TM commands
  9. When fetching rows
  10. When uploading rows using bulk copy

I hope I did not miss a class of timeout here, if so, let me know.

First, the majority of these above listed timeouts leverage SqlConnection.ConnectionString's "Connect Timeout" and SqlCommand.CommandTimeout values in a logical and sensible manner.

Login Timeouts

For example 1 and 2 use "Connect Timeout", this makes sense to me.  We could have created a separate "get a live connection timeout" versus a "get connection from the pool timeout" but the utility of this does not over-ride the added complexity of having 2 timeouts here.  But say for example you happen to have an old ancient lost to the ages SQL Server that takes 1 minute to open a connection to over a slow WAN.  But once connections are in the pool you want to pull these out in a rapid fashion.   Ahh, it's just not worth it to add this feature for such an edge case.

With Connect Timeout, a value of 0 means "infinite" wait.  It's not truely infinite of course 0 is mapped to UInt32.MaxValue milliseconds underneath which is about 49 days.  Hope you don't have to wait that long for SQL to respond.

If the connection pool is full, attempts to get a connection from the pool will block until Connect Timeout expires, this makes sense.

If you have to open a new connection, there are many activities that occur underneath that could take time.  We have to open a socket to the server, this could be blocked by networking layer for a long time.  Next we have to send prelogin packet to server to negotiate packet size and get server version, and here we wait for a prelogin response from the server.  Once we process prelogin, we then have to send a login packet and this can get complicated.  First we have to negotiate with SQL Server to get SSL setup for encrypting the login packet.  Once we have SSL setup, which can entail multiple trips back and forth to server,  we can then send the login packet.  If you use integrated authentication, we also have to setup the SSPI channel and this could take a few round trips as well.  Ok, once all this is done we get a login acknowledgement from the server and we are "logged in" and hand back control to the caller.  All of this occurs under Connect Timeout period of time during SqlConnection.Open, if any leg in the process takes too long, the timeout expires and exception is thrown.

Note if you do get lots of login timeout failures, this can be indicative of poor pooling performance.  Logins can timeout due to high CPU on server side as well -- if for example the server just does not have enough CPU to process the login quickly enough.  In any case Connect Timeout in SqlClient controls this timeout.

Logins are slightly more complicated when you are connecting to a mirrored SQL Server.  If you are doing this you will have "Failover Partner" specified in the ConnectionString.  When connecting to a mirrored server we alternately attempt to open connections to primary and secondary back and forth until timeout expires.  Overall the same timeout provided by "Connect Timeout" applies in the end.

Explicit Command Execution Timeouts

When executing commands using SqlCommand, the CommandTimeout is used.  A CommandTimeout of 0 means "infinite" again (a.k.a 49 days).  If you don't set CommandTimeout, a default of 30 seconds is used.

Looks simple on the surface but now the fun begins.  What precisely does "the command completed" really mean?  Does it mean the server processed the query and sent back the first byte of response?  Does it mean the time until the entire response is drained?  The help topic for SqlCommand.CommandTimeout indicates:

This property is the cumulative time-out for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.

This can be important if you are reading millions of rows using a SqlDataReader.  As you read more and more rows you are chewing away at this cumulative time, and the time can run out in theory. 

Note that when reading data from the server, SqlClient reads data in packets (called TDS packets) and these are 8000 bytes each by default.  We don't do any readahead, we just read 8K at a time.  Hence reading a series of rows in a resultset in turn reads a series of 8K packets from the server.  If you are curious about this you can use SQL Server Management Studio and turn on the "Include Client Statistics" feature (it's on the toolbar):

clientstats

You can see here I read 1000 rows, 1306 TDS packets (8000 bytes each) and total of 4MB of data.  This boils down to 1306 network reads of 8K.  In most cases the network reads are very fast, however you could hit one or more locked rows half-way through reading the results which could accumulate more time.  A locked row means the client blocks waiting for the row lock to become available, eating up more of your CommandTimeout.

Hence CommandTimeout covers all the time spend making all these 8k reads, including the first 8K read which is most likely the longest wait as it's waiting for the server to process the query and send back the first bytes of response.

Some customers have complained about this behavior, they think that once the initial read completes then subsequent row fetches should not time out.  The command has completed in other words, why time out reading rows?  In reality, the command may or may not be completed when the first 8K read is returned to the client.  For example you could send a batch of commands and the first command may be completed but subsequent commands in the batch have not even started executing yet.  I suppose we could have a row read timeout and a command timeout to allow more granular control here, but it is really such an edge case that it's not worth cluttering the API with multiple and thus more confusing timeout settings (most customers would never need to set RowReadTimeout).

SqlConnection and "context connection=true"

If you never write CLR code that runs inside SQL Server, you can skip this section.  If you do write code that runs inside SQL Server (CLR stored procedures or functions) then there is a feature called "context connection=true" that allows you to open a loopback SqlConnection to current SQL Server database.  This loopback connection ignores all command timeouts since it is running in the context of a sql batch already and the client controls the timeout.

Implicit Command Timeouts

In some rare cases SqlClient code needs to execute internal commands behind the scenes to fetch metadata etc..., these are "implicit commands".  Some cases of SqlConnection implicit commands:

  • SqlConnection.GetSchema - Internally creates a SqlCommand and executes a metadata query.
  • SqlConnection.ChangeDatabase - Sends a "use [<databasename>]" command under the covers.
  • SqlConnection.BeginTransaction - Sends a "begin transaction" command or other transaction manager command under the covers.

These SqlConnection methods don't have access to any pre-existing SqlCommand object to obtain a CommandTimeout, hence they were written to use Connect Timeout.  Ok, to be honest I am not sure if this makes total sense, but this is how we did it.  Think of "Connect Timeout" as the timeout that applies to all things SqlConnection (login, implicit commands) then it makes sense.   Big note: SqlConnection.GetSchema currently has a bug where it uses a hard-coded 180 second timeout,the plan is to fix this to be consistent and use Connect Timeout some time in the future.

Now SqlCommand does not have any implicit commands it runs, but if it did, it would use CommandTimeout.

SqlDataReader does have an implicit command --> SqlDataReader.GetSchemaTable.  GetSchemaTable uses the CommandTimeout from it's parent SqlCommand object since SqlDataReader does not have a settable timeout.

I classify TM commands as another form of implicit command executed by the driver under the covers.  What is a TM command?  TM stands for transaction manager.  A TM command is a special command sent to SQL Server to request changes in transaction state from the client.  Commands like SqlConnection.BeginTransaction and SqlConnection.EnlistTransaction and SqlConnection.EnlistDistributedTransaction send these TM commands.  All of these use the parent SqlConnection's "Connect Timeout".

Asynchronous Command Execution Timeout

SqlClient does not enforce any timeouts on asynchronously executed commands.  They can run forever in theory.  Note this is not the 49 day forever but the real forever.  Asynchronous commands include SqlCommand.BeginExecuteReader, BeginExecuteNonQuery, and BeginExecuteReader.

Ok, you might be a little shocked by this behavior, but this is how it works.  The idea of asynchronous command execution is you send off the command and then are notified via an event when it completes.  You can implement your own timeout if needed, for example you could start a timer that checks if the command is completed after a period of time and if not then cancels it.

The bottom line is IF we wanted to implement CommandTimeout in this case we would just do the same thing, spin up a timer with a handle to the executing command and cancel it if the timer expires and the command has not completed. 

Cancel Timeout (attention acknowledgement timeout)

Ok, this is a very rare edge case, you can probably forget about this one, but if you are really bored, read on.  Suppose you execute a command, then the command times out.  When this happens the SqlClient driver sends a special  8 byte packet to the server called an attention packet.  This tells the server to stop executing the current command.  When we send the attention packet, we have to wait for the attention acknowledgement from the server and this can in theory take a long time and time out.  You can also send this packet by calling SqlCommand.Cancel on an asynchronous SqlCommand object.  This one is a special case where we use a 5 second timeout.  In most cases you will never hit this one, the server is usually very responsive to attention packets because these are handled very low in the network layer.

Now there is a really rare edge case where you send an attention but the server has already completed the operation.  What happens in this case?  In this case SqlClient just drains the response for you silently and still throws the timeout exception.  This brings up a good point -- there is a slim sliver of a possibility if you cancel a command or it times out it may actually complete on the server side prior to the attention signal, please keep this in mind.   This is a small gotcha that exists when using automatic transactions and a timeout or cancel occurs.  If you use a manual transaction then you can always roll back after handing the timeout exception to ensure you get into a consistent state.

Summary

In general the parent objects timeout is used in most cases, this is the easiest thing to remember.  If parent is SqlConnection then Connect Timeout is used, if the parent object is SqlCommand or SqlDataReader, then the SqlCommand.CommandTimeout is used.   There are a few minor exceptions that don't really add up to much for most scenarios.

If I missed anything here, let me know.  Matt