Try and try again: not always a good idea (at least not for SSMS!)

Contributions from, and reviewed by: Ken Van Hyning, David Shiflet, Charles Gagnon and Alan Ren (SSMS dev team), Dimitri Furman, Mike Weiner and Rajesh Setlem (SQLCAT)

Background

SQL Server Management Studio (SSMS) is the most popular client used to administer and work with SQL Server and Azure SQL DB. Internally, the SSMS code uses the SqlClient class (implemented in the .NET Framework) to connect to SQL Server. Recent versions of SSMS have been compiled with .NET Framework 4.6.1. What this means is that SSMS gets to leverage many of the newer capabilities in the .NET Framework (for example, the changes for Always Encrypted). It also implies that SSMS ‘gets for free’ some underlying changes in .NET. Many of these new, ‘for free’ behaviors in SqlClient were aimed at ‘cloud applications’ connecting to Azure SQL Database.

One such change SSMS got for free is the connection resiliency logic within the SqlConnection.Open() method. To improve the default experience for clients which connect to Azure SQL Database, the above method will (in the case of initial connection errors / timeouts) now retry 1 time after sleeping for 10 seconds. These numbers are configurable by properties called ConnectRetryCount (default value 1) and ConnectRetryInterval (default value 10 seconds.) The previous versions of the SqlConnection class would not automatically retry in cases of connection failure.

In general, because of these changes, transient errors (across slow networks or when working with Azure SQL Database) are less frequent. However, when you consider that a lot of SSMS users still use it with ‘regular’ SQL Server – either in a VM in the cloud or on-premises, there is a subtle but distinct impact of these changes which may affect administrators of ‘regular’ SQL Server databases.

Impact of these changes

Take for example, a case when the very first user database in your server is inaccessible (for example, when the database is either offline or in a recovering status when it is the non-readable secondary in an Availability Group.) Now, it so happens that when you expand the SSMS Databases section for a server, it enumerates all the databases and tries to read some information from the very first database in the list. When that first database is non-readable, that constitutes a ‘connection error’ for SqlClient. In turn, the connection resiliency logic kicks in and sleeps for 10 seconds and retries the connection. Obviously the second connection will also fail, and then SSMS returns to a ‘responsive state’. Unfortunately, in that 10 seconds the user perceives a hang in the SSMS application.

Reverting to original behavior

There is a simple workaround for this situation. It is to add the following parameter string into the ‘Additional Connection Parameters’ tab within the SSMS connection window. The good news is that you only need to do this once, as the property is saved for future sessions for that SQL Server (until of course it is removed by you later.)

ConnectRetryCount=0

Here’s a screenshot to help as well:

Adding the ConnectRetryInterval parameter in the SSMS 'Additional Connection Parameters' dialog

Do note that ConnectRetryCount is a single term (no spaces in between!). Ideally, we would want to expose this through a setting in the Connection Properties screen in SSMS. For now, you can use the above method to revert to original behavior, or if you’d like to see this setting exposed as a GUI option in SSMS, do let us know by leaving your comments below! We are eager to hear from you!