Troubleshooting Connectivity Common Scenarios


Connectivity troubleshooting.

Connectivity issues are quite common in Azure DB and SQL in general. We have a great article that can assist in resolving these which is here:

https://support.microsoft.com/en-us/help/10085/troubleshooting-connectivity-issues-with-microsoft-azure-sql-database

This article is great, and I encourage you to review its content and consider using it when you have a connectivity problem in the future. However, I’d also like to explain how I go about troubleshooting customer issues when connecting to Azure DB is a problem, and some of the common causes.

The first question I always ask is ‘where are you connecting from?’ at first sight this may seem to be a trivial question but it makes a lot of difference. As a first step I usually advise SSMS as a good tool to test with.

The three obvious answers are

  1. a VM within Azure itself
  2. a machine located inside a corporate network
  3. a machine connected to the internet

Scenario 1:

Connection from 1 fails and from 2 and 3 works without error. This is quite a common issue and catches a lot of users. Fortunately, it’s a straight forward issue to resolve, in this situation we need to remember that from within Azure there is a setting ‘allow Azure Services to connect’ but even with this set we might not be good to go. If connection is not possible with this setting in most cases the problem is going to be that the ports on the VM are locked down. We all know that port 1433 needs to be open which it undoubtedly does, but while connections outside Azure will probably be happy with this, inside Azure you need other sets of ports to be open 11000-11999 and 14000-14999 should also be open, this is because connections from inside Azure will make an initial connection over port 1433 but then negotiate the port from the ranges given which they are actually going to use, connections from outside of Azure go via the gateway which masks this behaviour.

This is great if your client running on the VM supports TDS 7.4 AND also supports re-direction, I’ve found that not all drivers supporting TDS 7.4 do. In this case we can set the connection policy to Proxy to avoid re-direction.

https://msdn.microsoft.com/en-us/library/azure/mt604439.aspx?f=255&MSPPError=-2147217396

It can be set either with a REST PUT request or via PowerShell.

Example:

Set-AzureRmResource -ResourceId /subscriptions/<subscription id>/resourceGroups/<resource group>/providers/Microsoft.Sql/servers/<server name>/connectionPolicies/Default -Properties @{connectionType="Proxy"}

The accepted values for connectionType are Proxy, Redirect or Default.

Under Default, connections behave as documented in https://docs.microsoft.com/en-us/azure/sql-database/sql-database-develop-direct-route-ports-adonet-v12

 

Scenario 2:

Connection from 2 fails, but from 1 and 3 work without error. This is almost certainly going to be down to the corporate firewall you need to ensure that port 1433 is open but also that the IP address ranges for Azure are open. They can be found here :-

Be aware that the ranges are different for each data centre so you need to confirm the right range is open.

You can download the latest ranges from here:

https://www.microsoft.com/en-gb/download/details.aspx?id=41653

 

Scenario 3:

Nothing works. This is a worst case scenario, but often the easiest to resolve, this important thing is to understand the error. It may be that you see something like an error 40, or 53, this is usually down to name resolution. So the first test is a simple ping of the server name, from a command prompt.

The above example shows that we have resolved the name back to an IP address and a data centre (useful for knowing which IP ranges you need). The timeout is fine and can be ignored, it is the name resolving we are looking for here.

Scenario 4:

Azure DB has its own firewall, usually from inside Azure this doesn’t come into play unless you have set ‘Don’t Allow Azure Services to connect’. These firewalls are at two levels, one at the server and one at the database. A troubleshooting step I often recommend is to open all addresses (0.0.0.0 – 255.255.255.255) that should wave everyone through. A useful feature if SSMS is that if you are admin and you attempt to connect to Azure DB and your IP address is not allowed, you will be asked if you want to add a rule to allow your IP or subnet to connect. Something like this…..

 

So you have the option to sign in and create a rule for either your IP address or subnet range.

Its worth remembering that you can set the Azure DB firewall for either the server or the database, details on the Azure DB firewall can be found here

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-firewall-configure

 

Scenario 5:

In SSMS you can get as far as connecting and an error such as 18456 is thrown up. This is caused by a number of things, indeed 18456 is traditionally a ‘catch all’ authentication issue. The state of the error can give a good idea of what is going on. You can see the list of states here :-

https://msdn.microsoft.com/en-us/library/cc645917.aspx

Often this will be down to a bad user name or password, or it may be that the Login has been created in Master but a user has not been created in the Master DB or the required user DB.

Scenario 6:

The above are all ‘hard’ failures, in so far that you will either connect or you won’t, this is ‘easy’ it is when you get connection issues that come and go that your work is harder.

As we know databases can scale in Azure DB, allowing for some quite large DBs to be hosted, but this creates an issue. Take for example an Azure DB server with several Premium tier databases, with 1000’s of connections occurring all OK I hear you say they are all Premium DBs what could go wrong. Well there is a problem, it may be that you are sending all of the authentication via Master DB, and at times of stress you can overload Master DB. This is not good at all, but there is a solution, which is Contained Databases, this is where the Master DB and the user DB users are separated. It means that there is much less load on the Master DB. There is another really important side effect of this, the Geo-replication feature allows DBs to replicate across regions, cos we all want our data to be safe, if you use contained DB then your logins/users get replicated too, and you don’t need to worry about moving users between master DBs (we call that a win). As an aside geo-repl now has a listener which can automate failover.

Now having covered these scenarios the one single bit of advice I can give is to ensure that you have re-try logic implemented within your application. There are a number of situations that will require your database to be taken off-line for short periods of time, if you get your retry logic in place most of these will simply never appear as a problem to your applications


Comments (0)

Skip to main content