SQL Azure and Entity Framework Connection Fault Handling

 

A number of customers are choosing to use Entity Framework with Microsoft SQL Azure, and rightfully so.  Entity Framework provides an abstraction above the database so that queries are developed using the business model or conceptual model and queries are executed using the richness of LINQ to Entities.  SQL Azure is a cloud-based database based upon SQL Server which provides reliability and elasticity for database applications.  And, the Entity Framework supports SQL Azure, both the designer and the provider runtime can be used against either a SQL Server database on premise or in the cloud.

But these same customers are describing connectivity drops, classified as general network issues, and are asking for the best practice to handle such network issues between SQL Azure instances and clients utilizing Entity Framework.  In fact, SQL Azure has a whole host of reasons to terminate a connection including but not exclusive to resource shortages and other transient conditions.  Similar issues apply when using ADO.NET; networks can be unreliable and are often suspect to sporadic fits resulting in dropped TCP connections.  There are a number of blog posts, such as 'SQL Azure: Connection Management in SQL Azure' and 'Best Practices for Handling Transient Conditions in SQL Azure Client Applications', which provide connection fault handling guidance and even a framework for creating retry policies with SQL Azure client. Neither article is fully comprehensive regarding the Entity Framework aspects.  The purpose of this blog posting is to fill in the details and discussion points for the many developer options to handle and recover from intermittent connectivity drops when using Entity Framework.

Background Information – Connection Pooling

It is well known that the creation and teardown of database connections to SQL Server is expensive, thus ADO.NET uses connection pools as an optimization strategy to limit the cost of the database connection creation and teardown operations.  The connection pool maintains a group of all physical connections to the database and when clients such as Entity Framework request a connection, ADO.NET provides the next available connection from the pool.  The opposite occurs when clients close a connection – the connection is put back into the pool.  What is less understood is that the connection pooler will remove connections from the pool only after an idle period OR after the pooler detects that the connection with the server has been severed.  But the pooler can only detect severed connections after an attempt has been made to issue a command against the server.  This means that clients such as Entity Framework could potentially draw a severed or invalid connection from the pool.  With high latency and volatile networks, this happens on a more frequent basis.  The invalid connections are removed from the pool only after the connection is closed.  Alternatively the client can flush all of the connections using ClearAllPools or ClearPool methods.  See MSDN article ‘SQL Server Connection Pooling' for a more verbose description of the concepts.

Background Information - Entity Framework Database Connection

The Entity Framework provider abstracts most if not all of the facets from executing a query against the backend store, from establishment of the connection, to the retrieval of the data and materialization of the POCO or EntityObjects.  Nevertheless it does provide access to the underlying store connection through the Connection property of the System.Data.Objects.ObjectContext .

The ObjectContext wraps the underlying System.Data.SqlClient.SqlConnection to the SQL Server database by using an instance of the System.Data.EntityClient.EntityConnection class.  The EntityConnection class exposes a read/write StoreConnection property which is essentially the underlying SqlConnection to the SQL Azure instance.  That mouthful simply says that we have a mechanism upon which to read the current state of a connection and assign the store connection if so desired.  Since we have access to the database connection, we can most certainly catch exceptions thrown when the network transport has failed and retry our operations per a given retry policy.

One subtlety which requires a touch of clarification, take the code snippet below. In this example the connection was explicitly opened on the context. The explicit opening of the connection is a way of informing EF not to open and reopen the connection on each command. Had I not opened the connection in this way, EF would implicitly open and close a database connection for each query within the scope of the context. We will leverage this knowledge in the connection retry scenarios that follow.

 using (AdventureWorksLTAZ2008R2Entities dc = new AdventureWorksLTAZ2008R2Entities())
{
    dc.Connection.Open();
    // ...
}

Case #1 – Retry Policies

Let’s take the following code as an example. How many connections are drawn from the pool? The answer is two, one for the retrieval of the customer and another for the retrieval of the address. This means that if implemented in this general way, Entity Framework will use a new connection for every query it executes against the backend data store given the scope of the current ObjectContext . It pulls a connection for the pool, submits the query against the database and closes the connection for each LINQ query. You can see this by running SQL Profiler on an instance of SQL Server as shown in the table below the query.

 using (AdventureWorksLTAZ2008R2Entities dc = new AdventureWorksLTAZ2008R2Entities())
{
    int cId = 29485;
    Customer c1 = (from x in dc.Customers
                   where x.CustomerID == cId
                   select x).First();

    Address ad1 = (from x in dc.Addresses
                   from y in dc.CustomerAddresses
                   where y.CustomerID == cId && x.AddressID == y.AddressID
                   select x).FirstOrDefault();
}

 

Audit Login

RPC:Completed

exec sp_executesql N'SELECT TOP (1) [Extent1].[CustomerID] AS [CustomerID] …

Audit Logout

RPC:Completed

exec sp_reset_connection

Audit Login

RPC:Completed

exec sp_executesql N'SELECT [Limit1].[AddressID] AS [AddressID], …

Audit Logout

But there is an interesting facet to the code above, the connection to the database was not explicitly created.  The EntityConnection was transparently created by the Entity Framework when the AdventureWorksLTAZ2008R2Entities ObjectContext was instantiated and the StoreConnection property or actual SqlConnection state is set to closed. Only after the LINQ query is executed by the EF provider is the connection opened and the query submitted to SQL Server.  The state of the inner connection changes to closed and the connection placed back into the pool once the results have been successfully retrieved.

In this example, we have two places in which a transient network error or invalid connection in the pool could affect the query and cause a System.Data.EntityException, for the retrieval of the customer and for the retrieval of the address.  The inner exception of the EntityException is of type System.Data.SqlClient.SqlException and it contains the actual SQL Server error code for the exception.

A policy can be applied to wrap the LINQ queries to catch the EntityException and retry the query given the particular metrics of the policy.  The retry policy in the code below utilizes the Transient Conditions Handling Framework described in the blog written by my teammate Valery. This blog provides a very comprehensive selection of retry policies which will properly handle the thrown exceptions. The basic principle is to support a number of retries with increasing periods of wait between each subsequent retry (i.e. a backoff algorithm).  In this case we abort the operation after 10 attempts with a wait periods of 100ms, 200ms… up to 1 second.

 using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling;
using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.SqlAzure;

 

 

 using (AdventureWorksLTAZ2008R2Entities dc = new AdventureWorksLTAZ2008R2Entities())
{
    int cId = 29485;
    int MaxRetries = 10;
    int DelayMS = 100;

    RetryPolicy policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(MaxRetries, TimeSpan.FromMilliseconds(DelayMS));

    Customer c1 = policy.ExecuteAction<Customer>(() => 
                  (from x in dc.Customers
                   where x.CustomerID == cId
                   select x).First());

    Address ad1 = policy.ExecuteAction<Address>(() =>
                  (from x in dc.Addresses
                   from y in dc.CustomerAddresses
                   where y.CustomerID == cId && x.AddressID == y.AddressID
                   select x).FirstOrDefault());
}

This retry policy approach is not without its challenges, particularly in regards to the developer experience.  The Func delegates require that a type be passed in, something that is somewhat cumbersome when using anonymous types because the return type must be set to an object type.  This means that the developer must cast the object to the anonymous type to make use of it, a sample of which is shown below.  I created a CastHelper class for that purpose.

 public static class CastHelper
{
    public static T Cast<T>(object obj, T type)
    {
        return (T)obj;     
    }
}
 var c1 = policy.ExecuteAction<object>(() =>
              (from x in dc.Customers
               where x.CustomerID == cId
               select new { x.CustomerID, x.FirstName, x.LastName }).First());

var anon = CastHelper.Cast(c1, new { CustomerID = -1, FirstName = "", LastName = "" });

Case #2 – Retry Policy With Transaction Scope

Case #2 expands the previous case to include a transaction scope.  The dynamics of the sample changes because a System.Transactions.TransactionScope object is used to ensure data consistency of all scoped queries using an ambient transaction context which is automatically managed for the developer. But from the trace shown below we still observe the typical pattern of Audit Logout events with the SQL Transaction Begin and End events.  This begs the question:  How can we have a local transaction (non-distributed) if our data access code spans multiple connections?   Well, in this case we are not truly spanning multiple physical connections because every connection drawn from pool is based upon the proper connection string and, for those enlisted in a transaction, the thread context.  We can imagine the connection pool subdivided into subdivisions based upon individual transaction contexts and the connection string.  In this way, pulling a connection from the pool guarantees that we are not attempting to enlist in a distributed transaction.

The net result:  We cannot implement our retry policies at the query or SaveChanges level and still maintain the ACID properties of the transaction.  The retry policies must be implemented against the entire transactional scope as shown below.  Note that if you do attempt to place retry logic against the individual queries and a network glitch occurs, be assured than an EntityException will be thrown having an inner SqlException with a message of “MSDTC on Server “xxx” is unavailable”, SQL Azure does not support distributed transactions.  Nevertheless, this is not SQL Azure specific problem. The error is non-recoverable at the query statement level; it is all or nothing with transactions.

 using (AdventureWorksLTAZ2008R2Entities dc = new AdventureWorksLTAZ2008R2Entities())
{
    int MaxRetries = 10;
    int DelayMS = 100;

    RetryPolicy policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(MaxRetries, TimeSpan.FromMilliseconds(DelayMS));
    TransactionOptions tso = new TransactionOptions();
    tso.IsolationLevel = IsolationLevel.ReadCommitted;
    policy.ExecuteAction(() =>
    {
        using (TransactionScope ts = new TransactionScope(TransactionScopeOption.Required, tso))
        {
            int cId = 29485;

            Customer c1 = (from x in dc.Customers
                           where x.CustomerID == cId
                           select x).First();

            Address ad1 = (from x in dc.Addresses
                           from y in dc.CustomerAddresses
                           where y.CustomerID == cId && x.AddressID == y.AddressID
                           select x).FirstOrDefault();

            string firstName = c1.FirstName;
            c1.FirstName = c1.LastName;
            c1.LastName = firstName;

            dc.SaveChanges();

            string addressLine1 = ad1.AddressLine1;
            ad1.AddressLine1 = ad1.AddressLine2 == null ? "dummy data" : ad1.AddressLine2;
            ad1.AddressLine2 = addressLine1;
            dc.SaveChanges();

            ts.Complete();
        }
    });
}
SQL Transaction

0 – Begin

RPC:Completed

exec sp_executesql N'SELECT TOP (1) [Extent1].[CustomerID] AS [CustomerID], …

Audit Logout
RPC:Completed exec sp_reset_connection
RPC:Completed

exec sp_executesql N'SELECT [Limit1].[AddressID] AS [AddressID], …

Audit Logout
RPC:Completed exec sp_reset_connection
RPC:Completed

exec sp_executesql N'update [SalesLT].[Customer] set [FirstName] = @0, [LastName] = @1 …

Audit Logout
RPC:Completed exec sp_reset_connection
RPC:Completed

exec sp_executesql N'update [SalesLT].[Address] set [AddressLine1] = @0 …

Audit Logout
RPC:Completed exec sp_reset_connection
SQL Transaction 1 – End
Audit Logout

Case #3 – Implement Retry Policy in OnContextCreated

If your scenario is such that the EF queries return very quickly, which is the most typical pattern, then it probably suffices to employ a retry policy solely at the connection level. If other words, only retry when a connection fails to open. The premise is that if the client application acquires a valid connection, no other network related errors will occur while during the execution of my queries. If an error does occur, the application can catch the exception and resubmit as it would in a traditional non-cloud based implementation. Besides, this tactic offers the least invasive solution because as you will from the code sample below, we only have to implement the retry policy in one spot. The remainder of the EF code works the same as if it was executing against an on premise SQL Server.

Remember earlier we stated that the ‘validity’ of a connection is only determined after a command is issued against the server. So conceivably to ensure a valid connection, one must attempt to open a connection and submit a command to the database. If an exception is thrown, the connection is close and thus removed from the pool. There is an associated overhead to opening and submitting a command just to check the validity of a connection and for this reason the ADO.NET connection pool elects not to perform this on behalf of the client application.

The tactic is to implement the OnContextCreated partial method of the models context which is called each time a new context is instantiated. In this partial method, employ a retry policy which opens a connection, submits a dummy query and handles exceptions with proper closure of invalid connections. In this way, the pool is ‘cleansed’ of all connections that have disconnected due to network glitches or idle expirations. The tradeoffs are obvious, the additional round trip to the SQL Azure database and the possible delay while the OnContextCreated method closes invalid connections.

 partial void OnContextCreated()
{
    int MaxRetries = 10;
    int DelayMS = 100;

    RetryPolicy policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(MaxRetries, TimeSpan.FromMilliseconds(DelayMS));
    policy.ExecuteAction(() =>
    {
        try
        {
            string ss = Connection.ConnectionString;
            Connection.Open();
            var storeConnection = (SqlConnection)((EntityConnection)Connection).StoreConnection;
            new SqlCommand("declare @i int", storeConnection).ExecuteNonQuery();
            // throw new ApplicationException("Test only");
        }
        catch (Exception e)
        {
            Connection.Close();
            throw e;
        }
    }
    );
}
RPC:Completed exec sp_executesql N'declare @i int; set @i = @ix',N'@ix int',@ix=1
RPC:Completed

exec sp_executesql N'SELECT TOP (1) [Extent1].[CustomerID] AS [CustomerID], …

RPC:Completed

exec sp_executesql N'SELECT [Limit1].[AddressID] AS [AddressID], …

Audit Logout  

The sample above submits a lightweight batch to the database, the only overhead to the database is the time to execution of the RPC statement.  All looks good, but there is one important fact that may catch you.  From the profiler trace we observe that the same connection is used for each query database query. This is by design and as discussed early, i.e. when a connection is explicitly opened by the developer it tells EF not to open/reopen a connection for each command. The series of Audit Login/Logout events to retrieve the customer entity or address entity are not submitted as we saw in Case #1 and #2. This means we cannot implement a retry policy for each individual query like I showed earlier. Since the EntityConnection has been assigned to the ObjectContext , EF takes the position that you really truly want to use one connection for all of your queries within the scope of that context. Retrying a query on an invalid or closed connection can never work, a System.Data.EntityCommandExecutionException will be thrown with an inner SqlException contains the message for the error.

Conclusion

Three cases were examined. Retry policies with/without a transaction scope and an implementation of OnContextCreated. The first two cases apply if you wish to introduce retry policies on all of your queries, in or out of a TransactionScope . You may find justification in this approach if your EF queries are apt to become throttled by SQL Azure which manifests itself during execution. The programming paradigms while not overly complex do require more work on the part of the developer. Applying a retry policy in the OnContextCreated method provides slimmer coverage but offers the best bang for the effort. In the majority of cases, network related exceptions tend to reveal themselves just after the connection is retrieved from the pool, thus a policy implemented right after the connection is opened should prove sufficient for those cases where database queries are expected to return quickly.

 Authored By: James Podgorski
Review By: Valery Mizonov, Mark Simms, Faisal Mohamood