In Part 1 and Part 2 of this series we explored an approach to getting your Client Hyper-V environments up and running quickly on a Windows 8 developer workstation. We also reviewed basic networking fundamentals to create virtual datacenters by segmenting our virtual machines across different subnets as well showing a means of introducing latency to simulate geographically disperse datacenters.
With this post we will explore setting up a SQL Server 2012 “AlwaysOn” Availability Group (AG) implemented as a multi-site (subnet) Windows Server Failover Cluster (WSFC) and look at how the developer can leverage a feature of SQL Server 2012 called “Read-Only Routing” that will allow SQL Server clients to transparently direct read-only and read-write workloads to different SQL Server AG replicas. We will set this up without any explicit code using our virtual infrastructure and a sprinkle of ADO.NET connection string configuration (Java based client applications are supported as well via the Microsoft JDBC Driver 4.0). Furthermore we will simulate what happens to our client application when we perform a cross-datacenter failover of our AG database. Let’s get started!
Setting Up the Virtual SQL Server 2012 “Always On” Geo-Cluster
In our simulation we will have three datacenters virtually located across Ohio, Iowa, and Texas. Each datacenter will have an instance of SQL Server 2012 that will be part of an AG containing a single AdventureWorks sample database. Initially Ohio will host the primary database replica with a synchronous replica located in Iowa and an asynchronous replica in Texas. In real life this setup would allow for cross datacenter (multi subnet) failovers between Ohio and Iowa in real time with a potential failover to Texas with some additional decisions to be made. One could imagine other uses for this topology such as using the Texas site for reporting or backup purposes without needing to impact the primary or secondary replica’s in Ohio and Iowa. The application may have a requirement to run “hot / hot” out of more than one geography at a time. There are lots of options “where to put what” which is outside of the scope of this posting so we’ll keep things simple for this discussion as described previously. If you are curious to see how folks are using “AlwaysOn” with their applications take a look here.
Also of note is a Windows 8 OHIOCLIENT machine will host the ADO.NET client application that will be connecting to our multi-site SQL Server 2012 AG containing the AdventureWorks2012 database. I also installed Visual Studio 2012 Express using the wonderful Web Platform Installer to build our ADO.NET WPF client application we’ll be using shortly. The IP addresses I used for my setup are detailed in the table below.
|Server||Machine IP||Cluster Admin IP||SQL AG Listener IP||DNS||Gateway|
DHCP for NAT
DHCP for NAT
To add a degree of realism to our setup we are going to introduce some latency between the OHIO, IOWA, and TEXAS servers.
The process of building the initial multi site three member node majority Windows Server Failover Cluster (WSFC) for SQL Server 2012 “AlwaysOn” is a somewhat involved process. In a nutshell the process entails first building the multi-site (subnet) WSFC and then installing standalone SQL Server 2012 instances one by one on each node of the WSFC (this is quite different than using older traditional SQL Server Failover Cluster Instances or FCIs). In this setup we need all three nodes up and running for cluster “quorum” although we the cluster will stay up with only two nodes but we won’t be able to make further changes to the AG until the third node becomes available (it is technically possibly to force the cluster to come online with a single node as well in an emergency). A good overview of quorum modes and voting configuration as it relates to SQL Server 2012 “AlwaysOn” is available here. Below is a screen clipping of our simulation environment and its cluster configuration.
So what do we need all these IP’s for in the table above? The Cluster Admin IP is used by the Failover Cluster Manager applet. In our case only one of the three *.1.1.10 addresses will be online (active) at a time. The cluster node who’s IP is online “owns” the administration of the cluster. Just like our SQL Server database the cluster admin service can move around from node to node and datacenter to datacenter.
The SQL AG Listener IPs are used to implement the Virtual Network Name (VNN) for the SQL Server AG. For developers familiar with web farm topologies the VNN is similar to a VIP for a DNS load balanced web application. Each node has the potential to host the SQL Server VNN listener at a time and as such one of the three nodes will have this IP address online at a time. This is how clients will connect to SQL Server via the SQL AG Listener IP and not directly to the node\instance of SQL Server as is traditional with a standalone SQL Server instance or a SQL FCI. We are going to jump ahead a bit to illustrate and emphasize this important point with the screen clipping below from our simulations Failover Cluster Manager details for the AG we are building.
The screen clipping above shows that for our topology the OHIO instance @ 10.1.1.15 is at present where the AdventureWorks database in our AG is hosted (or the SQL instance hosting the database put differently). This doesn’t mean that the other SQL Servers instances are offline they are all still up and running but the primary replica is running on the OHIO instance and clients would connect into the VNN (in our case the DNS is AWAGListener.corpnet.contoso.com) that at present is mapped in DNS to 10.1.1.15.
In the interest of saving space in this blog the act of setting up the cluster and installing SQL Server 2012 has been documented well in many other blog postings. I used the same techniques described in this three part series to build out a “majority node set” WSFC entitled “SQL 2012 Failover Cluster Build (Step by Step)”. This guide includes the necessary steps for building out the cluster, enabling “AlwaysOn” within the SQL Server product, and establishing the initial AG via a setup wizard. If folks have questions about the steps specified to my setup above just post a comment and I will link a document of screenshots with all the gory details.
So in our simulation we are building a SQL Server 2012 AG called “AWAGListener” which is my off-the-cuff naming convention for “AdventureWorks Availability Group Listenter”. The careful reader has probably noticed the use of the word “group” in the term “Availability Group” which means that we could have more than one database contained within our AG if we wanted too. In our case we just have a group of one.
One infrastructure step that is not covered in this blog series that is specific to multi-site (subnet) clusters. It has to do with HostRecordTTL and stale DNS that may occur for clients after a cross datacenter failover. There is a good discussion of this important but relatively unknown setting here and a screenshot below where I set the TTL to 15 seconds (rather low) in my simulation. In real life this would be a key discussion point with your network team as well as the SLA for clients to pick up the DNS change for the AG listener after a failover. The SLA discussion question is simple: how long can remote client applications potentially be unaware of the “AlwaysOn” listener DNS change?
So what does this infrastructure look like to SQL Server? A good place to start is the Availability Group Dashboard available from a right click on the AG’s listed in the SQL Server Management Studio.
What we see above is that we have three instances of SQL Server 2012 each with a replica of our AdventureWorks2012 database. At present the primary replica is hosted on the OHIO SQL instance with replicas on both the IOWA and TEXAS instances. With the above setup if needed we could failover to the IOWA instance immediately with no data loss or manually failover to the Texas replica and either allow the databases to sync or failover with the potential for data loss. Next let’s zoom in on the properties of the AWAGListener AG.
The properties for the replicas are detailed above. In this case we are allowing either the IOWA or OHIO servers to accept both read and write application intent workloads while we are explicitly allowing read-intent only connections to the TEXAS replica. Application intent is a new and important concept for the developer to understand.
- When the ApplicationIntent property is set to ReadOnly, the client requests a read access operation when it connects to an AlwaysOn-enabled database. The server will enforce the read-only access operation both at connection time, and during a USE database statement, but only to an AlwaysOn-enabled database
- When the ApplicationIntent property is set to ReadWrite, the client requests a regular (read-write) access operation when it connects to an AlwaysOn-enabled database. The default setting for the ApplicationIntent property is ReadWrite.
What is our experience now when we connect via a client application to the servers individually as well the as the AG?
Interesting eh? It appears as if we are connected to four SQL Server instances – but we are connected to three in reality. The AWAGListener is VNN that at present is living on the OHIO server per the AG dashboard above. A quick glance over to netstat proves this to be the case as even though SSMS is connected to both the AWAGListener and OHIO instance both are connected to the 10.1.1.15 IP address which is where the primary replica for our AG is currently running.
To show that our read-only intent setting for the TEXAS replica is enforced by SQL Server, let’s attempt to connect from SSMS to the AdventureWorks2012 database read-intent replica on the TEXAS instance.
Denied! As expected the connecting string that is being used by SSMS is not expressing our application intent of read-only and as such we are denied the ability to connect.
Our last infrastructure setup task is to setup “Read Only Routing” for our AG. This feature will enable client applications with read-only workload application intent to be routed to a secondary replica in our AG. In our setup we will direct read-write workloads to the OHIO server and read-only workloads to the IOWA server. You will see shortly that we can transparently swap the roles as well as remove entirely one server from the mix directly both workloads to a single replica – all transparent to the application (well mostly). Defined via KB 2654347:
- Read-only routing is a feature that can guarantee the availability of a read-only replica of a database.
To enable read-only routing:
- You must connect to an Always On Availability Group availability group listener.
- The ApplicationIntent connection string keyword must be set to ReadOnly.
- The Availability Group must be configured by the database administrator to enable read-only routing.
First we must compute the read only routing URL for both the OHIO and IOWA servers. Using an excellent post from Matt Neerincx’s blog we can calculate the URLs with a TSQL script. An example is shown below for the output from the OHIO server.
Once we have the URL’s we need to construct a small TSQL script to enable read only routing. See the example here and simply plug in the Read-only Routing URL’s along with the name of your AG’s.
We shift focus now to a simple WPF application that will demonstrate what the application client experience is for using our AWAGListener AG and Read-only routing. The WPF application contains three DispatcherTimers that perform the following tasks at the click of a button and continue to do so until the click of a different button.
- Directs a read-write workload to the AWAGListener using connection string:
Server=AWAGListener;Initial Catalog=AdventureWorks2012;Integrated Security=True;ApplicationIntent=ReadWrite;MultiSubnetFailover=True;Pooling=False
- Directs a read-only workload to the AWAGListener using connection string:
Server=AWAGListener;Initial Catalog=AdventureWorks2012;Integrated Security=True;ApplicationIntent=ReadOnly;MultiSubnetFailover=True;Pooling=False
- Directs a read-only workload to the TEXAS secondary replica specifically using application intent:
Server=Texas;Initial Catalog=AdventureWorks2012;Integrated Security=True;ApplicationIntent=ReadOnly;MultiSubnetFailover=True;Pooling=False
Let’s take a closer look at these connection strings. The first two are nearly identical as they direct connections to the VNN AWAGListerner. The only difference is the ApplicationIntent keyword which as expected is expressing our intent to perform read-only or read-write operations. The last connection string bypasses the AG VNN and connects directly to TEXAS server and the AdventureWorks2012 read only replica. Unlike the poor SSMS client earlier our app will be allowed to connect since we specified our intent. The MultiSubnetFailover keyword allows for exactly what it says again described in KB 2654347:
- MultiSubnetFailover (including support for the named instance)
Always specify MultiSubnetFailover=True when you connect to a SQL Server 2012 availability group listener, or to a SQL Server 2012 failover cluster availability group listener. MultiSubnetFailover enables faster failover for all availability groups and all failover cluster instances in SQL Server 2012, and will significantly reduce failover time for single-subnet and multi-subnet AlwaysOn topologies. During a multi-subnet failover, the client will try connections in parallel.
So why the Pooling=False keyword? Because this is a demo. :-) Kidding aside I want the connection failures to occur immediately for the scenarios that will follow and if I was using pooled connections I would have to capture errors and retry until all of the pooled connections had reestablished to the new primary replica and new secondary read-only replica. So the fine print here says don’t do this for your production applications! Instead as is often the case in software development, employ someone else’s hard work like the Transient Fault Handling Application Block to retry SQL batches or just roll your own retry logic.
So without further adieu let’s start up our application!
The ADO.NET connections in our application are being routed as expected to different replicas based on their application intent. Our read-write workload that is inserting rows to a table is running against the OHIO replica, our read-only workload that is selecting the count from the inserted table is running against the IOWA replica, and we are running a tertiary read-only query against the TEXAS replica. The count is lagging a bit with TEXAS most likely due to the fact that we are using timers to poll as well as the ~100ms of simulated WAN latency from our OHIOCLIENT 10.x subnet to the TEXAS 30.x subnet.
Now’s let’s use the AG dashboard earlier to failover our AG from OHIO to IOWA while the app is running and see what happens.
Cool so we successfully failed over and our application connections have swapped roles! We have done so across our virtual datacenters. The screen clipping above shows that read-write workload is going against the IOWA replica now and the read-only workload is going against the OHIO replica as expected. For ADO.NET clients the new connection experience is seamless, while existing SQL Server connections a SqlException would be generated and the client could retry and would be successful on the next new connection.
It’s important to note that the client application can and most likely will get transient errors during the AG failover. An AG failover is many times faster than a traditional SQL Server FCI failover but it’s not instantaneous. Some examples of transient errors for read-write and read-only workloads that may be seen during the failover are listed below (don’t you love error messages that tell you exactly what to do next?)
Unable to access database ‘AdventureWorks2012’ because its replica role is RESOLVING which does not allow connections. Try the operation again later.
Unable to access the ‘AdventureWorks2012’ database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.
Next let’s unexpectedly turn off the OHIO replica to simulate loss of network connectivity to a datacenter. Effectively we are going to eliminate our current read-only replica while the application continues to run.
Great! So we powered off the OHIO instance and our read-only workload moved to the only available replica for the AWAGListener which is now on the IOWA instance. We have both our workloads going to a single replica. So let’s power OHIO back on and see what happens again while the application continues to run.
Do you feel like you are watching an infomercial yet for an amazing washcloth at 1am in the morning? We are once again fully online and our application has redirected the read-only workload back to the OHIO replica. The features provided by SQL Server 2012 “AlwaysOn” in conjunction with “Read-only routing” allow us to build client applications that support cross datacenter failover scenarios as well as allowing us to partition our read-only and read-write workloads to separate replicas running on a single replica if necessary. All without writing any explicit code or data dependent routing logic in our application.
Its my hope that this posting exposed the ADO.NET or Java client application developers to a set of powerful features within SQL Server 2012 for providing near seamless high availability and workload partitioning capabilities into your applications.
In Part 4 of this series we will look at building a Windows Server AppFabric Caching v1.1 cluster and examine an application that uses a “Read Through Write Back” provider. We will take a look at using the HA features of both SQL Server and AppFabric Caching to build a highly available caching cluster all within Hyper-V on our developer workstation.
Stay tuned and thanks for reading.