Editor’s note: The following post was written by Cluster MVP David Bermingham. David will be writing a multi-part series that goes more in depth on this popular topic.
SQL Server High Availability in the Azure Cloud Part 1: The Basics
Thinking about deploying SQL Server in Windows Azure IaaS? If so, you will want to follow this series of articles, which will explain everything you need to know about keeping SQL Server up and running in Windows Azure. This first article introduces some basic Azure concepts that provide the framework for high availability. Subsequent articles will dive deeper into the actual implementation of AlwaysOn Availability Groups and Failover Cluster Instances, and the requirements for getting them working in Azure.
Before we get started, let me make it clear that we will be discussing SQL Server deployed in Azure IaaS. We are not talking about Azure SQL Database, the database-as-a-service offering in Azure. Azure IaaS allows you to deploy and manage your own VMs and SQL Server implementations much as you would in your own data center. You are responsible for the configuration, maintenance and ongoing management of SQL Server.
Part of that responsibility is planning for high availability and disaster recovery in your implementation. You may assume that simply deploying your SQL Server instances in Azure automatically gives you high availability, but you would be mistaken. If you read the Azure service level agreement (SLA), you will find this statement:
“For Cloud Services, we guarantee that when you deploy two or more role instances in different fault and upgrade domains, your Internet facing roles will have external connectivity at least 99.95% of the time.”
You probably have a lot of questions after reading that statement. What is a Fault Domain? What is an Upgrade Domain? How can I deploy two instances of SQL Server to take advantage of the 99.95% “external connectivity” guarantee?
Let’s break it down and tackle each question one at a time.
(For more information about Azure SLA, see the Microsoft article "Microsoft Azure SLA.")
Fault Domains
Essentially, a Fault Domain is a section of the Azure platform that shares no common single point of failure with any other Fault Domain in the same geographic region of the Azure cloud. Microsoft defines a Fault Domain as a “rack of computers”. To qualify for the SLA, you need at least two VMs running in different Fault Domains.
For web servers or application servers, this is straightforward: put two VMs up, load balance between them, and you are done. SQL Server instances involve a little more work. You can’t simply load balance between two instances of SQL Server; you will need to implement AlwaysOn Availability Groups, or Failover Cluster Instances with third-party replication software, to keep the databases in sync and provide failover capability across the two Fault Domains.
Deploying AlwaysOn in Azure has a few requirements that are unique to Azure. Later in this series we will discuss those requirements in detail, including Failover Cluster limitations, Internal Load Balancers and Client Listeners.
Upgrade Domains
While Fault Domains provide availability during unplanned downtime caused by hardware failures, Upgrade Domains let you manage planned downtime caused by Microsoft’s maintenance of the Azure platform itself. Although most Azure platform maintenance can be done without affecting the availability of a VM, some maintenance requires rebooting your VM.
By placing each of your SQL Server AlwaysOn instances in a different Upgrade Domain, you can be sure that if your primary server goes offline during a maintenance period, your secondary server will take over as the active server, minimizing the downtime associated with planned maintenance. We can be sure of this because Microsoft only ever performs maintenance on one Upgrade Domain at a time.
Later in this series we will discuss the specifics of how to put VMs into a Fault Domain and Upgrade Domain, but for now, know that to facilitate this process you must put both instances of SQL Server in what Azure calls an Availability Set.
(For more information about Fault Domains and Upgrade Domains, see the Microsoft article "Manage the availability of virtual machines.")
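To make the placement rules above concrete, here is a small Python model of an Availability Set. It is a sketch under stated assumptions, not how Azure actually allocates VMs: it assumes round-robin assignment in creation order and the classic defaults of 2 Fault Domains and 5 Upgrade Domains, and the names place, survives_rack_failure and survives_rolling_maintenance are hypothetical helpers invented for illustration. It checks the two properties described above: losing any one Fault Domain still leaves a SQL Server VM running, and maintaining one Upgrade Domain at a time never takes every VM offline.

```python
# Simplified, hypothetical model of VM placement in an Azure Availability Set.
# ASSUMPTIONS (not from the article): the platform assigns Fault Domains (FDs)
# and Upgrade Domains (UDs) round-robin in creation order, using the classic
# defaults of 2 FDs and 5 UDs. Real placement is decided by the Azure fabric.
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    vm: str
    fault_domain: int
    upgrade_domain: int


def place(vms, fault_domains=2, upgrade_domains=5):
    """Assign each VM an FD and a UD round-robin, in creation order."""
    return [
        Placement(vm, i % fault_domains, i % upgrade_domains)
        for i, vm in enumerate(vms)
    ]


def survives_rack_failure(placements):
    """True if losing any single Fault Domain still leaves a VM running."""
    for fd in {p.fault_domain for p in placements}:
        if all(p.fault_domain == fd for p in placements):
            return False  # every VM shares this one rack
    return bool(placements)


def survives_rolling_maintenance(placements):
    """Reboot one Upgrade Domain at a time; True if some VM is always up."""
    for ud in {p.upgrade_domain for p in placements}:
        if all(p.upgrade_domain == ud for p in placements):
            return False  # maintaining this UD takes every VM offline
    return bool(placements)


if __name__ == "__main__":
    # Two SQL Server VMs in one Availability Set land in different FDs and UDs.
    sql_vms = place(["sql1", "sql2"])
    print(sql_vms)
    print(survives_rack_failure(sql_vms), survives_rolling_maintenance(sql_vms))  # True True

    # Two VMs sharing a rack and an Upgrade Domain protect against neither.
    same = [Placement("sql1", 0, 0), Placement("sql2", 0, 0)]
    print(survives_rack_failure(same), survives_rolling_maintenance(same))  # False False
```

The second case is what you risk when two SQL Server VMs are deployed outside an Availability Set: nothing guarantees they avoid sharing a rack or an Upgrade Domain.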
99.95% External Connectivity Guarantee
What does the 99.95% External Connectivity Guarantee mean? In the detailed Service Level Agreement, Microsoft essentially says that if availability falls below 99.95% during a monthly billing cycle, you are entitled to a 10% credit on your bill, provided you submit your claim within two months of the close of that billing period. In practical terms, if all of your VMs in an Availability Set are unavailable for more than about 21 minutes in a month, you are entitled to a 10% Azure credit. The SLA also states that if availability falls below 99.9% (about 43 minutes of downtime), you get a 25% Azure credit.
However, this is only an external connectivity guarantee; it does not guarantee that SQL Server is up and running. To provide true high availability for SQL Server, you will need to implement an AlwaysOn Availability Group or a Failover Cluster Instance, which can detect and recover from application-level failures.
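The downtime figures above follow from simple arithmetic on the minutes in a billing month. The sketch below assumes a 30-day month and encodes the 10% and 25% credit tiers exactly as summarized here; the helper names are hypothetical, and the current Azure SLA text remains the authority on actual terms.

```python
# Back-of-the-envelope SLA arithmetic for a monthly uptime guarantee.
# ASSUMPTIONS: a 30-day month by default, and the 10% / 25% credit tiers as
# summarized in this article. Check the current Azure SLA text before relying
# on these numbers; the helper names here are hypothetical.

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_pct, days_in_month=30):
    """Downtime a month can absorb before uptime drops below uptime_pct."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return total_minutes * (1 - uptime_pct / 100)


def monthly_uptime_pct(downtime_minutes, days_in_month=30):
    """Uptime percentage for a month with the given minutes of downtime."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return 100 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_pct(uptime_pct):
    """Credit tier: below 99.9% -> 25%, below 99.95% -> 10%, otherwise none."""
    if uptime_pct < 99.9:
        return 25
    if uptime_pct < 99.95:
        return 10
    return 0


if __name__ == "__main__":
    print(round(allowed_downtime_minutes(99.95), 1))  # 21.6
    print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
    # 30 minutes of downtime in a 30-day month is about 99.93% uptime.
    print(service_credit_pct(monthly_uptime_pct(30)))  # 10
```

A 31-day month stretches the 99.95% threshold to about 22.3 minutes, which is why the figures quoted above are approximate.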
While Microsoft gives you the tools and framework to provide high availability within the Azure cloud, it is still incumbent upon the administrator to put the pieces together to ensure availability. Over the next few articles in this series we will take a deeper look at how to put those pieces together to build a highly available SQL Server deployment within the Azure cloud. Later in the series we will explore hybrid cloud options that give you not only high availability within the Azure cloud, but also a “Plan B” should Azure itself experience an outage that spans multiple Fault Domains or geographic regions.
Windows Azure IaaS is a powerful platform for deploying business-critical applications. All of the tools required to build a highly available infrastructure are in place. Knowing how to leverage those tools, especially with regard to providing high availability for SQL Server, can take a little research and trial and error. I hope this article has pointed you in the right direction and reduced the amount of research and trial and error you will have to do on your own. As with most cloud services, new features become available very rapidly, and the guidance in this article may just as rapidly become outdated, or in some cases even wrong. For the latest guidance, please refer to my blog, Clustering for Mere Mortals, where I will attempt to update guidance as things in Azure evolve.
About the author
David Bermingham is recognized within the technology community as a high availability expert and has been honored by his peers by being elected a Microsoft MVP in Clustering since 2010. David’s work as Director of Technical Evangelism at SIOS has him focused on evangelizing Microsoft high availability and disaster recovery solutions as well as providing hands-on support, training and professional services for cluster implementations. David holds numerous technical certifications and draws from over twenty years of experience in IT, including work in the finance, healthcare and education fields, to help organizations design solutions that meet their high availability and disaster recovery needs. David has recently begun speaking on deploying highly available SQL Servers in the Azure Cloud and deploying Azure Hybrid Cloud for disaster recovery.
About MVP Monday
The MVP Monday Series is created by Melissa Travers. In this series we work to provide readers with a guest post from an MVP every Monday. Melissa is a Community Program Manager, formerly known as MVP Lead, for Messaging and Collaboration (Exchange, Lync, Office 365 and SharePoint) and Microsoft Dynamics in the US. She began her career at Microsoft as an Exchange Support Engineer and has been working with the technical community in some capacity for almost a decade. In her spare time she enjoys going to the gym, shopping for handbags, watching period and fantasy dramas, and spending time with her children and miniature Dachshund. Melissa lives in North Carolina and works out of the Microsoft Charlotte office.