Why Azure Availability Zones


Working with several Microsoft ISVs around the world, I have heard loud and clear about an important capability missing from Azure: there was no way to be protected from the failure of an entire datacenter while maintaining synchronous state replication for an application or service. The only option was to deploy in a different region, hundreds of miles apart, with obviously no way to achieve zero RPO and synchronous data alignment. Before introducing new concepts, let me recap, with this simple picture, what regions and geographies are in the Azure infrastructure hierarchy:

To fill this gap, Microsoft recently announced the public preview of Azure Availability Zones (AZ):

Introducing Azure Availability Zones for resiliency and high availability

https://azure.microsoft.com/en-us/blog/introducing-azure-availability-zones-for-resiliency-and-high-availability

Availability Zones (AZ) are fault-isolated locations within an Azure region, each providing redundant power, cooling, and networking. AZs allow customers to run mission-critical applications with high availability and fault tolerance to datacenter failures. Customers can now deploy VMs, single-zone or multi-zone VMSS, Managed Disks, Public IPs, and Load Balancers, initially in East US 2 and West Europe, with 3 AZs each. More regions with 3-AZ support will come soon. It is worth noting that some other cloud providers with similar functionality do not offer such a high number of zones. With the addition of AZ, Microsoft can now present a unique combination of high-availability scenarios and HA SLAs:

  • 99.9% on single-instance VMs with Premium Storage, for an easier lift and shift; no other cloud provider offers this so far.
  • 99.95% VM uptime SLA for Availability Sets (AS), to protect against failures within a datacenter.
  • 99.99% VM uptime SLA through Availability Zones, with protection from fire, power, and cooling disruptions.

Availability Zones and Availability Sets cannot be used together: when creating a Virtual Machine (VM), you must specify either an AS or an AZ assignment; you cannot do both.

The official Azure Virtual Machine HA SLA is available here, and will be updated with AZ once the feature reaches general availability. Additionally, Azure paired regions, within data residency boundaries but hundreds of miles apart, can protect from larger-scale events that may impact an entire region. To participate in the Availability Zones public preview, self-service enablement is available per subscription, using this link.

Distance and Latency between Availability Zones

Officially, Microsoft does not publish the exact distance between different AZs, but an interesting slide from a recent Microsoft public event presentation gives an idea. The picture below refers to the France region, which will soon have 3 AZs.

Since the goal here is to serve and support applications that need synchronous data replication, we can reasonably guess that inter-zone latency will be around 1.5-2.0 ms, which in my opinion is the maximum tolerance. You can run your own tests, but it is better to wait for the general availability release of AZ. Given this latency, and considering the speed of light, you can do your own math; what you will obtain, however, is the maximum possible distance, not the real one. In addition to latency, there are other factors to consider: for example, geographical characteristics and local risks (proximity to rivers or tornado zones), and the availability, at that location, of sufficient infrastructure services to power datacenters. Two additional considerations about AZs: how many datacenters are in each one, and what is the latency inside an AZ? Each AZ must obviously contain at least one datacenter, but there will probably be more than one. Based on a recent Mark Russinovich talk, an AZ defines a network boundary where internal latency is no more than 0.6 ms:
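To make the "do your own math" concrete, here is a small back-of-the-envelope sketch. The 2.0 ms round-trip figure and the two-thirds-of-c propagation speed in fiber are assumptions, not official numbers; the result is only an upper bound on distance, ignoring routing and switching overhead:

```python
# Upper bound on inter-zone distance from an assumed round-trip latency.
C_VACUUM_KM_PER_MS = 299_792.458 / 1000  # speed of light, ~299.8 km per ms
FIBER_FACTOR = 2 / 3                     # light in fiber travels at roughly 2/3 of c

def max_distance_km(rtt_ms: float) -> float:
    """Maximum one-way fiber distance consistent with the given round-trip time."""
    one_way_ms = rtt_ms / 2
    return one_way_ms * C_VACUUM_KM_PER_MS * FIBER_FACTOR

# An assumed 2.0 ms RTT caps the zone separation at roughly 200 km of fiber.
print(round(max_distance_km(2.0)))
```

In practice the real distance will be noticeably shorter, since part of the latency budget is spent in equipment rather than propagation.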

This picture also reveals an important additional detail: in different subscriptions, even under the same Azure Active Directory (AAD) tenant, the mapping between logical AZs and physical AZs may differ. Customers and users can select "Zone1", "Zone2" or "Zone3", but these labels are logical, and Microsoft may map them differently for different subscriptions, presumably, I would guess, to re-balance resource allocation between AZs. I have seen some other cloud providers do the same. An API to allow shared mapping across subscriptions is work in progress.
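The logical-versus-physical distinction can be illustrated with a toy example. The physical zone names below are entirely made up; the point is only that two subscriptions agreeing on the label "Zone1" are not guaranteed to land in the same physical facility:

```python
# Purely illustrative: per-subscription logical-to-physical zone mapping.
# All names are hypothetical; Azure does not expose physical zone identifiers.
mapping = {
    "subscription-A": {"Zone1": "phys-az-2", "Zone2": "phys-az-3", "Zone3": "phys-az-1"},
    "subscription-B": {"Zone1": "phys-az-1", "Zone2": "phys-az-2", "Zone3": "phys-az-3"},
}

# The same logical label can point at different physical facilities...
print(mapping["subscription-A"]["Zone1"] == mapping["subscription-B"]["Zone1"])  # False
# ...even though both subscriptions cover the same set of physical zones.
print(set(mapping["subscription-A"].values()) == set(mapping["subscription-B"].values()))  # True
```

This is why, today, you cannot assume that "Zone1" in one subscription and "Zone1" in another are co-located.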

Available Services and Resources

During this initial phase of the public preview, these resources can be pinned to specific AZs: Virtual Machines (VM), VM Scale Sets (VMSS), Managed Disks (MD), and Virtual IPs (VIP). In addition, there is also a new, enhanced version of the Azure Load Balancer that is zone-resilient. Since no Azure Service Management (ASM) API support is planned, AZs cannot be used with legacy Cloud Services (Worker and Web roles). From a user perspective, Availability Zones are visible at the IaaS level: this is where you can choose to have control over your basic VM resources (compute, storage, networking). Regarding PaaS, Microsoft will progressively move its main services to AZ, but this will be transparent to customers. One of the first services to be moved will be Azure SQLDB, and it makes perfect sense: Azure SQLDB is already architected with a primary instance and two secondary instances (replicas), all participating in synchronous database replication with quorum voting, zero data loss, and automatic failover. Mapping this architecture onto 3 Availability Zones is immediate and intuitive.

Regarding storage, Azure already has ZRS (Zone Replicated Storage), but it is limited to block blobs and does not fit the current AZ design and implementation. For this reason, it is being redesigned and, once available, will support page Blobs, Tables, Queues, and Files. Additionally, each write to the storage account, through a single logical endpoint, will be replicated synchronously across all the zones. This means zero data loss in case of a zone failure (RPO = 0) and automatic failover of reads to the other zones.

Going back to fundamental IaaS resources, here is the list of changes introduced to adapt to AZ:

  • Virtual Machines (VM) - For a VM object, it is now possible to specify a logical "Zone" property value, thus locating the VM in a datacenter supporting that specific AZ.
  • Load Balancer (LB) - The Azure Load Balancer now offers two SKUs, Basic and Standard. The Basic SKU is what you have always seen so far in Azure. The Standard SKU is a new enhancement and provides zone redundancy for even higher availability. You can now deploy a Standard Load Balancer that can distribute traffic across all VMs in any AZ in the region; the current scale limit is 1,000 VMs. There are many additional advantages to using the new Standard SKU; see here for more details.
  • Virtual IP (VIP) - As with the LB, you can now create Public IPs in Azure using two SKUs, Basic and Standard. Basic is what you have used so far in Azure, while the new Standard SKU enables you to build highly scalable and reliable architectures in conjunction with the Standard SKU LB. A Standard Public IP can be zone-redundant or zonal. You can see below the differences (in order) in how to deploy using ARM templates:
"apiVersion": "2017-08-01",
"type": "Microsoft.Network/publicIPAddresses",
"name": "public_ip_standard",
"location": "region",
"sku":
            {  "name": "Standard" }
"apiVersion": "2017-08-01",
"type": "Microsoft.Network/publicIPAddresses",
"name": "public_ip_standard",
"location": "region",
      "zones": [ "1" ],
    "sku":
            { "name": "Standard" }
  • VM Scale Set (VMSS) - Can now be deployed in two different flavors (see picture below). The first consists of 3 different Scale Sets, one deployed in each zone as zonal resources, load-balanced by a cross-zone resilient Load Balancer (Standard SKU). The second possibility (bottom part of the picture) requires the creation of a single Scale Set that is deployed across zones and is therefore resilient. This option will soon be available in preview; you can subscribe here if it is not already publicly available.

  • Custom Images and Snapshots - You can now create VMs in an AZ from a custom VM image; previously, only Azure Gallery images were permitted. Additionally, you can now take snapshots of VMs in an AZ. Snapshots are global objects, valid across all the zones in a region.
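For the VM case in the list above, the new "zones" property is set directly on the resource. Here is a minimal, illustrative ARM fragment (the name and location are placeholders, and the properties block is omitted; the apiVersion matches the one listed later in this post):

```json
{
  "apiVersion": "2017-03-30",
  "type": "Microsoft.Compute/virtualMachines",
  "name": "myZonalVM",
  "location": "eastus2",
  "zones": [ "1" ],
  "properties": { }
}
```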

IMPORTANT: Azure Virtual Networks (VNETs) and Subnets are, and will remain, "regional" entities. Once defined in a region, they are visible and usable across all the AZs, since the Network Resource Manager in Azure is region-wide. There is no AZ specification for these objects; VNETs and Subnets can span AZs.

Tools and API

At both the API and Portal levels, the changes applied to include AZ are really minimal. In the Azure Portal, there is only one place where you will find AZ, that is, Step [3] of the standard creation workflow:

Be sure to select a region and a VM SKU enabled for AZ, otherwise you will not see the edit box above. See here to check what is available and where. Before proceeding, have a look at the settings for the "Public IP Address"; it is interesting: there is a new IP SKU called "Standard", as explained previously, while the "Basic" type is the one used so far in Azure. For the Standard IP created here, you can see it has an "Availability Zone" property, but the value is fixed: you cannot change it, and it must be in the same zone as the compute resource. This type of IP is zoned (or zonal) and will not survive the failure of its home zone; not that bad, since the VM will behave in the same way. If, on the other hand, you want to, for example, unbind the VIP from a VM in Zone [1] and bind it to a VM in a different zone, then you must use a zone-redundant Standard SKU VIP. This kind of IP is zone-resilient and valid across all zones; its allocation method can only be static. Be careful: for the same VM you cannot mix different IP SKUs; any additional IPs assigned must use the same SKU.

If you check the details of the subnet and VNET that will be used or created, you will see that there is no "Zone" property: once again, these objects are region-wide, valid across all the zones, and can include VMs from all the zones. There is no need, for example, to create separate subnets for different AZs. In the ARM template definitions, pay attention to the additional "zones" property: it is an array of strings, because potentially more than one AZ can be specified for a VMSS.
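As a sketch of that array-valued property, a cross-zone VMSS declaration would look roughly like the fragment below (the name, location, and omission of the apiVersion and properties are illustrative assumptions, not a complete template):

```json
{
  "type": "Microsoft.Compute/virtualMachineScaleSets",
  "name": "myMultiZoneScaleSet",
  "location": "eastus2",
  "zones": [ "1", "2", "3" ]
}
```

Compare this with the single-element `"zones": [ "1" ]` used for a zonal resource: the array form is what makes the cross-zone deployment possible.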

For REST API usage, pay attention to the API version used when creating the VM, the Managed Disks, and the Public IP Address:

  • "type": "Microsoft.Compute/disks", "computeResouresApiVersion" : "2017-03-30"
  • "type": "Microsoft.Compute/virtualMachines", "computeResouresApiVersion" : "2017-03-30"
  • "type": "Microsoft.Network/publicIPAddresses", "apiVersion": "2017-08-01"

The list of all the necessary REST APIs for ARM is reported below:

NOTE: At the time of writing this post, the REST documentation for VM and Managed Disks still needs to be updated to include some details. This should be fixed soon.

Availability Zones (AZ) are already supported in version 1.3 of the Azure Management Libraries for Java; you can find it here on GitHub.

Availability Zones management has also been included in the v1.3 release of the Azure Management Libraries for .NET, which you can find at this link.

AZ support is also available in the Azure CLI tool: if you use Cloud Shell, no worries, you are already good to go. If, instead, you install the Azure CLI tool locally, be sure to use version 2.0.17 or greater.

az vm create --resource-group myResourceGroupVM --name myVM --image UbuntuLTS --generate-ssh-keys --zone 1

az network public-ip create --resource-group myResourceGroup --name myPublicIp --zone 1 --location westeurope

az vm disk attach -g myResourceGroup --vm-name myVM --disk myDataDisk --new --size-gb 50

Please note that in the last command above, where a new Managed Disk is created, there is no way to specify any "Zone": the disk is automatically created in the same zone as the VM.

Step-by-Step Example using PowerShell

Here on GitHub, you can find a simple example, using PowerShell, of how to deploy a Virtual Machine as a "zonal" resource, along with a Public IP and a Managed Disk. Additionally, I will show you where the "Zones" attribute shows up, and how to use Snapshots to create/copy and attach Managed Disks in a different AZ. Finally, I will highlight some of the errors and mistakes that you could encounter on your first attempt. Do not run the entire script at once: it is structured in sections to guide you through the creation process, so run it in steps and read before executing.

For PowerShell, AZ support is included starting with version 4.4.0. Make sure that you have installed the latest Azure PowerShell module; if you need to install or upgrade, see "Install Azure PowerShell module". These are the specific module versions that you need:

  • AzureRM.Compute 3.4.0
  • AzureRM.Network 4.4.0
  • AzureRM.Storage 3.4.0
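With those module versions in place, the core of a zonal deployment boils down to passing the new zone parameter when building each resource. The fragment below is a minimal sketch, not the full script from the GitHub sample: resource names, location, and VM size are placeholders, and you should verify the parameter names against the AzureRM module version you actually have installed.

```powershell
# Sketch only (assumes AzureRM >= 4.4.0 and an authenticated session).
$rg  = "myResourceGroup"   # placeholder
$loc = "eastus2"           # placeholder; must be an AZ-enabled region

# Standard-SKU Public IP pinned to zone 1 (Standard requires static allocation).
$pip = New-AzureRmPublicIpAddress -ResourceGroupName $rg -Location $loc `
         -Name "pip-zonal" -Sku Standard -AllocationMethod Static -Zone 1

# VM configuration pinned to the same zone. Note the absence of any
# Availability Set parameter: AS and AZ cannot be combined on the same VM.
$vmConfig = New-AzureRmVMConfig -VMName "vm-zonal" -VMSize "Standard_DS1_v2" -Zone 1
```

The zone values of the IP, the Managed Disk, and the VM must all be aligned, as the error examples later in this section show.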

I explored all the cmdlets I know well, but I was not able to find one to check whether a region is enabled for Availability Zones and, if so, which zones are available. I will keep searching; please let me know if you find it before I do. If you select an Azure region where AZs are not enabled, or use an incorrect value, PowerShell commands will return error messages like the one below:

To show the "Zones" property of a Managed Disk, let's try to resize the disk. This is the output you will obtain at the end (truncated):

In the code sample used, if you look at the section where a Snapshot of the Managed Disk is created, you will see that there is no way, in the cmdlets used, to specify any value for the "Zones" attribute: Snapshots are region-wide objects, and therefore valid across all the zones. In the main VM object creation step, please note the absence of an Availability Set (AS): if you want to use AZ, you cannot include the VM in an AS at the same time. If the "Zones" property value is *not* aligned for the VM, Managed Disk, and IP, you will receive an error similar to this:
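The snapshot-based, cross-zone disk copy that the script walks through can be sketched as follows. This is an assumption-laden outline, not the exact sample code: variable names are placeholders, `$disk` is assumed to hold an existing Managed Disk object, and the zone parameter on the disk config should be double-checked against your installed AzureRM.Compute version.

```powershell
# Sketch: move a Managed Disk to another zone by going through a snapshot,
# which is a region-wide object and therefore visible from every zone.
$snapCfg = New-AzureRmSnapshotConfig -SourceUri $disk.Id -Location $loc -CreateOption Copy
$snap    = New-AzureRmSnapshot -ResourceGroupName $rg -SnapshotName "os-snap" -Snapshot $snapCfg

# Recreate the disk from the snapshot, this time pinned to zone 2, so it can be
# attached to a VM living in that zone.
$diskCfg = New-AzureRmDiskConfig -Location $loc -CreateOption Copy `
             -SourceResourceId $snap.Id -Zone "2"
$newDisk = New-AzureRmDisk -ResourceGroupName $rg -DiskName "disk-zone2" -Disk $diskCfg
```

Without the zone value on the new disk config, the disk ends up with no zone assignment and cannot be attached to a zonal VM.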

If the VM SKU used is not in the supported list for the AZ, you will receive an error like the one below:

 

If you try to attach a Managed Disk located in a zone different from the one where the VM resides, you will get this error:

Finally, with the "Get-AzureRmVM" cmdlet, you can now see the "Zones" attribute value listed along with the other VM properties:

Thank You!

I hope this content has been useful and interesting for you; let me know your feedback.

You can always follow me on Twitter (@igorpag). Enjoy the new Azure Availability Zones (AZ) feature!

Resources

Overview of Availability Zones in Azure (Preview)

https://docs.microsoft.com/en-us/azure/availability-zones/az-overview

 

Regions and availability for virtual machines in Azure

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/regions-and-availability

 

Azure Load Balancer Standard overview (Preview)

https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-standard-overview#why-use-load-balancer-standard

 

Very simple deployment of a VM in an Availability Zone

https://github.com/Azure/azure-quickstart-templates/tree/master/101-vm-simple-zones

 

Create Multiple Virtual Machines in Different Availability Zones and Configure NAT Rules through the Standard Load balancer

https://github.com/wkasdorp/azure-quickstart-templates/tree/master/201-multi-vm-lb-zones

 

Simple deployment of a VM Scale Set of Linux VMs within an Availability Zone behind a load balancer with NAT rules

https://github.com/Azure/azure-quickstart-templates/tree/master/201-vmss-linux-nat-zones

 

Virtual Machine ScaleSet distributed across Availability Zones with Load Balancer

https://github.com/Azure/azure-quickstart-templates/tree/master/301-multi-vmss-linux-lb-zones

 

Comments (1)

  1. Chai says:

    Great article. Here is what I have tried.
    1. Created a managed-disk VM with AZ set to 2
    2. Created a snapshot of the OS disk. The snap was created in East US2
    3. Tried to add a new data disk to the VM. The process guided me to create a new managed disk ( I chose the Snapshot from Step 2 as the source). The new disk was created in East US2. However, the AZ property of the disk is None. This stopped me from adding the new disk to the VM. The error was “disk could not be attached to VM because is not in zone 2′

    Is there a powershell to modify AZ for the disk created in Step 3?

    Thanks
    Chai
