Configure an ILB listener for SQL Server AlwaysOn Availability Groups in Azure ARM

This blog post is essentially a porting to Azure Resource Manager (ARM) of the article below, specifically focused on Internal Azure Load Balancer (ILB), where Azure Service Manager (ASM) was originally used:

Configure an ILB listener for AlwaysOn Availability Groups in Azure https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-sql-server-configure-ilb-alwayson-availability-group-listener

The steps below assume you have already configured:

  • At least a Resource Group (RG) to host all your resources, or re-using an existing one;
  • A Virtual Network (VNET) for each site and related subnets;
  • A VPN or Express Route infrastructure to connect the two VNETs in case of multi-datacenter deployment (HA and Geo-DR);
  • The necessary VMs to host your Active Directory Forest Domain, if not leveraging on-premises infrastructure connected via VPN or Express Route;

Using Internal Load Balancing (ILB), with a private address for the listener, will imply that clients, tools and applications will only access the listener from within the same virtual network (VNET), or other connected VNETs via ExpressRoute or VPN. Only regional networks are supported, that is no “Affinity Groups”. Additionally, at the moment there is a restriction in ARM that will permit only one ILB, then one internal AG listener, per “Availability Set”. This limitation could be removed in the future.

Create ILB with Load Balancer Rule & Health-Probe

In Azure ARM, networking part is almost totally changed, then the way to create load balanced endpoints compared to previous ASM API. In the PowerShell code snippet below (also contained in the attached script) you can see an example of how to build an ILB to support a SQL Server AlwaysOn Availability Group (AD) Listener:

Some comments on the code snippet above:

  • In ARM, whenever you create an ILB, the IP address used is static by default. There is no more PowerShell cmdlet to “mark” that IP as static. For more info, see my blog post below:

 How to Set a Static Internal Private IP (DIP) for Azure VM in ARM http://blogs.msdn.com/b/igorpag/archive/2016/01/20/how-to-set-a-static-internal-private-ip-dip-for-azure-vm-in-arm.aspx

  • ILB internal IP address must be in the same VNET as SQL Server node VMs.
  • Even if health probe is required for proper AG Listener functioning, TCP port 59999 is not mandatory, but must match with the port later used to modify the cluster resource as reported in section “Create the availability group listener”.
  • In the Load Balancer rule definition, “EnableFloatingIP” switch is the ARM equivalent of ASM “DirectServerReturn” used for “Add-AzureEndpoint” PowerShell cmdlet.
  • In the Load Balancer rule definition, I assumed default TCP port 1433 for SQL Server instance: if a different port is used, please change value for “BackendPort” accordingly.
  • Additionally, “FrontendPort” can be changed to any other valid port value, but client applications and tools must explicitly reference it.

In order to gain 99,95% High-Availability SLA, all SQL Server VMs must be part of the same ARM “Availability Set” (AS). Please also note that in ARM Azure now provides three “Fault Domains” (FD) for each AS, compared to only two FDs for ASM. Additionally, if you are using an additional witness or file share node to provide adequate odd cluster quorum number, then you need to include in the same AS, but you don’t have to include that VM in the Load Balancer backend pool since SQL Server is not installed there. Instead, if you install an odd number of SQL Server nodes, you don’t have to configure a witness or file share VM, and you should include all these VMs in both the AS and Load Balancer definition.

Creating Virtual Machine NICs

Once create the load balancer (LB) there is a second essential step: you need to create NIC objects for the SQL Server VMs and add them to the “Backend Address Pool” of the just created LB above as shown in the picture/script below (complete script in the file attached to this post):

Let me emphasize that for the witness NIC, if witness is used for the cluster configuration, you don’t have to add it to the load balancer backend pool since that VM will *not* receive SQL traffic, SQL Server is not installed there. What is important instead, is to include the Witness VM in the same “Availability Set” (AS) with the SQL Server VMs for the reason explained at the very end of the previous section.

Creating SQL Server and Witness VMs

 
After creating the NICs, you can create the VMs where you will install SQL Server. If a Witness VM will be used to have an odd number of cluster nodes (and votes), remember that SQL Server must *not* be installed here but must be part of the same “Availability Set” (AS) as SQL Server instances. In the attached file to this blog post I included a sample to show how these VMs should be created: be careful especially with the disk configuration since I didn’t use any optimization, I provided this code only as a simple example.

Verify that KB2854082 is installed if necessary

Next, if any servers on the cluster are running Windows Server 2008 R2 or Windows Server 20012, you must verify that the hotfix KB2854082 is installed on each of the on-premises servers or Azure VMs that are part of the cluster. Any server or VM that is in the cluster, but not in the availability group, should also have this hotfix installed. In the remote desktop session for each of the cluster nodes, download KB2854082 to a local directory. Then, install the hotfix on each of the cluster nodes sequentially. If the cluster service is currently running on the cluster node, the server is restarted at the end of the hotfix installation.

Initialize the WSFC Cluster VMs

Now that you have the VMs ready and joined to the domain, you need to prepare these VMs and then create a Failover Cluster instance. In order to do that you can follow the steps reported at the link below in the section titled “Initialize the WSFC Cluster VMs”. Be sure to select the “PowerShell - Classic” section at the beginning of the article and don’t worry, these steps are valid also using ARM.

Configure AlwaysOn Availability Groups in Azure VM (PowerShell) https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-sql-server-alwayson-availability-groups-powershell

Configure the Availability Group

In this section you will install the Cluster including SQL Server VMs and eventually a Witness, then you will create an “Availability Group” on the SQL Server instances. To complete this procedure, you need to execute the steps reported at the link below in the section titled “Configure the Availability Group”. Be sure to select the “PowerShell - Classic” section at the beginning of the article and don’t worry, these steps are valid also using ARM.

Configure AlwaysOn Availability Groups in Azure VM (PowerShell)

https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-sql-server-alwayson-availability-groups-powershell

Open the Firewall Ports in Availability Group Nodes

In this step, you create a firewall rule to open the probe port for the load-balancer rule (59999 as specified earlier), and another rule to open the availability group listener port. Since you created the load-balancer rule on the Azure VMs that contain availability group replicas, you need to open the probe port and the listener port on the respective Azure VMs.

  1. On VMs hosting replicas, launch Windows Firewall with Advanced Security.
  2. Right-click Inbound Rules and click New Rule.
  3. In the Rule Type page, select Port, then click Next.
  4. In the Protocol and Ports page, select TCP and type 59999 in the Specific local ports box. Then, click Next.
  5. In the Action page, keep Allow the connection selected and click Next.
  6. In the Profile page, accept the default settings and click Next.
  7. In the Name page, specify a rule name, such as AlwaysOn Listener Probe Port in the Name text box, and click Finish.

Create the Availability Group Listener

 

In this step, you manually create the availability group listener in Failover Cluster Manager and SQL Server Management Studio (SSMS).

  1. Open Failover Cluster Manager from the node hosting the primary replica.
  2. Select the Networks node, and note the cluster network name. This name will be used in the $ClusterNetworkName variable in the PowerShell script.
  3. Expand the cluster name, and then click Roles.
  4. In the Roles pane, right-click the availability group name and then select Add Resource -> Client Access Point.

5. In the Name box, create a name for this new listener, then click Next twice, and then click Finish. Do not bring the listener or resource online at this point.

6. Click the Resources tab, then expand the Client Access Point you just created. You will see the IP Address resource for each of the cluster networks in your cluster. If this is an Azure-only solution, you will only see one IP address resource.

7. If you are configuring a hybrid solution, continue with this step. If you are configuring an Azure only solution, skip to the next step.

    1. Right-click the IP Address resource that corresponds to your on-premises subnet, then select Properties. Note the IP Address Name and network name.
    2. Select Static IP Address, assign the static IP address used for ILB (10.1.2.200 in the initial example above) and then click OK.

8. Right-click the IP Address resource that corresponds to your Azure subnet and then select Properties.

9. In the same IP Address properties window, change the IP Address Name. This IP address name will be used in the $IPResourceName variable of the PowerShell script. Repeat this step for each IP resource if your solution spans multiple Azure VNets.

10. Logon to one of the Cluster node, copy the PowerShell script below into a text editor and set the variables to the values you noted earlier:

# Define variables
$ClusterNetworkName = "<MyClusterNetworkName>"
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 of higher to find the name)
$IPResourceName = "<IPResourceName>"
# the IP Address resource name
$ILBIP = “<X.X.X.X>”
# the IP Address of the Internal Load Balancer (ILB)

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ILBIP";"ProbePort"="59999";"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"EnableDhcp"=0}

# If you are using Windows Server 2008 R2 use this:
cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

NOTE: If you want to check cluster resource advanced properties, you can use the simple piped cmdlets below:

 Get-ClusterResource $IPResourceName | Get-ClusterParameter 

 

11. Once you have set the variables, open an elevated Windows PowerShell window, then copy the script from the text editor and paste into your Azure PowerShell session to run it. If the prompt still shows >>, type ENTER again to make sure the script starts running.

12. Repeat this on each VM. This script configures the IP Address resource with the IP address of the cloud service and sets other parameters like the probe port. When the IP Address resource is brought online, it can then respond to the polling on the probe port from the load-balanced endpoint created earlier in this tutorial.

Bring the listener online

1. Navigate back to Failover Cluster Manager. Expand Roles and then highlight your Availability Group. On the Resources tab, right-click the listener name and click Properties.
2. Click the Dependencies tab. If there are multiple resources listed, verify that the IP addresses have OR, not AND, dependencies. Click OK.
3. Right-click the listener name and click Bring Online.
4. Once the listener is online, from the Resources tab, right-click the availability group and click Properties.

 5. Create a dependency on the listener name resource (not the IP address resources name). Click OK.

6. Launch SQL Server Management Studio and connect to the primary replica.
7. Navigate to AlwaysOn High Availability | Availability Groups | | Availability Group Listeners.
8. You should now see the listener name that you created in Failover Cluster Manager. Right-click the listener name and click Properties.
9. In the Port box, specify the port number for the availability group listener by using the $EndpointPort you used earlier (in this tutorial, 1433 was the default), then click OK.

Follow-up Items

After the availability group listener is created, it may be necessary to adjust the RegisterAllProvidersIP (default=0) and HostRecordTTL (default=1200) cluster parameters for the listener resource. These parameters may reduce reconnection time after a failover which may prevent connection timeouts. You can do this executing the script below on one of the Cluster nodes:

$CAPName = 'ListenerNetworkName'
Get-ClusterResource $CAPName | Set-ClusterParameter -Multiple @{"HostRecordTTL"=300;"RegisterAllProvidersIP"=1}

The properties were stored, but not all changes will take effect until “ListenerNetworkName” is taken offline and then online again. If you want to check default properties for the CAP (Client Access Point) network name used for AG Listener, you can use the cmdlet below and look at the output:

Get-ClusterResource $CAPName | Get-ClusterParameter 

For more information on these parameters, as well as sample code, see Create or Configure an Availability Group Listener.

Test the Availability Group Listener Client Connectivity

In this step, you test the availability group listener using a client application running on the same network.If the AlwaysOn replicas are in different subnets, clients must specify "MultisubnetFailover=True" in the connection string. This results in parallel connection attempts to replicas in the different subnets. Note that this scenario includes a cross-region AlwaysOn Availability Group deployment. One example would be to connect to the listener from one of the VMs in the same Azure VNet (but not one that hosts a replica). An easy way to complete this test is to try to connect SSMS to the availability group listener. Recently, Microsoft made available SSMS via standalone installation package, then it is no more necessary to run the full SQL Server setup to only install this tool. You can find it at the link below:

Download SQL Server Management Studio

https://msdn.microsoft.com/en-US/library/mt238290.aspx  

IMPORTANT: Even if you can still connect directly to the SQL Server instances, if you want to leverage high-availability provided by AlwaysOn AG, you need to use the Listener network name, that is the CAP cluster resource name you chose in Step[5] in the section "Create the Availability Group Listener" above in this post.

Another simple method is to run SQLCMD.exe as follows:

sqlcmd -S "<ListenerNetworkName>,<EndpointPort>" -d "<DatabaseName>" -Q "select @@servername, db_name()" -l 15

If the EndpointPort value is 1433, it is not required to specify it in the call. The previous call also assumes that the client machine is joined to the same domain and that the caller has been granted permissions on the database using windows authentication. In addition to automatically connecting clients to the primary replica, a listener can also be used to redirect read-only workloads to the secondaries. This can improve the performance and scalability of your overall solution. For more information, see Use ReadIntent Routing with Azure AlwaysOn AG Listener.

 

Test the Availability Group Listener Failover

With AlwaysOn Availability Group (AG), it is important to remember that you *don’t* have to use Failover Cluster Manager to test AG Listener failover between nodes, otherwise the operation will fail. The tool you have to use is SQL Server Management Studio (SSMS).

Once connected to the SQL Server instance/VM owning the primary replica, expand “AlwaysOn High Availability”, then right-click your custom created AG and select “Failover” from the contextual menu:

Depending on the node you are going to move the AG Listener, and then promote to Primary Replica, you may see a warning related to possible data loss: if the replica/node you are moving to is configured for synchronous data replication, then no data loss, while you will see in case of asynchronous data replication.

 

Once selected the SQL Server instance, click next and on till completion and you will see that AlwaysOn promoted a new Primary and downgraded the previous one to Secondary. IMPORTANT: If you moved/promoted a new Primary from an asynchronous secondary, SQL Server by default will stop replicating data from the new Primary to the other Secondary replicas: this is done to eventually protect new Secondary replicas from data loss. You have to explicitly resume data movement using contextual menu of each Secondary replica as shown in the picture below:

 

That’s all I wanted to share with you, hope you will find this content useful and interesting, feel free to post your comments and feedbacks here. Remember that you can also follow me on Twitter ( @igorpag). Best Regards.

ScriptSample(3).ps1