Site-aware Failover Clusters in Windows Server 2016

Windows Server 2016 introduces site-aware clusters. Nodes in stretched clusters can now be grouped based on their physical location (site). Cluster site-awareness enhances key operations during the cluster lifecycle, such as failover behavior, placement policies, heartbeating between the nodes, and quorum behavior. In the remainder of this blog I will explain how you can configure sites for your cluster, the notion of a "preferred site", and how site awareness manifests itself in your cluster operations.

Configuring Sites

A node's site membership is configured by creating a fault domain of type Site and making that site the parent of the node.

For example, in a four-node cluster with nodes Node1, Node2, Node3 and Node4, to assign Node1 and Node2 to a Seattle site and Node3 and Node4 to a Denver site, do the following:

  • Launch PowerShell as an Administrator and type:

# Create Site Fault Domains
New-ClusterFaultDomain -Name Seattle -Type Site -Description "Primary" -Location "Seattle DC"
New-ClusterFaultDomain -Name Denver -Type Site -Description "Secondary" -Location "Denver DC"

# Set Fault Domain membership
Set-ClusterFaultDomain -Name Node1 -Parent Seattle
Set-ClusterFaultDomain -Name Node2 -Parent Seattle

Set-ClusterFaultDomain -Name Node3 -Parent Denver
Set-ClusterFaultDomain -Name Node4 -Parent Denver
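To confirm that the assignments took effect, the fault domain hierarchy can be read back. This is a minimal sketch, assuming the Seattle and Denver sites from the example above and a session on a cluster node with the FailoverClusters module loaded. The column names used here (ParentName, ChildrenNames) are assumed to match the cmdlet's default output.

```powershell
# List every site-level fault domain together with its member nodes.
Get-ClusterFaultDomain -Type Site |
    Format-Table Name, Location, ChildrenNames -AutoSize

# Show which site each node was assigned to.
Get-ClusterFaultDomain -Type Node |
    Format-Table Name, ParentName -AutoSize
```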

Configuring sites enhances the operation of your cluster in the following ways:

Failover Affinity

  • Groups fail over to a node within the same site before failing over to a node in a different site
  • During Node Drain, VMs are moved first to a node within the same site before being moved across sites
  • The CSV load balancer distributes CSVs within the same site
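One way to observe the drain behavior is to pause a node and check where its roles land. This is a sketch only, assuming the four-node Seattle/Denver layout from the earlier example. With sites configured, roles drained from Node1 would be expected to move to Node2 before any Denver node.

```powershell
# Drain Node1 (Seattle) and wait for the drain to finish.
Suspend-ClusterNode -Name Node1 -Drain -Wait

# See which node now owns each group.
Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize

# Bring Node1 back into service and fail its roles back.
Resume-ClusterNode -Name Node1 -Failback Immediate
```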

Storage Affinity

Virtual Machines (VMs) follow storage and are placed in the same site where their associated storage resides. VMs begin live migrating to the site of their associated CSV 1 minute after the storage is moved.
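Storage affinity can be watched in action by moving a CSV to the other site and then checking VM ownership. This is a hedged sketch: the disk name "Cluster Disk 1" is hypothetical, and Node3 is assumed to be in the Denver site as in the earlier example.

```powershell
# Move a CSV to a node in the other site (disk name is a placeholder).
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node Node3

# After about a minute, check whether the VM groups have followed the storage.
Get-ClusterGroup |
    Where-Object GroupType -eq 'VirtualMachine' |
    Format-Table Name, OwnerNode, State -AutoSize
```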

Cross-Site Heartbeating

You now have the ability to configure the thresholds for heartbeating between sites. These thresholds are controlled by the following new cluster properties:

| Property | Default Value | Description |
|---|---|---|
| CrossSiteDelay | 1000 | Amount of time between each heartbeat sent to nodes on dissimilar sites, in milliseconds |
| CrossSiteThreshold | 20 | Missed heartbeats before interface considered down to nodes on dissimilar sites |

To configure the above properties, launch PowerShell as an Administrator and type:

(Get-Cluster).CrossSiteDelay = <value>
(Get-Cluster).CrossSiteThreshold = <value>
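With the defaults, a node in another site is treated as unreachable after 1000 ms × 20 = 20 seconds of missed heartbeats. The sketch below sets more tolerant values and computes the resulting detection window. The numbers are illustrative only, not recommendations, and are assumed to fall within the ranges the cluster accepts.

```powershell
# Illustrative values: heartbeat every 2 seconds, tolerate 30 misses.
(Get-Cluster).CrossSiteDelay     = 2000
(Get-Cluster).CrossSiteThreshold = 30

# Re-read the settings and compute the detection window:
# window (s) = delay (ms) * threshold / 1000
$cluster = Get-Cluster
$windowSeconds = ($cluster.CrossSiteDelay * $cluster.CrossSiteThreshold) / 1000
"Cross-site failure detection window: $windowSeconds s"
```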

You can find more information on other properties controlling failover clustering heartbeating here.

The following rules define the applicability of the thresholds controlling heartbeating between two cluster nodes:

  • If the two cluster nodes are in two different sites and two different subnets, then the Cross-Site thresholds override the Cross-Subnet thresholds.
  • If the two cluster nodes are in two different sites and the same subnet, then the Cross-Site thresholds override the Same-Subnet thresholds.
  • If the two cluster nodes are in the same site and two different subnets, then the Cross-Subnet thresholds are effective.
  • If the two cluster nodes are in the same site and the same subnet, then the Same-Subnet thresholds are effective.
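The four rules above reduce to a simple precedence: a site difference always wins, then a subnet difference. The following helper is purely illustrative. The function name and its return strings are hypothetical and not part of the FailoverClusters module. It only encodes the decision table.

```powershell
# Hypothetical helper: which threshold pair applies to a given pair of nodes.
function Get-EffectiveHeartbeatSetting {
    param(
        [Parameter(Mandatory)] [bool] $SameSite,
        [Parameter(Mandatory)] [bool] $SameSubnet
    )
    if (-not $SameSite)   { return 'CrossSite' }    # different sites: Cross-Site wins
    if (-not $SameSubnet) { return 'CrossSubnet' }  # same site, different subnets
    return 'SameSubnet'                             # same site, same subnet
}

# Nodes in different sites that share a stretched subnet:
Get-EffectiveHeartbeatSetting -SameSite $false -SameSubnet $true   # CrossSite
```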

Configuring Preferred Site

In addition to configuring the site a cluster node belongs to, a "Preferred Site" can be configured for the cluster. The Preferred Site expresses a placement preference and is typically your primary datacenter site.

Before the Preferred Site can be configured, the site being chosen as the preferred site needs to be assigned to a set of cluster nodes. To configure the Preferred Site for a cluster, launch PowerShell as an Administrator and type:

(Get-Cluster).PreferredSite = <Site assigned to a set of cluster nodes>
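As a concrete illustration, assuming the Seattle site fault domain from the earlier example exists and already has nodes assigned, and that the property accepts the site's name, the setting could be applied and read back like this:

```powershell
# Make Seattle the preferred site for the whole cluster.
(Get-Cluster).PreferredSite = "Seattle"

# Read the setting back to confirm it.
(Get-Cluster).PreferredSite
```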

Configuring a Preferred Site for your cluster enhances operation in the following ways:

Cold Start

During a cold start, VMs are placed in the preferred site.

Quorum

  • Dynamic Quorum drops weights from the Disaster Recovery site (DR site, i.e. the site which is not designated as the Preferred Site) first, to ensure that the Preferred Site survives if all other things are equal. In addition, nodes are pruned from the DR site first during regroup after events such as asymmetric network connectivity failures.
  • During a Quorum Split, i.e. the even split of two datacenters with no witness, the Preferred Site is automatically elected to win
    • The nodes in the DR site drop out of cluster membership
    • This allows the cluster to survive a simultaneous 50% loss of votes
    • Note that the LowerQuorumPriorityNodeID property previously controlling this behavior is deprecated in Windows Server 2016
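To see how Dynamic Quorum is currently distributing votes, each node's assigned and dynamic weights can be listed. This is a small read-only sketch. After a vote has been dropped, a node in the DR site would be expected to show a DynamicWeight of 0.

```powershell
# NodeWeight is the configured vote; DynamicWeight is the vote currently in effect.
Get-ClusterNode |
    Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize
```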

Preferred Site and Multi-master Datacenters

The Preferred Site can also be configured at the granularity of a cluster group, i.e. a different preferred site can be configured for each group. This enables a datacenter to be active and preferred for specific groups/VMs.

To configure the Preferred Site for a cluster group, launch PowerShell as an Administrator and type:

(Get-ClusterGroup -Name <GroupName>).PreferredSite = <Site assigned to a set of cluster nodes>
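For a multi-master layout, different groups can prefer different sites. The group names below (VM-Sales, VM-Finance) are hypothetical placeholders, and the Seattle and Denver sites are assumed from the earlier example.

```powershell
# Hypothetical group names; substitute roles that exist in your cluster.
(Get-ClusterGroup -Name "VM-Sales").PreferredSite   = "Seattle"
(Get-ClusterGroup -Name "VM-Finance").PreferredSite = "Denver"

# Review the per-group preference alongside the current owner.
Get-ClusterGroup | Format-Table Name, PreferredSite, OwnerNode -AutoSize
```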

Placement Priority

Groups in a cluster are placed based on the following site priority:

  1. Storage affinity site
  2. Group preferred site
  3. Cluster preferred site
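The priority order above amounts to "first configured value wins". The function below is a hypothetical illustration of that ordering only. Its name and parameters are invented for this sketch, and it does not reproduce the cluster's actual placement logic, which weighs other factors as well.

```powershell
# Hypothetical helper: pick the target site using the priority order above.
function Resolve-PlacementSite {
    param(
        [string] $StorageSite,          # site where the group's storage resides
        [string] $GroupPreferredSite,   # preferred site set on the group
        [string] $ClusterPreferredSite  # preferred site set on the cluster
    )
    # Return the first candidate that is actually set.
    foreach ($site in $StorageSite, $GroupPreferredSite, $ClusterPreferredSite) {
        if ($site) { return $site }
    }
    return $null   # no site preference applies
}

# No storage affinity, group prefers Denver, cluster prefers Seattle:
Resolve-PlacementSite -GroupPreferredSite 'Denver' -ClusterPreferredSite 'Seattle'   # Denver
```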