Lead Hosts versus SQL Server for Windows Server AppFabric Caching

Note: This is only relevant to the on-premises version of the Caching service, which is part of Windows Server AppFabric.

Introduction

This post covers the concepts of lead hosts and offloading in both the network share configuration (also known as the XML configuration) and the SQL Server configuration for AppFabric Caching. It also explains when to use one configuration over the other.

AppFabric Caching is built on top of a stateful partitioned cluster that is kept consistent at all times with minimal downtime. To maintain consistency, the cluster needs a master that creates the cluster, maintains its uptime, and ensures its validity. Consequently, if the master goes offline, the cluster goes offline with it.

Having a single master introduces a single point of failure, so a set of nodes called lead hosts acts as the master instead. The cluster stays up only while a quorum (a majority) of the lead hosts is running; once a quorum can no longer be formed, the cluster goes down. For instance, with 5 lead hosts the quorum is 3, with 6 it is 4, and so on. This is the network share configuration.
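The quorum arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming that a quorum is a strict majority of the configured lead hosts (which matches the 5 and 6 examples); the function names are hypothetical and not part of AppFabric.

```python
def quorum_size(lead_hosts: int) -> int:
    """Smallest number of lead hosts that forms a strict majority."""
    if lead_hosts < 1:
        raise ValueError("a cluster needs at least one lead host")
    return lead_hosts // 2 + 1


def cluster_stays_up(lead_hosts: int, lead_hosts_running: int) -> bool:
    """True while the running lead hosts can still form a quorum (assumed rule)."""
    return lead_hosts_running >= quorum_size(lead_hosts)


for total in (5, 6):
    print(total, "lead hosts -> quorum of", quorum_size(total))
```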

For small clusters (those with 2 cache hosts), the lead host configuration does not work well: if both hosts are lead hosts, one host going down takes the cluster down. For those cases there is another master configuration, in which the master’s capabilities and responsibilities are offloaded to an external custom store. A SQL Server custom store is shipped by default. Here SQL Server becomes the master and hence the single point of failure. This is called ‘master offloading’, ‘lead host offloading’, or just ‘offloading’, and the corresponding configuration is the SQL Server configuration.

Which configuration to use?

Since the lead host configuration avoids a single point of failure, we recommend it as the default choice for customers with non-trivial deployments.

If a customer has a deployment consisting of two cache hosts, or wishes to avoid managing lead hosts, offloading can be used, provided the customer can ensure SQL Server uptime for the entire lifetime of the cluster. The uptime requirement on SQL Server was not emphasized in our previous guidance, and this post is meant to clarify it.

Managing a SQL offloaded cluster

Although offloading avoids the need to manage lead hosts, it comes at the cost of keeping SQL Server consistently available, with very high uptime, to keep the cluster alive. Techniques such as mirroring or failover can be used for this purpose.

FAQs on Offloading

Q. SQL Server going down would eventually bring the cluster down. But how soon would that happen? Is there any SQL Server downtime the cluster can survive?

A. If the SQL Server downtime is strictly less than one minute, the cluster can survive it.

Q. Can the limit of 1 minute be increased? In other words, can the cluster be made to survive longer SQL Server downtimes?

A. It is possible, but we do not recommend it, because it would violate the original requirement of an external master: consistent uptime. In that case the network share configuration should be used.

Q. What happens during the time SQL Server is down?

A. The following are the effects during SQL Server downtime:

  • Management operations from the administration tool are not available.
  • If a host goes down while SQL Server is unavailable, it might trigger a cascading effect that takes the whole cluster offline. The reason is that when a host goes down, its neighbors cannot heal the cluster properly, because they can talk to neither the node that went down nor the master.

Managing a lead host based cluster

In this mode, you must designate a set of lead hosts such that a quorum of them is always up. This can be done as a one-time step before you bring up the cluster.

Considerations in choosing lead hosts

We recommend that you use at least 3 lead hosts. You may choose more, but empirical observations have shown that you would not need more than 7. Also use an odd number of lead hosts: an even number (x) offers no better fault tolerance than the preceding odd number (x-1), because the quorum grows by one as well.
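A quick calculation illustrates the odd-versus-even point. The sketch below assumes that a quorum is a strict majority of the lead hosts (3 of 5, 4 of 6) and that the cluster needs a quorum running; the function name is hypothetical and not part of AppFabric.

```python
def tolerated_failures(lead_hosts: int) -> int:
    """Lead hosts that can fail while a strict majority is still running."""
    quorum = lead_hosts // 2 + 1
    return lead_hosts - quorum


for count in range(2, 8):
    print(f"{count} lead hosts tolerate {tolerated_failures(count)} failure(s)")
```

Under this assumption, 4 lead hosts tolerate one failure just as 3 do, and 6 tolerate two just as 5 do, so the extra even-numbered lead host adds management overhead without adding resilience.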

Steps to configure a lead host

  • Open the PowerShell-based administration tool.
  • Use the following command to export the cluster configuration:
    Export-CacheClusterConfig -File <filename.xml>
  • Open filename.xml and, for the host that should become a lead host, change the ‘leadHost’ attribute (on its <host> tag under <hosts>) to true.
  • Repeat the above step until you have a sufficient number of lead hosts.
  • Save the file.
  • Import the file using the following command (the cluster should be down at this point):
    Import-CacheClusterConfig -File <filename.xml>
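
If there are many hosts to edit, the XML change between the export and the import could be scripted. The following Python sketch is one hedged way to do it, not an official tool: the sample XML is a hypothetical, heavily trimmed stand-in for an exported file, and the ‘name’ attribute used to pick hosts is an assumption about the file layout. Only the <hosts>/<host> nesting and the ‘leadHost’ attribute come from the steps above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for an exported cluster configuration.
SAMPLE_CONFIG = """<configuration>
  <dataCache>
    <hosts>
      <host name="cache01" leadHost="false" />
      <host name="cache02" leadHost="false" />
      <host name="cache03" leadHost="false" />
      <host name="cache04" leadHost="false" />
    </hosts>
  </dataCache>
</configuration>"""


def mark_lead_hosts(config_xml, lead_host_names):
    """Return the XML with leadHost="true" set on every named host."""
    wanted = set(lead_host_names)
    root = ET.fromstring(config_xml)
    found = set()
    for hosts in root.iter("hosts"):
        for host in hosts.findall("host"):
            # Assumption: each <host> is identified by a 'name' attribute.
            if host.get("name") in wanted:
                host.set("leadHost", "true")
                found.add(host.get("name"))
    missing = wanted - found
    if missing:
        raise ValueError(f"hosts not found in configuration: {sorted(missing)}")
    return ET.tostring(root, encoding="unicode")


updated = mark_lead_hosts(SAMPLE_CONFIG, ["cache01", "cache02", "cache03"])
print(updated)
```

Hosts that are not named are left untouched. Review the resulting file by hand before importing it, since the real exported file contains more sections than this sample.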

You can also refer to https://msdn.microsoft.com/en-us/library/ee790910.aspx

FAQs on Lead Hosts

Q. How many lead hosts should I use?

A. As stated above, you should have at least three lead hosts.

Q. Can I add a lead host dynamically when the cluster is running?

A. This functionality is not available in RTM 1.0.

Q. What is the connection between lead hosts and the network share?

A. The lead host configuration is the default, and in it the network share is used to store cluster metadata. When the SQL Server configuration is used, however, SQL Server not only stores the cluster metadata but, in RTM 1.0, is also automatically picked as the single master.

Q. I already have a cluster configured for offloading. What should I do to move to a lead host based one?

A. You need to bring down the cluster and reconfigure the administration tool and all the cache hosts to use the network share configuration. You can do this through the UI or with the configuration commands New-CacheCluster and Register-CacheHost. As noted above, you also need to configure the individual lead hosts.

[Update]

Q. Is it possible to use lead host management with the cache configuration in a SQL database instead of a network share?

A. No, it is not possible to do this in v1.0.

Q. What happens if the network share goes down?

A. In the network share configuration, the network share acts as a passive entity (unlike SQL Server in the SQL Server configuration). Hence the network share going down does not bring the cluster down. However, there are some limitations while it is unavailable:

  • Administration tool operations won’t work.
  • Named cache creation/deletion would not work.
  • Node installation/uninstallation would not work.
  • If an existing node reboots, it won’t be able to join the cluster.

Q. Does the network share have to be writable by the cache hosts?

A. No, cache hosts only read from the network share.

Amit Kumar Yadav

Kalyan Chakravarthy Sonnathi 

AppFabric Caching Team