Failover Cluster File Share Witness and DFS

This blog discusses a new feature in the upcoming release of Windows Server 2019. Windows Insiders currently receive preview builds of Server 2019. We urge you to become an Insider and play a part in making Windows Server 2019 the best that it can be. To do so, go to this link and sign up.

One of the quorum models for Failover Clustering is the ability to use a file share as a witness resource. As a recap, the File Share Witness is designated a vote in the Cluster when needed and can act as a tie breaker in case there is ever a split between nodes (mainly seen in multi-site scenarios).
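To make the tie-breaker role concrete, here is a minimal sketch of majority-vote arithmetic in Python. It is an illustration only, not the cluster service's actual algorithm: the function name `has_quorum` is hypothetical, and the sketch assumes one vote per node and ignores features such as dynamic quorum.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Return True when a partition holds a strict majority of all votes."""
    return reachable_votes > total_votes // 2


# Assumed layout: 4 nodes with one vote each, plus one witness vote.
NODE_VOTES = 4
WITNESS_VOTES = 1
TOTAL = NODE_VOTES + WITNESS_VOTES  # 5 votes in the cluster

# A clean split between two sites leaves 2 node votes on each side.
side_with_witness = 2 + WITNESS_VOTES  # 3 of 5
side_without_witness = 2               # 2 of 5

print(has_quorum(side_with_witness, TOTAL))     # True
print(has_quorum(side_without_witness, TOTAL))  # False
```

Without the witness vote the split would be 2 against 2, and neither side could claim a majority. The witness is what breaks the tie.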

However, over the years, we have seen this share placed on a DFS share. This is an awfully bad idea and one not supported by Microsoft. Please do not take this as a stance against DFS. DFS is a great feature with numerous deployments out there. I am specifically talking about putting a cluster File Share Witness on a DFS share.

Let me give you an example of what can happen on a Windows Server 2016 Cluster. Let's take a 4-node multi-site cluster with two nodes at each site running SQL FCI. Each site has shared drives utilizing some sort of storage replication (Storage Replica for those Ned fans). The cluster connects to a file share witness that is part of a DFS share. So, it would look something like this.

All is fine, dandy and working smoothly. But this is what can happen if there is some sort of break in communications between the two sites.

What has happened is a loss of connectivity between the two sites. Site A already has the file share witness and places a lock on it so no one else can come along and take it. Because it is already running SQL, it stays status quo. Site B is where the problem occurs. Since it cannot communicate with Site A, it has no idea what is going on. The Site B nodes do what they are supposed to do, which is arbitrate for the Cluster Group and the witness resource. They try to connect, and a DFS referral sends them to one of the other machines, where the connection succeeds. The Site B nodes see that they have the witness, so they start bringing everything online, including SQL and its databases. For those not so familiar with Failover Clustering and all its jargon, this is known as a split brain.
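The scenario above can be modeled as a toy simulation in Python. This is a sketch under stated assumptions, not how the cluster service or DFS is implemented: `FileShare`, `arbitrate`, and the target names are all hypothetical, and the vote counts (2 node votes per site, 5 votes total) are assumed from the 4-node example.

```python
class FileShare:
    """A single share that allows only one lock holder at a time."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def try_lock(self, site):
        # The lock succeeds only if the share is free or already ours.
        if self.owner is None or self.owner == site:
            self.owner = site
            return True
        return False


def arbitrate(site, share, node_votes=2, total_votes=5):
    """A site has quorum if its node votes plus a locked witness form a majority."""
    witness_vote = 1 if share.try_lock(site) else 0
    return node_votes + witness_vote > total_votes // 2


# Case 1: plain file share. Both sites reach the same server.
plain = FileShare("fsw-server")
print(arbitrate("Site A", plain))  # True
print(arbitrate("Site B", plain))  # False

# Case 2: DFS namespace with two targets. During the outage, the referral
# hands each site a different target (assumed here for illustration).
dfs_targets = {"Site A": FileShare("target-1"), "Site B": FileShare("target-2")}
print(arbitrate("Site A", dfs_targets["Site A"]))  # True
print(arbitrate("Site B", dfs_targets["Site B"]))  # True: split brain
```

With a plain share, the lock guarantees a single winner. With DFS, each site can lock a different target behind the same path, so both believe they hold the witness vote.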

So as far as each side's view of membership goes, it has quorum, and SQL clients are connecting and writing to and updating the databases. When connectivity is restored between the sites and we get back to our normal cluster view again, we think everything is all roses again.

However, remember that each side had its SQL databases being written to. Once storage replication begins again, a very possible outcome is that everything that was written on one of the sides is now gone.
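A toy illustration of that outcome in Python follows. It assumes simple one-way replication in which the destination becomes a copy of the source. The `resync` function and the sample records are hypothetical and do not describe how Storage Replica or any particular replication product works internally.

```python
# Both sites start from the same replicated copy of the data.
site_a = {"order-1": "shipped"}
site_b = dict(site_a)

# During the split, each side accepts its own writes.
site_a["order-2"] = "written at Site A"
site_b["order-3"] = "written at Site B"


def resync(source, destination):
    """One-way replication: the destination becomes a copy of the source."""
    destination.clear()
    destination.update(source)


# Assume Site A ends up as the replication source once connectivity returns.
resync(site_a, site_b)
print("order-3" in site_b)  # False: the Site B write is gone
print(site_b == site_a)     # True
```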

So as pointed out earlier:

This is an awfully bad idea.

Microsoft does not support running a File Share Witness on a DFS share.

For Windows Server 2019, additional safeguards have been added to help protect against misconfigurations. We have added logic to check whether the share is a DFS share.

In Failover Cluster Manager, if you go through the quorum configuration wizard and try to use a DFS share, it will fail on the Summary Page with this dialog:

If you attempt to set it through PowerShell, it will fail with this error:

PS C:\Windows\system32> Set-ClusterQuorum -FileShareWitness \\contoso.com\dfs-share
Set-ClusterQuorum : There was an error configuring the file share witness ‘\\contoso.com\dfs-share’.
   Unable to save property changes for ‘File Share Witness’.
   The request is not supported

Logic has also been added to the online operation of the File Share Witness, as well as to the thorough resource health check (IsAlive), to validate whether the share is on DFS. If the share is added to DFS after the fact, these checks will fail the resource.

Let me reiterate what I have already mentioned:

Microsoft does not support running the File Share Witness on a DFS share. We did not support it in the past and we will not support it for the foreseeable future.

Thanks,
John Marlin
Senior Program Manager
High Availability and Storage