Killing two birds with one stone: SharePoint HA and DR with stretch farm, and everything you want to know about it

A single SharePoint farm running across multiple data centers is called a “stretch farm”. In a stretch farm, you provide fault tolerance by following the standard guidance for making databases and service applications redundant. Because the redundant copies can live in different data centers, the same farm gives you both high availability and disaster recovery.

You can follow this article to set up a stretch farm using SQL mirroring: https://technet.microsoft.com/en-us/library/dd207314.aspx

The following Q&As are based on my real-world experience with SQL mirroring and SharePoint stretch farms.

Questions and Answers (DC stands for Data Center):  

  • What are the requirements for setting up a SharePoint stretch farm?
    For a stretch farm to work, the latency between all the SQL Server instances and the front-end web servers must be less than 1 millisecond in one direction, and the bandwidth must be at least 1 gigabit per second.

  • Where to put the witness server?
    Put the witness server in the most reliable DC. If the secondary DC is more reliable than the primary DC, and the connection between the DCs is at least as reliable as the DCs themselves, you can put the witness in the secondary DC; otherwise, put it in the primary DC.
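
    Wherever you place it, each mirrored database has to be told where the witness is. A minimal sketch, assuming the hypothetical database name SharePoint_Config, witness host witness.contoso.com and mirroring endpoint port 5022 (substitute your own values). It runs on the principal instance and has to be repeated for every mirrored SharePoint database:

    ```sql
    -- Run on the principal instance, once per mirrored SharePoint database.
    -- Hypothetical names: SharePoint_Config, witness.contoso.com, port 5022.
    ALTER DATABASE [SharePoint_Config]
        SET WITNESS = 'TCP://witness.contoso.com:5022';
    ```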

  • What are the minimum firewall requirements for the witness server?
    Open up the TCP port of the SQL mirroring endpoint for inbound traffic. 
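
    If you don't remember which port the endpoint uses, you can look it up in the catalog views. A sketch to run on the witness instance (the same query works on the partners):

    ```sql
    -- Lists the database mirroring endpoint(s) on this instance and the
    -- TCP port each one listens on; that port is the one to open inbound.
    SELECT e.name,
           e.role_desc,
           e.state_desc,
           t.port
    FROM sys.database_mirroring_endpoints AS e
    JOIN sys.tcp_endpoints AS t
        ON t.endpoint_id = e.endpoint_id;
    ```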

  • Can I use SQL Express as the witness?
    Yes, you can.
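
    SQL Server Express can only take the witness role, so its mirroring endpoint is created with ROLE = WITNESS. A minimal sketch, assuming the hypothetical endpoint name Mirroring, port 5022, and a hypothetical domain account CONTOSO\sqlsvc that the partner instances run as:

    ```sql
    -- Run on the SQL Server Express instance that will act as the witness.
    -- Hypothetical values: endpoint name Mirroring, port 5022, and the
    -- domain account CONTOSO\sqlsvc used by the partner instances.
    CREATE ENDPOINT [Mirroring]
        STATE = STARTED
        AS TCP (LISTENER_PORT = 5022)
        FOR DATABASE_MIRRORING (ROLE = WITNESS);

    -- Let the partners' service account connect to the endpoint
    -- (needed when the partners run under a different domain account).
    CREATE LOGIN [CONTOSO\sqlsvc] FROM WINDOWS;
    GRANT CONNECT ON ENDPOINT::[Mirroring] TO [CONTOSO\sqlsvc];
    ```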

  • How to test a stretch farm?
    You can simulate a failed SQL server by stopping its service from SQL Server Management Studio: connect to the SQL instance, right-click the instance name in the left pane, and select Stop. If the stopped instance was the principal, the mirror SQL instance takes over the principal role and the SharePoint farm comes back into service within a few seconds.
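
    To confirm the takeover rather than just watching SharePoint, you can check the mirroring state of each database. A sketch to run on the instance that is still up:

    ```sql
    -- One row per mirrored database: its current role, the state of the
    -- mirroring session, the partner it talks to, and whether the witness
    -- is reachable. After a successful test, the surviving instance
    -- should report PRINCIPAL for every SharePoint database.
    SELECT DB_NAME(database_id) AS database_name,
           mirroring_role_desc,
           mirroring_state_desc,
           mirroring_partner_instance,
           mirroring_witness_state_desc
    FROM sys.database_mirroring
    WHERE mirroring_guid IS NOT NULL;
    ```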

  • How to manually fail over from one DC to another DC?
    Run the following T-SQL command on the principal SQL Server instance. You need to run the command for each database.

    ALTER DATABASE <your database name> SET PARTNER FAILOVER
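
    Because a SharePoint farm has many databases, you may prefer to loop over them instead of typing the command once per database. A minimal sketch, assuming every database you want to move is mirrored in high-safety mode; databases that are not SYNCHRONIZED are skipped, because manual failover is only allowed in that state:

    ```sql
    -- Run on the current principal instance, connected to master.
    USE master;

    DECLARE @db sysname;
    DECLARE @sql nvarchar(max);

    -- STATIC takes a snapshot of the list, because each failover changes
    -- the role reported by sys.database_mirroring while the loop runs.
    DECLARE principal_dbs CURSOR LOCAL STATIC READ_ONLY FOR
        SELECT DB_NAME(database_id)
        FROM sys.database_mirroring
        WHERE mirroring_role_desc = 'PRINCIPAL'
          AND mirroring_state_desc = 'SYNCHRONIZED';

    OPEN principal_dbs;
    FETCH NEXT FROM principal_dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'ALTER DATABASE ' + QUOTENAME(@db) + N' SET PARTNER FAILOVER;';
        PRINT @sql;  -- log which database is being failed over
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM principal_dbs INTO @db;
    END

    CLOSE principal_dbs;
    DEALLOCATE principal_dbs;
    ```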

  • What to do when I lose the primary DC?
    When the witness server is in the primary DC and the whole DC is gone (a disaster happens), the mirror SQL instance cannot fail over on its own, because it has lost contact with both the principal and the witness. Its databases stay "In Recovery" and won't serve data. To bring them online so your SharePoint services resume, run the following T-SQL commands. You have to do this for all of your databases.

    ALTER DATABASE <your database name> SET PARTNER OFF

    RESTORE DATABASE <your database name> WITH RECOVERY

    The above commands break the mirroring partnership. After the primary DC is recovered, you have to back up and restore the databases and re-establish the partnership.
    If the witness server is in the secondary DC, the mirror SQL instance automatically becomes the principal server, and your SharePoint farm resumes service within a few seconds.
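
    For the witness-in-the-primary-DC case, the two commands above can be applied to every mirrored database in one pass. A minimal sketch, to run on the surviving mirror instance in the secondary DC:

    ```sql
    -- Run on the surviving mirror instance (secondary DC), connected to master.
    USE master;

    DECLARE @db sysname;
    DECLARE @sql nvarchar(max);

    -- Snapshot of every database this instance currently mirrors.
    DECLARE mirror_dbs CURSOR LOCAL STATIC READ_ONLY FOR
        SELECT DB_NAME(database_id)
        FROM sys.database_mirroring
        WHERE mirroring_role_desc = 'MIRROR';

    OPEN mirror_dbs;
    FETCH NEXT FROM mirror_dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Remove the partnership with the lost principal; the database
        -- is left in the RESTORING state.
        SET @sql = N'ALTER DATABASE ' + QUOTENAME(@db) + N' SET PARTNER OFF;';
        EXEC sys.sp_executesql @sql;

        -- Recover the database so it comes online as a stand-alone database.
        SET @sql = N'RESTORE DATABASE ' + QUOTENAME(@db) + N' WITH RECOVERY;';
        EXEC sys.sp_executesql @sql;

        PRINT N'Brought online: ' + @db;
        FETCH NEXT FROM mirror_dbs INTO @db;
    END

    CLOSE mirror_dbs;
    DEALLOCATE mirror_dbs;
    ```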

  • What to do when the primary DC is recovered after a disaster?
    Assuming you already ran the commands to break the mirroring partnership, follow these steps to re-establish it. Note that the principal SQL instance is in the secondary DC at this point:
    1. On the principal server, back up all SharePoint databases.
    2. Copy the backup files to the mirror database server. The server should be running in stand-alone mode.
    3. Delete all the SharePoint databases from the mirror database server if they are still present.
    4. Restore the databases to the mirror server WITH NORECOVERY, so that they stay in the restoring state that mirroring requires.
    5. On the mirror server, set up the mirroring partnership.
    6. On the principal server, set up the mirroring partnership.
    7. On the principal server, set up the witness partnership.
    8. Test SharePoint and make sure it is still functioning.
    9. Fail over to the original principal server if necessary.
    If the SQL instance in the secondary DC took over the principal role automatically, you don't have to do anything; just give the databases some time to resynchronize once the recovered server rejoins as the mirror.
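
    Steps 1 to 7 map onto a handful of T-SQL statements per database. A minimal sketch for one database, with several loud assumptions: the names SharePoint_Config, sql-dc1.contoso.com (the recovered server in the primary DC, which becomes the mirror), sql-dc2.contoso.com (the current principal in the secondary DC), witness.contoso.com, port 5022 and the share \\backupshare\sql are all hypothetical; the share is reachable from both servers (which takes care of step 2); and the data and log file paths are the same on both servers (otherwise add MOVE clauses). Run each section while connected to the server named in its comment:

    ```sql
    -- Step 1, on the current principal (sql-dc2): full backup plus a log
    -- backup, so the mirror's log chain matches the principal's.
    BACKUP DATABASE [SharePoint_Config]
        TO DISK = '\\backupshare\sql\SharePoint_Config.bak' WITH INIT;
    BACKUP LOG [SharePoint_Config]
        TO DISK = '\\backupshare\sql\SharePoint_Config.trn' WITH INIT;

    -- Step 3, on the recovered server (sql-dc1): the stale copy may still
    -- be configured for mirroring, which blocks DROP DATABASE, so remove
    -- the partnership first.
    IF EXISTS (SELECT 1 FROM sys.database_mirroring
               WHERE database_id = DB_ID('SharePoint_Config')
                 AND mirroring_guid IS NOT NULL)
        ALTER DATABASE [SharePoint_Config] SET PARTNER OFF;
    IF DB_ID('SharePoint_Config') IS NOT NULL
        DROP DATABASE [SharePoint_Config];

    -- Step 4, on sql-dc1: restore WITH NORECOVERY to leave it RESTORING.
    RESTORE DATABASE [SharePoint_Config]
        FROM DISK = '\\backupshare\sql\SharePoint_Config.bak' WITH NORECOVERY;
    RESTORE LOG [SharePoint_Config]
        FROM DISK = '\\backupshare\sql\SharePoint_Config.trn' WITH NORECOVERY;

    -- Step 5, on the mirror (sql-dc1): point it at the principal.
    ALTER DATABASE [SharePoint_Config]
        SET PARTNER = 'TCP://sql-dc2.contoso.com:5022';

    -- Step 6, on the principal (sql-dc2): point it at the mirror.
    ALTER DATABASE [SharePoint_Config]
        SET PARTNER = 'TCP://sql-dc1.contoso.com:5022';

    -- Step 7, on the principal (sql-dc2): add the witness back.
    ALTER DATABASE [SharePoint_Config]
        SET WITNESS = 'TCP://witness.contoso.com:5022';
    ```

    The order of steps 5 and 6 matters: the mirror must be told about its partner before the principal is.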

  • What happens if I lose connection between the primary and the secondary data centers?
    Assume both DCs are still working but the connection between them is broken. The SQL instance that can still reach the witness keeps (or takes over) the principal role and continues serving data. The instance that has lost contact with both its partner and the witness stops serving data, and the SharePoint servers in that DC stop working.

  • Can SQL mirroring work with a SQL cluster?
    Mirroring can work between two SQL clusters, and between a SQL cluster and a stand-alone SQL instance. Mirroring, however, does not work within a single SQL cluster.

  • What happens when my principal cluster fails over?
    A SQL cluster failover usually takes longer than the mirroring partner timeout (10 seconds by default). Therefore, the mirroring roles are switched when the principal cluster fails over. You can fail back to the original cluster after the SQL cluster failover is complete.
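
    The role switch happens because the partner timeout expires before the cluster finishes failing over. You can inspect that timeout, and, if you would rather let the cluster fail over without switching mirroring roles, raise it. A sketch, assuming the hypothetical database name SharePoint_Config; the value 90 is only an example, not a recommendation, and a longer timeout also delays a genuine mirroring failover:

    ```sql
    -- Current partner timeout (in seconds) for each mirrored database.
    SELECT DB_NAME(database_id) AS database_name,
           mirroring_connection_timeout
    FROM sys.database_mirroring
    WHERE mirroring_guid IS NOT NULL;

    -- Optional, run on the principal: give a cluster failover time to
    -- finish before mirroring treats the principal as lost.
    ALTER DATABASE [SharePoint_Config] SET PARTNER TIMEOUT 90;
    ```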

  • Does stretch farm affect SharePoint performance?
    According to my limited performance tests, no.

  • How do I set up failover for SharePoint features that have no UI for specifying a failover instance?
    You can run the following PowerShell commands in the SharePoint Management Shell on a SharePoint server:

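    # Find the SharePoint database by name.
    $db = Get-SPDatabase | Where-Object { $_.Name -eq "<database name>" }

    # Register the mirror SQL instance as this database's failover server.
    $db.AddFailoverServiceInstance("<failover SQL instance name>")

    # Save the change to the farm configuration.
    $db.Update()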

  • Does a stretch farm protect all SharePoint functionality?
    No. It does not protect functionality that depends on anything beyond the 14 hive and the databases. Examples of unprotected dependencies are file shares and external data sources. Some third-party SharePoint solutions without failover capabilities are not protected either; contact your vendors for more information.

Zewei Song, Ph.D.
MCPD, MCITP, MCTS: SharePoint 2010, .NET 3.5
Enterprise Services, Microsoft Corporation