Introduction to Running Disaster-Recovery/Hot-Standby SharePoint Farms

So you’ve heard about the possibility of running two SharePoint farms in parallel to help keep SharePoint users happily online longer, and you want more information about how it works and why you’d want it. This is a nice, quick post covering just the basics to sell the idea a bit - nothing too complicated.

First of all, why would you want to do this? That’s easy to answer: so you can switch users between farms should you need to, for any reason (see below for some reasons you might want to).

With SharePoint hot-standby/DR you can move users from one farm to the other relatively quickly, depending on the method you use to fail over (see below). While one farm is serving users, its content database(s) are in normal read/write mode, whereas the redundant farm just receives updates while in read-only (R/O) mode.

This scenario is generally called “SharePoint Disaster Recovery (DR)” or “hot-standby SharePoint farms”. These days, with SQL Server AlwaysOn, the primary & secondary farms can swap roles back & forth trivially; before that, with SQL Server log-shipping, failing over to the secondary was a one-way ticket unless the databases were fully backed up & restored on the old primary again.
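To make that role swap a bit more concrete, here’s a toy Python model of it. The Farm class and fail_over function are hypothetical names for illustration only, not SharePoint or SQL Server APIs: failing over flips which farm serves users with read/write databases, and the sync method decides whether the old primary can go straight back to being a read-only standby.

```python
from dataclasses import dataclass

# Toy model for illustration only: Farm and fail_over are hypothetical
# names, not SharePoint or SQL Server APIs.


@dataclass
class Farm:
    name: str
    serving_users: bool
    db_mode: str  # "read-write" (active), "read-only" (standby) or a re-seed note


def fail_over(active: Farm, standby: Farm, sync_method: str) -> tuple[Farm, Farm]:
    """Swap which farm serves users and return (new_active, new_standby)."""
    if sync_method not in ("alwayson", "log-shipping"):
        raise ValueError(f"unknown sync method: {sync_method}")
    # The standby's databases come online read/write and it starts serving users.
    new_active = Farm(standby.name, serving_users=True, db_mode="read-write")
    # With AlwaysOn the old primary simply becomes the new read-only secondary,
    # so you can fail back later. With log-shipping it must be fully backed up
    # & restored before it can receive updates again (the "one-way ticket").
    if sync_method == "alwayson":
        old_primary_mode = "read-only"
    else:
        old_primary_mode = "needs full backup & restore"
    new_standby = Farm(active.name, serving_users=False, db_mode=old_primary_mode)
    return new_active, new_standby


farm_a = Farm("Farm A", serving_users=True, db_mode="read-write")
farm_b = Farm("Farm B", serving_users=False, db_mode="read-only")
active, standby = fail_over(farm_a, farm_b, "alwayson")
print(active.name, active.db_mode)    # Farm B read-write
print(standby.name, standby.db_mode)  # Farm A read-only
# Failing back is just the same call with the roles reversed.
active, standby = fail_over(active, standby, "alwayson")
print(active.name, active.db_mode)    # Farm A read-write
```

Swap "alwayson" for "log-shipping" and the old primary ends up flagged for a full backup & restore instead, which is exactly why log-shipping failovers used to be one-way.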

Anyway, SharePoint DR farms keep users online much longer than single farms. Read on to find out why…

Why Would I Want to Be Able to Switch Farms?

You may want to fail over users to the other farm for all sorts of reasons, planned and unplanned. Here are some examples:

Planned failovers:

  • SharePoint patching (with no downtime). This is a big one given how complicated the patch process can be, and at some point you’ll have to patch SharePoint if only to stay supported.
  • Windows OS patching. Patches come out monthly and normally need a reboot, which may or may not impact the whole farm, so you may want to move users to the other farm proactively, just in case.
  • Unexpected behaviour on a particular farm (not necessarily an error): if it only happens on one farm, fail over to the other while it’s investigated.

Unplanned failovers:

  • Outages.
  • More outages.
  • It’s kinda hard to give a list of things that are unexpected. Just imagine this list is huge though and you’re starting to think the right way.

High Availability SharePoint

For any of the above reasons, it’s really nice to have a second SharePoint farm on hot standby, ready to take over serving users from the first, especially for heavily-used production environments. When an unforeseen disaster hits a SharePoint production environment, it can often be side-stepped easily if the setup in this article is implemented; imagine bosses shouting during a SharePoint outage if you want real inspiration for getting this set up.

A lot comes down to how much downtime is acceptable to the business in question. If a day or two of downtime is acceptable (it rarely is) then you won’t need this, but if bosses are insisting on 100% uptime (or as close as possible) then this is your safest way of getting there.
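To put numbers on “as close as possible to 100%”, here’s a quick Python calculation of the downtime budget an availability target allows over a year (the function name is just for illustration). A 99.9% target allows about 526 minutes (under nine hours) a year, so a single day offline (1,440 minutes) already blows that budget nearly three times over.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Minutes of downtime an availability target permits over the period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is the downtime budget.
    return total_minutes * (1 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

Every extra “nine” cuts the budget tenfold, which is where a hot-standby farm really earns its keep.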

How Can You Switch Users between SharePoint Farms?

There are various tricks to do this magical, transparent switch:

  • DNS update of the A-record for the SharePoint URL.
  • Network-load-balancer (NLB) reconfiguration.
  • Reverse-proxy reconfiguration.

DNS updates are the simplest failover technique; you just point the host name at the other farm’s web-front-end/network-load-balancer. The downside is a delay before clients notice the change, depending on the DNS “time-to-live” (TTL) of the record in question.
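Here’s a small Python sketch of why the TTL matters. The helper names are hypothetical, and it assumes every client and resolver honours the record’s TTL exactly, which real-world caches don’t always do. A client that resolved the name just before you changed the A-record can keep using the old farm for up to one full TTL afterwards.

```python
from datetime import datetime, timedelta

# Hypothetical helpers for illustration. Assumes every client and resolver
# honours the record's TTL exactly; real-world caches don't always.


def worst_case_cutover(change_time: datetime, ttl_seconds: int) -> datetime:
    """Latest time any client might still use the old A-record."""
    # A client that resolved the name an instant before the change can keep
    # the old answer for up to one full TTL afterwards.
    return change_time + timedelta(seconds=ttl_seconds)


def clients_on_old_farm(lookups: dict[str, datetime], change_time: datetime,
                        ttl_seconds: int, now: datetime) -> list[str]:
    """Clients whose cached answer predates the change and hasn't expired yet."""
    stale = []
    for client, looked_up in lookups.items():
        expires = looked_up + timedelta(seconds=ttl_seconds)
        if looked_up < change_time and now < expires:
            stale.append(client)
    return sorted(stale)


change = datetime(2015, 6, 1, 12, 0, 0)
lookups = {
    "pc-1": change - timedelta(seconds=100),  # cached 100 s before the change
    "pc-2": change - timedelta(seconds=400),  # cached answer expired before the change
    "pc-3": change + timedelta(seconds=5),    # resolved after the change
}
print(worst_case_cutover(change, ttl_seconds=300))  # 2015-06-01 12:05:00
print(clients_on_old_farm(lookups, change, 300, now=change + timedelta(seconds=60)))  # ['pc-1']
```

This is also why, for planned failovers, the record’s TTL is often lowered well ahead of the switch, so the stale window is short when you actually flip it.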


NLB reconfiguration is nicer because the change is instant, but it may not always be possible. I’d recommend it for most setups.

Reverse-proxies are extra nice because failing over is just a change in where requests are sent behind the scenes, but they take quite a bit of setup to get working.

I may expand on this subject if there’s interest; let me know in the comments if there is.

Does Microsoft Support Hot Standby SharePoint Farms?

Absolutely, but don’t just take my word for it; after all, this is just a blog & not official documentation.

From TechNet there’s “Choose a disaster recovery strategy for SharePoint 2013”, which states:

“In a hot standby disaster recovery scenario, you set up a failover farm in the standby data center so that it can assume production operations almost immediately after the primary farm goes offline.”

Specifically the article mentions that hot-standby farms need/have:

“A separate configuration database and the SharePoint Central Administration website content database must be maintained on the failover farm.”

…and…

“You can copy SharePoint products content databases to the failover farm by using asynchronous mirroring, asynchronous commit on an availability group replica, or log-shipping.”

There’s some info in there about what you need to have set up to do this; basically, both farms need identical patch levels & SharePoint solutions if the failover is to work (see this post on SharePoint DR with custom solutions for more info).
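Because a mismatch here only bites at the worst possible moment, it’s worth checking parity before you need to fail over. Below is a minimal Python sketch, assuming you’ve already gathered each farm’s build version and its deployed solutions (with versions) by whatever means you normally use; the FarmInventory class, the parity_problems function and the example solution name are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical inventory record for illustration; populate it however you
# normally gather a farm's build version and deployed solutions.


@dataclass
class FarmInventory:
    name: str
    build_version: str
    solutions: dict[str, str] = field(default_factory=dict)  # solution -> version


def parity_problems(primary: FarmInventory, standby: FarmInventory) -> list[str]:
    """List every difference that could break a failover; empty means in step."""
    problems = []
    if primary.build_version != standby.build_version:
        problems.append(
            f"build mismatch: {primary.name}={primary.build_version}, "
            f"{standby.name}={standby.build_version}"
        )
    # Check the union of solution names so gaps on either farm are caught.
    for solution in sorted(primary.solutions.keys() | standby.solutions.keys()):
        on_primary = primary.solutions.get(solution)
        on_standby = standby.solutions.get(solution)
        if on_primary is None:
            problems.append(f"{solution} missing on {primary.name}")
        elif on_standby is None:
            problems.append(f"{solution} missing on {standby.name}")
        elif on_primary != on_standby:
            problems.append(f"{solution} differs: {on_primary} vs {on_standby}")
    return problems


farm_a = FarmInventory("Farm A", "build-X", {"Contoso.Intranet.wsp": "1.2"})
farm_b = FarmInventory("Farm B", "build-X", {"Contoso.Intranet.wsp": "1.1"})
for problem in parity_problems(farm_a, farm_b):
    print(problem)
```

Running something like this as part of your patching & deployment routine means a failover doesn’t fall over on a forgotten solution deployment.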

The TechNet article also talks about replicating service-applications between SharePoint farms, which is something I’d be less inclined to do: if you have a service-application problem on one farm, replication can carry it over to the other, which by definition dilutes the ability to fail over to side-step unexpected SharePoint behaviour. The only (important) exception is Managed Metadata, which, if referenced in the content-databases, should also be synchronised between farms the same way as the content-databases; that’s also perfectly supported.

Finally, as documented here, SQL Server AlwaysOn works nicely with SharePoint 2013 for content-database synchronisation in both synchronous & asynchronous commit modes.

Just SQL Server AlwaysOn with SharePoint Databases?

Some enterprises avoid AlwaysOn because $reason; often because it’s “relatively new” (it is, after all, “only” 3 years old in production SQL Server code, across two major versions of the product) and not internally approved yet, or whatever.

Skipping to the end of this tiresome debate: what other database-syncing options are there?

  • SQL Server log-shipping (blogged about here).
  • SQL Server mirroring (not blogged about because the technology is old & likely to be retired in the next versions of SQL Server).

As this TechNet article points out, both alternatives are also supported for SharePoint 2013. Both are technically inferior to AlwaysOn for various reasons, but still possible & supported to use.

Wrap-up

That’s it for now. I hope that’s given you an idea of why this is a good idea & how it can work. If you want to set it up, check out my guide on how to set up SharePoint DR with SQL Server AlwaysOn here.


Cheers,

// Sam Betts