Stretched SharePoint Farms vs. Disaster Recovery SharePoint Farms

Something I want to clarify is the difference between these two types of “high-availability” SharePoint setups. The difference is actually fairly simple, but people often get the two confused, so I thought I’d write something quick on the subject. The two designs are deliberately not the same thing at all.

In short, stretched farms increase the availability of a single SharePoint farm against endpoint & network failures. Disaster Recovery (DR) farms, on the other hand, are an independent copy of the primary farm, ready to take over in case something bad happens on the primary site that impacts users. Because the DR site isn’t the same farm as the primary, anything that breaks on the primary site shouldn’t impact the DR site, by design.

Stretched farms are for high-availability, but disaster-recovery farms are, well, about disaster-recovery. That isn’t exactly the same thing, even if the desired result is the same: to keep SharePoint users happily using SharePoint even with fatal failures in the SharePoint farm.

Stretched SharePoint Farms

A stretched farm is basically a single farm that exists in two separate locations, pretty much exactly like a multi-subnet farm, but presumably with service-application redundancy built in on both subnets so that any single subnet can function on its own. This implies a multi-subnet SQL cluster of some kind too (or SQL mirroring at least); otherwise there wouldn’t be much point in having your SharePoint farm stretched across X subnets. This diagram shows why:

image

In short, taking out a subnet won’t take down the farm if you’ve spread out enough redundancy. It is still one logical farm though, with just the single configuration database, so all the servers in both sites make up the single SPFarm.

That said, because we’re talking about just one farm, if a service-application dies we’re basically out of luck, as we would be on any single farm, stretched or otherwise:

image

Stretched farms are still just a single farm. That’s why, although it’s great to have a stretched farm when possible, in my opinion the real investment value comes from having an entirely separate farm to fail over to, running alongside your primary.
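To make the point above concrete, here’s a toy model in Python. It isn’t SharePoint code and doesn’t use any SharePoint API; the class name, the subnet names and the single “search” service-application are all made up for illustration. It assumes the farm stays up as long as at least one subnet is reachable, but that a broken service-application is broken for the whole farm because there is only one logical instance of it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StretchedFarm:
    """Hypothetical model of ONE farm spread over several subnets."""

    # Subnet name -> is that subnet currently reachable?
    subnets: Dict[str, bool]
    # Service-application name -> is it healthy? These are farm-wide:
    # one logical instance per farm, not one per subnet.
    service_apps: Dict[str, bool]

    def is_available(self) -> bool:
        # Assumption: every subnet has enough redundancy to run alone,
        # so a single reachable subnet is enough to serve users.
        any_subnet_up = any(self.subnets.values())
        all_apps_healthy = all(self.service_apps.values())
        return any_subnet_up and all_apps_healthy


farm = StretchedFarm(
    subnets={"site-a": True, "site-b": True},
    service_apps={"search": True},
)

# Losing a whole subnet: the other subnet carries on.
farm.subnets["site-a"] = False
survives_subnet_loss = farm.is_available()

# Subnet back, but the (single) search application is broken:
# no amount of stretching helps here.
farm.subnets["site-a"] = True
farm.service_apps["search"] = False
survives_app_failure = farm.is_available()
```

In this sketch `survives_subnet_loss` comes out true and `survives_app_failure` comes out false, which is the whole argument for a stretched farm and against relying on it alone.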

Disaster Recovery SharePoint Farms

DR farms, on the other hand, are different in design and purpose. First of all, they’re logical copies of a primary farm (not literal copies or backups of it): they run their own instances of the same service-apps as the primary site, but serve copies of the content-databases that are regularly shipped over from the primary site. That means if a bad upgrade kills the search application on the primary site, we can just fail over to the DR site, because the DR site has its own search application with completely separate and independent databases etc. This is a key reason you’d have a DR site. Having a stretched farm wouldn’t help you at all here.

Here’s how a DR + primary farm looks:

image

Here we see everything working as intended, with content-updates arriving at the DR site but everything else being a separate instance. For how to set this up, see the blog-post linked at the end of this article.

Any failure on the primary site can be mitigated by failing over to the secondary site:

image

This gives a lot more breathing room to figure out what’s going on with the primary site should there be any issues. It only works because we’re not mirroring everything between the two sides though; each site keeps its own farm, and the only thing the DR site receives is a copy of the content. Sure, if a huge problem with a content database happens we’re out of luck just the same, but we’ve at least hugely reduced the failure points for everything else. I covered setting up a SharePoint DR farm at https://blogs.msdn.com/b/sambetts/archive/2013/10/11/hot-standby-disaster-recovery-sharepoint-farms-basic-setup-amp-failover-high-availability-sharepoint.aspx if you're interested in how to do it.
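The same kind of toy model works for the primary + DR pair. Again, this is only an illustrative Python sketch under my own simplifying assumptions: `Farm`, `ship_content` and `farm_to_serve_users` are hypothetical names, not SharePoint or SQL Server APIs, and “shipping” is reduced to copying a single healthy/unhealthy flag for the content. Each farm has its own service-apps, and only the content state travels from primary to DR.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Farm:
    """Hypothetical model of one independent farm."""

    name: str
    # Each farm runs its OWN service-application instances.
    service_apps: Dict[str, bool]
    # Simplification: content is either fine or badly broken.
    content_ok: bool = True

    def is_healthy(self) -> bool:
        return self.content_ok and all(self.service_apps.values())


def ship_content(primary: Farm, dr: Farm) -> None:
    # Only content travels to the DR site, so content problems travel
    # with it. Service-application state is never copied across.
    dr.content_ok = primary.content_ok


def farm_to_serve_users(primary: Farm, dr: Farm) -> Optional[str]:
    # Prefer the primary; fail over to DR; None means nobody can serve.
    for farm in (primary, dr):
        if farm.is_healthy():
            return farm.name
    return None


primary = Farm("primary", {"search": True})
dr = Farm("dr", {"search": True})

# A bad upgrade kills search on the primary only.
primary.service_apps["search"] = False
ship_content(primary, dr)
after_bad_upgrade = farm_to_serve_users(primary, dr)

# Search is repaired, but now a content database is badly damaged
# and the damage gets shipped to the DR site too.
primary.service_apps["search"] = True
primary.content_ok = False
ship_content(primary, dr)
after_content_problem = farm_to_serve_users(primary, dr)
```

With these assumptions, `after_bad_upgrade` is `"dr"` (the DR farm’s own search application is untouched), while `after_content_problem` is `None`, which is the “out of luck just the same” case for content.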

And that’s it! I hope that’s helped someone at least clarify the ups & downs of each strategy.

Cheers,

// Sam Betts