SharePoint 2013 on SQL Server AlwaysOn, 2014 Edition

SQL Server 2014 is out bringing with it enhanced AlwaysOn clustering, and is indeed already supported by SharePoint 2013 with at least the SharePoint 2013 May 2014 cumulative update installed. Some new benefits in SQL Server 2014 are the reliability improvements for when connectivity is lost between nodes and also the ability to add a SQL Server instance hosted in Azure into the replica set for the ultimate off-site backup of your farms’ databases. Both of these upgrades are excellent news if you have to keep SharePoint safe; SQL Server is now more resilient than ever against failure and you can leverage that for SharePoint too.

Edit: if you want to use AlwaysOn for SharePoint Disaster Recovery, then you'll want to have a look at this article - https://blogs.msdn.com/b/sambetts/archive/2015/04/24/setting-up-sharepoint-disaster-recovery-sites-with-sql-alwayson.aspx

Either way this new version of SQL Server is a good excuse to streamline my previous article on the subject as well as introducing the new features. Please bear in mind, this guide is not for multi-subnet SQL clusters as SharePoint has issues using them by default, even though it is possible.

Hosting Applications on AlwaysOn or Entire Farm?

It’s important to point out that there’s two ways of using SQL Server AlwaysOn; entire farm, or just the applications for applications shared between X farms via the replication that this technology uses, in a similar way to how SQL Server log-shipping works. Both high-availability strategies have pros & cons but for this guide we’ll do the entire single farm with all databases on the AlwaysOn cluster as that gives the most reliability.

SharePoint 2013 on AlwaysOn Setup Setups

This has been covered in greater detail before but just for convenience, here’s a quick & compact checklist for setting up an entire farm on AlwaysOn. As before, the process is basically the same:

  • Create farm
    • Prepare SharePoint 2013 with SP1.
    • Create SQL alias & point to single SQL node with cliconfg.exe.
    • Create new farm with all necessary databases & service-apps via SQL alias.
  • Prepare SharePoint databases for AlwaysOn
    • Change recovery mode to “full” for databases that have simple recovery only.
      • All search databases.
      • All user profile databases.
      • Usage database.
    • o Backup all SharePoint databases.
  • Prepare SQL cluster
    • Create new AlwaysOn availability group (after enabling SQL Server for AlwaysOn on each node).
      • Add replicating partners, including instances in Azure if needed.
    • Create listener.
  • Use created AlwaysOn cluster with SharePoint
    • Update SQL alias on each server to use listener.
    • Test SQL failover with SharePoint.

Setup Steps with Screenshots

Once you’ve created all the databases that’ll be in the farm, the first thing we’ll need to do is convert any databases in “simple” recovery mode to “full” recovery as AlwaysOn doesn’t work.

clip_image002

All the search & the user-profile databases will need converting. All databases will need a full back-up too.

Next up; create the availability group. The wizard does a decent job of walking you through this.

clip_image004

Any problems with any databases will be shown here; it’s probably worth fixing anything before continuing. Now to pick end-points for the replication sets:

clip_image006

For the ultimate offsite backup, you can store databases in Azure (SQL Server instances hosted in Azure, not SQL Azure which is something else) just like any other instance.

Assuming you’re using x2+ standalone instances (as opposed to 2+ failover-cluster instances), you’ll be able to do a full initial sync with the wizard. In the guide before that wasn’t possible because each instance was itself a failover cluster, which makes this stage somewhat more hassle than normal even if it is a more solid design as it guaranteed each SQL instance uptime.

clip_image008

Click next & start validation…

clip_image010

You’re now ready to start the initial sync!

Once done, if it all went OK you should see this rather happy sight:

clip_image012

Remember, by default SQL Server uses port 5022 for the replication needed (configurable earlier in the wizard for each endpoint) – check firewalls allow this extra port. Also for consideration are the permissions – if SQL Server uses default accounts (not recommended) then each AD machine accounts will need to be added to the security of the other server, which isn’t easy considering the GUI doesn’t allow you to select AD objects of type “machine”. If the SQL Server service is set to use an AD user login and each node is using the same one then you should be set already.

Setup Listener and Reconfigure SharePoint

All we’ve done so far is setup an advanced mirroring system for the databases, basically. To fully use the power of SQL Server AlwaysOn we need an AlwaysOn “listener” too – this is basically an endpoint or address to connect to that transparently redirects the communication to the active SQL Server node, the point being you don’t have to care about what node is active.

In SharePoint land, this is relatively easy thanks to our SQL alias we’ve got setup that already does client-side aliasing for us; we just need to update this to point to the AlwaysOn listener instead of the SQL Server instance name (sql-n1) and we’re good to go.

Run “cliconfg.exe” again & once IIS & SPTimerV4 (SharePoint Timer service) has been stopped, update the alias, save and restart the services again. You should be good to go; the data’s exactly the same and SharePoint won’t even realise it’s using a different “endpoint”, but this is critical to make sure SharePoint can handle a cluster failover automatically.

    

Testing a Failover

As ever with high-availability setups, it’s difficult to be 100% sure of what’ll happen on a failover until it happens, such is the nature of these complex systems. That’s why manually failing over on a fairly regular basis is a good idea to do; to prove you still can should the time come that you really need to failover.

If you’ve followed this guide so far and done nothing else then when you failover to the other SQL instance, although the databases will be fine, SharePoint will die horribly as all SharePoint’s connections are denied to this new SQL Server, which makes sense given we’re only synchronising databases and not logins too.

clip_image002[1]

Lot’s of errors about logins failing from the timer service and IIS.

Create the logins on the other SQL instances too; just on the server itself; no need to grant the users access to the databases as the permissions are already added by virtue of the database being replicated, but the server itself needs to know about the SharePoint service-accounts before SharePoint will work with this 2nd instance.

clip_image003[1]

These users need to be created on every SQL Server instance that SharePoint may need to use, if you want SharePoint to survive a failover that is.

Failover Health Checks

If SQL Server has had a failover, there’s always a period when the client application (SharePoint) will lose its connectivity to SQL as everything’s automatically switched to a new SQL instance. This is pretty much inevitable to some extent at least but it’s good to know how to make sure everything’s failed-over correctly, especially on multi-subnet clusters where there may be multiple DNS servers. Failovers are complicated thing and some basic checks are always recommended just to make sure everything worked ok.

  • DNS update – the name of the listener will have a new IP address after a failover.
    • Check for new A-record with “nslookup [listener_name] ” replacing “ [listener_name] ” with the name of the AlwaysOn listener. You should see the new active nodes’ IP address as the value of the “A” record

clip_image004

    • You may need to flush DNS cache with “ipconfig /flushdns” so the SharePoint machine picks up the new A-record.
    • Restarting IIS/SPTimerV4 can help SharePoint pick-up the new address.
  • Database health:

    • Make sure the right primary node thinks it’s the primary – refresh the availability group to check.
    • Also make sure each database says “Synchronized” on the new active node too (it should say the same on the passive node(s) too).

clip_image005

  • SQL Server instance health – get the (now passive) node back in the group & re-synchronised again – verify that happens.
    • If for some reason the passive node can’t join, it may be necessary to remove the server from the availability group and re-add it.

Performing a planned failover is highly recommend on a regular basis, during a maintenance window (as it always causes disruption). It’s the only way to be 100% sure that the failover will be successful, which is the entire point of doing this. The smallest misconfiguration can cause the cluster to fail and it’s only by proactively testing it for real can we be sure it’ll work when it counts – when a fatal error occurs and the whole system needs to cope.

Wrapping Up

Never in the history of databases has it been so easy to setup a cluster that’s so reliable before, which is one brilliant thing about SQL Server. The new 2014 version improves this reliability even more-so; let’s use it to make SharePoint more reliable too!

Cheers,

// Sam Betts