This article aims to help anyone creating a Disaster Recovery (DR) design or strategy for SharePoint Server 2010. The advice is based on my experience of designing a DR model and on conversations with experts such as Spencer Harbar and people in Microsoft IT and SharePoint Online.
Step 1. Do Some Research
Read articles like this. It will only take you 10 minutes but should provide a good background to DR in SharePoint 2010.
Step 2. Define What DR Means
Agree with your stakeholders what is meant by "DR". For example, does it mean making all content databases available when an entire server farm goes down? If so, make this explicit and get people to agree to it. Call out things like one WFE server dying as NOT DR, but instead something else such as a "critical failure". You then know what you’re actually trying to design for.
Step 3. Define Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
RPO refers to the acceptable amount of data loss, measured in time. As an example, your customer may want "to only lose the last 1 hour's worth of data in the event of a disaster".
RTO refers to the duration of time and the service level within which a service must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity. As an example, a customer may ask that "the service must be up and running within the DR environment within 8 hours".
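To make the two targets concrete, here is a minimal Python sketch of how you might check a DR test against them. The 1 hour and 8 hour values come from the examples above; the function names and the timeline are made up purely for illustration.

```python
from datetime import datetime, timedelta

# Targets taken from the examples in the text; replace with what you agree.
RPO = timedelta(hours=1)
RTO = timedelta(hours=8)


def data_loss_window(last_replicated: datetime, disaster_at: datetime) -> timedelta:
    """Span of changes that never reached the DR farm."""
    return disaster_at - last_replicated


def meets_rpo(last_replicated: datetime, disaster_at: datetime,
              rpo: timedelta = RPO) -> bool:
    """True if the data lost is within the agreed RPO."""
    return data_loss_window(last_replicated, disaster_at) <= rpo


def meets_rto(disaster_at: datetime, restored_at: datetime,
              rto: timedelta = RTO) -> bool:
    """True if the service came back within the agreed RTO."""
    return restored_at - disaster_at <= rto


# Hypothetical timeline from a DR rehearsal.
disaster = datetime(2010, 6, 1, 12, 0)
last_log_restored = datetime(2010, 6, 1, 11, 15)   # 45 minutes of data lost
service_restored = datetime(2010, 6, 1, 19, 30)    # 7.5 hours of downtime

print(meets_rpo(last_log_restored, disaster))
print(meets_rto(disaster, service_restored))
```

Both checks pass for this timeline. The point is simply that RPO is measured backwards from the disaster and RTO forwards from it.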
Step 4. Prioritise Your SharePoint Data in DR Terms
This is where it gets interesting. You should be able to download this spreadsheet I created. Within the spreadsheet I've tried to capture all of the different types of content that would exist within a SharePoint Server 2010 farm. You'll notice there are potentially a lot of different types of data to DR, and I'd wager you won't DR all of it. Therefore, I would advise categorising each type according to whether your customer deems it of High, Medium or Low importance. There is a column in the spreadsheet to allow you to enter this info. You will later use these ratings to decide what to DR. They may also influence your logical and physical drive designs (i.e. you may split content out depending on whether it's going to be recovered or not).
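If it helps to see the idea outside a spreadsheet, the categorisation can be sketched in a few lines of Python. The inventory entries and the priorities assigned to them are illustrative assumptions, not the contents of my spreadsheet and not a recommendation.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical inventory: the priorities are examples only and should
# come from your customer, not from this sketch.
inventory = {
    "Content databases": Priority.HIGH,
    "User Profile data": Priority.MEDIUM,
    "Search index": Priority.LOW,
    "Usage and logging data": Priority.LOW,
}


def in_scope(items: dict, minimum: Priority = Priority.MEDIUM) -> list:
    """Names of the data types rated at or above the chosen priority."""
    return sorted(name for name, rating in items.items() if rating >= minimum)


print(in_scope(inventory))
```

Raising or lowering `minimum` is a quick way to show stakeholders what falls in and out of scope at each cut-off.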
Step 5. Work Out the Size of Your Data
The spreadsheet I created also consolidates this list of SharePoint 2010 databases and sizes (an excellent article). I'd advise modifying my spreadsheet in line with what your customer will have. For example, the size of the content databases will obviously vary between customers, which in turn will affect the size of the transaction logs. The same goes for things like User Profiles.
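As a rough sketch of the sums involved, the Python below totals the data you plan to DR and estimates how much of it changes per day. Every figure, including the 2% daily change rate, is a placeholder I have made up; measure your own environment rather than trusting these numbers.

```python
# Placeholder sizes in GB; replace with figures from your own farm.
estimated_size_gb = {
    "Content databases": 400.0,
    "User Profile data": 10.0,
    "Search index": 80.0,
}

# Hypothetical list of the data types chosen for DR.
selected_for_dr = ["Content databases", "User Profile data"]


def total_size_gb(sizes: dict, names: list) -> float:
    """Total size of the data types selected for DR."""
    return sum(sizes[name] for name in names)


def daily_change_gb(total_gb: float, daily_change_rate: float) -> float:
    """Very rough estimate of the data changed, and so replicated, per day."""
    return total_gb * daily_change_rate


total = total_size_gb(estimated_size_gb, selected_for_dr)
print(f"Total in scope: {total:.1f} GB")
print(f"Estimated daily change: {daily_change_gb(total, 0.02):.1f} GB")
```

The daily change figure matters more than the total once the initial copy is in place, because it drives how much has to cross the network each day.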
Step 6. Work Out the Bandwidth and Latency of Your Network Connections
As articles such as this note, the typical options for DR are log shipping and asynchronous mirroring, both of which require data to be sent over the network from your "live farm" to your "DR farm". So it's a good idea to work out both the bandwidth and the latency of your network connections. This will inform decisions on what data you can actually send, and how often.
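A back-of-the-envelope calculation shows why latency matters as well as bandwidth. The Python sketch below assumes a single TCP stream whose throughput cannot exceed the window size divided by the round-trip time; the 64 KB window, the 80% link efficiency and the example figures are all assumptions chosen for illustration, and real transfers will behave differently.

```python
def tcp_throughput_ceiling_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: window size / round-trip time."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000


def transfer_minutes(size_gb: float, link_mbps: float,
                     window_bytes: int = 65535, rtt_ms: float = 40.0,
                     efficiency: float = 0.8) -> float:
    """Estimated minutes to copy size_gb (decimal GB) between the farms.

    The usable rate is the lower of the link rate (after an assumed
    efficiency factor) and the latency-bound TCP ceiling.
    """
    usable_mbps = min(link_mbps * efficiency,
                      tcp_throughput_ceiling_mbps(window_bytes, rtt_ms))
    size_megabits = size_gb * 8 * 1000
    return size_megabits / usable_mbps / 60


# Hypothetical case: 2 GB of log backups over a 100 Mbps link.
print(f"{transfer_minutes(2.0, 100.0, rtt_ms=40.0):.1f} minutes at 40 ms RTT")
print(f"{transfer_minutes(2.0, 100.0, rtt_ms=1.0):.1f} minutes at 1 ms RTT")
```

With these assumed numbers the same link is several times slower at 40 ms than at 1 ms, which is why a link that looks big enough on paper may still not meet your RPO.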
Step 7. Realise What's Supported and Not Supported in Terms of Data Replication
My spreadsheet tries to show what the Microsoft stance is on replicating SharePoint Server 2010 data. You'll notice that for some databases log shipping is supported but asynchronous mirroring isn't (and vice versa). Therefore, you probably can't just use one replication technology. Worth keeping this in mind.
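One way to work with a support matrix like this is to pick, per database, the first supported option from a preferred order. The Python below is only a sketch: the database names are deliberately generic and the True/False values are placeholders, NOT Microsoft's actual support statements, so fill them in from the spreadsheet or the official guidance.

```python
# Placeholder support matrix: values are illustrative only and do not
# reflect Microsoft's real support position for any database.
SUPPORT = {
    "database_a": {"log_shipping": True, "async_mirroring": True},
    "database_b": {"log_shipping": True, "async_mirroring": False},
    "database_c": {"log_shipping": False, "async_mirroring": True},
    "database_d": {"log_shipping": False, "async_mirroring": False},
}


def pick_method(database: str,
                preferred: tuple = ("log_shipping", "async_mirroring")):
    """First supported replication method, or None if nothing is supported."""
    for method in preferred:
        if SUPPORT[database][method]:
            return method
    return None


plan = {database: pick_method(database) for database in SUPPORT}
for database, method in plan.items():
    # None means the data must be rebuilt or re-provisioned on the DR farm.
    print(database, "->", method or "rebuild on DR farm")
```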
Step 8. Learn from MSIT, SharePoint Online and Others
Before finalising a DR strategy it's probably worth taking a minute to ask "Well, what do Microsoft do then?". It doesn't mean you have to copy what they do, as it's not going to be applicable in every scenario, but it should be a useful reference point. After speaking to some guys in SharePoint Online and Microsoft IT, I learned that:
· Both have a "live farm" and "DR farm"
· Both only send Content Databases over the wire from the "live farm" to the "DR farm"
· Both use DFSR in combination with log shipping to send data over the wire. This reduces the size of files sent and gives flexibility in terms of when data is replicated
· PowerShell scripts are used to automate configuration of Service Applications. This means that the "live farm" and "DR farm" always match in terms of configuration. It also avoids sending service application data over the network
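The last point relies on both farms being configured from the same scripts. As a loose illustration of the underlying idea (not of how MSIT or SharePoint Online actually do it, and in Python rather than PowerShell), the sketch below compares two hypothetical configuration snapshots and reports any drift. The setting names and values are invented.

```python
def config_drift(live: dict, dr: dict) -> dict:
    """Settings whose values differ between the two farms.

    Returns {setting: (live_value, dr_value)}; a setting missing from
    one farm shows up with None on that side.
    """
    keys = live.keys() | dr.keys()
    return {
        key: (live.get(key), dr.get(key))
        for key in sorted(keys)
        if live.get(key) != dr.get(key)
    }


# Hypothetical snapshots, e.g. exported by your configuration scripts.
live_farm = {
    "Managed Metadata Service": "provisioned",
    "User Profile Service": "provisioned",
    "Search Service": "provisioned",
}
dr_farm = {
    "Managed Metadata Service": "provisioned",
    "User Profile Service": "not provisioned",
}

for setting, (live_value, dr_value) in config_drift(live_farm, dr_farm).items():
    print(f"{setting}: live={live_value}, dr={dr_value}")
```

An empty result would suggest the two farms match for the settings you captured, which is the state the scripted approach is meant to maintain.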
Step 9. Decide What to DR and How to DR
This is where it should all come together. By now you should know what data is important, how big the data is likely to be, what bandwidth and latency your network connections offer and what replication options are available. You should now be able to start designing a DR model. An example model is below (it's also in a zip file attached to this post):
Step 10. Test, Deploy, Monitor and Re-design Accordingly
As we all know, the lazy option is to do the design, give it to someone else and then not worry about it. However, it's probably a better idea to test the DR model you're proposing and once it's deployed keep an eye on it and re-design where need be. As an example, if the content databases grow five times larger than you anticipated then you may want to re-evaluate things.
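A simple monitoring check can flag when reality has drifted too far from the design assumptions. The Python sketch below compares the planned and actual content database sizes; the 1.5x tolerance and the sizes are arbitrary assumptions, so pick thresholds that reflect your own RPO, RTO and network headroom.

```python
def growth_ratio(planned_gb: float, actual_gb: float) -> float:
    """How many times larger the data is than the design assumed."""
    return actual_gb / planned_gb


def needs_redesign(planned_gb: float, actual_gb: float,
                   tolerance: float = 1.5) -> bool:
    """True when growth exceeds the (assumed) tolerance of the design."""
    return growth_ratio(planned_gb, actual_gb) > tolerance


# Hypothetical figures: the design assumed 200 GB of content databases.
planned = 200.0
actual = 1000.0

print(f"Growth: {growth_ratio(planned, actual):.1f}x planned size")
print("Re-evaluate the DR design:", needs_redesign(planned, actual))
```

In the five-times-larger example from the text this check would fire long before the backlog of unreplicated data became a surprise.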
I hope the above provides some interesting food for thought!
This article was authored by:
Microsoft Consulting Services UK