Backup solutions for Exchange 2007... (2)

Continuing on from my previous blog on 'Backup solutions for Exchange 2007', in which I discussed some of the options for backing up standalone Mailbox role servers, I now need to move on to the options for the next design, which is as follows:

Design 2 - Standalone Mailbox Role Server with LCR enabled

...so just to reiterate " ..in the context of data protection I have to mention the importance of Service Level Agreements. Before we can even start designing a backup solution it is vital that we have a good understanding of what our recovery objectives are. We really need to try and pin down firstly whether Outlook is a critical application in terms of our ability to communicate via email (so can we use a dial-tone recovery?) and secondly whether the data held within our databases is critical to the business (so we need to plan for a standard database recovery?). If it is a yes to both then we need to understand how long our business can be without access to Outlook and our Exchange data. Ok so this is a very difficult exercise and if the business will not dictate this then we should be directing the business by coming up with a number ourselves and then obtaining their agreement on this. Once we've got a number, like (at its most basic) 4 hours to restore the full service including all data, then we have something to aim for... "

The main options are as follows:  (These are the same as with the previous design but with some minor changes specifically when considering the use of VSS.)

  1. Traditional streaming backup to tape
  2. Traditional streaming backup to disk and then to tape
  3. Snapshot backups based on the Volume Shadow Copy Service

Traditional streaming backup to tape

This is the standard form of backup solution that all Exchange administrators will be familiar with in some form or another.  It is made possible by the ESE API and is supported by NTBackup and the new System Center Data Protection Manager 2007 (DPM), as well as numerous third-party products from our partners.  There are a number of advantages and disadvantages to using traditional streaming backup to tape when you are using standalone mailbox role servers.  These are summarized as follows:

Advantages:

  • Mature technology with numerous options in terms of software and hardware
  • Can run backups against multiple storage groups concurrently (NTBackup would require multiple backup jobs to do this)
  • Generally simple to set up

Disadvantages:

  • Will impact the performance of the server during the course of the backup, so needs to be considered particularly by companies providing a 24 hour service
  • Need to be aware of its impact on IS Maintenance.  Your backup window should be staggered to avoid the IS Maintenance period
  • Can be relatively slow, particularly when compared to VSS snaps or streaming backup to disk
  • Can be relatively expensive in terms of the number of tapes that are required
  • Full backup each night is generally recommended to be able to meet most recovery objectives (alternative could be weekly fulls and daily differentials)
  • Often restricted to relatively small databases in order to meet our recovery SLAs

Traditional streaming backup to disk and then to tape

This is very similar to the above in terms of advantages and disadvantages; the differences are that the backup is going to be faster, and therefore its impact on performance and IS Maintenance will be minimized.  Also, if you need to restore your database from last night, it will most likely still be on disk, and therefore an offline restore to a new storage group becomes an option (making use of database portability). (*Be careful with public folders though, as these are not 'portable'.) A traditional restore from disk is also likely to be relatively fast, especially when compared to a restore from tape.  The pros and cons are as follows:

Advantages:

  • Mature technology with numerous options in terms of software and hardware
  • Can run backups against multiple storage groups concurrently (NTBackup would require multiple backup jobs to do this)
  • Generally simple to set up
  • Generally faster than streaming backups to tape**

Disadvantages:

  • Will impact the performance of the server during the course of the backup, so needs to be considered particularly by companies providing a 24 hour service
  • Need to be aware of its impact on IS Maintenance.  Your backup window should be staggered to avoid the IS Maintenance period
  • Can be relatively slow, particularly when compared to VSS snaps**
  • Can be relatively expensive in terms of the number of tapes that are required
  • Full backup each night is generally recommended to be able to meet most recovery objectives (alternative could be weekly fulls and daily differentials or even incrementals)
  • Often restricted to relatively small databases in order to meet our recovery SLAs
  • Requires additional disk space

**The speed of your backup and restore will be determined by a number of factors including network, tape device, RAID type, backup software etc. To give you an idea, MSIT used to use NTBackup to back up their Exchange 2003 data to disk and then tape and achieved the following:

  • "Individual backup throughput per storage group can be sustained at approximately 1.2 GB per minute
  • Total throughput can be sustained at approximately 4.8 GB per minute per Exchange virtual server with four concurrent backups running.
  • Restore rates can be achieved in the range of 2 GB per minute for a disk-to-disk-based restoration. This throughput is achievable once the disks being written to are not under any form of production load."

This information was taken from a 'Note on IT' article.
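
To relate those throughput figures back to a recovery objective, here is a rough Python sketch. It only divides database size by the sustained rates quoted above; the 200 GB database size is a made-up example, the 4 hour target is the illustrative number from the SLA discussion, and it ignores log replay, mount time and everything else that adds to a real recovery. Treat it as a back-of-the-envelope aid, not a sizing tool.

```python
# Sustained rates quoted above from MSIT's Exchange 2003 deployment.
BACKUP_GB_PER_MIN = 1.2   # per storage group, streaming backup to disk
RESTORE_GB_PER_MIN = 2.0  # disk-to-disk restore, target disks not under load


def transfer_minutes(size_gb, rate_gb_per_min):
    """Minutes needed to move size_gb at a sustained rate."""
    return size_gb / rate_gb_per_min


def max_size_for_sla(sla_hours, rate_gb_per_min):
    """Largest database (GB) that could be copied back within the SLA."""
    return sla_hours * 60 * rate_gb_per_min


if __name__ == "__main__":
    db_size_gb = 200  # hypothetical database size, not from the article
    sla_hours = 4     # the example recovery objective used earlier

    backup = transfer_minutes(db_size_gb, BACKUP_GB_PER_MIN)
    restore = transfer_minutes(db_size_gb, RESTORE_GB_PER_MIN)
    ceiling = max_size_for_sla(sla_hours, RESTORE_GB_PER_MIN)

    print(f"Backup of {db_size_gb} GB:  about {backup:.0f} minutes")
    print(f"Restore of {db_size_gb} GB: about {restore:.0f} minutes")
    print(f"Data restorable in {sla_hours} hours: about {ceiling:.0f} GB")
```

At these rates the data copy alone for a 4 hour objective tops out at roughly 480 GB, which is one reason streaming solutions tend to push you towards smaller databases.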

Snapshot backups based on the Volume Shadow Copy Service

The third option, which many administrators might not be so familiar with, is to take snapshot, 'point in time' backups of your Exchange data.  Snapshots are supported against the active copy of a storage group, although continuous replication does now enable us to offload snaps to the replica database.  In a deployment using LCR, VSS snaps can be taken of the replica database rather than the active database, which has the advantage of reducing the impact of the snap on the active database.  Of course, in an LCR configuration there is less of a reduction than if the VSS requestor were running against an entirely separate node, with CCR for example.  Products such as DPM will also take care of transaction log truncation on the active database even when the snap is operating against the replica, and will 'follow' the replica database.  So in the event of a failover, when the replica becomes the active, DPM will now protect the formerly active database, which is the new replica.  This is configurable in DPM, so administrators can choose to override this behaviour.

Support for VSS has been in place since Exchange 2003 but, in my experience, has not been widely adopted. (Indeed NTBackup does not provide support for 'Exchange aware' snaps.)  VSS allows files to be backed up while they are still open, essentially by pausing disk I/O.  On an Exchange Server a read-only copy of the Exchange data is copied to disk, which will typically take a couple of seconds and will almost imperceptibly interrupt Outlook. We can take snaps every hour, for example, and so will be able to restore to multiple points in time according to how many snaps we take. A good explanation of how this works in detail can be found here. Exchange 2007 has improved support for VSS including, for example, the ability to restore VSS backups to alternative locations (database portability again), but the technology is essentially the same.

Again there are numerous partner products that can provide you with the ability to take snapshots but DPM is the product which I think will really interest administrators who want to re-evaluate their backup solution.

DPM's approach is described as follows:

"DPM uses a combination of transaction log replication and block-level synchronization in conjunction with the Exchange VSS Writer to help ensure your ability to recover Exchange Server databases. After the initial baseline copy of data, two parallel processes enable continuous data protection with integrity:

· Transaction logs are continuously synchronized to the DPM server, as often as every 15 minutes.

· An “express full” uses the Exchange Server VSS Writer to identify which blocks have changed in the entire production database, and send just the updated blocks or fragments. This provides a complete and consistent image of the datafiles on the DPM 2007 server. DPM 2007 maintains up to 512 shadow copies of the full Exchange Server database(s) by storing only the differences between any two images.

Assuming one “express full” per week, stored as one of 512 shadow copy differentials between one week and the next, plus seven days x 24 hours x 4 (every 15 minutes), DPM 2007 provides over 344,000 data consistent recovery points for Exchange."
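
The "over 344,000" figure in the quote can be sanity-checked with a few lines of Python. This is only my reading of the arithmetic, assuming each of the 512 retained weekly shadow copies can be combined with every 15 minute log synchronization point in its week; the function and parameter names are mine, not DPM's.

```python
def recovery_points(shadow_copies=512, days=7, hours_per_day=24, syncs_per_hour=4):
    """Recovery points = retained shadow copies x log sync points per week."""
    sync_points_per_week = days * hours_per_day * syncs_per_hour
    return shadow_copies * sync_points_per_week


if __name__ == "__main__":
    # 7 days x 24 hours x 4 syncs = 672 sync points per weekly express full
    print(f"{recovery_points():,}")  # prints 344,064
```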

Using VSS obviously has a number of advantages and disadvantages:

Advantages:

  • Backup can be offloaded to the replica database, reducing the impact on the active database and clients alike**
  • Might be able to eliminate or at least significantly reduce any reliance on tape based backups
  • Very fast backup (after the 1st)
  • Potentially very fast recovery
  • Only one backup per storage group, but with E2K7 a 1:1 ratio of databases:storage groups is recommended and you can run multiple VSS snaps in parallel
  • Faster backup and recovery times mean that databases can be larger, so fewer servers might be required
  • IS Maintenance will not be interrupted, as snaps take far less time than traditional streaming backups
  • Aside from the first full backup there is little performance impact for clients
  • A solution like DPM means that most backups and recoveries are controlled by the messaging team and not by a separate team, which can confuse and delay recoveries**

Disadvantages:

  • Recovering historical data from a point in time prior to my first snap means I need to retain my tape devices - say beyond 7 days and up to 7 years
  • If I am mandated to keep data offsite I may need to retain my tape devices or replicate my backups offsite
  • Might require large amounts of additional disk space
  • Often a little more complex to design and configure

**Whether you can take advantage of this depends on the solution that is deployed.

If you are deploying standalone mailbox role servers and are not taking advantage of Exchange 2007's continuous replication technology, or its equivalents, then I believe there are still significant advantages to deploying VSS snaps with something like DPM, particularly when you consider that DPM can be deployed to protect much more than just your Exchange data in the same way.  However, I think the decision will often depend on how critical your data recovery times are.  If they need to be fast then you might need to go VSS.  If not, then dial-tone recovery and a standard restore seems to be the most obvious recovery path, in which case the choice is between disk to disk or direct to tape.  My preference is disk to disk, but then I'm not stumping up the money for the disks.

I am going to put together a blog on recovery scenarios, because when discussing which solution to implement it is very important that you understand what you are protecting your data from. For example, keeping 7 days' worth of data on disk with numerous recovery points sounds great, but when in our deployments are we going to make use of this technology?  Do we ever need to recover to 2:45am 4 days ago?

Design 3 - Two node MNS Cluster with CCR enabled

....to follow.