Protecting Exchange data with DPM

Microsoft System Center Data Protection Manager (DPM) is one of the System Center products released last year. DPM offers a new approach to backing up Exchange data and introduces a number of alternatives to traditional backup methods, particularly when used alongside Exchange Server 2007. Before looking at those alternatives, it is useful to understand how DPM works when it protects Exchange Server data.

RTOs, RPOs and Retention Ranges

To configure DPM appropriately it is important to work towards three key pieces of information: a Recovery Point Objective (RPO), a Recovery Time Objective (RTO) and a Retention Range. In other words: how much data loss, measured back to a point in time, is acceptable; how long can I be without service or data; and how long do I want to keep protected data? It might be necessary to work towards multiple RTOs and RPOs. For example, an RTO for recovering from database corruption might be shorter than an RTO for recovering from site failure. Suffice to say, it is important to understand all of our data protection objectives before we begin to configure DPM. It is also worth noting that in some cases an RTO or RPO need not be met by DPM at all. Continuous replication or an extended ‘deleted item retention’ period may be sufficient.

Once we have a set of RTOs and RPOs we need to decide how long to keep the protected data. For most implementations it is likely that two retention ranges will need to be defined: one for short term and another for longer term data retention. The most likely scenario is a short term retention range of, say, one week, during which DPM retains Exchange data on disk. A longer term retention range might define a strategy for storing data to satisfy industry regulations governing long term data retention; 7 years, for example. Longer term retention might be to tape, with the tapes then moved offsite. Within DPM one or more Protection Schemes define our protection strategy, and a Protection Group, in the context of Exchange protection, is one or more Storage Groups to which a particular protection scheme applies.

Protecting Exchange Data

There are two main processes used by DPM to protect Exchange mailbox and public folder data, and I will focus on them in this blog: the Express Full Backup and the Transaction Log Synchronisation. These ensure that changes to database pages, and newly committed transaction logs, are regularly synchronised with the data held by DPM.

Before either of these processes takes place, however, we need to copy the data held in volumes on the Exchange Server to the DPM Server to create the initial baseline. There are a number of ways of achieving this first copy. When creating a protection group, DPM offers the administrator the choice of letting DPM take a VSS copy immediately, scheduling the copy for a later point in time, or manually copying the data to the DPM volumes. Once this process is complete, ongoing protection is enabled with a combination of Express Full and Transaction Log Synchronisation as follows:

[Figure 1: Initial copy]

Express Full

The term ‘Express Full’ is used by DPM to highlight the unique characteristics of this method of data protection. It is ‘express’ because it only transfers changes and is therefore fast, but the restorable end result is the same as a classic full backup; hence ‘Express Full’.

The Express Full backup is the process by which DPM ensures that changes to pages within the Exchange database, along with committed transaction logs, are copied to the DPM server so that the data held there is consistent and recoverable. A combination of a volume filter, a volume bitmap and the VSS ‘Copy-On-Write’ method is used to create ‘shadow copies’ of the Exchange database and transaction logs. Each time a disk block is changed on the database or transaction log volume, the fact that the block has changed is recorded in a volume-wide bitmap. This is merely a quick ‘bit flip’ and has negligible impact on the performance of the protected server.
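To make the ‘bit flip’ idea concrete, here is a minimal sketch of block-level change tracking with a bitmap. It is purely illustrative and is not DPM's actual filter driver; the class name `ChangeBitmap`, its methods and the 16-block volume are all assumptions made for the example.

```python
from typing import List


class ChangeBitmap:
    """Tracks which fixed-size blocks of a volume have been written to.

    Hypothetical illustration only; not the DPM volume filter.
    """

    def __init__(self, block_count: int) -> None:
        self.block_count = block_count
        # One bit per block, packed eight blocks to a byte.
        self.bits = bytearray((block_count + 7) // 8)

    def mark_changed(self, block: int) -> None:
        # The 'bit flip': setting one bit is a constant-time operation.
        self.bits[block // 8] |= 1 << (block % 8)

    def is_changed(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def changed_blocks(self) -> List[int]:
        # The list of blocks a backup would need to transfer.
        return [b for b in range(self.block_count) if self.is_changed(b)]

    def reset(self) -> None:
        # Cleared once a backup has captured the recorded changes.
        self.bits = bytearray(len(self.bits))


bitmap = ChangeBitmap(block_count=16)
for written_block in (3, 9, 3):  # block 3 is written twice
    bitmap.mark_changed(written_block)

changed = bitmap.changed_blocks()
print(changed)
```

Note that writing the same block twice still leaves a single set bit, which is why the backup only needs to transfer each changed block once, however busy that block has been.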

Periodically, typically each night, the DPM server initiates an express full backup and sends the request to the DPM agent (the VSS Requestor) on the Exchange Server. The Exchange VSS Store Writer ensures that the data on disk is consistent, and administrative actions and write operations against the database and transaction log volumes are then suspended. (Exchange Server 2007 now has two VSS writers: ‘the Store Writer’, built into the Store, and ‘the Replication Writer’, built into the replication service and discussed in more detail later.) Standard VSS snapshots of the protected volumes are then taken.

The snapshot itself completes in a matter of seconds and is a combination of two main processes. The first is to build a volume filter which matches block level changes with database pages and transaction logs to identify what data is to be backed up. The second is to start tracking changes that will occur during the transfer of data to DPM. Once this process is complete, write I/O is thawed and the Exchange Server can continue to serve write requests.

At this point the transaction logs and pages that we know have changed, because they appear in the volume filter, begin to be copied sequentially to the DPM server. So what happens when clients now want to create a new calendar appointment or write a new email? This is where VSS ‘Copy-on-write’ comes in. When a change to the volume occurs while changes are being backed up to DPM, and critically before the change is physically written to disk, the disk block that is about to be modified is read and written to a difference area. (Remember the .pat file?)

By doing so DPM ensures that it has a record of all the disk blocks that have changed since the last express full backup, plus the original contents of any blocks that would otherwise have been changed or overwritten during the backup. When the backup is completed, the page-level integrity of the information store database is verified (or checksummed) using eseutil with the /k switch. If no discrepancies are found, the transaction logs are truncated and the backup completes successfully. If the integrity check fails, the backup is aborted and the transaction logs are not truncated.
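The copy-on-write behaviour described above can be sketched in a few lines. This is a simplified model of the general technique, not the VSS implementation; `CowSnapshot`, `diff_area` and the three-page volume are hypothetical names chosen for illustration.

```python
class CowSnapshot:
    """Point-in-time view of a volume using copy-on-write.

    Hypothetical illustration of the general technique, not VSS itself.
    """

    def __init__(self, volume: list) -> None:
        self.volume = volume   # live blocks, still being written to
        self.diff_area = {}    # block index -> original block contents

    def write(self, block: int, data: bytes) -> None:
        # Before the first overwrite of a block, preserve its original
        # contents in the difference area. Later writes to the same
        # block do not need to copy anything again.
        if block not in self.diff_area:
            self.diff_area[block] = self.volume[block]
        self.volume[block] = data

    def read_snapshot(self, block: int) -> bytes:
        # The backup reads the preserved original where one exists,
        # otherwise the block is unchanged and is read from the volume.
        return self.diff_area.get(block, self.volume[block])


volume = [b"page-A", b"page-B", b"page-C"]
snap = CowSnapshot(volume)

# Clients keep writing while the backup is in progress.
snap.write(1, b"page-B2")
snap.write(1, b"page-B3")

snapshot_view = [snap.read_snapshot(i) for i in range(len(volume))]
print(snapshot_view)
print(volume)
```

The snapshot view still shows the volume as it was when the snapshot was taken, while the live volume carries the newest write. Only the first overwrite of a block costs an extra copy.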

[Figure 2: Express Full]

Transaction Log Synchronisation

The Express Full operation would typically occur every night, but a short term protection scheme also defines how often transaction log synchronisations occur. By default these occur every 15 minutes and use a VSS incremental synchronisation to ensure that committed, sequential transaction logs are copied to DPM.

Again the Exchange Writer ensures that the data on disk is consistent, and any committed transactions held in memory are flushed to disk. Administrative actions and write operations against the database and transaction log volumes are then suspended. The VSS writer is notified and the incremental snapshot is taken. Once write operations are released, any changed transaction logs are transferred to the DPM recovery point volume. If no discrepancies are found, the transaction logs are truncated and the backup completes successfully.
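As a rough way to relate the synchronisation interval to an RPO, the arithmetic can be sketched as follows. The sketch assumes, as a simplification of my own, that worst-case data loss is roughly one synchronisation interval and ignores the time taken to transfer the logs. The function names are hypothetical and are not part of DPM.

```python
from datetime import timedelta


def syncs_per_day(interval: timedelta) -> int:
    # Whole number of synchronisations that fit into 24 hours.
    return timedelta(days=1) // interval


def meets_rpo(interval: timedelta, rpo: timedelta) -> bool:
    # Simplifying assumption: worst-case loss is about one interval.
    return interval <= rpo


default_interval = timedelta(minutes=15)   # the default mentioned above
short_term_retention_days = 7              # e.g. one week on disk

per_day = syncs_per_day(default_interval)
total_points = per_day * short_term_retention_days

print(per_day, total_points)
print(meets_rpo(default_interval, rpo=timedelta(minutes=30)))
print(meets_rpo(default_interval, rpo=timedelta(minutes=5)))
```

With the 15 minute default that is 96 synchronisations a day. An RPO of 30 minutes would be satisfied under this assumption, whereas an RPO of 5 minutes would need a shorter interval or another mechanism such as continuous replication.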

[Figure 3: Transaction log synchronisation]

A combination of Express Full backups and transaction log synchronisations ensures that the DPM server contains a complete, consistent copy of the database together with a corresponding set of transaction logs in sequence, providing the administrator with multiple recovery points from which a restore can be initiated.
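Conceptually, restoring to a given recovery point means starting from the most recent express full backup at or before that point and then replaying the synchronised transaction logs up to it. The sketch below illustrates that selection logic only; `plan_restore` and the timestamps are hypothetical and do not represent how DPM implements recovery internally.

```python
from datetime import datetime
from typing import List, Tuple


def plan_restore(
    full_backups: List[datetime],
    log_syncs: List[datetime],
    target: datetime,
) -> Tuple[datetime, List[datetime]]:
    """Pick the base express full and the log syncs to replay on top."""
    candidates = [t for t in full_backups if t <= target]
    if not candidates:
        raise ValueError("no express full backup at or before the target time")
    base = max(candidates)
    # Logs must be replayed in sequence, after the base and up to the target.
    logs = sorted(t for t in log_syncs if base < t <= target)
    return base, logs


# Hypothetical schedule: nightly express fulls and 15 minute log syncs.
fulls = [datetime(2008, 1, 1, 23, 0), datetime(2008, 1, 2, 23, 0)]
syncs = [
    datetime(2008, 1, 2, 9, 0),
    datetime(2008, 1, 2, 9, 15),
    datetime(2008, 1, 2, 9, 30),
]

base, logs = plan_restore(fulls, syncs, target=datetime(2008, 1, 2, 9, 20))
print(base, logs)
```

Here the restore starts from the express full taken on the night of 1 January and replays the 09:00 and 09:15 synchronisations, so the closest available recovery point to the 09:20 target is 09:15.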

[Figure 4: Data history]

The scenario above describes DPM protecting data on a standard mailbox role server. Of course, with Exchange Server 2007 it is now possible to run a mailbox role server with continuous replication: LCR, CCR or, with Service Pack 1, SCR. The first point to make is that directly protecting an SCR target database or an LCR replica is currently not supported. Protecting either the LCR active database or the CCR replica is possible, however, and should be the preferred method for most DPM deployments where either of these continuous replication methods is being used. Protecting the replica rather than the active database in a CCR implementation is made possible by the introduction of the Exchange Server 2007 VSS Replication Writer. Essentially, the Exchange VSS Store Writer is responsible for the backup and restore of the active database, and the Replication Writer is responsible for the backup of the replica database. A restore of a backup of a replica database, however, is controlled by the VSS Store Writer.

Some final thoughts...

Exchange data protection has in the past fallen somewhere between the messaging team and the teams that manage backups and tapes in general. DPM should enable many more messaging teams to take over responsibility for protecting the data they are responsible for, and I believe it introduces an opportunity to reconsider traditional approaches to protecting Exchange data. For example, how does DPM fit into an environment where CCR and SCR are already deployed? Do we still need our tape infrastructure? A combination of continuous replication and DPM protection to disk might satisfy all of our recovery objectives without the need for traditional tape based solutions. For many companies it won’t, but the decision to implement DPM might be a good point at which to generate accurate and viable RTOs, RPOs and retention ranges for your implementation and to stimulate these types of discussions.