Asynchronous or Synchronous Replication?

One of the fundamental decisions that you might make in the early stages of an Exchange Server 2007 design is whether you favour asynchronous or synchronous replication of Exchange Server data, particularly if you require data centre resilience. Understanding the pros and cons of each might lead you further towards a final design than you'd expect.

To put everything into context, a quick look at Wikipedia gives the following definitions:

Synchronous replication - guarantees "zero data loss" by means of an atomic write operation, i.e. a write either completes on both sides or not at all. A write is not considered complete until acknowledgement by both local and remote storage.

Asynchronous replication - a write is considered complete as soon as local storage acknowledges it. Remote storage is updated, but probably with a small lag. Performance is greatly increased, but if the local storage is lost, the remote storage is not guaranteed to have the current copy of the data and the most recent data may be lost.
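To make the two definitions concrete, here is a minimal Python sketch of how each approach acknowledges a write, assuming a toy in-memory model. Storage, SyncReplicatedVolume, AsyncReplicatedVolume and demo are hypothetical names used only for illustration; they are not any SAN vendor's or Exchange's API, and real arrays handle failures, write ordering and rollback far more carefully.

```python
from collections import deque

# Toy model only: these class names are hypothetical, not a vendor or Exchange API.


class Storage:
    """A toy unit of storage that records each write and acknowledges it."""

    def __init__(self):
        self.data = []

    def write(self, item):
        self.data.append(item)
        return True  # the acknowledgement


class SyncReplicatedVolume:
    """Synchronous: a write is reported complete only after BOTH copies acknowledge it."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote

    def write(self, item):
        # The caller waits for both acknowledgements.
        # (This toy model does not roll back the local write if the remote one fails.)
        return self.local.write(item) and self.remote.write(item)


class AsyncReplicatedVolume:
    """Asynchronous: a write is complete once local storage acknowledges it;
    the remote copy is brought up to date later from a queue (the 'small lag')."""

    def __init__(self, local, remote):
        self.local, self.remote = local, remote
        self.pending = deque()

    def write(self, item):
        ok = self.local.write(item)
        self.pending.append(item)  # queued for shipping to the remote side
        return ok

    def replicate(self, max_items):
        """Ship up to max_items queued writes to the remote copy."""
        for _ in range(min(max_items, len(self.pending))):
            self.remote.write(self.pending.popleft())


def demo():
    sync_local, sync_remote = Storage(), Storage()
    sync_vol = SyncReplicatedVolume(sync_local, sync_remote)
    async_local, async_remote = Storage(), Storage()
    async_vol = AsyncReplicatedVolume(async_local, async_remote)

    for i in range(5):
        sync_vol.write(f"msg{i}")
        async_vol.write(f"msg{i}")
    async_vol.replicate(max_items=3)  # the remote side has caught up on only 3 of 5 writes

    # Now "lose" the local storage: whatever the remote copy lacks is lost.
    lost_sync = len(sync_local.data) - len(sync_remote.data)
    lost_async = len(async_local.data) - len(async_remote.data)
    return lost_sync, lost_async


if __name__ == "__main__":
    print(demo())  # (0, 2)
```

Running it prints (0, 2): after five writes the synchronous remote copy holds everything, while the asynchronous remote copy is two writes behind, and those two writes would go with the local storage. The replicate(max_items=3) call simply stands in for the lag described above.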

Exchange Server 2007 introduces continuous replication (local, cluster and standby). These are based on asynchronous replication of transaction logs; that is, data is committed to one copy of a database largely independently of being copied, inspected and replayed into a second copy. The two copies of each database on a busy production server are therefore likely to be very slightly out of step, and hence there is a risk of data loss if you lose the first copy. (Exchange Server 2007 is designed to keep that risk to a minimum.)
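As a rough illustration of the copy, inspect and replay cycle, the sketch below models a passive copy that only replays a transaction log after checking it. This is a simplified model under stated assumptions, not how the Microsoft Exchange Replication service is implemented: LogFile, close_log and PassiveCopy are hypothetical names, and the SHA-256 digest stands in for whatever checks Exchange actually performs on copied logs.

```python
import hashlib
from dataclasses import dataclass

# Simplified model: names and the digest check are hypothetical, not Exchange internals.


@dataclass
class LogFile:
    generation: int  # position in the log sequence
    payload: bytes   # the transactions recorded in this log
    digest: str      # hypothetical checksum taken when the log was closed on the active copy


def close_log(generation, payload):
    """Close a log on the active copy and record a digest of its contents."""
    return LogFile(generation, payload, hashlib.sha256(payload).hexdigest())


class PassiveCopy:
    """Second copy of the database, kept up to date by copy -> inspect -> replay."""

    def __init__(self):
        self.replayed = []        # payloads replayed into this copy
        self.last_generation = 0
        self.rejected = []        # generations that failed inspection

    def receive(self, log):
        # Inspect: the log must be the next in sequence and must match its digest.
        in_sequence = log.generation == self.last_generation + 1
        intact = hashlib.sha256(log.payload).hexdigest() == log.digest
        if not (in_sequence and intact):
            # A bad log is not replayed, so the damage never reaches this copy.
            # Later generations fail the sequence check until a good copy of this one is replayed.
            self.rejected.append(log.generation)
            return False
        # Replay: apply the log to the second copy.
        self.replayed.append(log.payload)
        self.last_generation = log.generation
        return True


def demo():
    active_logs = [close_log(g, f"transactions-{g}".encode()) for g in range(1, 5)]
    passive = PassiveCopy()
    passive.receive(active_logs[0])
    passive.receive(active_logs[1])
    # Log 3 is damaged in transit, so its contents no longer match the digest.
    passive.receive(LogFile(3, b"garbage", active_logs[2].digest))
    # Log 4 has not been copied yet when the active copy is lost.
    not_on_passive = len(active_logs) - passive.last_generation
    return passive.last_generation, passive.rejected, not_on_passive


if __name__ == "__main__":
    print(demo())  # (2, [3], 2)
```

demo() returns (2, [3], 2): the damaged third log is rejected rather than replayed, so the damage never reaches the second copy, but generations 3 and 4 are not on the passive copy when the active copy is lost. That gap is the small data-loss window described above.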

Where you may run into synchronous replication technologies for Exchange Server is SAN storage replication (the definitions earlier in this post are specific to storage replication) or, perhaps, 3rd party replication software designed for Exchange disaster recovery. In my experience the two are fundamentally similar: the technologies themselves are very different, but the pros and cons follow along the same lines. SAN storage vendors also provide asynchronous storage replication solutions, but these are not generally used for Exchange Server 2007 deployments: Exchange 2007 already provides asynchronous replication itself, so adding it at the storage layer doesn't always make sense.

The following table shows some of the major pros and cons of these two types of replication:

Data loss
  Asynchronous replication: By its nature there may be some data loss [1].
  Synchronous replication: Some solutions will guarantee no data loss [1].

Resilience
  Asynchronous replication: Two failures are required before there is a loss of service [2]. Failures that lead to data corruption will not be replicated to the second copy of the data [3].
  Synchronous replication: A single failure could lead to the loss of the service [2]. Failures that lead to data corruption are faithfully replicated to the second copy of the data [3].

Cost
  Asynchronous replication: Asynchronous replication solutions are generally more cost-effective.
  Synchronous replication: Synchronous replication tends to be considerably more expensive to buy and manage than a comparable asynchronous solution [4].

Performance
  Asynchronous replication: Less dependent on very low latency, high bandwidth network links between units of storage.
  Synchronous replication: Dependent on very low latency, high bandwidth network links between units of storage [5].

Management
  Asynchronous replication: Asynchronous replication is native technology within Exchange 2007 (E2K7) [6].
  Synchronous replication: Synchronous replication solutions will introduce 3rd party software into your design [6].

[1] Some synchronous replication solutions do guarantee zero data loss, and if they are well managed this is perfectly achievable. An asynchronous replication solution cannot make the same guarantee. However, Exchange 2007 includes several mitigations, which mean that both the chance of data loss and the amount of data that could be lost are tiny. For example, where LCR and CCR are deployed, the transport dumpster (see CCR) ensures that potential data loss is minimal and should be restricted to items such as read/unread message status and new or incomplete contact and calendar entries. Understanding Lost Log Resilience is also important here. Crucially, a solution based on asynchronous replication will meet the vast majority of SLAs that cover a messaging solution.

[2] With asynchronous replication the two units of failure are highly independent, so no single failure should lead to a loss of service (especially with CCR).

[3] Because a write does not have to wait for the second unit of storage, more data verification can take place before data is replayed into the second copy. The flip side, as we've seen, is a greater risk of data loss.

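[4] This follows from the definition of synchronous replication: a "write is not considered complete until acknowledgement by both local and remote storage", so the storage at both ends, and the links between them, must be able to acknowledge every write quickly.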

[5] This is a bit of a generalisation, but it reflects my experience and is particularly relevant to SAN-based storage replication technologies. It is something you'd obviously have to verify with each of your vendors.

[6] Introducing 3rd party software into a design requires additional skills to support and manage it, which can complicate and delay recovery. It is important to understand the recovery path and the time to recovery before deciding on a solution. This may not be an issue if you already have these skills in-house and run regular fire drills.
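To put some rough numbers on footnote [5], here is a back-of-the-envelope Python sketch. It assumes that light in optical fibre covers roughly 200 km per millisecond, that the local and remote writes are issued in parallel, and that writes are strictly dependent (each waits for the previous one). The 1 ms write times and the distances are purely illustrative, not measurements from any vendor's array, and real links add switching and protocol overhead on top of propagation delay.

```python
# Illustrative assumptions only: not vendor figures or a model of any specific array.
FIBRE_KM_PER_MS = 200.0  # light in optical fibre travels roughly 200 km per millisecond


def round_trip_ms(distance_km):
    """Minimum propagation delay to the remote site and back (no switching/protocol overhead)."""
    return 2 * distance_km / FIBRE_KM_PER_MS


def async_write_ms(local_ms):
    # Complete as soon as the local storage acknowledges.
    return local_ms


def sync_write_ms(local_ms, remote_ms, distance_km):
    # Local and remote writes assumed to run in parallel; the write completes
    # only when the slower of the two acknowledgements has arrived.
    return max(local_ms, round_trip_ms(distance_km) + remote_ms)


def serial_writes_per_second(write_ms):
    # Upper bound when every write must wait for the previous one to complete.
    return 1000.0 / write_ms


if __name__ == "__main__":
    for km in (10, 100, 500):
        sync_ms = sync_write_ms(local_ms=1.0, remote_ms=1.0, distance_km=km)
        async_ms = async_write_ms(1.0)
        print(f"{km:>4} km: sync {sync_ms:.2f} ms ({serial_writes_per_second(sync_ms):.0f}/s), "
              f"async {async_ms:.2f} ms ({serial_writes_per_second(async_ms):.0f}/s)")
```

Under these assumptions a 100 km link doubles the illustrative write time from 1 ms to 2 ms, halving the serialised write rate, while the asynchronous figure does not change with distance at all.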

The reason for writing this blog post is that I've sat in on a number of early discussions about which Exchange 2007 solution to deploy. I reckon that by going back to some fundamental principles and understanding your company's preferences, you can quite quickly put together a framework for a design… Hope this helps…

Some related blogs I’ve written recently:

Recovery Scenarios for E2K7…..I
Recovery Scenarios for E2K7…..II
Dynamic Disk Provisioning…
Some more thoughts on SAN v DAS. Is it actually time to consider DAS?
SAN v DAS

If you haven't seen it already, this is a great white paper: White Paper: Continuous Replication Deep Dive