Cluster Shared Volumes (CSV) is a clustered file system available in Windows Server 2012 in which all nodes in a Windows Server Failover Cluster can simultaneously access a common shared NTFS volume. CSV has a distributed backup infrastructure that enables backups to be taken from any node in the cluster. In this blog I will discuss how backups work with CSV and some considerations that can help optimize backup performance.
When a volume-level backup is taken, the cluster service returns all the VMs hosted on the volume(s) to the requester (the backup application), including VMs running on non-requester nodes. The requester can choose to pick only the VMs that are running on the node where the backup was initiated (this becomes a local node VM backup), or it can choose to include VMs that are running across different nodes (this becomes a distributed VM backup). Snapshot creation differs based on the type of snapshot configured:
- Hardware snapshots – The snapshot will be created and surfaced on the node where the backup was invoked by the requester, which need not be the coordinator node. The backup will then be taken from the local snapshot.
- Software snapshots – The underlying snapshot device will be created via volsnap.sys on the coordinator node, and a CSV snapshot volume that points to this volsnap device will be surfaced on every node. On non-coordinator nodes, the CSV snapshot device will access the volsnap snapshot over SMB. This is transparent to the requester because the CSV snapshot volume appears like a local device, but all access to the snapshot will happen over the network unless the requester is running on the coordinator node.
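The two cases above can be summarized in a few lines of Python. This is only an illustrative model of the behavior described in this post, not a cluster API: the names `SnapshotType` and `snapshot_data_path`, and the node names in the usage lines, are made up for the example.

```python
from enum import Enum


class SnapshotType(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"


def snapshot_data_path(snapshot_type, requester_node, coordinator_node):
    """Return (node holding the snapshot, how the requester reads it).

    Hypothetical helper that encodes the rules described above.
    """
    if snapshot_type is SnapshotType.HARDWARE:
        # Hardware snapshots are surfaced on the node where the backup
        # was invoked, so the requester always reads a local snapshot.
        return requester_node, "local"

    # Software snapshots: the volsnap device lives on the coordinator node.
    if requester_node == coordinator_node:
        return coordinator_node, "local"

    # Any other node reaches the volsnap snapshot over SMB.
    return coordinator_node, "remote over SMB"


if __name__ == "__main__":
    # Example node names are placeholders.
    for kind in SnapshotType:
        for requester in ("node1", "node2"):
            holder, access = snapshot_data_path(kind, requester, "node1")
            print(kind.value, requester, holder, access)
```

The only combination that sends backup data across the network is a software snapshot with the requester on a non-coordinator node, which is the case the rest of this post focuses on.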
A backup of a CSV volume can be taken from any node. However, when using software snapshots the snapshot device will be created on the coordinator node, and if the backup was initiated on a non-coordinator node the backup data will be accessed remotely. This means that the data for the backup will be streamed over the network from the coordinator node to the node where the backup was initiated. If your maintenance window requires shortening the overall backup time, you may wish to optimize the performance of backups when using software snapshots in one of the following ways:
- Initiate Backups on the Coordinator Node – When using software snapshots the snapshot device will always be created on the coordinator node, that is, the node which currently owns the cluster Physical Disk resource associated with the CSV volume. If the backup is conducted locally on the coordinator node, then the data access will be local and backup performance may be improved. This can be achieved either by initiating the backup application on the coordinator node or by moving the Physical Disk resource to the backup node before initiating the backup. CSV ownership can be moved seamlessly with no downtime.
- Scale Inter-node Communication – If you wish to have the flexibility of invoking backups with software snapshots from any node, scale up the performance of inter-node communication to achieve optimized backup performance. It is recommended to use a minimum of 10 Gb Ethernet or InfiniBand. You may also wish to aggregate network bandwidth with NIC Teaming or SMB Multichannel to increase network performance between the nodes in the Failover Cluster.
- To achieve the highest levels of performance of backups on a Cluster Shared Volume, it is recommended to use Hardware snapshots over Software snapshots.
- To achieve the highest levels of performance with Software snapshots on a Cluster Shared Volume, it is recommended either to initiate the backup locally on the CSV coordinator node or to scale up the bandwidth of inter-node communication.
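To get a rough feel for why network bandwidth matters when a software snapshot backup runs on a non-coordinator node, here is a back-of-the-envelope estimate in Python. It is a sketch under stated assumptions, not a measurement: the function name `estimated_stream_hours` is made up, the `efficiency` factor of 0.7 is an arbitrary placeholder for protocol overhead and contention, and real backups are often limited by disk or backup target throughput rather than the network.

```python
def estimated_stream_hours(backup_gib, link_gbps, efficiency=0.7):
    """Rough hours needed to stream backup_gib GiB between two nodes.

    link_gbps is the nominal link speed in gigabits per second.
    efficiency is an assumed fraction of that speed the backup can use.
    """
    if backup_gib < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")

    total_bits = backup_gib * 1024**3 * 8                  # GiB -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency  # Gb/s -> usable bits/s
    return total_bits / usable_bits_per_second / 3600      # seconds -> hours


if __name__ == "__main__":
    # Compare a 1 Gb link with a 10 Gb link for the same backup size.
    for gbps in (1, 10):
        hours = estimated_stream_hours(1000, gbps)
        print(f"{gbps} Gb/s: about {hours:.2f} hours for 1000 GiB")
```

With everything else held equal, the estimate scales inversely with link speed, which is why the network only drops out of the picture when the backup reads the snapshot locally on the coordinator node or from a hardware snapshot.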
Principal Program Manager Lead
Clustering & High-Availability