When conducting backups of a Windows Server 2012 or later Failover Cluster using Cluster Shared Volumes (CSV), you may encounter the following event in the System event log:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Description: Cluster Shared Volume 'VolumeName' ('ResourceName') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.
An Event ID 5120 may or may not indicate a problem with the cluster, depending on the error code it logs. An Event 5120 with the error code STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR (c0130021) may be expected and can be safely ignored in most situations.
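Because the significance of an Event 5120 depends on its error code, a natural first triage step is to pull that code out of the description text. The following is a minimal Python sketch, assuming the description has been exported as plain text in the same format as the sample event; `extract_error_code`, `is_auto_pause`, and the regular expression are hypothetical helpers for illustration, not part of any Windows or Failover Clustering API.

```python
import re
from typing import Optional, Tuple

# The auto-pause code as it appears in the sample Event 5120 description.
AUTO_PAUSE_NAME = "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR"
AUTO_PAUSE_HEX = "c0130021"

# Assumption: the description contains "because of 'NAME(code)'" (5120)
# or "because of error 'NAME(code)'" (5142), as in the sample events.
_REASON = re.compile(r"because of (?:error )?'([A-Z0-9_]+)\(([0-9A-Fa-f]+)\)'")


def extract_error_code(description: str) -> Optional[Tuple[str, str]]:
    """Return (symbolic name, code) from an event description, or None."""
    match = _REASON.search(description)
    if match is None:
        return None
    return match.group(1), match.group(2).lower()


def is_auto_pause(description: str) -> bool:
    """True if the description carries STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR."""
    code = extract_error_code(description)
    if code is None:
        return False
    name, value = code
    return name == AUTO_PAUSE_NAME or value == AUTO_PAUSE_HEX


if __name__ == "__main__":
    sample = ("Cluster Shared Volume 'VolumeName' ('ResourceName') is no longer "
              "available on this node because of "
              "'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'.")
    print(extract_error_code(sample))  # ('STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR', 'c0130021')
    print(is_auto_pause(sample))       # True
```

Matching on either the symbolic name or the hexadecimal value keeps the check working if a tool reports only one of the two.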
An Event ID 5120 with the error code STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR is logged on the node that owns the cluster Physical Disk resource when a VSS software snapshot that clustering knew about has been deleted. When a snapshot that Failover Clustering had knowledge of is deleted, clustering must resynchronize its view of the snapshots.
One scenario in which an Event ID 5120 with the error code STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR may be logged is when using System Center Data Protection Manager (DPM), which may delete a software snapshot once a backup has completed. When DPM requests deletion of a software snapshot, volsnap marks the snapshot for deletion. However, volsnap performs the deletion asynchronously, at a later point in time. Even though the snapshot has been marked for deletion, clustering detects that it still exists and must handle it appropriately. When volsnap eventually performs the actual deletion, clustering notices that a software snapshot it knew about is gone and must resynchronize its view of the snapshots.
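The ordering in this scenario can be illustrated with a toy model. The sketch below is a deliberately simplified Python simulation, not how volsnap or the cluster service are actually implemented: `ToyVolsnap` and `ToyClusterView` are hypothetical classes, deletion is modeled as a mark followed by a later `flush()`, and "resynchronization" is reduced to recording an event string.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyVolsnap:
    """Hypothetical stand-in for volsnap: deletion is requested now, performed later."""
    snapshots: Set[str] = field(default_factory=set)
    pending_delete: Set[str] = field(default_factory=set)

    def create(self, snap_id: str) -> None:
        self.snapshots.add(snap_id)

    def request_delete(self, snap_id: str) -> None:
        # Only marks the snapshot; it still exists until flush() runs.
        self.pending_delete.add(snap_id)

    def flush(self) -> None:
        # The asynchronous step: actually remove snapshots marked for deletion.
        self.snapshots -= self.pending_delete
        self.pending_delete.clear()


@dataclass
class ToyClusterView:
    """Hypothetical stand-in for clustering's cached view of the snapshots."""
    known: Set[str] = field(default_factory=set)
    events: List[str] = field(default_factory=list)

    def poll(self, volsnap: ToyVolsnap) -> None:
        vanished = self.known - volsnap.snapshots
        if vanished:
            # A snapshot this view knew of disappeared: record a resync event.
            self.events.append("5120 STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR")
        # Resynchronize the cached view with what currently exists.
        self.known = set(volsnap.snapshots)


if __name__ == "__main__":
    vs, view = ToyVolsnap(), ToyClusterView()
    vs.create("snap-1")
    view.poll(vs)                # clustering learns of snap-1
    vs.request_delete("snap-1")  # backup application requests deletion
    view.poll(vs)                # still present, so no resync yet
    vs.flush()                   # volsnap deletes it later
    view.poll(vs)                # known snapshot vanished: resync
    print(view.events)           # ['5120 STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR']
```

The point of the model is the gap between `request_delete()` and `flush()`: the view only records an event when a snapshot it already knew about is found missing.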
Think of it as clustering being surprised by a software snapshot deletion it was not notified of, and the cluster service telling its various internal components that they need to resynchronize their views of the snapshots.
There are also a few other expected scenarios in which volsnap deletes snapshots and, as a result, clustering must resynchronize its snapshot view, such as when a copy-on-write operation fails because of a lack of space or an I/O error. In these conditions, volsnap logs an event associated with the failure in the System event log. Review the System event logs for other events accompanying the Event 5120; these could be logged on any node in the cluster.
- If you see a few random Event 5120s with the error STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR (c0130021), they can be safely ignored. We recognize this is not optimal, as they create false-positive alarms and trigger alerts in management software. We are investigating breaking out cluster state resynchronization into a separate, non-error event in the future.
- If you are seeing many Event 5120s logged, this is a sign that clustering constantly needs to resynchronize its snapshot state. This could indicate a problem and may require engaging Microsoft Support for investigation.
- If you are seeing Event 5120s logged with error codes other than STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR, it is a sign of a problem. Be diligent: review the error code in the description of every Event 5120 logged to be certain, and be careful not to dismiss the events just because one of them shows STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR. If you see other errors logged, there are fixes available that need to be applied. Your first troubleshooting step should be to apply the recommended hotfixes in the appropriate article for your OS version:
  - Recommended hotfixes and updates for Windows Server 2012-based failover clusters
  - Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters
- If an Event 5120 is accompanied by other errors, such as the Event 5142 below, it is a sign of a failure and should not be ignored:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 5142
Task Category: Cluster Shared Volume
Level: Error
Description: Cluster Shared Volume 'VolumeName' ('ResourceName') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.
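The guidance in the list above can be condensed into a small triage function. This is a minimal Python sketch under stated assumptions: the input is a list of (event ID, error code) pairs already extracted from one review window on every node, `triage` is a hypothetical helper, and the cut-off for "many" (`MANY_AUTO_PAUSE = 10`) is an arbitrary placeholder because the guidance gives no number. Only Event 5142 is checked as an accompanying error; volsnap events and other failures would need their own checks.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Both spellings of the auto-pause code that appear in Event 5120 descriptions.
AUTO_PAUSE_CODES = {"STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR", "c0130021"}

# Hypothetical cut-off for "many"; the guidance gives no specific number.
MANY_AUTO_PAUSE = 10


def triage(events: Iterable[Tuple[int, str]],
           many: int = MANY_AUTO_PAUSE) -> List[str]:
    """Apply the Event 5120 guidance to (event_id, error_code) pairs.

    An empty result means only a few auto-pause 5120s were seen,
    which can usually be ignored.
    """
    events = list(events)
    findings: List[str] = []
    codes_5120 = Counter(code for event_id, code in events if event_id == 5120)

    # Any 5120 with a different error code is a sign of a problem.
    other_codes = sorted(code for code in codes_5120 if code not in AUTO_PAUSE_CODES)
    if other_codes:
        findings.append("5120 with other error codes: " + ", ".join(other_codes))

    # Many auto-pause 5120s mean clustering keeps resynchronizing.
    auto_pause = sum(n for code, n in codes_5120.items() if code in AUTO_PAUSE_CODES)
    if auto_pause >= many:
        findings.append(f"{auto_pause} auto-pause 5120s: frequent resynchronization")

    # A 5120 accompanied by a 5142 indicates a failure, not just a resync.
    if codes_5120 and any(event_id == 5142 for event_id, _ in events):
        findings.append("5120 accompanied by 5142: check storage and network connectivity")

    return findings


if __name__ == "__main__":
    print(triage([(5120, "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR")]))  # []
    print(triage([(5120, "c0130021"), (5142, "ERROR_TIMEOUT")]))
```

One way to gather the input is to export the FailoverClustering events from the System log on each node, for example with PowerShell's `Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-FailoverClustering'; Id=5120,5142}`, and then read the error code from each event's message.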
Principal PM Manager
Clustering & High-Availability