What really happens when the HADR_CLUSAPI_CALL wait type is set?


In a customer scenario, we saw a query against the AlwaysOn-related system views taking a fairly long time.

SELECT *
FROM   sys.availability_databases_cluster adc
INNER JOIN sys.availability_replicas ar
ON adc.group_id = ar.group_id
WHERE  adc.database_name = 'db1' 

Investigation showed that the query was predominantly waiting on HADR_CLUSAPI_CALL. As the name suggests, this wait type comes from the HADR (High Availability and Disaster Recovery) functionality of SQL Server, known as AlwaysOn. CLUSAPI_CALL, in turn, is short (but not too short) for "Cluster API call". In other words, whenever AlwaysOn calls into the Windows Server Failover Cluster (WSFC) to get some work done, it sets this wait type. You may wonder what type of "work" is being requested, i.e. which API calls these are.
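A minimal T-SQL sketch for confirming this kind of diagnosis on your own system: it reads the instance-wide counter for the wait type, lists tasks currently waiting on it, and (assuming SQL Server 2016 or later, where sys.dm_exec_session_wait_stats exists) shows the waits the current session accumulated after re-running the slow query. The database name 'db1' is taken from the query above; substitute your own.

```sql
-- Instance-wide totals for HADR_CLUSAPI_CALL since the last restart
-- (or since the wait statistics were last cleared).
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM   sys.dm_os_wait_stats
WHERE  wait_type = N'HADR_CLUSAPI_CALL';

-- Run from a second session while the slow query is executing:
-- which tasks are waiting on the cluster API right now, and for how long.
SELECT session_id,
       wait_type,
       wait_duration_ms
FROM   sys.dm_os_waiting_tasks
WHERE  wait_type = N'HADR_CLUSAPI_CALL';

-- Per-session breakdown (SQL Server 2016 and later): run the query
-- from this post, then list the waits this session has accumulated.
SELECT *
FROM   sys.availability_databases_cluster adc
INNER JOIN sys.availability_replicas ar
        ON adc.group_id = ar.group_id
WHERE  adc.database_name = 'db1';

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM   sys.dm_exec_session_wait_stats
WHERE  session_id = @@SPID
ORDER  BY wait_time_ms DESC;
```

If HADR_CLUSAPI_CALL dominates the per-session list, the time is being spent in calls to the cluster rather than in SQL Server itself.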

Here is a list that an hour or two of source code research yielded (these are listed in no particular order):

  1. Check remote cluster
    OpenCluster()
    GetComputerNameEx()  - not a cluster API per se
    OpenClusterNode()
  2. Get resource name of virtual server
    ClusterEnum()
    OpenClusterResource()
    ClusterResourceControl(.... )  using control codes CLUSCTL_RESOURCE_GET_DNS_NAME and CLUSCTL_RESOURCE_GET_NETWORK_NAME
  3. Enumerate cluster resources
    ClusterEnum()
    ClusterNodeEnum()
    ClusterNetInterfaceControl()  - CLUSCTL_NETINTERFACE_GET_NODE, CLUSCTL_NETINTERFACE_GET_NETWORK, etc.
  4. Read network information from the cluster
    OpenClusterNetwork()
    ClusterNetworkControl()  - CLUSCTL_NETWORK_GET_RO_COMMON_PROPERTIES, CLUSCTL_NETWORK_GET_COMMON_PROPERTIES
    ResUtilFindDwordProperty( ...'role' ...)
    ResUtilFindMultiSzProperty()  - IPv4 and IPv6 addresses, IPv4 and IPv6 prefix lengths
  5. Close handles on cluster resources (it is unlikely that any of the Close* functions were delaying things, but possible)
    CloseCluster()
    CloseClusterNode()
    CloseClusterGroup()
    CloseClusterResource()
    ClusterResourceCloseEnum()
    ClusterNodeCloseEnum()
    ClusterRegCloseKey()
    ClusterCloseEnum()
    CloseClusterNetwork()
    CloseClusterNetInterface()
    CloseClusterNotifyPort()
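SQL Server also exposes cluster information through DMVs that cover the same areas as the calls listed above: the cluster itself, its nodes, and its networks. It is an assumption, not something the source research confirms, that these DMVs go through similar Cluster API code paths; if they do, timing them one at a time is a cheap way to narrow down which area of the cluster responds slowly. A sketch (the @timings table variable is purely illustrative):

```sql
-- Time each cluster-related view separately to see which area is slow.
-- Assumption: these views call into the WSFC much like the query above,
-- so an unusually slow one points at that part of the cluster.
DECLARE @t0   datetime2(7),
        @rows int;
DECLARE @timings TABLE (view_name sysname, row_count int, elapsed_ms int);

SET @t0 = SYSDATETIME();
SELECT @rows = COUNT(*) FROM sys.dm_hadr_cluster;              -- cluster name, quorum
INSERT @timings VALUES (N'sys.dm_hadr_cluster', @rows, DATEDIFF(ms, @t0, SYSDATETIME()));

SET @t0 = SYSDATETIME();
SELECT @rows = COUNT(*) FROM sys.dm_hadr_cluster_members;      -- nodes and witness
INSERT @timings VALUES (N'sys.dm_hadr_cluster_members', @rows, DATEDIFF(ms, @t0, SYSDATETIME()));

SET @t0 = SYSDATETIME();
SELECT @rows = COUNT(*) FROM sys.dm_hadr_cluster_networks;     -- subnets per node
INSERT @timings VALUES (N'sys.dm_hadr_cluster_networks', @rows, DATEDIFF(ms, @t0, SYSDATETIME()));

SET @t0 = SYSDATETIME();
SELECT @rows = COUNT(*) FROM sys.availability_databases_cluster;
INSERT @timings VALUES (N'sys.availability_databases_cluster', @rows, DATEDIFF(ms, @t0, SYSDATETIME()));

SET @t0 = SYSDATETIME();
SELECT @rows = COUNT(*) FROM sys.availability_replicas;
INSERT @timings VALUES (N'sys.availability_replicas', @rows, DATEDIFF(ms, @t0, SYSDATETIME()));

-- Slowest view first.
SELECT view_name, row_count, elapsed_ms
FROM   @timings
ORDER  BY elapsed_ms DESC;
```

Because each measurement is a single call, run the batch a few times before drawing conclusions; one slow result can simply be a transient cluster hiccup.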

What to do?

If you encounter this issue, investigate the WSFC in more depth. Run Cluster Validation (but do it while SQL Server is not online), and check your network adapters, cables, DNS resolution, the correctness of IP addresses and subnets, and disk resources.
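Part of the IP address and subnet check can also be done from inside SQL Server. The sketch below lists every availability group listener IP address together with the number of cluster nodes that the cluster reports on that listener's subnet; a count of 0 suggests a listener IP on a subnet no node is configured for. Keep in mind that these views may themselves wait on HADR_CLUSAPI_CALL when the cluster is slow, and that this only complements Cluster Validation.

```sql
-- Listener IPs per availability group, with how many cluster nodes
-- report a network on the same subnet (0 = likely misconfiguration).
SELECT ag.name                          AS ag_name,
       l.dns_name                       AS listener_name,
       ip.ip_address,
       ip.network_subnet_ip,
       ip.network_subnet_prefix_length,
       ip.state_desc,
       (SELECT COUNT(*)
        FROM   sys.dm_hadr_cluster_networks cn
        WHERE  cn.network_subnet_ip = ip.network_subnet_ip) AS nodes_on_subnet
FROM   sys.availability_groups ag
JOIN   sys.availability_group_listeners l
       ON l.group_id = ag.group_id
JOIN   sys.availability_group_listener_ip_addresses ip
       ON ip.listener_id = l.listener_id
ORDER  BY ag.name, ip.ip_address;
```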

