In a customer scenario, we saw a query against AlwaysOn-related system views take a fairly long time to complete. The FROM and WHERE clauses of the query were:
FROM sys.availability_databases_cluster adc
INNER JOIN sys.availability_replicas ar
ON adc.group_id = ar.group_id
WHERE adc.database_name = 'db1'
Investigation showed that the query was predominantly waiting on HADR_CLUSAPI_CALL. As the name suggests, this wait type comes from the HADR (High Availability Disaster Recovery) functionality of SQL Server, known as AlwaysOn. CLUSAPI_CALL is short (but not too short) for "Cluster API call". In other words, whenever AlwaysOn calls into Windows Server Failover Clustering (WSFC) to get some work done, it sets this wait type. You may wonder what type of "work" is being requested, i.e. what are these API calls?
Here is a list that an hour or two of source code research yielded (these are listed in no particular order):
- Check remote cluster
  GetComputerNameEx() - not a cluster API per se
- Get resource name of virtual server
  ClusterResourceControl(...) using control codes CLUSCTL_RESOURCE_GET_DNS_NAME and CLUSCTL_RESOURCE_GET_NETWORK_NAME
- Enumerate cluster resources
  ClusterNetInterfaceControl() - CLUSCTL_NETINTERFACE_GET_NODE, CLUSCTL_NETINTERFACE_GET_NETWORK, etc.
- Read network information from cluster
  ClusterNetworkControl() - CLUSCTL_NETWORK_GET_RO_COMMON_PROPERTIES, CLUSCTL_NETWORK_GET_COMMON_PROPERTIES
  ResUtilFindDwordProperty(... 'role' ...)
  ResUtilFindMultiSzProperty() - IPv4 and IPv6 addresses, IPv4 and IPv6 prefix lengths
- Close handles on cluster resources (it is not likely that any of the Close* functions were delaying things, but it is possible)
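To check whether a slow query like the one above is waiting on these cluster API calls, the wait statistics DMVs can be queried. The following is a minimal sketch, not the exact queries used in the investigation. It assumes VIEW SERVER STATE permission, and it assumes that the slow query is still running in another session when the second statement is executed.

```sql
-- Cumulative HADR_CLUSAPI_CALL waits since the instance started
-- (or since the wait statistics were last cleared).
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'HADR_CLUSAPI_CALL';

-- Requests that are waiting on a cluster API call right now,
-- together with the statement text they are executing.
-- CROSS APPLY drops system tasks that have no sql_handle.
SELECT r.session_id,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       t.text      AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.wait_type = N'HADR_CLUSAPI_CALL';
```

The cumulative numbers typically include background AlwaysOn activity as well, so the per-request view is usually the more telling of the two when chasing one specific slow query.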
What to do?
If you encounter this issue, you should investigate the WSFC in more depth. Run Cluster Validation (but do it when SQL Server is not online), and check your network adapters, cables, DNS resolution, correctness of IP addresses and subnets, and disk resources.
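As a quick first look before a full validation run, the cluster state can also be inspected from inside SQL Server. The following is a sketch under the assumption that the instance is enabled for AlwaysOn and that you have VIEW SERVER STATE permission. It does not replace Cluster Validation, and on an affected system these views may themselves be slow to return.

```sql
-- Cluster name and quorum state as seen by this instance.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Each cluster member (nodes and witness) and whether it is up.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;

-- Networks and subnets reported for each node; useful for spotting
-- wrong IP addresses, masks, or prefix lengths.
SELECT member_name,
       network_subnet_ip,
       network_subnet_ipv4_mask,
       network_subnet_prefix_length,
       is_public,
       is_ipv4
FROM sys.dm_hadr_cluster_networks;
```

A member that is not reported as UP, or a subnet that does not match what you expect for a node, is a reasonable place to start the network and DNS checks listed above.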