AlwaysOn availability groups introduce the flexible failover policy, a new model for monitoring SQL Server instance health on behalf of the availability group resource.
Legacy clustered SQL Server used a LooksAlive check that performed a lightweight test of SQL Server process health. The legacy IsAlive check connected to SQL Server and executed a simple query.
The AlwaysOn flexible failover policy offers a more comprehensive, configurable health monitoring model. When creating or modifying an availability group, you can set or adjust the failure_condition_level property. This property accepts values from 1 to 5: level 1 performs the most lightweight checks, and each higher level adds more comprehensive internal SQL Server health monitoring, up to level 5.
For more information on availability group flexible failover policy settings, see ‘Flexible Failover Policy for Automatic Failover of an Availability Group (SQL Server)’.
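As a rough illustration of adjusting this property, the sketch below builds the T-SQL statement that sets failure_condition_level on an existing availability group. The helper name `build_failure_condition_sql` and the group name `AG1` are hypothetical and exist only for this example; the generated `ALTER AVAILABILITY GROUP ... SET (FAILURE_CONDITION_LEVEL = n)` statement is the only part that is SQL Server syntax.

```python
def build_failure_condition_sql(group_name: str, level: int) -> str:
    """Return T-SQL that sets failure_condition_level (hypothetical helper)."""
    # The property only supports the values 1 through 5.
    if not 1 <= level <= 5:
        raise ValueError("failure_condition_level must be between 1 and 5")
    # Bracket-quote the group name, doubling any closing bracket it contains.
    quoted = "[" + group_name.replace("]", "]]") + "]"
    return f"ALTER AVAILABILITY GROUP {quoted} SET (FAILURE_CONDITION_LEVEL = {level});"


# "AG1" is an assumed availability group name.
print(build_failure_condition_sql("AG1", 3))
```

The statement would then be run on the instance hosting the primary replica; the helper only validates the level and formats the text.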
The following discussion describes in more detail how SQL Server 2012 AlwaysOn failover cluster instances (SQLFCI) and availability groups implement the Windows cluster LooksAlive and IsAlive checks.
Implementation of Health Monitoring Using LooksAlive and IsAlive
Once an availability group is created, the host process of the SQL resource DLL sets up health monitoring with SQL Server and begins periodic LooksAlive and IsAlive operations to satisfy health monitoring.
- The SQL resource DLL initiates the lease thread, which begins communicating with a dedicated thread in SQL Server. For more information on the lease mechanism, see ‘How It Works: SQL Server AlwaysOn Lease Timeout’.
- The resource DLL establishes an ODBC connection and begins receiving the results of sp_server_diagnostics at an interval of one-third of the availability group’s HEALTH_CHECK_TIMEOUT setting.
- The resource DLL begins health monitoring. In SQL Server 2012, LooksAlive and IsAlive perform identical checks to monitor SQL Server instance health.
  - Under normal operating conditions, LooksAlive executes every 5 seconds, checking the health of SQL Server based on the availability group resource’s failure_condition_level.
  - Under normal operating conditions, IsAlive executes every minute, checking the health of SQL Server based on the availability group resource’s failure_condition_level.
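The relationship between HEALTH_CHECK_TIMEOUT and the pace of sp_server_diagnostics results can be sketched as follows. This is a minimal illustration, not the resource DLL's code: the function name is hypothetical, and the 30,000 ms value in the example is assumed as the HEALTH_CHECK_TIMEOUT setting.

```python
def diagnostics_interval_ms(health_check_timeout_ms: int) -> float:
    """Interval between sp_server_diagnostics result sets: 1/3 of the timeout."""
    if health_check_timeout_ms <= 0:
        raise ValueError("HEALTH_CHECK_TIMEOUT must be a positive number of milliseconds")
    return health_check_timeout_ms / 3


# Assumed setting: HEALTH_CHECK_TIMEOUT = 30000 ms.
print(diagnostics_interval_ms(30000))  # 10000.0
```

With this pacing, roughly three result sets are expected within one timeout period, so a single delayed result does not by itself exceed HEALTH_CHECK_TIMEOUT.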
The following describes the algorithm used by LooksAlive and IsAlive to detect SQL Server instance health.
1. Health monitoring cannot be disabled, so the health check always performs the checks associated with failure_condition_level=1. At a minimum, it does the following:
   - Check that the SQL Server process is running by performing a query service state operation.
   - Check the health of the lease mechanism.
2. If failure_condition_level is 2 or greater, in addition to the checks above:
   - Check that the last result set of sp_server_diagnostics was received within the time period defined by the availability group’s HEALTH_CHECK_TIMEOUT.
3. If failure_condition_level is 3 or greater, in addition to the checks above:
   - Check the system component results returned from the last sp_server_diagnostics result set for an error condition.
4. If failure_condition_level is 4 or greater, in addition to the checks above:
   - Check the resource component results returned from the last sp_server_diagnostics result set for an error condition.
5. If failure_condition_level is 5, in addition to the checks above:
   - Check the query_processing component results returned from the last sp_server_diagnostics result set for an error condition.
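The cumulative nature of these levels can be sketched in a few lines. This is an illustrative model under stated assumptions, not the resource DLL's implementation: the `HealthSnapshot` type, its field names, and the `is_healthy` function are all hypothetical stand-ins for the state the checks above inspect.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    """Hypothetical view of the state inspected by LooksAlive/IsAlive."""
    service_running: bool            # result of the query service state operation
    lease_healthy: bool              # state of the lease mechanism
    ms_since_last_diagnostics: int   # age of the last sp_server_diagnostics result set
    system_error: bool               # system component reported an error
    resource_error: bool             # resource component reported an error
    query_processing_error: bool     # query_processing component reported an error


def is_healthy(s: HealthSnapshot, failure_condition_level: int,
               health_check_timeout_ms: int) -> bool:
    """Apply the checks for the given level; each level includes all lower ones."""
    if not 1 <= failure_condition_level <= 5:
        raise ValueError("failure_condition_level must be between 1 and 5")
    # Level 1 checks always run, because health monitoring cannot be disabled.
    if not (s.service_running and s.lease_healthy):
        return False
    if failure_condition_level >= 2 and s.ms_since_last_diagnostics > health_check_timeout_ms:
        return False
    if failure_condition_level >= 3 and s.system_error:
        return False
    if failure_condition_level >= 4 and s.resource_error:
        return False
    if failure_condition_level >= 5 and s.query_processing_error:
        return False
    return True


# A resource component error is ignored at level 3 but fails the check at level 4.
snapshot = HealthSnapshot(True, True, 5000, False, True, False)
print(is_healthy(snapshot, 3, 30000))  # True
print(is_healthy(snapshot, 4, 30000))  # False
```

The early returns mirror the ordering of the list: a failure at a lower level is reported regardless of how high failure_condition_level is set.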
- If you create an availability group and do not specify the failure_condition_level explicitly, the failure_condition_level is set to 3.
- A failure_condition_level of 2 most resembles the legacy LooksAlive and IsAlive health monitoring settings for detecting failure.
- If a SQL Server instance hosts more than one availability group, the effective HEALTH_CHECK_TIMEOUT is the lowest of the values configured for those availability groups.
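The last note amounts to taking a minimum over the configured values. The snippet below is a small sketch of that rule; the function name, the group names, and the timeout values are hypothetical.

```python
def effective_health_check_timeout_ms(configured: dict) -> int:
    """Return the lowest HEALTH_CHECK_TIMEOUT among the groups on one instance."""
    if not configured:
        raise ValueError("at least one availability group is required")
    return min(configured.values())


# Assumed configuration: three availability groups hosted on one instance.
timeouts = {"AG1": 30000, "AG2": 45000, "AG3": 60000}
print(effective_health_check_timeout_ms(timeouts))  # 30000
```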