Ask Learn
Preview
Please sign in to use this experience.
Sign inThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
In this blog we would discuss about behavior of AlwaysOn availability group where the secondary database doesn't come automatically when the primary instance goes down. The secondary database goes into Resolving state. On the failover cluster manager the resource appears in fail state.
While doing some research, we found that if we stop SQL Service manually on primary replica, it fails over to second node only once. Any further attempts of stopping SQL Service (to test auto failover) would not cause failover. We looked into cluster logs and found below.
From the Cluster Log:
------------------------------
00001b50.00001e54::2012/01/31-03:22:05.170 INFO [RCM] HandleMonitorReply: TERMINATERESOURCE for 'KEXPServices_Staging', gen(3) result 0.
00001b50.00001e54::2012/01/31-03:22:05.170 INFO [RCM] TransitionToState(KEXPServices_Staging) [Terminating to Failed]—>Failed.
00001b50.00001e54::2012/01/31-03:22:05.170 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (KEXPServices_Staging, Pending --> Failed)
00001b50.00000cb4::2012/01/31-03:22:05.185 WARN [RCM] Not failing over group KEXPServices_Staging, failoverCount 2, failover threshold 4294967295, nodeAvailCount 1.
00001b50.00000cb4::2012/01/31-03:22:05.185 INFO [RCM] Will retry online of KEXPServices_Staging in 3600000 milliseconds.
We looked further into resource in failover cluster manager and found the cause as below.
By default the failover properties for the Resource group is set as follows.
This means that when a failover happens and the resource fails to come online, the cluster service would try to get the resource online only once. It would try only once in 6 hours. Since the cluster service would check the log on how many errors have occurred in the past, it verifies the count set for “Maximum Failures in the specified period” and hence would fail if the number of failures equals or exceeds.
Workaround:
1. Maximum Failures in the specified period: set to 60
2. Period (Hours): set to 1
Here is the screenshot showing the properties
DISCLAIMER: The number shown in the blog are for testing purpose only. Please perform testing in your environment before implementing the values.
Written By:
Arvindh Kalidasan - Support Engineer,Microsoft GTSC
Reviewed By:
Balmukund Lakhani- Technical Lead, Microsoft GTSC
Anonymous
May 10, 2015
Excellent Presented.
Anonymous
May 15, 2016
Wonderful. Solved my new SQL 2014 AlwaysON build!!!
Please sign in to use this experience.
Sign in