In this series of blog posts we will review the Resource State Machine from the point of view of the Resource Hosting Subsystem (RHS, also known as the Resource Monitor). We hope it will help you design, develop, and debug the Resource DLL that integrates your application with Windows Server 2008 and 2008 R2 Failover Clustering.
Let’s first review which components are involved in managing resource state and how they interact with each other.
If you need to integrate your application with Failover Clustering, you have to write a Resource DLL. The Resource DLL lets your custom resource interact with the cluster and perform common tasks, such as bringing the resource online or offline. It is responsible for translating cluster resource state transition commands into commands specific to your application, and for bringing the application to a state that corresponds to the current resource state.
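To make the translation role more concrete, here is a minimal sketch in Python. It is only a toy model: a real Resource DLL is native code that exports the entry points described in the MSDN articles linked at the end of this post. The names `ToyResource`, `FakeApp`, and `ResourceState` are hypothetical and are not part of the cluster API.

```python
from enum import Enum


class ResourceState(Enum):
    OFFLINE = "Offline"
    ONLINE = "Online"
    FAILED = "Failed"


class FakeApp:
    """Hypothetical application controlled by the resource."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class ToyResource:
    """Toy stand-in for a Resource DLL: maps cluster commands to app commands."""

    def __init__(self, app):
        self.app = app
        self.state = ResourceState.OFFLINE

    def online(self):
        # Translate the cluster's "online" command into the app's own start call.
        try:
            self.app.start()
            self.state = ResourceState.ONLINE
        except Exception:
            # A failed start is reported as a state, not as an unhandled exception.
            self.state = ResourceState.FAILED
        return self.state

    def offline(self):
        # Translate the cluster's "offline" command into the app's own stop call.
        self.app.stop()
        self.state = ResourceState.OFFLINE
        return self.state
```

The point of the sketch is that the cluster only ever sees generic commands and states, while everything application-specific stays behind `online()` and `offline()`.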
The image below shows a 2-node cluster interacting with resources.
Both nodes have the Cluster Service running. The cluster has one group configured, Group 1, which contains three resources. Resource 1 depends on Resource 2, and Resource 2 depends on Resource 3. You will typically see this relationship in a group where an Application (Resource 1) depends on a NetName (Resource 2), which in turn depends on an IP Address (Resource 3).
Resource Control Manager (RCM)
The Cluster Service contains a component called the Resource Control Manager (RCM). The RCM instances on all nodes negotiate which node owns the group. The group owner brings the group to its “persistent state”, which is the last state the user put the group in: either Online or Offline. Nodes that do not own the group will not bring it online, which ensures the group is online on at most one node at a time. If the persistent state of Group 1 is Online, then Node 1 will bring it online and Node 2 will make sure it is offline.
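The ownership rule can be written down in a few lines. This is a simplified sketch of the rule as described above, not of how RCM is implemented; the function name `target_state` and the string state names are assumptions made for illustration.

```python
def target_state(node, owner, persistent_state):
    """Return the state a node should drive the group to.

    Only the owner applies the persistent state; every other node
    keeps the group offline.
    """
    if node == owner:
        return persistent_state
    return "Offline"


nodes = ["Node 1", "Node 2"]
plan = {n: target_state(n, owner="Node 1", persistent_state="Online") for n in nodes}
```

Whatever the persistent state is, at most one node in `plan` ends up with the group online.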
To bring the group online, RCM brings all the resources in the group online in dependency order. First RCM sends the Online command to Resource 3. As soon as Resource 3 is online, RCM sends the Online command to Resource 2, and as soon as Resource 2 is online, RCM sends the Online command to Resource 1. Once Resource 1 is online, Group 1 is online. This assumes that no failures happen during the online process.
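The ordering described above (Resource 3, then Resource 2, then Resource 1) is a topological sort of the dependency graph. The sketch below models it in Python with the standard `graphlib` module (Python 3.9 or later). The function `bring_group_online` and its `bring_online` callback are hypothetical, and stopping at the first failure is an assumption of this sketch; what RCM actually does when a resource fails is not covered here.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
DEPENDS_ON = {
    "Resource 1": {"Resource 2"},  # e.g. Application
    "Resource 2": {"Resource 3"},  # e.g. NetName
    "Resource 3": set(),           # e.g. IP Address
}


def bring_group_online(depends_on, bring_online):
    """Bring resources online with providers first.

    `bring_online(name)` returns True on success. The loop stops at the
    first failure (an assumption of this sketch) and returns the resources
    that did come online plus an overall success flag.
    """
    online = []
    for name in TopologicalSorter(depends_on).static_order():
        if not bring_online(name):
            return online, False
        online.append(name)
    return online, True


order, ok = bring_group_online(DEPENDS_ON, lambda name: True)
```

Taking the group offline would walk the same order in reverse, so that a resource goes offline before the resources it depends on.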
Resource Hosting Subsystem (RHS)
Changing a resource's state requires interacting with the actual application. To do that, RCM sends a state transition request to the Resource DLL. Because the Resource DLL interacts with the application to perform its task, it can experience failures, such as a call taking too long or an unhandled exception. Resource DLL failures such as exceptions or deadlocks should not bring the Cluster Service down. For that reason the Cluster Service never loads the Resource DLL into its own process. Instead it spawns a child process: the Resource Hosting Subsystem (RHS, or Resource Monitor).
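The benefit of hosting risky code in a child process can be demonstrated with a small, generic Python sketch. It has nothing to do with the real RHS implementation; the helper `run_in_child` is hypothetical and simply shows that a crash or a hang in the child leaves the parent process running.

```python
import subprocess
import sys


def run_in_child(code, timeout=5.0):
    """Run a snippet of Python in a separate process.

    Returns the child's exit code, or None if the child had to be
    killed because it did not finish within `timeout` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return None
    return result.returncode


crashed = run_in_child("raise RuntimeError('resource code blew up')")
hung = run_in_child("import time; time.sleep(30)", timeout=1.0)
healthy = run_in_child("pass")
```

In all three cases the parent gets a result it can act on, which is the same reason the Cluster Service keeps Resource DLLs out of its own process.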
Normally one RHS process is shared among many resources, but if a resource turns out to be flaky you can use its resource properties to move it to a separate monitor, so that it is isolated from stable resources. In the image above you can see that Resource 3 is isolated in a separate RHS process.
If your application requires multiple resource types, you can either create a separate Resource DLL for each of them or implement them all in the same Resource DLL.
For more information about RHS, see this blog post: http://blogs.msdn.com/clustering/archive/2009/06/27/9806160.aspx
You can find more information about the various cluster components in the following MSDN articles:
· Cluster Resource Monitor: http://msdn.microsoft.com/en-us/library/aa372266(VS.85).aspx
· Cluster Resource DLLs: http://msdn.microsoft.com/en-us/library/aa372239(VS.85).aspx
· Resource DLL Entry Points: http://msdn.microsoft.com/en-us/library/aa372244(VS.85).aspx
· Implementing Resource Health Monitoring: http://msdn.microsoft.com/en-us/library/aa372255(VS.85).aspx
· Managing Resource State Transitions: http://msdn.microsoft.com/en-us/library/aa370988(VS.85).aspx
In the next posts in this series we will discuss how these components interact with the cluster, describe the entry points, and give some examples.
Part 2 is now available: http://blogs.msdn.com/clustering/archive/2010/03/30/9987135.aspx
Senior Software Development Engineer
Clustering & High-Availability