Cluster resource dependencies used to be a fairly limited relationship. A resource could depend on one or more resources (we call them “provider” resources in the context of the dependency relationship). This dependency relationship was an “and” relationship, in that the dependent resource R depends on provider resource P1 AND provider resource P2, etc. In other words, every one of the provider resources must be online for the dependent resource to be online. If even one provider fails, then the dependent must be terminated (while we restart the provider and/or invoke group failover, etc.).
There are scenarios, however, where something more flexible is called for. For instance, you may have multiple redundant network interfaces in your server, and as long as one of them is functional, your highly available service can keep running. Or perhaps you’ve got both IPv4 and IPv6 addresses allocated, and you only need one of them up and running.
So what you really want is the ability to say that your highly available service can keep running as long as provider P1 OR provider P2 is alive.
New and Improved
Enter Dependency Expressions, the new way of specifying resource dependencies in Windows Server 2008. There are two advantages here. First, you can specify all of a resource’s dependencies with one expression string (and one API call), instead of having to call AddClusterResourceDependency once for each provider. Second, you can now specify “OR” relationships between providers.
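// Old way: one call per provider; the providers are implicitly ANDed.
AddClusterResourceDependency( r, p1 );
AddClusterResourceDependency( r, p2 );
// New way: one call with an expression string, which can also express OR.
SetClusterResourceDependencyExpression( r, L"([p1] or [p2])" );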
Note: for convenience, you can use the provider resource’s name inside the square brackets. However, the parsing becomes difficult if the resource name happens to contain a square bracket itself (e.g. “Data Disk A]”). In this case, you must use the resource’s ID, which you can obtain via the CLUSCTL_RESOURCE_GET_ID resource control.
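To make the bracket caveat concrete, here is a minimal sketch of a helper that assembles an OR group from provider names and refuses any name containing a square bracket. The function build_or_group is hypothetical (it is not part of the cluster API), and it works on narrow strings for brevity; converting the result to the wide string that SetClusterResourceDependencyExpression expects is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the cluster API.
 * Builds "([a] or [b] or ...)" from provider names.
 * Returns 0 on success, -1 if a name contains a square bracket
 * (pass the resource ID instead) or the buffer is too small.
 */
int build_or_group(const char *const *names, size_t count,
                   char *out, size_t out_size)
{
    size_t used;
    int n;

    assert(names != NULL && out != NULL);
    if (count == 0 || out_size == 0)
        return -1;

    n = snprintf(out, out_size, "(");
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        /* A bracket inside a name would make the expression ambiguous. */
        if (strpbrk(names[i], "[]") != NULL)
            return -1;

        n = snprintf(out + used, out_size - used, "%s[%s]",
                     i == 0 ? "" : " or ", names[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, out_size - used, ")");
    if (n < 0 || (size_t)n >= out_size - used)
        return -1;
    return 0;
}
```

Called with the names p1 and p2, this produces the same string used in the example above. A name like “Data Disk A]” is rejected, which is the caller's cue to look up the resource ID and use that instead.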
Power Good, Complexity Bad
Here’s the conundrum: one of the major goals of the new Failover Clustering product was simplicity. We made great strides in reducing the complexity involved in creating a cluster, in configuring a clustered application such as a file share, and in many other operations. So when faced with the question of just how much flexibility to allow in these new dependency expressions, there was a tradeoff. How do we add this powerful new feature without making it an administrative nightmare?
If any valid Boolean expression involving ANDs and ORs were allowed, we would risk ending up with spaghetti dependencies like “(P1 AND P2) OR P3 AND (P4 OR (P5 AND P6))” … well, you get the picture.
We also considered the possibility of “m of n” dependencies, in other words “4 out of 10 of these resources must be online”. How do you specify that in an expression – and expose it in the UI in a human-friendly way? And is this a compelling enough scenario to justify the extra complexity?
Finally, what about priority? E.g. you might want to have both P1 and P2 brought online before the dependent, but maybe P2 is totally optional, while P1 is not. “P1 OR P2” doesn’t quite capture this.
The End Result: ANDs of ORs
In the end, the balance we arrived at was to allow “ANDs of ORs”. In other words, you can have groups of OR dependencies that are all ANDed together, like this:
( [P1] OR [P2] ) AND ( [P3] OR [P4] ) AND [P5] AND ( [P6] OR …
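One way to picture how an “ANDs of ORs” expression is evaluated is the small model below. It is only an illustration of the Boolean logic, not how the cluster service is implemented; the OrGroup type and dependencies_satisfied function are made-up names, and providers are represented as indices into an array of online/offline flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One parenthesized OR group: indices into the provider state array. */
typedef struct {
    const size_t *providers;
    size_t count;
} OrGroup;

/*
 * Returns true when every OR group has at least one online provider.
 * A lone provider such as [P5] is modeled as a group of size one.
 */
bool dependencies_satisfied(const OrGroup *groups, size_t group_count,
                            const bool *online)
{
    assert(groups != NULL || group_count == 0);
    assert(online != NULL || group_count == 0);

    for (size_t g = 0; g < group_count; g++) {
        bool any_online = false;
        for (size_t i = 0; i < groups[g].count; i++) {
            if (online[groups[g].providers[i]]) {
                any_online = true;
                break;
            }
        }
        if (!any_online)
            return false;   /* one unsatisfied group fails the whole AND */
    }
    return true;
}
```

For ( [P1] OR [P2] ) AND ( [P3] OR [P4] ) AND [P5], you would pass three groups: {0, 1}, {2, 3} and {4}. With P1, P4 and P5 online the function returns true; take P5 offline, or take both P3 and P4 offline, and it returns false.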
This gives us the functionality needed for some powerful new failover clustering scenarios (including geographically distributed clusters, or “multi-site clusters” as they’re also called, with some nodes in one location and some nodes in another), while at the same time not making the new API so complex as to be unusable.
Solution to the “Priority” Example
Also, as it turned out, the priority example above can be handled (in a fairly awkward way, admittedly) with the existing dependency expressions. Say you have essential provider P1, and non-essential provider P2. P1 has to stay up, but P2 can fail without dire consequence to the dependent resource or group.
If you set the dependencies as “P1 OR P2”, that wouldn’t work, because P1 could fail, and as long as P2 was online, the dependent resource would stay online. Similarly, “P1 AND P2” doesn’t work because if P2 fails, the dependent will be terminated, and that’s not what we want, since P2 is optional.
Well, if you use the Boolean trick that “TRUE OR X” is true for all values of X, then you could create a dummy resource (perhaps with a simple script and the GenScript resource) that is always online and never fails. Then you can use the following dependency expression to give P2 the “optional” status we desired:
[P1] AND ( [P2] OR [AlwaysOnline] )
where AlwaysOnline is the dummy resource.
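If you want to convince yourself that the trick behaves as described, the sketch below models the expression as plain Boolean logic and checks every combination of P1 and P2. The function names are invented for this illustration, and the model assumes the dummy resource really is always online.

```c
#include <assert.h>
#include <stdbool.h>

/* Models [P1] AND ( [P2] OR [AlwaysOnline] ) as plain Boolean logic. */
bool dependent_can_run(bool p1, bool p2, bool always_online)
{
    return p1 && (p2 || always_online);
}

/* Checks all four P1/P2 combinations with the dummy resource online. */
void verify_priority_trick(void)
{
    const bool always_online = true;   /* assumption: the dummy never fails */

    assert(dependent_can_run(true,  true,  always_online));
    assert(dependent_can_run(true,  false, always_online));   /* P2 is optional */
    assert(!dependent_can_run(false, true,  always_online));  /* P1 is essential */
    assert(!dependent_can_run(false, false, always_online));
}
```

With AlwaysOnline held true, the result tracks P1 alone, so a P2 failure never takes the dependent down. Note that if the dummy resource ever did go offline, the expression would quietly fall back to “P1 AND P2”.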