OpsMgr2007 SP1 RC rolling upgrade and cluster-aware application monitoring.

 

A rolling upgrade of OpsMgr2007 proceeds in stages: the server bits are upgraded first, new management packs are then requested and delivered, and those are later pushed down to the agents. The Operations Manager 2007 SP1 Release Candidate has an issue that appears when cluster-aware applications (such as SQL or Exchange) are monitored within the topology.

The issue manifests itself as the unloading of several rules: those responsible for discovering instances of Virtual Server, and those monitoring failover of the cluster resource group that matches a particular Virtual Server instance. It is caused by changes to the configuration of native modules defined in the Microsoft.Windows.Cluster.Library management pack.

The rules remain unloaded until the new binaries are approved, delivered, and have replaced the binaries shipped with Microsoft Operations Manager 2007 RTM. The impact is that, during the rolling upgrade to OpsMgr2007 SP1 RC, newly created cluster-aware applications are not discovered, and existing cluster-aware applications are not “actively” monitored if they fail over to another cluster node or if the cluster service stops, pauses, or crashes.

The rules are reloaded and can process the new configuration once the OpsMgr2007 SP1 RC binaries have replaced the RTM binaries and have been loaded by the runtime of the agent (health service) running on the cluster node.

Sample event reporting the unload of a discovery rule:

Event Type: Error
Event Source: HealthService
Event Category: Health Service
Event ID: 4511
Date: 11/09/2007
Time: 9:26:27 AM
User: N/A
Computer: testBox

Description:
Initialization of a module of type "ClusterDiscoveryDS" (CLSID "{97B1EF21-757C-4004-86BB-57939E2C98D8}") failed with error code “Element not found” causing the rule "Microsoft.Windows.Cluster.Classes.Discovery" running for instance "Cluster Service" with id:"{0753905A-5ACE-5C70-1B0A-7980743053FA}" in management group “marius”
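
When checking many cluster nodes during the upgrade, it can help to pull the module type, rule name, and instance out of event descriptions like the one above. The following is only a minimal sketch, assuming the description text has already been exported from the event log as a string. The function name `parse_unload_event` and the field names are hypothetical and chosen for illustration; they are not part of Operations Manager. The pattern is based solely on the wording of the sample event and accepts both straight and typographic quotes, since pasted event text often mixes them.

```python
import re

# Either a straight or a typographic double quote may open/close a value.
_Q_OPEN = '["\u201c]'
_Q_CLOSE = '["\u201d]'


def _quoted(name):
    """Build a regex fragment that captures one quoted value as group `name`."""
    return f'{_Q_OPEN}(?P<{name}>[^"\u201c\u201d]*){_Q_CLOSE}'


# Pattern derived from the wording of the sample event 4511 description.
# This is an assumption: other product versions may word the text differently.
_PATTERN = re.compile(
    r'module of type\s+' + _quoted('module_type')
    + r'\s+\(CLSID\s+' + _quoted('clsid') + r'\)'
    + r'.*?error code\s+' + _quoted('error')
    + r'.*?rule\s+' + _quoted('rule')
    + r'.*?instance\s+' + _quoted('instance')
    + r'.*?management group\s+' + _quoted('management_group'),
    re.DOTALL,
)


def parse_unload_event(description):
    """Return a dict of fields from the description, or None if it does not match."""
    match = _PATTERN.search(description)
    return match.groupdict() if match else None
```

Calling `parse_unload_event` on the description shown above should return a dictionary whose `rule` entry is `Microsoft.Windows.Cluster.Classes.Discovery`; text that does not follow this wording yields `None`.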