Cluster and maintenance mode

Maintenance mode on clusters has become a big issue for many of our customers. Starting maintenance mode (MM) for a physical computer that serves as a cluster node causes false-positive alerts from the cluster infrastructure monitoring management pack (MP). Please accept our apologies. We hope this post clears things up a bit and provides some insight and tools for performing the maintenance mode process.

The following shows the class hierarchy and relationships (it is part of the management pack guide, so it should not be completely new). We can see that the managed entity Cluster contains all managed entities that make up the cluster infrastructure (remember that hosting is a specialization of containment that introduces a lifetime dependency).

 

 Cluster objects

 

1. Manual process

The usual steps for putting a physical computer that serves as a failover cluster node into maintenance start with finding the cluster that contains this computer. You must start maintenance mode for this managed entity (Cluster) and for all of its contained entities as well. Putting the entity into MM together with all contained objects causes the node (which inherits from Windows Computer) to enter MM along with every local application hosted by that computer (though in SP1 the Health Service, which is a local application, is exempt from this rule). The only thing left to do is to start MM for each Health Service and Health Service Watcher associated with the cluster node computers.

Start maintenance mode 

2. PowerShell script

To automate the work described above, you can use the attached PowerShell script. It takes the cluster name as its first argument, the number of hours in maintenance as its second, and a description as its third. The script puts the cluster and all contained objects into maintenance mode, locates the nodes related to the cluster, and then starts maintenance mode for the Health Service and Health Service Watcher associated with each of those physical computers.

# Arguments: cluster name, hours in maintenance, description
$clusterName = $args[0]
$HoursInMaintenance = $args[1]
$Description = $args[2]

# Load the Operations Manager snap-in; errors are suppressed inside this block
& {
    $ErrorActionPreference = "silentlycontinue"
    Add-PSSnapin "Microsoft.EnterpriseManagement.OperationsManager.Client" -ErrorVariable errSnapin;
}

# Run the command shell startup script from the Operations Manager installation folder
$pathOpsMgr = $env:ProgramFiles + "\System Center Operations Manager 2007"
cd $pathOpsMgr
.\Microsoft.EnterpriseManagement.OperationsManager.ClientShell.Startup.ps1

# Maintenance window, expressed in UTC
$startTime = (Get-Date).ToUniversalTime()
$endTime = $startTime.AddHours($HoursInMaintenance)

# Find the cluster object by name, then the nodes related to it
$clusterCriteria="Name='"+ $clusterName +"'"
$cluster = get-monitoringClass -name 'Microsoft.Windows.Cluster' | get-monitoringObject -criteria:$clusterCriteria
$clusterNodeClass=get-monitoringclass -name 'Microsoft.Windows.Cluster.Node'
$nodes=$cluster.GetRelatedMonitoringObjects($clusterNodeClass)

# Put the cluster and all contained objects into maintenance mode
"Putting cluster " + $clusterName + " into maintenance mode recursively"
$cluster.ScheduleMaintenanceMode($startTime,$endTime,"PlannedOther",$Description,"Recursive")

# The recursive call does not cover the Health Service and Health Service Watcher, so handle them per node
foreach ($node in $nodes)
{
    # Health Service Watcher for this node
    $healthServiceWatcherCriteria = "HealthServiceName='" + $node.Name + "'"
    $healthServiceWatcher = get-monitoringclass -name 'Microsoft.SystemCenter.HealthServiceWatcher' | get-monitoringobject -criteria:$healthServiceWatcherCriteria
    "Putting HSW " + $node.Name + " into maintenance mode"
    & {
        $ErrorActionPreference = "silentlycontinue"
        New-MaintenanceWindow -startTime:$startTime -endTime:$endTime -monitoringObject:$healthServiceWatcher -comment:$Description
    }

    # Health Service hosted by the agent on this node
    $computer = Get-Agent | Where-object {$_.PrincipalName -eq $node.Name}
    $healthService = $computer.HostedHealthService
    "Putting HS " + $node.Name + " into maintenance mode"
    & {
        $ErrorActionPreference = "silentlycontinue"
        New-MaintenanceWindow -startTime:$startTime -endTime:$endTime -monitoringObject:$healthService -comment:$Description
    }
}
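The script reads its three inputs positionally and does not check them. Below is a minimal sketch of a validation step that could run before any Operations Manager call. The function name Test-ClusterMMArguments and the sample values are hypothetical and not part of the attached script. The sketch only mirrors how the script derives its UTC window and cluster criteria string.

```powershell
# Hypothetical helper (not part of ClusterMM.ps1): validates the three
# positional arguments and returns the UTC maintenance window plus the
# criteria string the script uses to look up the cluster.
function Test-ClusterMMArguments {
    param($ClusterName, $HoursInMaintenance, $Description)

    if ([string]::IsNullOrEmpty($ClusterName)) {
        throw "Cluster name (first argument) is required."
    }

    # Command-line arguments may arrive as strings, so parse explicitly.
    $hours = 0.0
    $style = [System.Globalization.NumberStyles]::Float
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $parsed = [double]::TryParse([string]$HoursInMaintenance, $style, $culture, [ref]$hours)
    if ((-not $parsed) -or ($hours -le 0)) {
        throw "Hours in maintenance (second argument) must be a positive number."
    }

    $start = (Get-Date).ToUniversalTime()
    return @{
        StartTime   = $start
        EndTime     = $start.AddHours($hours)
        Criteria    = "Name='" + $ClusterName + "'"
        Description = $Description
    }
}

# Sample values are placeholders.
$window = Test-ClusterMMArguments "CLUSTER01" "2" "Monthly patching"
"Maintenance window (UTC): " + $window.StartTime + " to " + $window.EndTime
```

Assuming the attached script is saved as ClusterMM.ps1 on a machine where the Operations Manager 2007 command shell is installed, a call would look like .\ClusterMM.ps1 "CLUSTER01" 2 "Monthly patching", where CLUSTER01 stands in for your own cluster name.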

 

3. Recap

Entering MM for the whole cluster with these steps temporarily suppresses all monitoring for the cluster infrastructure, including the cluster nodes (physical computers), because all workflows are unloaded. There will be no alerting on the cluster infrastructure, the computers, or the health services while the instances are in maintenance mode. In one of my next posts I will try to answer the question of whether it is possible to put just the active cluster node into maintenance mode.

ClusterMM.ps1