What is new: OpsMgr 2007 R2 - How to reset monitor state with recovery?

Cameron had a nice real-life example of using the new R2 process monitoring feature, but it raised a question about a feature he wanted to use. The following is a report of his issue and how we can help him address it using features that already exist in OpsMgr 2007.

Scenario: Monitor a system with a process monitor. Define a recovery that reboots the system if it is not running the required process. Run this recovery automatically on the critical state.

Problem: In OpsMgr prior to R2, a newly created Recovery had a “Reset monitor” option, which would put the monitor into a healthy state. In R2, this option now says “Recalculate State Monitor”. This presents a challenge, as described below:

Recovery wizard

Challenge: Recalculating the state may keep the monitor in a critical state until the system has been rebooted successfully and is in fact running the process. If the process does not start correctly after reboot, it gets stuck in the critical state and the recovery will not run again. With a Reset of this monitor to a Healthy state, this would work properly, but without that option available I am not seeing an effective way to make this work.

Workaround: A recovery is no different from other workflows loaded by OpsMgr and is rather similar to a task. It consists of modules that are chained together and is expected to take some corrective action so that the monitor can return to a healthy state. For that reason, the first module in the chain can be a module that resets the state of the monitor.

The following module can be used directly in a recovery. It resets the state of the monitor specified in its configuration.

<WriteActionModuleType ID="Microsoft.SystemCenter.Community.Health.ResetTargetStateAction" Accessibility="Public" Batching="false">
  <!-- Configuration: the ID of the monitor whose state should be reset. -->
  <Configuration>
    <xsd:element minOccurs="1" name="MonitorId" type="xsd:string" />
  </Configuration>
  <OverrideableParameters>
    <OverrideableParameter ID="MonitorId" Selector="$Config/MonitorId$" ParameterType="string" />
  </OverrideableParameters>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <!-- Wraps the ResetStateAction module from the same MP and fills in the
             management group and entity IDs from the workflow's target. -->
        <WriteAction ID="Health.ResetStateAction" TypeID="Microsoft.SystemCenter.Community.Health.ResetStateAction">
          <ManagementGroupId>$Target/ManagementGroup/Id$</ManagementGroupId>
          <ManagedEntityId>$Target/Id$</ManagedEntityId>
          <MonitorId>$Config/MonitorId$</MonitorId>
        </WriteAction>
      </MemberModules>
      <Composition>
        <Node ID="Health.ResetStateAction" />
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.BaseData</OutputType>
  <InputType>System!System.BaseData</InputType>
</WriteActionModuleType>

The next module can be used as well. It resets the state of the monitor first and then executes a command.

<WriteActionModuleType ID="Microsoft.SystemCenter.Community.Health.ResetTargetStateCommandExecuterAction" Accessibility="Public" Batching="false">
  <!-- Configuration: the usual command executer settings plus the monitor to reset. -->
  <Configuration>
    <IncludeSchemaTypes>
      <SchemaType>System!System.CommandExecuterSchema</SchemaType>
    </IncludeSchemaTypes>
    <xsd:element minOccurs="1" name="ApplicationName" type="xsd:string" />
    <xsd:element minOccurs="1" name="WorkingDirectory" type="xsd:string" />
    <xsd:element minOccurs="1" name="CommandLine" type="xsd:string" />
    <xsd:element minOccurs="1" name="TimeoutSeconds" type="xsd:integer" />
    <xsd:element minOccurs="1" name="RequireOutput" type="xsd:boolean" />
    <xsd:element minOccurs="1" name="MonitorId" type="xsd:string" />
  </Configuration>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <WriteAction ID="Command" TypeID="System!System.CommandExecuter">
          <ApplicationName>$Config/ApplicationName$</ApplicationName>
          <WorkingDirectory>$Config/WorkingDirectory$</WorkingDirectory>
          <CommandLine>$Config/CommandLine$</CommandLine>
          <TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
          <RequireOutput>$Config/RequireOutput$</RequireOutput>
        </WriteAction>
        <WriteAction ID="Reset" TypeID="Microsoft.SystemCenter.Community.Health.ResetTargetStateAction">
          <MonitorId>$Config/MonitorId$</MonitorId>
        </WriteAction>
      </MemberModules>
      <!-- The innermost node runs first: Reset executes, then its output flows to Command. -->
      <Composition>
        <Node ID="Command">
          <Node ID="Reset" />
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.BaseData</OutputType>
  <InputType>System!System.BaseData</InputType>
</WriteActionModuleType>
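
To tie this module back to the original scenario, a recovery that resets the process monitor and then reboots the computer might look roughly like the sketch below. It rests on several assumptions that are not taken from the attached MP: the recovery ID, the monitor ID Contoso.ProcessMonitor, the Windows!Microsoft.Windows.Computer target (which presumes a "Windows" alias for the Windows library MP), the timeout values, and the shutdown.exe command line are all hypothetical. Only the module type and the order of its configuration elements come from the definition above, and the MicrosoftSystemCenterCommunityMonitorsExtensions alias is the one used in the sample recovery later in this post.

```xml
<!-- Hypothetical sketch: the IDs, target, and command line are illustrative assumptions. -->
<Recovery ID="Contoso.ProcessMonitor.ResetAndRebootRecovery" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Computer" Monitor="Contoso.ProcessMonitor" RecalculateMonitor="false" ExecuteOnState="Error" Remotable="true" Timeout="300">
  <Category>Maintenance</Category>
  <WriteAction ID="ResetAndReboot" TypeID="MicrosoftSystemCenterCommunityMonitorsExtensions!Microsoft.SystemCenter.Community.Health.ResetTargetStateCommandExecuterAction">
    <!-- Elements must appear in the order declared in the module's Configuration. -->
    <ApplicationName>%WINDIR%\System32\shutdown.exe</ApplicationName>
    <WorkingDirectory />
    <!-- /r = restart, /f = force applications to close, /t 60 = wait 60 seconds first -->
    <CommandLine>/r /f /t 60</CommandLine>
    <TimeoutSeconds>120</TimeoutSeconds>
    <RequireOutput>false</RequireOutput>
    <!-- Should resolve to the same monitor named in the Monitor attribute above. -->
    <MonitorId>$MPElement[Name="Contoso.ProcessMonitor"]$</MonitorId>
  </WriteAction>
</Recovery>
```

ExecuteOnState="Error" is the XML value for what the console shows as the critical state. Because the monitor is reset to healthy before the reboot is issued, a process that fails to start after the reboot should drive the monitor to critical again, which in turn lets the recovery run again.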

A sealed MP with both modules is attached to this post.

Sample: Also attached is an example that uses the modules with a simple event-based monitor. The monitor targets the “Root Management Server” instance, which is why the management pack also defines a state view for this entity. When you open “Health Explorer”, you should easily be able to locate the sample monitor.

Initial Configuration

One of the recoveries in the attached MP runs automatically on the Warning state. The MonitorId value is an MPElement replacement that identifies the monitor you want to reset. (It should be the same as the value of the Monitor attribute!) Also, please observe that using just the reset module causes its output to be displayed in the “Context” tab, and that the two state changes will appear to have the “same” time of change in Health Explorer.

<Recovery ID="Microsoft.SystemCenter.Community.Monitors.RecoverySample.StateWarningResetRecovery"
          Accessibility="Internal"
          Enabled="onStandardMonitoring"
          Target="SC!Microsoft.SystemCenter.RootManagementServer"
          Monitor="Microsoft.SystemCenter.Community.Monitors.RecoverySample.EventBasedMonitor"
          RecalculateMonitor="false"
          ExecuteOnState="Warning"
          Remotable="true"
          Timeout="300">
  <Category>Maintenance</Category>
  <WriteAction ID="Reset" TypeID="MicrosoftSystemCenterCommunityMonitorsExtensions!Microsoft.SystemCenter.Community.Health.ResetTargetStateAction">
    <MonitorId>$MPElement[Name="Microsoft.SystemCenter.Community.Monitors.RecoverySample.EventBasedMonitor"]$</MonitorId>
  </WriteAction>
</Recovery>
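
The MicrosoftSystemCenterCommunityMonitorsExtensions prefix in the TypeID above is a reference alias, so a management pack that uses these modules needs a matching entry in its manifest. The fragment below is only a sketch of what that entry might look like: the ID is a guess derived from the alias, and the Version and PublicKeyToken values are placeholders. Take the real values from the sealed MP attached to this post.

```xml
<References>
  <!-- Assumed values: replace ID, Version, and PublicKeyToken with those of the attached sealed MP. -->
  <Reference Alias="MicrosoftSystemCenterCommunityMonitorsExtensions">
    <ID>Microsoft.SystemCenter.Community.Monitors.Extensions</ID>
    <Version>1.0.0.0</Version>
    <PublicKeyToken>0000000000000000</PublicKeyToken>
  </Reference>
</References>
```

The alias itself is arbitrary. What matters is that the prefix before the "!" in each TypeID matches the Alias attribute of a reference declared in the same management pack.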

Warning Recovery Context

DISCLAIMER:

Please evaluate in your test environment first! As expected, this solution is provided AS-IS, with no warranties and confers no rights. Use is subject to the terms specified at Microsoft.

Microsoft.SystemCenter.Community.Monitors.MPs.zip