State based alerts in MOM 2005

Alerts in MOM are generally used to inform administrators of a potential problem condition.  By the time the administrator reviews the alert, the problem condition that caused the alert may already be resolved. 

State based alerts are special types of alerts that were introduced, in part, to allow an administrator to tell at a glance whether the condition that caused the alert to show up is an ongoing condition. 

The MOM operator console uses system state to describe graphically the general health of systems - generally grouped by server function (Exchange, AD, SQL, etc).  In some cases, the thresholds used to describe healthy vs. unhealthy can be configured by the administrator.

At the individual level, state based alerts have a 'problem state' property as part of the alert description.  This property describes the current state of the alert.  If the problem is still occurring, the value is listed as 'active'.  If the problem has been resolved, the value is listed as 'inactive'.  An inactive alert remains until it is cleared manually or removed by grooming.  Unresolved alerts whose state is 'active' can only be removed manually, even if grooming is configured to remove alerts.
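To make the grooming behavior concrete, here is a minimal sketch in Python. It is only a model of the rule described above, not MOM code: the `Alert` class and the `groom` function are hypothetical names, and the sketch ignores the age thresholds that real grooming settings apply.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    problem_state: str  # 'active' or 'inactive', as described above


def groom(alerts):
    """Return the alerts that survive one grooming pass (hypothetical model).

    Inactive alerts are eligible for removal; active alerts are kept
    and would have to be removed manually.
    """
    return [a for a in alerts if a.problem_state == "active"]


alerts = [Alert("disk space low", "active"), Alert("service stopped", "inactive")]
remaining = groom(alerts)
print([a.name for a in remaining])
```

In this model the alert still marked 'active' is the only one left after grooming runs.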

There are two requirements for creating a state based alert. The first is to turn on 'enable state alert properties' on the alert itself and set the Instance value, typically $Logging Computer$.

 

The second is to define which criteria cause the alert to be generated with its severity and its state set to 'active', and which criteria cause an active alert to change state to 'inactive' (the latter is commonly known as the 'clearing event').  State based alerts only work if the conditions that flag a problem state are defined.  As shown in the graphic below, these criteria can be virtually any MOM property: event ID, event source, event parameter, etc.
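The pairing of a problem event and a clearing event can be sketched roughly as follows. This is an illustration in Python rather than anything MOM runs: the event IDs, the `apply_event` function, and the server name are all made up for the example, and the state is tracked per instance (the logging computer) on the assumption that this is what the Instance value is used for.

```python
TRIGGER_EVENT_ID = 9001   # hypothetical ID of the event that flags the problem
CLEARING_EVENT_ID = 9002  # hypothetical ID of the 'clearing event'


def apply_event(states, computer, event_id):
    """Update the per-computer problem state for one incoming event."""
    if event_id == TRIGGER_EVENT_ID:
        # Problem criteria matched: the alert becomes (or stays) active.
        states[computer] = "active"
    elif event_id == CLEARING_EVENT_ID and computer in states:
        # Clearing criteria matched for an existing alert: mark it inactive.
        states[computer] = "inactive"
    return states


states = {}
apply_event(states, "SRV01", TRIGGER_EVENT_ID)
print(states)
apply_event(states, "SRV01", CLEARING_EVENT_ID)
print(states)
```

Note that in this sketch an alert only leaves the 'active' state when the clearing event actually arrives for the same instance.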

Questions arise from time to time about alerts that remain active and never receive the 'clearing event' that flags them as inactive and allows them to be groomed from the database.  This can occur for several reasons, including:

-The actual 'clearing event' is not received, perhaps because the script that produces it is not being executed.
-The system has been removed from a computer group, so it no longer runs the rules that would generate the 'clearing event'.
-The problem condition is continuing to occur.

Hope this helps – feel free to post comments with further questions.

-Steve