This post is about some interesting things I discovered when trying to use System.Performance.DeltaValueCondition as part of a workflow in a SCOM Management Pack.
The purpose of the Management Pack is to monitor the health of an ETL process for a data warehouse. The customer wanted to be alerted when certain record counts processed by the ETL jobs varied more than a specified threshold from one day to the next.
The basic design for this type of monitor is unsurprising. I created a data source type that knew how to get the record counts. Then I built a unit monitor type that feeds the record counts from the data source into a condition detection module (a.k.a. filter) based on System.Performance.DeltaValueCondition. After the filter comes the expression logic that tests the delta against the thresholds.
The System.Performance.DeltaValueCondition filter is designed to do exactly what I needed: it outputs the delta between samples. It can be configured to calculate the delta in terms of percentage or absolute difference. I reflected these config settings into the configuration section of the Monitor type so that everything would be super flexible.
Everything seemed to work fine in my unit testing.
The issues didn’t show up until I ran the monitor under more realistic conditions. The primary difference between my unit testing and the production application is that the rules in production have an interval of 24 hours, whereas in testing I ran things much faster.
Here is what I discovered about System.Performance.DeltaValueCondition:
1) It took considerable effort to understand the meaning of the NumSamples property. Basically I found that any setting other than 1 is of dubious value for my application. NumSamples controls the number of samples held in memory by the filter. The filter will start outputting the delta after it has seen 2 values, regardless of what NumSamples is set to. However, the output delta is always between the incoming sample and the oldest sample in history, and that oldest sample only changes once the queue fills up and the oldest entry is removed to make room for the newer entry.
2) The queue of values is held in memory. Any time the system or the service restarts, the queue is lost and needs to be reprimed before alerts are output. For me, this means I will miss alerts for a 24-hour period after a service restart.
3) The delta is a signed number. For some reason I assumed that the filter would output the ‘absolute value’ delta. Since I couldn’t find an absolute value function in the expression syntax, I had to test for both a positive and a negative breach of each threshold, and my expressions doubled in complexity.
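To make points 1 and 3 concrete, here is a small Python model of the behavior I observed. It is only a sketch of what I saw from the outside, not the module's actual implementation; the class name DeltaFilterModel, the function exceeds_threshold, and the exact queue mechanics are my own assumptions.

```python
from collections import deque


class DeltaFilterModel:
    """Toy model of the observed behavior (hypothetical, not the real module)."""

    def __init__(self, num_samples=1):
        if num_samples < 1:
            raise ValueError("num_samples must be at least 1")
        # Assumption: the history holds up to num_samples previous values.
        self.history = deque(maxlen=num_samples)

    def push(self, value):
        """Feed one sample; return the signed delta, or None while priming."""
        if not self.history:
            # First sample after a (re)start: nothing to compare against yet.
            self.history.append(value)
            return None
        # The delta is taken against the OLDEST value still in the history.
        delta = value - self.history[0]
        # A full deque with maxlen silently drops its oldest entry on append.
        self.history.append(value)
        return delta


def exceeds_threshold(delta, threshold):
    """Two-sided test needed because the delta is signed."""
    return delta > threshold or delta < -threshold
```

Feeding the samples 10, 12, 15, 20, 30 into this model with num_samples set to 3 gives no output, then 2, 5, 10, 18: the first three deltas are all measured against 10, and only the last one moves on to 12. With num_samples set to 1 the same samples give no output, then 2, 3, 5, 10, which is the sample-to-sample delta I actually wanted.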
So the moral of the story is that while there are lots of great things available for reuse in the System MPs (and System.Performance.DeltaValueCondition is one of them), the devil is in the details and in the requirements of your application, particularly in scenarios with very long cycles.
Because of these issues, I decided to forgo the flexibility promised by my initial design based on System.Performance.DeltaValueCondition. I rewrote the data source module (which is script based) to incorporate the delta logic. This removed the need for the System.Performance.DeltaValueCondition filter in my monitor type. The new data source outputs the ‘absolute value’ delta, and it is stateless (it queries for both the current and the previous values on each polling cycle), so there is no delay in alerting and no issue with service restarts. The big tradeoff is some loss of flexibility.
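The stateless idea can be sketched roughly as follows. This is an illustration in Python rather than the actual data source script; the names absolute_delta and check_record_counts, the fetch_counts callable, and the percentage formula (relative to the previous value) are all assumptions made for the example.

```python
def absolute_delta(current, previous, percentage=False):
    """Return the unsigned change between two record counts.

    Assumption: a percentage delta is relative to the previous value.
    Returns None when a percentage is requested and previous is 0.
    """
    delta = abs(current - previous)
    if not percentage:
        return delta
    if previous == 0:
        return None  # a percentage change from zero is undefined
    return delta / abs(previous) * 100.0


def check_record_counts(fetch_counts, threshold, percentage=False):
    """One polling cycle: query both values, then compare to the threshold.

    fetch_counts is a hypothetical callable returning (current, previous),
    for example by querying the ETL audit data for today and yesterday.
    """
    current, previous = fetch_counts()
    delta = absolute_delta(current, previous, percentage)
    return delta is not None and delta > threshold
```

Because both values are fetched on every cycle, nothing has to survive a service restart, and a single comparison against the threshold is enough since the delta is already unsigned.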