Maintenance Mode And Server Availability (Another Solution)

Maintenance mode has expired and the monitored server is still down, but the alert view of the OpsMgr 2007 console shows no alert indicating that the server is offline.

In my previous post I walked through one possible implementation of a workaround to recognize server unavailability. That post also explains the reasons behind the scenario described above, so I won’t repeat them here. I will, however, briefly discuss the impact of that workaround on the RMS and compare it with the solution I suggest in this post. One of the biggest “issues” with the previous implementation is potential alert duplication. The dependency monitor rolls up the health state of its contributing health service instance, and it is likely that an alert was already raised for that state. Another “issue” is that it duplicates the health service watcher dependency monitor just to get an alert for unavailability of the monitored health service. I suggested disabling the original dependency monitor (Local Health Service Availability), but if that was not done, the workflow count increases by the number of health services within the enterprise.

This post suggests a different solution that avoids the alert duplication problem. It still uses a dependency monitor and its ability to set the state that is used when its contributing entity is unavailable. The difference is that the dependency monitor rolls up an always-healthy monitor of the contributing entity. This guarantees that an unhealthy state is a result of unavailability only and is never a result of the health state of the contributing entity. Here is the step-by-step setup and a snapshot.

1. Create a unit monitor targeted to the Health Service class that is always going to be green.
2. Create a dependency monitor targeted to the Health Service Watcher class.
3. Set this dependency monitor to roll up health from the always-healthy unit monitor (created in Step 1).
4. Set this dependency monitor's MemberUnAvailable configuration to Error.
5. Configure the alert for this dependency monitor.

 

dependency monitor

This ensures that an alert is generated when the Health Service is actually unavailable, even after the computer leaves maintenance mode. One more benefit is that it also alerts when the Health Service is not started after the computer comes back up.

[Screenshot: error state]

[Screenshot: alert]
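To make the rollup logic easier to follow, here is a small toy model in Python. It is only a sketch of the idea and not OpsMgr code: the names `HealthState`, `always_healthy_unit_monitor`, `dependency_monitor_state`, and `watcher_state` are hypothetical and exist only for this illustration, and maintenance mode handling is deliberately left out. It assumes that the dependency monitor reports the member monitor's state when the Health Service is reachable and falls back to the MemberUnAvailable setting when it is not.

```python
from enum import Enum
from typing import Optional


class HealthState(Enum):
    """Simplified health states for this illustration only."""
    SUCCESS = "Success"
    WARNING = "Warning"
    ERROR = "Error"


def always_healthy_unit_monitor() -> HealthState:
    # Hypothetical stand-in for the unit monitor from Step 1:
    # whenever it can run at all, it reports a healthy (green) state.
    return HealthState.SUCCESS


def dependency_monitor_state(
    member_state: Optional[HealthState],
    member_unavailable: HealthState,
) -> HealthState:
    # Assumption: None models "the contributing entity is unavailable".
    # In that case the configured MemberUnAvailable state is used.
    if member_state is None:
        return member_unavailable
    # Otherwise the state of the member monitor is rolled up as-is.
    return member_state


def watcher_state(health_service_reachable: bool) -> HealthState:
    # The member monitor can only report while the Health Service is reachable.
    member = always_healthy_unit_monitor() if health_service_reachable else None
    # Step 4: MemberUnAvailable is set to Error.
    return dependency_monitor_state(member, HealthState.ERROR)


def should_alert(state: HealthState) -> bool:
    # Step 5: the alert is configured for the Error state.
    return state is HealthState.ERROR


if __name__ == "__main__":
    for reachable in (True, False):
        state = watcher_state(reachable)
        print(f"reachable={reachable}: state={state.value}, alert={should_alert(state)}")
```

Because the member monitor can never report anything other than healthy, the only path to the Error state in this model is the unavailable branch, which is the property the workaround relies on to avoid duplicate alerts.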

Unfortunately, this implementation still affects the workflow count on the RMS. It adds one workflow for every health service watcher instance in your enterprise. This means you need to decide whether you want to live with the increased TCO from the possible duplicate alerts of the previous post's approach, or whether your RMS is able to withstand the heavier workflow load.

To test and use this implementation, please import the attached management pack into your test environment and evaluate whether this workaround works for you. It is not sealed and can be customized further if you wish. To test, put all entities into maintenance mode as described here (you can use the script from this post), shut down the server, and wait for the alert.

Microsoft.SystemCenter.OperationsManager.AvailabilityWorkaround.xml