Heartbeat detection in OpsMgr 2007

A question came up today regarding heartbeating in OpsMgr 2007: specifically, whether we can easily group systems and deliver missing heartbeat notifications based on specific computers and the teams that manage those computers.  The answer is yes, but the heartbeating process, and how you set this up, is different in OpsMgr 2007 than in MOM 2005.

Watcher node: In OpsMgr you have the agent node and the watcher node.  The agent node is the system with the OpsMgr agent installed, which performs data collection, evaluation, etc.  The watcher node is a designated system, external to the agent, that monitors the agent of interest to ensure it is healthy.  An IIS server is a good example.  An agent running on the IIS server can monitor it fully, provided the IIS server is running.  If the IIS server goes down, however, that agent has no way to keep monitoring web page availability, etc.  Enter the watcher node: a separate system can be designated to ‘watch’ the IIS server and confirm it is up and running.  The same idea applies elsewhere too, including to the Health Service (the agent) itself. 


Heartbeat monitoring is handled by the Health Service watcher node (the RMS/management servers).  The watcher nodes expect heartbeats from agents; if they don’t receive one, the appropriate rules fire to indicate the problem.  This works well because, as in MOM 2005, heartbeat issues must be detected at the management server level: an agent that has stopped cannot report its own silence.
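To make the watcher-side logic concrete, here is a minimal Python sketch of the general idea: the watcher remembers when it last heard from each agent and flags any agent that has gone quiet for more than a few heartbeat intervals. This is an illustration only, not how OpsMgr is implemented. The class and method names are hypothetical, and the 60-second interval and 3 allowed missed heartbeats are assumed values chosen for the example, not settings read from OpsMgr.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatWatcher:
    """Hypothetical watcher that tracks the last heartbeat seen from each agent.

    Illustration only, not the OpsMgr implementation. interval_s and
    missed_allowed are assumed example values.
    """
    interval_s: float = 60.0   # assumed heartbeat interval, in seconds
    missed_allowed: int = 3    # assumed number of missed heartbeats tolerated
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, agent: str, now: float) -> None:
        # Called whenever a heartbeat arrives from an agent.
        self.last_seen[agent] = now

    def missing_agents(self, now: float) -> list[str]:
        # Flag agents silent for longer than missed_allowed intervals.
        # The check runs on the watcher, because a stopped agent
        # cannot report that it has stopped.
        threshold = self.interval_s * self.missed_allowed
        return sorted(
            agent
            for agent, seen in self.last_seen.items()
            if now - seen > threshold
        )


if __name__ == "__main__":
    watcher = HeartbeatWatcher()
    watcher.record_heartbeat("web01", now=0)
    watcher.record_heartbeat("sql01", now=0)
    watcher.record_heartbeat("sql01", now=150)
    print(watcher.missing_agents(now=200))
```

With these assumed values, web01 has been silent for 200 seconds (more than 3 × 60), so it is flagged, while sql01 checked in 50 seconds ago and is not.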


Splitting out systems so that notifications can be sent based on computer group membership is also fairly straightforward: just create a new computer group whose target is ‘Health Service watcher’.  The group's membership criteria can then be whatever expression uniquely identifies your systems, from simple pattern matching/wildcards to full regular expressions.  From there, create subscriptions for alerts generated by the systems in these groups, and the problem is solved!  More detail is available on the PSS team's blog at http://blogs.technet.com/smsandmom/archive/2008/03/25/opsmgr-2007-monitoring-health-service-availability.aspx
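The grouping-and-routing idea can be sketched in a few lines of Python: each team's group is defined by either a wildcard or a regular expression applied to the Health Service watcher's name, and an alert from a given watcher is routed to every team whose group contains it. This is a hypothetical model of the concept, not the OpsMgr console or SDK; the team names, host names, and patterns are made up for illustration.

```python
import fnmatch
import re

# Hypothetical group definitions: team -> (rule kind, pattern).
# Names and patterns are invented examples, not real OpsMgr objects.
GROUPS = {
    "web-team": ("wildcard", "WEB*.contoso.com"),
    "sql-team": ("regex", r"SQL\d{2}\.contoso\.com"),
}


def is_member(name: str, kind: str, pattern: str) -> bool:
    """Check a watcher name against one group's membership rule."""
    if kind == "wildcard":
        # Shell-style wildcard match (case-sensitive in this sketch).
        return fnmatch.fnmatchcase(name, pattern)
    if kind == "regex":
        # The whole name must match the regular expression.
        return re.fullmatch(pattern, name) is not None
    raise ValueError(f"unknown rule kind: {kind}")


def route_alert(watcher_name: str) -> list[str]:
    """Return the teams whose group contains the watcher that raised the alert."""
    return sorted(
        team
        for team, (kind, pattern) in GROUPS.items()
        if is_member(watcher_name, kind, pattern)
    )


if __name__ == "__main__":
    print(route_alert("WEB01.contoso.com"))
    print(route_alert("SQL07.contoso.com"))
```

A name that matches no group (for example a domain controller) gets an empty list, which in this model means no team-specific notification is sent for it.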