Last contacted


Some customers want more than to know that the health service hosted on a computer has stopped heartbeating. (Remember, this is very different from our approach to recognizing that a computer is down, which, purely in my own opinion, is a rather unfortunate attempt to recognize something the health service watcher was not originally designed for.) They would also like at least some information about the last time that health service contacted its management server. This post describes one possible solution.


This information is already present in Health Explorer (in some form, as we will see later), but it is not easy to locate and carries a “big” TCO. I will describe how to find it in the SP1 version of OpsMgr 2007 anyway:


1. Open “Health Explorer” from the health service watcher that was marked critical. (Health service watcher views are located in the “Operations Manager” folder, “Agent” subfolder, for those of you who never had to wander there.)


2. Locate the “Health Service Heartbeat Failure” monitor and open the “State Change Events” tab.


3. The context of the top state change to critical carries the data type that caused the state change, and its “Date and Time” carries the time when the runtime recognized that the heartbeat was missing.


The following screenshot is from the next version of OpsMgr. We made some improvements in unavailability recognition and changed the internal plumbing for some of the “Health Service Watcher” monitors (that is outside the scope of this post, and I may write another one describing the changes once the release date approaches). It shows that the data type used still contains the same information about when the runtime recognized that the health service was not heartbeating, and that this information is present in “Date and Time” within the context of the state change:


 


Unavailability as recognized through Health Explorer


 


As you can see, this is highly inefficient when multiple health services are not heartbeating and you want a quick view showing when each missed heartbeat was recognized and roughly when each health service last contacted its server.


Before we do this, I need to explain a little about how availability is stored in our Operational Database. There is a table called “Availability”, and one of its columns is “LastModified”. Its value is the time when the runtime notified the SDK service about an availability change. That is not the time when the health service last contacted the runtime, though. The last contacted time can be calculated from the heartbeat interval and the number of heartbeats that must be missed before the SDK is notified that the heartbeat is missing. The values for the interval and the number of misses are stored in global settings. That gives us the opportunity to create the following SQL script:


use OperationsManager

declare @subtract float
declare @numberOfMissing float
declare @interval float

-- Number of heartbeats that must be missed before the SDK is notified
select @numberOfMissing = SettingValue from GlobalSettings GS
    join ManagedTypeProperty MTP with(nolock) on GS.ManagedTypePropertyId = MTP.ManagedTypePropertyId
    where MTP.ManagedTypePropertyName = 'NumberOfMissingHeartBeatsToMarkMachineDown'

-- Heartbeat interval (the division below treats it as seconds)
select @interval = SettingValue from GlobalSettings GS
    join ManagedTypeProperty MTP with(nolock) on GS.ManagedTypePropertyId = MTP.ManagedTypePropertyId
    where MTP.ManagedTypePropertyName = 'HeartbeatInterval'

-- A datetime cast to float counts days, so convert the seconds to days (86400 seconds per day)
select @subtract = (@numberOfMissing * @interval) / 86400.0

-- For every entity currently marked unavailable, take its most recent availability history entry
declare availCursor cursor
for
    select B.DisplayName, AH.TimeStarted from Availability A
    join BaseManagedEntity B with(nolock) on B.BaseManagedEntityId = A.BaseManagedEntityId
    join AvailabilityHistory AH with(nolock) on AH.BaseManagedEntityId = A.BaseManagedEntityId
    join (
        select MAX(AHTMP.TimeStarted) AS MaxTimeStarted, BME.BaseManagedEntityId
        from AvailabilityHistory AHTMP
        join BaseManagedEntity BME with(nolock) on BME.BaseManagedEntityId = AHTMP.BaseManagedEntityId
        where BME.IsDeleted = 0
        group by BME.BaseManagedEntityId
    ) TMP on AH.TimeStarted = TMP.MaxTimeStarted and AH.BaseManagedEntityId = TMP.BaseManagedEntityId
    where A.IsAvailable = 0 and B.IsDeleted = 0

open availCursor

declare @name nvarchar(255)
declare @time datetime
declare @approxTime datetime

fetch next from availCursor into @name, @time
while @@FETCH_STATUS = 0
begin
    -- Step back from the recognition time by the missed-heartbeat window
    select @approxTime = cast((cast(@time as float) - @subtract) as datetime)

    -- Pacific Daylight Time is UTC-7 (use -8 for Pacific Standard Time)
    select @name as Name, @time as Recognized, @approxTime as [ApproxLastContactedTime (UTC)], dateadd(hh, -7, @approxTime) as [ApproxLastContactedTime (Pacific)]
    fetch next from availCursor into @name, @time
end

close availCursor
deallocate availCursor


 


Results for last contacted


 


Comparing the results with the Health Explorer data, we can see that the recognized time is “equal” and that the approximate last contacted time is calculated (by default it should be around 3 minutes before recognition). Maybe I will create a report in the future that displays this information in a more unified manner, but that is not my intent right now …
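
To make the arithmetic concrete, here is a minimal sketch that runs without the OpsMgr database. It assumes the default settings are 3 missed heartbeats and a 60-second interval, which matches the roughly 3 minutes mentioned above. The agent names and timestamps in the table variable are hypothetical sample data, not output from a real management group. It also swaps the float cast for dateadd on whole seconds, which is just one alternative way to do the subtraction.

```sql
-- Assumed defaults; a real deployment would read these from GlobalSettings.
declare @numberOfMissing int
declare @interval int
set @numberOfMissing = 3    -- missed heartbeats before the failure is recognized
set @interval = 60          -- heartbeat interval, assumed to be in seconds

-- Hypothetical sample rows standing in for the Name / Recognized pairs.
declare @recognized table (Name nvarchar(255), Recognized datetime)
insert into @recognized values (N'agent01.contoso.com', '2008-06-01T10:15:00')
insert into @recognized values (N'agent02.contoso.com', '2008-06-01T11:42:30')

-- Step back 3 * 60 = 180 seconds from each recognition time.
select Name,
       Recognized,
       dateadd(second, -(@numberOfMissing * @interval), Recognized) as ApproxLastContactedUtc
from @recognized
```

With these sample values the approximate last contacted times come out as 10:12:00 and 11:39:30, exactly three minutes before each recognition time. Because it is a single set-based select, all rows also arrive in one result set rather than one per health service.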
