Service Fabric - Watchdog connector

I was helping a customer move an on-premises solution into Service Fabric, and they were very keen on using Application Insights as the dashboard for application health. Their on-premises system already had monitoring that periodically sent a heartbeat to their services and reported health. This sounded like a perfect use case for the Watchdog sample service, which shows how your services can register an endpoint, when they start, with a stateful Watchdog service running inside the cluster. The watchdog then calls that endpoint every 30 seconds and reports 'Availability' via ETW and on to Application Insights.

So let's walk through how we did it. For the sake of this post I have assumed that you have downloaded the sample watchdog from here and deployed it 'as-is' to your cluster. You can then access the watchdog inside the cluster via the Reverse Proxy at the following address: https://localhost:19081/Watchdog/WatchdogService. You will notice that the Visual Studio solution contains two test services; I have taken the pieces you need out of those test services and placed them in a NuGet package so that you can consume them easily.

Next, we need to add a NuGet package to the services that you want to watch. For completeness, the NuGet package is available here: https://www.nuget.org/packages/ServiceFabric.Watchdog.Connector/

Add nuget package

This will add a very small assembly to your service, and the source code is available here if you are interested. We will use the same mechanism that Service Fabric itself uses, Event Tracing for Windows (ETW), to trace, warn and error. You will also have noticed that when you create a Service Fabric service, the Visual Studio template creates a ServiceEventSource.cs. Edit the class that Visual Studio has created for you so that it implements IServiceEventSource, which is in the ServiceFabric.Helpers namespace, so add a using for it too.

IServiceEventSource

Then we will get rid of the clunky singleton (the private constructor and the static Current instance that let you call it from anywhere) and allow the event source to be passed in via a DI container instead; all will become apparent later. So delete all the singleton constructor semantics, as below.

Remove this constructor

You will now need to implement two new methods that are defined in the interface. I placed them after event ID 6 (ServiceRequestStop); they simply allow you to write generic Trace and Error events.
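
As a rough sketch of where the class ends up after these steps (interface added, singleton removed, two new events), something like the following should compile on its own. The `IServiceEventSource` declared here is only a stand-in so the snippet is self-contained: in your project it comes from the `ServiceFabric.Helpers` namespace, and its real members may differ from the two `string` methods assumed below. The event source name and the IDs 7 and 8 are also assumptions; use whatever follows the last ID in your own file.

```csharp
using System.Diagnostics.Tracing;

// Stand-in for the package's interface. Assumption: the real one lives in
// ServiceFabric.Helpers and may declare more members or other signatures.
public interface IServiceEventSource
{
    void Trace(string message);
    void Error(string message);
}

[EventSource(Name = "MyCompany-MyApplication-MyService")] // hypothetical name
internal sealed class ServiceEventSource : EventSource, IServiceEventSource
{
    // The static Current field and the private constructor from the template
    // are gone; a public constructor lets Main (or a DI container) create it.
    public ServiceEventSource() : base() { }

    // ... the template-generated events with IDs 1 to 6 stay as they were ...

    private const int TraceEventId = 7; // assumed: next free ID after 6
    [Event(TraceEventId, Level = EventLevel.Informational, Message = "{0}")]
    public void Trace(string message)
    {
        if (IsEnabled())
        {
            WriteEvent(TraceEventId, message);
        }
    }

    private const int ErrorEventId = 8; // assumed: next free ID after 7
    [Event(ErrorEventId, Level = EventLevel.Error, Message = "{0}")]
    public void Error(string message)
    {
        if (IsEnabled())
        {
            WriteEvent(ErrorEventId, message);
        }
    }
}
```

The methods are public and implement the interface implicitly, which is what lets `EventSource` pick them up as events while callers depend only on the interface.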

Trace and Error

Let's compile it and fix up the errors. The only compilation errors should be because this class no longer implements the singleton pattern, i.e. it no longer has Current. The first occurrence in the program flow is in the Main method within Program.cs. Change the code to create an instance of ServiceEventSource above the try block and use that object instance rather than the static. Also notice that I have passed the eventSource into the Stateful constructor.
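
For reference, a minimal sketch of what Main could look like after that change, based on the standard Visual Studio template. `MyService` and `"MyServiceType"` are hypothetical names standing in for your own service class and the service type name from your manifest; `ServiceTypeRegistered` and `ServiceHostInitializationFailed` are the methods the template already generates on `ServiceEventSource`.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.ServiceFabric.Services.Runtime;

internal static class Program
{
    private static void Main()
    {
        // Created above the try block so the catch block can use it as well.
        var eventSource = new ServiceEventSource();

        try
        {
            // Pass the instance into the service instead of relying on a static.
            ServiceRuntime.RegisterServiceAsync(
                "MyServiceType",
                context => new MyService(context, eventSource)).GetAwaiter().GetResult();

            eventSource.ServiceTypeRegistered(
                Process.GetCurrentProcess().Id, typeof(MyService).Name);

            // Keep the host process alive so the service keeps running.
            Thread.Sleep(Timeout.Infinite);
        }
        catch (Exception e)
        {
            eventSource.ServiceHostInitializationFailed(e.ToString());
            throw;
        }
    }
}
```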

Main method

The next error is in your service itself, which will be complaining about the missing Current too. Change the constructor to be something like this and fix any code to use the private instance rather than the static. This will also fix the constructor arguments that you changed earlier.
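
A minimal sketch of such a constructor, assuming a stateful service with the hypothetical name `MyService`. Taking the interface rather than the concrete class is my choice here, not something the package requires; it keeps the service testable with a fake event source.

```csharp
using System;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Runtime;
using ServiceFabric.Helpers; // IServiceEventSource, per the post

internal sealed class MyService : StatefulService // hypothetical service name
{
    // The private instance that replaces every ServiceEventSource.Current call.
    private readonly IServiceEventSource _eventSource;

    public MyService(StatefulServiceContext context, IServiceEventSource eventSource)
        : base(context)
    {
        _eventSource = eventSource ?? throw new ArgumentNullException(nameof(eventSource));
    }
}
```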

Service constructor

You should be able to get this to compile now. You are probably thinking that you have lost your singleton EventSource, but if you amend the CreateServiceInstanceListeners method in your <NameOfService>.cs, you will see that you can add the following line so that any service you write will have the EventSource injected at the constructor level should you need it.
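
Here is a hedged sketch of that registration, assuming a stateless ASP.NET Core service hosted on Kestrel as generated by the Visual Studio template (for a stateful service the equivalent override is CreateServiceReplicaListeners). `MyApiService`, the `"ServiceEndpoint"` endpoint name and `Startup` are assumptions taken from the template defaults, not from the connector package.

```csharp
using System;
using System.Collections.Generic;
using System.Fabric;
using System.IO;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.ServiceFabric.Services.Communication.AspNetCore;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;
using ServiceFabric.Helpers; // IServiceEventSource, per the post

internal sealed class MyApiService : StatelessService // hypothetical service name
{
    private readonly IServiceEventSource _eventSource;

    public MyApiService(StatelessServiceContext context, IServiceEventSource eventSource)
        : base(context)
    {
        _eventSource = eventSource ?? throw new ArgumentNullException(nameof(eventSource));
    }

    protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners()
    {
        return new[]
        {
            new ServiceInstanceListener(serviceContext =>
                new KestrelCommunicationListener(serviceContext, "ServiceEndpoint", (url, listener) =>
                    new WebHostBuilder()
                        .UseKestrel()
                        .ConfigureServices(services => services
                            .AddSingleton<StatelessServiceContext>(serviceContext)
                            // The added line: one shared event source for the whole host.
                            .AddSingleton<IServiceEventSource>(_eventSource))
                        .UseContentRoot(Directory.GetCurrentDirectory())
                        .UseStartup<Startup>() // Startup is the template's class
                        .UseServiceFabricIntegration(listener, ServiceFabricIntegrationOptions.None)
                        .UseUrls(url)
                        .Build()))
        };
    }
}
```

Because the instance is registered as a singleton, anything the container builds (controllers, filters) receives the same event source that Main created.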

Register Singleton

To finish the ETW piece off, you can now add the [ServiceRequestActionFilter] attribute to any controller and have it write an event every time a request starts and stops. This would not have been possible without being able to inject the instance of the EventSource.
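
Applied to a controller, that might look like the sketch below. The controller itself is hypothetical, and I am assuming the attribute lives in the same `ServiceFabric.Helpers` namespace as the interface; adjust the using if the package places it elsewhere.

```csharp
using Microsoft.AspNetCore.Mvc;
using ServiceFabric.Helpers; // assumption: the filter sits alongside IServiceEventSource

[ServiceRequestActionFilter] // from the connector package, per the post
[Route("api/[controller]")]
public class ValuesController : Controller // hypothetical controller
{
    [HttpGet]
    public IActionResult Get()
    {
        return Ok(new[] { "value1", "value2" });
    }
}
```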

Controller Attribute

We now have a pretty standard way of writing events. Next is registering the service with the Watchdog so that it can call us. The easiest way to do this is to override the RunAsync method within your service, like so.

RunAsync

The things to call out here are as follows:

  1. The first parameter of RegisterHealthCheckAsync is a unique name for the service, which will appear as the event name in Application Insights; the second parameter is the URI suffix that I want the watchdog to call.
  2. The first parameter of RegisterMetricsAsync is used to create a URL from which the metrics are retrieved, and the boolean signifies whether the service is stateful.
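
To make the parameter order concrete, here is a sketch of the two calls. Treat it as an illustration only: `IWatchdogRegistration` and `WatchdogStartup` are hypothetical types declared so the snippet is self-contained, the cancellation token parameter is my assumption, and the real package may expose these methods on a different type or with different signatures. The service name and the `api/health` suffix are made up too.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction, declared here only so the sketch compiles alone.
public interface IWatchdogRegistration
{
    Task RegisterHealthCheckAsync(string name, string uriSuffix, CancellationToken token);
    Task RegisterMetricsAsync(string name, bool isStateful, CancellationToken token);
}

public sealed class WatchdogStartup
{
    private readonly IWatchdogRegistration _watchdog;

    public WatchdogStartup(IWatchdogRegistration watchdog) => _watchdog = watchdog;

    // Intended to be called from your service's RunAsync override.
    public async Task RegisterAsync(CancellationToken cancellationToken)
    {
        // "MyService" becomes the event name in Application Insights;
        // "api/health" is the URI suffix the watchdog will call (hypothetical route).
        await _watchdog.RegisterHealthCheckAsync("MyService", "api/health", cancellationToken);

        // The name is used to build the metrics URL; true marks the service as stateful.
        await _watchdog.RegisterMetricsAsync("MyService", true, cancellationToken);
    }
}
```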

And that's it. You can register your services with the watchdog very easily, and here is a little screenshot of what you could see in Application Insights if one of your services is having problems. I simulated this error by posting a manual Health Report to the system. It is also worth noticing that the Watchdog itself reports its own availability too.

AppInsights

At this stage, if you want a very in-depth explanation of the Health Model, take 20 minutes to read this: /en-us/azure/service-fabric/service-fabric-health-introduction

Enjoy!