Monitoring and troubleshooting with .NET 4 and Windows Server AppFabric

By now you may have read about the exciting new capabilities in Windows Server AppFabric. In this post we will dive deeper into the features that enable monitoring and troubleshooting of your WCF and WF applications. The monitoring capabilities for WCF and WF applications in AppFabric are built on the monitoring enhancements in .NET Framework 4. The WCF and WF runtimes have been instrumented to emit tracing and tracking events to a high-performance Event Tracing for Windows (ETW) session. Because ETW has minimal impact on application performance, monitoring can be turned on by default.

The AppFabric monitoring infrastructure is built using the following events that are emitted from the runtime:

· Analytic Tracing Events: Analytic traces are targeted traces emitted from the WCF runtime at key execution points, such as operation completion or a service error.

· Workflow Tracking Events: Workflow tracking events are emitted during the execution of a workflow instance. These events provide visibility into the workflow execution, such as when a workflow instance starts or encounters an error.

· Message Flow Tracing: Turning on message flow tracing allows correlation of traces between different services. This allows reconstruction of end-to-end message flows between services deployed on a single machine or distributed across machines.

Now that we have briefly discussed the events that Windows Server AppFabric monitoring relies on, let us look at how these events enable monitoring and troubleshooting. AppFabric monitoring includes an Event Collector service that listens for the tracing and tracking events, collects them, and stores them in a monitoring database. The AppFabric tooling queries the monitoring database to display monitoring and troubleshooting data. The figure below is a high-level view of the monitoring components:

[Figure: high-level view of the AppFabric monitoring components]
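
The collect, store, and query flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not AppFabric's implementation: the table layout, the function names, and the sample service names are all assumptions made for the example, and an in-memory SQLite database stands in for the monitoring database.

```python
import sqlite3


def create_store():
    """Create an in-memory stand-in for the monitoring database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (service TEXT, kind TEXT, detail TEXT)")
    return conn


def collect(conn, events):
    """Play the role of the Event Collector: persist a batch of emitted events."""
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    conn.commit()


def failures_by_service(conn):
    """Play the role of the tooling: query the store for error counts per service."""
    rows = conn.execute(
        "SELECT service, COUNT(*) FROM events "
        "WHERE kind = 'error' GROUP BY service ORDER BY service"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    store = create_store()
    # Sample events; the service names and details are made up for illustration.
    collect(store, [
        ("ReciprocalService", "info", "operation completed"),
        ("ReciprocalService", "error", "instance aborted"),
        ("OrderService", "info", "operation completed"),
    ])
    print(failures_by_service(store))
```

The point of the split is that the services only emit events; collection and querying happen out of band, so the dashboard never talks to the running services directly.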

The verbosity of the events emitted is controlled by the monitoring level in AppFabric. The monitoring level can be changed to suit the situation; for example, if a service encounters errors, the verbosity can be increased to help with troubleshooting. The monitoring levels in AppFabric are:

· ErrorsOnly: Events are collected only if the service encounters an error.

· HealthMonitoring: The default level, which allows the AppFabric tooling to display the health of WCF and WF applications.

· End-to-End Monitoring: Enables message flow tracing to correlate events between services.

· Troubleshooting: The most verbose level, used to diagnose issues with your WCF and WF services.
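
One way to think about these levels is as an ordered scale, where each level collects everything the previous one does plus more. The Python sketch below models that idea. It assumes the levels are strictly cumulative, and the event kinds and their minimum levels are hypothetical names chosen for the example; none of this is AppFabric's actual API or configuration.

```python
from enum import IntEnum


class MonitoringLevel(IntEnum):
    """The four levels listed above, ordered from least to most verbose."""
    ERRORS_ONLY = 1
    HEALTH_MONITORING = 2
    END_TO_END_MONITORING = 3
    TROUBLESHOOTING = 4


# Hypothetical mapping: the lowest level at which each kind of event is collected.
MIN_LEVEL = {
    "error": MonitoringLevel.ERRORS_ONLY,
    "health": MonitoringLevel.HEALTH_MONITORING,
    "message_flow": MonitoringLevel.END_TO_END_MONITORING,
    "verbose": MonitoringLevel.TROUBLESHOOTING,
}


def is_collected(event_kind, level):
    """Return True if an event of this kind is collected at the given level."""
    return level >= MIN_LEVEL[event_kind]


def collected_kinds(level):
    """Return the sorted list of event kinds collected at the given level."""
    return sorted(kind for kind in MIN_LEVEL if is_collected(kind, level))


if __name__ == "__main__":
    for level in MonitoringLevel:
        print(level.name, collected_kinds(level))
```

Under this model, raising the level never drops events that were already being collected, which is why increasing verbosity is a safe first step when a service starts failing.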

A future post will talk about AppFabric configuration and how the monitoring levels can be changed for a WCF or WF application.

We will show an example of troubleshooting a WF service deployed in AppFabric, using a simple WF service from the WF tracking samples. The sample workflow computes the reciprocal of its input, so an error can be simulated by passing an input of zero. Deploy the application to AppFabric as mentioned in the post.

We will use the AppFabric tooling to troubleshoot the service once the error has been simulated. To monitor the health of an application, a user will usually open the AppFabric Dashboard. The dashboard shows metrics related to the deployed WCF and WF services. In this case the service encountered an error, so the failed WCF call and the failed WF instance are highlighted.

[Figure: AppFabric Dashboard with the failed WCF call and the failed WF instance highlighted]

When a WF service executes, you see WF tracking events corresponding to the WF execution and WCF events corresponding to the execution of the messaging activities. When the service invocation fails, you get both a WCF exception event, corresponding to the failure in the execution of the messaging activity (the Receive activity), and a WF instance failure event. If you click on the exception in WCF Call History, you get the details of the exception.

[Figure: exception details in WCF Call History]

For the workflow instance, you may want to know which activity failed. Click on the failures in the WF Instance History in the dashboard, then right-click on the aborted WF instance and view its tracked events.

[Figures: WF Instance History in the dashboard, and the tracked events for the aborted WF instance]

The source of the error is the Assign activity. The exception stack can be found by clicking on the Errors tab in the details pane.

This shows how analytic tracing and WF tracking help troubleshoot a WCF or WF service. To correlate traces from WCF and WF events, you need to change the monitoring level of the application to End-to-End Monitoring. This uses the message flow feature mentioned earlier to add an end-to-end activity ID to the trace events, so that you can navigate between the WCF and WF events that originate from the same request.
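
To make the correlation idea concrete, here is a minimal Python sketch that groups events by a shared activity ID and orders each group by time. The event shape (the `activity_id`, `timestamp`, `source`, and `message` fields) and the sample values are assumptions made for illustration; they are not the actual schema of the AppFabric monitoring database.

```python
from collections import defaultdict


def group_by_activity(events):
    """Group events by activity ID and sort each group by timestamp."""
    flows = defaultdict(list)
    for event in events:
        flows[event["activity_id"]].append(event)
    for flow in flows.values():
        flow.sort(key=lambda e: e["timestamp"])
    return dict(flows)


if __name__ == "__main__":
    # Made-up events: two requests whose WCF and WF events arrive interleaved.
    events = [
        {"activity_id": "A1", "timestamp": 2, "source": "WF", "message": "instance started"},
        {"activity_id": "A2", "timestamp": 1, "source": "WCF", "message": "operation invoked"},
        {"activity_id": "A1", "timestamp": 1, "source": "WCF", "message": "operation invoked"},
        {"activity_id": "A1", "timestamp": 3, "source": "WF", "message": "instance aborted"},
    ]
    for activity_id, flow in group_by_activity(events).items():
        print(activity_id, [(e["source"], e["message"]) for e in flow])
```

Without a shared ID, the WCF exception and the WF instance failure are just two unrelated rows; with it, they can be read as consecutive steps of one request.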

We have seen how the monitoring infrastructure enhancements and the Windows Server AppFabric tooling make it easier to monitor and troubleshoot your WCF and WF applications. To summarize:

· The .NET Framework 4 emits high-performance tracing and tracking events to ETW, which can be enabled with minimal impact on the application.

· The Windows Server AppFabric tooling helps you visualize the events to gauge the health of your application or troubleshoot problems with it.

· Windows Server AppFabric configuration gives you control over the verbosity of the events through monitoring levels.