Exploring EventSource Activity (correlation and causation) Features

Article
09/14/2015

In this post we will be going over the New EventSource Activity support introduced in .NET version 4.6. If you use EventSource in a server (concurrent) environment, you will want to read this post.

What are Activities?

If you have been using EventSource in sever scenarios you probably have wanted what I am calling 'Activity' support.

The basic issue is that events are great at telling you about what happens at A SINGLE INSTANT IN TIME, but almost always you care about the WHOLE FLOW of your application. Now if your application is only doing one thing at a time, then the chronological ordering of events does a pretty good job of grouping related events together. Thus in a single threaded app, if you have incoming HTTP request 'Start', a series of outgoing HTTP requests, and then the HTTP response, then the inference that the outgoing request are related to the incoming request is a pretty good assumption. However if you have CONCURRENT or OVERLAPPING activities, than this logic VERY QUICKLY breaks down and interpreting your events becomes impossible (in this case, more information is not better).

If you application was singly threaded, is I just that you have multiple threads running simultaneously, then the problem is pretty easy to fix. You need only stamp each event with the ThreadID logging the event. With this extra information you can simply sort the events into separate streams each of which a unique thread ID and we are back to the singly threaded case an life is good again.

Sadly live in a real server is not that simple. The problem with ThreadIDs is

Threads typically do work on behalf of other threads. Typically threads 'listen' for incoming requests in some way, and then process them. Thus any application that makes use of several of these services cannot easily be grouped together. If several concurrent threads are all requesting work from these worker threads, again you get a tangle of event that is impossible to untangle.
Asynchronous programming (which is all to common now), makes this problem MUCH worse, because the concept of a Thread becomes tenuous. Basically every time you do an I/O the next operation will happen on whatever thread is most convenient to use. In such a world using threads as a correlation ID is hopeless.

So what we need is to come up with another concept that is very much like a thread of execution, but works in a concurrent / asynchronous world. This is what an activity is. More formally

An activity is a set of causally related events, linked by an common ID.

Thus an activity:

Starts at a particular point in time
Causes 0 or more other events to fire.
Stops at a particular point in time (and thus has a duration which is the difference in time between the start and stop).

One more important aspect of an Activity is that the can NEST (that is one activity can cause another activity before the first completes. Thus they can form stacks just like methods do. In fact most typically there is a close correspondence between an activity and a particular CALL FRAME of a method in the code.

EventSource Activities in .NET V4.6

In .NET V4.6 we have made it very easy to use activities. But first, you have to have .NET Version 4.6

If you are on Windows 10, you have .NET V4.6
If not, do a 'start c:\Windows\Microsoft.NET\Framework\v4.0.30319', right click on the 'clr.dll' and look at the 'properties' of that dll. Select the 'Details' tab and see the 'File Version'. If it says V4.6.* you are good to go.
In not you can update your system by going to the .Net V4.6 download page, downloading and installing it. This release is very compatible with previous versions so this is a 'safe' thing to do.
After probably about 4/2016, windows update should do this for you automatically.
If you have installed VS 2015 you also have .NET V4.6

Once you have .this version of .NET. It is very easy to use Activities in EventSource. Simply name some of your events with a 'Start' or 'Stop' suffix. For example

Makes an EventSource 'Microsoft-Demos-Activities' that has a Request activity. This activity has; a 'Start' and 'Stop' event associated with it. (Note to simplify finding the pairs of events, 'Stop' events need to use an event ID that is one bigger than the corresponding 'Start' event. Because the events with the 'Start' and 'Stop' EventSource knows that this represents an activity called 'Request' and does the following 'magic'

Creates a new Activity ID to represent the new activity when the 'RequestStart' event is logged
Attach this activity ID to EVERY EventSoruce event (even from different EventSources), that is logged from code CAUSED BY code after the Start event.
Until the 'RequestStop' event fires, in which case the Activity reverts to what it was before the ServiceStart.

If you collect your events with PerfView (Version 1.8) there is basically no change. You can simply collect your events normally For example for my I did

PerfView /threadTime /onlyProviders=*Microsoft-Demos-Activities collect

Now to make things interesting, I ran the code that processed requests in a Parallel.For loop that processes 20 requests concurrently. After stopping PerfView and looking in the 'events' view for RequestStart and RequestStop events, this is what I see.

Which shows that different threads are executing different requests and each get an activity ID (for example the first is //1/5). Notice that the thread that finishes the request is not even the same thread that started it (for example Request(//1/6) starts on thread 10948 and stops on thread 8972). Because PerfView can link up the starts and stops, it can also compute the duration, which it shows as the pseudo-field DURATION_MSEC, so we can see that Request(//1/6) took 2523 milliseconds.

You will note that the 'ActivityID' has the from like //1/6, that is it looks like a path (list) of numbers. This is precisely what it is and it is designed to capture the fact that activities can nest. When an activity nests inside another activity, instead of simply picking another unique number it APPENDS a new child to the path of numbers. This creates a very useful property: All work associated with a particular activity has the same prefix.

We can see this if we look at all the events in the 'events' view. What we would see is many events interleaved with one another like this

Notice that the activity IDs 'jump around' as work on different requests gets executed in parallel. However we can easily focus one particular request simply by including only those vents with a certain ActivityID prefix.

For example if we put //1/6 in PerfViews 'Text Filter' textbox, that will only show you events that have that string and you get the following display

This is a very useful view! It shows you all the events associated with a single request filtering out all other requests. Notice it includes other events that are Not Starts and Stops (e.g. DebugMessage), and while it is not shown above, it would also include other EventSources (not just Microsoft-Demos-Activities). It also shows 'sub-activities' ('like Phase1 and Phase2) and shows their nesting relationship. For example we can tell that SecurityStart at 8206.265 is nested inside of Phase 1 because its activity ID (//1/6/1/1) is a superstring of Phases 1's activity ID (//1/6/1).

Now we can see the value of making ActivityID paths of numbers rather than simply a arbitrary unique numbers. It lets you express nesting and thus easily express concepts like 'everything that I caused, transitively'.

In fact, it is very natural to display activities as a tree, where the parent of a child node is simply the node that caused it. PerfView can display this in the 'Any StartStopTree' view which shows every event organized into a tree showing the activities that caused it. For example in this trace.

We see that the process started a 'Loop' activity //1/1, caused a ForkJoin' activity //1/1/1 , which cases a 'Request' activity //1/1/1/1, which caused a Phase2 activity //1/1/1/1/3 which caused a Security activity //1/1/1/1/3/1 which then logged a Debug message. Thus activities look a whole lot like call stacks, and in fact you can treat them like 'coarse callstacks' in many analysis.

Sorry, Activity Support no available in the Nuget package (yet).

One important caveat to the Activity tracking support is that it is presently Not Available in the EventSource Nuget package. (This is a reminder that we always recommend that users use the System.Diagnostics.Tracing.EventSource class built into the framework if you can, however we provide a 'clone' of this class in a Nuget package by renaming the class to Microsoft.Diagnositics.Tracing.EventSource, for those TEMPORARY cases where you need an EventSource feature on a older runtime. However in this case, Activity support required ehancements outside of EventSource itself to implement which is why the feature is not available in the Nuget package. We may be able to 'hack' some sort of partial support, but for now you need to upgrade to V4.6 to get activity support.

Summary

We covered quite a bit of ground today, there is more to talk about, which I will do in the next few blog posts. However for any who can't wait, I strongly suggest you read the New EventSource Activity support specification (attached at the end), which goes into a lot more detail about how it actually works, where it might produce surprising results, and how it interoperates with other Activity tracking conventions.

But the key take-away, is that activities are INDISPENSIBLE in a server environment when you are doing many things concurrently/asynchronously. EventSource has made this super-easy to do (simply name your events with the Start and Stop suffix), and PerfView makes it very easy to immediately take advantage of the value that EventSource adds to your events.

In my next blog entry, I will do more of a walk though of actual code so you can try it yourself.

Here is the document that goes into more details.

EventSourceActivities.docx

Exploring EventSource Activity (correlation and causation) Features

Additional resources