Using TraceEvent to mine information in OS registered ETW providers

Article
03/09/2013

In previous blocks on TraceEvent I shows you how easy it was to start up ETW sessions to collect information generated by System.Diagnnostics.Tracing.EventSource classes (typically logging that you yourself did). But I also mentioned in other blogs that the one of the real strengths of ETW was the fact that you get to correlate lots of OS generated events with your own. Well in this entry I will show you how to do this (although I will show you even better ways in later blogs).

In this blog I will walk though a simple application called 'ProcessMonitor' which basically prints a line to the screen every time a process on the system starts or stops. There is a ZIP file of a VS2010/VS2012 solution at the end of the blog and I encourage you to open it and read the code and play around with it. I have removed comments and error handling a the 'essential' code is only about 35 lines of code. Here it is.

The basic structure is basically the same as the 'real time' (no file) EventSource example that I already blogged about. Basically you

Open a TraceEventSession (with a null file name to indicate a 'real time session'. A session is a 'controller' which can turn ETW providers on and off.
Connect a ETWTraceEventSource to that session, A TraceEventSource represents the stream of events that were sent to the session. An ETWTraceEventSource knows the 'basic' structure that is common to all events (like that every event has a process id, time stamp, source provider, task and opcode, but does NOT know how the name of the event or how to deserialize the payload of the event.
Connect one or more TraceEventParsers to the ETWTraceEventSource. A TraceEventParser knows how the event names and how to deserialized the payload bytes of the raw ETWTraceEventSource FOR A PARITULAR SET OF EVENT PROVIDERS. In this case we connect the 'RegisteredTraceEventParser, which knows how to decode any event that was registered with the operating system (using wevtutil). EventSource are NOT typically registered (although they can be), for EventSources you connect a DynamicTraceEventParser (because EventSources log their event information at EventSource startup time (dynamically). However in this case we are interested in the OS providers, and so RegsiteredTraceEventSource is what we want. It is common to register both.
Subscribe to the event callbacks of TraceEventParsers that you connected to your source. Some parsers are 'strongly typed' in that they know the exact name and fields of every event they know about at compile time (ClrTraceEventParser and KernelTraceEventParser are like this). These are the best to use because they are VERY efficient (no string lookups), you get to use normal C# property accesses (since the names of these properties are known at compile time), and are type-safe (no casting needed). However both DynamicTraceEventParser and 'RegisteredTraceEventParser can not work like this because they fetch the information about events at runtime (not compile time), and thus look more like a reflection API. In a later blog post I will show you how to fix this by generating TraceEventParsers for use at compile time, but until that we will make do with the limitations of 'RegisteredTraceEventParser . Because 'RegisteredTraceEventParser' does not know at compile time the names of ANY events, you can't subscribe to particular events with this TraceEventParser, instead you can only subscribe to 'All' meaning all events this parser knows how to parse. Once inside the callback you can then filter out the particular events of interest. In our case most of the code above is the delegate body that gets called on every event that 'RegisteredTraceEventParser knows how to parse.
Enable the ETW Providers for the events you are interested in. In the example above we use the GetProviderByName() API to convert the name Microsoft-Windows-Kernel-Process to is unique ID, and then used that to enable the provider with and level (verbosity) of 'Informational' and a 'keyword' (bitset) argument '0x10' . The 'keyword ' argument is a bitset that indicates which events to enable. It is not uncommon to use -1 (all 1s bitset, or every event in the provider), however in this case we 'knew' the right bitset was 0x10 (more on how we did that below).
Process incoming events by calling the ETWTraceEventSource's 'Process()' method. For files, this returns when all the events in the file are processed. For real-time sessions, it only ends when the 'ETWTraceEventSoruce.StopProcessing()' method is called.

Once Process() is called, as events come in your event handlers will be called (for real time sessions there typically is a 1-3 second delay between the event happening and being called back in your Parser). In our case we are looked for events with at 'TaskName' of 'ProcessStart' and 'ProcessStop' and based on what we see there we call 'PayloadByName' to fetch particular payloads (and cast them to the correct type), and then process them (in this case simply print a message to the Console.

So you can see, that the actual mechanics of making this tool is reasonably straightforward, however we make two important observations:

The callback code is not very friendly. It has lots of 'magic names' which if incorrect, will simply cause nulls to be returned (which means casting will throw an exception). You had to know what type various things were (was CPUCycleCount an int, uint, long, ulong?) and you can imagine that the code behind 'PayloadByName' is relatively inefficient (it much do some sort of string lookup when it is called and then box its result into an object (which we then have to cast). This is no horrific (as it would be if we had to deserialize the data blob ourselves), but is not unlike the string parsing you do when processing a CSV file, which we would like to avoid.
There is also a discovery problem. How do we know that the provider 'Microsoft-Windows-Kernel-Process' event exists, and how did we know that the 0x10 keyword was what we wanted? Thus this example alone is really not sufficient, we need to understand what providers are out there and what keywords they have. How do we do this?

The answer to the first problem is to generate a provider-specific TraceEventParser at compile time. That is what we want is to have something that takes a provider (or more specifically a description of all the events in a provider called the manifest), and generates code that knows how to parse THOSE PARTICULAR EVENTS. Now you just link in that code, and the you can now get a much better experience, where intellisense works and each event knows the fields that are valid. We have a tool called 'TraceParserGen' which does exactly this, which I will talk about in a future blog. In the mean time I will show you how to find the strings I used in the code above for yourself in this blog entry.

Then answer to the second point (how do we figure out what providers are out there and what events are in the provider) for today at least is the 'logman query providers' command.

You can find out all the providers that have been registered in the operating system by running the command

logman query providers > providers.txt

It is a long list, which is why I redirected it to a file. Generally you care most about the providers that begin with 'Microsoft-' because these are the 'new' ETW providers. Most of the others are 'old' providers' and may not provide any useful information about how to enable them. It also does not include providers that did not publish themselves to the OS (EventSources DO NOT publish themselves an thus are not on the list). Thus the command above is pretty much only useful to find out what OS providers there are (which is still VERY useful).

There is a very useful variation of the command above. You can give it a -pid <num> argument

logman query providers -pid 5400

Which looks for all USER MODE providers that CAN BE ENABLED in the process given. This will include all EVENTSOURCEes (however it will just show their GUID, so it is not super-helpful), and will NOT show KERNEL providers (they are implicitly available for all processes), but will show you the USER MODE OS providers that are available, which is still pretty handy.

Once you have a provider you are interested in, you can find out what Keywords and level it allows by specifying a provider name on the 'logman query providers' command. For example have doing a logman query providers and seeing Microsoft-Windows-Kernel-Process, we can guess that if we wanted to know about process stuff that this provider would probably do it. Thus we execute the command

logman query providers Microsoft-Windows-Kernel-Process

And we get the output

Value Keyword Description
-------------------------------------------------------
0x0000000000000010 WINEVENT_KEYWORD_PROCESS
0x0000000000000020 WINEVENT_KEYWORD_THREAD
0x0000000000000040 WINEVENT_KEYWORD_IMAGE
0x0000000000000080 WINEVENT_KEYWORD_CPU_PRIORITY
0x0000000000000100 WINEVENT_KEYWORD_OTHER_PRIORITY
0x0000000000000200 WINEVENT_KEYWORD_PROCESS_FREEZE

Which shows us that the keyword 0x10 will get use the WINEVENT_KEYWORD_PROCESS events and the keyword 0x40 would get us the WINEVENT_KEYWORD_IMAGE, so if we cared about processes and image loads, setting the keyword 0x50 (the OR) should give us what we need.

Another pretty common technique is to just 'try it and see'. For example, once we were interested in Microsoft-Windows-Kernel-Process we simply 'try it and see' using PerfView.

PerfView /onlyProviders:Microsoft-Windows-Kernel-Process collect

Let it run for a bit, and then look at the 'events' view in PerfView. By default if you simply specify a provider name in PerfView, it turns on ALL keywords and VERBOSE level, so you should get everything. you can then see what you got. Doing this for Microsoft-Windows-Kernel-Process we see the following in the events view

That the Microsoft-Windows-Kernel-Process provider has 4 events. In PerfView an events is show as 'Provider/Task/Opcode' so the first event in that display has a 'TaskName' of 'ImageLoad' and an empty opcode (Info). We also see the event with the 'ProcessStart' TaskName and the OpcodeName of 'Start'. These are probably of interest
Looking at particular exmaples we see that a ProcessStart/Start event has fields like 'ParentProcessID' and 'ImageStart' and that the ProcessStop/Stop event has fields like 'ExitCode' and 'CPUCycleCount'. This is how we know what strings to pass to 'PayloadByName'.

It is possible to probe this information at runtime (if you just to a ToString() on a TraceEvent, it will dump a XML representation of all the fields of the event into the returned string). There are also the 'PayloadNames' property that tell you all the legal values to pass to 'PayloadByName'. Typically, it is easier to simply 'try it an see' in PerfView, however.

So there you have it, You are now armed with the ability to go data mining for all the interesting events in the ETW system. Keep in mind however, that for kernel events, your first reactions should be to explore the KernelTraceEventParser, because it is already got a nice, efficient, strongly typed wrapper. However if it is not there, You can use RegisteredTraceEventParser to decode it. Couple this with the ability to also see your EventSources, and things start to get very exciting...

So now its your turn. Open the ProcessMonitor.ZIP file attached, drag it to your machine, open it in Visual Studio and take a look (it has lots of comments I omitted above). Play around with it (try other providers, and keywords, Play with image load events or file open events).

Vance

ProcessMonitor.zip

Using TraceEvent to mine information in OS registered ETW providers

Additional resources