ETW Tracing: How and Why?

Awhile ago, I wrote about examining the structure of ETW Events by look at the MOF data in CIM Studio.

 

Having recently started ETW instrumentation of the VS2005 Profiler components, now’s a good time to talk about creating and writing your own ETW events.

 

Why bother with ETW tracing? After all, tracing mechanisms can seemingly be bought by the dozen in nearly any hardware store. Off the top of my head, there’s the Windows Event Log, ETW tracing, WPP, OutputDebugString, and System.Trace in .NET. There are also application specific tracing tools like SQLTrace for SQL Server.

 

Plus, a quick look at the ETW tracing spec shows that it’s not a simple API to use at first. It’s littered with GUIDs and non-trivial function calls that make the learning curve pretty steep. However, the advantage of ETW tracing is that it’s baked into the operating system, and it’s been designed for industrial strength tracing with requirements that would be daunting if you tried to implement yourself.

 

Among the design requirements of ETW tracing, you’ll find that it:

  • Is usable from both drivers and application programs
  • Supports multiple processes logging concurrently
  • Very high throughput (on the order of 20,000 events/second with less than 5% CPU load
  • Can be easily enabled/disabled at run time without any special application awareness
  • Lets you work with trace files on any machine
  • Logging to a file or real time tracing in memory
  • Circular buffers

 

In addition, Microsoft has ETW instrumented many key technologies, including:

  • System (processes, threads, DiskIO, Registry, PageFault, etc…)
  • HTTP.SYS
  • IIS
  • Asp.net
  • Indigo
  • Network
  • ACPI
  • CLR
  • SQL Server
  • ADO.NET providers

 

You can mix, match and enable any combination of these ETW event providers (and your own) in trace sessions. How cool is that?

 

Having looked at why ETW deserves serious consideration as your trace mechanism, let’s now take a very high level tour of how to write your own event provider. Obviously you’ll need the Platform SDK documentation for any real work, but this deliberately simplified overview should help put the SDK information into a better context.

 

Writing an ETW Provider falls into five high level steps:

 

1) Select a provider GUID  and event class GUID(s)

2) Register your provider with the system (per-process)

3) Provide a way for the system to enable/disable your provider

4) Create and emit the events

5) Allow inspection of your events by other tools (optional)

 

Let’s look at these steps in a little more detail:

 

Step 1 (Selecting GUIDs)

In this phase, you decide how you want your provider structured. Each provider is represented to the system by a unique GUID.

 

A provider can have one or more “event classes”. By convention, an event class is one logical area of your code. For instance, in my recent profiler work, I created a provider with three event classes:

  • Instrumentation
  • Collection
  • Analysis

 

Likewise, the “System” trace provider has different event classes for:

  • PageFault
  • UdpIp
  • TcpIp
  • Image
  • Process
  • Thread
  • HWConfig
  • DiskIO
  • Registry
  • FileIO

 

Each event class is also represented by its own unique GUID. Partitioning your logging into appropriate event classes lets you more easily filter the trace output to just the events of interest. Nothing prevents you from having a single event class however.

 

Step 2 (Registering with the system)

 

Registering your provider and event classes with the system is done with the RegisterTraceGUIDs API. You pass this API your provider GUID and the event class GUIDs that you defined in step 1. The RegisterTraceGuids call is per-process, not global, and needs to be called each time you start your program.

 

Step 3 (Enabling/Disabling your tracing)

 

When you called RegisterTraceGuids in step two, you specified a “ControlCallback” function that you’ve implemented. In this function, you’ll want to set a global value which tells your trace emission code whether to emit events or not. The simplified pseudo code for a ControlCallback function looks like this:

 

ControlCallback( RequestCode )

{

     Switch ( RequestCode )

     {

          case WMI_ENABLE_EVENTS:

              TraceOnFlag = true;

              break;

          Case WMI_DISABLE_EVENTS:

              TraceOnFlag = false;

              break;

}

}

 

Your event generation code (step 4, below) should examine the “TraceOnFlag” value, and if it’s false, it shouldn’t do anything.

 

The OS magically takes care of calling your ControlCallback function. It even knows to do this when you’ve enabled tracing before your process even started. Slick, eh?

 

 

Step 4 ( Creating & Writing events)

 

In this part of your code, the work goal is to create an “event” with your data that you then pass to the TraceEvent API. The event you create contains a standard header of type EVENT_TRACE_HEADER. Any additional arbitrary data that you want to trace is added immediately after the EVENT_TRACE_HEADER. If you do add additional data, it’s critical that you update the ‘Size” member of the EVENT_TRACE_HEADER with the length of your additional data.

 

The data in the EVENT_TRACE_HEADER is traditionally packed together as closely as possible, even the strings. If you later write your own ETW event consumer, you’ll be responsible for knowing what data you’ve packed into the additional data.

 

How do you differentiate different events from each other? In the EVENT_TRACE_HEADER, you’ll find two fields ( “Guid” and “Class.Type”) which you get to specify. The Guid field should be set to the “event class” GUID that you defined in step 1. The Class.Type field is an integer that represents a unique events. For instance, you might define your Class.Type values like this:

 

#define BeginActivity1 1

#define EndActivity1 2

#define PausingActivity1 3

#define ReceivedRequest 4

// and so on….

 

When you finally do call EventTrace(), the system fills in various fields in the EVENT_TRACE_HEADER for you, including the process & thread ID, and the time the event was traced.

 

Creating the custom data after the EVENT_TRACE_HEADER can be a bit messy, especially if you have data that varies in size from instance to instance (like strings). I found a way of letting TraceEvent do most of the dirty work for you. To do this, you specify the WNODE_FLAG_USE_MOF_PTR flag in the header, and pass in an array of structures that contain pointers and lengths for the data to be appended.

 

Step 5 (Exposing your events to other tools)

 

After logging the events with TraceEvent, all your required work is done. However, only you know what the additional data in your events was. To make it possible for other programs to learn the structure of your data, you’ll need to describe it using a .MOF file.

 

MOF files are used by WMI (Windows Management Instrumentation) for many things, event layouts being just one special case. The syntax of .MOF files can be pretty baroque, so I suggest cheating, like I did. I copied an existing .MOF file on my system (in my case, the file for the IIS events) and modified it to have my GUIDs and events.

 

After creating the .MOF file, you add it into your system with the system provided MOFCOMP tool. The easiest way to see if your MOF data made it into the system is with the “WMI CIM Studio” application, which I described in my prior post on ETW events.

 

Collecting and viewing your events

 

Once your provider code is built and running, you’ll no doubt want to try it out. There are many tools available from Microsoft for this task. To enable tracing in your provider, the most well known tool is LOGMAN. LOGMAN has an esoteric command line syntax that definitely keeps you on your toes. My two pieces of advice: Don’t forget the –ets switch (which really should be the default) , and consider putting your provider GUIDs into a text file and using the –pf option. To make life simpler, I put my LOGMAN commands into a .BAT file. For instance, to enable my provider, my .BAT file contains:

 

logman create trace MySession -o .\provider.etl -pf c:\util\VSPerfTraceProvider.guid

logman start trace MySession

 

Note that two separate invocations of LOGMAN were needed. One to create the session with the necessary parameters (for instance, my provider GUID is in the VSPerfTraceProvider.guid file), and the second to actually enable logging in the session. The events will be written to “provider.etl” in the current directory.

 

To stop the session, you’ll need to use LOGMAN (or another tool) again. My .BAT file for stopping my trace session contains:

 

logman stop MySession

logman delete MySession

 

To view events, the most common tool is TraceRpt, which processes your events and tries to match up your custom event data with the MOF data in the WMI repository. The output from TraceRpt is a .CSV file, which isn’t the most user friendly format, but many programs understand it.

 

Wrap-up

I’ve intentionally omitted lots of details from my description of tracing. My goal was to give the “big picture” overview to help the existing documentation make more sense.

 

At some point, I may do yet another blog post on consuming ETW events in your own code, rather than using tools like TraceRpt.

 

I need your feedback: Is the technical level of these posts about right? I know they’re not as deep as my MSDN column was, but I can also create one of these in an hour or two, rather than in a week.