ETW - Overview

Article
09/30/2015

Event Tracing for Windows (ETW) is a system for routing events. ETW is primarily intended for diagnostic purposes and is optimized to minimize impact on the overall system performance. ETW should not be used for control purposes as it does not offer guaranteed delivery -- events might be lost in certain circumstances (e.g. if events occur too quickly or if the system shuts down before the events are saved to disk).

ETW Components

ETW works as follows: a controller tool is used to configure sessions, telling ETW what events should be routed to a session and how to store the events that have been routed to a session; an event provider generates events; the ETW runtime routes events to the appropriate sessions; the sessions record the event; a decoder tool extracts information from the events; an analysis tool makes use of the information from the event.

Microsoft provides several controller tools such as XPerf, LogMan, and TraceLog. You can use these tools to start a session, to configure a session (controlling which events should be saved and the location to which they should be saved), and to stop a session. 3rd parties can make use of the ETW APIs (StartTrace, ControlTrace, TraceSetInformation, EnableTraceEx2), to create their own controller tools. In addition, the Windows OS itself controls several sessions that it uses to monitor its own performance and reliability. ETW supports global sessions (receive events from any provider on the system) and private sessions (receive events only from providers in the same process). The system is limited to 64 global sessions, and each process is limited to 4 private sessions. Special privileges (e.g. membership in Administrator or Performance Log Users) are required to control a global session. You can use "TraceLog -l" to get a list of active global sessions.

ETW sessions can be configured to save events in various ways:

The session can be configured to write events to a file. In addition, ETW can treat the file as unlimited, as a circular buffer (overwriting old events once the file has reached a certain size), starting a new file when the file has reached a certain size, or stopping the trace when the file has reached the size limit.
The session can be configured to write events to a reserved block of memory. These events will be lost if the system is turned off. However, the data in memory can be flushed to disk on-demand. The session will use the memory as a circular buffer (overwriting old events once the block is full).
The session can be configured to send events to a consumer in real-time (via a callback). The consumer can then decide what to do with each event.

3rd parties can make use of the ETW APIs (e.g. EventWrite) to create providers. Many Microsoft-provided components also act as providers. The Windows OS kernel publishes many events regarding the operation of the OS, and many Microsoft-provided drivers and applications publish events. Many ETW sessions are always running on the system, started and controlled by the OS itself. For example, the Windows Event Log uses ETW sessions to receive events, so the Event Log sessions are always running to enable this functionality. Other ETW sessions are started as needed. For example, Visual Studio will start a special ETW session when you ask it to do performance profiling of your application. Visual Studio will configure the session to receive events from the OS about the performance of your application, which it will then analyze to determine where your application is spending its time and memory.

ETW itself treats the data as binary data, and any kind of data can be encoded using any system. If the data is encoded using a custom scheme, you must use a custom tool to decode the data. However, there are several Microsoft-supported systems for encoding ETW data, and corresponding tools that can decode the ETW events generated by these systems. For example:

WPP is a system for encoding diagnostic trace information and writing it to ETW. If you use WPP to generate ETW events, you can decode them using tools such as TraceFmt or TraceRpt. Note that in order to decode these traces, you must have the TMF or PDB files with information about your events.
MOF events are rarely used in new code, but many MOF-decodable events are generated by the Windows kernel. TraceRpt and other TDH-based decoders can decode MOF-decodable events as long as the appropriate event schema has been registered with WMI. The XPerf and WPA tools cannot access the WMI-registered data, so they cannot decode 3rd-party MOF-decodable events, but they contain hard-coded decoders for many common events generated by the Windows kernel. In particular, WPA makes heavy use of the kernel-generated events to analyze system behavior.
Manifest-based ETW is a general-purpose system for encoding and decoding ETW data using XML manifests. If you use manifest-based ETW, you can decode the events using TraceRpt or XPerf. Note that in order to decode these traces, the data from the manifest must be available to the decoding tool. This can be provided directly (e.g. by providing the path to the XML manifest on the TraceRpt command line) or indirectly by compiling the manifest into a binary format, including the binary data in the resources of your component, and registering the component on the system so that the decoding tool can find it. (You can use "TraceLog -enumguid" to get a list of components that have registered their decoding data on a system.)
TraceLogging is a general-purpose system for encoding and decoding ETW data without using separate decoding information. TraceLogging events can be decoded using the Windows 10 versions of TraceRpt or XPerf. Each event contains its own decoding information. This means each event will be larger, but it also means that the decoding information is always available and no separate XML, MOF, PDB, or TMF files need to be tracked.

3rd parties can make use of the ETW APIs (ProcessTrace, EVENT_RECORD) to get the data from an ETL file, and can use the TDH APIs (TdhGetEventInformation) to decode events that use the Microsoft-supported encoding systems.

ETW was introduced as part of WMI in Windows 2000. Events written using the Windows 2000 APIs (e.g. TraceEvent) are sometimes called "Classic" events. The WPP and MOF frameworks for writing events use the Windows 2000 APIs, so they generate classic events. Providers that use the Windows 2000 APIs have a significant limitation -- they can only be associated with one session at a time. If multiple sessions try to subscribe to events from the same classic provider, only the last session will actually receive events.

ETW was significantly updated for Windows Vista. Events written using the Windows Vista APIs (e.g. EventWrite) are sometimes called "Crimson" events. The XML manifest and TraceLogging frameworks use the Windows Vista APIs, so they generate crimson events. Providers that use the Windows Vista APIs can be associated with up to 8 sessions at a time.

Event Content

Each ETW event contains the following information:

Required user-supplied information, such as the event provider GUID, the event ID, the event's "severity level", and the event's keywords.
Optional user-supplied information, such as event payload (binary data), activity GUID, and related activity GUID.
Always-present ETW-supplied information, such as the event's timestamp and the ID of the thread logging the event.
Optional ETW-supplied information, such as call stacks. The session controller configures ETW to include or exclude this data.

The Event Provider GUID is supplied in the call to EventRegister. This GUID is used when routing and decoding events. For example, an event controller might configure a session to include all events with a particular provider GUID. When decoding MOF-based and manifest-based events, the provider GUID is used to look up the data that explains how to decode events. Note that multiple components can use the same provider GUID (even if they run at the same time) as long as you intend to enable/disable/route the events from multiple components as a unit, and in the case of manifest-based and MOF-based events as long as they share the same decoding information. Note also that there should be a strong 1-to-1 relationship between a provider name and the provider GUID. In order to maintain this relationship, you might want to base the provider GUID directly on the provider name using a hash. (A tool that performs this hash is provided in an earlier blog post.)

Each event has an event ID. The event ID is used when routing and decoding events. For example, an event controller might configure a session to include only events 1, 2, and 5 from a particular provider. When decoding MOF and manifest-based events, the event ID + event Version should uniquely identify the event within the provider (e.g. given a provider GUID, an event ID, and an event Version, you should be able to look up the event's binary layout, and the binary layout should never change for the GUID + Id + Version combination; every event with the given GUID + Id + Version should have the same fields with the same types in the same order).

Each event has an event version. This is used when you need to change the fields in an event. When the binary layout of an event is changed, you'll typically duplicate the event in the manifest or MOF file, increment the event version in one copy, and make the necessary changes in the copy. That way, you can still decode events in the old format, and you can move forward using the new format.

Each event has an event level. This is an indicator of the event's severity or importance, and is used in event filtering. For example, an event controller might configure a session to include only events with severity "error" from a particular provider, or an event analysis tool might filter out events with severity below "warning". The level can be any value from 0 to 255. Levels 1-5 have been given names by Microsoft: 1 is critical, 2 is error, 3 is warning, 4 is info, and 5 is verbose. Typically, events should default to level 5 (verbose). The level 0 is special -- it means that the event does not specify a particular level, and will be enabled regardless of any level-based filtering done for the session. A lower numeric value means a more-severe event.

Each event has a keyword field. (In classic ETW, the keyword field was called "flags".) Each bit in the keyword corresponds to a category, and if the bit is set in the keyword for an event, it indicates that the event belongs to the specified category. The low 48 bits of the keyword are user-defined, while the upper bits must only be set for specific scenarios as defined by Microsoft. For example, the provider might define bit 0x1 as "Networking event", bit 0x2 as "I/O event", and bit 0x4 as "UI event", and a particular event might set its keyword to 0x5 indicating that the event is related to both networking and UI. An event controller might configure a session to only include events with certain keywords. If the keyword value is 0 (no bits set), that means the event does not specify any keyword and will be enabled regardless of any keyword-based filtering. (Note that starting with Windows 10, a controller can configure a session to ignore events with keyword set to 0.)

Each event has an opcode field. (In classic ETW, the opcode field was called "class".) The opcode is used to mark events, but does not affect ETW routing. A few opcodes have well-defined semantics recognized by analysis tools. Other opcodes can be defined by the user for any purpose. General-purpose analysis tools will only use the opcode to label or group the events. The default opcode is 0 = Info. The most commonly-used opcodes are Start (indicating the beginning of an activity) and Stop (indicating the end of an activity). Other commonly-used opcodes include DC_START (typically indicating that the associated event contains a dump of the provider's state, triggered by a notification that a session has started collecting data from the provider) and DC_END (similarly indicating that the associated event contains a state dump triggered by the end of data collection). The remaining opcodes can be used or defined at the discretion of the user.

Each crimson event has a task field. The task is a user-defined category that can be used to mark events, but does not affect ETW routing. Tasks can be used to group events into categories. It is also common to use tasks to give a name to an event, since manifest-based ETW does not have a first-class concept of an "event name" but it does have a concept of "task name". This is done by assigning the same numeric value for Event ID and Task ID, then setting the Task Name to the desired Event Name. (TraceLogging-based ETW is the reverse. TraceLogging supports Task ID, but does not support Task Name. On the other hand, TraceLogging does support Event Name, and uses the Task Name field for the Event Name.)

Each crimson event has a channel field. The channel can cause an event to be given special treatment by the ETW runtime or by an event consumer. The default channel 0 means no special treatment is requested. Channels 1-15 are reserved for definition by Microsoft and may lead to special treatment for the event by the ETW runtime or other built-in Windows OS components. Other channel values are user-defined and can be used to request special treatment from a session. The kind of special treatment depends on the kind of session that receives the event. For example, a real-time session might interpret channel 25 as indicating a high-priority event. Channels are most commonly used when interacting with the Windows Event Log. The Event Log's session listens to certain event providers that have registered themselves with the service. The Windows Event Log service uses the event's channel to route the event to the correct log. For example, a particular provider might register itself with Event Log and configure channel 16 to mean the Operational log.

Activities

The Windows Vista (crimson) ETW APIs support the concept of an activity. An ETW activity is a set of related events. For example, there might be an activity for reading the data from a file. The activity might start with an event from opening the file, continue with an event each time a block of data is read, and end with an event from closing the file.

At an ETW level, an activity is simply a set of events that all have their ActivityId field set to the same nonzero value. The first event in the activity is the Start event, and it should have its OpCode set to START. The last event in the activity is the Stop event, and it should have its Opcode set to STOP. Activities can be logically nested by associating one activity with a "related" activity. This association is recorded by setting the RelatedActivityId field of the Start event to the ActivityId of the parent event.

The ActivityId is a 128-bit value that is intended to be unique within the scope of a given ETW trace. The ActivityId can be a GUID. However, GUID generation is relatively slow, and ActivityIds do not need to be globally unique, so ETW provides a more efficient API that generates locally-unique ActivityId values (EventActivityIdControl).

Every crimson event has an ActivityId field. The value for the field can be set using the EventWriteTransfer or EventWriteEx APIs. If the user does not explicitly specify the value of the ActivityId field (e.g. if the EventWrite API is used, or if the user passes an activity ID of NULL to the EventWriteTransfer or EventWriteEx API), ETW will use the value of a thread-local ActivityID variable.

Every crimson event can optionally have a RelatedActivityId field. The value for the field can be provided using the EventWriteTransfer or EventWriteEx APIs. If the user does not explicitly specify the value of the RelatedActivityId field (e.g. if the EventWrite API is used, or if the user passes a related activity ID of NULL to the EventWriteTransfer or EventWriteEx API), ETW not include a RelatedActivityId in the event.

The value of the thread-local ActivityID variable can be read and written using the EventActivityIdControl API. Setting this variable will make all subsequent events from the same thread default to the specified activity ID value. This allows for an the following pattern for creating activities:

Save the original value of the thread-local ActivityID variable.
Enter the scope of the activity.
Generate a new activity ID value, and set the thread-local ActivityID variable to this value.
Write a start event. Set the event's OpCode to START. Optionally set the event's RelatedActivityId to the saved original value of the thread-local ActivityID variable. Note that the start event will implicitly use the thread-local ActivityID value, so an explicit Activity ID does not need to be provided.
Write any number of events. As long as no code overwrites the thread-local ActivityID, all of the events on this thread will default to using the thread-local ActivityID value, implicitly making them part of the activity. (Events can exclude themselves from the activity by explicitly providing a different activity ID using the EventWriteTransfer or EventWriteEx APIs.)
Write a stop event. Set the event's OpCode to STOP. Note that the start event will implicitly use the thread-local ActivityID value, so an explicit Activity ID does not need to be provided.
On scope exit, restore the original value of the thread-local ActivityID variable.

The advantage of this pattern is that the events in the activity can be written without explicitly providing an activity ID -- they will default to the thread-local value. The disadvantages of this pattern include the following:

This pattern only works for single-thread synchronous activities. It does not work for asynchronous or multi-threaded activities.
This pattern depends on the good behavior of all other code that runs on the thread. If you call into a function that sets the thread-local ActivityID and does not restore the value on exit, the events after that function returns will not be associated with the correct activity. Note that while it is good practice to restore the activity ID on scope exit (even on exception or other error), there is no enforcement of this rule. In addition, it is possible for a call to a Windows OS API to overwrite the current thread's Activity ID, e.g. if the OS API calls into a driver that sets the thread-local activity ID and does not restore the original value.

Event Filtering and Routing

ETW is responsible for routing each event to the correct set of sessions. Classic event providers (providers using the Windows 2000 APIs such as TraceEvent) can be routed to at most one session. Crimson event providers (providers using the Windows Vista APIs such as EventWrite) can be routed to up to 8 sessions.

Most events are not routed at all. If no session has asked for any events from a particular provider, the provider is considered disabled, and any events from that provider are ignored. When a disabled provider calls EventWrite, ETW will determine that the provider is disabled and will immediately return success, indicating that the event was delivered to all interested providers (all 0 of them). However, this is relatively inefficient -- it takes ~100 CPU cycles for EventWrite to validate its parameters and look up the provider's state to determine that the provider is disabled. (Note that using APIs such as EventEnabled or EventProviderEnabled will not help, since these APIs also require about the same amount of time to look up the provider's state. The only difference is that EventEnabled can be called before the event payload is set up.) The implementation of EventWrite looks something like this:

ULONG EventWrite(
    REGHANDLE handle,
    PCEVENT_DESCRIPTOR descriptor,
    ULONG count,
    PEVENT_DATA_DESCRIPTOR data)
{
    if (!ValidateParameters(handle, descriptor, count, data))
    {
        return ERROR_INVALID_PARAMETER;
    }

    ULONG errorCode = ERROR_SUCCESS;
    PROVIDER_INFO info = LookupProviderInfo(handle);
    if (info.IsEnabled &&
        info.IsLevelOk(descriptor.Level) &&
        info.IsKeywordOk(descriptor.Keyword))
    {
        errorCode = info.SendEventToSessions(descriptor, count, data);
    }

return errorCode;
}

Providers can improve their performance by providing a callback function when they register with ETW (see the EventRegister API). ETW will invoke the callback whenever a session enables or disables the provider. This allows the provider to keep track of its own state, including whether it is enabled or disabled, what levels are being filtered out, and what keywords are being filtered out. The provider can check its own state much more efficiently than EventWrite, allowing it to run much more efficiently when the provider is disabled. A provider might store its state in global variables and might then write events using code like this:

if (g_providerIsEnabled &&
    g_providerLevel > ThisEventDescriptor.Level &&
    IsKeywordEnabled(ThisEventDescriptor.Keyword))
{
    EVENT_DATA_DESCRIPTOR data[4];
    // Prepare data for event ...
    EventWrite(g_providerHandle, &ThisEventDescriptor, 4, data);
}

In the case where the provider is disabled, this will skip the event-write code using only 1 or 2 CPU cycles. In the case where the provider is enabled, but the event is filtered out by level or by keyword, this will skip the event-write code using only 5 or 10 CPU cycles. The code to prepare the data and call into ETW will only run if the provider is enabled for events with the given level or keyword.

The code for maintaining the callback and for checking the state of the provider for each event can be cumbersome. Various frameworks exist to hide this complexity from the user. WPP and manifest-based ETW generate header files with code that looks similar to that shown above, implementing very efficient filtering with little effort on the part of the developer. The TraceLoggingProvider.h header implements similar filtering in the TraceLoggingWrite macro. For example:

TraceLoggingWrite(
    g_hProvider,
    "EventName",
    TraceLoggingValue(GetMyValue(), "MyValue"));

In this code, the GetMyValue() function will only be invoked if the provider is enabled.

The .NET EventSource class implements efficient filtering within the WriteEvent, WriteEventCore, and Write methods, so these methods return very quickly if the provider is disabled or if the event is filtered by level or keyword. Similarly, the Windows Runtime LoggingChannel class implements efficient filtering so that the LogEvent, LogMessage, and LogValuePair methods return very quickly if the event does not need to be written. However, in these cases, the automatic filtering does not eliminate the time spent preparing the data for an event. For example, with .NET EventSource,

myEventSource.WriteMyValue(GetMyValue());

In this code, the GetMyValue() function will be invoked even if the provider is disabled. If GetMyValue() returns quickly, this might be ok. On the other hand, if the GetMyValue() function might be expensive, it would be more efficient to use code like this:

if (myEventSource.IsEnabled())
{
myEventSource.WriteMyValue(GetMyValue());
}

Note that the above check only tests the overall provider state. This might be sufficient, or it might be better to use another overload of IsEnabled to also check for level-based or keyword-based filters.

The same applies to TraceLoggingProvider.h -- the automatic filtering does not skip code outside the TraceLoggingWrite macro. For these cases, a manual check might be needed for optimal code. Each of these frameworks provides methods for determining whether the provider is enabled without sending an event. For example, if preparing the data is more complex than a single GetMyValue() call, the above code might be written as follows:

if (TraceLoggingProviderEnabled(g_hProvider, 0, 0))
{
    int myValue;
    GetMyValue(&myValue);
    TraceLoggingWrite(...);
}

ETW supports many other kinds of filters that are difficult or impossible to replicate within the provider code. The provider-side filtering will not exactly match the complete filtering and routing done by ETW, and some events will pass the provider-side filtering only to be dropped by the more complete filtering done by the ETW runtime. This is generally ok -- the provider-side filtering is usually successful at eliminating the vast majority of unnecessary events in only one or two CPU cycles. However, for this simple provider-side filtering to be most effective, events and providers should be organized in a way that allows event scenarios to be defined by a combination of provider GUID, level, and keyword.

If scenarios are well-defined using filters understood by the provider (e.g. filters based on level and keyword), the scenario will have minimal impact on the system when the corresponding session is created. For example, if a "networking" keyword is defined, and the "error" level is used consistently, it is very easy to diagnose a networking error by setting up a session to collect events where level <= error and keyword mask = networking. On the other hand, if no keywords are defined for your events, and levels are not used consistently, the same scenario might not allow for any meaningful filtering at the level or keyword layer. All events in the provider might need to be enabled and collected, leading to a performance impact while the session is enabled.

Each session can be configured to receive events from any number of provider GUIDs. For each provider GUID, various filters can be established, such as filtering by level, keyword, or event ID. The session can also request additional information (e.g. callstacks) for certain events.

When ETW receives an event from a provider, it determines the set of sessions that are receiving events from that provider. It then checks the filters for each session, removing the session from the set if the event does not meet the session filters. If the set of sessions is non-empty, it reserves space for the event in each of the session's buffers. After it has reserved space, it copies the event into the reserved space for each session.

If ETW fails to reserve space in all sessions, its default behavior is to drop the event (i.e. if ETW cannot deliver the event to all interested sessions, it does not deliver the event at all). This policy was intended for consistency, i.e. so that the different sessions will all record a consistent story. However, this behavior is not always desirable. A session can opt out of this policy by setting the EVENT_TRACE_INDEPENDENT_SESSION_MODE flag. (Newer versions of the XPerf tool will set this flag for the sessions it creates.)

Each ETW session has a worker thread that handles tasks for the session. For file-based sessions, the worker thread is responsible for noticing when a buffer is full, writing the full buffer to disk, marking the buffer as empty, and allowing it to be reused for future events. For real-time sessions, the worker thread is responsible for sending full buffers to the real-time consumer process.

ETW - Overview

ETW Components

Event Content

Activities

Event Filtering and Routing

Additional resources