TraceLogging - Background

09/24/2015

ETW is a system for getting data from providers to consumers. The core ETW runtime knows nothing about the payload of an ETW event; it simply routes the event based on event attributes such as provider ID, event ID, level, and keywords. The user of ETW can put any data, in any format, into the event, as long as the payload does not cause the event to exceed the ETW event size limit of 64KB.
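As a rough illustration of routing on attributes alone, the following C sketch decides whether a session should receive an event by looking only at the provider ID, level, and keywords, never at the payload bytes. The struct and function names are hypothetical, the filtering rules are simplified (real ETW sessions have more options), and the size check treats the 64KB limit as if it applied to the payload alone, whereas the real limit covers the whole event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of the attributes ETW routes on.
   The payload travels as opaque bytes: nothing below looks inside it. */
typedef struct {
    uint8_t     provider_id[16]; /* provider GUID as raw bytes */
    uint16_t    event_id;
    uint8_t     level;           /* 1 = critical ... 5 = verbose */
    uint64_t    keywords;
    const void *payload;         /* never inspected by the router */
    size_t      payload_size;
} EventSketch;

/* Hypothetical per-session filter (not the real enable parameters). */
typedef struct {
    uint8_t  provider_id[16];
    uint8_t  max_level;          /* 0 = no level filtering in this sketch */
    uint64_t any_keyword;        /* 0 = no keyword filtering in this sketch */
} SessionFilterSketch;

/* Simplification: the real 64KB limit applies to the whole event,
   headers included, so the usable payload is somewhat smaller. */
#define SKETCH_EVENT_SIZE_LIMIT (64u * 1024u)

/* Decides delivery from the event attributes alone. */
bool session_wants_event(const SessionFilterSketch *s, const EventSketch *e) {
    if (memcmp(s->provider_id, e->provider_id, sizeof s->provider_id) != 0)
        return false;                         /* different provider */
    if (s->max_level != 0 && e->level > s->max_level)
        return false;                         /* too verbose for this session */
    /* Events with no keywords set are not keyword-filtered in this sketch. */
    if (s->any_keyword != 0 && e->keywords != 0 &&
        (e->keywords & s->any_keyword) == 0)
        return false;                         /* no keyword in common */
    if (e->payload_size > SKETCH_EVENT_SIZE_LIMIT)
        return false;                         /* oversized events are dropped */
    return true;
}
```

Because the decision never reads `payload`, the runtime places no constraints on the payload format, which is exactly why a separate encoding convention is needed before a consumer can make sense of the bytes.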

While it is possible to write custom code to pack and unpack each event, this tends to be hard to manage. Developers don't want to spend time writing custom packing and unpacking code. Worse, if each event has a custom packing/unpacking format, there can be no general-purpose decoding tools. To avoid this situation, several event encoding/decoding protocols have been built over the general ETW infrastructure.

MOF - description of the event encoding provided in MOF (WMI) files. (Old technology, generally not used for new ETW providers. Covers mostly the same scenarios as manifest-based ETW, but with less flexibility, so it won't be discussed further in this post.)
WPP - description of the event encoding provided in C/C++ code. A special preprocessor (tracewpp) generates code for packing the event data and generates TMF files that can be used by event decoding tools to decode events. The preprocessor also adds attributes to the code that allow the TMF files to be reconstructed from the PDB. (Intended for logging diagnostic information for use by developers.)
Manifest - description of the event encoding provided in MAN (XML) files. A special preprocessor (mc) generates a binary representation of the manifest file and optionally generates C/C++ code for packing the event. The binary version of the manifest file is added to the resources of a DLL and registered on a system so that decoding tools can find it when they need to decode events. (Intended for general purpose logging, e.g. for the Event Log or for diagnostic purposes.)

One of the major challenges of ETW is managing the decoding information (the "metadata") for events. Somebody has to author the event description in a way that allows both machines (encoder and decoder) and humans (developer writing the event and whoever wants to read the decoded version of the event) to understand the event's content. This event description has to somehow be made available to the tools that are used to process and decode the event.

WPP is popular because, in straightforward cases, it is very easy to use. A developer adds a few macros to the code, sets up the build system to enable the WPP preprocessor, and writes familiar printf-style code. Decoding typically requires the TMF or PDB files, and turns an ETL file into a list of timestamped messages that is easy to look through. However, WPP becomes hard to manage in more complex scenarios. The macros are very tricky to customize, and the preprocessor can interfere with the build process. The flexibility of printf-style formatting means the data is not in a standard format that is easy to analyze. The decoding information (PDB or TMF) is specific to a particular build of the DLL, so decoding the WPP data in a given ETL file can be hard: you have to figure out which version of the DLL was used, then locate the TMF or PDB file for that specific version. In addition, WPP is tightly coupled to C/C++ and won't work for other languages.

Manifests work well in other scenarios, where a centralized manifest is an advantage. Developers maintain a manifest file, and the binary representation of the manifest is included with the product (embedded in the DLL resources). Maintaining the manifest is more work, but it provides benefits like stable event definitions (events from an old version of the product can generally be decoded using the new manifest) and localizable event messages. The data decodes into name-value pairs (where each value has a strong type like "uint32") that allow for structured analysis if necessary. On the other hand, the extra maintenance can be completely infeasible in some scenarios (e.g. scripted environments). Proper decoding also requires global registration of the manifest during component installation, which can be problematic as well (e.g. for Windows Store apps or xcopy-deployed apps).

Neither of these technologies is ideal for all situations. During development of Windows 10, the ETW team looked for ways to improve the metadata management story. We came to the following conclusions:

There is no way to make a single metadata handling system that meets all needs. A centralized design (a single point where all events in a component are defined, e.g. a manifest) is a requirement for some scenarios but adds unnecessary complexity for other scenarios.
Manifest-based ETW is a reasonable solution for centralized scenarios.
WPP-based ETW is a reasonable solution for some distributed scenarios.
There are many scenarios for which neither manifest-based nor WPP-based ETW is a good fit.

We designed a new ETW metadata system to address the scenarios that neither manifest-based nor WPP-based ETW handled well. The new system is called "TraceLogging" or "manifest-free ETW", and it was designed with the following goals:

Usable on any development platform, including C/C++, .NET, and scripting languages.
Automatic management of metadata. The developer shouldn't have to worry about how the decoder gets the decoding information. ETW should Just Work, and ETL files should Just Decode.
Permit distributed management of decoding metadata. The developer should be able to write the event description in code.
Provide strongly-typed and structured data. Data is organized as name-value pairs. Values are strongly-typed as uint32, string, datetime, etc. Values can have complex types such as structure, array, array of structure. Structures can nest.
Providers should be able to use this technology on Vista or later.

At its core, TraceLogging is a convention for encoding data and describing the encoded data. A TraceLogging ETW event is an ordinary ETW event whose payload is packed using the TraceLogging encoding rules, with a flag set that tells the decoder to unpack it using the TraceLogging decoding rules. The implementation of this technology includes helpers for packing TraceLogging events and libraries for decoding them.
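To make the idea of a self-describing payload concrete, the following C sketch packs an event as a block of metadata (the event name, a field count, then each field's name and type code) followed by the field values. The layout, the type codes, and all `sk_` names are hypothetical and chosen for readability; this is not the actual TraceLogging wire format, and the flag that marks an event as TraceLogging-encoded is not modeled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes (not the real TraceLogging type values). */
enum { SK_TYPE_UINT32 = 1, SK_TYPE_STRING = 2 };

typedef struct {
    const char *name;
    uint8_t     type;
    uint32_t    u32;   /* used when type == SK_TYPE_UINT32 */
    const char *str;   /* used when type == SK_TYPE_STRING */
} SkField;

typedef struct { uint8_t *buf; size_t cap; size_t len; int failed; } SkWriter;

/* Appends n bytes, or marks the writer as failed if they do not fit. */
static void sk_put(SkWriter *w, const void *src, size_t n) {
    if (w->failed || n > w->cap - w->len) { w->failed = 1; return; }
    memcpy(w->buf + w->len, src, n);
    w->len += n;
}

static void sk_put_u32le(SkWriter *w, uint32_t v) {
    uint8_t b[4] = { (uint8_t)v, (uint8_t)(v >> 8),
                     (uint8_t)(v >> 16), (uint8_t)(v >> 24) };
    sk_put(w, b, sizeof b);
}

/* Writes the string including its terminating '\0'. */
static void sk_put_cstr(SkWriter *w, const char *s) { sk_put(w, s, strlen(s) + 1); }

/* Layout: [metadata][payload]
     metadata = event name '\0', field count (1 byte),
                then per field: name '\0', type code (1 byte)
     payload  = values in field order (uint32 little-endian, strings '\0'-terminated)
   Returns the encoded size, or 0 if the buffer is too small or a type is unknown. */
size_t sk_encode_event(uint8_t *buf, size_t cap, const char *event_name,
                       const SkField *fields, uint8_t count) {
    SkWriter w = { buf, cap, 0, 0 };
    sk_put_cstr(&w, event_name);
    sk_put(&w, &count, 1);
    for (uint8_t i = 0; i < count; i++) {      /* the description... */
        sk_put_cstr(&w, fields[i].name);
        sk_put(&w, &fields[i].type, 1);
    }
    for (uint8_t i = 0; i < count; i++) {      /* ...then the data */
        if (fields[i].type == SK_TYPE_UINT32)      sk_put_u32le(&w, fields[i].u32);
        else if (fields[i].type == SK_TYPE_STRING) sk_put_cstr(&w, fields[i].str);
        else w.failed = 1;                     /* unknown type code */
    }
    return w.failed ? 0 : w.len;
}
```

For an event named `Start` with a `uint32` field `Pid` and a string field `Name` holding `svc`, the sketch produces 26 bytes, of which only 8 are values; the other 18 bytes are the description that travels with the event.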

The primary difference between TraceLogging and the other systems is that each TraceLogging event includes its own decoding information. In WPP, MOF, or manifest-based ETW, the decoder must locate a file with the appropriate decoding information, but each TraceLogging event is self-contained, allowing the decoder to interpret the event content without access to any external information. In some ways this is similar to the way a JSON object or XML document can be parsed (and potentially even analyzed) without access to any external schema reference.
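On the decoding side, the point is that the event buffer alone is enough. The following C sketch assumes a hypothetical layout (metadata first, then values; not the real TraceLogging format) and renders an event as name-value text without consulting any manifest, TMF, or PDB file. Function names and type codes are illustrative only.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type codes; they match no real ETW or TraceLogging constant. */
enum { SK_TYPE_UINT32 = 1, SK_TYPE_STRING = 2 };

/* Returns the byte after the '\0' ending the string at p,
   or NULL if the string is not terminated before end. */
static const uint8_t *sk_skip_cstr(const uint8_t *p, const uint8_t *end) {
    const uint8_t *nul = memchr(p, 0, (size_t)(end - p));
    return nul ? nul + 1 : NULL;
}

/* Appends formatted text to out; returns -1 on error or truncation. */
static int sk_appendf(char *out, size_t cap, size_t *used, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + *used, cap - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *used) return -1;
    *used += (size_t)n;
    return 0;
}

/* Decodes one event laid out as [metadata][payload]:
     metadata = event name '\0', field count (1 byte),
                then per field: name '\0', type code (1 byte)
     payload  = values in field order (uint32 little-endian, strings '\0'-terminated)
   Writes "Event: name=value ..." to out. Returns 0 on success,
   -1 if the input is malformed or out is too small. */
int sk_decode_event(const uint8_t *buf, size_t len, char *out, size_t out_cap) {
    const uint8_t *end = buf + len;
    const char *event_name = (const char *)buf;
    const uint8_t *p = sk_skip_cstr(buf, end);
    if (p == NULL || p >= end) return -1;
    uint8_t count = *p++;

    /* First pass: validate field descriptions and find where the payload starts. */
    const uint8_t *fields = p;
    for (uint8_t i = 0; i < count; i++) {
        p = sk_skip_cstr(p, end);
        if (p == NULL || p >= end) return -1;
        p++; /* skip the type code */
    }
    const uint8_t *data = p;

    /* Second pass: walk the metadata and the payload side by side. */
    size_t used = 0;
    if (sk_appendf(out, out_cap, &used, "%s:", event_name) != 0) return -1;
    for (uint8_t i = 0; i < count; i++) {
        const char *name = (const char *)fields;
        fields = sk_skip_cstr(fields, end); /* already validated above */
        uint8_t type = *fields++;
        if (type == SK_TYPE_UINT32) {
            if (end - data < 4) return -1;
            uint32_t v = (uint32_t)data[0] | (uint32_t)data[1] << 8 |
                         (uint32_t)data[2] << 16 | (uint32_t)data[3] << 24;
            data += 4;
            if (sk_appendf(out, out_cap, &used, " %s=%" PRIu32, name, v) != 0) return -1;
        } else if (type == SK_TYPE_STRING) {
            const char *s = (const char *)data;
            data = sk_skip_cstr(data, end);
            if (data == NULL) return -1;
            if (sk_appendf(out, out_cap, &used, " %s=%s", name, s) != 0) return -1;
        } else {
            return -1; /* unknown type code */
        }
    }
    return data == end ? 0 : -1; /* reject trailing bytes */
}
```

Given the bytes `Start\0`, a field count of 2, `Pid\0` with type 1, `Name\0` with type 2, then the value 42 as four little-endian bytes and `svc\0`, the sketch writes `Start: Pid=42 Name=svc`. Every read is checked against the end of the buffer because a decoder of self-describing data has to treat the embedded metadata as untrusted input.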

The benefits to this approach include the following:

Decoding information is always available. There is no need to track down a manifest, MOF, PDB, or TMF file. There is no need to register your component's metadata. The decoding information is never out of date.
Events can be generated dynamically. This makes ETW more easily usable from scripting languages like JavaScript or Python.
Event definitions can be distributed and can be written directly in code with no need for an external tool or preprocessor.

As mentioned above, there is no way to make a single ETW solution that meets all needs. The following issues must be considered when discussing the new TraceLogging technology:

Each TraceLogging event contains its own metadata, so TraceLogging events are always larger than the equivalent manifest-based or WPP-based event. While the metadata encoding is relatively compact, it can add several hundred bytes per event. In cases where reducing the event size is a high priority (e.g. when generating millions of events), TraceLogging is not ideal.
TraceLogging supports distributed event authoring, so there is no single place to go to find a list of all of a component's events.
To write a TraceLogging event, the metadata must be available (and in memory) at runtime. The design of WPP and manifest-based ETW supports installing a provider without also installing the decoding information, potentially saving disk space. In addition, neither WPP nor manifest-based ETW needs to load the decoding information into memory until the event needs to be decoded.
TraceLogging events do not support localization. The provider name, event name, and field names are hard-coded into the event, and there is no built-in support for localized string tables to map these identifiers to an alternate language.
TraceLogging does not have built-in support for enumerated types. The existing TraceLogging APIs simply convert the enumerated value to an integer. There is no built-in support for associating friendly names with each enumeration value.
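The per-event size cost from the first consideration above can be estimated with simple arithmetic. The following C sketch counts metadata bytes under a hypothetical layout (event name and field names stored as NUL-terminated strings, one byte per type code, and one byte for the field count) and multiplies by the number of events. Real TraceLogging metadata is laid out differently, so treat the result as a rough illustration rather than an exact figure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata size for one event: name + '\0', a field-count byte,
   and for each field its name + '\0' plus a one-byte type code.
   This is an estimate for illustration, not the real TraceLogging layout. */
size_t sk_metadata_bytes(const char *event_name,
                         const char *const *field_names, size_t count) {
    size_t total = strlen(event_name) + 1 + 1;
    for (size_t i = 0; i < count; i++)
        total += strlen(field_names[i]) + 1 + 1;
    return total;
}

/* Every event repeats its own metadata, so the cost scales with event count. */
unsigned long long sk_total_overhead(size_t per_event, unsigned long long events) {
    return (unsigned long long)per_event * events;
}
```

For an event named `ProcessStopped` with fields `ProcessId`, `ImagePath`, and `ExitCode`, the sketch gives 48 bytes of metadata per event, or 48,000,000 bytes (about 48 MB) across one million events, before any payload is counted. Longer names and more fields raise this cost linearly.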

We've implemented support for TraceLogging in C/C++, .NET, and Windows Runtime.

For C/C++ code (kernel-mode or user-mode), use the Windows 10 SDK and include the <TraceLoggingProvider.h> header. Note that while this header is new to the Windows 10 SDK, you can use this header in projects that target Vista or later.
For .NET code, we've updated System.Diagnostics.Tracing.EventSource in .NET 4.6 to add extensive new capabilities based on the flexibility provided by TraceLogging. If you want to use this functionality with an earlier .NET runtime, you can use the EventSource NuGet redistributable from nuget.org.
For Windows Runtime code, we've updated the Windows 10 version of LoggingChannel to support writing structured events based on the new TraceLogging encoding.

The provider side of the TraceLogging technology stack does not require any new OS features, and works correctly on Vista or later (with a few limitations). You can write a program that generates TraceLogging events, run the program on Vista, and capture the events into an ETL file. The TraceLogging provider library will pack the payload using TraceLogging rules and will set the event's channel to 11 to flag the event as TraceLogging-encoded. It will then use the Vista-compatible EventWriteTransfer API to send the encoded event to ETW, which will treat it like any other event. However, you will need an OS update for the following functionality:

Using the TDH APIs to decode TraceLogging events requires new TDH functionality. This functionality is built into Windows 10, and is available as a Windows Update for Windows 7 SP1 and for Windows 8.1.
Writing an event with a channel other than 11 requires new functionality in the core ETW runtime. Again, this support is built into Windows 10 and is available as a Windows Update for Windows 7 SP1 and for Windows 8.1.
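The Vista-compatible flagging convention described in this article can be sketched in C as follows: the provider side sets the channel to 11, and a decoder selects self-describing decoding when it sees that value. The descriptor struct, enum, and function names are hypothetical stand-ins (not the real `EVENT_DESCRIPTOR` type or any TDH API), and the sketch does not model how newer systems recognize TraceLogging events written with other channels.

```c
#include <stdint.h>

/* Channel value that, per this article, flags an event as TraceLogging-encoded. */
#define SKETCH_TRACELOGGING_CHANNEL 11

/* Hypothetical stand-in holding only the descriptor fields used here. */
typedef struct {
    uint16_t id;
    uint8_t  channel;
    uint8_t  level;
    uint64_t keyword;
} DescriptorSketch;

typedef enum {
    DECODE_WITH_EXTERNAL_METADATA, /* look up a manifest, TMF, etc. */
    DECODE_SELF_DESCRIBING         /* unpack the metadata carried in the event */
} DecodeRuleSketch;

/* Provider side: flag the event before handing it to ETW. */
void mark_as_tracelogging(DescriptorSketch *d) {
    d->channel = SKETCH_TRACELOGGING_CHANNEL;
}

/* Consumer side: choose decoding rules from the flag alone. */
DecodeRuleSketch choose_decoder(const DescriptorSketch *d) {
    return d->channel == SKETCH_TRACELOGGING_CHANNEL ? DECODE_SELF_DESCRIBING
                                                    : DECODE_WITH_EXTERNAL_METADATA;
}
```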
