Part 3: ETW Methods of Tracing

Introduction and Overview

 

This is Ivan from the Platforms OEM team and this is the final installment of the ETW series. In this article, we are going to continue our exploration of the ETW tracing available in Windows. This post is going to cover some of the other methods available to enable and capture ETW logs. In previous posts we covered

 

-ETW Introduction and Overview
-Exploring and Decoding ETW Providers using Event Log Channels

In this post we will explore some of the methods Microsoft support may use to enable and capture tracing, in order to troubleshoot issues on a box. In fact, some of these methods may ultimately be quite transparent when you use them; but we wanted to dig a bit into what his happening behind the scenes.

Please keep in mind that the previous post, Exploring and Decoding ETW Providers using Event Log Channels, was geared more towards self-discovery and decoding. This post covers the more advanced scenarios and tools that you may see Microsoft support using.

We plan on starting with some of what’s available in the box, then using a command line tool (logman) to enable further logging to a kernel buffer – both of these are useful for dumps. The second set of scenarios we are going to cover is using the command line (logman) and GUI to enable tracing to a sequential file. We will then finish up with some other special types of tracing available.

Tracing already available (and likely running) in the box

 

Starting with Vista/2008, there is a variety of tracing already running inbox and accessible via a kernel dump, or logging to an on-disk ETL file (limited tracing is available in 2003 & XP). This screenshot was taken from a default Vista box with no additional logging enabled or turned on. While we aren’t going to cover these ETW providers in detail, it’s good to know these are available for analysis. One logger in particular is a useful kernel logger, which is the Circular Kernel Context Logger or CKCL, a sort of In Flight Data Recorder (IFR). The CKCL is one of the two available kernel loggers, and is intended as a high-performance, high volume log to record the last 0.5-3 seconds (assuming high load) of low level kernel activity on the box. The CKCL is set to circular mode meaning newer events will overwrite older events, and with a 4MB total buffer space available.

This can be very useful, because along with all the rich information a full kernel .dmp provides, the CKCL can enhance that and provide recent historical information about how the kernel arrived at the state it’s currently in. The CKCL is enabled by default in Vista and can be enabled in Server 2008 in the GUI or via the command line.

image_thumb[3]

The CKCL Keywords of 0x237f correspond to the following flags that are enabled on this box: process, thread, img, proccntr, cswitch, dpc, isr, disk, file, hf. The screenshot below shows information from the “NT Kernel Logger” session whose provider name is “Windows Kernel Trace”. “Windows Kernel Trace” is the provider for the CKCL and is shown here because it is the easiest/most reliable way to view what the kernel flags mean.

The green boxes below highlight the flags that were automatically enabled on this Vista machine and available in dumps. You might notice that tools such as XPerf will modify and collect data from the “NT Kernel Logger” session. XPerf post processes/merges the trace (along with other potential providers) to add valuable information to be able read the trace in more human readable form as well as reliably view the trace offline. However, even without the post processing added by Xperf, traces can be pulled from kernel dump, decoded, and even viewed by Xperf (although they will be missing information such as process names, ISR names, symbol information, etc).

To review how the kernel providers and sessions are related:

Session Provider Comment____________________________

Circular Kernel Context Logger(CKCL) Windows Kernel Trace In-Flight Recorder for recent activity on the box
NT Kernel Logger Windows Kernel Trace Kernel Logger used by XPerf but can be enabled manually w/o XPerf

 

Win7/2008R2:

In order to cover the newer flags available, we illustrate the “NT Kernel Logger”, which is enabled because xperf was enabled with: xperf –on Base+Diag. To reiterate, the same flags are available in the CKCL, but whose configuration and purpose is suited toward an IFR scenario.

image_thumb[34]

image_thumb[35]

image_thumb[36]

Vista/2008:

As covered in the picture above, the flags are very similar to Win7, with the exception that dispatcher and virtalloc are not available.

Server 2003:
ETW Tracing in 2003 is limited. You can see the Kernel Flags are much smaller and do not cover flags such as ISR or DPC.

image_thumb[37]

Using the !wmitrace.strdump debugger extension, we can find the Logger Id for the CKCL which is 2.

image_thumb[38]

And use !wmitrace.logsave to save the log to an ETL file

image_thumb[39]

Here we use Xperf to load the ETL file and are looking at ISR history over the lifetime of this particular trace (roughly last 250s of time before dump). This is a trace from a normally functioning box and is showing that ISR at 0x83b44b65 had the highest count over our selected ~100ms timeframe.

image_thumb[40]

As previously mentioned, you will need to hand decode much of the trace data, because the XPerf merge step is missing. Here we find that the ISR at 0x83b44b65 is from ndis!ndisMiniportIsr

image_thumb[41]

Finally, we can dump the log out in the debugger in plain old text format using !wmitrace.logdump, which we can decode because Windbg ships with a system.tmf which will decode most Kernel Events from the CKCL.

image_thumb[42]

Using Logman to enable tracing to a circular buffer

 

Logman is a built in tool that ships with the OS in Vista/Win7, and can be used to enable and control ETW tracing. We are going to cover it first because it can be used to automate starting and stopping ETW logging without using the Computer Management GUI. It is also baked into the OS, making it ideal because no extra tools are required to start using or enabling ETW tracing. While logman is very useful, the syntax is often inconsistent, and it’s often difficult to figure out the right way to run a command, which is why we are going to cover some examples here.

You may even receive a set of batch scripts that basically automate the starting and stopping of tracing. For example:

-Start.bat <- Run this to start tracing
-Stop.bat <- Run this to stop tracing

Here we are using logman to start the tracing of the just previously mentioned “NT Kernel Logger” (used by XPerf) to enable the “Windows Kernel Trace” provider. This logging would be useful to enable kernel events that aren’t enabled by default in the CKCL, or if you want a larger time than the 4MB buffer the CKCL provides. The logging is flushed to disk after every 1MB buffer fills up.

<Start.bat>

REM – This creates a kernel trace using 20MB buffer in a circular file mode that logs to kernel.etl (and available in a memory dump as a circular buffer)

logman start "NT Kernel Logger" -p "Windows Kernel Trace" (process,thread,disk,isr,dpc,net,img,registry,file) -bs 1024 -nb 20 20 -mode Circular -o kernel.etl -ct perf -max 20 -ets

</Start.bat>

<Stop.bat>

logman stop "NT Kernel Logger" -ets

</Stop.bat>

 

Using Logman to enable tracing to a file

 

You may receive logging scripts that are tuned specifically to the problem at hand, including the necessary providers, and within each provider, applicable Flags and Levels. In the previous blog posts, we already covered some information about how to find out providers, so we let’s assume we already have that information. Using logman to enable tracing in this manner allows for an ETW trace .etl to be captured and sent to Microsoft for decoding. Using the Microsoft-Windows-WinINet example, the tracing may look like this:

<Start.bat>

REM - This creates a circular file trace with all Flags and Levels on the Microsoft-Windows-WinINet provider, with a max size of 20MB

logman start customTrace -o c:\temp\customTrace.etl -p "Microsoft-Windows-WinINet" 0xff 0xff -mode Circular -max 20 -ets

REM This creates a file trace with all Flags and Levels on the Microsoft-Windows-WinINet provider, logging until stopped

logman start customTrace -o c:\temp\customTrace.etl -p "Microsoft-Windows-WinINet" 0xff 0xff -ets

 </Start.bat>

<Stop.bat>

logman delete customTrace

</Stop.bat>

Using the Computer Management – Performance to enable tracing manually

 

The Computer Management performance MMC GUI snap-in is useful for viewing and changing trace sessions. In previous blog posts, we briefly touched on the Event Trace Sessions GUI to view sessions auto-created when enabling a Windows Event log channel. Channels are useful because they don’t require much knowledge of the individual keywords/flags and levels to enable – they are pre-defined for a scenario. Most events that are delivered to a channel or decoded due to Event Manifests. However, there is another class of ETW providers called WPP Software Tracing, which was originally intended for driver tracing, but lacks a lot of the self-described meta information that manifest based providers give.

This is where Computer Management -> Performance -> Data Collector Sets -> Event Trace Sessions, once again comes in handy. Not only can you view existing sessions created by the system (Autologgers/Start Event Trace Sessions), but you can modify tracing, and even create new tracing; which is what we are going to cover. This is the only way to create tracing using the GUI for WPP style ETW tracing, such as iSCSI.

image_thumb[43]

To create a new session, right click Event Trace Sessions and choose New -> Data Collector Set

image_thumb[44]

In our example, we already know what type of tracing to enable (you may be directed by Microsoft Support), or you may be given an .xml template to automatically setup the tracing. In our example, we are going to setup tracing for iSCSI, which is WPP based.

image_thumb[45]

The list of providers that displays unfortunately isn’t searchable, but is sorted alphabetically, so we can use our ‘logman query providers | findstr /I iscsi” to find providers named iscsi.

image_thumb[46]

Which we should find in the GUI of Event providers:

image_thumb[47]

Next we need to choose what Keywords (Any) are used for this provider. Keywords (Any) is an OR filter meaning any bit that is set on the Keyword/Flag mask will be traced. Keywords usually specify functional sub-components while Levels control at which level of detail (Error, Informational, Verbose) events are logged. Components are free to choose what Keywords/Flags they use, and what Levels the use; and don’t have to use both. We will need to set or check both Keywords(Any) and Level(s) in order to get any ETW tracing, otherwise events won’t be logged.

image_thumb[48]

When we edit Keywords (Any) to include these four flags, you should notice the mask is set automatically to 0xf, which will reflect under Keywords(Any). When we go to edit the level, we notice levels don’t seem to be exposed or used by the msisci_iScsi provider. That normally means a provider doesn’t use levels, but it is recommended to always be on the safe side, so to make sure not to lose events we can set the level anyways. Here we set the level to 0xff.

image_thumb[49] image_thumb[50]

 

image_thumb[51]

What we should have now is a trace session called iSCSI with the msisci_iSCSI provider with Keywords(Any) of 0xf and Level of 0xf. There are just a couple of more items we need to check before starting to collect data on this provider, while is changing the Trace Buffer parameters, and checking stream mode and log location.

image_thumb[52] image_thumb[53] image_thumb[54]

Here we change the Buffer Size to 1MB, flushed to disk every second, with 10 buffers available in case there is so much event load that events can’t be flushed to disk fast enough. We simply check the Stream Mode and see it is set to File, and check the file name which is under %LOCALAPPDATA%\iSCSI.etl. Now we can start the ETW session and start collecting data!

image_thumb[55]

Using the Computer Management – Performance to enable tracing with a template

 

Now, while it’s useful to know how to manually create sessions, either with the GUI or logman command line; you may only want to quickly enable tracing provided by someone else, such as Microsoft Support, or save off commonly used tracing. This is where templates come in handy. Continuing with the last session, we can right click on our iSCSI session and choose “Save Template”, and then save the template as iSCSI.xml.

image_thumb[56] image_thumb[57]

Now when creating a new trace with the GUI we can use a template to capture the previous saved settings, and quickly setup tracing.

image_thumb[58]

Using tracelog (DDK) to enable tracing to a file

 

Tracelog is an ETW tracing tool that ships with the DDK. It has some more advanced features than logman that ships with the OS, with a disadvantage that you will need to download the WDK/DDK. You will find tracelog under %DDK_INSTALL_DIR%\tools\tracing\%PROCESSOR_ARCHITECTURE%\tracelog.exe, along with other useful tools such TraceView, and tracefmt.

While you can use the built-in Computer Management GUI or logman to manipulate most tracing, a couple of key features that tracelog provides are:

                -Realtime output to the kernel debugger (-kd switch)
-Private process heap tracing (-heap -pids <n> <pid1 pid2 …>). Heap tracing is also available in xperf, and is more useful due to stack walking (more on this later)
-Private process critical section tracing (-critsec -pids <n> <pid1 pid2 …>)
-Ability to create autologgers/traces that persist on reboots (-addautologger). Note: This ability does exist in logman, but is hidden. It can be done by appending “autosession\” to the name of session using logman.

Here we are going to use tracelog to demonstrate a special type of logger in ETW called a private session logger, which uses private buffers inside a process (normally ETW buffers live in kernel mode). Private buffers are the only way to collect heap or critical section ETW tracing.

Critical section private tracing

First we start by finding the notepad process, for which we are going to collect critical section tracing.

C:\temp>tasklist | findstr /i notepad

Image Name PID Session Name Session# Mem Usage
========================= ======== ================ =========== ============
notepad.exe 7752 Console 2 10,052 K

Now we use tracelog with the –critsec switch to

<Start.bat>

tracelog -start critSecTrace -f c:\temp\critSecTrace.etl -critsec -pids 1 7752

</Start.bat>

<Stop.bat>

logman -stop critSecTrace

</Stop.bat>

The critical section trace can be decoded with the DDK tracefmt tool using the system.tmf in the DDK. Here is an example portion of the decoded log.

image_thumb[59]

Heap private tracing

 

We can use tracelog in a similar manner to enable and decode heap tracing. The heap trace can also be decoded with the DDK tracefmt tool using the system.tmf in the DDK.

<Start.bat>

tracelog -start heapTrace -f c:\temp\heapTrace.etl -heap -pids 1 7752

</Start.bat>

<Stop.bat>

logman -stop heapTrace

</Stop.bat>

image_thumb[60]

While the heap trace is useful in its own right, we can even get more useful info using XPerf. Xperf is a great way to view most types of ETW data (especially performance), and in this particular case is vital to get full stack traces that led up to the heap entries. Xperf can be downloaded from the Windows Performance Analysis Dev Center. Here we enable stacktraces for heap allocations and reallocations so we know who is creating the heaps. You may want to explore other heap stackwalking available with the xperf help - ‘xperf -help stackwalk’

image_thumb[32]

<Start.bat>

xperf -start HeapSession -heap -PidNewProcess "notepad.exe" -BufferSize 1024 -MinBuffers 128 -MaxBuffers 128 -stackwalk HeapAlloc+HeapRealloc

</Start.bat>

<Stop.bat>

xperf -stop HeapSession -stop -d c:\temp\xperf_heap.etl

</Stop.bat>

Now we can load up the ETW trace in xperf, and view heap allocation information (in additional to kernel info such as CPU from the NT Kernel Logger), with stack traces at the time of allocation!

image_thumb[61]

Wrapping it up

 

Hopefully, we dug further into ETW and explored some of the various methods to put the power of ETW to work. There is a lot of tools and power exposed with the built-in ETW tools, and some other scenarios that can be enabled with add-on tools from the DDK and XPerf.

 

Share this post :