Windows Performance Toolkit - Xperf

I recently came across a very interesting profiling tool that is available in Vista SP1 and Server 08 called the Windows Performance Analyzer. You can use this tool to profile and diagnose different kinds of symptoms that the machine is experiencing. This tool is built on top off the Event Tracing for Windows (ETW) infrastructure. It uses the ETW providers to record kernel events and then display them in a graphical format.

Performance Analyzer provides many different graphical views of trace data including:

  1. CPU Sampling
  2. CPU and Disk utilization by process and thread
  3. Interrupt service routine and deferred procedure call
  4. Hard faults
  5. Disk I/O Detail

What you will need

Download the latest version of the Windows Performance Tools Kit, and install it on your machine. (https://www.microsoft.com/whdc/system/sysperf/perftools.mspx : Windows Performance Tools Kit, v.4.1.1 (QFE)) You will need to find the toolkit that corresponds to your processor architecture. Currently there are 3 versions available i.e. X86, IA64, X64.

After installation you should be able to see 2 new tools. The first one is Xperf, which is a command line tool that is used to capture the trace. The second is called XperfView, which graphically interprets the trace that has been collected by Xperf.

You will need to run the Xperf and XperfView from an elevated command prompt for all functionality.

Starting a Trace

For many tasks all you need for effective analysis is a kernel trace.  For this example, we'll use the –on DiagEasy parameter to enable several kernel events including: image loading; disk I/O; process and thread events; hard faults; deferred procedure calls; interrupts; context switches; and, and performance counters.  From an elevated command prompt launch xperf –on DiagEasy.

This starts the kernel logger in sequential mode to the default file "\kernel.etl"; uses a default buffer size of 64K, with a minimum of 64 and a maximum of 320 buffers. 

Stopping a Trace

To stop a trace, type xperf –d <filename>.etl at the command line. This will stop the trace and output the file.

Viewing a Trace

There are 2 ways to view the trace. From an Elevated Command prompt, launch xperf <filename>.etl, or
launch the XperfView tool and open the file manually. When you open the trace file, you should see something similar like this.

NOTE - While you need to run xperf from an elevated command prompt in order to record a trace you do not need an elevated command prompt in order to *analyze* a trace.  

Using the Chart Selector tab, you can select all the graphs that you want to look at. To drill down in each chart, you can select the Summary table. For instance, in the CPU Sampling chart, the summary table gets you the summary of the processes that were running, with information like the amount of CPU time, CPU %, stacks (if the stacks were collected in the trace, see below). When looking at the Summary table for the Disk I/O chart, you can see which processes were writing files (the filename too!) to disk, as well as how much time it took.

You also have the ability to zoom in on a selected area. Another really cool feature is the ability to overlay multiple graphs on one frame. This way you can correlate different pieces of data together very easily.

Also, you select which counter instances you want to see in each specific chart. On the top right corner of each chart is a drop down box from where you can select the counter instances. For instance on the Disk I/O chart, you can select Disk 0, Disk 1, or a combination as well.

You can also view detailed information about the system that the trace was taken on. Click on the Trace menu item, and select System Configuration.

Xperf Kernel Flags and Groups

In the first sample Xperf command we ran, xperf –on DiagEasy. I am sure many of you were wondering what DiagEasy means. DiagEasy is a group of kernel events that are predefined by the Windows Performance Toolkit. This group includes Process, Threads, Kernel and User Image Load/Unload, Disk I/O, DPCs and Context Switch events.

When we used the xperf –on DiagEasy command, we did not specify an individual provider, so we enabled the kernel events for all the ETW providers on the system. If you want to enable events for a specific provider, you can the following format xperf -on: (GUID|KnownProviderName)[:Flags[:Level]] . For more information about ETW providers, Kernel Flags and Groups, you can run the xperf –help providers command.

Stack Tracing

One of the most powerful features in Performance Analyzer is the ability to visualize stacks. It's important to note that this requires no special instrumentation in the code – only that you have symbols for the binary components you are interested in analyzing.

When the trace is setup to collect the stacks, Performance Analyzer will display call stack summary information for the events that had stack walking enabled. Here is an example that takes a trace (with stack tracing enabled) of the entire system while running a "find string" utility.. We can use the Stack Tracing feature of Xperf to record a stack when certain events happen, or take sample at regular intervals over time. See xperf –help stackwalk output for more info.

Below, we will use the Stack Tracking feature of Xperf to take stack samples at regular intervals. With this output, we will be able to determine where the CPU is spending most of its time within a process.

xperf -on latency -stackwalk Profile

Latency is the kernel group to enable certain events, including the profile event which records the CPUs' activity every millisecond. The "-stackwalk Profile" flag tells Xperf to record stack walks on every profile event, which makes the profile information much more useful. In other words, in order to get profile information with stack walks you need to turn on the profile event, and turn on stack walking for that event.

Note that decoding of stacks requires that symbol decoding be configured. However stacks can be recorded without symbols, and can even be viewed without symbols, although they are much less useful without symbols. I only mention this in the event you're trying to record a trace of a problematic machine with little time to mess around with _NT_SYMBOL_PATH.

To get a trace with the stack information, do the following:

  1. Run the command, xperf -on latency -stackwalk Profile.
  2. Run the application that you wish to analyze, then
  3. You can end the trace by using the regular xperf –d <filename>.etl command.
  4. Load the trace in the viewer and then select the Load Symbols from the Trace drop down menu.
  5. Once the symbols are loaded, select an interesting region on the CPU sampling graph.
  6. Right click and select Summary Table.

Click on the selector tab to bring up the column chooser list. Then select "Process name", "Process", "Stack", "Weight" and "%Weight". These are the most useful columns when looking at stacks from the sample profile event. You should get a view similar to this.

At this point I need to mention a few of the restrictions with stack walking coupled with when and how it works.

· Xperf stack walking is not available on XP

· On Vista stack walking is available for x86, and is available for x64 as of Vista SP1.

· On Windows 7 stack walking is available.

· Stack walking on x64 is complicated. You have to set DisablePagingExecutive in the registry, as documented here:

https://social.msdn.microsoft.com/Forums/en-US/wptk_v4/thread/282e5beb-0afd-411b-9a81-a0bbbf569dfe/

REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD –f

Sample Case

I recently came across a case where the customer was complaining that DPC processing was taking up too much CPU time. We ran Xperf on the machine and drilled down into the DPC activity on the machine.

From the Xperf graph, I was able to confirm that the customer was actually seeing high DPC usage. I selected the Summary for this chart, and got the list of drivers that were actually taking up CPU time.

Right off the bat, I could identify the driver that had a lot of DPC activity. I also noted that the average duration for each DPC from that driver was taking 788 microseconds. This is way too high. Each DPC should be taking a maximum of 100 microseconds.

 

- Omer

  

Reference

Performance.Analyzer.QuickStart.xps – This is shipped with the performance toolkit.

From an elevated command prompt, launch xperf -help

https://www.microsoft.com/whdc/system/sysperf/perftools.mspx

https://blogs.gotdotnet.com/pigscanfly/archive/2008/02/16/using-xperf-to-take-a-trace.aspx

https://www.microsoft.com/whdc/Driver/tips/DPC_ISR.mspx