xperf recipes

See also: all the recipes and the intro

Xperf is the Windows Performance Analyzer, in the Unix terms it would be called a profiler. It can be used to profile both the user-space and in-kernel code. The profiling data is collected in the form of the ETW events in the ETL files (the ETW format was initially born for profiling, and only then was generalized for the logging), so any other ETW tools can be used on these files as well.

Xperf is actually a set of tools:

  • xperf proper that is used to control the event collection
  • wpa - the GUI for the analysis of the logs
  • xbootmgr - a tool used to configure the collection of the performance data during the Windows boot
  • and others that I haven't used much.

The control of the event collection happens through a normal ETW session that subscribes to the NT Kernel Logger event provider. This session can be created on demand or in Autologger, as controlled by the Registry settings. This session has the special settings EnableKernelFlags that describes what kernel events need to be collected and StackWalkingFilter that describes what event cause the collection of a stack trace event. These collected stack traces can then be used for the detailed profiling. So when you use a command like

 xperf -start -on Base+DiagEasy+FileIO+WorkingSet -stackwalk Profile+CSwitch+ReadyThread+PoolAlloc+PoolAllocSession+PagefileMappedSectionCreate+PagefileMappedSectionDelete+ProcessCreate+ImageLoad

these settings -on and -stackwalk translate to the settings of these bitmask values.

Since the profiling tends to collect a large amount of data, setting the large in-memory buffer size for this session is a good idea. Here is an a example of semi-detailed profiling registry settings for an Autologger session:

 Windows Registry Editor Version 5.00

; See the following MSDN article for details:
; https://msdn.microsoft.com/en-us/library/windows/desktop/aa363687(v=vs.85).aspx
; https://msdn.microsoft.com/en-us/library/windows/desktop/aa363690(v=vs.85).aspx

[HKEY_LOCAL_MACHINE\LoadedHive\ControlSet001\Control\WMI\Autologger\XperfLogger]
; Output file
"FileName"="%SystemRoot%\\System32\\LogFiles\\XperfLogger.etl"
; up to 255 files
"FileMax"=dword:000000ff
; Unique GUID used to control the logging session from the tracelog.exe
"GUID"="{FB34C0FD-D910-4F3E-9C95-E001EE2BC7A4}"
; EVENT_TRACE_FILE_MODE_CIRCULAR | EVENT_TRACE_SYSTEM_LOGGER_MODE
"LogFileMode"=dword:02000002
; Log file size, MB, 32GB
"MaxFileSize"=dword:00008000
; min 128 buffers
; (setting this too high makes it revert to 1 buffer)
"MaximumBuffers"=dword:00000080
"MinimumBuffers"=dword:00000080
; 16MB buffer size
"BufferSize"=dword:00004000
; EVENT_TRACE_CLOCK_PERFCOUNTER (default)
"ClockType"=dword:00000001
; Should it be started on the next logon?
"Start"=dword:00000001
; Base           : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+PROFILE+MEMINFO+MEMINFO_WS
; DiagEasy       : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PERF_COUNTER
; FileIO         : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+FILE_IO+FILE_IO_INIT
; WorkingSet     : PROC_THREAD+LOADER+HARD_FAULTS+VIRT_ALLOC+MEMINFO+VAMAP+SESSION+MEMINFO_WS+WS
; ResidentSet    : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+MEMORY+MEMINFO+VAMAP+SESSION+VIRT_ALLOC
; PRIORITY+DISK_IO_INIT+DISPATCHER
; Keep in mind that the more CPUs a machine has, the more events it will generate.
; See the values of the bits beyond the 0th word in performance.h. The top 3 bits
; of the #define PERF_* determine the word index here. I.e. in PERF_CONTEXT_SWITCH=0x20000004 
; the top 3 bits are 001, that's the word 1, bitmap 0x4 in it.
"EnableKernelFlags"=hex:\
  0f,a7,00,06,\
  82,60,00,00,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00

To add more detail and the stack traces to the collected data, you can also add:

 ; Base           : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+PROFILE+MEMINFO+MEMINFO_WS
; DiagEasy       : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PERF_COUNTER
; FileIO         : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+FILE_IO+FILE_IO_INIT
; WorkingSet     : PROC_THREAD+LOADER+HARD_FAULTS+VIRT_ALLOC+MEMINFO+VAMAP+SESSION+MEMINFO_WS+WS
; ResidentSet    : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+MEMORY+MEMINFO+VAMAP+SESSION+VIRT_ALLOC
; PRIORITY+DISK_IO_INIT+DISPATCHER
"EnableKernelFlags"=hex:\
  0f,a7,00,06,\
  87,e6,c8,08,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00,\
  00,00,00,00
  
; ProcessCreate+ProcessDelete+ThreadCreate+ThreadDelete+
; Profile+CSwitch+ReadyThread+
; VirtualAlloc+VirtualFree+PoolAlloc+PoolAllocSession+
; DiskReadInit+DiskWriteInit+DiskFlushInit+FileCreate+
; FileCleanup+FileClose+FileRead+FileWrite+FileSetInformation+FileDelete+FileRename+FileDirEnum+FileFlush+FileQueryInformation+FileOpEnd+MapFile+UnMapFile
"StackWalkingFilter"=hex:\
  01,03,00,00,\
  01,05,00,00,\
  02,03,00,00,\
  02,05,00,00,\
  0c,01,00,00,\
  0d,01,00,00,\
  0e,20,00,00,\
  0e,21,00,00,\
  0f,01,00,00,\
  24,05,00,00,\
  25,04,00,00,\
  26,04,00,00,\
  2e,0f,00,00,\
  32,05,00,00,\
  40,04,00,00,\
  41,04,00,00,\
  42,04,00,00,\
  43,04,00,00,\
  44,04,00,00,\
  45,04,00,00,\
  46,04,00,00,\
  47,04,00,00,\
  48,04,00,00,\
  49,04,00,00,\
  4a,04,00,00,\
  4c,04,00,00,\
  62,02,00,00,\
  63,02,00,00

Don't forget to stop the session and flush the buffers before analyzing the files.

And here go the rest of the recipes and links:

 # Windows Performance Analyzer/Toolkit/Xperf
https://msdn.microsoft.com/en-us/library/windows/hardware/dn927310(v=vs.85).aspx
# Profiling with xperf
https://devproconnections.com/development/windows-application-performance-profiling-xperf
# WPA new features for Win10
https://geekswithblogs.net/akraus1/archive/2015/04/21/163342.aspx
# Intro to Xperf 
https://blogs.technet.com/b/askperf/archive/2008/06/27/an-intro-to-xperf.aspx
# WPA Quick start guide
https://msdn.microsoft.com/en-us/library/windows/hardware/dn927318%28v=vs.85%29.aspx

xperf –on DiagEasy # starts and writes events to \kernel.etl
xperf –d trace.etl # stops and merges the events from \kernel.etl to trace.etl
wpa trace.etl # view the merged trace

# app profiling with xperf
xperf -on latency -stackwalk profile
xperf –d trace.etl # stops and merges the events from \kernel.etl to trace.etl
set _NT_SYMBOL_PATH=SRV*c:\symbols*https://msdl.microsoft.com/download/symbols

# "Merging" adds the metadata to the ETL file, after which it can be analyzed,
# even on another machine.
# To manually merge a raw trace file, do:
xperf -merge data.etl data.merged.etl

# interesting kinds of analysis of a process trace
xperf -i c:\tmp\trace.etl -symbols -a profile -detail >c:\tmp\prof.txt
xperf -i c:\tmp\trace.etl -symbols -a stack -process bevtcol.exe -butterfly 10 >/tmp/stack.html

# also see for the quick profiling, especially boot/shutdown
wpr.exe
xbootmgr.exe

# performance counters in PowerShell
get-Counter

The fields in the WPA table are draggable. Left of the yellow line - aggregation
key, right of it - data. Right click on the title to select the columns.
Right-click on the data to expand the groups.
Move Stack Tags to the leftmost to set it as the primary distinction.

# interesting analysis of the whole-system data
set _NT_SYMBOL_PATH=SRV*c:\symbols*https://msdl.microsoft.com/download/symbols
xperf -i e:\tmp\data.merged.etl -symbols verbose -a eventmetadata
# get the CPU usage data by the 0.1 second (100000 microseconds)
xperf -i e:\tmp\data.merged.etl -symbols verbose -a profile -util 100000
# get the services start/stop events
xperf -i e:\tmp\data.merged.etl -symbols verbose -a services