I’ve heard Mark R. say that “sometimes Microsoft gave me a gift” when coming across an interesting feature not in mainstream documentation. How true that often the odd find can prove ultimately very valuable. Such was the case when I read Bruce Dawson’s Stack Walking in Xperf blog entry.
Here’s the excerpt that blew me away:
You can also record call stacks for manifest-based ETW (Event Tracing for Windows) events, but the syntax is quite different, and it only works on Windows 7 and above. When you specify your ETW provider to xperf after “-on” you can specify extra parameters after the provider name, separated by colons. These are flags, a level, and, for manifest-based providers, a list of extra data to record, which can include call stacks. You can leave the flags and level fields blank and just specify ‘stack’ (in single quotes) after three colons like this:
xperf -on Latency -stackwalk profile -start browse -on Microsoft-IE:::'stack'
rem Your scenario goes here...
xperf -stop browse -stop -d mytrace.etl
This sounds rather ordinary at first, but looking at the syntax you may realize that this means for just about every manifest based provider (Microsoft-*) one can generate a stack trace on all ETW events therein. This is really cool because if you are only interested in stacks, you don’t have to monkey around with any other trace level or flags which is often the most frustrating part about manual ETW tracing. Recall from Ivan’s blog that there are over 600+ inbox providers to explore so we can go crazy here exploring components! Let’s demonstrate.
First, remember to use the Win7 SDK version of XPerf. The XPerf(WPT) install *.msi files are found in the Bin directory after installing Win32 Development Tools, so you have to do a two-step at present to get the tools installed.
Let’s list all the providers…
C:\utils\xperf7sdk>xperf -providers | findstr "Microsoft-"
93c05d69-51a3-485e-877f-1806a8731346:0x0001000000000000:0x4 : Microsoft-Windows-LUA
9580d7dd-0379-4658-9870-d5be7d52d6de:0x0001000000000000:0x4 : Microsoft-Windows-WLAN-AutoConfig
959f1fac-7ca8-4ed1-89dc-cdfa7e093cb0:0x0001000000000000:0x4 : Microsoft-Windows-HealthCenterCPL
96ac7637-5950-4a30-b8f7-e07e8e5734c1:0x0001000000000000:0x4 : Microsoft-Windows-Kernel-BootDiagnostics
96f4a050-7e31-453c-88be-9634f4e02139:0x0001000000000000:0x4 : Microsoft-Windows-UserPnp
98583af0-fc93-4e71-96d5-9f8da716c6b8:0x0001000000000000:0x4 : Microsoft-Windows-Dwm-Udwm
a50b09f8-93eb-4396-84c9-dc921259f952:0x0001000000000000:0x4 : Microsoft-Windows-Diagnosis-MSDE<
ad5162d8-daf0-4a25-88a7-01cbeb33902e:0x0001000000000000:0x4 : Microsoft-Windows-WPDClassInstaller
ae4bd3be-f36f-45b6-8d21-bdd6fb832853:0x0001000000000000:0x4 : Microsoft-Windows-Audio
af2e340c-0743-4f5a-b2d3-2f7225d215de:0x0001000000000000:0x4 : Microsoft-Windows-Netshell
b03d4051-3564-4e93-93db-3c34f1b5b503:0x0001000000000000:0x4 : Microsoft-Windows-Diagnosis-TaskManager
b1f90b27-4551-49d6-b2bd-dfc6453762a6:0x0001000000000000:0x4 : Microsoft-Windows-PowerCpl
b2a40f1f-a05a-4dfd-886a-4c4f18c4334c:0x0001000000000000:0x4 : Microsoft-PerfTrack-IEFRAME
c89b991e-3b48-49b2-80d3-ac000dfc9749:0x0001000000000000:0x4 : Microsoft-Windows-Documents
c9bdb4eb-9287-4c8e-8378-6896f0d1c5ef:0x0001000000000000:0x4 : Microsoft-Windows-HomeGroup-ProviderService
dbe9b383-7cf3-4331-91cc-a3cb16a3b538:0x0001000000000000:0x4 : Microsoft-Windows-Winlogon
de513a55-c345-438b-9a74-e18cac5c5cc5:0x0001000000000000:0x4 : Microsoft-Windows-Help
ded165cf-485d-4770-a3e7-9c5f0320e80c:0x0001000000000000:0x4 : Microsoft-Windows-DeviceUx
e978f84e-582d-4167-977e-32af52706888:0x0001000000000000:0x4 : Microsoft-Windows-TabletPC-InputPanel
ed6b3ba8-95b2-4cf5-a317-d4af7003884c:0x0001000000000000:0x4 : Microsoft-Windows-Sidebar
f3f14ff3-7b80-4868-91d0-d77e497b025e:0x0001000000000000:0x4 : Microsoft-Windows-WMP
fbcfac3f-8459-419f-8e48-1f0b49cdb85e:0x0001000000000000:0x4 : Microsoft-Windows-NetworkProfile
ffdb9886-80f3-4540-aa8b-b85192217ddf:0x0001000000000000:0x4 : Microsoft-PerfTrack-MSHTML
We have a lot to play with as you can see, but let’s say you are trying to track down a socket leak. Since we know sockets are implemented in afd.sys, let’s search for that provider…
C:\utils\xperf7sdk>xperf -providers | findstr "AFD"
e53c6823-7bb8-44bb-90dc3f86090d48a6 : Microsoft-Windows-Winsock-AFD
Now, we can form the XPerf syntax to trace socket operations during the leak. The “AFDSession” is just the session name we are making up, the rest is the prescriptive syntax. Just to explain the syntax we are starting multiple sessions at the time here – kernel logger session + AFDSession. Similarly we are stopping both sessions at the same time and then merging both into the one file AFD.etl file.
xperf -on Latency -stackwalk profile -start AFDSession -on Microsoft-Windows-Winsock-AFD:::'stack'
xperf -stop AFDSession -stop -d AFD.etl
Run xperfview AFD.etl and switch on “Stack counts by type graph”. If you don’t see this frame, make sure you are on the 7SDK version
The beauty here is the Stack Counts by Type frame which allows us to explore which stack traces where collected, notice all the AFD stack count sources!
Let’s say we have the simple case of a massive number of Create’s happening and thus handle counts rocketing up (leaking) for a given process. Perhaps exhausting all ephemeral ports on the system, ultimately resulting in any number of hang symptoms.
Simply filter the Events for AfdCreate as follows.
Now, we have a filtered Frame of just those stack counts….
You can imagine that if there were a lot more creates here that the frequency distribution shown above would be much higher, but for the sake of just exploring the feature, let’s just look at these small number of creates anyway. In fact, let’s select them all by just choosing Select View in the context menu which will get all of these events over the trace sample.
The frame area is now highlighted in light purple indicating the selected area.
Because we are interested in Stacks, make sure you have configured your symbol path and have checked Load Symbols
(be patient, this could take a while to build up your symbol cache)
Next, invoke the Summary Table
Click the flyout control on the left to select only the Creates again.
Next, make it easy on yourself by Filtering on the interesting process only.
After, much better.
Now, here is the fruit of our labor in the full stack trace inside iexplore.exe allocating the socket handles.
Consider that what we see above is read from the top down. Iexplore.exe is doing this work through wininet, jsproxy, jscript, etc. all the way down through the calling of ws2_32.dll!socket which ultimately creates the socket and returned our file handle. I’ve seen so many cases of ephemeral port exhaustion which is essentially a handle leak against sockets, these could have been solved in minutes with this tracing! Keep in mind that this data isn’t perfectly suited for tracking handle leaks, especially small ones since there’s no accounting for outstanding handles. However, sometimes the leak is egregious enough that you just need to look at activity over time to be pretty certain of the culprit. Really, what we are doing here is just probing a particular ETW instrumented API, which is incredibly powerful.
Typically, on 64-bit computers, you cannot capture the kernel stack in certain contexts when page faults are not allowed. To enable walking the kernel stack on AMD64, set the DisablePagingExecutive Memory Management registry value to 1 and reboot. The DisablePagingExecutive registry value is located under the following registry key: HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management
In summary, this isn’t really a hidden feature since it’s documented in EnableTraceEx2’s Enable Parameters parameter, but I hope with this additional context you see what powerful analysis is possible. Previously, this would have required live debugging a process, so now we have another example of the power of XPerf to decode and view key system activity, non-invasively!