2012-10-23: This post is about USB 2.0 driver stack tracing. Many of the techniques here also apply to USB 3.0 driver stack tracing in Windows 8; for more information, see our recent blog post on USB 3.0 driver stack tracing.
This is Philip Ries posting again with more on the USB tracing that's new in Windows 7. In my last post I introduced the instrumentation in the USB stack. I gave instructions for taking USB ETW traces and opening them in Netmon. This time I'll go through an example scenario and trace, showing how to identify a USB issue.
As a device developer or device user, at some point you may have plugged in a device only to find that the system doesn't recognize the device. Now you can find out why. Follow along with my example.
I plugged in my device and it came up as Unknown Device in Device Manager and other parts of the UI. The Hardware ID was USB\UNKNOWN. To diagnose further, I started an ETW trace (described in my last post) and plugged in the device again. I waited for it to show up as unknown and stopped the trace. Now let's open the trace and see what happened.
If you want to follow along as closely as possible, download my trace file. As I walk through it, I'll describe some important basics that will help you make sense of your own traces.
Before we start looking at the trace, it helps to understand USB enumeration--that is, what the core USB drivers do when a device is plugged in. Martin's recent posts to this blog illuminate the enumeration process in detail. In this post , Martin spelled out some causes for the Unknown Device symptom.
If we have trouble understanding the log, we can refer back to these. Having context like the above list of possibilities can make logs easier to read.
My log was recorded to log file with extension .etl. To view it, run Netmon, choose File -> Open -> Capture, and select the file.
Let's start reading the trace from the beginning.
The very first event, identified as type "SystemTrace", contains some general information about the log. You can read its details for information like the number of events lost and the trace start time, and then move on.
Device summary events
Event 2 is the first USB event. This and several subsequent events describe the USB host controllers, hubs, and devices that were connected to the system when I started the trace. We'll call this group of events the device summary events, or just summary events. Like the first event, the summary events don't signify driver activity. Whereas other events represent something happening on the bus, interactions with client drivers or the system, or changes of internal state, the summary events simply record the state at the start of a logging session.
In this scenario, the device of interest wasn't connected at the time I started the trace, so it's safe to skip over the device summary events for now. But how do we know which events to skip over? The usbhub and usbport drivers both have summary events. The driver that logged each event is indicated by the Protocol Name column. (In my screenshots, I've truncated this column to show just the key part, the driver name). If you don't see events from both "USBHub" and "USBPort", try going through the trace start instructions again as it's easy to make a mistake there. There should be a consecutive sequence of usbport events followed by a consecutive sequence of usbhub events. If there is a significant break in the timestamp pattern among the usbhub events, that break is probably the end of the device summary; otherwise, the first usbport event after any usbhub events is likely the first non-summary event.
Event description and data payload
In my log, the first event after rundown is a "USB Hub Wait Wake IRP Completed". This string, shown in the Description column, gives us the most information immediately about the event that was logged. My scenario provides the context to make sense of this description at a high level. After starting the log, I plugged in a device, so it stands to reason that something, such as a host controller or hub, is waking up in response. To find out what specifically is waking up, we need to look at the event's data. The data is in the Frame Details pane, shown as a tree in approximately the following form:
ETW event header information
ETW event descriptor (Constant information about the event ID such as error level)
Event payload (Data logged at the time of the event)
Name of a USB-specific struct
Struct members and their values (Types: numbers, strings, or arrays)
If I open the data for the "USB Hub Wait Wake IRP Completed" event, I find an ETW struct named "fid_USBHUB_Hub". Before we even look at what's inside the struct, let's break apart the name:
fid_: Just a prefix on all the structs, ignore it.
USBHUB_: Means the event was logged by the usbhub driver, which we already knew.
Rest of the string: The name of the object that the struct's data describes. In this case, it's a "Hub" object
Thus fid_USBHUB_Hub is a struct logged by the usbhub driver to describe a USB hub. Events with this Hub struct in their data payload have to do with a hub, and we can identify the specific hub using the contents of the struct. Let's take a look at the members of the hub struct. I've enlarged the Frame Details pane to show more payload data at one time.
The Hub struct is very similar to two other structs, fid_USBHUB_Device and fid_USBPORT_Device, that commonly appear in other events. The most important fields are common to all three structs.
idVendor: The USB Vendor ID (VID) of the device
idProduct: The USB Product ID (PID) of the device
PortPath: Starting from a root hub, the port number to follow at each hub to reach the device. Ignore any 0s. Examples:
|[0, 0, 0, 0, 0, 0]||The event has to do with a root hub (a set of ports on the PC, directly controlled by a USB host controller)|
|[3, 0, 0, 0, 0, 0]||The event has to do with something (hub or device) plugged into the root hub's port number 3.|
|[3, 1, 0, 0, 0, 0]||A hub is plugged into the root hub's port 3. The event has to do with a hub or device plugged into this external hub's port 1.|
It's a good idea to keep track of the port paths of any devices of interest. Early on when a device is being enumerated, the VID and PID are unknown and logged as 0. The VID and PID also won't be displayed during some low-level device requests such as reset and suspend. These requests are actually sent to hub that the device is plugged into.
Now that we know about port paths, we can wrap up our analysis of the Wait Wake completion event. The event shows us a port path of six zeroes, which, as we just learned, indicates a root hub. Excellent: I plugged the device into a root hub port, so the root hub is waking up.
Filter: Shortcut to important events
You can continue to read each event in chronological order--if you have the time. Even with experience it's hard to quickly pinpoint what happened by looking over a list of the event descriptions. To speed up finding the cause of the Unknown Device, let's look for events that indicate a problem. You can do that easily using Netmon's filter feature.
The filter we'll use is included in the latest parser package. (To update your parser package, repeat steps 3 and 4 from my last post.) Choose Filter -> Display Filter -> Load Filter -> Standard Filters -> USB -> USB Hub Errors, then click Apply in the Display Filter pane. Here's what the filter means:
|(USBPort_MicrosoftWindowsUSBUSBPORT AND NetEvent.Header.Descriptor.Opcode == 34)||This opcode indicates an error for usbport events|
|(USBHub_MicrosoftWindowsUSBUSBHUB AND NetEvent.Header.Descriptor.Opcode == 11)||This opcode indicates an error for usbhub events|
|(NetEvent.Header.Descriptor.Level == 0x2)||This event level usually means "error"|
|(USBHub_MicrosoftWindowsUSBUSBHUB AND NetEvent.Header.Descriptor.Id == 210)||This event ID is "USB Hub Exception Logged," explained below|
In the log a handful of events come up after applying the filter.
I choose to first look at each error event briefly to get the gist of it. Most of the usbhub exceptions have a fid_DebugText data of "CreateDeviceFailure". It's not clear how serious the exception is, but the debug text gives us a hint as to its meaning: an operation related to the new device failed. For now it's simplest to assume the adjacent "Create Device Failed" events are redundant. The last two exceptions are "CreateDeviceFailure_Popup" and "GenErr_UserIoctlFailed". The popup exception sounds like an error exposed to the user, but any and all of these errors could be related to the Unknown Device problem.
We may find that the events leading up to the error events indicate subtler error conditions. We should look at events preceding the errors to try to determine the root cause of the Unknown Device. Let's start looking backwards from "CreateDeviceFailure_Popup," the second to last exception. Select this event while the filter is still enabled, then click "Remove" above the filter text. The filter text remains in case you want to re-apply it, but all the events show up now.
The 2 events just before this error are a Dispatch and Complete of a USB control transfer. The fid_USBPORT_Device port path field is 0 for both events, meaning the transfer's target is the root hub. Inside the fid_USBPORT_URB_CONTROL_TRANSFER struct of the completion event, the status is 0 (USBD_STATUS_SUCCESS) meaning the transfer succeeded. For now let's keep looking backwards.
The next 2 events going backwards are the 4th (final) "Create Device Failed" event and 4th (final) "CreateDeviceFailure" exception, both of which we already looked at.
One event backwards is "Endpoint Close". This event means that an endpoint is no longer usable. The event data describes both the device and the endpoint on that device. The device port path is [1, 0, 0, 0, 0, 0]. I know the system where I took the log has only host controllers (root hubs) plus the device I was connecting, so this path rules out any hub. The closed endpoint must be on the single device I plugged in, and now we know the device's path is 1. Likely the drivers made the device's endpoint inaccessible due to a problem encountered earlier. Let's look backwards again.
The next event going backwards is a completed USB control transfer. If we look inside, the target of the transfer is the device (path is 1). The fid_USBPORT_Endpoint_Descriptor struct indicates the endpoint's address is 0, so this is the USB-defined default control endpoint. The URB's status is 0xC0000004--since this is not 0, the transfer was probably not successful! For more details, we check usb.h and MSDN, as described above, for this USBD_STATUS value.
#define USBD_STATUS_STALL_PID ((USBD_STATUS)0xC0000004L)
Meaning: "The device returned a stall packet identifier"
What request was stalled by the endpoint? Looking at the other data logged on the same event, we can tell the request was a standard device control request. The latest parser package understands these standard requests. (Again, to update your parser package, repeat steps 3 and 4 from my last post.) Here is the parsed request:
Frame: Number = 184, Captured Frame Length = 252, MediaType = NetEvent
- MicrosoftWindowsUSBUSBPORT: Complete Internal URB_FUNCTION_CONTROL_TRANSFER
- USBPORT_ETW_EVENT_COMPLETE_INTERNAL_URB_FUNCTION_CONTROL_TRANSFER: Complete Internal URB_FUNCTION_CONTROL_TRANSFER
+ fid_URB_Ptr: 0x84539008
+ Urb: Status = 0xc0000004, Flags 0x3, Length = 0
- SetupPacket: GET_DESCRIPTOR
+ bmRequestType: (Standard request) 0x80
bRequest: (6) GET_DESCRIPTOR
Value_DescriptorIndex: 0 (0x0)
Value_DescriptorType: (1) DEVICE
_wIndex: 0 (0x0)
wLength: 64 (0x40)
Putting the GET_DESCRIPTOR together with a descriptor type of DEVICE, the request is "Get device descriptor"
The device is supposed to respond to this request with its device descriptor in order for USB enumeration to continue. Instead, this device stalled the request, causing the attempt at enumeration to fail. It turns out all 4 create device failures were caused by stalled requests for the device descriptor. All that remains is to go off and fix the misbehaving device.
To help you understand what's going on in this log, I've introduced the following concepts:
- Basics of USB ETW (described in my last post)
- Expected causes of the Unknown Device symptom--one one of them was "The request for the USB device's Device Descriptor failed"
- The USB device summary events
- How to read events: descriptions and data payloads
- USB device port paths
- Netmon filters and a useful filter for problem events in USB logs
- Usbhub exception events
- NTSTATUS and USBD_STATUS codes
- Reading backwards from problem events
- Reading standard control transfers
Refer back to these concepts when you're reading your next USB ETW trace. And watch this space for parser improvements that will make the log reading process more convenient.