My name is Vivek Gupta; I am an engineer at Microsoft. I am part of the team that developed the USB 3.0 stack in Windows 8.
USB 3.0 offers significant improvements in performance and power over USB 2.0. However, like any new technology, it also brings its own set of new challenges. During the development of the USB 3.0 stack in Windows 8, we observed some common mistakes made by hardware vendors when developing USB 3.0 hardware. I am going to talk about these issues with the goal of helping hardware designers avoid these errors in future designs. I am also going to talk about how to catch some of these failures using a tool that we have developed called USB Hardware Verifier. In this blog entry, I am going to focus only on the issues related to devices. We will cover the issues related to hubs and controllers in future blog entries.
2 Design choices on how to deal with failures
2.1 Transient Failures
Some of the issues that we encountered are just transient failures that do not happen consistently. For example: a link going into an error state or an occasional transfer failure. Our driver stack typically tries to be resilient towards such failures by initiating an appropriate error recovery action or if applicable, retrying the operation in progress. Of course if such errors happen frequently, then they will lead to a poor end user experience. These types of errors give an impression to the user that the hardware is flaky. For example a device failing transfers could cause the device enumeration to take longer or fail altogether.
2.2 Consistent Failures
Then there are more consistent failures where hardware is behaving in a non-spec compliant manner by design. For example, a device returning an invalid descriptor or repeatedly failing a spec-required control transfer. We had a design choice regarding how our driver stack should deal with such failures. In many cases, these errors are not fatal, i.e., we could choose to ignore the error and go on working with that hardware. However, the problem with that approach is that we are essentially guessing the intent of the hardware and we might have to default to non-optimal values. This in turn could lead to a suboptimal user experience, examples of which I will give below. Moreover, since the hardware “works”, there is no forcing function for the hardware vendors to fix these issues and we lose the ability to ensure the quality of the Windows experience. Therefore we consciously try to detect such issues and fail the hardware explicitly as early as possible. For example, if a device returns an invalid descriptor, we will fail the enumeration of the device and the device will come up with an error code in Device Manager.
There are a few exceptions to the above design pattern. In some scenarios, it might not be efficient to look for such errors and we decided to avoid the performance penalty. Then there are cases where we realized that the specification is ambiguous and the hardware issue is really due to a different interpretation of the specification by the hardware vendor. In such cases, we try to get the specification updated to clear up the ambiguity but also keep working with the hardware. Finally, we wanted to maintain backwards compatibility with Windows 7 for old devices and hubs. For USB 2.0 or lower devices, our goal was to maintain the Windows 7 behavior for hardware errors. If the device version is greater than USB 2.0, then we are stricter about requiring spec compliant behavior.
As we were developing the Windows 8 USB 3.0 stack, we also implemented specific workarounds for specific hardware where we allowed non-spec compliant behavior and interpreted the hardware behavior in a manner not dictated by the specifications. This was typically done only if we got a confirmation from the hardware vendor about the intended behavior, such that we didn’t need to “guess” the intent of the hardware and only after we had received assurance that the behavior would be fixed in the next version. As we approached the end of Windows 8 development we stopped implementing any additional workarounds and over time we became much more critical about applying workarounds to new hardware, as new hardware had plenty of time to test with Windows 8. With the broad availability of Windows 8 and the USB 3.0 stack, hardware vendors should be able to successfully build their hardware in a compliant way, thus avoiding the need for adding workarounds.
3 Catching the failures
As we will see later in the document, in many cases hardware issues manifest themselves in ways that make it hard to determine the exact failure just by looking at the end-to-end behavior. To enable our hardware partners to catch these errors early in the development cycle, we created a tool that catches and reports them. We refer to this tool as “USB Hardware Verifier”. This tool is part of the MUTT software package.
Usage of the tool involves three steps: starting a session, running the desired tests with your hardware and then stopping the session. This tool captures hardware events as they occur. It gives the user the ability to display information about those events in real time and at the same time, allows the user to capture the information in a trace file that can be parsed at a later time. The tool also provides an ability to filter events based on the VendorId and ProductId of the hardware so that the user can target a specific hardware.
To start the session, run the tool at an elevated command prompt and to stop the session at any time, simply press CTRL+C.
The tool supports these options:
-v <VendorID> : Logs all hardware verifier events for the specified VendorID
-p <ProductID> : Logs all hardware verifier events for the specified ProductID
-f <ETL file> : Parses the specified ETL file. Note that this option is used to parse an existing ETL file offline.
/v output : Displays all events to the console.
At the end of the session, a file named AllEvents.etl is added in the current directory. This file contains trace information about all events that were captured during the session. The command window also shows a detailed report after the session ends. This report categorizes the information by controller, hub, or device, making it easier to read. The report contains a key that you can use to filter events based on those categories. Note that while displaying information in real time, the tool might not capture some information (such as VID/PID) related to events that occur before the device gets fully enumerated. The missing information is available in the detailed report though.
Here is an example output from the USB hardware verifier tool.
DeviceDescription: Generic USB Hub
PortPath: 0x2, 0x0, 0x0, 0x0, 0x0, 0x0
Event Message: SuperSpeed Device is Connected on the 2.0 Bus:
PortPath: 0x2, 0x4, 0x0, 0x0, 0x0, 0x0
PortPath: 0x3, 0x0, 0x0, 0x0, 0x0, 0x0
PortPath: 0x3, 0x0, 0x0, 0x0, 0x0, 0x0
PortPath: 0x3, 0x0, 0x0, 0x0, 0x0, 0x0
In the above output, you will notice that the tool is reporting an error for the hub where the port power control mask is zero in the hub descriptor. Similarly, it is reporting a number of errors for a device. The first error is about the companion endpoint descriptor for an isoch endpoint having a wBytesPerInterval field that is too large. In the real time output, the details of this device are not displayed. However, the following test report shows those details.
The following output shows an example test report for the preceding session:
Below is a report of all the Hardware verifier events encountered. The Key field refers to a Controller, hub, or device. During the lifetime of this utility, all unfiltered events are captured in file AllEvents.etl (in current directory). The Key in below report can be used to filter events in AllEvents.etl file
Record #1 (Key = 0x57ff0de4858)
DeviceDescription: Generic USB Hub
PortPath: 0x2, 0x0, 0x0, 0x0, 0x0, 0x0
All errors encountered:
#1: (UsbHub3/176): DescriptorValidationError20HubPortPwrCtrlMaskZero
Record #2 (Key = 0x57ff79fd4e8)
PortPath: 0x3, 0x0, 0x0, 0x0, 0x0, 0x0
All errors encountered:
#1: (UsbHub3/176): DescriptorValidationErrorCompanionIsochEndpointWBytesPerIntervalTooLarge
#2: (UsbHub3/176): DescriptorValidationErrorCompanionIsochEndpointWBytesPerIntervalTooLarge
#3: (UsbHub3/176): DescriptorValidationErrorCompanionIsochEndpointWBytesPerIntervalTooLarge
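A report like the one above is easy to post-process when a test run produces many records. The following is a minimal sketch, assuming the report text has been saved or copied from the console and that it follows the line layout shown in the example (a `Record #N (Key = 0x...)` line followed by numbered error lines). The function name `summarize_report` and the regular expressions are illustrative and are not part of the tool.

```python
import re
from collections import defaultdict

# Patterns inferred from the example report above; they are assumptions,
# not a documented format.
RECORD_RE = re.compile(r"^Record #(\d+) \(Key = (0x[0-9a-fA-F]+)\)")
ERROR_RE = re.compile(r"^#(\d+): \(([^)]+)\): (\S+)")


def summarize_report(text):
    """Return {record key: {error tag: count}} for a verifier report."""
    summary = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = RECORD_RE.match(line)
        if match:
            # Start collecting errors for a new controller, hub, or device.
            current = summary.setdefault(match.group(2), defaultdict(int))
            continue
        match = ERROR_RE.match(line)
        if match and current is not None:
            current[match.group(3)] += 1
    return {key: dict(tags) for key, tags in summary.items()}


SAMPLE = """\
Record #1 (Key = 0x57ff0de4858)
DeviceDescription: Generic USB Hub
PortPath: 0x2, 0x0, 0x0, 0x0, 0x0, 0x0
All errors encountered:
#1: (UsbHub3/176): DescriptorValidationError20HubPortPwrCtrlMaskZero
Record #2 (Key = 0x57ff79fd4e8)
PortPath: 0x3, 0x0, 0x0, 0x0, 0x0, 0x0
All errors encountered:
#1: (UsbHub3/176): DescriptorValidationErrorCompanionIsochEndpointWBytesPerIntervalTooLarge
#2: (UsbHub3/176): DescriptorValidationErrorCompanionIsochEndpointWBytesPerIntervalTooLarge
#3: (UsbHub3/176): DescriptorValidationErrorCompanionIsochEndpointWBytesPerIntervalTooLarge
"""

if __name__ == "__main__":
    for key, tags in summarize_report(SAMPLE).items():
        print(key, tags)
```

Grouping by key and counting repeated tags makes it easier to see that the three device errors in the example are the same failure reported three times.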
4 Hardware Failures
I will now talk in detail about some of the common issues that we encountered. For each issue, I will also point out whether it can be caught by the USB hardware verifier tool and if so, what is the failure string corresponding to the issue.
4.1 U1/U2 Related Issues
The USB 3.0 specification introduces two new link power management states, U1 and U2. A big share of the hardware issues that we encountered were related to the implementation of these states. I have already talked in detail about these states and the related problems in this paper. I will not be covering those failures here; I strongly encourage you to read that paper.
4.2 2.0 LPM Related Issues
We added limited support for 2.0 LPM in the Windows 8 USB 3.0 stack. We have seen some issues related to this mechanism. I will cover this topic in detail in a future blog entry.
4.3 Invalid Descriptors
As part of device enumeration, the USB core stack reads, parses and validates the descriptors reported by the device. Some devices report descriptors that are not valid as per the USB spec. In the USB hardware verifier output, these failures can be identified by the tag DeviceHwVerifierDescriptorValidationFailure. Since there are many variations of this failure, each specific failure has an associated tag starting with the string DescriptorValidationError, some examples of which are given below.
4.3.1 Missing or zero Container Id
The concept of Container Id was introduced in Windows 7. In the USB world, the Container ID was reported as part of the MS OS descriptor. With USB 3.0 and the USB 2.1 addendum, this concept was adopted in the USB specification. Devices are required to report the Container ID, as defined in the USB 3.0 specification, as part of their BOS descriptors. However, we have observed that some devices do not report the Container Id descriptor or report it as all zeroes. We will fail the enumeration for such a device because if we ignored this failure and enumerated the device, the device might work but might not appear correctly in the UI. In the USB hardware verifier output, this failure can be identified by the tag DescriptorValidationErrorMSOSContainerIdAllZeroes.
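A device-side test can check for this condition before the device ever reaches Windows. The following is a minimal sketch, assuming the raw BOS descriptor set has already been read from the device as bytes. The descriptor type and capability codes (0x0F for BOS, 0x10 for a device capability, 0x04 for Container ID) and the 20-byte Container ID layout reflect my reading of the USB 3.0 specification; the function names are hypothetical and this is not the validation code used by the Windows driver stack.

```python
import struct

BOS_DESCRIPTOR_TYPE = 0x0F       # assumed from the USB 3.0 specification
DEVICE_CAPABILITY_TYPE = 0x10    # assumed from the USB 3.0 specification
CONTAINER_ID_CAPABILITY = 0x04   # assumed from the USB 3.0 specification
CONTAINER_ID_LENGTH = 20         # 4 header bytes + 16-byte UUID


def find_container_id(bos):
    """Return the 16-byte Container ID from a BOS descriptor set, or None."""
    if len(bos) < 5 or bos[1] != BOS_DESCRIPTOR_TYPE:
        return None
    total_length = struct.unpack_from("<H", bos, 2)[0]
    end = min(total_length, len(bos))
    offset = bos[0]  # skip the BOS header itself
    while offset + 3 <= end:
        length = bos[offset]
        if length < 3 or offset + length > end:
            break  # malformed capability; stop walking
        is_container_id = (
            bos[offset + 1] == DEVICE_CAPABILITY_TYPE
            and bos[offset + 2] == CONTAINER_ID_CAPABILITY
            and length >= CONTAINER_ID_LENGTH
        )
        if is_container_id:
            return bos[offset + 4:offset + 20]
        offset += length
    return None


def container_id_problem(bos):
    """Return 'missing', 'all zeroes', or None when the Container ID looks fine."""
    container_id = find_container_id(bos)
    if container_id is None:
        return "missing"
    if container_id == bytes(16):
        return "all zeroes"
    return None
```

For example, a BOS descriptor set whose only capability is a Container ID with sixteen zero bytes would make `container_id_problem` return `"all zeroes"`.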
4.3.2 Invalid U1/U2 exit latencies
U1 and U2 are power states introduced in USB 3.0. Some devices reported U1/U2 exit latencies that were not valid. We will fail the enumeration for such a device. If we ignored this failure, we wouldn’t know the real values for the device and would be forced to disable U1/U2, leading to more power consumption.
In the USB hardware verifier output, these failures can be identified by the tags DescriptorValidationErrorSuperSpeedCapBU1DevExitLatTooLarge and DescriptorValidationErrorSuperSpeedCapBU2DevExitLatTooLarge.
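The range check behind these two tags can be sketched as follows. This is a minimal illustration, assuming the 10-byte SuperSpeed USB device capability descriptor has been extracted from the BOS descriptor set. The field offsets and the maximum values (0x0A for bU1DevExitLat and 0x07FF for wU2DevExitLat) are taken from my reading of the USB 3.0 specification and should be confirmed against it; the function name is hypothetical.

```python
import struct

DEVICE_CAPABILITY_TYPE = 0x10   # assumed from the USB 3.0 specification
SUPERSPEED_CAPABILITY = 0x03    # assumed from the USB 3.0 specification
MAX_U1_DEV_EXIT_LAT = 0x0A      # assumed upper bound, in microseconds
MAX_U2_DEV_EXIT_LAT = 0x07FF    # assumed upper bound, in microseconds


def check_exit_latencies(cap):
    """Return the verifier-style tags for out-of-range exit latencies."""
    if (
        len(cap) < 10
        or cap[1] != DEVICE_CAPABILITY_TYPE
        or cap[2] != SUPERSPEED_CAPABILITY
    ):
        raise ValueError("not a SuperSpeed USB device capability descriptor")
    u1_exit_latency = cap[7]                             # bU1DevExitLat
    u2_exit_latency = struct.unpack_from("<H", cap, 8)[0]  # wU2DevExitLat
    errors = []
    if u1_exit_latency > MAX_U1_DEV_EXIT_LAT:
        errors.append("DescriptorValidationErrorSuperSpeedCapBU1DevExitLatTooLarge")
    if u2_exit_latency > MAX_U2_DEV_EXIT_LAT:
        errors.append("DescriptorValidationErrorSuperSpeedCapBU2DevExitLatTooLarge")
    return errors
```

An empty list means both latencies are within the assumed limits.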
4.4 SuperSpeed devices connecting on the 2.0 port
We observed that sometimes a SuperSpeed device would connect to USB 2.0 even though the device is connected to a 3.0 connector and SuperSpeed is available on that port. This can happen either when the device is initially attached or after a system sleep resume cycle. Such behavior can lead to reduced functionality and non-ideal user experience.
In the USB hardware verifier output, this failure can be identified by the tag DeviceHwVerifierSuperSpeedDeviceWorkingAtLowerSpeed. However, if the device is deliberately connected behind a 2.0 Hub for testing purposes, then it is expected that the device will work at 2.0 and in that case, please ignore this failure.
4.5 SuperSpeed devices reporting old USB version when operating at 2.0
When a 3.0 device is plugged in behind a 2.0 hub or controller, it will operate at one of the 2.0 speeds. Devices in this scenario are required by the USB 3.0 specification to report their bcdUSB in the Device Descriptor as 0210. However, some devices reported this value as 0200. When a device does that, the driver stack cannot detect this error and therefore the device will enumerate and continue to work. However, the driver stack will not query for the BOS descriptor for such a device and hence will not be able to indicate that the device can perform faster when connected to USB 3.0. The user will get the lower speed and potentially reduced functionality without knowing the remedy. This failure cannot be caught by the USB hardware verifier.
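Since the verifier cannot catch this case, a vendor who knows that the device under test is SuperSpeed capable can check it directly. The following is a minimal sketch, assuming the 18-byte device descriptor was read while the device was attached behind a 2.0 hub. The function names are hypothetical.

```python
DEVICE_DESCRIPTOR_TYPE = 0x01
DEVICE_DESCRIPTOR_LENGTH = 18


def bcd_usb(device_descriptor):
    """Return bcdUSB (little-endian, offset 2) from a device descriptor."""
    if (
        len(device_descriptor) < DEVICE_DESCRIPTOR_LENGTH
        or device_descriptor[1] != DEVICE_DESCRIPTOR_TYPE
    ):
        raise ValueError("not a device descriptor")
    return device_descriptor[2] | (device_descriptor[3] << 8)


def reports_0210_at_2_0_speed(device_descriptor):
    """True if a SuperSpeed-capable device reports 0210 while running at 2.0."""
    return bcd_usb(device_descriptor) == 0x0210
```

A device that fails this check would enumerate normally, which is exactly why the problem is easy to miss without an explicit test.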
4.6 Devices connecting at both 2.0 and SuperSpeed simultaneously
The USB 3.0 specification clearly states that any USB 3.0 device that is not a hub should not connect at both USB 2.0 and SuperSpeed simultaneously. Such devices are not supported by Windows and can lead to unknown behavior. In Windows 8, we do not explicitly prevent such a device from enumerating and therefore it might seem that these devices are working with Windows. However, there is a good chance that these devices will not work at all with future versions of Windows. This failure cannot be caught by the USB hardware verifier.
4.7 Wake capability in Interface descriptor
For 3.0 devices, supporting remote wake also involves supporting function wake, as described in the USB 3.0 specification. So there are two places where a device needs to indicate that it is remote wake capable. The first is the “Remote Wake” bit of bmAttributes in the configuration descriptor (USB 3.0 Specification Section 9.6.3) and the second is the “Function Remote Wake Capable” bit in the Get Status response for the first interface of each function in the device (USB 3.0 Specification Section 9.4.5). We expect devices to report consistent information in these bits. For single function devices, we expect these bits to match. For multi-function devices, we expect the Remote Wake bit in the configuration descriptor to be 1 if at least one of the functions reports 1 for the Function Remote Wake Capable bit. However, we have observed that some devices did not report the interface status correctly. In such cases, we are forced to choose one bit over another. A wrong guess on our part could lead to incorrect power capabilities and the client driver making a wrong decision about whether to arm the device for wake or not. It could also result in the client driver not enabling Selective Suspend for the device.
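The consistency rule described above can be written down as a small check. This is a minimal sketch, assuming the configuration descriptor's bmAttributes byte and the first status byte returned by Get Status for the first interface of each function have already been collected. The bit positions (bit 5 for Remote Wakeup in bmAttributes, bit 0 for Function Remote Wake Capable) are from my reading of the USB 3.0 specification; the function name is hypothetical and the check encodes only the expectations stated in this post.

```python
CONFIG_REMOTE_WAKEUP = 0x20          # bit 5 of bmAttributes (assumed)
FUNCTION_REMOTE_WAKE_CAPABLE = 0x01  # bit 0 of interface status (assumed)


def wake_capability_consistent(config_bm_attributes, function_statuses):
    """Check the configuration wake bit against per-function wake bits.

    function_statuses holds one status value per function, taken from
    Get Status on the first interface of that function.
    """
    if not function_statuses:
        raise ValueError("at least one function status is required")
    config_wake = bool(config_bm_attributes & CONFIG_REMOTE_WAKEUP)
    function_wake = [
        bool(status & FUNCTION_REMOTE_WAKE_CAPABLE) for status in function_statuses
    ]
    if len(function_wake) == 1:
        # Single function device: the two bits must match.
        return config_wake == function_wake[0]
    # Multi-function device: the configuration bit must be set if any
    # function is wake capable.
    return config_wake or not any(function_wake)
```

A `False` result corresponds to the situation where the driver stack would have to pick one bit over the other.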
In the USB hardware verifier output, these failures can be identified by the tag DeviceHwVerifierInterfaceWakeCapabilityMismatch.
4.8 Not supporting 3.0 specific requests
Some devices failed the standard control transfers that are new additions to the USB 3.0 specification over USB 2.0. These transfers are used to relay information to the device. It is quite likely that the information relayed in the request is not interesting to some devices. However, since these are standard requests, we expect devices to always succeed the transfer as required by the USB 3.0 specification. As it is, the failure of this transfer is not a problem from the point of view of the driver stack because quite likely the device is not interested in that information at all. However the driver stack cannot differentiate between a device that always fails this transfer because it does not need this information versus a device that needs the information but happens to fail the transfer because of a transient error (in which case the driver stack should retry).
4.8.1 Set Isoch delay
“SET_ISOCH_DELAY” (USB 3.0 Specification Section 9.4.11) is a standard control transfer sent to inform the device of the delay from the time a host transmits a packet to the time it is received by the device. Unfortunately we saw a significant number of devices fail this transfer and as a result, we were forced to ignore this transfer failure in the Windows 8 driver stack.
In the USB hardware verifier output, this failure can be identified by the tag DeviceHwVerifierSetIsochDelayFailure.
4.8.2 Set SEL
“SET_SEL” (USB 3.0 Specification Section 9.4.12) is a standard control transfer sent to inform the device about U1 and U2 exit latencies of the path from the host to the device. Our Windows 8 driver stack ignores the failure of this transfer if the device stalled the transfer as it indicates that the device is consciously returning failure. If the transfer fails for any other reason, then the driver stack fails the device enumeration.
In the USB hardware verifier output, this failure can be identified by the tag DeviceHwVerifierSetSelFailure.
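The different treatment of the two requests can be summarized as a small decision function. This is a minimal sketch of the policy as described in this post, not the driver stack's code; the `TransferResult` enum and the function name are hypothetical.

```python
from enum import Enum


class TransferResult(Enum):
    SUCCESS = "success"
    STALL = "stall"
    OTHER_ERROR = "other error"


def enumeration_continues(request, result):
    """Return True if enumeration proceeds after the given control transfer."""
    if result is TransferResult.SUCCESS:
        return True
    if request == "SET_ISOCH_DELAY":
        # Any failure of this request is ignored.
        return True
    if request == "SET_SEL":
        # A stall is treated as a deliberate refusal and is ignored;
        # any other failure fails the enumeration.
        return result is TransferResult.STALL
    raise ValueError("unknown request: " + request)
```

Even when enumeration continues, the failure is still reported by the USB hardware verifier under the tags given above.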
4.9 Different Serial Number on re-enumeration
Some devices reported one serial number during the initial enumeration on device attach and then reported a different serial number on re-enumeration after a port reset. Re-enumeration can happen while resuming from system sleep or it can be initiated by the client driver. While re-enumerating the device, we match the serial number of the device with the initial serial number to ensure it is the same device. This step is particularly important on system resume where the user has the opportunity to replace the device. If the serial number does not match, then the re-enumeration is failed and the device is surprise removed before it is enumerated again. This sequence is particularly problematic if the device is a boot device, in which case the system will crash.
In the USB hardware verifier output, this failure can be identified by the tag DeviceHwVerifierSerialNumberMismatchOnRenumeration.
4.10 Device disappearing on system resume
When the system goes to sleep, some controllers or platforms will cut the VBUS off and cause the downstream devices to disconnect from the bus. When the system resumes and VBUS is reapplied, these devices will connect back. If the device is bus powered, then it will also need to power back on before connecting on the bus.
If the driver stack reads the port status during system resume and finds that the device is back, the driver will re-enumerate the device and re-store its configuration without letting the operating system know that the device re-connected. As a result, this re-enumeration is transparent to applications. Any operations in progress before the system went to sleep can continue without interruption.
However, if the driver stack finds that the device has not connected back, it will report the device as missing to the operating system. When the device connects back, there will be a new instance created for the device and the device will start working. However, the surprise removal of the old instance of the device will interrupt any applications that were using the device before the system went to sleep. For example if a file copy was in progress for a storage device, it will fail. Also, creating a new instance will take some time and the user might see a delay in device becoming available after resume.
To mitigate this issue, we implemented a heuristic in our Windows 8 driver stack. When the driver stack finds that the device is not connected on system resume, it gives the device a grace period (currently set to one second) to connect back before reporting the device missing. However, we observed that some devices take longer than this to connect back, leading to surprise removal and a poor end user experience.
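The grace period can be illustrated with a small timing sketch. This is only a model of the behavior described above: it polls a caller-supplied `is_connected` function, whereas the actual driver stack may well be driven by port change events rather than polling. The function name, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions made so the example can be run and tested.

```python
import time


def wait_for_reconnect(
    is_connected,
    grace_period_s=1.0,
    poll_interval_s=0.01,
    clock=time.monotonic,
    sleep=time.sleep,
):
    """Return True if the device reconnects within the grace period.

    A False result models the case where the device is reported missing
    and the old device instance is surprise removed.
    """
    deadline = clock() + grace_period_s
    while True:
        if is_connected():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

A device test can use the same idea in reverse: measure how long the device takes to reconnect after VBUS is reapplied and flag anything close to or above one second.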
In the USB hardware verifier output, this failure can be identified by the tag HubHwVerifierPortDeviceDisconnected. However, please note that this tag will appear whenever the device is disconnected. So if you did indeed remove the device, then ignore this tag.
4.11 SuperSpeed devices requiring additional delays
We found that some SuperSpeed devices required a delay between two operations initiated by the driver stack even though the specification mandates them to not be dependent on such a delay. Often these delays were such that they were part of the USB 2.0 specification but have been removed in the USB 3.0 specification.
These errors cannot be detected as such by the USB hardware verifier. However, these failures will most likely manifest themselves as the device not completing the next control transfer successfully. These failures can be identified in the USB hardware verifier output by the tag DeviceHwVerifierControlTransferFailure. One tricky part with catching such errors is that they might not happen consistently and might only reproduce with specific host controllers and topologies. Therefore it is always a good idea to test a device with multiple host controllers and in various topologies.
4.11.1 Delay after connect
When a SuperSpeed device connects to a host, it goes through a sequence of operations in the hardware before the connect bit goes to 1 and a connect change is indicated. By the time the host gets the connect change, the device must be ready to respond to host requests. But we saw that some devices got confused if the host started communicating with the device immediately which leads to the device failing enumeration. It is not even possible to implement a device specific workaround for this issue as we have not even received the device descriptor of the device at that point in enumeration.
4.11.2 Delay after reset
The USB 3.0 specification mandates that after a port is successfully reset or resumed, the USB driver stack is allowed to access the device attached to the port immediately and the device is expected to respond to data transfers. Unfortunately, some devices got confused if the driver stack did that and failed enumeration.
4.12 Devices disconnecting on disable/suspend
When the user chooses to “Safely remove hardware” from the task bar, our driver stack disables the port that the device is attached to. However, we observed that when we disabled or suspended some devices, they would disconnect from the port. This causes unexpected surprise removal of the device. Moreover, since the driver re-enumerates a new instance of the device, the port comes back in the enabled state. This can lead to end user confusion. For example, if the device has an indicator light, it does not go off and the user might get confused as to whether it is indeed safe to remove the device. Also, it leads to the device consuming more power without any good reason to do so.
Note that this behavior changed from Windows 7 to Windows 8. In Windows 7, we didn’t disable the port. The reason we changed the behavior from Windows 7 to Windows 8 is because Windows 7 behavior leaves the port in the enabled state and leads to the same problems that we discussed above: end user confusion and more power consumption.
In the USB hardware verifier output, this failure can be identified by the tag HubHwVerifierPortDeviceDisconnected. However, please note that this tag will appear whenever the device is disconnected. So if you did indeed remove the device, then ignore this tag.
In this blog, I talked about some of the issues that we encountered with USB 3.0 devices. We hope that the future versions of the devices will avoid these issues leading to a better end user experience and a great Windows experience. We urge our hardware partners to start using the USB hardware verifier tool early in their development cycle so that the errors can be caught and fixed without incurring high costs.
There have been several errata to the USB 3.0 specification. These errata are aimed at removing the ambiguities and filling the gaps. Often these improvements weed out inter-operability issues between software and hardware as well as between hardware components. We strongly encourage the hardware vendors to follow the latest versions of these specifications to provide the best user experience.
In future blog entries, we will talk about issues related to hubs and controllers. Stay tuned!