Authored by Vivek Gupta [MSFT]
In this blog post, I’ll provide an overview of the USB 2.0 Link Power Management (LPM) feature and explain how it can be used together with the Selective Suspend mechanism to reduce system power consumption. I’ll also describe common pitfalls in LPM implementations in USB controllers and devices. Finally, I’ll describe how the Microsoft Windows USB driver stack supports USB 2.0 LPM.
- USB 2.0 LPM Mechanism
- L states
- Software LPM vs. hardware LPM
- L1 transitions
- Common Issues with LPM 2.0
- Microsoft USB driver stack behavior
- LPM Testing
Before reading this post, revisit the Selective Suspend mechanism. It is a powerful mechanism for conserving power, but it has certain limitations. The mechanism is practical only when the device has been idle for a long time, typically on the order of seconds. Also, Selective Suspend imposes strict limits on how much power the device can draw in the suspended state. For more details, read Limitations of Selective Suspend Mechanism.
USB 2.0 LPM Mechanism
To allow finer-grained power management and overcome the limitations of Selective Suspend, a new intermediate power state, L1, was defined in an addendum to the official USB 2.0 specification (USB2_LinkPowerMangement_ECN). L1 is a link-level state, so let’s first understand the concept of a link.
A USB connection exists between two USB ports:
- The downstream port (DS port) of a host or a hub and,
- The upstream port (US port) of an attached device or hub.
A link is a pair of DS and US ports; the ports are known as link partners. Each port has two layers. The physical layer transmits or receives sequences of bytes or other control signals. The logical layer manages the physical layer and ensures smooth flow of information between the link partners. The logical layer is also responsible for any buffering that might be required for the information flow.
L states
According to the USB 2.0 specification, there are two main link states:
- L2: The link enters this low-power state only when the downstream device enters the suspended state through the Selective Suspend mechanism.
- L0: The working state, in which the link is active.
The LPM addendum defines a new intermediate low-power link state, L1, to complement Selective Suspend and enable significant additional power savings. Entry into and exit from L1 take from tens of microseconds to a few milliseconds. With such short entry and exit times, the device can enter L1 even after being idle for a very short time without negatively impacting the end-user experience.
When the device has been idle for a longer time, it should still enter Selective Suspend, because device power consumption in L2 is expected to be much lower than in L1.
Moreover, the L1 state decouples the power management of the device from the power management of the link. There are no hard requirements on how much power the device can draw while the link is in L1. Therefore, a device can remain sufficiently powered while the link is in L1 so that it does not lose any of its capabilities.
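The trade-off between L1 and Selective Suspend can be pictured as a simple idle-time policy. The sketch below is illustrative only: the two thresholds and the `choose_link_state` function are hypothetical and are not taken from any specification or from the Windows stack. It only shows that short idle periods favor L1, while long idle periods favor L2.

```python
from enum import Enum


class LinkState(Enum):
    L0 = "L0"  # working state: the link is active
    L1 = "L1"  # LPM sleep: fast entry and exit, no hard device power limit
    L2 = "L2"  # suspend via Selective Suspend: slow to exit, lowest power


# Hypothetical thresholds chosen for illustration; real values are policy decisions.
L1_IDLE_THRESHOLD_US = 500
SUSPEND_IDLE_THRESHOLD_US = 2_000_000  # "a long time, typically seconds"


def choose_link_state(idle_us: int) -> LinkState:
    """Pick a target link state from how long the device has been idle."""
    if idle_us >= SUSPEND_IDLE_THRESHOLD_US:
        # Device power in L2 is expected to be much lower than in L1.
        return LinkState.L2
    if idle_us >= L1_IDLE_THRESHOLD_US:
        # Short idle period: L1 entry/exit is fast enough not to hurt the user.
        return LinkState.L1
    return LinkState.L0


for idle in (100, 5_000, 3_000_000):
    print(idle, choose_link_state(idle).value)
```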
Software LPM vs. hardware LPM
In general, entry to and exit from L1 must be driven by software on the host side. Software-initiated transitions limit the advantages of L1 because software intervention adds delay. However, xHCI controllers can optionally support a feature called hardware LPM. If the controller supports that feature, the host software only needs to program certain parameters in the controller, and the hardware then performs the L1 transitions automatically. The mechanism is similar to how the U1 and U2 states work for USB 3.0 LPM (http://msdn.microsoft.com/en-us/library/windows/hardware/dn379338(v=vs.85).aspx). However, unlike U1/U2, hubs do not support hardware LPM for L1. Therefore, hardware LPM for L1 is limited to devices that are attached directly to a root port.
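To illustrate what "program certain parameters and let the hardware make the transitions" means, here is a toy model of a root port with hardware LPM. It is a hypothetical simulation, not the xHCI register interface: `HardwareLpmPortModel`, its fields, and the assumption that the device always ACKs the LPM transaction are ours. Software sets the enable flag and the idle timeout once; after that, the model moves between L0 and L1 with no further software involvement.

```python
from dataclasses import dataclass


@dataclass
class HardwareLpmPortModel:
    """Toy model of a root port performing hardware LPM (hypothetical, not xHCI registers)."""

    lpm_enabled: bool    # programmed once by software
    l1_timeout_us: int   # programmed once by software: idle time before L1 entry
    state: str = "L0"
    idle_us: int = 0

    def advance(self, elapsed_us: int, traffic: bool) -> str:
        """Advance time; the 'hardware' decides on L1 entry and exit by itself."""
        if traffic:
            # Pending transfer: the hardware returns the link to L0 without software help.
            self.state = "L0"
            self.idle_us = 0
            return self.state
        self.idle_us += elapsed_us
        if self.lpm_enabled and self.state == "L0" and self.idle_us >= self.l1_timeout_us:
            # Idle long enough: send the LPM transaction (assume the device ACKs it).
            self.state = "L1"
        return self.state


port = HardwareLpmPortModel(lpm_enabled=True, l1_timeout_us=512)
print(port.advance(256, traffic=False))  # still idle for less than the timeout
print(port.advance(256, traffic=False))  # idle for 512 us, enters L1
print(port.advance(10, traffic=True))    # traffic arrives, back to L0
```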
L1 transitions
The link enters L1 through an explicit transaction from the host. Because USB 2.0 has only one reserved PID value, the ECN defines L1 entry as an extension transaction (the LPM transaction) rather than a new PID, which preserves room for future protocol extensions. Through an LPM transaction, the host communicates two important pieces of information to the device: the first is HIRD or BESL; the second is a bit that indicates whether remote wake is enabled on the device. The L0-to-L1 transition occurs when the device responds to a successful LPM transaction with an ACK handshake.
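The two pieces of information carried by the LPM transaction, and the rule that only an ACK completes L0-to-L1 entry, can be sketched as follows. The bit layout (bLinkState in bits 3:0 with 0001b meaning L1, HIRD/BESL in bits 7:4, remote wake enable in bit 8) is our reading of the LPM ECN and should be checked against the specification; the function names are hypothetical.

```python
L1_LINK_STATE = 0b0001  # assumed bLinkState encoding for L1


def pack_lpm_attributes(hird_or_besl: int, remote_wake: bool) -> int:
    """Build the LPM token attributes (assumed layout: state 3:0, HIRD/BESL 7:4, wake 8)."""
    if not 0 <= hird_or_besl <= 0xF:
        raise ValueError("HIRD/BESL is a 4-bit field")
    return L1_LINK_STATE | (hird_or_besl << 4) | (int(remote_wake) << 8)


def link_state_after_lpm(handshake: str) -> str:
    """Only an ACK to the LPM transaction moves the link from L0 to L1."""
    return "L1" if handshake == "ACK" else "L0"


attrs = pack_lpm_attributes(4, remote_wake=True)
print(hex(attrs), link_state_after_lpm("ACK"), link_state_after_lpm("NYET"))
```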
Either the host or the device can initiate an L1-to-L0 transition, using the same signaling methods as those defined for an L2-to-L0 transition. As with L2, resume signaling is used to exit L1; however, the duration of the signaling is different. Also as with L2, a port in L1 only listens for the resume signal. When it gets the signal, the downstream port transitions to L0 so that the device can participate in the resume signaling sequence.
On host-initiated resume, the host drives resume signaling for a specified period of time; this duration is known as HIRD (Host Initiated Resume Duration). For L1, that duration is flexible: the host can choose it from a range of values. If the host uses a longer duration, the device can go into a deeper state, because it needs to poll for L1 exit less frequently. In that deeper state, the device can turn off more of its internal components and save more power. The host can therefore choose a value based on the resume latency it can tolerate for the device without compromising the user experience. HIRD is a 4-bit encoded value that maps linearly onto a range of 50 microseconds to 1.2 milliseconds.
Note: For L2, the resume signaling duration on host-initiated resume is constant.
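The HIRD encoding and the host's trade-off can be made concrete with a short sketch. It assumes the ECN's linear table (0 maps to 50 microseconds and each step adds 75 microseconds, so 15 maps to 1175 microseconds, the "1.2 ms" upper bound); the helper names are hypothetical. The host picks the largest HIRD whose resume duration it can still tolerate, which lets the device sleep as deeply as possible.

```python
def hird_to_us(hird: int) -> int:
    """Resume signaling duration for a pre-errata HIRD value (assumed linear table)."""
    if not 0 <= hird <= 15:
        raise ValueError("HIRD is a 4-bit value")
    return 50 + 75 * hird


def largest_tolerable_hird(tolerated_us: int) -> int:
    """Largest HIRD whose duration fits within the tolerated resume latency.

    Falls back to 0 (the shortest duration) if even that exceeds the budget.
    """
    best = 0
    for hird in range(16):
        if hird_to_us(hird) <= tolerated_us:
            best = hird
    return best


print(largest_tolerable_hird(400), hird_to_us(largest_tolerable_hird(400)))
```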
In the initial version of the LPM ECN,
- The upper bound on HIRD was too low for devices and hosts to achieve significant power savings; it needed to be 10 ms.
- Certain values were found to be optimal for a given device or host, but the device had no way to report them.
- The definition of HIRD was not directly usable by the device, because the HIRD value alone did not convey the overall time after which the host services the device.
To address those limitations, an errata was published that introduced a new value called Best Effort Service Latency (BESL). BESL is the expected latency from the start of resume signaling to when the device is serviced, that is, when the first transfer is sent to the device. In its BOS descriptor, the device can report up to two preferred values (BESL and DeepBESL) for which it has been optimized. By using those values, the device can not only save more power but also ensure that it has enough buffering to guarantee that no data is lost on resume.
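A host reads the preferred values from the USB 2.0 Extension device capability in the BOS descriptor. The sketch below parses that 7-byte descriptor. The bit positions (bit 1 for LPM, bit 2 for BESL errata support, bits 3 and 4 marking the baseline and deep values as valid, bits 11:8 and 15:12 holding those values) reflect our understanding of the errata and should be verified against it; the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

DEVICE_CAPABILITY = 0x10      # bDescriptorType
USB2_EXTENSION = 0x02         # bDevCapabilityType
LPM_SUPPORTED = 1 << 1
BESL_SUPPORTED = 1 << 2       # device implements the BESL errata
BESL_BASELINE_VALID = 1 << 3
BESL_DEEP_VALID = 1 << 4


class Usb2LpmCapability(NamedTuple):
    lpm: bool
    besl_errata: bool
    preferred_besl: Optional[int]
    preferred_deep_besl: Optional[int]


def parse_usb2_extension(desc: bytes) -> Usb2LpmCapability:
    """Parse bLength, bDescriptorType, bDevCapabilityType and the 32-bit bmAttributes."""
    length, dtype, cap_type, attrs = struct.unpack_from("<BBBI", desc)
    if length != 7 or dtype != DEVICE_CAPABILITY or cap_type != USB2_EXTENSION:
        raise ValueError("not a USB 2.0 Extension descriptor")
    baseline = (attrs >> 8) & 0xF if attrs & BESL_BASELINE_VALID else None
    deep = (attrs >> 12) & 0xF if attrs & BESL_DEEP_VALID else None
    return Usb2LpmCapability(
        lpm=bool(attrs & LPM_SUPPORTED),
        besl_errata=bool(attrs & BESL_SUPPORTED),
        preferred_besl=baseline,
        preferred_deep_besl=deep,
    )
```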
The host is responsible for ensuring that the BESL value equals the total of the resume signaling duration and the other overheads associated with the resume process. Because the overhead is hard to determine, the LPM errata specifies a constant overhead of 50 microseconds. Effectively, then, the host drives resume signaling for the BESL duration minus 50 microseconds.
Errata compatibility risk
Note that BESL is not a new capability in addition to HIRD; it replaces the concept of HIRD. The same 4-bit values that represented a range of 50 us to 1.2 ms are redefined to represent a range of 50 us to 10 ms, and they are no longer on a linear scale. This means that hosts and devices that implement the errata are not guaranteed to be compatible with pre-errata hosts and devices. The rationale behind this decision was that the number of controllers and devices implementing LPM at that time was limited, and it was not worth maintaining two different concepts forever.
We do need to consider the implications of a mismatch between the controller and the device. Because pre-errata and post-errata hardware interpret the same field differently, the host can drive resume signaling for a period that differs from what the device expects. If the host drives resume signaling for longer than the device expects, there should be no negative impact, because the device does not track how long resume signaling lasts; it merely waits for the EOR (end of resume signaling) event. However, if the host drives resume signaling for a shorter duration than the device expects, there might be serious issues, because the shorter duration might not be enough for the device to resume correctly.
Software can mitigate those risks by choosing values that ensure the host’s resume signaling duration is greater than or equal to the device’s expected duration. In fact, the value 4 has an identical interpretation in pre-errata and post-errata hardware (350 microseconds of resume signaling). If the software detects a mismatch, the safer option is to always use this value.
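The mismatch rule can be checked mechanically. This sketch repeats two assumptions about the specifications as we understand them: pre-errata HIRD is 50 + 75n microseconds, and post-errata resume signaling is the BESL latency from the table below minus the 50-microsecond overhead. The function names are hypothetical. It confirms that 4 is the only value both generations interpret identically, and shows which direction of mismatch is unsafe for a given value.

```python
BESL_LATENCY_US = (125, 150, 200, 300, 400, 500, 1000, 2000,
                   3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000)  # assumed table


def resume_duration_us(value: int, post_errata: bool) -> int:
    """How a 4-bit HIRD/BESL field is interpreted as resume signaling time."""
    if post_errata:
        return BESL_LATENCY_US[value] - 50  # BESL minus the fixed overhead
    return 50 + 75 * value                 # pre-errata linear HIRD


def is_safe(value: int, host_post_errata: bool, device_post_errata: bool) -> bool:
    """Safe if the host signals resume for at least as long as the device expects."""
    host = resume_duration_us(value, host_post_errata)
    device = resume_duration_us(value, device_post_errata)
    return host >= device


def choose_value(preferred: int, host_post_errata: bool, device_post_errata: bool) -> int:
    """Use the preferred value unless the generations differ; then fall back to 4."""
    if host_post_errata != device_post_errata:
        return 4
    return preferred


identical = [v for v in range(16)
             if resume_duration_us(v, False) == resume_duration_us(v, True)]
print(identical)  # [4]
```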
Common Issues with LPM 2.0
Following are some of the common hardware issues we have seen:
- Race condition between L1 and L2 transitions
With hardware LPM, the L1 state is expected to be transparent to software. When host software issues a command to put the link into L2 while the link is in L1, the controller hardware is responsible for automatically bringing the link back to L0 and then initiating the L2 transition. Some controllers do not implement this correctly. The initial version of the xHCI specification was ambiguous about this requirement; a subsequent errata clarified it.
- Race condition between port reset and L1 transitions
Some controllers do not correctly handle port resets (initiated by host software) while the L1 entry or exit is in progress. The incorrect implementation leads to a mismatch between the software and the hardware state because the software assumes that the port has been reset.
- LPM with NYETs
Some controllers do not correctly handle the scenario in which the link enters L1 after the device responds with a NYET. As a result, some transfers repeatedly time out and the device becomes unusable.
- Incorrectly setting LPM support bit
Most USB 3.0 devices indicate L1 support in their descriptor even though they don’t support it. This might be because USB 3.0 devices are required to support LPM while operating in USB 2.0 mode.
- Remote Wake
Some devices do not correctly wake up the link from L1. This issue might occur because the device ties its remote wake capability from L1 to its remote wake capability from L2, which is indicated in the bmAttributes field of the device’s configuration descriptor. Note that for a device to always work correctly with hardware LPM, it is expected to always support remote wake from L1.
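The expected behavior in the first issue can be summarized in a small sketch: because L1 is transparent to software, a suspend request that arrives while the link is in L1 must be carried out by the controller as L1 to L0, followed by L0 to L2. The function is a hypothetical illustration of that ordering, not controller firmware.

```python
from typing import List


def transitions_for_suspend_request(current_state: str) -> List[str]:
    """Link states a controller should pass through when software requests L2."""
    if current_state == "L2":
        return []            # already suspended
    if current_state == "L1":
        # Software does not know about L1, so hardware first resumes the link to L0...
        return ["L0", "L2"]  # ...and only then starts the L2 transition.
    if current_state == "L0":
        return ["L2"]
    raise ValueError(f"unknown link state: {current_state}")


print(transitions_for_suspend_request("L1"))
```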
Microsoft USB driver stack behavior
Because of incorrect L1 implementations in hardware, the Microsoft-provided USB driver stack enables L1 only under a very limited set of conditions, so that it can achieve power gains without impacting the end-user experience.
The USB driver stack enables only hardware LPM; it does not enable software LPM. Because hubs do not support hardware LPM, LPM is enabled only for devices attached to root ports, not for devices downstream of hubs. The controller must satisfy these requirements for the driver stack to enable LPM for it:
- LPM is enabled only in the USB 3.0 driver stack, which is loaded for xHCI controllers. It is not enabled in the USB 2.0 driver stack, which is loaded for legacy controllers (EHCI, OHCI, UHCI).
- LPM is only enabled if the xHCI controller indicates support for hardware LPM.
- LPM is enabled only if the xHCI controller is not known to have one of the problems described in Common Issues with LPM 2.0. The Microsoft stack maintains a hard-coded list of such controllers, for which it disables LPM.
For LPM to be enabled for a device, Microsoft policy requires the device to satisfy one of these two conditions:
- The device is an internal device.
- The device is a post-errata device (one that implements the BESL errata).
If the device satisfies one of those conditions, LPM is enabled for it as long as the device is not explicitly known to have one of the problems described in Common Issues with LPM 2.0; the Microsoft stack maintains a hard-coded list of such devices, for which it disables LPM.
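The controller and device conditions above combine into a single eligibility check. This is a sketch of the policy as described in this post, not Microsoft's implementation; the `Controller` and `Device` types, their fields, and the `known_lpm_issue` flags (standing in for the hard-coded lists) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controller:
    is_xhci: bool                 # LPM is enabled only in the USB 3.0 (xHCI) stack
    supports_hardware_lpm: bool
    known_lpm_issue: bool         # stand-in for the hard-coded controller list


@dataclass(frozen=True)
class Device:
    on_root_port: bool            # hardware LPM does not work through hubs
    internal: bool
    post_errata: bool             # implements the BESL errata
    known_lpm_issue: bool         # stand-in for the hard-coded device list


def lpm_enabled(ctrl: Controller, dev: Device) -> bool:
    """Return True if the described policy would enable hardware LPM for this pair."""
    controller_ok = ctrl.is_xhci and ctrl.supports_hardware_lpm and not ctrl.known_lpm_issue
    device_ok = (dev.internal or dev.post_errata) and not dev.known_lpm_issue
    return controller_ok and dev.on_root_port and device_ok
```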
To ensure that devices implement LPM properly, LPM testing has been included in the USB-IF certification process. By enabling LPM only for post-errata devices, we hope to filter out most of the existing pre-errata devices. However, that restriction does not apply to internal devices, mainly because:
- OEMs have the ability to test an internal device with the specific controller and ensure that it correctly implements LPM.
- It is likely that the platform power can be optimized based on the LPM capability of the internal devices leading to greater power savings.
Hardware LPM parameters
As noted above, the Microsoft-provided USB driver stack supports only hardware LPM. When enabling LPM, the software can choose certain parameters in the xHCI controller that govern LPM behavior. These parameters are defined in sections 5.4.9 and 5.4.11 of the xHCI specification.
Here is the Microsoft policy for setting those parameters in Windows 8.1. Note that these parameters might change in future versions of Windows, and devices or controllers should not depend on particular parameter values.
- HardwareLpmEnable (HLE)
Determines whether hardware LPM is enabled for a given port. If the controller and device requirements described in Microsoft USB driver stack behavior are satisfied, this bit is set to 1; otherwise, it is set to 0.
- RemoteWakeEnable (RWE)
Determines whether remote wake from L1 is enabled. The Microsoft stack assumes that all devices support waking from L1, so it always sets the RWE bit unless the device is explicitly known not to support wake from L1.
- BestEffortServiceLatency (BESL)
Indicates the duration of resume signaling during host-initiated resume. If either the controller or the device does not support the BESL errata, the only safe value for BESL is 4 (400 microseconds), and the USB driver stack uses this default value. If both the device and the controller support the BESL errata and the device provides a preferred BESL value, the stack uses that value instead of the default value of 4.
- BestEffortServiceLatencyDeep (BESLDeep) and HostinitiatedResumeDurationMode (HIRDM)
The xHCI specification provides a mechanism through which host software can program an additional BESL value, known as BESLDeep, that is greater than the BESL value. The host first uses the BESLDeep value for the L1 transaction; if the device rejects that transaction, the host then attempts L1 with the BESL value. To enable this mechanism, HIRDM must be set to 1. If both the host and the device support the BESL errata and the device reports a preferred DeepBESL value in its BOS descriptor, the driver stack programs that value into BESLDeep and sets HIRDM to 1. Otherwise, both parameters are set to 0.
- L1Timeout
Indicates how long the link must remain idle before the host attempts to transition it to L1. The USB driver stack sets this value to 512 microseconds. This value can be overridden by using the following global registry key.
Value: L1 timeout in units of 256 microseconds
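Put together, the Windows 8.1 parameter policy described above can be sketched as one function. This illustrates the described behavior and is not Microsoft's code; `LpmParams`, `choose_lpm_params`, and their argument names are hypothetical, and the L1Timeout field is expressed in the registry's 256-microsecond units (512 microseconds becomes 2). As noted above, the real values may change in later Windows versions.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_BESL = 4              # only safe value unless both sides implement the BESL errata
DEFAULT_L1_TIMEOUT_US = 512
L1_TIMEOUT_UNIT_US = 256


@dataclass(frozen=True)
class LpmParams:
    hle: int
    rwe: int
    besl: int
    besl_deep: int
    hirdm: int
    l1_timeout_units: int     # units of 256 microseconds


def choose_lpm_params(lpm_allowed: bool, host_besl_errata: bool, device_besl_errata: bool,
                      device_known_no_l1_wake: bool, preferred_besl: Optional[int],
                      preferred_deep_besl: Optional[int],
                      l1_timeout_us: int = DEFAULT_L1_TIMEOUT_US) -> LpmParams:
    both_errata = host_besl_errata and device_besl_errata
    besl = preferred_besl if both_errata and preferred_besl is not None else DEFAULT_BESL
    if both_errata and preferred_deep_besl is not None:
        besl_deep, hirdm = preferred_deep_besl, 1   # try BESLDeep first, then BESL
    else:
        besl_deep, hirdm = 0, 0
    return LpmParams(
        hle=1 if lpm_allowed else 0,
        rwe=0 if device_known_no_l1_wake else 1,    # wake from L1 assumed unless known broken
        besl=besl,
        besl_deep=besl_deep,
        hirdm=hirdm,
        # Round up so the programmed timeout is never shorter than requested.
        l1_timeout_units=-(-l1_timeout_us // L1_TIMEOUT_UNIT_US),
    )
```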
LPM Testing
Here are some guidelines for testing LPM support in controllers. They assume that the controller supports the BESL errata.
- A device that does not support the BESL errata should be tested. To test such a device with Windows, the device must be declared as internal. To mark a device as internal in ACPI, define a _PLD method for the port to which the device is connected and set the _PLD.UserVisible field to 0. For details, see the ACPI specification (http://www.acpi.info/spec.htm).
- Devices that support the BESL errata should be tested.
- Devices that return NYET should be tested.
- Test the scenario where the link is in L1 and then an L2 transition is initiated.
- Test race conditions where the port is reset while a transition to L1 or from L1 is in progress.
- Test devices that provide preferred BESL value but do not provide DeepBESL.
- Test devices that provide both preferred BESL and DeepBESL.
- Test with various values of L1Timeout by using the registry key override.
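One way to organize the controller guidelines above is as a test matrix. The sketch below simply enumerates combinations; the profile and scenario labels paraphrase the bullets, and the L1Timeout override values (in the registry's 256-microsecond units) are hypothetical examples rather than recommended settings.

```python
from itertools import product

DEVICE_PROFILES = (
    "pre-errata device declared internal (_PLD.UserVisible = 0)",
    "post-errata device, no preferred BESL",
    "post-errata device, preferred BESL only",
    "post-errata device, preferred BESL and DeepBESL",
    "device that returns NYET",
)
SCENARIOS = (
    "normal traffic with L1 entry and exit",
    "L2 requested while the link is in L1",
    "port reset during L1 entry",
    "port reset during L1 exit",
)
L1_TIMEOUT_OVERRIDES = (1, 2, 4, 16)  # hypothetical registry values, 256 us units


def controller_test_matrix():
    """Every (device profile, scenario, L1Timeout override) combination."""
    return list(product(DEVICE_PROFILES, SCENARIOS, L1_TIMEOUT_OVERRIDES))


for case in controller_test_matrix()[:3]:
    print(case)
```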
Here are some guidelines for testing LPM support in devices. They assume that the device supports the BESL errata.
- Test with a controller that supports hardware LPM and that does not have LPM disabled because of known issues.
- Test with various values of L1Timeout by using the registry key override.