Misinterpretation of standard probably causing USB disconnects on resume in Linux

[Update] 23 August 15:17 GMT
In an update to her Google+ post, Sharp noted: “Looks like this is an xHCI specific issue, and probably not the cause of the USB device disconnects under EHCI. To everyone who commented with other USB issues (none of which really sounded related), please email the linux-usb mailing list with a description of your issue.”

Original Story
According to a new finding by Sarah Sharp, USB disconnects on resume in Linux may have been caused all along by a misinterpretation of the USB 2.0 standard, rather than by cheap and buggy devices.

According to Sharp, the USB core, not the devices themselves, is to blame for the disconnections, because the core doesn’t wait long enough for devices to transition from the resume state to U0 (the fully active link state). The USB 2.0 standard states that the system software handling USB must provide a 10 ms resume recovery time (TRSMRCY), during which it must not attempt to access any device connected to the resumed bus segment.
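The rule can be sketched as a toy model. This is not kernel code: the function resume_port and its parameters are hypothetical, and it simply assumes that a device accessed before it has finished resuming ends up disconnected, as the article describes.

```python
# Toy model (hypothetical, not Linux kernel code): after resume signaling,
# the host waits recovery_ms before its first access to the device.
TRSMRCY_MS = 10  # USB 2.0 resume recovery time, in milliseconds


def resume_port(device_resume_ms: float, recovery_ms: float = TRSMRCY_MS) -> str:
    """Return the state the host observes after resuming a port.

    device_resume_ms: time this (hypothetical) device actually needs to resume
    recovery_ms: time the host waits before touching the device
    """
    if device_resume_ms <= recovery_ms:
        # The device finished resuming before the host's first access.
        return "connected"
    # The host accessed the port too early: transfer errors or a disconnect.
    return "disconnected"


print(resume_port(4))                   # a device that resumes within 10 ms
print(resume_port(12))                  # a slow device under the 10 ms rule
print(resume_port(12, recovery_ms=20))  # the same device with a 20 ms wait
```

The model makes the disagreement explicit: the spec fixes recovery_ms on the host side, but nothing in the host code controls device_resume_ms.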

Sharp believes, however, that hardware designers read this TRSMRCY value as the minimum time the host must wait, not the maximum time a device may take, which means a USB device may take longer than 10 ms to resume. As she notes in her Google+ post, if the USB core accesses the port while the device is still resuming, the device will disconnect.

“If the USB core attempts to access those ports while the device is still coming out of resume, such as issuing transfers to the device, or resetting the port, the device will disconnect, or transfer errors will occur. This causes the USB core to mark the device as disconnected.”

Sharp tested more than 225 wakeup events and found that 163 transitions completed in under 1 microsecond, 47 in under 10 ms, and 17 took longer than 10 ms. With a 10 ms TRSMRCY value, then, the USB core will sometimes access a device that is still resuming, which leads to disconnects.
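As a quick check on the figures, the three buckets can be totaled in a few lines. The counts come from the paragraph above; the bucket labels are just descriptive names.

```python
# Wakeup counts reported above, grouped into the article's three buckets.
buckets = {
    "under 1 microsecond": 163,
    "under 10 ms": 47,
    "over 10 ms": 17,
}

total = sum(buckets.values())  # all measured wakeup events
late = buckets["over 10 ms"]   # resumes that outlast TRSMRCY

print(f"{total} wakeups, {late} exceeded 10 ms ({late / total:.1%})")
# prints "227 wakeups, 17 exceeded 10 ms (7.5%)"
```

So roughly one wakeup in thirteen outlasted the 10 ms window in these measurements.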

Sharp has managed to reproduce the bug under ChromeOS, which is probably one of the most aggressive operating systems when it comes to USB power management. A temporary fix raises TRSMRCY to 20 ms. Further technical details about the bug can be found in her mailing list announcement.
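Why a 20 ms wait helps, and where it stops helping, can be illustrated with a small sweep. The per-device resume times below are invented for illustration; the article does not publish individual measurements or say how many of the 17 slow wakeups exceeded 20 ms.

```python
# Hypothetical per-device resume times in milliseconds (invented for
# illustration; not Sharp's measurements).
resume_times_ms = [0.001, 3.0, 9.5, 12.0, 15.0, 18.0, 25.0]


def count_early_accesses(times_ms, recovery_ms):
    """Count resumes the host would interrupt by accessing the port too early."""
    return sum(1 for t in times_ms if t > recovery_ms)


for recovery_ms in (10, 20):
    n = count_early_accesses(resume_times_ms, recovery_ms)
    print(f"{recovery_ms} ms wait: {n} early accesses")
```

With these made-up numbers, a 10 ms wait interrupts four resumes and a 20 ms wait interrupts one: a longer recovery time only covers devices that resume within it, which is why the change is described as a temporary fix.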

  • queeky

    While there’s obviously misinterpretation going on somewhere, reading the spec sure makes it sound like it’s on the hardware side. If there is a defined wait time (a minimum) before a host can communicate with a device, isn’t that just another way of saying that it is the maximum time the device has to get ready for communication? How else can you read the spec and have it make any sense? Sounds to me like the hardware vendors are just trying to cover their poor workmanship here.

    That said, it is a bit naive for Linux to assume that all devices will be so well behaved. They should have based the default value on a real-world sampling of a variety of devices. 20ms sounds like it would help a lot.

  • bugmenotcom

    “It turns out that this TRSMRCY value is the minimum and not the maximum.”

    That’s not how technical specifications work. BOTH sides are required to obey a technical specification. This makes the value both a minimum for one side and a maximum for the other side. The issue here is that when USB devices go into sleep/power_saver mode they need time to “wake up”.

    (1) The computer is required to wait a minimum of ten milliseconds before initiating any work.
    (2) The USB device is allowed a maximum of ten milliseconds to prepare for work.
    (3) After ten milliseconds the computer is allowed to begin work.
    (4) After ten milliseconds the USB device is required to be ready for work.

    Notice that before ten milliseconds the computer is required to give the device the promised time to wake up, and the device is receiving that promised time. After ten milliseconds the positions exactly reverse. The device is required to keep its promise that it will be ready, and the computer is receiving a promise that the device will be ready.

    Some USB device manufacturers are clearly violating the specification. The USB standards organization needs to revoke their legal permission to use the USB trademarked logo until they pass thorough testing that all new devices comply with the standard. Unfortunately so many violating devices have already been sold that operating systems need to be patched to tolerate slow devices.

  • svartalf

    There’s no maximum really specified in the spec, Ravi. What’s specified is the minimum, which is actually when the device is REQUIRED to respond. Now, Windows and OS X have provided a larger margin for devices to respond, but the minimum is what the spec actually requires. Should Linux adjust it a bit like the others? Probably. But to say it’s misinterpreting the spec is bogus.