[Update] 23 August 15:17 GMT
In an update to her Google+ post, Sharp noted: “Looks like this is an xHCI specific issue, and probably not the cause of the USB device disconnects under EHCI. To everyone who commented with other USB issues (none of which really sounded related), please email the linux-usb mailing list with a description of your issue.”
According to a new finding by Sarah Sharp, a misinterpretation of the USB 2.0 standard, rather than cheap and buggy devices, may have been the culprit behind USB disconnects on resume in Linux all along.
According to Sharp, the USB core, not the devices themselves, is to blame for the disconnections: the core doesn’t wait long enough for devices to transition from a “resume state to U0”. The USB 2.0 standard states that system software handling USB must provide a 10ms resume recovery time (TRSMRCY), during which it should not attempt to access any device connected to that bus segment.
Sharp believes, however, that from a hardware engineer’s perspective this TRSMRCY value is a minimum rather than a maximum, which means a USB device may take longer than 10ms to resume. She notes in her Google+ post that if the USB core accesses the port while the device is still resuming, the device will disconnect:
“If the USB core attempts to access those ports while the device is still coming out of resume, such as issuing transfers to the device, or resetting the port, the device will disconnect, or transfer errors will occur. This causes the USB core to mark the device as disconnected.”
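To make the failure mode concrete, here is a minimal Python simulation; it is not kernel code. `SimulatedDevice`, `resume_duration_ms`, and `host_resume` are hypothetical names invented for illustration. The model simply assumes, as Sharp describes, that a device touched before it has finished resuming drops off the bus, and that the host waits a fixed recovery window (TRSMRCY) before its first access.

```python
from dataclasses import dataclass

TRSMRCY_MS = 10  # resume recovery time given in the USB 2.0 standard


@dataclass
class SimulatedDevice:
    """Hypothetical device model: needs resume_duration_ms to finish resuming."""
    resume_duration_ms: float
    connected: bool = True

    def access(self, elapsed_ms: float) -> bool:
        """Host touches the device elapsed_ms after resume signalling ends.

        If the device is still resuming, it disconnects, mirroring the
        behaviour Sharp describes.
        """
        if elapsed_ms < self.resume_duration_ms:
            self.connected = False
        return self.connected


def host_resume(device: SimulatedDevice, recovery_ms: float = TRSMRCY_MS) -> bool:
    """Hypothetical host-side resume: wait recovery_ms, then access the device.

    Time is simulated rather than slept, so the example runs instantly.
    Returns True if the device is still connected after the first access.
    """
    return device.access(elapsed_ms=recovery_ms)


if __name__ == "__main__":
    fast = SimulatedDevice(resume_duration_ms=0.001)  # resumes almost instantly
    slow = SimulatedDevice(resume_duration_ms=15)     # needs longer than 10ms
    print(host_resume(fast))  # True
    print(host_resume(slow))  # False: accessed mid-resume, so it disconnected
```

The only thing that decides the outcome in this model is whether the host's wait is at least as long as the device's real resume time, which is exactly the minimum-versus-maximum question Sharp raises.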
Sharp tested over 225 wakeup events and found that 163 transitions completed in less than 1 microsecond, 47 in under 10ms, and 17 took longer than 10ms. With a 10ms TRSMRCY, the USB core therefore tries to access those slower devices while they are still resuming, which leads to disconnects.
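The share of slow transitions follows directly from the three counts Sharp reported. The short Python calculation below assumes the three buckets do not overlap, so the total is simply their sum (227, consistent with “over 225”); the bucket labels are paraphrased from the article.

```python
# Counts reported by Sharp; buckets assumed to be non-overlapping.
buckets = {
    "under 1 microsecond": 163,
    "under 10ms": 47,
    "over 10ms": 17,
}

total = sum(buckets.values())
too_slow = buckets["over 10ms"]

# Transitions still resuming when a 10ms TRSMRCY window expires.
share = too_slow / total
print(f"{too_slow} of {total} transitions ({share:.1%}) exceeded 10ms")
# prints "17 of 227 transitions (7.5%) exceeded 10ms"
```

Under that assumption, roughly 7.5% of the measured wakeups would have been accessed before the device had finished resuming.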
Sharp has reproduced the bug under ChromeOS, which is probably one of the most aggressive operating systems when it comes to USB power management. A temporary fix raises the TRSMRCY value to 20ms. Further technical details about the bug can be found in her mailing list announcement.
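The temporary fix only lengthens the window the host waits before touching the device. Its effect can be sketched in Python by counting how many wakeups would still be mid-resume when the window expires. `early_accesses` is a hypothetical helper, and the durations in `sample_ms` are invented placeholders for illustration, not Sharp’s measurements.

```python
from typing import Iterable


def early_accesses(resume_times_ms: Iterable[float], recovery_ms: float) -> int:
    """Count wakeups where the device is still resuming when the host,
    having waited recovery_ms, makes its first access."""
    return sum(1 for t in resume_times_ms if t > recovery_ms)


# Hypothetical resume durations in milliseconds, for illustration only.
sample_ms = [0.0005, 0.0008, 3.2, 8.9, 12.5, 18.0, 26.0]

for window in (10, 20):  # the standard's TRSMRCY vs. the temporary 20ms value
    print(f"TRSMRCY={window}ms: {early_accesses(sample_ms, window)} early accesses")
```

With these placeholder values, the 10ms window yields 3 early accesses and the 20ms window yields 1. Raising the window trades a slightly slower resume for fewer early accesses, but in this model any device that needs more than 20ms would still be affected.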