[ltp] APM suspend with 2.6 kernel [long]

Charles Lepple linux-thinkpad@linux-thinkpad.org
Fri, 2 Jan 2004 21:26:49 -0500


I have seen a number of posts from people who could successfully suspend their 
ThinkPad under 2.4.x but could not with 2.6.x-test kernels. I don't have a 
one-size-fits-all solution, but this email may help folks who are willing to 
probe deeper into the problem. I did my debugging using Debian unstable on a 
ThinkPad 770 (type 9548-30U).

The exact symptom (discussed in depth on LKML this past fall) is that the 
suspend LED would start blinking after a suspend was requested (via the lid 
switch, the power/hibernate button, or the APM control program), but the 
machine would not fully suspend, and would return ~30 seconds later with the 
kernel message "apm: suspend: Unable to enter requested state".

One of the first things I did was create the minimal kernel configuration that 
would still let me boot a 2.6 kernel. This meant turning off all unnecessary 
drivers, including PCMCIA sockets, parallel and serial ports, CD-ROM, etc. It 
is much easier to add these back in later once you have figured out the 
extent of the problem.

Once I did that, I started doing a binary search to find the latest of the 
2.5.x kernels that allowed suspending. For those unfamiliar with the "binary 
search" concept (as applied to kernel troubleshooting), you need to first 
ensure that you have good boundary conditions. Find an early kernel that 
works (say, the last 2.4 kernel before the 2.5 fork) and the latest that 
doesn't, and make a checklist of the versions in between. Pick the one in the 
middle (e.g. 2.5.35) and see if the problem can be reproduced in that 
version. If so, repeat using the version midway between the last version that 
works and the newly-discovered first version that breaks. If not, that 
version becomes the new last-known-good one, and you repeat using the version 
midway between it and the first version known to break.
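
The search described above can be sketched in a few lines of Python. This is 
only an illustration, not the procedure I actually scripted: the function 
name first_bad_version and the is_bad callback are made up for this example, 
and in practice "is_bad" means building, booting and trying to suspend that 
kernel by hand. It also assumes that every version after the first broken 
one is broken too.

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str],
                      is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumptions (the "boundary conditions"): versions is ordered oldest
    to newest, versions[0] is known to work, versions[-1] is known to
    fail, and nothing works again once the problem has appeared.
    """
    good = 0                   # index of the last version known to work
    bad = len(versions) - 1    # index of the first version known to fail
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid          # problem reproduced: look at older kernels
        else:
            good = mid         # still works: look at newer kernels
    return versions[bad]
```

A real is_bad could simply prompt for "did suspend fail?" and record the 
answer, which also gives you a log of every step in case you have to 
backtrack.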

One of the "gotchas" with this approach is that if you use the config file 
generated by a newer kernel's menuconfig with an older kernel (e.g. you tried 
2.5.1 and 2.5.31, and now want to test 2.5.16), you may uncover differences 
in the configuration system that will mess up the binary search algorithm 
(which isn't especially fault-tolerant: if you mess up a test, you may find 
yourself backtracking a long way). For configuring, the best defense is to 
only copy over .config files from older kernels, and run 'make oldconfig' to 
see if any new options were added. In general, it is best to document all of 
your debugging steps in detail (preferably on another machine, in case you 
do accidentally do something nasty to the kernel).
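
As a rough sketch of the "only copy from older kernels" rule: given the 
versions you already have a saved .config for, pick the newest one that is 
not newer than the kernel you are about to test. The helper names and the 
directory layout (one linux-<version> tree per kernel) are hypothetical, and 
the version parsing assumes plain dotted numbers such as 2.5.16, so it would 
need extending for names like 2.6.0-test11.

```python
from typing import List, Optional, Sequence, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.5.16' into (2, 5, 16) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_config_source(saved: Sequence[str], target: str) -> Optional[str]:
    """Newest already-configured version that is not newer than target."""
    older = [v for v in saved if version_key(v) <= version_key(target)]
    return max(older, key=version_key) if older else None


def oldconfig_steps(source: str, target: str) -> List[List[str]]:
    """Commands to run by hand; the linux-<version> paths are hypothetical."""
    return [
        ["cp", "linux-%s/.config" % source, "linux-%s/.config" % target],
        ["make", "-C", "linux-%s" % target, "oldconfig"],
    ]
```

With the example above (2.5.1 and 2.5.31 already tested, 2.5.16 next), 
pick_config_source(["2.5.1", "2.5.31"], "2.5.16") picks 2.5.1, and 
'make oldconfig' then asks only about options that are new in 2.5.16.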

In this case, I found out that it was not enough to turn off all standard 
drivers. Under "General Setup", there is a "Remove kernel features (for 
embedded systems)" option which I enabled. Ignore the submenu which it 
provides, and head on over to the "Device drivers" -> "Input device support" 
menu. With 2.6.0-testX and later kernels, you should see an option for "i8042 
PC keyboard controller". Make this a module. Then, running rmmod on the i8042 
driver before suspend and modprobe on resume will make things work. The only 
tricky part here is that you may hose things if your resume script doesn't 
work: without the i8042 module you have no keyboard at all, and not even the 
Magic SysRq keys will be active.
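
A minimal sketch of such a suspend/resume hook follows. It assumes your APM 
daemon runs a script with the event name ("suspend" or "resume") as its 
first argument; how the script gets hooked in depends on your distribution 
and is not covered here. The function names are made up for this example, 
and rmmod/modprobe need root. The "run" parameter exists only so the logic 
can be tried without touching any modules.

```python
import subprocess
import sys
from typing import Callable, List

MODULE = "i8042"  # must be built as a module, not compiled in


def commands_for(event: str) -> List[List[str]]:
    """Map an APM event name to the module commands to run."""
    if event == "suspend":
        return [["rmmod", MODULE]]      # unload before suspending
    if event == "resume":
        return [["modprobe", MODULE]]   # reload so the keyboard returns
    return []                           # ignore every other event


def handle_event(event: str,
                 run: Callable[[List[str]], object] = subprocess.check_call
                 ) -> None:
    """Run the commands for one event; raises if a command fails."""
    for argv in commands_for(event):
        run(argv)


if __name__ == "__main__":
    # Assumption: the event name is passed as the first argument.
    handle_event(sys.argv[1] if len(sys.argv) > 1 else "")
```

Calling handle_event("suspend", run=print) shows what would be executed. 
Test the resume half over ssh or a serial console first, since a failure 
there leaves you without a keyboard.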

If you have a TP600, I can't guarantee that you are seeing the same problem 
that I saw on my TP770. But at least you can try the above troubleshooting 
procedure to narrow things down.

For the kernel-hacking types, the good news is that the 2.6.1-rc1 patch 
includes the framework to allow the i8042 module to react to power management 
events. This means that the module would stay loaded (or could be compiled in 
directly), and would simply stop the keyboard polling timer when a suspend 
event arrives. I would rather not volunteer to code this until I have access 
to a faster machine for kernel compiles, though. And I'm not entirely sure 
why the polling timer is needed (everything should be handled via interrupts, 
as it was in 2.4), but I'll leave it to someone else to argue that on the 
LKML.

Hope that clarifies a few things.

-- 
Charles Lepple
http://www.ghz.cc/charles/