[ltp] Re: [ibm-acpi-devel] Recently identified ThinkPad mysteriouses

Henrique de Moraes Holschuh linux-thinkpad@linux-thinkpad.org
Thu, 22 Nov 2007 12:53:30 -0200


On Thu, 22 Nov 2007, Thomas Renninger wrote:
> I'd like to point you to some things I found out on ThinkPads the last
> weeks:
> 
>  - IBM T41p shuts down, powersave, Temperature state changed to critical
>    https://bugzilla.novell.com/show_bug.cgi?id=333043
> 
>    This affects a lot of machines (T41(p), T42(p), T43(p), R40)
>    I expect the real culprit is a confused EC (one machine had the
>    problem, the other one not).
>    Anyway, it seems ACPI notifies had higher priority (or did not get
>    scheduled away) in former kernels. Therefore the BIOS could still
>    avoid a critical thermal shutdown through lowering CPU frequency
>    (_PPC) interface on older kernels, but something seems to have
>    changed there...


Argh.  Well, thinkpad-acpi fortunately has absolutely nothing to add to the
picture (unless the mysterious HKEY 0x6011/0x6012 events are actually
thermal warning events).  I kept well away from the standard thermal ACPI
interface, since there didn't seem to be a reason to muck with it.  Now I am
not so sure anymore.

What I do know: the T41 and T42 BIOSes and EC firmware *are the same*, down
to the bit level.  So if there is a difference between the T41 and T42
behaviour, it is very likely either a hardware fault, or one of the two is
NOT using the latest BIOS.  If they are indeed using the latest BIOS, it is
a BIOS bug (the BIOS does know if it is in a T40, T41 or T42, and could act
in a different way) in its SMI management routines.  The only hope is to beg
Lenovo for a fix.

Thermal problems with the T4x are common, and if they developed after the
machine was used for a few years (as opposed to a bad manufacturing issue
that it had since day one), they are usually user-serviceable.  Remove the
entire heatsink assembly, and very carefully and very thoroughly replace
any existing or missing thermal glue with very high grade thermal
paste, the type you'd use for serious overclocking (Arctic Silver 5
or better).  Of course, check to make sure the fan is working properly.
This is no milk run, and it takes a few hours and a lot of patience if one
wants to do it perfectly.

Unless you are in a really hot place (35C or above), in my experience a T4x
notebook with the entire heatsink system working at top condition does NOT
reach critical temperatures, even while working at full CPU load.  But it is
really easy for a T4x to have its heatsink system far below top
condition :(

There is one thing I can help with.  ThinkPad ACPI knows how to enable a
"ludicrous speed" fan mode, aka "disengaged" mode or "full speed" mode,
which is typically at least twice as fast as the highest fan mode the EC
likes to trigger, even at thermal emergencies.  Give me a trigger inside the
kernel, and I will kick it on during thermal emergencies, regardless of
fan_control status.   Userspace already has control over this if
fan_control=1 is specified on module load (just disable PWM through the
hwmon interface, and it will kick the fan into full speed mode).
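
A minimal userspace sketch of the "disable PWM" step described above, in
Python.  It assumes the module was loaded with fan_control=1, that the
caller has write permission, and that the hwmon attributes live under
/sys/devices/platform/thinkpad_hwmon (the directory is an assumption and
varies between kernel versions, so it is a parameter).  The helper names
are made up for illustration; the values 0 (PWM offline, fan at full
speed) and 2 (EC automatic mode) are the pwm1_enable values as documented
for thinkpad-acpi, so check the documentation shipped with your driver
version.

```python
from pathlib import Path

# Assumed location of the thinkpad-acpi hwmon attributes; adjust for your
# kernel version.
DEFAULT_HWMON_DIR = Path("/sys/devices/platform/thinkpad_hwmon")

PWM_OFFLINE = "0"  # PWM disabled: fan goes to full-speed ("disengaged") mode
PWM_AUTO = "2"     # hand fan control back to the EC


def set_pwm_enable(value, hwmon_dir=DEFAULT_HWMON_DIR):
    """Write a mode to pwm1_enable and return the path that was written."""
    attr = Path(hwmon_dir) / "pwm1_enable"
    attr.write_text(value + "\n")
    return attr


def fan_full_speed(hwmon_dir=DEFAULT_HWMON_DIR):
    """Hypothetical helper: kick the fan into full-speed mode."""
    return set_pwm_enable(PWM_OFFLINE, hwmon_dir)


def fan_auto(hwmon_dir=DEFAULT_HWMON_DIR):
    """Hypothetical helper: return the fan to EC automatic control."""
    return set_pwm_enable(PWM_AUTO, hwmon_dir)
```

Calling fan_full_speed() as root during a thermal scare and fan_auto()
afterwards is the intended use; leaving the fan disengaged permanently is
not.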

But really, if you regularly need the full speed fan mode, it either
means your ThinkPad is in need of repair to the heatsink system, or
it means you are in such a hot climate you'd better be playing at the beach
instead of using a laptop.

>  - Latest Lenovo ThinkPads do not like ACPI EC writes for brightness
>    switching (s2ram broken)

That's because you must not do EC writes to change brightness on these boxes
:-)  The latest thinkpad-acpi at ibm-acpi.sf.net should get this right
already, and use only the UCMS ACPI method to modify brightness levels on
Lenovo boxes.
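
For userspace, the practical consequence is to go through the driver
instead of poking the EC.  The Python sketch below assumes the
thinkpad-acpi procfs file /proc/acpi/ibm/brightness and its
"level <n>" command; the 0-7 range is an assumption taken from the
levels discussed in this thread, and the function names are made up for
illustration.

```python
from pathlib import Path

# Assumed procfs file exported by thinkpad-acpi.
BRIGHTNESS_PROC = Path("/proc/acpi/ibm/brightness")


def brightness_command(level, max_level=7):
    """Build the 'level <n>' command, rejecting out-of-range levels."""
    if not 0 <= level <= max_level:
        raise ValueError(f"brightness level must be 0..{max_level}")
    return f"level {level}\n"


def set_brightness(level, proc_file=BRIGHTNESS_PROC, max_level=7):
    """Ask the driver to change brightness; never writes to the EC itself."""
    Path(proc_file).write_text(brightness_command(level, max_level))
```

Rejecting out-of-range values up front keeps a buggy caller from ever
handing the driver a level it was not designed for.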

>    I've been told it only happens when brightness is set above 7.
>    Don't know, but it seems to be the EC writes > 7 that keep the
>    machine from waking up again after a s2ram.

Well, latest X.org really, really dislikes anyone messing with the backlight
brightness behind its back, so make sure it is not related to that problem
as well.

I am starting to talk to the X.org people to see how I can be told by the X
server that it is active, and completely switch thinkpad-acpi to "userspace
is in control" backlight mode...

>  - Weird USB - EHCI IRQ problems on very latest Lenovo models
> 
>    Unhandled IRQ messages. Looks like an IRQ (from camera?) gets routed
>    to EHCI pin also? Don't know the details and I am also not very
>    familiar with this..., Oliver might be able to point you to bug
>    reports, AFAIK there also exist kernel.org bugs.
>    This is not solved yet, AFAIK.
>    If anyone knows more about this problem, that would be great...

Indeed.  And if we do get precise enough descriptions of the issue, it is
something that we should try to find a way to forward to Lenovo for a fix.

>    Attaching a USB device breaks USB and throws a calltrace
>    https://bugzilla.novell.com/show_bug.cgi?id=325601
> 
>    As this seems to be a very (recent Lenovo) ThinkPad specific problem
>    -> Only happens on latest ThinkPads, but back to at least
>    kernel 2.6.16 until latest mainline... I hope to get some
>    feedback from Lenovo ThinkPad users also seeing this.

Same comment as above.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh