[ltp] T60p overheating problem (fixed)

Henrique de Moraes Holschuh linux-thinkpad@linux-thinkpad.org
Sat, 27 Mar 2010 12:53:28 -0300


On Fri, 26 Mar 2010, Richard wrote:
> Anyway, if I run  burnP6 & burnP6 &, I used to be able to reproduce
> the lockup within about 10 seconds (/proc/acpi/ibm/thermal showed 100
> degrees as the critical point just before lockup).

Yeah, the ACPI tables of some thinkpads have a bug in the passive (alarm)
trip point.  When that trip point is correct, it should in theory let the
OS know that it has to stop hammering the CPU *NOW* or it will overheat.

I believe we have a workaround for that bug in Linux, though.

But even with that working fine, the CPU heats up too fast for the OS to
react when the thermal interface between the CPU and heatsink is defective.
It only helps when the problem is not severe (e.g. a bad fan).
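If you want to see which trip points your firmware actually advertises (including the passive one), you can read them from sysfs. This is a minimal sketch that assumes the standard Linux thermal sysfs layout: /sys/class/thermal/thermal_zoneN/ directories with trip_point_K_type and trip_point_K_temp files, where temperatures are in millidegrees Celsius and trip points are numbered contiguously from 0. The function names are hypothetical. The script only reports values and does not work around anything.

```python
import glob
import os


def read_trip_points(zone_dir):
    """Return [(trip_type, temp_celsius), ...] for one thermal zone.

    Assumes the standard sysfs layout: trip_point_K_type holds e.g.
    "passive" or "critical", trip_point_K_temp holds millidegrees C,
    and K counts up from 0 without gaps.
    """
    trips = []
    k = 0
    while True:
        type_path = os.path.join(zone_dir, f"trip_point_{k}_type")
        temp_path = os.path.join(zone_dir, f"trip_point_{k}_temp")
        if not (os.path.exists(type_path) and os.path.exists(temp_path)):
            break  # no more trip points in this zone
        with open(type_path) as f:
            trip_type = f.read().strip()
        with open(temp_path) as f:
            millideg = int(f.read().strip())
        trips.append((trip_type, millideg / 1000.0))
        k += 1
    return trips


def main():
    for zone in sorted(glob.glob("/sys/class/thermal/thermal_zone*")):
        try:
            with open(os.path.join(zone, "type")) as f:
                zone_type = f.read().strip()
            trips = read_trip_points(zone)
        except (OSError, ValueError):
            # Some zones refuse reads or return odd values; skip them.
            continue
        print(f"{zone} ({zone_type})")
        for trip_type, temp_c in trips:
            print(f"  {trip_type:>8}: {temp_c:.1f} C")


if __name__ == "__main__":
    main()
```

A passive trip point that is missing, or that sits at or above the critical one, is worth a closer look, but the script does not decide which values are "buggy" for a given model.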

> Having disassembled the machine, and put some arctic silver between
> CPU and fan (there was practically no goo there before), it's now

Good choice.  AS5 is one of the best thermal compounds (if not the best)
for most thinkpads, provided you apply it correctly: it cures *really* fast
in a laptop (many cold-hot cycles per day), and its performance is
extremely good.

It is important to note that *some* thinkpads have a large gap between the
heatsink and a chip (north-bridge, GPU... I have never heard of that on the
CPU), and in that case you probably can't do it right with a thermal paste
like AS5; you need either a thermal pad or to mess with the heatsink to
close the gap :-(

> * What's the critical temp? Aren't Intel CPUs supposed to underclock
> themselves automatically in order to cool down, and then recover the
> OS where it was?

80°C to 120°C, depending on the CPU model.  Anything since the Pentium M
will throttle its clock, but that is simply not enough if heat builds up
too fast (which is what happens when the heatsink is effectively detached
from the CPU): the temperature will still climb to the second, emergency
threshold, and the CPU will shut down entirely.
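To illustrate why throttling cannot save you when the heatsink is effectively detached, here is a toy lumped thermal model: one heat capacity, one thermal conductance to ambient, full power below a throttle threshold, reduced power above it, and a shutdown threshold. This is a sketch for intuition only. Every number in it (34 W, 40% throttled power, 5 J/K, 25°C ambient, 90°C throttle, 105°C shutdown) is made up and does not describe any real CPU or ThinkPad, and the function name is hypothetical.

```python
def simulate(conductance_w_per_k, seconds=600.0, dt=0.1,
             power_w=34.0, throttled_fraction=0.4,
             heat_capacity_j_per_k=5.0, ambient_c=25.0,
             throttle_c=90.0, shutdown_c=105.0):
    """Toy model with hypothetical numbers, integrated with Euler steps.

    conductance_w_per_k stands for how well heat leaves the CPU
    (good paste + working fan = high, no paste = low).
    Returns (final_temp_c, throttled_ever, shut_down).
    """
    temp = ambient_c
    throttled_ever = False
    t = 0.0
    while t < seconds:
        throttling = temp >= throttle_c
        throttled_ever = throttled_ever or throttling
        power = power_w * (throttled_fraction if throttling else 1.0)
        # Heat generated minus heat conducted away to ambient.
        d_temp = (power - conductance_w_per_k * (temp - ambient_c)) / heat_capacity_j_per_k
        temp += d_temp * dt
        if temp >= shutdown_c:
            return temp, throttled_ever, True  # emergency shutdown
        t += dt
    return temp, throttled_ever, False


if __name__ == "__main__":
    for g in (1.0, 0.3, 0.1):
        print(g, simulate(g))
```

With these made-up numbers, a conductance of 1.0 W/K settles around 59°C without ever throttling, 0.3 W/K (think weak fan) throttles and hovers around the 90°C threshold, and 0.1 W/K (think no paste at all) reaches the shutdown threshold despite throttling.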

I am *not* sure the firmware powers down the rest of the box when the CPU
decides to shut down due to heat.  If it doesn't, holding the power button
down for a long time will request the EC to power off the entire box.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh