[ltp] Thinkpad X31 - Intriguing Hard Freeze - Semi-Random but Reproducible

Richard Neill linux-thinkpad@linux-thinkpad.org
Mon, 09 Apr 2007 01:03:56 +0100


Peter Heatwole wrote:

> In short, the computer demonstrates random hard lockups. (The computer 
> must be powered down and restarted.) The most consistent way I can 
> reproduce this is to play a high definition video in mplayer. (For 
> example the 720p trailer for Unreal Tournament 2007.) Videos cause the 
> computer to lock up in Linux, Windows 2000 (with and without SP4), 
> and Windows XP. (Note it does not always crash the first run through, 
> nor in the same spot of the movie.)

Given that the crash occurs on multiple OSes, I'd say you almost certainly 
have a hardware problem.
> 
> However, the computer has also locked up twice in plain linux consoles 
> (bash, no framebuffer) while compiling some large packages, and also 
> consistently in Memtest86 3.3 (but not with memtest86+ 1.70). Both 
> memtest86 and memtest86+ were run via bootable CD.
> 
> It gives every appearance of the randomness of thermal issues, but 
> thermal conditions seem nominal (never reaching above 66 degrees 
> celsius, the fan seems to work well).
> 
> I have tested the memory for 15+ hours in memtest86+ (note memtest86 
> freezes). I also tried replacing the memory with known good sticks from 
> my main computer (both PC2100).

I'd have said this looks like faulty RAM, but you seem to have been 
pretty thorough. That said, memtest can sometimes take up to a day to 
find an error - this has happened to me once.

> 
> I did find some xorg.conf settings on this list that largely alleviate 
> the problem when playing the hi-def trailer. If all I do is play the 
> video, everything runs just great.  However, if I simultaneously play 
> the trailer _and_ compile glibc or sync portage (intensive operation, 
> downloads a bunch of little files and updates the local Gentoo package 
> database) it freezes the machine.  Something about this added stress 
> brings down the computer. Also, I repeat that I have had hard lockups in 
> plain consoles, no graphics involved.

OK...if you have a hardware bug, what hardware would high loads stress?
My thoughts are:

1)  CPU (temp)  - but you already tested that.
2)  Power supply (power brick, or internal voltage regulator)
3)  Anything else indirectly heated? E.g. thermal flexing of the motherboard?


You haven't mentioned much about case 2 (the power supply). Here are some 
ways you could perhaps test it.


1)  Run several instances of "nice yes >/dev/null".  This will load the 
CPU hard, but will use very little RAM. It will also make the PSU work 
fairly hard.

2)  Try powering some things off USB (again, to load the PSU). You can 
draw 2.5 watts from each of the 3 sockets. Have the backlight on full too.

3)  Try with ONLY the power brick, but no battery installed.

4)  Swap the power brick over.
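
For test 1, something along these lines might save some typing. It is only 
a rough sketch: the names start_load, stop_load and LOAD_PIDS are made up 
for illustration, and the number of instances is a guess you would want to 
adjust.

```shell
#!/bin/sh
# Rough sketch: start N low-priority "yes" processes to load the CPU
# (and hence the PSU) while using very little RAM, then stop them again.
# start_load, stop_load and LOAD_PIDS are illustrative names only.

LOAD_PIDS=""

start_load() {
    n="${1:-2}"          # number of instances; default of 2 is arbitrary
    i=0
    while [ "$i" -lt "$n" ]; do
        nice yes >/dev/null &
        LOAD_PIDS="$LOAD_PIDS $!"   # remember the PID so it can be killed later
        i=$((i + 1))
    done
}

stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null
    done
    wait 2>/dev/null     # reap the killed background processes
    LOAD_PIDS=""
}
```

Source the file, run "start_load 4", leave the machine alone for a while, 
and run "stop_load" afterwards if it survives. If it locks up with nothing 
but this running, RAM and graphics look less likely to be the culprits.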

That's my best guess. Good luck with this - sounds tricky!

Richard