[ltp] Crash on resume

Michael Selway linux-thinkpad@www.bm-soft.com
Wed, 28 Aug 2002 11:24:27 +0100


Richard Neill writes:
 > >>However, one time in 5 (or so), the machine won't wake from suspend.

I bought a T21 about a year ago and I've had some real problems
getting the machine out of suspend.  At its worst it would fail to
resume around 50% of the time.  At best it stays up for a month.
Some kernels are much better than others: when I upgraded to
redhat 7.3 (kernel 2.4.18) it was at its worst - I posted a
message to this list at the time (attached) but there were no
replies.

Most of the problems turned out to be with my pcmcia modem card,
which explains why no one else was having the same problems.  If
the modem card was physically removed or switched off in software
("cardctl eject"), then the machine would resume OK most of the
time.  It would fail to resume though if I inserted or removed the
power cord while it was suspending.  Following a hunch, I forced
the kernel parameter APM_ALLOW_INTS to be switched off, and this
seems to have solved the power cord problem.  But to be sure I
need to switch it back on to test that the problem returns, which I
haven't done yet.  NOTE THAT it is hard to switch APM_ALLOW_INTS
off: there is code in arch/i386/kernel/dmi_scan.c which forces it
on if it detects a thinkpad.  You need to edit this source file
and rebuild the kernel (I can be more precise if you'd like to
experiment).
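
For anyone who wants to try the "eject the card before suspending"
workaround without doing it by hand each time, here is a minimal
sketch of a wrapper.  It is not something from my own setup: the
function name and the injectable "run" argument are made up for
illustration, and it assumes the stock "cardctl" (pcmcia-cs) and
"apm" (apmd) command-line tools are on the PATH.  It also assumes
that "apm --suspend" does not return until the machine has resumed,
which may not hold on every system.

```python
import subprocess


def suspend_with_card_ejected(run=subprocess.call):
    """Eject PCMCIA cards in software, suspend, then re-insert them.

    `run` is any callable that takes a command list and returns an
    exit status (hypothetical hook, here so the sequence can be
    tried without touching real hardware).
    Returns the list of (command, exit_status) pairs that were run.
    """
    results = []

    # Same command as quoted in the message: switch the card off in software.
    eject = ["cardctl", "eject"]
    status = run(eject)
    results.append((eject, status))
    if status != 0:
        # Do not suspend with the card still active.
        return results

    # Assumption: this call only returns once the machine has resumed.
    suspend = ["apm", "--suspend"]
    results.append((suspend, run(suspend)))

    # Bring the card back whether or not the suspend call reported success.
    insert = ["cardctl", "insert"]
    results.append((insert, run(insert)))
    return results
```

Calling suspend_with_card_ejected() as root would run the three
commands in order.  Passing a fake "run" function lets you check the
sequence first, which seems sensible given how flaky resume is.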

I'm still getting around 1 crash per month (last crash July 31st,
so it's imminent!).  My current lead is that this is related to an
NFS mount from a Solaris 8 machine (yes that sounds ridiculous,
but there is some evidence and it feels like I've tried everything
else).

To answer the questions in your message directly, all the things
you list sound like plausible causes: I went through even more
possibilities, eliminating them one by one.  And as for logs, do
go back and do the archeology on /var/log/messages after each
crash: there are sometimes some hints as to the problem in there.
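
If you'd rather not read the whole log by eye, a rough sketch like
the following can pull out candidate lines.  The default keywords
are just the two kernel messages quoted in my attached message
below ("lost interrupt" and "ide_dmaproc"); the function names are
made up for illustration, and you would want to add whatever
strings turn out to matter on your machine.

```python
def find_hints(lines, keywords=("lost interrupt", "ide_dmaproc")):
    """Return (line_number, text) for every line containing a keyword.

    `lines` is any iterable of strings, e.g. an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(keyword in line for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path="/var/log/messages", keywords=("lost interrupt", "ide_dmaproc")):
    """Scan a log file on disk; reading it usually needs root."""
    # errors="replace" so stray non-UTF-8 bytes in the log do not abort the scan.
    with open(path, errors="replace") as log:
        return find_hints(log, keywords)
```

For example, "for n, text in scan_file(): print(n, text)" lists the
matching lines with their positions, so you can then look at what
was logged just before each one.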

Finally, have you checked through the PCMCIA HOW-TO?  There are
several hints as to things you should consider putting in your
/etc/pcmcia/config.opts file on a thinkpad.  This has solved
problems for me on other thinkpads, though it didn't help with my
T21.

Good luck,
Michael.

--

Message-ID: <15620.31326.638834.705441@biff.ssl.co.uk>
X-Mailer: VM 6.93 under Emacs 21.2.1
To: linux-thinkpad@www.bm-soft.com
Subject: T21 and RedHat7.3: won't resume after suspend 
Date: Mon, 10 Jun 2002 11:07:26 +0100

I bought a T21 9 months ago, I only run Linux on it.  On the
whole, it's great.  There have been 2 recurrent problems which
I've never fixed:

 - it crashes in various circumstances around once a month

 - around 50% of the time coming back from suspend, it hangs for
   around 15-20 seconds before the disc then spins up and all is
   then fine.  It logs these messages:

	kernel: ide_dmaproc: chipset supported ide_dma_lostirq func only: 13
	kernel: hda: lost interrupt

I've not tried very hard to solve these problems.

Last week I upgraded to Redhat 7.3 (kernel 2.4.18).  Now, around
50% of the time when resuming, the disc doesn't spin up at all.
The machine is running, but anything which tries to use the disc
hangs - and this is pretty much everything, of course.  I have to
reboot to sort it out.  It feels like this is the same problem as
I had before, only it doesn't solve itself after 20 seconds like
it used to.

I've tried several things, including D.Sen's advice (a while back)
for sorting out "lost interrupt" messages, disabling disc DMA,
unloading ethernet drivers before suspending, not loading sound
drivers, etc, etc.  None of these things made a difference.

It seems to be related to attaching/detaching mains power
before/during/after suspending.  But I haven't been able to find
any reproducible correlations.

Any thoughts?  This is really debilitating!
Michael

--
p.s. 3 cheers for redhat's bundling of the ext3 filesystem in
rh7.2: at least rebooting doesn't need to check the disc and only
takes a couple of minutes now.

----- The Linux ThinkPad mailing list -----
The linux-thinkpad mailing list home page is at:
http://www.bm-soft.com/~bm/tp_mailing.html