[ltp] Linux kernel instability? (Rant/Panic/Cry-for-help!)

Richard Neill linux-thinkpad@linux-thinkpad.org
Sun, 28 Sep 2008 01:02:13 +0100


Dear All,

I'm wondering - am I just unlucky, or has the quality of the Linux 
kernel nosedived in the last 6 months?

In just the last week, I've experienced multiple cases[1] of kernel 
lockups[2], on different systems, crashing in different ways, on 
different[3] hardware.

Linux used to be reliable (uptimes of 6 months+), and even then, crashes 
were fairly predictable: if one avoided hotplug/hot-unplug/suspend, it 
was pretty much solid. These days, however, the experience is about as 
bad as Win98SE, and running Cygwin+KDE on the XP kernel is looking 
tempting! Userspace isn't exactly great either (Firefox crashes weekly; 
Konqueror crashes daily).

What am I doing wrong?
   [Am I just really really unlucky?]

Are other people having the same rate of crashes?
   [Or is this actually a common complaint?]

Any thoughts on what could cause this?
   [I can rule out hardware faults, overheating, power-glitches,...
   It could possibly be a gcc error, but I suspect it's just a lot
   of poorly debugged[4] kernel code.]

Any suggestions what to do?
   [Is there any way to get a non-antique kernel that has actually
   been tested and proven stable?
   Is it possible to run e.g. a BSD kernel on an otherwise Linux system?
   Is there any way to usefully debug a totally unresponsive machine?]


Thanks for reading, and for any suggestions you might have.

Richard


P.S. On a lighter note, I really want something like this:
http://stubbornfacts.us/humor/microsoft_wesyp
[This video was actually produced by Microsoft themselves]


-----------

1. Among the crashes, have been:
  * Kernel oopses in NFS4 and UnionFS on several different machines 
running Ubuntu Hardy (2.6.24).

  * "BUG: soft lockup - CPU#0 stuck for 11s!" on my T60p, running 
Ubuntu Hardy (a different 2.6.24.x kernel version from the above), 
leaving unkillable processes and corrupting the filesystem on 
shutdown.

  * Random lockups of my desktop machine (every few days) running 
Mandriva 2008.0.

This has happened on 6 different machines, with different hardware, 
different kernel versions and different distros.

----------

2. Neither local login nor a remote SSH connection works. In some 
cases it's possible to ping the system, but not always. Sometimes 
even SysRq-B doesn't work.

------------

3. About the only thing they have in common is that they are all 64-bit 
Intel Core 2 Duo/Quad machines. There's very little else shared by all 
the systems.
I administer about 20 different Linux machines (variously servers, 
laptops and desktops, running different distros and versions), and none 
of them is really solid.
(Even the few 586 machines are pretty flaky, except the ones running 
2006-era distros, which I daren't update.)


-----------

4. I feel slightly hypocritical here; I can't write kernel code at all, 
so it seems unfair to criticise those who do. On the other hand, it's 
very awkward when all the people to whom I've advocated Linux for its 
stability come back to me and say "but it crashes every few days".