[ltp] Linux kernel instability? (Rant/Panic/Cry-for-help!)

Henrique de Moraes Holschuh linux-thinkpad@linux-thinkpad.org
Sun, 28 Sep 2008 12:15:50 -0300


If you guys want any of us people closer to the kernel to even participate
in threads like this, there is one EXTREMELY important piece of information
you must give: whether you are using any binary blobs.  Also which kernel,
but most of you provide that information already.

Really, because if you use nVidia drivers (hello T61 users), fglrx (hello
most 3d-using ThinkPad users), madwifi (hello IBM wireless adapter users)...
your experiences will only be valid for people using the very *same* version
of the binary blob and mainline kernel: these drivers often do NOT play nice
with mainline, and they DO regress when you upgrade mainline (or just the
driver).
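
For anyone unsure how to check, here is a minimal sketch (not an official
tool) that prints the running kernel version and decodes the kernel's taint
mask from /proc/sys/kernel/tainted.  The helper names (decode_taint,
read_taint) are made up for illustration, only a few taint bits are listed,
and bit 12 (out-of-tree module) exists only on newer kernels.  A zero mask
is a good sign but not proof: it only reflects what the kernel itself
noticed.

```python
import platform

# A subset of the kernel's documented taint bits (assumption: only the
# ones relevant to binary/out-of-tree drivers are listed here).
TAINT_BITS = {
    0: "proprietary (non-GPL) module loaded",
    1: "module was force-loaded",
    12: "out-of-tree module loaded",  # newer kernels only
}


def decode_taint(value):
    """Return descriptions of the known taint bits set in `value`."""
    return [text for bit, text in sorted(TAINT_BITS.items())
            if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the running kernel's taint mask, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("kernel:", platform.release())
    taint = read_taint()
    if taint is None:
        print("taint: unknown (could not read /proc/sys/kernel/tainted)")
    elif taint == 0:
        print("taint: 0 (kernel reports no taint)")
    else:
        print("taint:", taint, decode_taint(taint))
```

Pasting that output into a report, along with the exact version of any
binary driver in use, gives the rest of us something to compare against.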

My T43 has been extremely stable, but I play it safe:

I don't use Ubuntu's patched kernel (I'd use either Fedora's or SuSE's: they
DO have a team large enough to take care of it.  Ubuntu doesn't).  I sort of
trust Debian's kernel because it is *really* close to mainline; often it has
fixes for weird arches (which don't matter for an x86 build) and minor
backports of stuff that has been accepted upstream but is not in stable
mainline yet.  The only heavyweight patching I do is tp_smapi, plus whatever
crap I am working on (currently rfkill and thinkpad-acpi).

I don't use bleeding-edge kernels except for quick testing (laptop production
is currently at the latest 2.6.25 + a few selected patches; server
production is at 2.6.16.y), and take note that I *am* a kernel driver
developer.  I don't use ANY binary blobs at all, EVER.  I don't use the
latest fuck-the-box-to-get-some-useless-eyecandy crap from KDE or Gnome,
which is likely to touch something fragile in X.org.  I don't enable
known-unstable or still-immature stuff in the kernel (PAT, group scheduling,
virtualization, containers, etc.).

And I never do hibernation, just suspend-to-RAM (VERY often, so I am 100%
sure it is working perfectly: even if it crashed just once in 1000 suspends,
I'd hit it within a week).  No undervolting or screwing with the clock,
either.

This doesn't mean the quality of mainline's core is not changing for the
worse; it really might be.  But it DOES mean that there is such a thing as
"playing it safer"...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh