[ltp] User processes disappearing while idle

Jeffrey L. Taylor linux-thinkpad@linux-thinkpad.org
Sat, 4 Jun 2011 18:33:37 -0500


Quoting Richard Neill <rn214@richardneill.org>:
> 
> 
> On 04/06/11 17:38, Jeffrey L. Taylor wrote:
> >OpenSuSE 11.4, KDE, Lenovo T520: while idle, almost all user processes will
> >disappear.  Only two daemons (fetchmail and search) and kde4d w/ 2 child
> >processes remain.  The screen brightens slightly on tapping a key or wiggling
> >the mouse, but is still dark gray.  I can switch to tty1.
> >
> >Two questions: Any way to prevent it from happening?  Any way to restart the
> >desktop environment short of rebooting?
> >
> >While trying to send this e-mail, the keyboard and mouse clicks stopped
> >responding.  The mouse could still move the cursor, but nothing else
> >responded, including the 3 finger salute (reboot shortcut).
> 
> Is there anything interesting in the logs (syslog, dmesg etc)? What
> is the memory usage like? Could the oomkiller be getting them?
> I find it very useful to run a CPU/Memory monitor applet in the
> taskbar, so as to see if the system really is idle. Can you ssh into
> the machine, and still have a working connection as it starts to
> fail?
> 

Jun  4 09:59:01 viajero kernel: [73516.813960] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jun  4 09:59:01 viajero kernel: [73516.815783] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 2449443 at 2449439, next 2449444)
Jun  4 09:59:01 viajero kernel: [73516.815903] [drm:i915_reset] *ERROR* Failed to reset chip.
Jun  4 10:01:01 viajero /USR/SBIN/CRON[20684]: (jeff) CMD (cd /home/jeff/Rails/amethyst2production; ruby enqueue_refresh.rb)
Jun  4 10:01:47 viajero polkitd(authority=local): Unregistered Authentication Agent for unix-session:/org/freedesktop/ConsoleKit/Session1 (system bus name :1.30, object path /org/kde/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Jun  4 10:01:47 viajero kdm[1175]: X server for display :0 terminated unexpectedly

The first line (Hangcheck timer elapsed... GPU hung) is always present before
a hang.  The next two i915 errors are frequently present, but not always.  The
last line (X server for display :0 terminated unexpectedly) is likewise
frequent, but not always present.
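
For anyone who wants to check their own logs for the same pattern, here is a
minimal Python sketch.  It only matches the two message strings quoted above;
the function name find_gpu_hangs is made up for illustration, and the wording
of these messages may differ on other kernel or kdm versions.

```python
import re

# Both patterns are taken from the log lines quoted above; other kernel or
# display-manager versions may word these messages differently.
HANG_RE = re.compile(
    r"\[drm:i915_hangcheck_elapsed\] \*ERROR\* Hangcheck timer elapsed"
)
X_DIED_RE = re.compile(
    r"kdm\[\d+\]: X server for display :\d+ terminated unexpectedly"
)


def find_gpu_hangs(lines):
    """Return one (hang_line, x_server_died) tuple per hangcheck message.

    x_server_died is True if a kdm "terminated unexpectedly" line appears
    after that hang and before the next one.
    """
    hangs = []
    for line in lines:
        line = line.rstrip("\n")
        if HANG_RE.search(line):
            hangs.append([line, False])
        elif X_DIED_RE.search(line) and hangs:
            # Attribute the X server exit to the most recent hang.
            hangs[-1][1] = True
    return [tuple(h) for h in hangs]
```

To use it, pass any iterable of log lines, e.g. an open file object for the
system log (I am assuming /var/log/messages here; the path varies by distro):
each returned tuple holds the hang line and whether X died afterwards.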

Googling the error message suggests it is an intermittent bug in the kernel
and/or the userspace xorg-drv-intel component that comes and goes with kernel
versions.  It has been reported on Fedora and Arch as well as OpenSuSE, and
probably affects others.  The Lenovo T520 and the Dell Latitude E6320 are each
mentioned by name at least once.  AFAICT, it is a Sandy Bridge problem.

Doesn't look like it is going to be fixed soon.

Sigh,
  Jeffrey