[ltp] memory problems ?

Kacper Wysocki linux-thinkpad@linux-thinkpad.org
Fri, 13 Jun 2003 15:38:22 -0400


On 2003.06.13 05:59, Fabrice Bellet wrote:
> On Fri, Jun 13, 2003 at 01:37:03AM -0700, Robert Hajime Lanning wrote:
> > That would not cause the symptoms he is having.  If you have HIGHMEM
> > turned off, you just will not have 1024MB of RAM.  The kernel will
> > "see" 896MB.  This has to do with how kernel memory is mapped to the
> > upper 1 gig of address space in a process.
> >
> > Ok, I just took Ted's Usenix Linux kernel tutorial on Monday. :)
> >
> > Random processes dying like that would most likely be caused by
> > either an overheated CPU or by bad memory.  I have had both issues
> > on my normal PC.  (Overheated 486DX4-100, and, most recently, a
> > memory stick in my new Athlon.)
> 
> You're right.
> 
> I enabled HIGHMEM (4GB) support in my custom kernel, and jumped from
> approx 926000000 to 1058029568 bytes of total available memory. I'll
> have to stress the machine a bit in this new configuration, but I
> doubt this will make my random crashes disappear. I'll keep you
> informed.
> 
> Moreover the memtest86 program doesn't have this compilation option,
> and can test memory above 1GB.
> 
> I have long experience of problems with faulty memory chips.
> Basically, ALL *noname* 512MB modules, with very few exceptions, that
> I put in my machines were buggy in the long term. Some of them
> started causing problems after a few weeks (running 24/24 7/7), some
> were so buggy that I couldn't even enter the BIOS of the machine.
> Symptoms with Linux were always the same: random gcc crashes
> (internal error, signal 11, ...), random seg faults of userland apps,
> kernel oops (less frequent). Since these various bad experiences, I
> stay with chips from well known brands, possibly with ECC, to avoid
> this kind of problem...


I hadn't tried memtest86 before, but now that I have seen it I
definitely agree that no kernel option will fix those memory problems.
Does memtest under Linux show the same problems as memtest86? Or can it
not access that memory for testing?
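
As a side note on the 896MB vs. 1024MB point quoted above, here is a rough way to check how much RAM the running kernel actually sees. This is only a sketch in Python, not anything taken from the kernel tree: it reads the MemTotal line from /proc/meminfo and compares it with the roughly 896MB ceiling Robert mentioned. The function names and the constant are my own, and the 896MB figure assumes the usual 32-bit x86 3G/1G split.

```python
import re

# Assumption: classic 32-bit x86 3G/1G split, where a kernel built without
# HIGHMEM can directly map roughly 896 MiB of physical memory.
LOWMEM_LIMIT_KIB = 896 * 1024


def mem_total_kib(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def looks_lowmem_capped(total_kib, installed_kib):
    """True if more RAM is installed than lowmem can map, yet the kernel
    reports no more than the lowmem ceiling."""
    return installed_kib > LOWMEM_LIMIT_KIB and total_kib <= LOWMEM_LIMIT_KIB


def read_meminfo(path="/proc/meminfo"):
    """Read the meminfo file of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the machine itself you would call `looks_lowmem_capped(mem_total_kib(read_meminfo()), 1024 * 1024)` for a 1024MB box. A True result only says that HIGHMEM is probably off; it says nothing about whether the module is faulty, which is what memtest86 is for.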

A couple of things (if you haven't tried them already):

1. Tweak the BIOS options for your memory
2. Try the module in a different machine, or
3. If possible, try swapping the working module with the one causing
problems (but I'm guessing the working module is built-in?)


So if 1) doesn't help, then 2) or 3) should indicate whether your 
module is faulty. In that case, you shouldn't have any problems getting 
a new one under warranty.


cheers,
	K