[ltp] T40: airo_mpi death under high load? (fwd)

Fabrice Bellet linux-thinkpad@linux-thinkpad.org
Thu, 14 Aug 2003 17:57:08 +0200


On Wed, Aug 13, 2003 at 08:07:00PM +0100, honey@gneek.com wrote:
> I really didn't expect this, and was about to report it as a potential
> hardware problem to IBM after struggling with it for days: also
> because the card now sometimes unexpectedly drops out in Windows XP
> too, the little I've used it (Cisco util shows the MAC address
> suddenly as 00:00:00:00:00:00).  I guess I'll put that down to XP
> not hardware and await your verdict.
> 
> Unless of course we both have the same hardware problem under high
> load...

I see the same dropouts under high load, with both the airo_mpi driver
and the Cisco driver. That is somewhat logical, as both share very similar
code for the tx/rx/interrupt handlers. I can only test with Linux.

Did you try the Cisco driver for Linux under the same load conditions?
Is it more stable?

Mine quietly stops working, with the message "NETDEV WATCHDOG:
eth1: transmit timed out" in my logs. Trying to remove the driver in this
state usually freezes the machine. I ran a lot of tests over several
hours, and another typical error output from the Cisco driver is:

Aug 14 14:28:52 localhost kernel: Command int!
Aug 14 14:28:52 localhost kernel: Link stat int ls=8001
Aug 14 14:28:52 localhost kernel: No carrier
Aug 14 14:30:56 localhost kernel: venuscommand cmd = 21
Aug 14 14:30:56 localhost kernel: venuscommand status = 8914
Aug 14 14:30:56 localhost kernel: venuscommand Rsp0 = ec45
Aug 14 14:30:56 localhost kernel: venuscommand Rsp1 = 428b
Aug 14 14:30:56 localhost kernel: venuscommand Rsp2 = 8918
Aug 14 14:30:56 localhost kernel: venuscommand cmd = 21
Aug 14 14:30:56 localhost kernel: venuscommand status = 7f21
Aug 14 14:30:56 localhost kernel: venuscommand Rsp0 = 6
Aug 14 14:30:56 localhost kernel: venuscommand Rsp1 = 5489
Aug 14 14:30:56 localhost kernel: venuscommand Rsp2 = 424

This problem is quite hard to diagnose, because several difficulties pile up :-)

  1. No documentation is available, so the meaning of these status registers
     is unknown.
  2. The crash profile is not always the same.
  3. It is not easily reproducible.
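
Since the crash profile varies between runs, one way to compare them is to
collect these messages from the logs. Below is a minimal sketch in Python,
not a tested tool. It assumes the log lines look exactly like the ones quoted
above, and that the venuscommand values are hexadecimal (a guess based on
values such as 7f21 and ec45; nothing is documented). The function name
summarize is made up for this example.

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted in this mail;
# the real driver may print other variants.
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")
VENUS = re.compile(r"venuscommand (cmd|status|Rsp[0-2]) = ([0-9a-fA-F]+)")


def summarize(lines):
    """Count watchdog timeouts per interface and group venuscommand dumps.

    Each "cmd" line is assumed to start a new record; the status/RspN
    lines that follow are attached to it. Values are parsed as hex,
    which is an assumption.
    """
    timeouts = Counter()
    records = []
    current = None
    for line in lines:
        m = WATCHDOG.search(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = VENUS.search(line)
        if m:
            key, value = m.group(1), int(m.group(2), 16)
            if key == "cmd":
                current = {"cmd": value}
                records.append(current)
            elif current is not None:
                current[key] = value
    return timeouts, records
```

It could be fed an open log file, for example
summarize(open("/var/log/messages")), where the path depends on the
syslog setup. Comparing the status values across several crashes might at
least show which ones recur.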

Best wishes,
-- 
fabrice