[ltp] Experience from updating RH 6.2 to RH 7.0 on TP 770X (fwd): Token Ring

Burt Silverman linux-thinkpad@www.bm-soft.com
Fri, 22 Dec 2000 13:43:28 -0500


I meant for this to get onto the forum; I think I missed the first time.

The key items are:

1) Please get the latest ibmtr_cs.c from pcmcia-cs-3.1.23. It won't prevent
TR problems, but it will make the cleanup phase more reliable if you do
decide to pull the card or restart PCMCIA services.

2) If your problem is during the adapter open phase, and you see messages
indicating failure to open, my guess is that you have a bad Ring Parameter
Server (RPS) in your network. It is possible that the old driver is more
tolerant of that failure, and I will go back to that style. But I recommend
that you push hard on your network operator to upgrade the RPS. These people
often need to be pushed very hard, but they are responsible for keeping the
networks configured properly, with up-to-date equipment and up-to-date
software in that equipment. One giveaway: if you try one of IBM's newer PCI
(and probably CardBus) adapters and almost never have problems opening on
the ring, in contrast to the other adapters, then you probably have a
misconfigured RPS.

3) The fix below may help with some of the problems. I'm anxious for people
to test.

> (2) the " tr0: adapter error: ISRP_EVEN : 04" which only happens on
>     the Netfinity and some folks (George Staikos) claim to be
>     associated with SMP machines.

Hi Friedemann,

I'm doing lots of changes, but there is one very simple one which it would
be great for you to test:

-       writeb(~CMD_IN_SRB, ti->mmio + ACA_OFFSET + ACA_RESET + ISRA_ODD);
+/*BMS   the next line causes nightmares. Don't second guess the adapter!  */
+/*BMS  writeb(~CMD_IN_SRB, ti->mmio + ACA_OFFSET + ACA_RESET + ISRA_ODD);*/

In other words, go through ibmtr.c and comment out any occurrences of the
resetting of CMD_IN_SRB in ISRA_ODD. Do not comment out any lines that set
the bit; those are important.
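
To make the set/reset distinction concrete, here is a minimal sketch that
models the register in ordinary memory instead of touching hardware. The
struct and the functions fake_writeb, host_signal_command and
host_clear_command are hypothetical names made up for illustration. The
numeric values, and the idea that a write through the set window ORs into
the register while a write through the reset window ANDs into it, are my
assumptions; check them against ibmtr.h and the adapter documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; verify against ibmtr.h in your own tree. */
#define ACA_RESET   0x20   /* assumed: a write here ANDs val into the register */
#define ACA_SET     0x40   /* assumed: a write here ORs val into the register  */
#define CMD_IN_SRB  0x20   /* assumed bit for "command waiting in the SRB"     */

/* Hypothetical stand-in for the adapter's ISRA_ODD register. */
struct fake_aca {
    uint8_t isra_odd;
};

/* Toy model of writeb() to ISRA_ODD through the set or reset window. */
static void fake_writeb(struct fake_aca *aca, uint8_t val, int window)
{
    assert(window == ACA_SET || window == ACA_RESET);
    if (window == ACA_SET)
        aca->isra_odd |= val;   /* set: turn on the bits given in val   */
    else
        aca->isra_odd &= val;   /* reset: keep only the bits set in val */
}

/* The host tells the adapter a command is ready. Keep lines like this. */
static void host_signal_command(struct fake_aca *aca)
{
    fake_writeb(aca, CMD_IN_SRB, ACA_SET);
}

/* The host clears the bit itself. This is the shape of the line to comment out. */
static void host_clear_command(struct fake_aca *aca)
{
    fake_writeb(aca, (uint8_t)~CMD_IN_SRB, ACA_RESET);
}
```

In this model, host_signal_command() on a register holding 0x10 leaves 0x30,
and host_clear_command() takes it back to 0x10. Only the second kind of
write is the one to remove from the driver.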

This has to be a bug. The term "nightmares" in the comment is relative, but
I had been trying to figure out why moving a line of code a notch forward or
backward in time would break things on my testbed, in places where a notch
one way or the other should not have been critical, so this was a welcome
discovery. The bug has been sitting in the code for a long time. Fixing it
made those time-critical problems disappear. I don't know with certainty
that the ISRP_EVEN 04 will go away. But ISRP_EVEN 04 is an Access interrupt
error reported by the adapter: a shared RAM access violation, or an illegal
MMIO operation by the computer to an Attachment Control Area register pair,
has occurred. It is probably lucky that I moved a line or two of code, or I
would not have looked at the code closely enough to spot this error. I
discovered it no more than a few nights back.

The way things usually work is we set an interrupt bit, then they act on it
and they reset the bit when they are done. But here we are resetting the
bit. Perhaps the perfect adapter would simply ignore our attempt to reset
the bit, but that does not seem to be the case. They can't protect us from
all dumb mistakes, all the time!

I say "we" set a bit and "they" reset the bit. With TR adapters, you
effectively have interrupts in both directions, from adapter to host, and
from host to adapter.
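
The host-to-adapter direction can be sketched as a toy simulation. This is
not how the hardware is implemented; it only shows why a host-side reset is
timing sensitive in a "we set, they reset" handshake. All names here
(ring_sim, host_post_command, host_reset_bit, adapter_service) are
hypothetical, and the CMD_IN_SRB value is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_IN_SRB 0x20   /* assumed bit value, for illustration only */

/* Hypothetical model: one interrupt register plus a counter of handled commands. */
struct ring_sim {
    uint8_t isra_odd;         /* host-to-adapter interrupt bits */
    int     commands_handled; /* how many commands the adapter acted on */
};

/* "We" set the bit to tell the adapter a command is waiting. */
static void host_post_command(struct ring_sim *s)
{
    s->isra_odd |= CMD_IN_SRB;
}

/* The suspect operation: the host resets a bit that belongs to the adapter. */
static void host_reset_bit(struct ring_sim *s)
{
    s->isra_odd &= (uint8_t)~CMD_IN_SRB;
}

/* "They" act on the bit and reset it when done. Returns true if a command was seen. */
static bool adapter_service(struct ring_sim *s)
{
    if (!(s->isra_odd & CMD_IN_SRB))
        return false;                     /* nothing pending from the host */
    s->commands_handled++;
    s->isra_odd &= (uint8_t)~CMD_IN_SRB;  /* adapter acknowledges by resetting */
    return true;
}
```

If the host posts a command and the adapter services it, the command is
counted and the bit ends up clear. If host_reset_bit() runs between the post
and the service, the adapter in this model never sees the command, and
whether that happens depends only on ordering, which fits the "a notch one
way or the other" behavior described above. What a real adapter does in that
case may well differ.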

I'll keep my fingers crossed.

Burt

----- The Linux ThinkPad mailing list -----
The linux-thinkpad mailing list home page is at:
http://www.bm-soft.com/~bm/tp_mailing.html