[ltp] Re: Re: Re: EC Hanging with latest tp-smapi (0.29)

TNKS linux-thinkpad@linux-thinkpad.org
Wed, 30 Aug 2006 12:36:34 -0500


Henrique de Moraes Holschuh wrote:

> On Tue, 29 Aug 2006, TNKS wrote:
>> big difference between 0ms and 10ms.  I'm feeling like there /is/
>> something special about 0x0b. Well, I /know/ something is special about
>> it, because I do get warnings when I do the dump:
> 
> Well, if you change the loop to ONLY call 0x0b and nothing else, does it
> hang?  Or if you invert the loop, so that it starts with 0x0b and walks
> its way to 0x00 ?
> 
> 0x0a is also suspect.  Inverting the loop will test 0x0a.

Okay, so I did two tests, which I think are informative.  First I made the
for loop
  
    for (arg0=0x0a; arg0<=0x0a; ++arg0)

Everything did look normal.  I did somewhere in there get exactly one

    thinkpad_ec: thinkpad_ec_request_row: EC is mysteriously silent
      (0x0a:0x00)->0xa0
    thinkpad_ec: thinkpad_ec_read_row: failed requesting row  
      (0x0a:0x00)->0xfffffffb

This was odd, but it didn't hang the EC permanently, and there was only one
of these messages.  I let the loop go for a while.  Outside of that one
message, everything seemed stable and normal.

Then I made the loop

    for (arg0=0x0b; arg0<=0x0b; ++arg0)

This did what I thought it might do; I got a flood of error messages because
I was accessing 0x00b.  Then the EC hung, and I got a different flood of
messages until I rebooted.  Here's what the transition point looks like.

    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_data: 0x161F  
      reports error: (0x0b:0x00)->0x80
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_data: 0x161F 
      reports error: (0x0b:0x00)->0x80
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_data: 0x161F 
      reports error: (0x0b:0x00)->0x80
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_data: 0x161F 
      reports error: (0x0b:0x00)->0x80
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_data: 0x161F 
      reports error: (0x0b:0x00)->0x80
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_request_row: EC is 
      mysteriously silent: (0x0b:0x00)->0xa0
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_row: failed 
      requesting row: (0x0b:0x00)->0xfffffffb
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_request_row: bad 
      initial STR3: (0x0b:0x00)->0xa0
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_row: failed 
      requesting row: (0x0b:0x00)->0xfffffffb
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_request_row: bad 
      initial STR3: (0x0b:0x00)->0xa0
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_row: failed 
      requesting row: (0x0b:0x00)->0xfffffffb
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_request_row: bad 
      initial STR3: (0x0b:0x00)->0xa0
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_read_row: failed 
      requesting row: (0x0b:0x00)->0xfffffffb
    Aug 30 11:46:38 hole kernel: thinkpad_ec: thinkpad_ec_request_row: bad 
      initial STR3: (0x0b:0x00)->0xa0

I didn't do the loop-inverting suggestion you had yet because I thought
these results were pretty interesting, and so I thought you might
re-evaluate what you want to do next.  

> Please send me a full dmesg output of the thinkpad_ec errors on private
> mail, I will try to track down exactly where they happen in the
> thinkpad_ec state machine, and that might give me a new clue, if we're
> lucky.

I've put my kern.log up on my web space at school at 
www.cs.utexas.edu/~tnks/kern.log.  My archived dmesg logs, unfortunately
aren't saving the messages I noted above (they're pretty sparse, actually),
but kern.log is getting these messages.  kern.log is mammoth right now at
7MB because it's getting flooded with these error messages (especially with
all this experimentation).  I think after a quick look at it (start at the
bottom), you'll see that it's not exactly all that informative.  It's
basically a normal kern.log.  Outside of the timestamps, though, I'm just
not sure how much you can infer from about the state of the EC.  The
snapshots I've shown above are really all I can gather of importance from
it, but I've made it public just in case you can gather more.  

> I bet if we profiled the EC calls we would know, but _that_ ain't a
> trivial thing to ask you to do.

If the experimentation we're doing isn't cutting it, I'm willing to learn
what it takes to profile the EC.  I'm interested, and I should be competent
enough to figure out what I have to.  

- Sukant