[ltp] Re: Interference of wlan usage and hdaps?

Elias Oltmanns linux-thinkpad@linux-thinkpad.org
Thu, 24 Sep 2009 17:44:33 +0200


Henrique de Moraes Holschuh <hmh@hmh.eng.br> wrote:
> On Wed, 23 Sep 2009, Elias Oltmanns wrote:
>> Henrique de Moraes Holschuh <hmh@hmh.eng.br> wrote:
>
>> Excessive spinning in sw interrupt context has a rather severe impact on
>> other timers running at the time. If they have fired shortly after the
>> atheros tasklet has started, the interrupt handler is delayed
>> considerably. Apart from everything else, this really messes up hdapsd's
>> book keeping, thus making it believe that a minor change in the accel
>> data has occurred in a particularly short time frame. For all hdapsd
>> knows, this indicates an emergency condition and it parks the heads
>> accordingly.
>
> That just means hdapsd has a severe design issue.  It should use timestamped
> events, and it should look at the timestamps to get the time component,
> instead of assuming it is being fed an isosynchronous stream...   It should
> also do something sensible if it notices it is processing old data :p

Trust me, it means far more than just that. See below.

>
> AFAIK the input layer does timestamp events at the time they are issued by
> the kernel.  If it doesn't, we need to move to something that does :-)

But hdapsd does use timestamped events already. In fact, using the input
system wouldn't be possible otherwise. Also, "old data" isn't an issue
here. In my opinion, the mathematical algorithm used in hdapsd has its
weaknesses, to say the least. Adding some more obscure hacks just
because a kernel design flaw (arguably a *bug*) makes these deficiencies
painfully obvious doesn't seem very appropriate if
a) the issue could be resolved by fixing kernel driver code and / or
b) nobody but me appears to be troubled by this issue.
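
For readers unfamiliar with what "timestamped events" from the input
layer look like, here is a minimal sketch in Python that decodes raw
input event records. It is not hdapsd code. The record layout (two
native longs for the timestamp, then type, code and value) is an
assumption that matches struct input_event on 64-bit Linux, and
parse_events is a hypothetical helper name.

```python
import struct

# Assumed layout of struct input_event on 64-bit Linux:
# struct timeval (two native longs), then type (u16), code (u16), value (s32).
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)


def parse_events(buf):
    """Yield (timestamp_seconds, type, code, value) per complete record in buf."""
    usable = len(buf) - len(buf) % EVENT_SIZE  # ignore a trailing partial record
    for offset in range(0, usable, EVENT_SIZE):
        sec, usec, ev_type, code, value = struct.unpack_from(
            EVENT_FORMAT, buf, offset
        )
        # The kernel-supplied timestamp travels with every event, so a consumer
        # does not have to assume a fixed sampling interval.
        yield (sec + usec / 1e6, ev_type, code, value)
```

The bytes passed to parse_events would typically come from reading an
event device node; which node belongs to the accelerometer differs from
system to system and is not assumed here.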

As far as ath5k is concerned, I have verified carefully (for 2.6.27, 11
months ago, mind) that the driver spends even *more than 20 msecs* in the
softirq handler *regularly*, which seems quite insane. Apart from all
other implications that might spring to mind, I have also demonstrated
that this makes other timers fire *far too early*, e.g. timers that are
supposed to run for 20 msecs will sometimes fire after less than 5 or
even 1 msec. I seem to remember that this happened to timers initialised
with bigger timeouts too, but I'm not quite sure which periods I tested
at the time.
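
To illustrate why a timer firing early matters to a consumer of accel
data, here is a minimal sketch in Python. It is not hdapsd's actual
algorithm, only an assumed rate-of-change calculation over timestamped
samples; the function name and the sample values are made up for the
example.

```python
def rate_of_change(samples):
    """Return the rate of change for each pair of consecutive samples.

    samples is a list of (timestamp_seconds, value) tuples.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((v1 - v0) / dt)
    return rates


# Same minor change in the reading (2 units), different sampling intervals.
on_time = [(0.000, 500), (0.020, 502)]  # timer fired after the intended 20 msecs
early = [(0.000, 500), (0.001, 502)]    # timer fired after only 1 msec

normal_rate = rate_of_change(on_time)[0]
early_rate = rate_of_change(early)[0]
# early_rate is roughly 20 times normal_rate, although the reading itself
# changed by the same small amount in both cases.
```

Any threshold applied to such a rate would be crossed far more easily by
the second pair of samples, which is consistent with the unexpected
parking described above.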

As long as I am the only one who is affected, I don't blame people for
not caring very much about it. Since I have now observed similar
symptoms in an environment where ath5k doesn't figure at all, I'm
beginning to wonder, though, whether more people are affected by this
problem in disguise. Note that it took me some time to figure out that
disk head parking issues possibly could be and, indeed, are related to a
wireless driver. This experience suggests to me that there may quite
possibly be other applications (including kernel code) that behave
unexpectedly or even erroneously in the presence of such a driver,
leaving people puzzled as to the cause of that behaviour.

At this point, I have to emphasise again that purely coincidental
observations and very little testing have led to my suspicions against
the Intel driver; there has been no analysis similar to that of the
atheros driver. The people working on the timer and scheduler code don't
expect (soft)irq handlers to block for 20 msecs, as I understand. You
can see Thomas writing something to that effect in the email (or the
thread leading up to it) I referred to by link in my previous post.
Unfortunately, I most likely won't have the resources to pursue this
matter in terms of patches or in-depth analysis for some time yet.
Nevertheless, I thought it might be worthwhile to raise this issue in
case someone else was willing to pick it up---I'd do my best to assist
people as time permits, of course. First of all, however, I figured it
would be a good idea to find out whether other people can reproduce
this problem and whether this problem really is similar to that of the
atheros driver. Since the easiest test case I can come up with involves
wlan and hdaps, I posted to this list rather than any of the kernel MLs.

Having said all that, another interesting observation I have made in the
meantime might indicate that things are actually different from the
atheros scenario now: The symptoms observed due to the usage of ipw2200
even survive a reboot as long as I don't actually power off the system.
That is to say, hdapsd sometimes parks disk heads quite unexpectedly
even when I have just rebooted into my old 2.6.27 kernel which doesn't
even include the ipw2200 driver. Very strange!

Regards,

Elias