[ltp] 2nd gen X1 carbon 3g/lte Sierra Wireless EM7345 4G LTE

Bjørn Mork linux-thinkpad@linux-thinkpad.org
Sun, 01 Jun 2014 17:18:16 +0200

Bjørn Mork <bjorn@mork.no> writes:

>> <<<<<<   data   = 03:00:00:00:30:00:00:00:03:00:00:00:01:00:00:00:00:0=
>> >>>>>>   data   = 02:00:00:80:10:00:00:00:02:00:00:00:00:00:00:00
> And the device just continues to reply to tid 2 with "close done".  That
> is completely unexpected.  There is no acking of these messages, so the
> device should never send duplicates unless we send duplicate
> requests. It's normally just fire-and-forget.
>> >>>>>>   data   = 03:00:00:80:38:00:00:00:03:00:00:00:01:00:00:00:00:0=
> But then we actually do get a reply to one of our requests anyway?  Strange.
>> <<<<<<   data   = 02:00:00:00:0C:00:00:00:04:00:00:00
>> [07 Apr 2014, 12:39:57] [Debug] [/dev/cdc-wdm0] Sent message (translated) ...
>> <<<<<< Header:
>> <<<<<<   length      = 12
>> <<<<<<   type        = close (0x00000002)
>> <<<<<<   transaction = 4
>> error: couldn't close device: Transaction timed out
> But we don't see the reply to the "close", after all those unexpected
> "close done" replies.
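
For anyone reading the raw dumps quoted above: each message starts with
a 12-byte header of three little-endian 32-bit fields (type, length,
transaction id).  The following is only a small sketch for decoding
them by hand.  The helper name parse_header is made up for illustration,
and the type table lists only the values visible in this trace, assuming
the usual MBIM convention that "done" replies set the high bit:

```python
import struct

# Message types visible in the trace. Replies from the device carry the
# high bit (0x80000000). Only these four values are listed here.
MESSAGE_TYPES = {
    0x00000002: "close",
    0x80000002: "close done",
    0x00000003: "command",
    0x80000003: "command done",
}


def parse_header(dump):
    """Decode (type, length, transaction) from a colon-separated hex dump."""
    raw = bytes.fromhex(dump.replace(":", ""))
    msg_type, length, transaction = struct.unpack_from("<III", raw)
    return MESSAGE_TYPES.get(msg_type, hex(msg_type)), length, transaction


print(parse_header("02:00:00:00:0C:00:00:00:04:00:00:00"))
# ('close', 12, 4)
print(parse_header("02:00:00:80:10:00:00:00:02:00:00:00:00:00:00:00"))
# ('close done', 16, 2)
```

The first result matches the translated header in the log (close,
length 12, transaction 4); the second is one of the repeated "close
done" replies for tid 2.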

Just FYI: Now that I finally have one of these modems myself I've been
able to figure out what's happening here.  I believe the issue is that
the modem, under certain conditions, queues up replies and indications
for which it has never successfully notified the host. The result is
that the driver and modem get out of sync. The modem keeps adding
messages to its queue and notifying the driver, and the driver reads
the *first* message in the queue.  The problem is that more than one
unread message is queued up, and the first message is not the one
corresponding to the current notification.
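
To illustrate the off-by-one effect, here is a toy model.  It is not
the real driver or firmware logic; ToyModem, queue_message and
read_on_notification are hypothetical names, and the only assumptions
are the ones stated above: the modem queues a message whether or not
the notification gets through, and the host reads one message from the
head of the queue per notification.

```python
from collections import deque


class ToyModem:
    """Hypothetical stand-in for the modem's outgoing message queue."""

    def __init__(self):
        self.queue = deque()
        self.host_reachable = True

    def queue_message(self, message):
        # The message is queued even if the notification cannot be sent.
        self.queue.append(message)
        return self.host_reachable  # True means the host got a notification


def read_on_notification(modem):
    # One notification triggers exactly one read, always of the queue head.
    return modem.queue.popleft()


modem = ToyModem()

# One reply is queued while the host cannot be notified.
modem.host_reachable = False
modem.queue_message("reply A")

# Later notifications all arrive, but every read is one message behind.
modem.host_reachable = True
pairs = []
for message in ["reply B", "reply C"]:
    if modem.queue_message(message):
        pairs.append((message, read_on_notification(modem)))

for notified_for, actually_read in pairs:
    print(f"notified for {notified_for!r}, read {actually_read!r}")
print("still queued:", list(modem.queue))
```

In this model the host is notified for B but reads A, is notified for C
but reads B, and C stays in the queue with nobody waiting for a
notification that would fetch it.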

I don't really know how to fix this.  I sincerely believe it is a modem
firmware bug.  The modem is supposed to notify the host about *every*
message it queues up.  The question is of course how it should handle
notification failures.  I don't see any way the host can possibly deal
with those (as the problem really is that the modem cannot communicate
with the host at the time of failure), so I believe it is up to the
modem firmware to either retry the notification or drop the message.
The usage pattern triggering the bug was probably not anticipated by
the firmware developers, but I still think it should be handled
better than this.

The way we trigger this problem is for example by using mbimcli with
--no-close and some command which takes time to complete.  We then send
the command and the modem prepares a reply.  But by the time the modem
is ready to notify us about the reply, we have already closed the
character device and the driver has killed the interrupt URB.  So the
modem is unable to send its notification.  But it still queues the
reply, and we are out-of-sync...

The good news is that this won't happen during what I consider "normal"
usage: If you let ModemManager manage the modem then the character
device is not closed and the interrupt URB is kept active.  So the modem
can successfully send all its notifications, and we will empty the
message queue immediately.
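
The difference between the two usage patterns can be sketched with a
few lines of Python.  Again, this is a simplified model under my
assumptions, not a description of the actual firmware: the function
name backlog_after_sessions is made up, every command produces exactly
one reply, and a delivered notification causes exactly one read.

```python
from collections import deque


def backlog_after_sessions(closed_early):
    """Return the number of unread messages after each command.

    closed_early[i] is True if the character device was closed (and the
    interrupt URB killed) before reply i was ready, so no notification
    reached the host. Toy model only.
    """
    queue = deque()
    backlog = []
    for tid, early in enumerate(closed_early, start=1):
        queue.append(tid)      # the modem queues the reply regardless
        if not early:
            queue.popleft()    # notification delivered: one read from the head
        backlog.append(len(queue))
    return backlog


# Device kept open the whole time: the queue is always emptied.
print(backlog_after_sessions([False, False, False]))  # [0, 0, 0]
# One early close: the backlog never goes away in this model.
print(backlog_after_sessions([True, False, False]))   # [1, 1, 1]
```

With the device held open for every command the backlog stays at zero,
while a single early close leaves one stale message that all later
reads lag behind.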