Hi Michael,
On 9/22/21 3:59 AM, Michael Johnson wrote:
Hi everyone,
In my use case that involves heavy roaming, I've been noticing that
occasionally `iwd` disconnects completely from the network before
rejoining which can cause a few seconds of downtime. In addition to
the downtime it seems to put the network in an odd state which is
hitting a bunch of corner cases in code that uses the network more
than just a normal disconnect. Looking at the logs, it seems like iwd
might be getting locked up somewhere as it also stops reacting to
threshold notifications?
This is indeed strange. iwmon would capture this, but maybe it is easier to put
in a bit more debugging into the CQM paths and see if we get some more clues?
Maybe the CQM events are propagated correctly but the signal isn't below the
roaming threshold?
I'm not sure how to reproduce this as it's relatively rare, happening
3 times in about a week. The logs below are from a machine running
version 1.17.
I'll see if I can start a rolling iwmon capture to get more info. Is
there anything else that would be useful?
Might be chasing ghosts, but the only other thing I see is a a few CMD_FRAME
events (see below) that we receive which are not recognized? Or at least I
don't see any logs of them being parsed. iwmon trace will capture these, but
may be easier to log them in your case.
```
Sep 21 15:40:22.138376 p3-683 iwd[425]:
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX
Status(60)
Sep 21 15:40:22.140101 p3-683 iwd[425]:
src/netdev.c:netdev_unicast_notify() Unicast notification 59
Sep 21 15:40:22.140101 p3-683 iwd[425]:
src/station.c:station_early_neighbor_report_cb() ifindex: 5, error:
0()
Did we just roam, or is something else going on?
Sep 21 15:40:22.140101 p3-683 iwd[425]:
src/station.c:parse_neighbor_report() Neighbor report received for
ea:9e:38:78:07:b1: ch 44 (oper class 0), MD not set
Sep 21 15:40:22.141734 p3-683 iwd[425]:
src/netdev.c:netdev_unicast_notify() Unicast notification 59
We're getting a CMD_FRAME here
Sep 21 15:40:22.142298 p3-683 iwd[425]:
src/netdev.c:netdev_unicast_notify() Unicast notification 59
And here. Wonder what they are?
Sep 21 15:40:22.166050 p3-683 kernel: wlan0: Limiting TX power to 27
(30 - 3) dBm as advertised by ea:9e:38:78:08:71
Sep 21 15:40:22.473706 p3-683 iwd[425]:
src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Sep 21 15:41:12.767433 p3-683 iwd[425]:
src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Two CQM events. We need to know what they actually contain.
Sep 21 15:46:41.998639 p3-683 kernel: wlan0: deauthenticated from
ea:9e:38:78:08:71 (Reason: 34=DISASSOC_LOW_ACK)
Hmm, never seen this before. I wonder if the power limit event the kernel
prints about earlier has something to do with this?
Regards,
-Denis