Hi Craig,
On 04/11/2018 03:59 AM, Craig McQueen wrote:
Daniel Wagner wrote:
> On 04/04/2018 09:56 AM, Craig McQueen wrote:
>>> Could you post the debug log of both tests? It sounds like that in
>>> the first case, ConnMan falls back to the aggressive retry loop and
>>> in the second case it doesn't happen.
>>
>> I sent the debug logs last week. Did that provide what you needed?
>
> Sorry about that. Got lost in the inbox. I did look at it but I need some more
> debug prints enabled. There isn't much information in the log.
> Could you try to get reproduced though this time with '-d
> src/ntp.c:src/timeserver.c' added to ConnMan command line?
I found I needed to run:
connmand -d ../git/src/ntp.c:../git/src/timeserver.c
(I'm cross-compiling connman with the Yocto build system)
Ah, the joy of Yocto :)
I'm attaching a log file of a test done today. The device is
plugged
into an Ethernet switch. I unplugged the switch's upstream Internet
connection a few times, corresponding roughly to these two times in the log:
Apr 11 10:37:50
Apr 11 10:56:49
Apr 11 11:01:42
The last three lines of the log are most interesting, because it tries to use an IPv6
address:
Apr 11 11:01:42 wt000043 daemon.debug connmand[30185]:
../git/src/timeserver.c:sync_next() Using timeserver 2405:3c00:5200:100::1
Apr 11 11:01:42 wt000043 daemon.debug connmand[30185]: ../git/src/ntp.c:start_ntp()
server 2405:3c00:5200:100::1 family 10
Apr 11 11:01:42 wt000043 daemon.err connmand[30185]: Failed to open time server socket
I wonder if the problem was related to it trying to use a server
with an IPv6 address, even though the device's kernel has no IPv6 capability.
Yes, that sounds like the source your problem. Obviously, ConnMan
shouldn't try to use an IPv6 time server if there is no support
for IPv6 in the kernel.
After this, no further ntp/timeserver activity has occurred (for 54
minutes, at the time I'm writing this).
>> I'm interested to fix this issue soon. Has any work been done on this, or
> could I look into it?
>
> So far I haven't done anything. If you want to dig into the problem, I suggest
> to look at the debug output of ntp.c and timeserver.c and figure out why the
> state machine is not going back the initial state looking for a NTP server. From
> your description it seems we miss the some event and don't reset the state
> machine.
I can consider looking into it. First, I would be very interested to
hear your thoughts on the IPv6 address in the above log.
Can you try this one here?
diff --git a/src/ntp.c b/src/ntp.c
index 3c40f6210b81..335065f813e3 100644
--- a/src/ntp.c
+++ b/src/ntp.c
@@ -531,7 +531,9 @@ static void start_ntp(struct ntp_data *nd)
nd->transmit_fd = socket(family, SOCK_DGRAM | SOCK_CLOEXEC, 0);
if (nd->transmit_fd <= 0) {
- connman_error("Failed to open time server socket");
+ if (errno != EAFNOSUPPORT)
+ connman_error("Failed to open time server socket");
+ nd->cb(false, nd->user_data);
return;
}
If this works we should check all error paths in ntp and call the
callback to let the upper layer know something didn't work out.
By the way, what might cause ConnMan to use a 16 second polling
interval, rather than the normal 1024 s (17 min) polling interval?
The decode_msg() function in ntp.c extracts the polling interval from the
message it receives. According the logs the server tell us to poll
all 16 seconds. Obviously, the ConnMan code could be buggy. Could you
check with wireshark what the actually value in the frame is?
Thanks,
Daniel