Hi Daniel,

On Fri, Dec 23, 2016 at 12:32 AM, Daniel Wagner <wagi@monom.org> wrote:
>
> Hi Shrikant,
>
> It looks like your email client sends the message as plain text and as HTML too. As with many Linux-related mailing lists, we prefer plain text messages. Could you please check your settings? Thanks.
Sure, I will take care to send only plain text. Sorry about that.

>
> On 12/22/2016 02:07 PM, Shrikant Bobade wrote:
>>
>> getrandom(0x7ef1ac40, 1, GRND_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
>
> > Execution just gets stuck at this point.
>
> getrandom returns immediately with GRND_NONBLOCK; it doesn't block. Something else, the library or whatever, is blocking. The strace doesn't really help here. Can you try to get a complete stack trace, maybe using gdb?
>
I tried to get a complete backtrace using gdb, but it seems the stack got corrupted. Below is what I could get so far; I will share more details if I manage to get a better backtrace.

Breakpoint @ _rnd_get_system_entropy_getrandom (_rnd=0x7efffc78, size=32)

bt
#0  _rnd_get_system_entropy_getrandom (_rnd=0x7efffc78, size=32) at ../../../gnutls-3.5.3/lib/nettle/rnd-linux.c:98
#1  0x76e24344 in do_device_source (init=init@entry=1, event=event@entry=0x7efffcdc, ctx=0x76e62a38 <rnd_ctx>) at ../../../gnutls-3.5.3/lib/nettle/rnd.c:132
#2  0x76e244ac in wrap_nettle_rnd_init (ctx=<optimized out>) at ../../../gnutls-3.5.3/lib/nettle/rnd.c:234
#3  0x76d72a28 in _gnutls_rnd_init () at ../../gnutls-3.5.3/lib/random.c:49
#4  0x76d64dfc in _gnutls_global_init (constructor=constructor@entry=1) at ../../gnutls-3.5.3/lib/global.c:307
#5  0x76d3d948 in lib_init () at ../../gnutls-3.5.3/lib/global.c:504
#6  0x76fdf2dc in call_init.part () from /lib/ld-linux-armhf.so.3
#7  0x76fdf438 in _dl_init () from /lib/ld-linux-armhf.so.3
#8  0x76fcfac4 in _dl_start_user () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
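For reference, the strace and this backtrace look consistent with a pattern where the library first probes getrandom() with GRND_NONBLOCK (the call that returns EAGAIN) and then calls getrandom() with flags 0, which waits until the kernel's pool is initialized. The sketch below only illustrates that pattern; it is not the gnutls source, the helper names pool_ready() and fill_random_blocking() are hypothetical, and it assumes glibc 2.25 or later for <sys/random.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the kernel pool is already
 * initialized, 0 if a blocking getrandom() would wait (EAGAIN),
 * and -1 on any other error (e.g. ENOSYS on old kernels). */
static int pool_ready(void)
{
	unsigned char c;
	ssize_t ret = getrandom(&c, 1, GRND_NONBLOCK);

	if (ret == 1)
		return 1;
	if (ret < 0 && errno == EAGAIN)
		return 0;
	return -1;
}

/* Hypothetical helper: fill buf with len bytes using a blocking
 * getrandom() (flags = 0). This is the kind of call that can hang
 * early in boot while the pool is not yet initialized.
 * Returns 0 on success or -errno on failure. */
static int fill_random_blocking(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(buf != NULL);
	while (len > 0) {
		ssize_t ret = getrandom(p, len, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -errno;
		}
		p += ret;
		len -= (size_t)ret;
	}
	return 0;
}
```

Calling pool_ready() before the blocking call would let a daemon log (or wait with a message) instead of hanging silently, which would have made this case much easier to spot.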
>
>>>     I don't think I really understand what you describe here. Does
>>>     'attempt' mean the user/application tries to tell ConnMan to
>>>     establish a connection and this is done using the D-Bus API? Or do
>>>     you mean the auto connect state machine?
>>
>>
>> OK, adding the details below.
>>
>> I am using systemd as the init manager. Checking the connman.service
>> status, I observed no activity for a few minutes (with respect to the
>> getrandom strace case above). During this time connman.service gets
>> restarted internally and we can see a new instance of connmand. After 2
>> to 3 restarts of connman.service, once the target's entropy reached
>> roughly 10 to 30, the getrandom hang resolved and connman continued and
>> got the IP assignment. So the actual delay comes from the multiple
>> attempts needed to get past getrandom.
>
>
> From the strace it doesn't look like the kernel blocks. Without a real stack trace we can only do some wild guessing.
>
Yes, agreed. It's difficult to get further with only the strace log.

>> Now, to resolve the getrandom hang, I used rng-tools, with rngd running
>> via rngd.service (ordered before connman.service). The available entropy
>> is then between 3k and 4k, the call to getrandom passes easily, and there
>> is almost no delay in IP assignment from the user's point of view.
>
>
> Still, I don't see where we could block the IP assignment. The wispr code is called _after_ the IP assignment. So what's holding up our IP assignment?
>
> Can you start ConnMan with
>
>         CONNMAN_DHCP_DEBUG=1 CONNMAN_DHCPV6_DEBUG=1 ./connmand -d -n
> ?
Yes, but I didn't get any debug output with it:
:~#  CONNMAN_DHCP_DEBUG=1 CONNMAN_DHCPV6_DEBUG=1 ./connmand -d -n 

:~# CONNMAN_DHCP_DEBUG=1 CONNMAN_DHCPV6_DEBUG=1 ./connmand -d -n  &
[1] 789
:~#
:~# pidof connmand
789
:~#

>
> Uuuuuh, I think I found it, in gdhcp/common.c:
>
> #define URANDOM "/dev/urandom"
> static int random_fd = -1;
>
> int dhcp_get_random(uint64_t *val)
> {
>         int r;
>
>         if (random_fd < 0) {
>                 random_fd = open(URANDOM, O_RDONLY);
>                 if (random_fd < 0) {
>                         r = -errno;
>                         *val = random();
>
>                         return r;
>                 }
>         }
>
>         if (read(random_fd, val, sizeof(uint64_t)) < 0) {
>                 r = -errno;
>                 *val = random();
>
>                 return r;
>         }
>
>         return 0;
> }
>
> So for a simple test, can you change the define to
>
> #define URANDOM "/dev/random"
>
> and see if you still block?
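To compare both devices without rebuilding for each one, a variant that takes the path as an argument could be used in a small test program. This is a hypothetical sketch: get_random_from() is not part of gdhcp, and unlike the quoted dhcp_get_random() it does not cache the file descriptor and it loops on short reads. It keeps the original's random() fallback and -errno return convention.

```c
#define _DEFAULT_SOURCE	/* expose random() and O_CLOEXEC */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical variant of dhcp_get_random() for testing: the device
 * path is a parameter, so the same binary can try "/dev/urandom" and
 * "/dev/random". On failure it falls back to random(), like the
 * original, and returns -errno (or -EIO on unexpected EOF). */
static int get_random_from(const char *path, uint64_t *val)
{
	unsigned char *p = (unsigned char *)val;
	size_t left = sizeof(*val);
	int fd, r;

	assert(path != NULL && val != NULL);

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0) {
		r = -errno;
		*val = random();
		return r;
	}

	/* Loop until all 8 bytes arrive: on the older kernels discussed
	 * here, read() on /dev/random blocks while the entropy estimate
	 * is low and may return fewer bytes than requested. */
	while (left > 0) {
		ssize_t n = read(fd, p, left);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0) {
			r = n < 0 ? -errno : -EIO;
			close(fd);
			*val = random();
			return r;
		}
		p += n;
		left -= (size_t)n;
	}

	close(fd);
	return 0;
}
```

Calling it with "/dev/urandom" and then "/dev/random" from the same program shows directly which of the two stalls on the target.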

I prepared the change and observed that the first attempt behaves similarly:

case 1: 0 entropy
:~# cat /proc/sys/kernel/random/entropy_avail
0
:~#

short log: blocks at
ugetrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
brk(NULL)                               = 0xc4000
brk(0xe5000)                            = 0xe5000
getrandom(0x7eeb6c30, 1, GRND_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 10000}, ru_stime={0, 0}, ...}) = 0
getrandom(

But with further attempts, I think the entropy increases slightly at that point, maybe due to keyboard activity.

case 2: slight entropy available

:~# cat /proc/sys/kernel/random/entropy_avail
35
:~#
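To record the same number from code, for example logged right before connmand starts on each attempt, a small helper could read the procfs file that cat reads here. This is only a sketch; read_entropy_avail() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

#define ENTROPY_AVAIL "/proc/sys/kernel/random/entropy_avail"

/* Read the kernel's entropy estimate (in bits) from a procfs-style
 * file. Returns the value, or -1 if the file cannot be opened or
 * does not start with an integer. */
static long read_entropy_avail(const char *path)
{
	FILE *f;
	long bits;
	int ok;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = fscanf(f, "%ld", &bits) == 1;
	fclose(f);
	return ok ? bits : -1;
}
```

Logging this value next to each connmand restart would show whether the hang clears once the estimate rises, as it seems to in the cases here.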

short log: blocks at
.
.
send(6, "<31>Dec 29 14:43:13 connmand[794"..., 124, MSG_NOSIGNAL) = 124
socket(AF_PACKET, SOCK_DGRAM|SOCK_CLOEXEC, 8) = 12
setsockopt(12, SOL_SOCKET, SO_ATTACH_FILTER, "\t\0\0\0\244<\t\0", 8) = 0
bind(12, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_IP), sll_ifindex=if_nametoindex("eth2"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
fstat64(12, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fcntl64(12, F_GETFL)                    = 0x2 (flags O_RDWR)
write(3, "\1\0\0\0\0\0\0\0", 8)         = 8
open("/dev/random", O_RDONLY)           = 13
read(13,
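To confirm that this read(13, ...) stall is the entropy wait rather than something else, a diagnostic could poll the device with a timeout instead of reading it. As far as I know, on kernels before 5.6 /dev/random only polls readable once the entropy estimate passes a wakeup threshold, so a timeout here would match the blocking read above. This is a hypothetical helper, not ConnMan code.

```c
#define _DEFAULT_SOURCE	/* expose O_CLOEXEC */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical diagnostic: wait up to timeout_ms for path to become
 * readable. Returns 1 if readable, 0 on timeout, -errno on error
 * (-EIO if poll reports only an error condition). Restarting on
 * EINTR restarts the full timeout, which is fine for a diagnostic. */
static int wait_readable(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	int fd, ret;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -errno;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		ret = -errno;
	else if (ret > 0)
		ret = (pfd.revents & POLLIN) ? 1 : -EIO;

	close(fd);
	return ret;
}
```

Running wait_readable("/dev/random", 5000) on the target with entropy_avail near 0 and again after rngd is running should show the difference between the two situations.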


With further repeated attempts, execution proceeds and leads to the IP assignment.
>
> cheers,
> daniel

Thanks
Shrikant