[Weekly meetings] MoM - 18th of April 2019
by Matthieu Baerts
Hello,
We just had our 46th meeting with Mat, Peter and Ossama (Intel OTC),
Christoph (Apple), Davide (Red Hat) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Paolo's patch-sets:
- refactor xmit, new version:
https://lists.01.org/pipermail/mptcp/2019-April/001066.html
- addressed Mat's feedback, dropping patch 3/4 and switching to per
msk page frag use, and checking mptcp seq number for coalescing
- Mat didn't have time to do a deeper review, will do it ASAP
(hoping today) → once reviewed (and accepted), Mat will squash this in
the commit introducing the xmit code
- also a fix for mptcp_accept() → Mat will integrate that in the
commit introducing the bug
- https://lists.01.org/pipermail/mptcp/2019-April/001075.html
- Paolo is testing that.
- Found a bug in the substream (subflow) part. But when tracking the
bug, the bug is not visible, playing hide and seek :)
- Mat also found a race condition in mptcp_accept() but seems
different than what Paolo saw. For Mat, it might be related to the ULP
stuff. Maybe a bug due to concurrency.
- Note that with ULP, it is currently not used with listening
socket. Might have issues there.
Davide:
- he is looking at extending the ULP structure to get info about
MPTCP via ss.
- Mat: it might be good to also fix the fact the userspace can
create MPTCP subflow via ULP
- Also a question about having MPTCP in a module: but certainly
better to avoid this (lot of work to support this). We can also see if
maintainers request it.
questions about mptcp-level locking:
- https://lists.01.org/pipermail/mptcp/2019-April/001070.html
- simpler lock/release vs cmpxchg
- Mat's answer:
Right now, the mptcp-level socket lock is taken first when
called from userspace for sendmsg/recvmsg because the MPTCP-level socket
will need to be locked first to determine which subflow socket to work
with, and then lock the subflow it needs to use.
I had been thinking that calling "up" from the subflow to the
MPTCP-level socket would involve scheduling a worker that would be able
to lock in the same order as sendmsg/recvmsg. That does introduce some
latency, but I don't think MPTCP-level resends are as time-sensitive as
TCP retransmission (but maybe others on the list have more data or
experience on that topic).
As for the window size computation, I've been trying to think of
ways to handle that without needing the msk lock but haven't found the
right idea yet.
- in short, there was a missing lock (bug).
- worker vs using locks: worker seems fine for to manage regular
data but not when we need a have an immediate response. Especially when
there are middleboxes.
- adding a lock might reduces the race conditions issues but we need
to check if it is OK to do that in Linux.
- we need to be careful with the contention, ddos attacks, etc.
- we might need to add a lock when doing "subflow_finish_connect()
-> mptcp_finish_connect()"
- (sorry I missed the beginning) according to Paolo's research, we
might need 3 locks: one lock for the state structure)
- can be good to document that on the Wiki
@Paolo: may you document that in the wiki please?
I just checked and according to Github' settings: Any GitHub user can
create and edit pages to use for documentation, examples, support, or
anything you wish. Please ping me if you cannot do that!
Ossama:
- license in uapi: there should be a note mentioning that we can use
this file in a non GPL project.
Paolo:
- Pushing patches to Gerrit introduces some overheads.
- we all agreed it is OK to continue to use git send-email
- Matt just updated the wiki page to keep the essential. The goal,
once 'git review' is setup is to keep the same workflow as before with
'git send-email' but only this last command is replaced by 'git review'.
Next meeting:
- We propose to have it next Thursday, the 25th of April.
- Paolo and Davide will not be there (Liberation Day in Italy).
- Usual time: 16:00 UTC (9am PDT, 6pm CEST)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20190425
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
1 year, 9 months
questions about mptcp-level locking
by Paolo Abeni
Hi all,
I just gave a better look at the current locking schema implemented in
the mptcp code, and found some things I could not explain/look like
bugs to me.
1) in mptcp_recvmsg(), just before the function return, both ssk and sk
socket lock are released. It looks like the named function can complete
even without not acquiring such locks. PROVE_LOCKING does not splat
because release_sock() is a no-op if the lock is not owned - thanks
Florian for noticing that. I'm unsure it the above is intentional, but
looks confusing.
2) msk->ack_seq is updated via atomic_read/_set(), AFAICS without any
serialization. Multiple concurrent threads can race while updating the
sequence number, currupting the msk status. Florian suggested we could
use cmpxchg instead of set(), but we agree that acquiring the socket
lock would be simpler and safer - unless we missed something ;)
3) Reading http://multipath-tcp.org/data/MultipathTCP-netsys.pdf
slide 64, and the MTPCP RFC it looks like windows size must be shared
for all subflows, but AFACS, the current code does not allow that. To
such extend we would need to read/update an MPTCP-level status while
reading/updating the subflow's window size.
With the current lock nesting, as imposed by mptcp_sendmsg() (msk lock
-> ssk lock), we can't do that, as window size computation happens in
the TCP core code, with ssk lock already held and no notion about msk
lock.
Managing the MPTCP pending xmit data - to implement level data
retransmission - will pose similar issues as we would probably need to
remove elements from the MPTCP pending data queue at TCP ack processing
time.
I *think* that inverting the locks order in mptcp_sendmsg() (and in
_recvmsg() - assuming we move to socket lock even there) should address
the above - allowing MPTCP hooks in the core TCP code to acquire the
msk lock.
Any comments welcome!
Thanks,
Paolo
1 year, 9 months
[PATCH mptcp-next 0/2] mptcp: two minor improvements
by Florian Westphal
First patch makes it so CONFIG_MPTCP=n build is same as
building a kernel with all mptcp patches reverted.
Second patch provides minor size reduction of tcp_sock struct
by closing a 6-byte hole in mptcp rx option struct.
As usual, feel free to squash them as needed.
1 year, 9 months
[Weekly meetings] MoM - 11th of April 2019
by Matthieu Baerts
Hello,
We just had our 46th meeting with Mat, Peter and Ossama (Intel OTC),
Christoph (Apple), Davide (Red Hat) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Mat and Peter's patch-set:
- We had some discussions, mainly about ULP:
https://review.gerrithub.io/c/multipath-tcp/mptcp_net-next/+/450099/1
- Mat is looking at the warning reported by Paolo last week.
- Mat is concerned about the fact the userspace can currently open a
"subflow" ULP via setsockopt. It should be possible to revert the removal:
- ULP bits that restricted some ULP types to kernel-space-only
were removed in 1243a51f6c05ecbb2c5c9e02fdcc1e7a06f76f26
-
https://github.com/torvalds/linux/commit/1243a51f6c05ecbb2c5c9e02fdcc1e7a...
- except if there is another way?
- Mat is moving code from tcp options to mptcp options file. Also
cleaning the two-phases parsing.
- Peter looked at the different comments, addressing all of them
except to have a private header file which might require more work, can
be done later.
- thanks for the answers! and rework!
Paolo's new patch-set:
- https://lists.01.org/pipermail/mptcp/2019-April/001044.html
- A new patch-set has been shared to refactor the xmit path
- Use better page allocation for MPTCP. But hard part is the
collapsing / data going from one to another subflows, etc. MPTCP
specific problem.
- good to make it more clear, already think about how to handle
retransmissions/reinjections, etc. thanks!
- Paolo is traveling now but it seems he will be able to continue
working on that next week.
Florian's new patch-set:
- https://lists.01.org/pipermail/mptcp/2019-April/001061.html
- goal 1st patch: no impact if CONFIG_MPTCP=n
- goal 2nd patch: closing a hole (memory related of course...)
- good to think about that, thanks!
MPTCP.org:
- new version 0.94.4
- new branch mptcp_v0.95
- mptcp_trunk: rfc for MPTCPv1 (RFC6824bis):
https://sympa-2.sipr.ucl.ac.be/sympa/arc/mptcp-dev/2019-04/msg00003.html
- for the upstream, the idea is to support only the v1
Next meeting:
- We propose to have it next Thursday, the 18th of April.
- Usual time: 16:00 UTC (9am PDT, 6pm CEST)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20190418
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
1 year, 9 months
[RFC PATCH 0/4] mptcp: refactor xmit path
by Paolo Abeni
This series refactor the mptcp xmit path trying to make sendmsg behavior
more consistent, memory efficent and improve its performances.
This is an early draft, only lightly tested, shared early to avoid codying
conflict in this area.
The code is also available here:
https://github.com/pabeni/mptcp/tree/mptcp-proposal-devel
Paolo Abeni (4):
mptcp: use sk_page_frag() in sendmsg
mptcp: sendmsg() do spool all the provided data
mptcp: move tpext data_len initialization at option creation time
mptcp: allow collapsing consecutive sendpages on the same substream
include/net/mptcp.h | 1 +
net/mptcp/options.c | 2 +
net/mptcp/protocol.c | 160 ++++++++++++++++++++++++++-----------------
3 files changed, 100 insertions(+), 63 deletions(-)
--
2.20.1
1 year, 9 months
memory accounting issue with current mptcp-proposal branch
by Paolo Abeni
Hi all,
I just checked-out the mptcp-proposal and did some tests vs a debug build (CONFIG_KASAN and several CONFIG_*DEBUG* enabled).
I also have the local change reported at the bottom, just to made the self-test a little bit more stressing.
Running:
while true; do ./mptcp_connect.sh; done
I get the following splat after a few iterations:
[ 574.215874] WARNING: CPU: 2 PID: 3538 at net/core/stream.c:206 sk_stream_kill_queues+0x370/0x510
[ 574.217403] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw ata_generic virtio_net virtio_blk net_failover failover virtio_console ata_piix libata
[ 574.221141] CPU: 2 PID: 3538 Comm: mptcp_connect Not tainted 5.0.0.mptcp_dbg_b0afc0265a8b+ #10
[ 574.222219] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 574.223781] RIP: 0010:sk_stream_kill_queues+0x370/0x510
[ 574.224432] Code: 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 84 c0 74 04 3c 03 7e 43 8b b3 40 02 00 00 eb 92 0f 0b 85 f6 74 be <0f> 0b 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b e9 ed fe
[ 574.226522] RSP: 0018:ffff8881033cfaf8 EFLAGS: 00010206
[ 574.227110] RAX: 0000000000000000 RBX: ffff88803581ee40 RCX: 0000000000000020
[ 574.227968] RDX: 1ffff11006b03e19 RSI: 0000000000000b00 RDI: ffff88803581f0c8
[ 574.228779] RBP: ffff88803581f080 R08: ffffed1021e7db10 R09: ffffed1021e7db0f
[ 574.229570] R10: ffffed1021e7db0f R11: ffff88810f3ed87b R12: ffffffff9cac4fa0
[ 574.230349] R13: ffff88803581efa8 R14: ffff88810b1a1700 R15: 00000000f8b7605b
[ 574.231205] FS: 00007f6446d5c740(0000) GS:ffff88810f200000(0000) knlGS:0000000000000000
[ 574.232219] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 574.232942] CR2: 00000000008b3ad0 CR3: 0000000108d20002 CR4: 00000000001606e0
[ 574.233778] Call Trace:
[ 574.234093] inet_csk_destroy_sock+0x15c/0x3a0
[ 574.234607] tcp_rcv_state_process+0x1264/0x2391
[ 574.236230] tcp_v4_do_rcv+0x2bc/0x790
[ 574.236680] __release_sock+0x120/0x360
[ 574.237132] tcp_close+0x540/0xeb0
[ 574.237590] inet_release+0xdc/0x1c0
[ 574.238066] __sock_release+0x1d2/0x2a0
[ 574.238578] mptcp_close+0x182/0x380
[ 574.239066] inet_release+0xdc/0x1c0
[ 574.239494] __sock_release+0xc5/0x2a0
[ 574.239935] sock_close+0x11/0x20
[ 574.240343] __fput+0x261/0x800
[ 574.240734] task_work_run+0x10e/0x190
[ 574.241196] exit_to_usermode_loop+0x136/0x160
[ 574.241739] do_syscall_64+0x3c3/0x480
[ 574.242196] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 574.242881] RIP: 0033:0x7f64468667a0
[ 574.243339] Code: 00 64 c7 00 0d 00 00 00 b8 ff ff ff ff eb 90 b8 ff ff ff ff eb 89 0f 1f 40 00 83 3d fd c7 2d 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 9e c5 01 00 48 89 04 24
The warning is triggered by:
WARN_ON(sk->sk_forward_alloc);
so it looks like there is some memory accounting issue.
Cheers,
Paolo
--
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index ec64e95a6d35..2f22f3ae0e2f 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -36,6 +36,7 @@ SIZE=$((RANDOM % (1024 * 1024)))
if [ $SIZE -eq 0 ] ; then
SIZE=1
fi
+SIZE=$((SIZE * 100))
dd if=/dev/urandom of="$tmpin" bs=1 count=$SIZE 2> /dev/null
1 year, 9 months
[Weekly meetings] MoM - 4th of April 2019
by Matthieu Baerts
Hello,
We just had our 45th meeting with Mat, Peter and Ossama (Intel OTC),
Christoph (Apple), Paolo and Davide (Red Hat) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Mat and Peter's patch set:
- Changes:
* Peter has modified the internal subflow sockets to use ULP
https://review.gerrithub.io/c/multipath-tcp/mptcp_net-next/+/450099/1
https://github.com/multipath-tcp/mptcp_net-next/commit/0500c9d7586e8fe73c...
* Fixed building without CONFIG_MPTCP
* Modified self-test commits to account for removal of
IPPROTO_SUBFLOW
- Review on Gerrit: don't hesitate to review and give a score ;-)
- ULP:
- Mat mentioned that for the moment it is possible to register a
subflow from userspace (setsockopt ULP).
- it was possible not to allow that before but this has been
removed recently → @Mat: feel free to share the link to this patch ;-)
- we should then re-add this support
- we should be able to share that with Eric soon:
- Mat want to do a last change not to send the SKB to the TCP
stack too soon.
- Paolo is reviewing the code and he has some changes to propose
related to mptcp_send_msg()
- there are maybe some memory allocating issues found by Paolo,
the trace/log can be shared on the ML.
- For the kselftests, we can squash them: Paolo is OK / @Florian ?
(it should be fine according to his colleagues :) )
Next steps:
- Review the architecture design code (Mat & Peter's patch-set)
- ideally on Gerrit:
https://review.gerrithub.io/q/project:multipath-tcp/mptcp_net-next+status...
- or on the ML if needed.
- Merge this code in our repo
- Prepare a cover-letter
- Send this to Eric for first review
- What is important and still need to be done: (and this would then
not be part of what we will send to Eric)
- MP_JOIN
- coupled received window
- IPv6
- DATA-FIN handling
- Once a first version of the architecture design code is
frozen, feel free to one of these items and share that on the ML ;-)
Next meeting:
- We propose to have it next Thursday, the 11th of April.
- Usual time: 16:00 UTC (9am PDT, 6pm CEST)
- Paolo might not be there but ok for him not to be there
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20190411
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
1 year, 9 months