[PATCH v3 0/4] mptcp: disable mptcp when md5sig is set
by Paolo Abeni
As per the last public meeting discussion, md5sig can cause TCP option space
exhaustion. Without md5sig we can't exhaust the TCP option space.
This series explicitly disables MPTCP when md5sig is set, and cleans up
the later option length checks under the assumption that TCP option space
exhaustion is not expected, adding a single WARN_ON() for that case.
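The arithmetic behind "without md5sig we can't exhaust the TCP option space" can be sketched in userspace C. This is a minimal sketch, assuming the option sizes below (copied by hand from include/net/tcp.h and net/mptcp/protocol.h as of this period; check them against your tree), assuming timestamps are always enabled, and ignoring SACK blocks. align4(), dss_worst_case() and fits() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Option sizes in bytes: assumed values, mirrored by hand from kernel headers. */
#define MAX_TCP_OPTION_SPACE	40
#define TCPOLEN_TSTAMP_ALIGNED	12
#define TCPOLEN_MD5SIG_ALIGNED	20
#define TCPOLEN_MPTCP_MPC_ACK	20
#define TCPOLEN_MPTCP_DSS_BASE	4
#define TCPOLEN_MPTCP_DSS_ACK64	8
#define TCPOLEN_MPTCP_DSS_MAP64	14

/* Hypothetical helper: TCP options are padded to a multiple of 4 bytes. */
static unsigned int align4(unsigned int len)
{
	return (len + 3u) & ~3u;
}

/* Hypothetical helper: worst-case DSS, 64-bit DATA_ACK plus 64-bit mapping. */
static unsigned int dss_worst_case(void)
{
	return align4(TCPOLEN_MPTCP_DSS_BASE + TCPOLEN_MPTCP_DSS_ACK64 +
		      TCPOLEN_MPTCP_DSS_MAP64);
}

/* Hypothetical helper: would an MPTCP option of @mptcp_len bytes fit next
 * to timestamps, optionally together with an MD5 signature option? */
static bool fits(unsigned int mptcp_len, bool md5sig)
{
	unsigned int used = TCPOLEN_TSTAMP_ALIGNED + align4(mptcp_len);

	assert(mptcp_len <= MAX_TCP_OPTION_SPACE);
	if (md5sig)
		used += TCPOLEN_MD5SIG_ALIGNED;
	return used <= MAX_TCP_OPTION_SPACE;
}
```

With these numbers a worst-case DSS option (28 bytes once padded) plus timestamps uses exactly 40 bytes, and an MP_CAPABLE ACK with both keys uses 32; adding the 20-byte MD5 option pushes either case over the limit.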
v2 -> v3
- rebased
- fix mptcp_established_options() retvalue in patch 3
Paolo Abeni (4):
mptcp: move mp_capable initialization at subflow_init_req() start
mptcp: disable on req sk if MD5SIG is enabled
mptcp: warn once if exceeding tcp opt space for dss/mp_capable
mptcp: remove unneeded check in mptcp_established_options_mp()
--
2.21.0
1 year, 2 months
kselftests: IPv6 crash when testing end of part 2
by Matthieu Baerts
Hello,
Since the move of 3 commits from part 2 to part 3 (see the "[GIT] reduce
patchset 2 (up to kselftests)" thread), the first IPv6 test, run on the
commit introducing the kselftests, causes a crash.
It is easy to reproduce:
$ git checkout $(git log -1 --format=%H --grep "mptcp: add basic kselftest for mptcp" origin/net-next..origin/export)
$ make -C tools/testing/selftests TARGETS=net/mptcp run_tests
## or if you have the scripts from the "scripts" branch:
$ ./Dockerfile.virtme.sh virtme.sh
Here is what I got:
ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP
[ 59.569727] general protection fault: 0000 [#1] SMP NOPTI
[ 59.570060] CPU: 0 PID: 354 Comm: mptcp_connect Not tainted 5.4.0-rc7+ #321
[ 59.570060] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 59.570060] RIP: 0010:__ipv6_sock_mc_close+0x82/0x100
[ 59.570060] Code: 4c 89 fa e8 d0 e3 ff ff 4d 85 ff 75 44 3e 41 83 2c 24 40 48 8d 7b 30 be 30 00 00 00 e8 47 b8 75 ff 48 8b 5d 58 48 85 db 74 34 <48> 8b 43 18 4c 89 f7 48 89 45 58 8b 73 10 e8 5b 9f ec ff 48 85 c0
[ 59.570060] RSP: 0018:ffffa8948021be20 EFLAGS: 00010206
[ 59.570060] RAX: 0000000000000001 RBX: 2185dbe0e23218d4 RCX: 0000000000000001
[ 59.570060] RDX: ffffa0c39edcd280 RSI: ffffa0c39e6b0380 RDI: ffffffff9c8fde20
[ 59.570060] RBP: ffffa0c39df244f8 R08: 0000000000000000 R09: 0000000000000000
[ 59.570060] R10: ffffa0c39e6b0380 R11: ffffa0c39deb3f10 R12: ffffa0c39df24140
[ 59.570060] R13: ffffa0c39df24000 R14: ffffa0c39ddb8000 R15: ffffa0c39e676c00
[ 59.570060] FS: 00007f09acb80500(0000) GS:ffffa0c39f200000(0000) knlGS:0000000000000000
[ 59.570060] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.570060] CR2: 00007fb168761e60 CR3: 000000001deda000 CR4: 00000000003406f0
[ 59.570060] Call Trace:
[ 59.570060] ipv6_sock_mc_close+0x37/0x40
[ 59.570060] inet6_release+0x16/0x30
[ 59.570060] __sock_release+0x38/0xb0
[ 59.570060] sock_close+0xc/0x10
[ 59.570060] __fput+0xb1/0x240
[ 59.570060] task_work_run+0x79/0xa0
[ 59.570060] exit_to_usermode_loop+0xa5/0xb0
[ 59.570060] do_syscall_64+0xf4/0x120
[ 59.570060] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 59.570060] RIP: 0033:0x7f09ac67f8d4
[ 59.570060] Code: eb 89 e8 cf 43 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 31 00 2e 00 8b 00 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c f3 c3 66 90 53 89 fb 48 83 ec 10 e8 f4 fd
[ 59.570060] RSP: 002b:00007ffcf9398688 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 59.570060] RAX: 0000000000000000 RBX: 00007ffcf93986b8 RCX: 00007f09ac67f8d4
[ 59.570060] RDX: 0000000000000000 RSI: 00007ffcf939a6c0 RDI: 0000000000000003
[ 59.570060] RBP: 0000000000000003 R08: 00007f09ac95a20c R09: 00007f09ac95a240
[ 59.570060] R10: fffffffffffff7e5 R11: 0000000000000246 R12: 00007ffcf939a6c0
[ 59.570060] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000008f
[ 59.570060] Modules linked in:
[ 60.149224] ---[ end trace 5333410f2d3e653b ]---
[ 60.162408] RIP: 0010:__ipv6_sock_mc_close+0x82/0x100
[ 60.176519] Code: 4c 89 fa e8 d0 e3 ff ff 4d 85 ff 75 44 3e 41 83 2c 24 40 48 8d 7b 30 be 30 00 00 00 e8 47 b8 75 ff 48 8b 5d 58 48 85 db 74 34 <48> 8b 43 18 4c 89 f7 48 89 45 58 8b 73 10 e8 5b 9f ec ff 48 85 c0
[ 60.227674] RSP: 0018:ffffa8948021be20 EFLAGS: 00010206
[ 60.242295] RAX: 0000000000000001 RBX: 2185dbe0e23218d4 RCX: 0000000000000001
[ 60.262192] RDX: ffffa0c39edcd280 RSI: ffffa0c39e6b0380 RDI: ffffffff9c8fde20
[ 60.281343] RBP: ffffa0c39df244f8 R08: 0000000000000000 R09: 0000000000000000
[ 60.300763] R10: ffffa0c39e6b0380 R11: ffffa0c39deb3f10 R12: ffffa0c39df24140
[ 60.320332] R13: ffffa0c39df24000 R14: ffffa0c39ddb8000 R15: ffffa0c39e676c00
[ 60.339678] FS: 00007f09acb80500(0000) GS:ffffa0c39f200000(0000) knlGS:0000000000000000
[ 60.361732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.377509] CR2: 00007fb168761e60 CR3: 000000001deda000 CR4: 00000000003406f0
./mptcp_connect.sh: line 360: 354 Segmentation fault ip netns exec ${connector_ns} ./mptcp_connect -t $timeout -p $port -s ${cl_proto} $connect_addr < "$cin" > "$cout"
(duration 980ms) [ FAIL ] client exit code 139, server 0
\nnetns ns1-5dd5919f-pDmJLu socket stat for 10003:
State Recv-Q Send-Q Local Address:Port Peer Address:Port
TIME-WAIT 0 0 [dead:beef:1::1]:10003 [dead:beef:1::1]:36930
timer:(timewait,58sec,0) cwnd:2 reordering:0
\nnetns ns1-5dd5919f-pDmJLu socket stat for 10003:
State Recv-Q Send-Q Local Address:Port Peer Address:Port
FAIL: Could not even run loopback v6 test
Is there someone who would like to have a look? :-)
Thank you!
Cheers,
Matt
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
[GIT] reduce patchset 2 (up to kselftests)
by Matthieu Baerts
Hello,
As discussed at a previous meeting, I just moved some commits linked to
the manipulation of multiple subflows after the one introducing the
kselftests.
Just to keep a trace and to allow a review, I did this operation with
TopGit. But for this kind of operation, if we don't need a "review",
it's a bit quicker with "git rebase" :)
- 3e77653ecb5f: cut (empty) t/mptcp-Add-path-manager-interface
- 1cb01298549d: cut (empty) t/mptcp-Add-ADD_ADDR-handling
- 25c9462ee09f: cut (empty) t/mptcp-Add-handling-of-incoming-MP_JOIN-requests
- 2a1ef2512c28: conflict (Makefile) in t/mptcp-new-sysctl-to-control-the-activation-per-NS
- 152ef156123f: tg create t/mptcp-Add-path-manager-interface-v2
- 4569c65e2c58: paste (recreate) t/mptcp-Add-path-manager-interface
- 3cc80d22d759: tg create t/mptcp-Add-ADD_ADDR-handling-v2
- 7928974b360d: paste (recreate) t/mptcp-Add-ADD_ADDR-handling
- 5d24692b5191: tg create t/mptcp-Add-handling-of-incoming-MP_JOIN-requests-v2
- faf79ec33f1e: paste (recreate) t/mptcp-Add-handling-of-incoming-MP_JOIN-requests
- d4fd5437d11b: t/mptcp-Add-handling-of-outgoing-MP_JOIN-requests is now on top of t/mptcp-Add-handling-of-incoming-MP_JOIN-requests-v2
- a4c1cf31bee5: conflict (Makefile) in t/mptcp-allow-dumping-subflow-context-to-userspace
- abe60f261c3d: conflict (Makefile) in t/mptcp-add-MIB-counter-infrastructure
- f1e5ff4a68a1: conflict (Makefile) in t/mptcp-Implement-basic-path-manager
- e73eef882a05..eaa68661fa8c: result (only the order in the Makefile)
The only issue is that the kselftests with IPv6 no longer work when
we launch them on the commit that introduces the kselftests :'(
It is easy to reproduce and I have a call trace, but I will send a
separate email for that.
Cheers,
Matt
Feedback to the IETF
by Christoph Paasch
Hello,
I have mentioned to the IETF that the upstreaming effort brought forward some feedback, ... on the IETF draft.
Now, the draft is about to become an RFC. Currently it is in the "RFC-Editor" stage, which is the last stage before official publication.
Ideally, we should get feedback to the IETF as soon as possible so that the working group can make adjustments before publication. Otherwise, it would need to go through an "Errata" process, which means correcting the published RFC.
So, let me try to compile a list of things that came up. Please correct or add anything:
Clarifying that the reception of a DATA_ACK means that the server successfully received the MP_CAPABLE (text: "If B has data to send first, then the reliable delivery of the ACK can be inferred by the receipt of this data with an MPTCP DATA_ACK inside the DSS option (Section 3.3).")
Should we push to disallow early or late DSS mappings, i.e. mappings covering a TCP sequence space different from that of the packet they are sent on? (cf. our discussion on or around September 25th)
ADD_ADDR option size: last week we discussed this briefly, and I replied by mail that the size problem actually concerns the non-Echo option. In that one there is no way around it: all the bits need to be present. Do you agree on that?
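Both the ADD_ADDR size issue and the "early/late mapping" question can be made concrete with a small C sketch. It assumes the ADD_ADDR layout of draft-ietf-mptcp-rfc6824bis (4 bytes of kind, length, subtype/flags and address ID, then the address, an optional 2-byte port, and an 8-byte truncated HMAC unless the Echo bit is set), and assumes both ranges passed to mapping_overlaps_packet() are expressed in the same relative subflow sequence space. add_addr_len(), seq_before() and mapping_overlaps_packet() are hypothetical helpers written for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ADD_ADDR option length in bytes, before padding. */
static unsigned int add_addr_len(bool ipv6, bool with_port, bool echo)
{
	unsigned int len = 4;		/* kind, length, subtype/flags, addr id */

	len += ipv6 ? 16 : 4;		/* the advertised address */
	if (with_port)
		len += 2;
	if (!echo)
		len += 8;		/* truncated HMAC, omitted on echo */
	return len;
}

/* Hypothetical helper: wrap-safe 32-bit sequence comparison (a before b). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: does the mapping [map_ssn, map_ssn + map_len) share
 * at least one byte with the payload [pkt_seq, pkt_seq + pkt_len)? An early
 * or late mapping, describing bytes this packet does not carry, does not.
 * A zero map_len (infinite mapping) is not modelled here. */
static bool mapping_overlaps_packet(uint32_t map_ssn, uint32_t map_len,
				    uint32_t pkt_seq, uint32_t pkt_len)
{
	uint32_t map_end = map_ssn + map_len;
	uint32_t pkt_end = pkt_seq + pkt_len;

	return seq_before(map_ssn, pkt_end) && seq_before(pkt_seq, map_end);
}
```

For example, a non-Echo ADD_ADDR for an IPv6 address with a port is 30 bytes (32 once padded), which together with 12 bytes of timestamps already exceeds the 40-byte option space; the Echo variant of the same option is 22 bytes.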
Anything else?
Cheers,
Christoph
[PATCH 00/10] selftests: IPv6 by default and time reduction
by Matthieu Baerts
Here are a few improvements to the selftests. Nothing complex: some
details to make the results more readable (I think), IPv6 enabled by
default, a reduced execution time (around 30 seconds now for the same
set of tests), and a script that is more robust for the future.
I hope we can apply the second-to-last one: IPv6 is now forced in
the config, so we can always apply the IPv6 network config and remove
a lot of 'if $ipv6' in the code.
For the last one, I guess we should not apply it now because some tests
are really slow in some conditions. More details are in the patch, and a
separate email has been sent.
All the patches can be squashed in the commit introducing the selftests:
mptcp: add basic kselftest for mptcp
Matthieu Baerts (10):
selftests:mptcp: fix typo
selftests:mptcp: enable v6 by default
selftests:mptcp: align v4 and v6 results
selftests:mptcp: reduce wait time for the listen
selftests:mptcp: reduce time to wait for DAD
selftests:mptcp: only display the NS id
selftests:mptcp: avoid global var clash
selftests:mptcp: do not reset "ret" when skipping
selftests:mptcp: force MPTCP_IPV6 config
[RFC] selftests:mptcp: decrease timeout to 100 sec
tools/testing/selftests/net/mptcp/config | 1 +
.../selftests/net/mptcp/mptcp_connect.sh | 245 +++++++++++-------
tools/testing/selftests/net/mptcp/settings | 2 +-
3 files changed, 147 insertions(+), 101 deletions(-)
--
2.24.0
Warn + crash in mptcp_sendmsg_frag()
by Matthieu Baerts
Hello,
When testing some changes in the kselftests area, I initially saw the
message:
main_loop_s: timed out
in the middle of the tests. But just after, there was a "[ OK ]".
Has anybody seen that?
I tried to understand why, because the binary should return 2 if the
message is printed and the shell script should catch the error. So I
added debug output to the script and launched it in a while loop. I didn't
get the initial error I was looking for; instead I got a WARN followed by
an OOPS:
+ addr_port=10.0.3.1:10030
+ printf '%-4s %-5s -> %-4s (%-20s) %-5s\t' ns3-5dcee2db-MqSsUi MPTCP ns4-5dcee2db-MqSsUi 10.0.3.1:10030 MPTCP
ns3-5dcee2db-MqSsUi MPTCP -> ns4-5dcee2db-MqSsUi (10.0.3.1:10030 ) MPTCP
+ false
+ spid=28842
+ ip netns exec ns4-5dcee2db-MqSsUi ./mptcp_connect -t 30 -l -p 10030 -s MPTCP 0.0.0.0
+ sleep 1
++ date +%s%3N
+ start=1573839651723
+ cpid=28845
+ ip netns exec ns3-5dcee2db-MqSsUi ./mptcp_connect -t 30 -p 10030 -s MPTCP 10.0.3.1
+ wait 28845
[90317.510635] ------------[ cut here ]------------
[90317.511028] WARNING: CPU: 0 PID: 28845 at net/mptcp/protocol.c:317 mptcp_sendmsg_frag+0x51b/0x570
[90317.511028] Modules linked in:
[90317.511028] CPU: 0 PID: 28845 Comm: mptcp_connect Not tainted 5.4.0-rc6+ #316
[90317.511028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[90317.511028] RIP: 0010:mptcp_sendmsg_frag+0x51b/0x570
[90317.511028] Code: fc ff ff 49 63 53 18 49 03 53 10 48 3b 95 58 05 00 00 0f 85 0c fe ff ff c6 44 24 3f 01 c7 44 24 38 00 00 00 00 e9 5a fe ff ff <0f> 0b eb b0 48 8b 7c 24 48 4c 89 ce 44 89 44 24 30 48 29 ce 4c 89
[90317.511028] RSP: 0018:ffff979e40287cc8 EFLAGS: 00010246
[90317.511028] RAX: ffff921f1de56200 RBX: ffff921f1d430900 RCX: 0000000000001fa0
[90317.511028] RDX: 0000000000001fa0 RSI: 0000000000000000 RDI: ffff921f1d430900
[90317.511028] RBP: ffff921f1d451380 R08: 0000000000001fa0 R09: 0000000000001fa0
[90317.511028] R10: ffff921f1dc8eec0 R11: ffff921f1de58000 R12: ffff921f1d4518d8
[90317.511028] R13: ffff921f1de56200 R14: ffff921f1edcf6d0 R15: 0000000000001fa0
[90317.511028] FS: 00007f53fa681500(0000) GS:ffff921f1f200000(0000) knlGS:0000000000000000
[90317.511028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[90317.511028] CR2: 00007f049c5dfe60 CR3: 000000001dc30000 CR4: 00000000003406f0
[90317.511028] Call Trace:
[90317.511028] mptcp_sendmsg+0x15b/0x230
[90317.511028] sock_sendmsg+0x4f/0x60
[90317.511028] sock_write_iter+0x8a/0xf0
[90317.511028] new_sync_write+0x116/0x1b0
[90317.511028] vfs_write+0xa8/0x1a0
[90317.511028] ksys_write+0x9c/0xd0
[90317.511028] do_syscall_64+0x43/0x120
[90317.511028] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[90317.511028] RIP: 0033:0x7f53fa180154
[90317.511028] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
[90317.511028] RSP: 002b:00007ffd5f71be78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[90317.511028] RAX: ffffffffffffffda RBX: 00007ffd5f71bea8 RCX: 00007f53fa180154
[90317.511028] RDX: 0000000000002000 RSI: 00007ffd5f71beb0 RDI: 0000000000000003
[90317.511028] RBP: 0000000000000003 R08: 00007f53fa45b1d0 R09: 00007f53fa45b240
[90317.511028] R10: 00007f53fa1dad30 R11: 0000000000000246 R12: 00007ffd5f71deb0
[90317.511028] R13: 0000000000002000 R14: 0000000000002000 R15: 0000000000002000
[90317.511028] ---[ end trace 6d6ab3e3a5c67aed ]---
[90318.041976] BUG: kernel NULL pointer dereference, address: 0000000000000014
[90318.042798] #PF: supervisor write access in kernel mode
[90318.042798] #PF: error_code(0x0002) - not-present page
[90318.042798] PGD 0 P4D 0
[90318.042798] Oops: 0002 [#1] SMP NOPTI
[90318.042798] CPU: 0 PID: 28845 Comm: mptcp_connect Tainted: G W 5.4.0-rc6+ #316
[90318.042798] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[90318.042798] RIP: 0010:mptcp_sendmsg_frag+0x4d4/0x570
[90318.042798] Code: 00 e9 72 fc ff ff 48 c7 44 24 20 00 00 00 00 41 80 4d 37 02 c6 44 24 1c 00 e9 0f fc ff ff 80 7c 24 1c 00 74 4c 48 8b 44 24 20 <66> 44 01 78 14 48 83 3c 24 00 0f 85 d4 fc ff ff 45 01 46 08 e9 cb
[90318.042798] RSP: 0018:ffff979e40287cc8 EFLAGS: 00010246
[90318.042798] RAX: 0000000000000000 RBX: ffff921f1d430900 RCX: 0000000000001fa0
[90318.042798] RDX: 0000000000001fa0 RSI: 0000000000000000 RDI: ffff921f1d430900
[90318.042798] RBP: ffff921f1d451380 R08: 0000000000001fa0 R09: 0000000000001fa0
[90318.042798] R10: ffff921f1dc8eec0 R11: ffff921f1de58000 R12: ffff921f1d4518d8
[90318.042798] R13: ffff921f1de56200 R14: ffff921f1edcf6d0 R15: 0000000000001fa0
[90318.042798] FS: 00007f53fa681500(0000) GS:ffff921f1f200000(0000) knlGS:0000000000000000
[90318.042798] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[90318.042798] CR2: 0000000000000014 CR3: 000000001dc30000 CR4: 00000000003406f0
[90318.042798] Call Trace:
[90318.042798] mptcp_sendmsg+0x15b/0x230
[90318.042798] sock_sendmsg+0x4f/0x60
[90318.042798] sock_write_iter+0x8a/0xf0
[90318.042798] new_sync_write+0x116/0x1b0
[90318.042798] vfs_write+0xa8/0x1a0
[90318.042798] ksys_write+0x9c/0xd0
[90318.042798] do_syscall_64+0x43/0x120
[90318.042798] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[90318.042798] RIP: 0033:0x7f53fa180154
[90318.042798] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
[90318.042798] RSP: 002b:00007ffd5f71be78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[90318.042798] RAX: ffffffffffffffda RBX: 00007ffd5f71bea8 RCX: 00007f53fa180154
[90318.042798] RDX: 0000000000002000 RSI: 00007ffd5f71beb0 RDI: 0000000000000003
[90318.042798] RBP: 0000000000000003 R08: 00007f53fa45b1d0 R09: 00007f53fa45b240
[90318.042798] R10: 00007f53fa1dad30 R11: 0000000000000246 R12: 00007ffd5f71deb0
[90318.042798] R13: 0000000000002000 R14: 0000000000002000 R15: 0000000000002000
[90318.042798] Modules linked in:
[90318.042798] CR2: 0000000000000014
[90318.042798] ---[ end trace 6d6ab3e3a5c67aee ]---
[90318.042798] RIP: 0010:mptcp_sendmsg_frag+0x4d4/0x570
[90318.042798] Code: 00 e9 72 fc ff ff 48 c7 44 24 20 00 00 00 00 41 80 4d 37 02 c6 44 24 1c 00 e9 0f fc ff ff 80 7c 24 1c 00 74 4c 48 8b 44 24 20 <66> 44 01 78 14 48 83 3c 24 00 0f 85 d4 fc ff ff 45 01 46 08 e9 cb
[90318.042798] RSP: 0018:ffff979e40287cc8 EFLAGS: 00010246
[90318.042798] RAX: 0000000000000000 RBX: ffff921f1d430900 RCX: 0000000000001fa0
[90318.042798] RDX: 0000000000001fa0 RSI: 0000000000000000 RDI: ffff921f1d430900
[90318.042798] RBP: ffff921f1d451380 R08: 0000000000001fa0 R09: 0000000000001fa0
[90318.042798] R10: ffff921f1dc8eec0 R11: ffff921f1de58000 R12: ffff921f1d4518d8
[90318.042798] R13: ffff921f1de56200 R14: ffff921f1edcf6d0 R15: 0000000000001fa0
[90318.042798] FS: 00007f53fa681500(0000) GS:ffff921f1f200000(0000) knlGS:0000000000000000
[90318.042798] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[90318.042798] CR2: 0000000000000014 CR3: 000000001dc30000 CR4: 00000000003406f0
Of course, in parallel I did other stuff: I no longer have the proper
vmlinux and my MPTCP VM was not recent...
I suspect that "skb" is NULL and we then have an issue with this chunk:
	collapsed = skb == tcp_write_queue_tail(ssk);
	if (collapsed) {
		WARN_ON_ONCE(!can_collapse);
		/* when collapsing mpext always exists */
		mpext->data_len += ret;
		goto out;
	}
Please also note that earlier in the function, skb is initialised with:
skb = tcp_write_queue_tail(ssk);
So I guess collapsed is always true, no?
If skb is NULL, mpext is NULL too (it could also, unlikely, be NULL even
if skb is not NULL, but that sounds more like a bug, I guess), and
can_collapse is false.
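Two details may support this reading. In C, skb == tcp_write_queue_tail(ssk) is also true when both sides are NULL, so collapsed can be set while mpext is NULL. And the faulting bytes 66 44 01 78 14 appear to decode to a 16-bit add %r15w,0x14(%rax), with RAX = 0 and R15 = 0x1fa0, which would match mpext->data_len += ret on a NULL mpext if data_len sits at offset 0x14 of struct mptcp_ext in that build (an assumption, since the matching vmlinux is gone). The userspace sketch below shows a more defensive form of the check; struct fake_skb, struct fake_ext and try_collapse() are hypothetical stand-ins, not kernel code, and this is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mptcp_ext and struct sk_buff. */
struct fake_ext {
	unsigned short data_len;
};

struct fake_skb {
	struct fake_ext *ext;
};

/* Hypothetical helper: extend the existing mapping only if the data really
 * went into the skb sampled before the send (@old_tail), that skb exists,
 * and it carries an MPTCP extension. Returns true if the map was extended. */
static bool try_collapse(struct fake_skb *old_tail, struct fake_skb *new_tail,
			 bool can_collapse, unsigned short ret)
{
	if (!old_tail || old_tail != new_tail)
		return false;	/* NULL or a new skb: caller builds a new mapping */
	if (!can_collapse || !old_tail->ext)
		return false;	/* would be a WARN in kernel code */
	old_tail->ext->data_len += ret;
	return true;
}
```

The key difference from the quoted chunk is that a NULL tail never counts as "collapsed", so the extension pointer is only dereferenced once it is known to exist.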
I prefer to share this now, as I might not be able to look at it for
a few days: I am off for a few days starting now. Is anyone interested
in looking at this? :)
Cheers,
Matt
[PATCH v2] mptcp: Basic single-subflow DATA_FIN
by Mat Martineau
Send a DATA_FIN along with any subflow TCP FIN flag when the MPTCP
socket is closing or shutting down writes.
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
---
Changes from v1:
* Fixed problem with the receive side truncating data. The issue happened
when a DSS mapping for DATA_FIN was found on a data segment, where the
data in the packet was already covered by an earlier mapping.
* Only send DATA_FIN when the subflow is sending a FIN and the MPTCP
socket is no longer in the TCP_ESTABLISHED state. This prevents sending
DATA_FIN when an individual subflow is removed but the MPTCP-level
connection is kept alive.
* Changes to warnings and comments suggested in code review.
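The DATA_FIN accounting in this patch (DATA_FIN consumes one byte of data sequence space, and a DATA_FIN with no payload is sent as a one-byte mapping with subflow_seq 0) can be modelled in a few lines of userspace C. This is a minimal sketch: struct dss_map, write_data_fin() and payload_len() are hypothetical names that mirror, but are not, struct mptcp_ext, mptcp_write_data_fin() and the data_len-- adjustment in get_mapping_status().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the struct mptcp_ext fields used by the patch. */
struct dss_map {
	bool use_map;
	bool data_fin;
	uint64_t data_seq;
	uint32_t subflow_seq;
	uint16_t data_len;
};

/* Sender side, following mptcp_write_data_fin() in the patch. */
static void write_data_fin(struct dss_map *m, uint64_t write_seq)
{
	m->data_fin = true;
	if (!m->use_map) {
		/* No payload mapped: DATA_FIN-only mapping. */
		m->use_map = true;
		m->data_seq = write_seq;
		m->subflow_seq = 0;
		m->data_len = 1;
	} else {
		/* DATA_FIN uses one extra byte of mapping space. */
		m->data_len++;
	}
}

/* Receiver side: payload bytes covered once the DATA_FIN byte is removed
 * (0 for a DATA_FIN-only mapping). data_len == 0 is an infinite mapping,
 * which the real code rejects before this point. */
static uint16_t payload_len(const struct dss_map *m)
{
	assert(m->data_len > 0);
	return m->data_fin ? (uint16_t)(m->data_len - 1) : m->data_len;
}
```

In the receive path, the patch then treats a one-byte DATA_FIN-only mapping specially: it returns MAPPING_DATA_FIN, or MAPPING_OK if an earlier mapping is still being consumed.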
net/mptcp/options.c | 40 +++++++++++++++++++++++++++++++++++++---
net/mptcp/protocol.c | 4 ++++
net/mptcp/subflow.c | 42 ++++++++++++++++++++++++++++--------------
3 files changed, 69 insertions(+), 17 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 9a18a3670cdf..f8dd8b2a4785 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -384,18 +384,48 @@ static bool mptcp_established_options_mp(struct sock *sk, unsigned int *size,
return false;
}
+static void mptcp_write_data_fin(struct mptcp_subflow_context *subflow,
+ struct mptcp_ext *ext)
+{
+ ext->data_fin = 1;
+
+ if (!ext->use_map) {
+ /* RFC6824 requires a DSS mapping with specific values
+ * if DATA_FIN is set but no data payload is mapped
+ */
+ ext->use_map = 1;
+ ext->dsn64 = 1;
+ ext->data_seq = mptcp_sk(subflow->conn)->write_seq;
+ ext->subflow_seq = 0;
+ ext->data_len = 1;
+ } else {
+ /* If there's an existing DSS mapping, DATA_FIN consumes
+ * 1 additional byte of mapping space.
+ */
+ ext->data_len++;
+ }
+}
+
static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
unsigned int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
+ struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
unsigned int dss_size = 0;
struct mptcp_ext *mpext;
unsigned int ack_size;
+ u8 tcp_fin;
- mpext = skb ? mptcp_get_ext(skb) : NULL;
+ if (skb) {
+ mpext = mptcp_get_ext(skb);
+ tcp_fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
+ } else {
+ mpext = NULL;
+ tcp_fin = 0;
+ }
- if (!skb || (mpext && mpext->use_map)) {
+ if (!skb || (mpext && mpext->use_map) || tcp_fin) {
unsigned int map_size;
map_size = TCPOLEN_MPTCP_DSS_BASE + TCPOLEN_MPTCP_DSS_MAP64;
@@ -405,6 +435,10 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
dss_size = map_size;
if (mpext)
opts->ext_copy = *mpext;
+
+ if (skb && tcp_fin &&
+ subflow->conn->sk_state != TCP_ESTABLISHED)
+ mptcp_write_data_fin(subflow, &opts->ext_copy);
} else {
opts->ext_copy.use_map = 0;
WARN_ONCE(1, "MPTCP: Map dropped");
@@ -422,7 +456,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
dss_size += ack_size;
- msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn);
+ msk = mptcp_sk(subflow->conn);
if (msk) {
opts->ext_copy.data_ack = msk->ack_seq;
} else {
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1a432abfb176..910cf26037b7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1284,6 +1284,10 @@ static int mptcp_shutdown(struct socket *sock, int how)
pr_debug("sk=%p, how=%d", msk, how);
lock_sock(sock->sk);
+
+ if (how == SHUT_WR || how == SHUT_RDWR)
+ inet_sk_state_store(sock->sk, TCP_FIN_WAIT1);
+
ssock = __mptcp_fallback_get_ref(msk);
if (ssock) {
release_sock(sock->sk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index ff38d54392cd..89e6533c97b6 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -422,6 +422,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct mptcp_ext *mpext;
struct sk_buff *skb;
+ u16 data_len;
u64 map_seq;
skb = skb_peek(&ssk->sk_receive_queue);
@@ -446,26 +447,39 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
if (!subflow->map_valid)
return MAPPING_INVALID;
+
goto validate_seq;
}
- pr_debug("seq=%llu is64=%d ssn=%u data_len=%u", mpext->data_seq,
- mpext->dsn64, mpext->subflow_seq, mpext->data_len);
+ pr_debug("seq=%llu is64=%d ssn=%u data_len=%u data_fin=%d",
+ mpext->data_seq, mpext->dsn64, mpext->subflow_seq,
+ mpext->data_len, mpext->data_fin);
- if (mpext->data_len == 0) {
+ data_len = mpext->data_len;
+ if (data_len == 0) {
pr_err("Infinite mapping not handled");
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPRX);
return MAPPING_INVALID;
- } else if (mpext->subflow_seq == 0 &&
- mpext->data_fin == 1) {
- if (WARN_ON_ONCE(mpext->data_len != 1))
- return false;
+ }
- /* do not try hard to handle this any better, till we have
- * real data_fin support
- */
- pr_debug("DATA_FIN with no payload");
- return MAPPING_DATA_FIN;
+ if (mpext->data_fin == 1) {
+ if (data_len == 1) {
+ pr_debug("DATA_FIN with no payload");
+ if (subflow->map_valid) {
+ /* A DATA_FIN might arrive in a DSS
+ * option before the previous mapping
+ * has been fully consumed. Continue
+ * handling the existing mapping.
+ */
+ skb_ext_del(skb, SKB_EXT_MPTCP);
+ return MAPPING_OK;
+ } else {
+ return MAPPING_DATA_FIN;
+ }
+ }
+
+ /* Adjust for DATA_FIN using 1 byte of sequence space */
+ data_len--;
}
if (!mpext->dsn64) {
@@ -480,7 +494,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
/* Allow replacing only with an identical map */
if (subflow->map_seq == map_seq &&
subflow->map_subflow_seq == mpext->subflow_seq &&
- subflow->map_data_len == mpext->data_len) {
+ subflow->map_data_len == data_len) {
skb_ext_del(skb, SKB_EXT_MPTCP);
return MAPPING_OK;
}
@@ -499,7 +513,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
subflow->map_seq = map_seq;
subflow->map_subflow_seq = mpext->subflow_seq;
- subflow->map_data_len = mpext->data_len;
+ subflow->map_data_len = data_len;
subflow->map_valid = 1;
pr_debug("new map seq=%llu subflow_seq=%u data_len=%u",
subflow->map_seq, subflow->map_subflow_seq,
--
2.24.0
[RFC] mptcp: wmem accounting and nonblocking io support
by Florian Westphal
This (large, sigh) series fixes poll handling in mptcp.
The first patch extends the test suite with an mmap-based mode to
check large, blocking writes. This uncovered a minor problem with the
earlier v2 wmem accounting patch series: we would happily take a lot
more data than sndbuf allowed, as we only limited it based on what the subflow
could accept. So with a 4k sndbuf we could easily accept 256 KB or even more.
This patch doesn't change the test suite's default behaviour, however:
you need to use "-b 4096" and/or "-m mmap" to enable this mode.
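The accounting problem described above (the amount accepted was bounded by what the subflow could take, not by the MPTCP socket's own sndbuf) can be illustrated with a tiny helper. This is a simplified sketch, assuming a plain byte budget of sndbuf - wmem_queued; the real kernel accounting charges skb truesize and uses its own stream-memory helpers, and msk_send_budget() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many of @len bytes one sendmsg() pass may queue
 * when limited by the MPTCP-level send buffer (@sndbuf) and the bytes
 * already queued there (@wmem_queued). Ignores truesize overhead. */
static size_t msk_send_budget(size_t len, size_t sndbuf, size_t wmem_queued)
{
	size_t space = wmem_queued < sndbuf ? sndbuf - wmem_queued : 0;

	return len < space ? len : space;
}
```

With a 4k sndbuf, a 256 KB write would then be capped at 4096 bytes per pass, and a blocking writer would have to wait for queued data to drain instead of queueing everything at once.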
The second patch moves the test suite to nonblocking I/O. This breaks mptcp
because mptcp_poll can signal EPOLLIN when it shouldn't, so userspace gets
-EAGAIN even though poll reported the socket as readable.
Patches 3/4/5/6 are an update of the last wmem accounting series.
The remaining patches fix the nonblocking I/O behaviour.
mptcp_poll is made stand-alone, i.e. it no longer calls
__tcp_poll on the subflow sockets and only considers the mptcp_sk state.
After this series the selftest works again and the mptcp sk rtx queue is
limited by the msk wmem.
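A stand-alone poll in this sense derives its mask only from MPTCP-level state. The sketch below illustrates the idea in userspace C; struct msk_state, msk_writeable() and msk_poll_mask() are hypothetical, readiness is reduced to a single data_ready flag, and the writeable test is modelled after the kernel's sk_stream_is_writeable() (free space at least half of what is queued) without its memory-pressure check. The point is that POLLIN is only reported when the MPTCP socket itself has in-sequence data, so a following nonblocking read should not hit EAGAIN.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical snapshot of MPTCP-level socket state; no subflow state. */
struct msk_state {
	bool data_ready;		/* in-sequence data queued at msk level */
	unsigned int wmem_queued;	/* bytes queued for transmission */
	unsigned int sndbuf;		/* msk send buffer size */
};

/* Hypothetical helper: writeable when free space >= half of what is queued. */
static bool msk_writeable(const struct msk_state *msk)
{
	if (msk->wmem_queued >= msk->sndbuf)
		return false;
	return msk->sndbuf - msk->wmem_queued >= msk->wmem_queued / 2;
}

/* Hypothetical helper: poll mask computed from msk state only. */
static short msk_poll_mask(const struct msk_state *msk)
{
	short mask = 0;

	if (msk->data_ready)
		mask |= POLLIN;
	if (msk_writeable(msk))
		mask |= POLLOUT;
	return mask;
}
```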
The patches can't easily be rebased/merged, so I propose to
squash them myself and send a pull request when done.
The following changes since commit d1dbb32dc58df543e89f4004c1a0b96fe8acf99b:
subflow: wake parent mptcp socket on subflow state change (2019-11-14 13:03:44 +0000)
are available in the Git repository at:
git://git.breakpoint.cc/fw/mptcp-next.git tcp_poll_removal_06
for you to fetch changes up to fa583a84bcf9550783ac6a229ab584d68764659e:
sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-14 17:56:13 +0100)
----------------------------------------------------------------
Florian Westphal (14):
selftest: add mmap-write support
selftests: make sockets non-blocking for default poll mode
mptcp: add wmem_queued accounting
mptcp: allow partial cleaning of rtx head dfrag
mptcp: add and use mptcp RTX flag
sendmsg: block until mptcp sk is writeable
subflow: sk_data_ready: make wakeup on tcp sock conditional
mptcp: add and use mptcp_subflow_get_retrans
mptcp: sendmsg: transmit on backup if other subflows have been closed
recv: make DATA_READY reflect ssk in-sequence state
sendmsg: clear SEND_SPACE if write caused wmem to grow too large
mptcp_poll: don't consider subflow socket state anymore
sendmsg: don't restart mptcp_sendmsg_frag
sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace
net/mptcp/options.c | 2 +-
net/mptcp/protocol.c | 295 ++++++++++++++++-----
net/mptcp/protocol.h | 4 +-
net/mptcp/subflow.c | 12 +-
tools/testing/selftests/net/mptcp/mptcp_connect.c | 268 ++++++++++++++++++-
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 36 ++-
6 files changed, 544 insertions(+), 73 deletions(-)