mptcp diag interface weirdness
by Paolo Abeni
hi all,
while looking at:
https://github.com/multipath-tcp/mptcp_net-next/issues/68
I noticed we get some rather unpleasant behavior with kernels
lacking commit 3f935c75eb52dd968351dba824adf466fb9c9429, specifically:
ss fills 'sdiag_protocol' with MPTCP_PROTOCOL; since 'sdiag_protocol'
is 1 byte long, the value is truncated to 'TCP_PROTOCOL'. ss then adds an
INET_DIAG_REQ_PROTOCOL attribute that really contains 'MPTCP_PROTOCOL'.
Since the older kernel does not know 'INET_DIAG_REQ_PROTOCOL', it will
ignore it and will call the tcp diag, which in turn will look into the
TCP table. Even the filter will usually match, as we will likely have a
TCP subflow matching the MPTCP socket 4 tuple.
In the end 'ss' will get back a TCP diag info, but will interpret it as
an MPTCP one, dumping quite weird values.
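
To make the truncation concrete, here is a minimal sketch. It assumes the Linux UAPI values IPPROTO_TCP = 6 and IPPROTO_MPTCP = 262; the EXAMPLE_* names and store_sdiag_protocol() are hypothetical and exist only for this illustration, they are not part of ss or the kernel.

```c
#include <stdint.h>

/* Assumed values, taken from the Linux UAPI headers. */
#define EXAMPLE_IPPROTO_TCP   6
#define EXAMPLE_IPPROTO_MPTCP 262

/*
 * Hypothetical helper: models storing a protocol number into the
 * one-byte 'sdiag_protocol' field of a diag request.
 * Only the low 8 bits survive the conversion.
 */
uint8_t store_sdiag_protocol(int protocol)
{
	return (uint8_t)protocol;
}
```

Since 262 is 0x106, only 0x06 fits in the byte, which is exactly the TCP protocol number. An older kernel that ignores INET_DIAG_REQ_PROTOCOL therefore sees a plain TCP request.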
The only solution I can think of is changing 'ss' to
override 'sdiag_protocol' with 255 when setting INET_DIAG_REQ_PROTOCOL.
255 is reserved by IANA, so no diag handler is expected, and the
kernel should (correctly) report an unknown-protocol error.
WDYT?
Thanks,
Paolo
p.s. all the above will not fix (yet) issues/68
[PATCH net] mptcp: be careful on subflow creation
by Paolo Abeni
Nicolas reported the following oops:
[ 1521.392541] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[ 1521.394189] #PF: supervisor read access in kernel mode
[ 1521.395376] #PF: error_code(0x0000) - not-present page
[ 1521.396607] PGD 0 P4D 0
[ 1521.397156] Oops: 0000 [#1] SMP PTI
[ 1521.398020] CPU: 0 PID: 22986 Comm: kworker/0:2 Not tainted 5.8.0-rc4+ #109
[ 1521.399618] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 1521.401728] Workqueue: events mptcp_worker
[ 1521.402651] RIP: 0010:mptcp_subflow_create_socket+0xf1/0x1c0
[ 1521.403954] Code: 24 08 89 44 24 04 48 8b 7a 18 e8 2a 48 d4 ff 8b 44 24 04 85 c0 75 7a 48 8b 8b 78 02 00 00 48 8b 54 24 08 48 8d bb 80 00 00 00 <48> 8b 89 c0 00 00 00 48 89 8a c0 00 00 00 48 8b 8b 78 02 00 00 8b
[ 1521.408201] RSP: 0000:ffffabc4002d3c60 EFLAGS: 00010246
[ 1521.409433] RAX: 0000000000000000 RBX: ffffa0b9ad8c9a00 RCX: 0000000000000000
[ 1521.411096] RDX: ffffa0b9ae78a300 RSI: 00000000fffffe01 RDI: ffffa0b9ad8c9a80
[ 1521.412734] RBP: ffffa0b9adff2e80 R08: ffffa0b9af02d640 R09: ffffa0b9ad923a00
[ 1521.414333] R10: ffffabc4007139f8 R11: fefefefefefefeff R12: ffffabc4002d3cb0
[ 1521.415918] R13: ffffa0b9ad91fa58 R14: ffffa0b9ad8c9f9c R15: 0000000000000000
[ 1521.417592] FS: 0000000000000000(0000) GS:ffffa0b9af000000(0000) knlGS:0000000000000000
[ 1521.419490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1521.420839] CR2: 00000000000000c0 CR3: 000000002951e006 CR4: 0000000000160ef0
[ 1521.422511] Call Trace:
[ 1521.423103] __mptcp_subflow_connect+0x94/0x1f0
[ 1521.425376] mptcp_pm_create_subflow_or_signal_addr+0x200/0x2a0
[ 1521.426736] mptcp_worker+0x31b/0x390
[ 1521.431324] process_one_work+0x1fc/0x3f0
[ 1521.432268] worker_thread+0x2d/0x3b0
[ 1521.434197] kthread+0x117/0x130
[ 1521.435783] ret_from_fork+0x22/0x30
on some unconventional configuration.
The MPTCP protocol is trying to create a subflow for an
unaccepted server socket. That is allowed by the RFC, even
if subflow creation will likely fail.
Unaccepted sockets still have a NULL sk_socket field;
avoid the issue by failing earlier.
Reported-and-tested-by: Nicolas Rybowski <nicolas.rybowski(a)tessares.net>
Fixes: 7d14b0d2b9b3 ("mptcp: set correct vfs info for subflows")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/subflow.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 3838a0b3a21f..3c31a8160f19 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1032,6 +1032,12 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock)
struct socket *sf;
int err;
+ /* un-accepted server sockets can reach here - on bad configuration
+ * bail early to avoid greater trouble later
+ */
+ if (unlikely(!sk->sk_socket))
+ return -EINVAL;
+
err = sock_create_kern(net, sk->sk_family, SOCK_STREAM, IPPROTO_TCP,
&sf);
if (err)
--
2.26.2
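
The guard added by the patch above can be modelled outside the kernel as follows. This is only a sketch: struct example_sock, struct example_socket and example_subflow_create() are hypothetical stand-ins that keep just the sk_socket field, not the real kernel definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions for illustration). */
struct example_socket {
	int placeholder;
};

struct example_sock {
	struct example_socket *sk_socket; /* NULL until the socket is accepted */
};

/*
 * Hypothetical model of the early bail-out: refuse to create a subflow
 * for a socket that has no attached 'struct socket' yet, instead of
 * dereferencing a NULL pointer later on.
 */
int example_subflow_create(const struct example_sock *sk)
{
	if (!sk->sk_socket)
		return -EINVAL;

	/* The real function would go on to create the kernel socket here. */
	return 0;
}
```

An un-accepted server socket (sk_socket == NULL) now yields -EINVAL, while an accepted one proceeds as before.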
[RFC PATCH 00/12] mptcp: multiple xmit substreams support
by Paolo Abeni
This is an early RFC to gather feedback and comments on the current status.
It needs the bugfix patch I sent before to avoid exploding badly on the first
packet; it can still explode after a few more.
It refactors send space notifications, introduces OoO handling via RBtree and
sndbuf autotuning, allows the PM to create non-backup subflows, and finally
adds a basic scheduler and some self-tests.
The pain point is the in-window check:
On the receiver side, the msk rcv window is set to tcp_space(msk), which should
be a quite rough over-estimate of a more correct value.
On the sender side, no limit is imposed on the xmitted sequence number, except
the one given by the sndbuf. The msk sndbuf is autotuned to the subflows'
largest sndbuf size.
With all the above I observe several out-of-[MPTCP]-window events on the RX
side, to the point that it affects the bandwidth (self-tests fail, as they
basically look at virtual link utilization).
Any comment is more than welcome, especially about better mptcp window checks.
This has been quite painful, so I would propose to consider accepting even a
suboptimal version on the export branch and then improving incrementally.
Paolo Abeni (12):
mptcp: msk is writable according to msk write space
mptcp: set data_ready status bit in subflow_check_data_avail()
mptcp: trigger msk processing even for OoO data
mptcp: basic sndbuf autotuning
mptcp: introduce and use mptcp_try_coalesce()
mptcp: move ooo skbs into msk out of order queue.
mptcp: cleanup mptcp_subflow_discard_data()
mptcp: add OoO related mibs
mptcp: move address attribute into mptcp_addr_info
mptcp: allow creating non-backup subflows
mptcp: allow picking different xmit subflows
mptcp: simult flow self-tests
net/mptcp/mib.c | 5 +
net/mptcp/mib.h | 5 +
net/mptcp/pm_netlink.c | 38 +-
net/mptcp/protocol.c | 459 ++++++++++++++----
net/mptcp/protocol.h | 18 +-
net/mptcp/subflow.c | 91 ++--
.../selftests/net/mptcp/simult_flows.sh | 290 +++++++++++
7 files changed, 743 insertions(+), 163 deletions(-)
create mode 100755 tools/testing/selftests/net/mptcp/simult_flows.sh
--
2.26.2
[PATCH mptcp-next] selftests/mptcp: Better delay & reordering configuration
by Christoph Paasch
The delay was intended to be configured to "simulate" a high(er) BDP
link. As such, it needs to be set as part of the loss-configuration and
not as part of the netem reordering configuration.
The reordering config also requires a delay, but that delay is the
reordering extent. So, a good approach is to set the reordering extent
as a function of the configured latency, e.g. 25% of the overall
latency.
Finally, the intention of tc_reorder was that when it is unset, the test
picks a random configuration. However, currently it is always initialized
and thus the random config won't be picked up.
Github-issue: https://github.com/multipath-tcp/mptcp_net-next/issues/6
Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
---
Notes:
Admittedly, there are two changes in this patch (the delay fix and
uninitializing tc_reorder). If you want, I can split them, but I thought that
would be overkill for a selftest patch.
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 6260520674d0..d29d189d1ae5 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -16,7 +16,6 @@ ipv6=true
ethtool_random_on=true
tc_delay="$((RANDOM%400))"
tc_loss=$((RANDOM%101))
-tc_reorder=""
testmode=""
sndbuf=0
rcvbuf=0
@@ -631,22 +630,24 @@ for sender in "$ns1" "$ns2" "$ns3" "$ns4";do
do_ping "$ns4" $sender dead:beef:3::1
done
-[ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss
+[ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss delay ${tc_delay}ms
echo -n "INFO: Using loss of $tc_loss "
test "$tc_delay" -gt 0 && echo -n "delay $tc_delay ms "
+reorder_delay=`expr $tc_delay / 4`
+
if [ -z "${tc_reorder}" ]; then
reorder1=$((RANDOM%10))
reorder1=$((100 - reorder1))
reorder2=$((RANDOM%100))
- if [ $tc_delay -gt 0 ] && [ $reorder1 -lt 100 ] && [ $reorder2 -gt 0 ]; then
+ if [ $reorder_delay -gt 0 ] && [ $reorder1 -lt 100 ] && [ $reorder2 -gt 0 ]; then
tc_reorder="reorder ${reorder1}% ${reorder2}%"
echo -n "$tc_reorder "
fi
elif [ "$tc_reorder" = "0" ];then
tc_reorder=""
-elif [ "$tc_delay" -gt 0 ];then
+elif [ "$reorder_delay" -gt 0 ];then
# reordering requires some delay
tc_reorder="reorder $tc_reorder"
echo -n "$tc_reorder "
@@ -654,7 +655,7 @@ fi
echo "on ns3eth4"
-tc -net "$ns3" qdisc add dev ns3eth4 root netem delay ${tc_delay}ms $tc_reorder
+tc -net "$ns3" qdisc add dev ns3eth4 root netem delay ${reorder_delay}ms $tc_reorder
for sender in $ns1 $ns2 $ns3 $ns4;do
run_tests_lo "$ns1" "$sender" 10.0.1.1 1
--
2.23.0
[PATCH mptcp-next 0/4] bpf: add MPTCP subflow support
by Nicolas Rybowski
Previously it was not possible to make a distinction between plain TCP
sockets and MPTCP subflow sockets on the BPF_PROG_TYPE_SOCK_OPS hook.
This patch series now enables fine-grained control of subflow sockets. In its
current state, it allows setting different sockops on each subflow of the
same MPTCP connection (socket mark, TCP congestion algorithm, ...) using
BPF programs.
It should also be the basis for exposing MPTCP-specific fields through BPF.
There is still an open question on patch 3; any comment is welcome.
Nicolas Rybowski (4):
bpf: expose is_mptcp flag to bpf_tcp_sock
mptcp: attach subflow socket to parent cgroup
mptcp: moving struct definitions
bpf: adding 'bpf_mptcp_sock' structure and helper
include/linux/bpf.h | 33 +++++++++
include/net/mptcp.h | 132 +++++++++++++++++++++++++++++++++
include/uapi/linux/bpf.h | 14 ++++
kernel/bpf/verifier.c | 30 ++++++++
net/core/filter.c | 76 ++++++++++++++++++-
net/mptcp/protocol.h | 123 ------------------------------
net/mptcp/subflow.c | 25 +++++++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 14 ++++
9 files changed, 325 insertions(+), 124 deletions(-)
--
2.28.0
[PATCH net] mptcp: fix bogus sendmsg() return code under pressure
by Paolo Abeni
In case of memory pressure, mptcp_sendmsg() may call
sk_stream_wait_memory() after successfully xmitting some
bytes. If the latter fails, we currently return the error
code to user-space, ignoring the successful xmit.
Address the issue by always checking for the xmitted bytes
before mptcp_sendmsg() completes.
Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Reviewed-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/protocol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c0abe738e7d3..a761d3c613bb 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -880,7 +880,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
mptcp_set_timeout(sk, ssk);
if (copied) {
- ret = copied;
tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle,
size_goal);
@@ -893,7 +892,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
release_sock(ssk);
out:
release_sock(sk);
- return ret;
+ return copied ? : ret;
}
static void mptcp_wait_data(struct sock *sk, long *timeo)
--
2.26.2
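
The return-value rule introduced by the patch above can be sketched in isolation. The helper name example_sendmsg_result() is hypothetical; the patch itself uses the GNU C shorthand `copied ? : ret`, which is equivalent to the portable form below.

```c
#include <errno.h>

/*
 * Hypothetical helper modelling the final return of a sendmsg-like call:
 * if any bytes were already transmitted, report that count, even when a
 * later wait for memory failed; only report the error when nothing was sent.
 */
long example_sendmsg_result(long copied, long ret)
{
	/* Portable spelling of the GNU extension 'copied ? : ret'. */
	return copied ? copied : ret;
}
```

For instance, with 1024 bytes already sent and a later -EAGAIN from the memory wait, the caller sees 1024 and can retry the remainder; with nothing sent, the caller sees -EAGAIN.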
[PATCH net-next] mptcp: use mptcp_for_each_subflow in mptcp_stream_accept
by Geliang Tang
Use mptcp_for_each_subflow in mptcp_stream_accept instead of
open-coding.
Signed-off-by: Geliang Tang <geliangtang(a)gmail.com>
---
net/mptcp/protocol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index d3fe7296e1c9..400824eabf73 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2249,7 +2249,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
* This is needed so NOSPACE flag can be set from tcp stack.
*/
__mptcp_flush_join_list(msk);
- list_for_each_entry(subflow, &msk->conn_list, node) {
+ mptcp_for_each_subflow(msk, subflow) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
if (!ssk->sk_socket)
--
2.17.1
[PATCH] mptcp: Allow ip mptcp endpoint show
by Christoph Paasch
The intention was to dump the entire address list when no ID is given.
However, the condition needs to check argc instead, as argv will always
be != NULL.
Fixes: 7e0767cd862b ("add support for mptcp netlink interface")
Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
---
ip/ipmptcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ip/ipmptcp.c b/ip/ipmptcp.c
index bc12418bd39c..df38754c82b6 100644
--- a/ip/ipmptcp.c
+++ b/ip/ipmptcp.c
@@ -273,7 +273,7 @@ static int mptcp_addr_show(int argc, char **argv)
struct nlmsghdr *answer;
int ret;
- if (!argv)
+ if (argc == 0)
return mptcp_addr_dump();
ret = mptcp_parse_opt(argc, argv, &req.n, false);
--
2.23.0
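
The difference between the old and new conditions in the patch above can be shown with a small model. Everything here (the enum and both example_show_* functions) is hypothetical and only mirrors the shape of the check, not the real iproute2 code.

```c
#include <stddef.h>

enum example_action {
	EXAMPLE_DUMP_ALL, /* no arguments: list every endpoint */
	EXAMPLE_SHOW_ONE  /* arguments present: parse them and show one */
};

/* Old check: argv points at the remaining argument array, so it is not NULL. */
enum example_action example_show_old(int argc, char **argv)
{
	(void)argc;
	return !argv ? EXAMPLE_DUMP_ALL : EXAMPLE_SHOW_ONE;
}

/* New check: an empty argument list is detected through argc. */
enum example_action example_show_new(int argc, char **argv)
{
	(void)argv;
	return argc == 0 ? EXAMPLE_DUMP_ALL : EXAMPLE_SHOW_ONE;
}
```

With an empty but non-NULL argument vector, the old check never takes the dump path, while the new one does.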
[PATCH net] mptcp: fix bogus sendmsg() return code under pressure
by Paolo Abeni
In case of memory pressure, mptcp_sendmsg() may call
sk_stream_wait_memory() after successfully xmitting some
bytes. If the latter fails, we currently return the error
code to user-space, ignoring the successful xmit.
Address the issue by always checking for the xmitted bytes
before mptcp_sendmsg() completes.
Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/protocol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
---
needed for the upcoming multiple subflow xmit; I hope to send this
upstream soon
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 650fae3e6e6d..3b9ae98c67bb 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -984,7 +984,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
mptcp_set_timeout(sk, ssk);
if (copied) {
- ret = copied;
tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle,
size_goal);
@@ -997,7 +996,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
release_sock(ssk);
out:
release_sock(sk);
- return ret;
+ return copied ? : ret;
}
static void mptcp_wait_data(struct sock *sk, long *timeo)
--
2.26.2