get rid of the address_space override in setsockopt v2
by Christoph Hellwig
Hi Dave,
setsockopt is the last place in architecture-independent code that still
uses set_fs to force the uaccess routines to operate on kernel pointers.
This series adds a new sockptr_t type that can contain either a kernel
or user pointer, and which has accessors that do the right thing, and
then uses it for setsockopt, starting by refactoring some low-level
helpers and moving them over to it before finally doing the main
setsockopt method.
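To make the idea of a pointer type with "accessors that do the right thing" concrete, here is a minimal userspace sketch. It is not the contents of include/linux/sockptr.h: the names sockptr_sketch_t, sketch_kernel_ptr(), sketch_user_ptr(), sketch_copy_from() and fake_copy_from_user() are invented for illustration, and the user-copy routine is simulated with memcpy().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace illustration only: one type that carries either a "kernel"
 * or a "user" pointer, plus a tag saying which of the two it holds. */
typedef struct {
	union {
		void *kernel;
		void *user;	/* would be a __user-annotated pointer in kernel code */
	};
	bool is_kernel;
} sockptr_sketch_t;

static inline sockptr_sketch_t sketch_kernel_ptr(void *p)
{
	return (sockptr_sketch_t){ .kernel = p, .is_kernel = true };
}

static inline sockptr_sketch_t sketch_user_ptr(void *p)
{
	return (sockptr_sketch_t){ .user = p, .is_kernel = false };
}

/* Stand-in for a user-copy routine; returns 0 on success (assumed
 * convention for this sketch). */
static int fake_copy_from_user(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* The accessor picks the copy routine from the tag, so the caller never
 * has to switch the address space limit around the call. */
static int sketch_copy_from(void *dst, sockptr_sketch_t src, size_t len)
{
	if (src.is_kernel) {
		memcpy(dst, src.kernel, len);
		return 0;
	}
	return fake_copy_from_user(dst, src.user, len);
}
```

A setsockopt-style handler written against such a type takes one argument and calls sketch_copy_from() without caring where the data lives.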
Note that apparently the eBPF selftests do not even cover this path, so
the series has been tested with a testing patch that always copies the
data first and passes a kernel pointer. This is something that works for
most common sockopts (and is something that the eBPF support relies on),
but unfortunately in various corner cases we either don't use the
passed-in length, or in one case actually copy data back from setsockopt,
or in the case of bpfilter flat out do not work with kernel pointers at
all.
Against net-next/master.
Changes since v1:
- check that users don't pass in kernel addresses
- more bpfilter cleanups
- cosmetic mptcp tweak
Diffstat:
crypto/af_alg.c | 7
drivers/crypto/chelsio/chtls/chtls_main.c | 18 -
drivers/isdn/mISDN/socket.c | 4
include/linux/bpfilter.h | 6
include/linux/filter.h | 3
include/linux/mroute.h | 5
include/linux/mroute6.h | 8
include/linux/net.h | 4
include/linux/netfilter.h | 6
include/linux/netfilter/x_tables.h | 4
include/linux/sockptr.h | 132 ++++++++++++
include/net/inet_connection_sock.h | 3
include/net/ip.h | 7
include/net/ipv6.h | 6
include/net/sctp/structs.h | 2
include/net/sock.h | 7
include/net/tcp.h | 6
include/net/udp.h | 2
include/net/xfrm.h | 8
net/atm/common.c | 6
net/atm/common.h | 2
net/atm/pvc.c | 2
net/atm/svc.c | 6
net/ax25/af_ax25.c | 6
net/bluetooth/hci_sock.c | 8
net/bluetooth/l2cap_sock.c | 22 +-
net/bluetooth/rfcomm/sock.c | 12 -
net/bluetooth/sco.c | 6
net/bpfilter/bpfilter_kern.c | 55 ++---
net/bridge/netfilter/ebtables.c | 46 +---
net/caif/caif_socket.c | 8
net/can/j1939/socket.c | 12 -
net/can/raw.c | 16 -
net/core/filter.c | 6
net/core/sock.c | 36 +--
net/dccp/dccp.h | 2
net/dccp/proto.c | 20 -
net/decnet/af_decnet.c | 13 -
net/ieee802154/socket.c | 6
net/ipv4/bpfilter/sockopt.c | 16 -
net/ipv4/ip_options.c | 43 +---
net/ipv4/ip_sockglue.c | 66 +++---
net/ipv4/ipmr.c | 14 -
net/ipv4/netfilter/arp_tables.c | 33 +--
net/ipv4/netfilter/ip_tables.c | 29 +-
net/ipv4/raw.c | 8
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_ipv4.c | 4
net/ipv4/udp.c | 11 -
net/ipv4/udp_impl.h | 4
net/ipv6/ip6_flowlabel.c | 317 ++++++++++++++++--------------
net/ipv6/ip6mr.c | 17 -
net/ipv6/ipv6_sockglue.c | 203 +++++++++----------
net/ipv6/netfilter/ip6_tables.c | 28 +-
net/ipv6/raw.c | 10
net/ipv6/tcp_ipv6.c | 4
net/ipv6/udp.c | 7
net/ipv6/udp_impl.h | 4
net/iucv/af_iucv.c | 4
net/kcm/kcmsock.c | 6
net/l2tp/l2tp_ppp.c | 4
net/llc/af_llc.c | 4
net/mptcp/protocol.c | 6
net/netfilter/ipvs/ip_vs_ctl.c | 4
net/netfilter/nf_sockopt.c | 2
net/netfilter/x_tables.c | 20 -
net/netlink/af_netlink.c | 4
net/netrom/af_netrom.c | 4
net/nfc/llcp_sock.c | 6
net/packet/af_packet.c | 39 +--
net/phonet/pep.c | 4
net/rds/af_rds.c | 30 +-
net/rds/rdma.c | 14 -
net/rds/rds.h | 6
net/rose/af_rose.c | 4
net/rxrpc/af_rxrpc.c | 8
net/rxrpc/ar-internal.h | 4
net/rxrpc/key.c | 9
net/sctp/socket.c | 4
net/smc/af_smc.c | 4
net/socket.c | 24 --
net/tipc/socket.c | 8
net/tls/tls_main.c | 17 -
net/vmw_vsock/af_vsock.c | 4
net/x25/af_x25.c | 4
net/xdp/xsk.c | 8
net/xfrm/xfrm_state.c | 6
87 files changed, 894 insertions(+), 743 deletions(-)
[RFC PATCH 00/12] mptcp: multiple xmit substreams support
by Paolo Abeni
This is an early RFC to gather feedback and comments on the current status.
Needs the bugfix patch I sent before to avoid exploding badly on the first
packet - it can still explode after a few more.
It refactors send space notifications, introduces OoO handling via RBtree,
sndbuf autotuning, allows the PM to create non backup subflows and finally
a basic scheduler and some self-tests.
The pain point is the in-window check:
on the receiver side, the msk rcv window is set at tcp_space(msk) - which should be
quite a rough overestimate of a more correct value.
on the sender side no limit is imposed on the xmitted sequence number, except
the one given by the sndbuf. The msk sndbuf is autotuned to the subflows
largest sndbuf size.
With all the above I observe several out-of-[MPTCP]-window packets on the RX side, to
the point that it affects the bandwidth (self-tests fail, as they basically
look at virtual link utilization).
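For discussion, the receiver-side in-window test could be sketched as below. This is an assumption about what such a check might look like, not code from the series: seq_before64() and sketch_in_window() are hypothetical names, and the window is modelled as the half-open range [rcv_nxt, rcv_nxt + wnd) on 64-bit data sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b" comparison on 64-bit sequence numbers. */
static inline bool seq_before64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Hypothetical check: accept the segment [seq, end_seq) only if it
 * overlaps the receive window [rcv_nxt, rcv_nxt + wnd). */
static inline bool sketch_in_window(uint64_t seq, uint64_t end_seq,
				    uint64_t rcv_nxt, uint64_t wnd)
{
	uint64_t wnd_end = rcv_nxt + wnd;

	/* Entirely old data: the segment ends at or before rcv_nxt. */
	if (!seq_before64(rcv_nxt, end_seq))
		return false;

	/* Entirely beyond the right edge of the window. */
	if (!seq_before64(seq, wnd_end))
		return false;

	return true;
}
```

With a check of this shape, an overestimated wnd on the receiver and an unbounded sender meet at the right-edge test, which is where the out-of-window drops described above would show up.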
Any comment is more than welcome, especially about better mptcp window checks.
This has been quite painful, so I would propose to consider accepting even a
suboptimal version on the export branch and then improving it incrementally.
Paolo Abeni (12):
mptcp: msk is writable according to msk write space
mptcp: set data_ready status bit in subflow_check_data_avail()
mptcp: trigger msk processing even for OoO data
mptcp: basic sndbuf autotuning
mptcp: introduce and use mptcp_try_coalesce()
mptcp: move ooo skbs into msk out of order queue.
mptcp: cleanup mptcp_subflow_discard_data()
mptcp: add OoO related mibs
mptcp: move address attribute into mptcp_addr_info
mptcp: allow creating non-backup subflows
mptcp: allow picking different xmit subflows
mptcp: simult flow self-tests
net/mptcp/mib.c | 5 +
net/mptcp/mib.h | 5 +
net/mptcp/pm_netlink.c | 38 +-
net/mptcp/protocol.c | 459 ++++++++++++++----
net/mptcp/protocol.h | 18 +-
net/mptcp/subflow.c | 91 ++--
.../selftests/net/mptcp/simult_flows.sh | 290 +++++++++++
7 files changed, 743 insertions(+), 163 deletions(-)
create mode 100755 tools/testing/selftests/net/mptcp/simult_flows.sh
--
2.26.2
[PATCH mptcp-next] selftests/mptcp: Better delay & reordering configuration
by Christoph Paasch
The delay was intended to be configured to "simulate" a high(er) BDP
link. As such, it needs to be set as part of the loss-configuration and
not as part of the netem reordering configuration.
The reordering config also requires a delay, but that delay is the
reordering extent. So, a good approach is to set the reordering extent
as a function of the configured latency, e.g., 25% of the overall
latency.
Finally, the intention of tc_reorder was that when it is unset, the test
picks a random configuration. However, currently it is always initialized
and thus the random config won't be picked up.
Github-issue: https://github.com/multipath-tcp/mptcp_net-next/issues/6
Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
---
Notes:
Admittedly, there are two changes in this patch (the delay fix and
uninitializing tc_reorder). If you want, I can split them, but I thought that's overkill for
a selftest patch.
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 6260520674d0..d29d189d1ae5 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -16,7 +16,6 @@ ipv6=true
ethtool_random_on=true
tc_delay="$((RANDOM%400))"
tc_loss=$((RANDOM%101))
-tc_reorder=""
testmode=""
sndbuf=0
rcvbuf=0
@@ -631,22 +630,24 @@ for sender in "$ns1" "$ns2" "$ns3" "$ns4";do
do_ping "$ns4" $sender dead:beef:3::1
done
-[ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss
+[ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss delay ${tc_delay}ms
echo -n "INFO: Using loss of $tc_loss "
test "$tc_delay" -gt 0 && echo -n "delay $tc_delay ms "
+reorder_delay=`expr $tc_delay / 4`
+
if [ -z "${tc_reorder}" ]; then
reorder1=$((RANDOM%10))
reorder1=$((100 - reorder1))
reorder2=$((RANDOM%100))
- if [ $tc_delay -gt 0 ] && [ $reorder1 -lt 100 ] && [ $reorder2 -gt 0 ]; then
+ if [ $reorder_delay -gt 0 ] && [ $reorder1 -lt 100 ] && [ $reorder2 -gt 0 ]; then
tc_reorder="reorder ${reorder1}% ${reorder2}%"
echo -n "$tc_reorder "
fi
elif [ "$tc_reorder" = "0" ];then
tc_reorder=""
-elif [ "$tc_delay" -gt 0 ];then
+elif [ "$reorder_delay" -gt 0 ];then
# reordering requires some delay
tc_reorder="reorder $tc_reorder"
echo -n "$tc_reorder "
@@ -654,7 +655,7 @@ fi
echo "on ns3eth4"
-tc -net "$ns3" qdisc add dev ns3eth4 root netem delay ${tc_delay}ms $tc_reorder
+tc -net "$ns3" qdisc add dev ns3eth4 root netem delay ${reorder_delay}ms $tc_reorder
for sender in $ns1 $ns2 $ns3 $ns4;do
run_tests_lo "$ns1" "$sender" 10.0.1.1 1
--
2.23.0
[PATCH net] mptcp: fix bogus sendmsg() return code under pressure
by Paolo Abeni
In case of memory pressure, mptcp_sendmsg() may call
sk_stream_wait_memory() after successfully xmitting some
bytes. If the latter fails we currently return the error
code to user space, ignoring the successful xmit.
Address the issue by always checking for the xmitted bytes
before mptcp_sendmsg() completes.
Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/protocol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
---
needed for upcoming multiple subflow xmit, I hope to send this
upstream soon
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 650fae3e6e6d..3b9ae98c67bb 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -984,7 +984,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
mptcp_set_timeout(sk, ssk);
if (copied) {
- ret = copied;
tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle,
size_goal);
@@ -997,7 +996,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
release_sock(ssk);
out:
release_sock(sk);
- return ret;
+ return copied ? : ret;
}
static void mptcp_wait_data(struct sock *sk, long *timeo)
--
2.26.2
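The return-value rule that the patch above enforces can be sketched in isolation. The helper name sketch_sendmsg_result() is hypothetical; it only models the final "return copied ? : ret;" statement, written with the portable three-operand form of the GNU shorthand.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the exit path: 'copied' bytes were queued before
 * 'err' (0 or a negative errno) was recorded. Any progress wins over the
 * error, so the caller sees a short write rather than a failure. */
static long sketch_sendmsg_result(size_t copied, int err)
{
	return copied ? (long)copied : err;
}
```

For example, 100 bytes queued followed by a failed memory wait yields 100, and the error is only reported when nothing was sent at all.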
[MPTCP][PATCH v4 mptcp-next 0/5] Add REMOVE_ADDR support
by Geliang Tang
v4:
- update mptcp_subflow_shutdown()'s args.
- add rm_id check to make sure we don't shutdown the first subflow.
- add conn_list empty check.
- move anno_list to mptcp_pm_data.
- add a new patch 'mptcp: add remove subflow support'.
v3:
- fix memory leak and lock issue in v2.
- drop alist in v2.
- fix mptcp_subflow_shutdown's arguments.
- bzero remote in mptcp_pm_create_subflow_or_signal_addr.
- expand the commit messages.
Geliang Tang (5):
mptcp: rename addr_signal and the related functions
mptcp: add the outgoing RM_ADDR support
mptcp: add the incoming RM_ADDR support
mptcp: trigger the RM_ADDR signal
mptcp: add remove subflow support
net/mptcp/options.c | 48 +++++++++++++++---
net/mptcp/pm.c | 56 ++++++++++++++++++---
net/mptcp/pm_netlink.c | 110 +++++++++++++++++++++++++++++++++++++++--
net/mptcp/protocol.c | 14 ++++--
net/mptcp/protocol.h | 28 +++++++++--
net/mptcp/subflow.c | 1 +
6 files changed, 231 insertions(+), 26 deletions(-)
--
2.17.1
[MPTCP][PATCH v3 mptcp-next 0/4] Add REMOVE_ADDR support
by Geliang Tang
v3:
- fix memory leak and lock issue in v2.
- drop alist in v2.
- fix mptcp_subflow_shutdown's arguments.
- bzero remote in mptcp_pm_create_subflow_or_signal_addr.
- expand the commit messages.
Geliang Tang (4):
mptcp: rename addr_signal and the related functions
mptcp: add the outgoing RM_ADDR support
mptcp: add the incoming RM_ADDR support
mptcp: trigger the RM_ADDR signal
net/mptcp/options.c | 48 ++++++++++++++++++++++----
net/mptcp/pm.c | 54 ++++++++++++++++++++++++++----
net/mptcp/pm_netlink.c | 76 ++++++++++++++++++++++++++++++++++++++++--
net/mptcp/protocol.c | 17 +++++++---
net/mptcp/protocol.h | 29 +++++++++++++---
net/mptcp/subflow.c | 1 +
6 files changed, 199 insertions(+), 26 deletions(-)
--
2.17.1
[PATCH RFC v3 00/10] mptcp: add syn cookie support
by Florian Westphal
TL;DR: patch #8 implements Paolo's suggestion, a state table to keep
MP_JOIN data to reconstruct the request socket for join requests.
----
At this time, when syn-cookies are used and the SYN had an
MPTCP-option, the cookie is sent with MPTCP option cleared,
as the code path that creates a request socket based off a valid ACK
token lacks the needed changes to construct MPTCP request sockets.
After this series, if SYN carries an MPTCP option, the MPTCP option is
not cleared anymore and reconstruction will be done using the MPTCP option
that is re-sent with the ACK:
no additional state gets encoded into the syn cookie or the timestamp.
There are several differences from the normal (syn queue) case with
MPTCP.
I. When syn-cookies are used, the server-generated key is not stored,
it is best-effort only: Storing state would defeat the purpose of
cookies.
The drawback is that the next connection request that comes in before
the cookie-ACK has a small chance that it will generate the same
local_key.
If this happens, the cookie ACK that comes in "second" (which contains
the local and remote key in mptcp options) will compute the token hash
and then detect that this is already in use.
When this happens, late TCP fallback occurs, i.e. the connection sock
is not marked as mptcp capable.
II. SYN packets containing an MP_JOIN request cannot be handled without
storing state. This is because the SYN contains a nonce value that
we need to store to validate the HMAC of the MP_JOIN ACK that
completes the three-way handshake.
There are only 2 ways to solve this:
a). Do not support JOINs when cookies are in effect.
b). Store the nonce somewhere.
The approach chosen here is b). Patch #8 adds a small state table (1024
slots) to store the MP_JOIN syn mptcp option data.
This takes a total of 16 kbytes of statically allocated memory.
State storage is subject to the following constraints:
1. The token in the JOIN request is valid (i.e. there is an
established MPTCP connection).
2. The MPTCP connection can still accept a new subflow.
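As a rough illustration of the sizing above, a 1024-slot table of 16-byte entries adds up to 16 kbytes. The sketch below is an assumption, not the code from patch #8: the struct layout, the field names, and the helpers sketch_slot(), sketch_store() and sketch_take() are all hypothetical, and locking is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SLOTS 1024u	/* slot count taken from the cover letter */

/* Assumed 16-byte entry: 1024 * 16 bytes = 16 kbytes in total. */
struct sketch_join_entry {
	uint32_t token;
	uint32_t remote_nonce;
	uint32_t local_nonce;
	uint8_t  join_id;
	uint8_t  local_id;
	uint8_t  backup;
	uint8_t  valid;
};

_Static_assert(sizeof(struct sketch_join_entry) == 16, "one slot is 16 bytes");

static struct sketch_join_entry sketch_table[SKETCH_SLOTS];

/* Hypothetical slot selection from some per-flow hash value. */
static uint32_t sketch_slot(uint32_t flow_hash)
{
	return flow_hash % SKETCH_SLOTS;
}

/* Remember the MP_JOIN SYN data; a colliding flow overwrites the slot. */
static void sketch_store(uint32_t slot, uint32_t token,
			 uint32_t remote_nonce, uint32_t local_nonce)
{
	struct sketch_join_entry *e = &sketch_table[slot];

	e->token = token;
	e->remote_nonce = remote_nonce;
	e->local_nonce = local_nonce;
	e->valid = 1;
}

/* Fetch and consume the entry when the cookie ACK arrives. */
static bool sketch_take(uint32_t slot, struct sketch_join_entry *out)
{
	struct sketch_join_entry *e = &sketch_table[slot];

	if (!e->valid)
		return false;
	*out = *e;
	e->valid = 0;
	return true;
}
```

A fixed table keeps the memory cost bounded under a SYN flood, at the price of losing a pending JOIN whenever two flows land in the same slot.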
Unless there are objections, I will drop the RFC tag and pass this to
net-next.
drivers/crypto/chelsio/chtls/chtls_cm.c | 1
include/net/mptcp.h | 11 +
include/net/request_sock.h | 3
include/net/tcp.h | 5
net/ipv4/syncookies.c | 44 ++++++-
net/ipv4/tcp_input.c | 7 -
net/ipv4/tcp_ipv4.c | 3
net/ipv4/tcp_output.c | 2
net/ipv6/syncookies.c | 5
net/ipv6/tcp_ipv6.c | 3
net/mptcp/Makefile | 1
net/mptcp/protocol.h | 19 +++
net/mptcp/subflow.c | 131 +++++++++++++++++----
net/mptcp/syncookies.c | 118 ++++++++++++++++++
net/mptcp/token.c | 38 ++++--
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 47 +++++++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 66 ++++++++++
17 files changed, 450 insertions(+), 54 deletions(-)
Florian Westphal (10):
tcp: remove cookie_ts bit from request_sock
mptcp: token: move retry to caller
mptcp: subflow: split subflow_init_req
mptcp: rename and export mptcp_subflow_request_sock_ops
tcp: pass want_cookie down to req_init function
mptcp: subflow: add mptcp_subflow_init_cookie_req helper
tcp: syncookies: create mptcp request socket for ACK cookies with MPTCP option
mptcp: enable JOIN requests even if cookies are in use
selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally
selftests: mptcp: add test cases for mptcp join tests with syn cookies
[MPTCP][PATCH v2 mptcp-next 0/4] Add REMOVE_ADDR support
by Geliang Tang
Add REMOVE_ADDR support.
Geliang Tang (4):
mptcp: rename the existing ADD_ADDR related functions
mptcp: add the RM_ADDR option writing
mptcp: add the RM_ADDR option parsing
mptcp: trigger the RM_ADDR signal
net/mptcp/options.c | 49 ++++++++++++++++++++++++++++++++------
net/mptcp/pm.c | 54 ++++++++++++++++++++++++++++++++++++------
net/mptcp/pm_netlink.c | 52 +++++++++++++++++++++++++++++++++++++++-
net/mptcp/protocol.c | 9 +++++--
net/mptcp/protocol.h | 25 +++++++++++++++----
net/mptcp/subflow.c | 1 +
6 files changed, 168 insertions(+), 22 deletions(-)
--
2.17.1
[PATCH net-next 00/12] Exchange MPTCP DATA_FIN/DATA_ACK before TCP FIN
by Mat Martineau
This series allows the MPTCP-level connection to be closed with the
peers exchanging DATA_FIN and DATA_ACK according to the state machine in
appendix D of RFC 8684. The process is very similar to the TCP
disconnect state machine.
The prior code sends DATA_FIN only when TCP FIN packets are sent, and
does not allow for the MPTCP-level connection to be half-closed.
Patch 8 ("mptcp: Use full MPTCP-level disconnect state machine") is the
core of the series. Earlier patches in the series have some small fixes
and helpers in preparation, and the final four small patches do some
cleanup.
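Since the cover letter compares the process to the TCP disconnect state machine, a simplified sketch of such a machine may help readers. It is modelled on the classic TCP teardown transitions, with TIME_WAIT left out, and is an assumption rather than the transitions implemented in patch 8; the enum and the three helper names are hypothetical.

```c
#include <stdbool.h>

/* Simplified connection-level states, named after their TCP counterparts. */
enum sketch_state {
	SK_ESTABLISHED,
	SK_FIN_WAIT1,
	SK_FIN_WAIT2,
	SK_CLOSING,
	SK_CLOSE_WAIT,
	SK_LAST_ACK,
	SK_CLOSE,
};

/* Local close or write shutdown: pick the next state and report whether
 * a DATA_FIN has to be sent. */
static enum sketch_state sketch_local_close(enum sketch_state s,
					    bool *send_data_fin)
{
	*send_data_fin = false;
	switch (s) {
	case SK_ESTABLISHED:
		*send_data_fin = true;
		return SK_FIN_WAIT1;
	case SK_CLOSE_WAIT:
		*send_data_fin = true;
		return SK_LAST_ACK;
	default:
		return s;
	}
}

/* The peer's DATA_FIN has been received. */
static enum sketch_state sketch_data_fin_rcvd(enum sketch_state s)
{
	switch (s) {
	case SK_ESTABLISHED:
		return SK_CLOSE_WAIT;	/* half-closed: we may still send */
	case SK_FIN_WAIT1:
		return SK_CLOSING;	/* simultaneous close */
	case SK_FIN_WAIT2:
		return SK_CLOSE;
	default:
		return s;
	}
}

/* Our own DATA_FIN has been acknowledged by a DATA_ACK. */
static enum sketch_state sketch_data_fin_acked(enum sketch_state s)
{
	switch (s) {
	case SK_FIN_WAIT1:
		return SK_FIN_WAIT2;
	case SK_CLOSING:
	case SK_LAST_ACK:
		return SK_CLOSE;
	default:
		return s;
	}
}
```

The CLOSE_WAIT and FIN_WAIT2 states are what make a half-closed MPTCP-level connection representable, which the prior code did not allow.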
Mat Martineau (12):
mptcp: Allow DATA_FIN in headers without TCP FIN
mptcp: Return EPIPE if sending is shut down during a sendmsg
mptcp: Remove outdated and incorrect comment
mptcp: Use MPTCP-level flag for sending DATA_FIN
mptcp: Track received DATA_FIN sequence number and add related helpers
mptcp: Add mptcp_close_state() helper
mptcp: Add helper to process acks of DATA_FIN
mptcp: Use full MPTCP-level disconnect state machine
mptcp: Only use subflow EOF signaling on fallback connections
mptcp: Skip unnecessary skb extension allocation for bare acks
mptcp: Safely read sequence number when lock isn't held
mptcp: Safely store sequence number when sending data
net/mptcp/options.c | 57 +++++++--
net/mptcp/protocol.c | 295 ++++++++++++++++++++++++++++++++++++-------
net/mptcp/protocol.h | 6 +-
net/mptcp/subflow.c | 14 +-
4 files changed, 306 insertions(+), 66 deletions(-)
base-commit: 0003041e7a0bf24594e5d66fe217bbbefdac44ab
--
2.28.0