[MPTCP][PATCH v4 mptcp-next 0/3] add ADD_ADDR echo flag support
by Geliang Tang
v4:
- Just updated some log messages in mptcp_join.sh
v3:
- move add_addr_echo into mptcp_pm_announce_addr()
- check return value of mptcp_pm_add_addr_received
- hold lock for add_addr_echo writing
v2:
- add ADD_ADDR mibs
- add selftests for ADD_ADDR
v1:
- mptcp: send out ADD_ADDR with echo flag
Geliang Tang (3):
mptcp: send out ADD_ADDR with echo flag
mptcp: add ADD_ADDR related mibs
selftests: mptcp: add ADD_ADDR mibs check function
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 38 +++++++++++-----
net/mptcp/pm.c | 13 ++++--
net/mptcp/pm_netlink.c | 2 +-
net/mptcp/protocol.h | 8 ++--
.../testing/selftests/net/mptcp/mptcp_join.sh | 44 +++++++++++++++++++
7 files changed, 89 insertions(+), 20 deletions(-)
--
2.17.1
1 year, 10 months
[PATCH mptcp-next v2] mptcp: adjust mptcp receive buffer limit if subflow has larger one
by Florian Westphal
In addition to autotuning during read, tcp may also increase the
receive buffer in tcp_clamp_window().
In this case, mptcp should adjust its receive buffer size as well so
it can pull all pending skbs at once.
At this time, when TCP grows its receive buffer, it may have more
skbs ready for processing than what mptcp allows.
In the mptcp case, the receive window is derived from free
space of the mptcp parent socket instead of the individual subflows.
Following the subflow allows mptcp to grow its receive buffer.
This is especially noticeable for loopback traffic, when even two
skbs are enough to fill the initial receive window.
In mptcp_data_ready() we do not hold the mptcp socket lock, so
modification of sk_rcvbuf is racy. Do it when moving skbs from the subflow
to the mptcp socket, since both sockets are locked in that case.
v2: move rcvbuf update to __mptcp_move_skbs_from_subflow where both
mptcp and subflow sk locks are held (pointed out by Mat).
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
net/mptcp/protocol.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 77d655b0650c..ecf93d0bec14 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -454,10 +454,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct sock *sk = (struct sock *)msk;
unsigned int moved = 0;
+ int sk_rbuf, ssk_rbuf;
bool more_data_avail;
struct tcp_sock *tp;
bool done = false;
+ ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
+ sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
+
+ if (unlikely(ssk_rbuf > sk_rbuf))
+ WRITE_ONCE(sk->sk_rcvbuf, ssk_rbuf);
+
pr_debug("msk=%p ssk=%p", msk, ssk);
tp = tcp_sk(ssk);
do {
@@ -511,7 +518,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
WRITE_ONCE(tp->copied_seq, seq);
more_data_avail = mptcp_subflow_data_available(ssk);
- if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf)) {
+ if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf) {
done = true;
break;
}
@@ -603,6 +610,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct mptcp_sock *msk = mptcp_sk(sk);
+ int sk_rbuf, ssk_rbuf;
bool wake;
/* move_skbs_to_msk below can legitly clear the data_avail flag,
@@ -613,12 +621,16 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
if (wake)
set_bit(MPTCP_DATA_READY, &msk->flags);
- if (atomic_read(&sk->sk_rmem_alloc) < READ_ONCE(sk->sk_rcvbuf) &&
- move_skbs_to_msk(msk, ssk))
+ ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
+ sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
+ if (unlikely(ssk_rbuf > sk_rbuf))
+ sk_rbuf = ssk_rbuf;
+
+ /* over limit? can't append more skbs to msk */
+ if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
goto wake;
- /* don't schedule if mptcp sk is (still) over limit */
- if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
+ if (move_skbs_to_msk(msk, ssk))
goto wake;
/* mptcp socket is owned, release_cb should retry */
--
2.26.2
1 year, 10 months
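The rule described in the v2 patch above can be modelled in userspace as a rough sketch. The struct and function names below (rbuf_model, effective_rcvbuf, follow_subflow_rcvbuf, can_append) are hypothetical stand-ins, not kernel symbols, and plain ints replace the atomic and READ_ONCE/WRITE_ONCE accesses used in the real code. The lockless path only computes the larger limit locally; the stored limit is raised only on the path that models both sockets being locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two socket fields the patch reads. */
struct rbuf_model {
	int rcvbuf;     /* models sk_rcvbuf */
	int rmem_alloc; /* models sk_rmem_alloc */
};

/* Limit to compare against: the larger of the parent and subflow buffers. */
static int effective_rcvbuf(const struct rbuf_model *msk,
			    const struct rbuf_model *ssk)
{
	assert(msk != NULL && ssk != NULL);
	return ssk->rcvbuf > msk->rcvbuf ? ssk->rcvbuf : msk->rcvbuf;
}

/* Models the locked path: grow the parent limit to follow the subflow.
 * Returns true if the parent limit was raised.
 */
static bool follow_subflow_rcvbuf(struct rbuf_model *msk,
				  const struct rbuf_model *ssk)
{
	assert(msk != NULL && ssk != NULL);
	if (ssk->rcvbuf <= msk->rcvbuf)
		return false;
	msk->rcvbuf = ssk->rcvbuf;
	return true;
}

/* Models the lockless path: only a local copy of the limit is adjusted,
 * the parent's stored limit is left untouched.
 */
static bool can_append(const struct rbuf_model *msk,
		       const struct rbuf_model *ssk)
{
	return msk->rmem_alloc <= effective_rcvbuf(msk, ssk);
}
```

With a parent limit of 4096, 6000 bytes already queued and a subflow limit of 8192, can_append() reports room without modifying the parent, and a later follow_subflow_rcvbuf() call raises the stored limit to 8192.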
[MPTCP][PATCH v3 mptcp-next 0/3] add ADD_ADDR echo flag support
by Geliang Tang
v3:
- move add_addr_echo into mptcp_pm_announce_addr()
- check return value of mptcp_pm_add_addr_received
- hold lock for add_addr_echo writing
v2:
- add ADD_ADDR mibs
- add selftests for ADD_ADDR
v1:
- mptcp: send out ADD_ADDR with echo flag
Geliang Tang (3):
mptcp: send out ADD_ADDR with echo flag
mptcp: add ADD_ADDR related mibs
selftests: mptcp: add ADD_ADDR mibs check function
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 38 +++++++++++-----
net/mptcp/pm.c | 13 ++++--
net/mptcp/pm_netlink.c | 2 +-
net/mptcp/protocol.h | 8 ++--
.../testing/selftests/net/mptcp/mptcp_join.sh | 44 +++++++++++++++++++
7 files changed, 89 insertions(+), 20 deletions(-)
--
2.17.1
1 year, 10 months
[MPTCP][PATCH v2 mptcp-next 0/3] add ADD_ADDR echo flag support
by Geliang Tang
v2:
- add ADD_ADDR mibs
- add selftests for ADD_ADDR
v1:
- mptcp: send out ADD_ADDR with echo flag
Geliang Tang (3):
mptcp: send out ADD_ADDR with echo flag
mptcp: add ADD_ADDR related mibs
selftests: mptcp: add ADD_ADDR mibs check function
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 33 +++++++++-----
net/mptcp/pm.c | 1 +
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.h | 1 +
.../testing/selftests/net/mptcp/mptcp_join.sh | 45 +++++++++++++++++++
7 files changed, 74 insertions(+), 11 deletions(-)
--
2.17.1
1 year, 10 months
[PATCH mptcp-next v2 0/3] bpf: add MPTCP subflow support
by Nicolas Rybowski
Previously it was not possible to make a distinction between plain TCP
sockets and MPTCP subflow sockets on the BPF_PROG_TYPE_SOCK_OPS hook.
This patch series now enables fine-grained control of subflow sockets. In
its current state, it allows setting different socket options on each
subflow of the same MPTCP connection (socket mark, TCP congestion
algorithm, ...) using BPF programs.
It should also be the basis of exposing MPTCP-specific fields through BPF.
v1 -> v2:
- update cgroup attachment code in net/mptcp/subflow.c due to additional #ifdef
- revert MPTCP private structure moving in public API (previous patch 3)
- move new BPF helper implementation in net/mptcp/bpf.c
- add bpf.c in Makefile of net/mptcp
- minor cosmetic changes: align function arguments with the opening
parenthesis
Nicolas Rybowski (3):
bpf: expose is_mptcp flag to bpf_tcp_sock
mptcp: attach subflow socket to parent cgroup
bpf: add 'bpf_mptcp_sock' structure and helper
include/linux/bpf.h | 33 ++++++++++++++++
include/uapi/linux/bpf.h | 14 +++++++
kernel/bpf/verifier.c | 30 ++++++++++++++
net/core/filter.c | 13 +++++-
net/mptcp/Makefile | 2 +
net/mptcp/bpf.c | 72 ++++++++++++++++++++++++++++++++++
net/mptcp/subflow.c | 27 +++++++++++++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 14 +++++++
9 files changed, 206 insertions(+), 1 deletion(-)
create mode 100644 net/mptcp/bpf.c
--
2.28.0
1 year, 10 months
[PATCH mptcp-next] mptcp: adjust mptcp receive buffer limit if subflow has larger one
by Florian Westphal
In addition to autotuning during read, tcp may also increase the
receive buffer in tcp_clamp_window().
In this case, mptcp should adjust its receive buffer size as well so
it can pull all pending skbs at once.
At this time, when TCP grows its receive buffer, it may have more
skbs ready for processing than what mptcp allows.
In the mptcp case, the receive window is derived from free
space of the mptcp parent socket instead of the individual subflows.
Following the subflow allows mptcp to grow its receive buffer.
This is especially noticeable for loopback traffic, when even two
skbs are enough to fill the initial receive window.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
net/mptcp/protocol.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c800b9147a3c..d9307a3e1e62 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -603,6 +603,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct mptcp_sock *msk = mptcp_sk(sk);
+ int sk_rbuf, ssk_rbuf;
bool wake;
/* move_skbs_to_msk below can legitly clear the data_avail flag,
@@ -613,12 +614,18 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
if (wake)
set_bit(MPTCP_DATA_READY, &msk->flags);
- if (atomic_read(&sk->sk_rmem_alloc) < READ_ONCE(sk->sk_rcvbuf) &&
- move_skbs_to_msk(msk, ssk))
+ sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
+ ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
+ if (ssk_rbuf > sk_rbuf) {
+ WRITE_ONCE(sk->sk_rcvbuf, ssk_rbuf);
+ sk_rbuf = ssk_rbuf;
+ }
+
+ /* over limit? can't append more skbs to msk */
+ if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
goto wake;
- /* don't schedule if mptcp sk is (still) over limit */
- if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
+ if (move_skbs_to_msk(msk, ssk))
goto wake;
/* mptcp socket is owned, release_cb should retry */
--
2.26.2
1 year, 10 months
[PATCH net] mptcp: sendmsg: reset iter on error
by Florian Westphal
Once we've copied data from the iterator we need to revert in case we
end up not sending any data.
This bug doesn't trigger with normal 'poll' based tests, because
we only feed a small chunk of data to the kernel after poll indicated
POLLOUT. With blocking IO and large writes this triggers. The receiver
ends up with less data than it should get.
Fixes: 72511aab95c94d ("mptcp: avoid blocking in tcp_sendpages")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
net/mptcp/protocol.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index d5aaa98b9136..2e7e87304930 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -725,8 +725,10 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
if (!psize)
return -EINVAL;
- if (!sk_wmem_schedule(sk, psize + dfrag->overhead))
+ if (!sk_wmem_schedule(sk, psize + dfrag->overhead)) {
+ iov_iter_revert(&msg->msg_iter, psize);
return -ENOMEM;
+ }
} else {
offset = dfrag->offset;
psize = min_t(size_t, dfrag->data_len, avail_size);
@@ -737,8 +739,10 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
*/
ret = do_tcp_sendpages(ssk, page, offset, psize,
msg->msg_flags | MSG_SENDPAGE_NOTLAST | MSG_DONTWAIT);
- if (ret <= 0)
+ if (ret <= 0) {
+ iov_iter_revert(&msg->msg_iter, psize);
return ret;
+ }
frag_truesize += ret;
if (!retransmission) {
--
2.26.2
1 year, 10 months
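The effect of the iov_iter_revert() calls in the patch above can be illustrated with a small userspace model. This is only a sketch: iter_model, iter_copy, iter_revert and send_frag are hypothetical names, and the send_ok flag is an assumption that stands in for sk_wmem_schedule() or do_tcp_sendpages() failing. The point it shows is that copying advances the cursor, so a failed send must rewind it or the caller's retry would skip those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an iov_iter over a single flat buffer. */
struct iter_model {
	const char *buf;
	size_t len;
	size_t pos; /* bytes already consumed */
};

/* Copy up to n bytes out of the iterator and advance the cursor. */
static size_t iter_copy(struct iter_model *it, char *dst, size_t n)
{
	size_t left = it->len - it->pos;

	if (n > left)
		n = left;
	memcpy(dst, it->buf + it->pos, n);
	it->pos += n;
	return n;
}

/* Rewind the cursor by n bytes, as iov_iter_revert() does in the patch. */
static void iter_revert(struct iter_model *it, size_t n)
{
	assert(n <= it->pos);
	it->pos -= n;
}

/* Send one fragment. On failure the copied bytes are handed back to the
 * iterator so that a retry starts from the same offset.
 */
static long send_frag(struct iter_model *it, char *page, size_t psize,
		      int send_ok)
{
	size_t copied = iter_copy(it, page, psize);

	if (!send_ok) {
		iter_revert(it, copied);
		return -1;
	}
	return (long)copied;
}
```

In this model a failed send_frag() leaves pos unchanged, so the next call copies the same bytes again instead of silently dropping them.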
[MPTCP][PATCH v6 mptcp-next 0/2] Add REMOVE_ADDR support
by Geliang Tang
v6:
- rename lookup_anno_list_by_saddr to remove_anno_list_by_saddr as
Paolo suggested.
- take the msk socket lock when traversing msk->conn_list as Paolo
suggested.
- Since the first three patches in v5 have been merged into the export
branch, drop them from this patchset.
- add a remove addr and subflow selftest case.
- this patchset is against mptcp_net-next's export branch.
v5:
- merge mptcp_nl_remove_subflow() and mptcp_nl_remove_addr()
- add cond_resched
- reduce the indentation level in mptcp_pm_nl_rm_addr_received
v4:
- update mptcp_subflow_shutdown()'s args.
- add rm_id check to make sure we don't shutdown the first subflow.
- add conn_list empty check.
- move anno_list to mptcp_pm_data.
- add a new patch 'mptcp: add remove subflow support'.
v3:
- fix memory leak and lock issue in v2.
- drop alist in v2.
- fix mptcp_subflow_shutdown's arguments.
- bzero remote in mptcp_pm_create_subflow_or_signal_addr.
- add more commit message.
Geliang Tang (2):
mptcp: remove addr and subflow in PM netlink
selftests: mptcp: add remove addr and subflow test case
net/mptcp/pm.c | 7 +-
net/mptcp/pm_netlink.c | 87 ++++++++++++++++++-
net/mptcp/protocol.c | 2 +
net/mptcp/protocol.h | 2 +
.../selftests/net/mptcp/mptcp_connect.c | 7 +-
.../testing/selftests/net/mptcp/mptcp_join.sh | 62 +++++++++----
6 files changed, 140 insertions(+), 27 deletions(-)
--
2.17.1
1 year, 10 months
[MPTCP][PATCH mptcp-next] mptcp: send out ADD_ADDR with echo flag
by Geliang Tang
When the ADD_ADDR suboption has been received, we need to send out the same
ADD_ADDR suboption with echo-flag=1.
Signed-off-by: Geliang Tang <geliangtang(a)gmail.com>
---
net/mptcp/options.c | 30 +++++++++++++++++++-----------
net/mptcp/pm.c | 1 +
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.h | 1 +
4 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index a52a05effac9..0f8eb7f919f5 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -242,7 +242,7 @@ static void mptcp_parse_option(const struct sk_buff *skb,
mp_opt->add_addr = 1;
mp_opt->port = 0;
mp_opt->addr_id = *ptr++;
- pr_debug("ADD_ADDR: id=%d", mp_opt->addr_id);
+ pr_debug("ADD_ADDR: id=%d, echo=%d", mp_opt->addr_id, mp_opt->echo);
if (mp_opt->family == MPTCP_ADDR_IPVERSION_4) {
memcpy((u8 *)&mp_opt->addr.s_addr, (u8 *)ptr, 4);
ptr += 4;
@@ -579,6 +579,7 @@ static bool mptcp_established_options_add_addr(struct sock *sk,
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
struct mptcp_addr_info saddr;
+ bool echo = READ_ONCE(msk->pm.add_addr_echo);
int len;
if (!mptcp_pm_should_add_signal(msk) ||
@@ -594,22 +595,26 @@ static bool mptcp_established_options_add_addr(struct sock *sk,
if (saddr.family == AF_INET) {
opts->suboptions |= OPTION_MPTCP_ADD_ADDR;
opts->addr = saddr.addr;
- opts->ahmac = add_addr_generate_hmac(msk->local_key,
- msk->remote_key,
- opts->addr_id,
- &opts->addr);
+ if (!echo) {
+ opts->ahmac = add_addr_generate_hmac(msk->local_key,
+ msk->remote_key,
+ opts->addr_id,
+ &opts->addr);
+ }
}
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
else if (saddr.family == AF_INET6) {
opts->suboptions |= OPTION_MPTCP_ADD_ADDR6;
opts->addr6 = saddr.addr6;
- opts->ahmac = add_addr6_generate_hmac(msk->local_key,
- msk->remote_key,
- opts->addr_id,
- &opts->addr6);
+ if (!echo) {
+ opts->ahmac = add_addr6_generate_hmac(msk->local_key,
+ msk->remote_key,
+ opts->addr_id,
+ &opts->addr6);
+ }
}
#endif
- pr_debug("addr_id=%d, ahmac=%llu", opts->addr_id, opts->ahmac);
+ pr_debug("addr_id=%d, ahmac=%llu, echo=%d", opts->addr_id, opts->ahmac, echo);
return true;
}
@@ -883,8 +888,11 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb,
addr.addr6 = mp_opt.addr6;
}
#endif
- if (!mp_opt.echo)
+ if (!mp_opt.echo) {
+ WRITE_ONCE(msk->pm.add_addr_echo, true);
+ mptcp_pm_announce_addr(msk, &addr);
mptcp_pm_add_addr_received(msk, &addr);
+ }
mp_opt.add_addr = 0;
}
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 558462d87eb3..4709b9562cb0 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -226,6 +226,7 @@ void mptcp_pm_data_init(struct mptcp_sock *msk)
WRITE_ONCE(msk->pm.rm_addr_signal, false);
WRITE_ONCE(msk->pm.accept_addr, false);
WRITE_ONCE(msk->pm.accept_subflow, false);
+ WRITE_ONCE(msk->pm.add_addr_echo, false);
msk->pm.status = 0;
spin_lock_init(&msk->pm.lock);
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 848649a82649..1328d6460a48 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -188,6 +188,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
if (local) {
msk->pm.add_addr_signaled++;
+ WRITE_ONCE(msk->pm.add_addr_echo, false);
mptcp_pm_announce_addr(msk, &local->addr);
} else {
/* pick failed, avoid fourther attempts later */
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 4b8a5308aeed..425fbbc30b7a 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -169,6 +169,7 @@ struct mptcp_pm_data {
bool work_pending;
bool accept_addr;
bool accept_subflow;
+ bool add_addr_echo;
u8 add_addr_signaled;
u8 add_addr_accepted;
u8 local_addr_used;
--
2.17.1
1 year, 10 months
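The echo behaviour introduced by the patch above can be sketched as a small userspace model. All names here (add_addr_model, toy_hmac, make_announce, make_echo) are hypothetical, toy_hmac is a placeholder and not the real add_addr_generate_hmac(), and representing the skipped HMAC as zero is an assumption made only for this illustration. The sketch shows the two cases the patch distinguishes: a fresh announcement carries an HMAC and echo=0, while the reply repeats the same address and id with echo=1 and skips HMAC generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the ADD_ADDR fields the patch touches (IPv4 only). */
struct add_addr_model {
	uint8_t addr_id;
	uint32_t addr;
	bool echo;
	uint64_t ahmac;
};

/* Placeholder only: NOT the real HMAC, just a value derived from the inputs. */
static uint64_t toy_hmac(uint8_t id, uint32_t addr)
{
	return ((uint64_t)id << 32) | addr;
}

/* A fresh announcement: echo=0 and an HMAC is generated. */
static struct add_addr_model make_announce(uint8_t id, uint32_t addr)
{
	struct add_addr_model o = {
		.addr_id = id,
		.addr = addr,
		.echo = false,
		.ahmac = toy_hmac(id, addr),
	};

	return o;
}

/* Build the reply to a received ADD_ADDR. Returns false when the received
 * option is itself an echo, in which case nothing is sent back.
 */
static bool make_echo(const struct add_addr_model *rx,
		      struct add_addr_model *reply)
{
	assert(rx != NULL && reply != NULL);
	if (rx->echo)
		return false;
	reply->addr_id = rx->addr_id;
	reply->addr = rx->addr;
	reply->echo = true;
	reply->ahmac = 0; /* HMAC generation is skipped for echoes */
	return true;
}
```

Feeding an echo back into make_echo() returns false, which mirrors the `if (!mp_opt.echo)` guard in mptcp_incoming_options() and keeps two peers from echoing each other indefinitely.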
[MPTCP][PATCH v5 mptcp-next 0/4] Add REMOVE_ADDR support
by Geliang Tang
v5:
- merge mptcp_nl_remove_subflow() and mptcp_nl_remove_addr()
- add cond_resched
- reduce the indentation level in mptcp_pm_nl_rm_addr_received
v4:
- update mptcp_subflow_shutdown()'s args.
- add rm_id check to make sure we don't shutdown the first subflow.
- add conn_list empty check.
- move anno_list to mptcp_pm_data.
- add a new patch 'mptcp: add remove subflow support'.
v3:
- fix memory leak and lock issue in v2.
- drop alist in v2.
- fix mptcp_subflow_shutdown's arguments.
- bzero remote in mptcp_pm_create_subflow_or_signal_addr.
- add more commit message.
Geliang Tang (4):
mptcp: rename addr_signal and the related functions
mptcp: add the outgoing RM_ADDR support
mptcp: add the incoming RM_ADDR support
mptcp: remove addr and subflow in PM netlink
net/mptcp/options.c | 48 ++++++++++++---
net/mptcp/pm.c | 56 +++++++++++++++---
net/mptcp/pm_netlink.c | 129 +++++++++++++++++++++++++++++++++++++++--
net/mptcp/protocol.c | 14 +++--
net/mptcp/protocol.h | 28 +++++++--
net/mptcp/subflow.c | 1 +
6 files changed, 248 insertions(+), 28 deletions(-)
--
2.17.1
1 year, 10 months