[PATCH net-next 0/6] MPTCP: improve fallback to TCP
by Davide Caratti
there are situations where MPTCP sockets should fall back to regular TCP:
this series reworks the fallback code to pursue the following goals:
1) clean up the non-fallback code, removing most 'if (<fallback>)' checks
in the data path
2) improve performance for non-fallback sockets, avoiding locks in poll()
further work will also leverage these changes to achieve:
a) more consistent behavior of getsockopt()/setsockopt() on passive sockets
after fallback
b) support for "infinite maps" as per RFC 8684, section 3.7
the series is made of the following items:
- patch 1 lets sendmsg() / recvmsg() / poll() use the main socket even
after fallback
- patch 2 fixes the 'simultaneous connect' scenario after fallback. The
problem was also present before the rework, but the fix is much easier
to implement after patch 1
- patches 3, 4 and 5 are clean-ups for code that is no longer needed after
the fallback rework
- patch 6 fixes a race condition between close() and poll(). The problem
was theoretically present before the rework, but it became almost
systematic after patch 1
Davide Caratti (2):
net: mptcp: improve fallback to TCP
mptcp: fallback in case of simultaneous connect
Paolo Abeni (4):
mptcp: check for plain TCP sock at accept time
mptcp: create first subflow at msk creation time
mptcp: __mptcp_tcp_fallback() returns a struct sock
mptcp: close poll() races
net/mptcp/options.c | 9 +-
net/mptcp/protocol.c | 267 ++++++++++++++-----------------------------
net/mptcp/protocol.h | 43 +++++++
net/mptcp/subflow.c | 57 ++++++---
4 files changed, 175 insertions(+), 201 deletions(-)
--
2.26.2
9 months, 3 weeks
[RFC PATCH] mptcp: Use full MPTCP-level disconnect state machine
by Mat Martineau
RFC 8684 appendix D describes the connection state machine for
MPTCP. This patch implements the DATA_FIN / DATA_ACK exchanges and
MPTCP-level socket state changes described in that appendix, rather than
simply sending DATA_FIN along with TCP FIN when disconnecting subflows.
DATA_FIN is now sent and acknowledged before shutting down the
subflows. Received DATA_FIN information (if not part of a data packet)
is written to the MPTCP socket when the incoming DSS option is parsed by
the subflow, and the MPTCP worker is scheduled to process the
flag. DATA_FIN received as part of a full DSS mapping will be handled
when the mapping is processed.
The DATA_FIN is acknowledged by the worker if the reader is caught
up. If there is still data to be moved to the MPTCP-level queue, ack_seq
will be incremented to account for the DATA_FIN when it reaches the end
of the stream and a DATA_ACK will be sent to the peer.
In this RFC patch, DATA_FIN is sent when calling shutdown(fd, SHUT_WR)
on the MPTCP socket. mptcp_close() has not yet been updated to wait for
the peer to DATA_ACK before forcing all the subflows to be closed and
cleaned up. The handshake works with both peers using test programs
with shutdown() and delays, but does not yet work consistently with the
selftests. It also does not currently detect whether fallback has happened,
so extra ACKs are sent on fallback connections.
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
---
net/mptcp/options.c | 36 +++++--
net/mptcp/protocol.c | 246 +++++++++++++++++++++++++++++++++++++------
net/mptcp/protocol.h | 4 +
net/mptcp/subflow.c | 32 +++++-
4 files changed, 272 insertions(+), 46 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index b96d3660562fb..75dc030d4362d 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -482,17 +482,10 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
struct mptcp_sock *msk;
unsigned int ack_size;
bool ret = false;
- u8 tcp_fin;
- if (skb) {
- mpext = mptcp_get_ext(skb);
- tcp_fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
- } else {
- mpext = NULL;
- tcp_fin = 0;
- }
+ mpext = skb ? mptcp_get_ext(skb) : NULL;
- if (!skb || (mpext && mpext->use_map) || tcp_fin) {
+ if (!skb || (mpext && mpext->use_map) || subflow->data_fin_tx_enable) {
unsigned int map_size;
map_size = TCPOLEN_MPTCP_DSS_BASE + TCPOLEN_MPTCP_DSS_MAP64;
@@ -502,7 +495,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
if (mpext)
opts->ext_copy = *mpext;
- if (skb && tcp_fin && subflow->data_fin_tx_enable)
+ if (skb && subflow->data_fin_tx_enable)
mptcp_write_data_fin(subflow, &opts->ext_copy);
ret = true;
}
@@ -784,6 +777,26 @@ static void update_una(struct mptcp_sock *msk,
}
}
+static void update_data_fin(struct sock *sk,
+ struct mptcp_options_received *mp_opt)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ /* Skip if DATA_FIN was already received. If updating
+ * simultaneously with the recvmsg loop, values should match. If
+ * they mismatch, the peer sent us bad data and we will prefer
+ * the most recent information.
+ */
+ if (READ_ONCE(msk->rcv_data_fin))
+ return;
+
+ WRITE_ONCE(msk->rcv_data_fin_seq, mp_opt->data_seq);
+ WRITE_ONCE(msk->rcv_data_fin, 1);
+
+ if (schedule_work(&msk->work))
+ sock_hold(sk);
+}
+
static bool add_addr_hmac_valid(struct mptcp_sock *msk,
struct mptcp_options_received *mp_opt)
{
@@ -854,6 +867,9 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb,
if (mp_opt.use_ack)
update_una(msk, &mp_opt);
+ if (mp_opt.data_fin && mp_opt.data_len == 1)
+ update_data_fin(subflow->conn, &mp_opt);
+
mpext = skb_ext_add(skb, SKB_EXT_MPTCP);
if (!mpext)
return;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8e9cef974f4b1..60596ef097b88 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -16,6 +16,7 @@
#include <net/inet_hashtables.h>
#include <net/protocol.h>
#include <net/tcp.h>
+#include <net/tcp_states.h>
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
#include <net/transp_v6.h>
#endif
@@ -141,6 +142,14 @@ static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
MPTCP_SKB_CB(skb)->offset = offset;
}
+static void mptcp_stop_timer(struct sock *sk)
+{
+ struct inet_connection_sock *icsk = inet_csk(sk);
+
+ sk_stop_timer(sk, &icsk->icsk_retransmit_timer);
+ mptcp_sk(sk)->timer_ival = 0;
+}
+
/* both sockets must be locked */
static bool mptcp_subflow_dsn_valid(const struct mptcp_sock *msk,
struct sock *ssk)
@@ -162,6 +171,116 @@ static bool mptcp_subflow_dsn_valid(const struct mptcp_sock *msk,
return mptcp_subflow_data_available(ssk);
}
+static void mptcp_check_data_fin_ack(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ /* Look for an acknowledged DATA_FIN */
+ if (((1 << sk->sk_state) &
+ (TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK)) &&
+ (msk->write_seq == atomic64_read(&msk->snd_una))) {
+ struct mptcp_subflow_context *subflow;
+
+ mptcp_stop_timer(sk);
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+ lock_sock(ssk);
+ subflow->data_fin_tx_enable = 0;
+ release_sock(ssk);
+ }
+
+ switch (sk->sk_state) {
+ case TCP_FIN_WAIT1:
+ inet_sk_state_store(sk, TCP_FIN_WAIT2);
+ break;
+ case TCP_CLOSING:
+ fallthrough;
+ case TCP_LAST_ACK:
+ inet_sk_state_store(sk, TCP_CLOSE);
+ // @@ Close subflows now?
+ break;
+ }
+ }
+}
+
+static bool mptcp_pending_data_fin(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ return (READ_ONCE(msk->rcv_data_fin) &&
+ ((1 << sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2)) &&
+ msk->ack_seq == READ_ONCE(msk->rcv_data_fin_seq));
+}
+
+static void mptcp_check_data_fin(struct sock *sk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ /* Need to ack a DATA_FIN in ESTABLISHED, FIN_WAIT1, or FIN_WAIT2.
+ * If we are caught up to the sequence number of the incoming
+ * DATA_FIN, send the DATA_ACK now and do state transition.
+ * If not caught up, do nothing and let the recv code DATA_ACK
+ * along with the received data.
+ */
+
+ if (READ_ONCE(msk->rcv_data_fin) &&
+ ((1 << sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2))) {
+ u64 rcv_data_fin_seq = READ_ONCE(msk->rcv_data_fin_seq);
+ struct mptcp_subflow_context *subflow;
+
+ if (msk->ack_seq == rcv_data_fin_seq) {
+ msk->ack_seq++;
+ WRITE_ONCE(msk->rcv_data_fin, 0);
+
+ sk->sk_shutdown |= RCV_SHUTDOWN;
+
+ switch (sk->sk_state) {
+ case TCP_ESTABLISHED:
+ inet_sk_state_store(sk, TCP_CLOSE_WAIT);
+ break;
+ case TCP_FIN_WAIT1:
+ inet_sk_state_store(sk, TCP_CLOSING);
+ break;
+ case TCP_FIN_WAIT2:
+ inet_sk_state_store(sk, TCP_CLOSE);
+ // @@ Close subflows now?
+ break;
+ default:
+ /* Other states not expected */
+ WARN_ON_ONCE(1);
+ break;
+ }
+
+ mptcp_set_timeout(sk, NULL);
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+ lock_sock(ssk);
+ tcp_send_ack(ssk);
+ release_sock(ssk);
+ }
+
+ sk->sk_state_change(sk);
+
+ if (sk->sk_shutdown == SHUTDOWN_MASK ||
+ sk->sk_state == TCP_CLOSE)
+ sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_HUP);
+ else
+ sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
+ }
+ }
+}
+
+static void mptcp_schedule_work(struct sock *sk)
+{
+ if (schedule_work(&mptcp_sk(sk)->work))
+ sock_hold(sk);
+}
+
static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
struct sock *ssk,
unsigned int *bytes)
@@ -219,6 +338,9 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
seq += len;
moved += len;
+ if (mptcp_pending_data_fin(sk))
+ mptcp_schedule_work(sk);
+
if (WARN_ON_ONCE(map_remaining < len))
break;
} else {
@@ -302,7 +424,7 @@ static void __mptcp_flush_join_list(struct mptcp_sock *msk)
spin_unlock_bh(&msk->join_list_lock);
}
-static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
+void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
{
long tout = ssk && inet_csk(ssk)->icsk_pending ?
inet_csk(ssk)->icsk_timeout - jiffies : 0;
@@ -333,7 +455,8 @@ void mptcp_data_acked(struct sock *sk)
{
mptcp_reset_timer(sk);
- if (!sk_stream_is_writeable(sk) &&
+ if ((!sk_stream_is_writeable(sk) ||
+ (inet_sk_state_load(sk) != TCP_ESTABLISHED)) &&
schedule_work(&mptcp_sk(sk)->work))
sock_hold(sk);
}
@@ -368,14 +491,6 @@ static void mptcp_check_for_eof(struct mptcp_sock *msk)
}
}
-static void mptcp_stop_timer(struct sock *sk)
-{
- struct inet_connection_sock *icsk = inet_csk(sk);
-
- sk_stop_timer(sk, &icsk->icsk_retransmit_timer);
- mptcp_sk(sk)->timer_ival = 0;
-}
-
static bool mptcp_ext_cache_refill(struct mptcp_sock *msk)
{
const struct sock *sk = (const struct sock *)msk;
@@ -742,8 +857,15 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
restart:
mptcp_clean_una(sk);
+ if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) {
+ ret = -EPIPE;
+ goto out;
+ }
+
+
wait_for_sndbuf:
__mptcp_flush_join_list(msk);
+
ssk = mptcp_subflow_get_send(msk);
while (!sk_stream_memory_free(sk) ||
!ssk ||
@@ -1249,6 +1371,7 @@ static void mptcp_worker(struct work_struct *work)
lock_sock(sk);
mptcp_clean_una(sk);
+ mptcp_check_data_fin_ack(sk);
__mptcp_flush_join_list(msk);
__mptcp_move_skbs(msk);
@@ -1258,6 +1381,8 @@ static void mptcp_worker(struct work_struct *work)
if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
mptcp_check_for_eof(msk);
+ mptcp_check_data_fin(sk);
+
if (!test_and_clear_bit(MPTCP_WORK_RTX, &msk->flags))
goto unlock;
@@ -1383,30 +1508,28 @@ static void mptcp_cancel_work(struct sock *sk)
sock_put(sk);
}
-static void mptcp_subflow_shutdown(struct sock *ssk, int how,
- bool data_fin_tx_enable, u64 data_fin_tx_seq)
+static void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk,
+ u64 data_fin_tx_seq, int how)
{
+ struct mptcp_subflow_context *subflow;
+
lock_sock(ssk);
switch (ssk->sk_state) {
case TCP_LISTEN:
if (!(how & RCV_SHUTDOWN))
break;
- /* fall through */
+ fallthrough;
case TCP_SYN_SENT:
tcp_disconnect(ssk, O_NONBLOCK);
break;
default:
- if (data_fin_tx_enable) {
- struct mptcp_subflow_context *subflow;
-
- subflow = mptcp_subflow_ctx(ssk);
- subflow->data_fin_tx_seq = data_fin_tx_seq;
- subflow->data_fin_tx_enable = 1;
- }
+ subflow = mptcp_subflow_ctx(ssk);
+ subflow->data_fin_tx_seq = data_fin_tx_seq;
+ subflow->data_fin_tx_enable = 1;
- ssk->sk_shutdown |= how;
- tcp_shutdown(ssk, how);
+ mptcp_set_timeout(sk, ssk);
+ tcp_send_ack(ssk);
break;
}
@@ -1418,11 +1541,14 @@ static void mptcp_close(struct sock *sk, long timeout)
{
struct mptcp_subflow_context *subflow, *tmp;
struct mptcp_sock *msk = mptcp_sk(sk);
+ bool data_fin_tx_enable;
LIST_HEAD(conn_list);
u64 data_fin_tx_seq;
+ int prev_state;
lock_sock(sk);
-
+pr_debug("closing");
+ prev_state = sk->sk_state;
inet_sk_state_store(sk, TCP_CLOSE);
/* be sure to always acquire the join list lock, to sync vs
@@ -1433,7 +1559,17 @@ static void mptcp_close(struct sock *sk, long timeout)
spin_unlock_bh(&msk->join_list_lock);
list_splice_init(&msk->conn_list, &conn_list);
- data_fin_tx_seq = msk->write_seq;
+ if ((1 << prev_state) &
+ (TCPF_ESTABLISHED | TCPF_SYN_SENT |
+ TCPF_SYN_RECV | TCPF_CLOSE_WAIT)) {
+ data_fin_tx_seq = msk->write_seq;
+ data_fin_tx_enable = 1;
+
+ msk->write_seq++;
+ } else {
+ /* DATA_FIN has already been sent in other states */
+ data_fin_tx_enable = 0;
+ }
__mptcp_clear_xmit(sk);
@@ -1442,8 +1578,11 @@ static void mptcp_close(struct sock *sk, long timeout)
list_for_each_entry_safe(subflow, tmp, &conn_list, node) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
- subflow->data_fin_tx_seq = data_fin_tx_seq;
- subflow->data_fin_tx_enable = 1;
+ if (data_fin_tx_enable) {
+ subflow->data_fin_tx_seq = data_fin_tx_seq;
+ subflow->data_fin_tx_enable = 1;
+ }
+
__mptcp_close_ssk(sk, ssk, subflow, timeout);
}
@@ -2109,6 +2248,33 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
return mask;
}
+static const unsigned char new_state[16] = {
+ /* current state: new state: action: */
+ [0 /* (Invalid) */] = TCP_CLOSE,
+ [TCP_ESTABLISHED] = TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+ [TCP_SYN_SENT] = TCP_CLOSE,
+ [TCP_SYN_RECV] = TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+ [TCP_FIN_WAIT1] = TCP_FIN_WAIT1,
+ [TCP_FIN_WAIT2] = TCP_FIN_WAIT2,
+ [TCP_TIME_WAIT] = TCP_CLOSE, /* should not happen ! */
+ [TCP_CLOSE] = TCP_CLOSE,
+ [TCP_CLOSE_WAIT] = TCP_LAST_ACK | TCP_ACTION_FIN,
+ [TCP_LAST_ACK] = TCP_LAST_ACK,
+ [TCP_LISTEN] = TCP_CLOSE,
+ [TCP_CLOSING] = TCP_CLOSING,
+ [TCP_NEW_SYN_RECV] = TCP_CLOSE, /* should not happen ! */
+};
+
+static int mptcp_close_state(struct sock *sk)
+{
+ int next = (int)new_state[sk->sk_state];
+ int ns = next & TCP_STATE_MASK;
+
+ inet_sk_state_store(sk, ns);
+
+ return next & TCP_ACTION_FIN;
+}
+
static int mptcp_shutdown(struct socket *sock, int how)
{
struct mptcp_sock *msk = mptcp_sk(sock->sk);
@@ -2118,11 +2284,8 @@ static int mptcp_shutdown(struct socket *sock, int how)
pr_debug("sk=%p, how=%d", msk, how);
lock_sock(sock->sk);
- if (how == SHUT_WR || how == SHUT_RDWR)
- inet_sk_state_store(sock->sk, TCP_FIN_WAIT1);
how++;
-
if ((how & ~SHUTDOWN_MASK) || !how) {
ret = -EINVAL;
goto out_unlock;
@@ -2136,11 +2299,28 @@ static int mptcp_shutdown(struct socket *sock, int how)
sock->state = SS_CONNECTED;
}
- __mptcp_flush_join_list(msk);
- mptcp_for_each_subflow(msk, subflow) {
- struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow);
+ /* If we've already sent a FIN, or it's a closed state, skip this. */
+ if ((how & SEND_SHUTDOWN) &&
+ ((1 << sock->sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_SYN_SENT |
+ TCPF_SYN_RECV | TCPF_CLOSE_WAIT))) {
+
+ if (mptcp_close_state(sock->sk)) {
+ u64 data_fin_tx_seq = msk->write_seq;
- mptcp_subflow_shutdown(tcp_sk, how, 1, msk->write_seq);
+ __mptcp_flush_join_list(msk);
+
+ msk->write_seq++;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *tcp_sk;
+
+ tcp_sk = mptcp_subflow_tcp_sock(subflow);
+
+ mptcp_subflow_shutdown(sock->sk, tcp_sk,
+ data_fin_tx_seq, how);
+ }
+ }
}
/* Wake up anyone sleeping in poll. */
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index dfe67e8a86ed2..834312af5ac66 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -193,11 +193,13 @@ struct mptcp_sock {
u64 remote_key;
u64 write_seq;
u64 ack_seq;
+ u64 rcv_data_fin_seq;
atomic64_t snd_una;
unsigned long timer_ival;
u32 token;
unsigned long flags;
bool can_ack;
+ bool rcv_data_fin;
spinlock_t join_list_lock;
struct work_struct work;
struct list_head conn_list;
@@ -505,4 +507,6 @@ static inline bool subflow_simultaneous_connect(struct sock *sk)
!subflow->conn_finished;
}
+void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk);
+
#endif /* __MPTCP_PROTOCOL_H */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 0f0fa1ba57a89..76d05bf05f14c 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -581,7 +581,8 @@ static bool validate_mapping(struct sock *ssk, struct sk_buff *skb)
return true;
}
-static enum mapping_status get_mapping_status(struct sock *ssk)
+static enum mapping_status get_mapping_status(struct sock *ssk,
+ struct mptcp_sock *msk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct mptcp_ext *mpext;
@@ -631,7 +632,9 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
if (mpext->data_fin == 1) {
if (data_len == 1) {
- pr_debug("DATA_FIN with no payload");
+ WRITE_ONCE(msk->rcv_data_fin_seq, mpext->data_seq);
+ WRITE_ONCE(msk->rcv_data_fin, 1);
+ pr_debug("DATA_FIN with no payload seq=%llu", mpext->data_seq);
if (subflow->map_valid) {
/* A DATA_FIN might arrive in a DSS
* option before the previous mapping
@@ -643,6 +646,11 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
} else {
return MAPPING_DATA_FIN;
}
+ } else {
+ WRITE_ONCE(msk->rcv_data_fin_seq,
+ mpext->data_seq + data_len);
+ WRITE_ONCE(msk->rcv_data_fin, 1);
+ pr_debug("DATA_FIN with mapping seq=%llu", mpext->data_seq + data_len);
}
/* Adjust for DATA_FIN using 1 byte of sequence space */
@@ -731,7 +739,25 @@ static bool subflow_check_data_avail(struct sock *ssk)
u64 ack_seq;
u64 old_ack;
- status = get_mapping_status(ssk);
+ if (READ_ONCE(msk->rcv_data_fin)) {
+ u64 rcv_data_fin_seq = READ_ONCE(msk->rcv_data_fin_seq);
+
+ ack_seq = READ_ONCE(msk->ack_seq);
+
+ if (ack_seq == rcv_data_fin_seq) {
+ return false;
+ } else if (ack_seq == rcv_data_fin_seq - 1) {
+ /* Acknowledge the received DATA_FIN
+ * by incrementing the sequence number
+ */
+ msk->ack_seq++;
+ mptcp_set_timeout(subflow->conn, ssk);
+ tcp_send_ack(ssk);
+ return false;
+ }
+ }
+
+ status = get_mapping_status(ssk, msk);
pr_debug("msk=%p ssk=%p status=%d", msk, ssk, status);
if (status == MAPPING_INVALID) {
ssk->sk_err = EBADMSG;
--
2.27.0
9 months, 3 weeks
[PATCH] Squash-to: "net: mptcp: improve fallback to TCP"
by Paolo Abeni
This fixes blocking accept() never waking up.
When blocking, mptcp_stream_accept() calls into:
ssock->ops->accept -> ssock->sk->sk_prot->accept ->
inet_csk_accept() -> inet_csk_wait_for_connect(),
which in turn waits on ssock->sk->sk_wq, while
all signaling happens on sock->sk->sk_wq.
Since ssock->sk->sk_wq is never used otherwise,
just copy sock->sk->sk_wq into it at first subflow
creation time.
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/protocol.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4d6d35e99d0f..5ae8952189d2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -128,6 +128,10 @@ static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state)
list_add(&subflow->node, &msk->conn_list);
subflow->request_mptcp = 1;
+ /* accept() will wait on first subflow sk_wq, and we always wake up
+ * via msk->sk_socket */
+ RCU_INIT_POINTER(msk->first->sk_wq, &sk->sk_socket->wq);
+
set_state:
if (state != MPTCP_SAME_STATE)
inet_sk_state_store(sk, state);
--
2.26.2
9 months, 3 weeks
[Weekly meetings] MoM - 25th of June 2020
by Matthieu Baerts
Hello,
Last Thursday, we had our 105th meeting with Mat, Ossama and Todd (Intel
OTC), Christoph (Apple), Paolo, Davide and Florian (RedHat), Geliang
Tang (Xiaomi) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Geliang Tang:
- (Geliang Tang sounds like Glenn Town)
- New contributor from Beijing, China (UTC+08:00)
- He recently looked at the code and sent some fixes
- Working at Xiaomi
Accepted patches:
- The list of accepted patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=3
netdev (if mptcp ML is in cc) (Geliang Tang):
1314275 [net] mptcp: drop sndr_key in mptcp_syn_options
our repo (by: Florian Westphal, Matthieu Baerts, Paolo Abeni):
1316519 [mptcp-next,3/3] mptcp: support IPV6_V6ONLY setsockopt
1316518 [mptcp-next,2/3] mptcp: add REUSEADDR/REUSEPORT support
1316517 [mptcp-next,1/3] net: use mptcp setsockopt function for SOL_SOC
1315320 [v3,4/4] mptcp: __mptcp_tcp_fallback() returns a struct sock
1312006 [v2,3/4] mptcp: create first subflow at msk creation time
1312008 [v2,2/4] mptcp: check for plain TCP sock at accept time
1312005 [v2,1/4] Squash-to: "net: mptcp: improve fallback to TCP"
1312835 [mptcp-next] mptcp: use mptcp worker for path management
1310590 [mptcp-next] mptcp: default KUNIT_* fragments to KUNIT_ALL_TEST
1310586 [mptcp-next] selftests/mptcp: Capture pcap on both sender and
receiver
Pending patches:
- The list of pending patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=*
netdev (if mptcp ML is in cc) (by: /):
/
our repo (by: Christoph Paasch, Florian Westphal, Geliang Tang,
Paolo Abeni):
1310579: RFC: Re: [PATCH mptcp-next 2/2] mptcp: add receive buffer auto:
- It has been included in the next patch from Florian, just below
1314402: New: [mptcp-next] mptcp: init autotune state also in simultane:
- Waiting for review
1316824: New: [1/3] inet_diag: support for wider protocol numbers
1316826: New: [2/3] mptcp: add msk interations helpers
1316827: New: [3/3] mptcp: add MPTCP socket diag interface
1317043: New: [4/4] selftests/mptcp: add diag interface tests:
- diag interface for MPTCP socket
- following the recommendation from David Miller
- Also a new selftests
- Also waiting for feedback
1316834: New: [iproute2-next,1/2] include: update mptcp uAPI
1316833: New: [iproute2-next,2/2] ss: mptcp: add msk diag interface support:
- Userspace part, linked to the previous series.
Issues on Github:
https://github.com/multipath-tcp/mptcp_net-next/issues/
open: (latest from last week: 39)
39 [syzkaller] WARNING in subflow_data_ready (v2.0) @dcaratti
34 WARNING: Bad mapping: ssn=1 map_seq=498340137 map_data_len=79:
- *Paolo* has a fix for that (shared on Friday)
- The fix will be squashed later
33 weighttp-test on 1KB file with mptcp.org-client in "ndiffports"-mode
31 Allow MPTCP + SYN_COOKIES [enhancement]
30 ssh restart does not work
24 Revisit layout of struct mptcp_subflow_context [enhancement]
21 sort-out {set,get}sockopt handling [enhancement]
20 implement msk diag interface [enhancement] @pabeni
19 let PM netlink update live sockets on local addresses list change
[enhancement]
18 allow 'force to MPTCP' mode [enhancement]
17 audit 'backup' flag usage [enhancement]
15 reduce mptcp_out_option struct size [enhancement]
6 loss and delay without reordering causes very slow transfer:
- Matth has a simple patch to always add "reorder" to the TC command
- Christoph suggests adding less delay but more packets with
this delay
- *Christoph* will send a patch
4 keep a single work struct in mptcp socket [enhancement]
New one from Mat (reported on the ML):
- issue with accept() (blocking), more details on the ML
- *Mat* will add more details and share that on Github (Done →
issue 40)
- Maybe related to the Fallback refactor (incoming socket events
are not handled the same way)
- *TODO* Could be good to cover this with a selftest.
recently closed:
26 Unstable packetdrill tests:
- fixed (in Packetdrill, by Todd)
FYI: Current Roadmap:
- Part 4 (next merge window):
- Fix bugs reported on Github:
https://github.com/multipath-tcp/mptcp_net-next/issues/
- IPv6 - IPv4 mapped support
- not dropping MPTCP options (ADD_ADDR, etc.)
- FAST_CLOSE
- full MPTCP v1 support (reliable add_addr, etc.)
- after a few attempts of failed MPTCP, we fallback to TCP
(like TFO is doing)
- PM server (more advanced)
- Active backup support
- sending "simultaneously" on different subflows (multiple non
backup subflows)
- Full DATA_FIN support [WIP by Mat]
- ADD_ADDR for MPTCPv1: echo bit [WIP by Peter]
- Opti in TCP option structures (unions) [DONE by Paolo]
- Shared recv window (full support) [DONE by Florian]
- Part 5 (extra needed for prod):
- opti/perfs
- TFO
- PM netlink
- PM bpf
- Scheduler bpf
- syncookies
- [gs]etsockopt per subflow
- notify the userspace when a subflow is added/removed → cmsg
*Matth*: TODO: look at what we can do on Github → project, label, etc.
https://github.com/multipath-tcp/mptcp_net-next/projects
Part 4: new features:
- news about "Full DATA_FIN support"?:
- Mat was blocked with the issue with accept()
- Mat should be able to share a RFC soon
- news about "ADD_ADDR for MPTCPv1: echo bit"?:
- no news, maybe more next week
Extra tests:
- news about Syzkaller? (Christoph):
- good news: nothing
- "bad mapping" issue will be closed soon
- issue 39 is still happening. Davide is looking at that
- news about interop with mptcp.org? (Christoph):
- weighttp has some issues
- debug info has been shared on the ML
- according to the info from perf, the server seems idle
- *Christoph* will try with fewer connections
- news about Intel's kbuild? (Mat):
- no new update, looks like the builds are green but selftests
are unstable: maybe linked with issues with BPF and not linked to MPTCP
- *Mat* will continue to monitor this
- packetdrill (Davide & Todd):
- MP_JOIN on the way (Todd)
- Also some additional fixes, PR should be ready now
- *Davide* is looking at the review
- so soon we will be able to have pkt test with multiple subflows
- out-of-tree kernel is no longer supported (not compatible
with the socket API, even v1) but generating v0 packets is still OK
except to generate ADD_ADDR and MP_JOIN.
- we could drop the scripts to be able to test the whole dir
(mptcp)
- but keep the possibility to generate v0 packets (not a lot of
code, easy to maintain but no longer able to verify)
- CI (Matth):
- /
- Coverity (Davide):
- *Davide* is trying to clear the last (false positive,
confirmed) report that regards kmem_cache_create() for subflows.
- the false positive is just for MPTCP, the way we use
kmem_cache_create
Sysctl:
- similar to net.mptcp.mptcp_enabled in mptcp.org
- would it be accepted upstream?
- simpler than CGroup →
https://github.com/multipath-tcp/mptcp_net-next/issues/18
- check the systemtap script https://paste.centos.org/view/16641a10
- Paolo proposes to ask this to Netdev ML but after the patches we
want to share during this window
- We could also add support to only have MPTCP for incoming
connections (good for CDN) and outgoing connections as it was recently
done in mptcp.org by Christoph
BPF:
- An intern at Tessares is interested to look at that.
- What's the most interesting thing to look at "now"?
- Ideally something around the scheduler but hard now, no?
- PM?
- setsockopt per subflow? → maybe a good start
- issue 18? → suggested by other kernel devs at Netdev 0x12
syzkaller repro:
- https://github.com/multipath-tcp/mptcp-syzkaller-repro
- feel free to add/move new/existing ones
- syzkaller repro: needs syz-executor, which is not as easy to put in
place; can probably be scripted (but it can also take long)
- start with .c reproducers
- kernel configuration: it is important, e.g. some depends on
Kasan, some not
Meeting at a different time:
- to have people from different timezone to be present
- have meeting at different times?
- *Matth*: TODO: send mail (when back to work)
Next meeting:
- We propose to have the next meeting on Thursday, the 2nd of July.
- Usual time (for the moment): 16:00 UTC (9am PDT, 6pm CEST)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20200702
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
9 months, 3 weeks
[PATCH net-next v2 0/4] mptcp: refactor token container
by Paolo Abeni
Currently the msk sockets are stored in a single radix tree, protected by a
global spin_lock. This series moves to a hash table, allocated at boot time,
with per-bucket spin_locks - like inet_hashtables, but using a different key:
the token itself.
The above improves scalability, as write operations will have a far lower chance
of competing for lock acquisition, allows lockless lookup, and will allow
easier msk traversal - e.g. for the diag interface implementation's sake.
This also introduces trivial, related KUnit tests and moves the existing
in-kernel ones to KUnit.
v1 -> v2:
- fixed a few extra and sparse warns
Paolo Abeni (4):
mptcp: add __init annotation on setup functions
mptcp: refactor token container
mptcp: move crypto test to KUNIT
mptcp: introduce token KUNIT self-tests
net/mptcp/Kconfig | 20 ++-
net/mptcp/Makefile | 4 +
net/mptcp/crypto.c | 63 +--------
net/mptcp/crypto_test.c | 72 +++++++++++
net/mptcp/pm.c | 2 +-
net/mptcp/pm_netlink.c | 2 +-
net/mptcp/protocol.c | 49 ++++---
net/mptcp/protocol.h | 24 ++--
net/mptcp/subflow.c | 21 ++-
net/mptcp/token.c | 280 ++++++++++++++++++++++++++++------------
net/mptcp/token_test.c | 140 ++++++++++++++++++++
11 files changed, 487 insertions(+), 190 deletions(-)
create mode 100644 net/mptcp/crypto_test.c
create mode 100644 net/mptcp/token_test.c
--
2.26.2
9 months, 3 weeks
[PATCH net-next 0/4] mptcp: refactor token container
by Paolo Abeni
Currently the msk sockets are stored in a single radix tree, protected by a
global spin_lock. This series moves to a hash table, allocated at boot time,
with per-bucket spin_locks - like inet_hashtables, but using a different key:
the token itself.
The above improves scalability, as write operations will have a far lower chance
of competing for lock acquisition, allows lockless lookup, and will allow
easier msk traversal - e.g. for the diag interface implementation's sake.
This also introduces trivial, related KUnit tests and moves the existing
in-kernel ones to KUnit.
Paolo Abeni (4):
mptcp: add __init annotation on setup functions
mptcp: refactor token container
mptcp: move crypto test to KUNIT
mptcp: introduce token KUNIT self-tests
net/mptcp/Kconfig | 20 ++-
net/mptcp/Makefile | 4 +
net/mptcp/crypto.c | 63 +--------
net/mptcp/crypto_test.c | 72 +++++++++++
net/mptcp/pm.c | 2 +-
net/mptcp/pm_netlink.c | 2 +-
net/mptcp/protocol.c | 49 ++++---
net/mptcp/protocol.h | 24 ++--
net/mptcp/subflow.c | 21 ++-
net/mptcp/token.c | 280 ++++++++++++++++++++++++++++------------
net/mptcp/token_test.c | 139 ++++++++++++++++++++
11 files changed, 486 insertions(+), 190 deletions(-)
create mode 100644 net/mptcp/crypto_test.c
create mode 100644 net/mptcp/token_test.c
--
2.26.2
9 months, 3 weeks
[PATCH 0/3] mptcp: msk diag support
by Paolo Abeni
This introduces basic mptcp sockets diag support.
As IPPROTO_MPTCP excedes 8 bits, we need some changes at the inet_diag level:
a new attribute is introduced to allow user-space providing u32 protocol
values.
Patch 2 introduces new token APIs to allow traversing the existing msks, while
patch 3 bring in the actual diag implementation.
Paolo Abeni (3):
inet_diag: support for wider protocol numbers
mptcp: add msk interations helpers
mptcp: add MPTCP socket diag interface
include/uapi/linux/inet_diag.h | 1 +
include/uapi/linux/mptcp.h | 15 ++++
net/core/sock.c | 1 +
net/ipv4/inet_diag.c | 63 +++++++++----
net/mptcp/Kconfig | 4 +
net/mptcp/Makefile | 2 +
net/mptcp/mptcp_diag.c | 160 +++++++++++++++++++++++++++++++++
net/mptcp/options.c | 6 +-
net/mptcp/protocol.h | 3 +
net/mptcp/token.c | 83 +++++++++++++++++
10 files changed, 318 insertions(+), 20 deletions(-)
create mode 100644 net/mptcp/mptcp_diag.c
--
2.26.2
9 months, 3 weeks