[RFC PATCH v4 00/17] MPTCP architecture proposal
by Mat Martineau
Hello everyone,
Peter and I have been working on this patch set to show how MPTCP
can fit into the Linux networking stack using these design ideas:
* Applications opt in to MPTCP using IPPROTO_MPTCP; regular TCP sockets
are still the default. A socket created with
socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP) will attempt to form an
MPTCP connection. IPPROTO_MPTCP == 99 as a placeholder.
* Subflows exist within the kernel as separate sockets, owned by an
MPTCP connection-level socket that is visible to userspace.
* Adds private pointers to struct sk_buff to store MPTCP metadata.
* Adds the CONFIG_MPTCP option to Kconfig.
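As a rough illustration of the opt-in model in the first item above, a
userspace program could ask for MPTCP and fall back to plain TCP when
the kernel does not support it. This is only a sketch and not part of
the patch set: 99 is the placeholder value from this RFC, while the
macro RFC_IPPROTO_MPTCP and the helper open_stream_socket() are names
made up for the example.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder protocol number used by this RFC series. Assumption: a
 * kernel without these patches rejects it, so socket() returns -1. */
#define RFC_IPPROTO_MPTCP 99

/* Hypothetical helper: try MPTCP first, then fall back to regular TCP.
 * Returns a socket fd or -1; *is_mptcp reports which path was taken. */
static int open_stream_socket(int *is_mptcp)
{
	int fd = socket(AF_INET, SOCK_STREAM, RFC_IPPROTO_MPTCP);

	if (fd >= 0) {
		*is_mptcp = 1;
		return fd;
	}

	/* No MPTCP support: regular TCP is still the default. */
	*is_mptcp = 0;
	return socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
}
```

The rest of the application would use the returned fd with the usual
connect()/send()/recv() calls in either case.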
Note that this does not yet make use of Florian's CONFIG_SKB_EXTENSIONS,
but I plan to drop patch 12 of this series and use CONFIG_SKB_EXTENSIONS
instead (since they are designed for multiple uses and will hopefully
be merged upstream). Refer to
https://marc.info/?l=linux-netdev&m=154323251731893&w=2
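To make the motivation for the "Prevent coalesce and collapse when
skb->priv is used" patch more concrete, here is a simplified userspace
sketch of the idea. It is not the kernel code: struct buf is a stand-in
for struct sk_buff, and buf_try_coalesce() is a hypothetical helper. The
assumption is that a buffer carrying per-buffer metadata must not be
merged with a neighbor, because the metadata describes exactly that
buffer's bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct sk_buff; not the kernel structure. */
struct buf {
	size_t len;
	void *priv;	/* per-buffer metadata, e.g. an MPTCP DSS mapping */
};

/* Merge 'from' into 'to' only when neither buffer carries private
 * metadata. Otherwise leave both untouched and report failure. */
static bool buf_try_coalesce(struct buf *to, struct buf *from)
{
	if (to->priv || from->priv)
		return false;

	to->len += from->len;
	from->len = 0;
	return true;
}
```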
The following patches can form an MPTCP connection with the
multipath-tcp.org kernel (tested with v0.94), and send DSS mappings that
are accepted for the initial data packet. It is an early implementation,
and I don't represent it as being upstreamable as-is or being everyone's
idea of what an eventual upstream implementation will necessarily look
like. It has significant limitations:
* Only one subflow is supported, no joins, and only IPv4.
* Does not support DSS checksums. Checksums must be disabled on the
remote stack (for multipath-tcp.org, 'sudo sysctl -w
net.mptcp.mptcp_checksum=0')
* Lots of debug statements (although they use dynamic debug and are
disabled by default) and TODOs.
* It has only been tested with small amounts of data in each send.
Hopefully there are some interesting concepts to discuss, and this
code helps us assess how workable the above design principles
are. Thanks in advance for your feedback on the benefits or drawbacks of
this code, how it might be improved, or how other approaches might
compare.
The patch set applies to net-next (as of commit 1464193107da). I have also
pushed it to:
https://git.kernel.org/pub/scm/linux/kernel/git/martineau/linux.git
(mptcp-proposal branch)
v4 changes: Refine skb extension (remove copy hook), change rx path to
use skb extension instead of error queue.
v3 changes: Change skb extension technique, change rx path to use error
queue, add foundational code for multiple subflows, and many bug fixes.
v2 changes: Added receive path implementation (last two patches).
Reworked TCP option writing. Miscellaneous bug fixes including
header dependency cleanup.
Mat Martineau (7):
tcp: Add MPTCP option number
tcp: Define IPPROTO_MPTCP
skbuff: Add private data pointer
tcp: Prevent coalesce and collapse when skb->priv is used
tcp: Export low-level TCP functions
mptcp: Write MPTCP DSS headers to outgoing data packets
mptcp: Implement MPTCP receive path
Peter Krystad (10):
mptcp: Add MPTCP socket stubs
mptcp: Handle MPTCP TCP options
tcp: Add IPPROTO_SUBFLOW
tcp: expose tcp routines and structs for MPTCP
mptcp: Create SUBFLOW socket for outgoing connections
mptcp: Create SUBFLOW socket for incoming connections
mptcp: Add key generation and token tree
mptcp: Add shutdown() socket operation
mptcp: Add setsockopt()/getsockopt() socket operations
mptcp: Make connection_list a real list of subflows
include/linux/skbuff.h | 13 +-
include/linux/tcp.h | 26 ++
include/net/inet_common.h | 3 +
include/net/mptcp.h | 234 ++++++++++
include/net/tcp.h | 15 +
include/uapi/linux/in.h | 4 +
net/Kconfig | 1 +
net/Makefile | 1 +
net/core/skbuff.c | 5 +
net/ipv4/af_inet.c | 2 +-
net/ipv4/tcp.c | 12 +-
net/ipv4/tcp_input.c | 23 +-
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_output.c | 249 +++++++++-
net/mptcp/Kconfig | 10 +
net/mptcp/Makefile | 3 +
net/mptcp/crypto.c | 215 +++++++++
net/mptcp/options.c | 302 ++++++++++++
net/mptcp/protocol.c | 939 ++++++++++++++++++++++++++++++++++++++
net/mptcp/subflow.c | 377 +++++++++++++++
net/mptcp/token.c | 256 +++++++++++
21 files changed, 2663 insertions(+), 31 deletions(-)
create mode 100644 include/net/mptcp.h
create mode 100644 net/mptcp/Kconfig
create mode 100644 net/mptcp/Makefile
create mode 100644 net/mptcp/crypto.c
create mode 100644 net/mptcp/options.c
create mode 100644 net/mptcp/protocol.c
create mode 100644 net/mptcp/subflow.c
create mode 100644 net/mptcp/token.c
--
2.19.1
One approach to indirect call optimization
by Mat Martineau
I noticed this patch on netdev, since accepted, that avoids an indirect
call to md5_lookup. It mitigates the cost of an existing indirect call
rather than adding a new one, but it shows how the maintainers are
looking at the problem.
--
Mat Martineau
Intel OTC
---------- Forwarded message ----------
Date: Mon, 23 Apr 2018 14:46:25
From: Eric Dumazet <edumazet(a)google.com>
To: David S . Miller <davem(a)davemloft.net>
Cc: netdev <netdev(a)vger.kernel.org>, Eric Dumazet <edumazet(a)google.com>,
Eric Dumazet <eric.dumazet(a)gmail.com>
Subject: [PATCH net-next] tcp: md5: only call tp->af_specific->md5_lookup() for
md5 sockets
RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive,
given they have no result.
We can omit the calls for sockets that have no md5 keys.
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
---
net/ipv4/tcp_output.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 383cac0ff0ec059ca7dbc1a6304cc7f8183e008d..95feffb6d53f8a9eadfb15a2fffeec498d6e993a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -585,14 +585,15 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	unsigned int remaining = MAX_TCP_OPTION_SPACE;
 	struct tcp_fastopen_request *fastopen = tp->fastopen_req;
+	*md5 = NULL;
 #ifdef CONFIG_TCP_MD5SIG
-	*md5 = tp->af_specific->md5_lookup(sk, sk);
-	if (*md5) {
-		opts->options |= OPTION_MD5;
-		remaining -= TCPOLEN_MD5SIG_ALIGNED;
+	if (unlikely(rcu_access_pointer(tp->md5sig_info))) {
+		*md5 = tp->af_specific->md5_lookup(sk, sk);
+		if (*md5) {
+			opts->options |= OPTION_MD5;
+			remaining -= TCPOLEN_MD5SIG_ALIGNED;
+		}
 	}
-#else
-	*md5 = NULL;
 #endif
 	/* We always get an MSS option. The option bytes which will be seen in
@@ -720,14 +721,15 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
 	opts->options = 0;
+	*md5 = NULL;
 #ifdef CONFIG_TCP_MD5SIG
-	*md5 = tp->af_specific->md5_lookup(sk, sk);
-	if (unlikely(*md5)) {
-		opts->options |= OPTION_MD5;
-		size += TCPOLEN_MD5SIG_ALIGNED;
+	if (unlikely(rcu_access_pointer(tp->md5sig_info))) {
+		*md5 = tp->af_specific->md5_lookup(sk, sk);
+		if (*md5) {
+			opts->options |= OPTION_MD5;
+			size += TCPOLEN_MD5SIG_ALIGNED;
+		}
 	}
-#else
-	*md5 = NULL;
 #endif
 	if (likely(tp->rx_opt.tstamp_ok)) {
--
2.17.0.484.g0c8726318c-goog
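The pattern in the patch above, testing a cheap data pointer before
making an indirect call, can be sketched outside the kernel roughly as
follows. This is an illustration under stated assumptions, not kernel
code: struct conn, struct conn_ops, demo_key_lookup() and conn_get_key()
are hypothetical names, the lookup_calls counter exists only to make the
effect visible, and __builtin_expect (a GCC/Clang builtin) stands in for
the kernel's unlikely().

```c
#include <stddef.h>

struct conn;
struct key { int id; };

/* Per-family operations, reached through a function pointer. */
struct conn_ops {
	struct key *(*key_lookup)(const struct conn *c);
};

struct conn {
	const struct conn_ops *ops;
	struct key *keys;	/* NULL in the common case: no keys configured */
};

static int lookup_calls;	/* counts indirect calls, for illustration only */

static struct key *demo_key_lookup(const struct conn *c)
{
	lookup_calls++;
	return c->keys;
}

static const struct conn_ops demo_ops = { .key_lookup = demo_key_lookup };

/* Check a plain pointer first so that connections without keys never
 * pay for the indirect call. */
static struct key *conn_get_key(const struct conn *c)
{
	struct key *k = NULL;

	if (__builtin_expect(c->keys != NULL, 0))
		k = c->ops->key_lookup(c);

	return k;
}
```

With retpolines enabled, the saving comes from skipping the indirect
branch entirely on the common path; the extra pointer test is a
well-predicted direct branch.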
[Weekly meetings] MoM - 13th of December 2018
by Matthieu Baerts
Hello,
We just had our 31st meeting with Mat, Peter and Ossama (Intel OTC),
Christoph (Apple), Florian (Red Hat) and myself (Tessares).
Thanks again for another good meeting!
Here are the minutes of the meeting:
New version of Mat and Peter's patch-set:
- for the review, we can focus on:
    - changes linked to the err queue:
        - number 16 →
          https://lists.01.org/pipermail/mptcp/2018-November/000868.html
    - maybe other ways to deal with coalesce / collapse:
        - number 13 →
          https://lists.01.org/pipermail/mptcp/2018-November/000865.html
        - note for number 13 from Christoph: what we want to do is
          close to what kTLS is doing; that could be "merged" into a
          generic function
        - note from Florian: tcp coalesce is maybe not considered
          part of the fast path as it should not normally happen →
          might be OK to add a generic way there
- next version will be on top of Florian's changes (skb extension:
  the API calls change but the rest should be the same)
New patch-set from Christoph:
- DSS checksum: not used in (big) deployments and not really working
  in some conditions → not a priority, better to remove it
    - thanks to that (not having the checksum), you could give data
      to userspace more quickly, since you don't need the full mapping
- for the review: nothing complex, it is more about agreeing on what
  to remove
- a new patch-set should come later to remove other things; it will be
  a bit more complex.
Gerrit:
- new wiki page available:
  https://github.com/multipath-tcp/mptcp_net-next/wiki/Gerrit
    - Why Gerrit?
    - Setup Gerrit
    - Getting the notifications
    - Typical use-cases / workflow
- example:
  https://review.gerrithub.io/c/multipath-tcp/mptcp_net-next/+/436375/1
- note:
    - the workflow is basically the same as before but instead of
      using 'git send-email', 'git review' will be used.
    - to download a patch-set in a new branch: `git review -d 436386`
Florian:
- had more time to work on his patch-set (skb extension). Tested
with ipset/bridge.
- hopes to send a new version tomorrow
Next meeting:
- We propose to have the next one on Thursday, the 13th of December.
- Usual time: 17:00 UTC (9am PST, 6pm CET)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20181213
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium