[RFC 0/1] Optimize mptcp_received_options
by Peter Krystad
This reduces the size from 104 to 32, at the cost of some loss of
clarity. I deliberately did not change the names of any fields,
reducing the impact on code (and the squasher).
This also seems like a good opportunity to remove all of the pr_debug()
statements from options processing which occupy so much of the log.
Peter Krystad (1):
mptcp: Optimize struct mptcp_received_options.
include/linux/tcp.h | 97 ++++++++++++++++++++++++----------------
net/mptcp/options.c | 102 ++++++++++++++++++++++++-------------------
net/mptcp/pm.c | 2 +-
net/mptcp/protocol.h | 2 +-
4 files changed, 118 insertions(+), 85 deletions(-)
--
2.17.2
2 years, 5 months
[PATCH 0/9] mptcp: support mptcp joins
by Florian Westphal
This set aims to improve handling of multiple subflows.
There are a few remaining problems, see below, but I wanted to send what
I have now rather than keep this back.
The first two patches are new, following few patches are a resend of the
earlier RFC series, I think they're good for acceptance now.
Patch 6 had a few changes since the RFC based on Paolos feedback, please
see patch description for details.
The last two patches are there to allow cancellation/teardown of
not-yet completed join requests. The connecting subflow flows
are placed on a 'join_list' so mptcp_shutdown can close these sockets
as well.
Last patch moves the 'eof hack' to a work queue so we don't set EOF on the
mptcp socket until all subflows have closed. Maybe this isn't needed, I
would be fine with dropping it in favor of mptcp-level DATA_FIN and
reset handling.
The rest of this work is to have msk->conn_list contain tcp_sock structs
without an outer 'struct socket'.
The problem is that for incoming join requests (which are not subject
to accept/request socket queue), there is no 'struct socket', so the kernel
will crash. Even when fixing those up, a NULL sk->sk_socket is problematic
as tcp stack will assume that such tcp sk is already detached from userspace,
i.e. NOSPACE wakeups do not work for them.
The solution here (suggested by Paolo) is to have sk->sk_socket refer
to the mptcp parent socket structure. This assignment is done by
a private function similar to sock_graft().
mptcp.org kernel will be able to establish multiple connections per
mptcp socket but does not consider those extra joins as established, as the
"4th ack" is missing.
A simple "tcp_send_ack" is enough, but there are two problems with this:
1. It can be lost
2. Given we are not expecting to receive any data on such flows this
doesn't seem to be a problem.
During testing I saw that sometimes mptcp.org kernel will emit further
join requests to mptcp-next but those trigger no response (and those
packets do not even show up in perfs network drop monitor, sigh).
I also saw a UAF at mptcp_close where a sock_hold() in tcp_close()
caused a 0->1 recount transition, but i've been unable to reproduce it so far
(might have been caused by a bug that has been fixed in the mean time).
The following changes since commit dd25ff04840135581b833762571aac08a2a94c6c:
sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-30 02:47:59 +0000)
are available in the Git repository at:
git://git.breakpoint.cc/fw/mptcp-next.git mptcp_multijoin_11
for you to fetch changes up to 010c2e19c2d643a87c991bf1e7d4bfc140dfdba5:
don't flag parent socket as RCV_SHUTDOWN if one one subflow has closed (2019-11-30 23:21:25 +0100)
----------------------------------------------------------------
Florian Westphal (9):
token: handle join-before-accept-completion case
mptcp: fix race from mptcp_close/mptcp join
copy connection id from first subflow to mptcp socket
store tcp_sk, not socket
add mptcp_subflow_shutdown
make accept not allocate kernel socket struct
pass subflow socket to mptcp_finish_connect
subflow: place further subflows on new 'join_list'
don't flag parent socket as RCV_SHUTDOWN if one one subflow has closed
net/mptcp/pm.c | 18 +--
net/mptcp/protocol.c | 313 +++++++++++++++++++++++++++++++++++++--------------
net/mptcp/protocol.h | 16 +--
net/mptcp/subflow.c | 56 +++++----
net/mptcp/token.c | 9 +-
5 files changed, 292 insertions(+), 120 deletions(-)
2 years, 5 months
[PATCH v3] mptcp: Basic single-subflow DATA_FIN
by Mat Martineau
Send a DATA_FIN along with any subflow TCP FIN flag when the MPTCP
socket is closing or shutting down writes.
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
---
The DATA_FIN ack patch (would have been a 2nd patch after this) isn't
quite working yet, but I wanted to share the sending patch since there
was a minor conflict after some recent squashing. I also confirmed that
this applies cleanly after the kselftest patch for "part 1", and that
the self tests pass.
Changes from v2:
* Rebased and confirmed selftests when cherry-picked to the end of the "part 1"
(single-subflow) chunk of the patch series.
Changes from v1:
* Fixed problem with the receive side truncating data. The issue happened
when a DSS mapping for DATA_FIN was found on a data segment, where the
data in the packet was already covered by an earlier mapping.
* Only send DATA_FIN when the subflow is sending a FIN and the MPTCP
socket is no longer in the TCP_ESTABLISHED state. This prevents sending
DATA_FIN when an individual subflow is removed but the MPTCP-level
connection is kept alive.
* Changes to warnings and comments suggested in code review.
net/mptcp/options.c | 38 ++++++++++++++++++++++++++++++++++++--
net/mptcp/protocol.c | 4 ++++
net/mptcp/subflow.c | 42 ++++++++++++++++++++++++++++--------------
3 files changed, 68 insertions(+), 16 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 644c30d46922..be180c4f71a4 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -286,19 +286,49 @@ static bool mptcp_established_options_mp(struct sock *sk, unsigned int *size,
return false;
}
+static void mptcp_write_data_fin(struct mptcp_subflow_context *subflow,
+ struct mptcp_ext *ext)
+{
+ ext->data_fin = 1;
+
+ if (!ext->use_map) {
+ /* RFC6824 requires a DSS mapping with specific values
+ * if DATA_FIN is set but no data payload is mapped
+ */
+ ext->use_map = 1;
+ ext->dsn64 = 1;
+ ext->data_seq = mptcp_sk(subflow->conn)->write_seq;
+ ext->subflow_seq = 0;
+ ext->data_len = 1;
+ } else {
+ /* If there's an existing DSS mapping, DATA_FIN consumes
+ * 1 additional byte of mapping space.
+ */
+ ext->data_len++;
+ }
+}
+
static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
unsigned int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
+ struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
unsigned int dss_size = 0;
struct mptcp_ext *mpext;
struct mptcp_sock *msk;
unsigned int ack_size;
+ u8 tcp_fin;
- mpext = skb ? mptcp_get_ext(skb) : NULL;
+ if (skb) {
+ mpext = mptcp_get_ext(skb);
+ tcp_fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
+ } else {
+ mpext = NULL;
+ tcp_fin = 0;
+ }
- if (!skb || (mpext && mpext->use_map)) {
+ if (!skb || (mpext && mpext->use_map) || tcp_fin) {
unsigned int map_size;
map_size = TCPOLEN_MPTCP_DSS_BASE + TCPOLEN_MPTCP_DSS_MAP64;
@@ -307,6 +337,10 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
dss_size = map_size;
if (mpext)
opts->ext_copy = *mpext;
+
+ if (skb && tcp_fin &&
+ subflow->conn->sk_state != TCP_ESTABLISHED)
+ mptcp_write_data_fin(subflow, &opts->ext_copy);
}
ack_size = TCPOLEN_MPTCP_DSS_ACK64;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9b384da910ce..f95a3f824a18 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -919,6 +919,10 @@ static int mptcp_shutdown(struct socket *sock, int how)
pr_debug("sk=%p, how=%d", msk, how);
lock_sock(sock->sk);
+
+ if (how == SHUT_WR || how == SHUT_RDWR)
+ inet_sk_state_store(sock->sk, TCP_FIN_WAIT1);
+
ssock = __mptcp_fallback_get_ref(msk);
if (ssock) {
release_sock(sock->sk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 998fc6066ea5..ee29281d4570 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -277,6 +277,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct mptcp_ext *mpext;
struct sk_buff *skb;
+ u16 data_len;
u64 map_seq;
skb = skb_peek(&ssk->sk_receive_queue);
@@ -301,25 +302,38 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
if (!subflow->map_valid)
return MAPPING_INVALID;
+
goto validate_seq;
}
- pr_debug("seq=%llu is64=%d ssn=%u data_len=%u", mpext->data_seq,
- mpext->dsn64, mpext->subflow_seq, mpext->data_len);
+ pr_debug("seq=%llu is64=%d ssn=%u data_len=%u data_fin=%d",
+ mpext->data_seq, mpext->dsn64, mpext->subflow_seq,
+ mpext->data_len, mpext->data_fin);
- if (mpext->data_len == 0) {
+ data_len = mpext->data_len;
+ if (data_len == 0) {
pr_err("Infinite mapping not handled");
return MAPPING_INVALID;
- } else if (mpext->subflow_seq == 0 &&
- mpext->data_fin == 1) {
- if (WARN_ON_ONCE(mpext->data_len != 1))
- return false;
+ }
- /* do not try hard to handle this any better, till we have
- * real data_fin support
- */
- pr_debug("DATA_FIN with no payload");
- return MAPPING_DATA_FIN;
+ if (mpext->data_fin == 1) {
+ if (data_len == 1) {
+ pr_debug("DATA_FIN with no payload");
+ if (subflow->map_valid) {
+ /* A DATA_FIN might arrive in a DSS
+ * option before the previous mapping
+ * has been fully consumed. Continue
+ * handling the existing mapping.
+ */
+ skb_ext_del(skb, SKB_EXT_MPTCP);
+ return MAPPING_OK;
+ } else {
+ return MAPPING_DATA_FIN;
+ }
+ }
+
+ /* Adjust for DATA_FIN using 1 byte of sequence space */
+ data_len--;
}
if (!mpext->dsn64) {
@@ -334,7 +348,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
/* Allow replacing only with an identical map */
if (subflow->map_seq == map_seq &&
subflow->map_subflow_seq == mpext->subflow_seq &&
- subflow->map_data_len == mpext->data_len) {
+ subflow->map_data_len == data_len) {
skb_ext_del(skb, SKB_EXT_MPTCP);
return MAPPING_OK;
}
@@ -351,7 +365,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
subflow->map_seq = map_seq;
subflow->map_subflow_seq = mpext->subflow_seq;
- subflow->map_data_len = mpext->data_len;
+ subflow->map_data_len = data_len;
subflow->map_valid = 1;
pr_debug("new map seq=%llu subflow_seq=%u data_len=%u",
subflow->map_seq, subflow->map_subflow_seq,
--
2.24.0
2 years, 5 months
[RFC PATCH 0/3] sendmsg: fix pending issues.
by Paolo Abeni
This series addresses the known pending issue in mptcp_sendmsg,
specifically:
* the crash repored by matt (patch 3/3)
* the possible data corruption on ext allocation failure - the "TODO"" in
sendmsg_frag (patches 1,2/3)
uunfortunately none of the above is actually tested, since I can't
reproduce the issues - still the problems look real.
This is not squashed yet, as it will conflict with most patches, I'm just
trying to collect feedback.
Open points:
The patch 2/3 introduces a per socket cache of a single skb_ext. As per
recent upstream changes related to TCP rx/tx skb cache, that is debatable,
could have subtle performance and memory usage implications.
We could avoid such cache allways allocating the extension for each
sendmsg_frag() call, and than free it if unused. Both options looks
sub-optiomal to me do you have any preference? - I do have some for the
selected impl ;)
The new API exposed by patch 1/3 is open to misusage. e.g. the caller could
allocate a skb_ext, set arbitrary ext->offset[id] values and than call
__skb_ext_add(). That will lead to memory leak or GPF. I did no try to
address the issue, the '__' prefix should guide the users to extra care.
Can we do any better?
(this is a pre-existing topic) the sk_stream_wait_memory() calls in
sk_stream_wait_memory(), waits on the subflow. Should we wait on the mptcp
socket instead? Otherwise should we release the mptcp socket lock, before such
call?
Paolo Abeni (3):
skb: add helpers to allocate ext independently from sk_buff
mptcp: avoid data stream corruption on allocation failure.
mptcp: properly detect skb collapsing
include/linux/skbuff.h | 3 ++
net/core/skbuff.c | 16 ++++++++-
net/mptcp/protocol.c | 79 ++++++++++++++++++++++++++----------------
net/mptcp/protocol.h | 1 +
4 files changed, 68 insertions(+), 31 deletions(-)
--
2.21.0
2 years, 5 months
Re: MPTCP implementation feedback for RFC6824bis
by Christoph Paasch
Hello Alan,
> On Nov 28, 2019, at 8:16 AM, Alan Ford <alan.ford(a)gmail.com> wrote:
>
> Hi Christoph,
>
> Thank you for the feedback. Comments inline...
>
>> On 27 Nov 2019, at 19:29, Christoph Paasch <cpaasch(a)apple.com <mailto:cpaasch@apple.com>> wrote:
>>
>> Hello,
>>
>> as I mentioned during the meeting last week in Singapore, the team working on upstreaming MPTCP into the Linux Kernel has some feedback on RFC6824bis that would be good to be factored into the RFC before publication.
>>
>> Here is the list of comments/changes the team is suggesting:
>>
>> Section 3.1, page 19, "If B has data to send first,..."
>>
>> The first phrase states that when A receives a DSS from B it indicates successful delivery of the MP_CAPABLE. However, it is not the DSS but rather the DATA_ACK that indicates this. Digging through some past e-mail exchanges I had with Alan I see that that was indeed the intention.
>>
>> We are thus suggesting to change this text to:
>> If B has data to send first, then the reliable delivery of the ACK+MP_CAPABLE can be inferred by the receipt of this data with a MPTCP Data Sequence Signal (DSS) option that includes a DATA_ACK (Section 3.3). Further, whenever A receives a DATA_ACK from B it is a signal of the reliable delivery of A's MP_CAPABLE. After this, A can send data to B with a regular DSS option.
>
> That seems entirely logical clarification, and was the intention anyway. A developer probably infers the meaning but clarification is no bad thing.
Sounds good! It would be good to clarify this.
>
>> Section 3.3.1, page 32 & 33, "A data sequence mapping does not need..."
>>
>> This paragraph states that it is permissive to send a mapping in advance. Late-mapping is specified a bit higher around the sentence
>> "Implementations MAY hold onto such unmapped data for a short while in the expectation that a mapping will arrive shortly"
>>
>> This kind of early/late mapping announcements are difficult to handle in an implementation. The Linux Kernel implementation of multipath-tcp.org <http://multipath-tcp.org/> has always disallowed such kind of mappings. Meaning, whenever a DSS-option is received such that the range specified by the relative subflow-sequence number in the DSS-option and the data-length does not (partially) cover the TCP sequence number of the packet itself, the subflow will be killed with a TCP-RST. The problem around handling such early/late mappings is that it is unclear for how long the stack needs to remember these mappings (in the early-mapping case), or for how long he needs to hold on to the data (in the late-mapping case).
>>
>> We thus suggest to change this to the following:
>> Whenever a DSS-option is received on a packet such that the mapping of the subflow-sequence space does not partially cover the TCP-sequence number of the packet itself, the host MUST discard this mapping and MAY destroy the subflow with a TCP-RST. It should be noted that a DATA_FIN that does not come with data has a relative subflow-sequence number of 0 and thus should be handled separately.
>
> This one I do have an issue with:
>
> - It is a technical change
> - Wording to this effect has been in the document since pretty much the beginning
> - It is a MAY which might as well say “there is no guarantee this would work”
The problem with the MAY is that the sender can't really know if the receiver accepts it (more regarding this below)
> Most importantly, the replacement text seems not to address this issue at all. If I read it correctly, it says that the data sequence mapping option MUST partially cover the subflow sequence space of the packet itself. But that has nothing to do with late or early mapping, both could partially cover the subflow sequence space and preceding data.
>
> Can you clarify exactly what you want to permit and prevent, here?
Let me try to clarify what exactly we mean with early/late mapping so that we are all on the same terms here:
Early mapping:
A TCP-segment with sequence-number 1 holds a DSS-option with subflow-sequence number 1001 and data-length 100. This means we need to allocate space to store this DSS-option so that when the TCP-segment with seqno 1001 arrives we can know the mapping. There may be coming more of these DSS-options which all need to be stored in allocated memory. It is unclear what the limit to this is and there is no way to communicate this limit to the sender.
Late mapping:
The receiver receives data without DSS-options with TCP-sequence 1 to 1001. The corresponding DSS-option however arrives with the TCP-segment with seqno 2001. Here, the receiver needs to hold on to this data, waiting for the TCP-segment with the DSS-option. At one point the receiver needs to drop the data due to memory limits. Again, the sender has no way for knowing what this limit is.
When the DSS-option comes together with the corresponding TCP-sequence it is straight-forward to store it together with the data. There are no issues with memory-allocation,... as all of this is accounted together with the announced window (yes, the memory is not counted against the window, but the receiver can foresee the DSS-option overhead when computing what window he should announce).
When a receiver gets data without a DSS-option, he can store it for up to 64KB of data as that is the maximum data-level length and the last segment of the 64KB-train could be holding the DSS-option. After that he has to drop the data.
When the mapping partially covers the segment it also isn't a problem as the unmapped part can safely be dropped and the mapped part can be passed on to the MPTCP-layer.
All of this does not imply that every segment of a mapping needs to hold the DSS-option. Just one of them needs to have it.
Does this help? It is all about making things more deterministic as it is very hard for the receiver to handle these late/early mappings.
Thanks,
Christoph
2 years, 5 months
MPTCP implementation feedback for RFC6824bis
by Christoph Paasch
Hello,
as I mentioned during the meeting last week in Singapore, the team working on upstreaming MPTCP into the Linux Kernel has some feedback on RFC6824bis that would be good to be factored into the RFC before publication.
Here is the list of comments/changes the team is suggesting:
Section 3.1, page 19, "If B has data to send first,..."
The first phrase states that when A receives a DSS from B it indicates successful delivery of the MP_CAPABLE. However, it is not the DSS but rather the DATA_ACK that indicates this. Digging through some past e-mail exchanges I had with Alan I see that that was indeed the intention.
We are thus suggesting to change this text to:
If B has data to send first, then the reliable delivery of the ACK+MP_CAPABLE can be inferred by the receipt of this data with a MPTCP Data Sequence Signal (DSS) option that includes a DATA_ACK (Section 3.3). Further, whenever A receives a DATA_ACK from B it is a signal of the reliable delivery of A's MP_CAPABLE. After this, A can send data to B with a regular DSS option.
Section 3.3.1, page 32 & 33, "A data sequence mapping does not need..."
This paragraph states that it is permissive to send a mapping in advance. Late-mapping is specified a bit higher around the sentence
"Implementations MAY hold onto such unmapped data for a short while in the expectation that a mapping will arrive shortly"
This kind of early/late mapping announcements are difficult to handle in an implementation. The Linux Kernel implementation of multipath-tcp.org <http://multipath-tcp.org/> has always disallowed such kind of mappings. Meaning, whenever a DSS-option is received such that the range specified by the relative subflow-sequence number in the DSS-option and the data-length does not (partially) cover the TCP sequence number of the packet itself, the subflow will be killed with a TCP-RST. The problem around handling such early/late mappings is that it is unclear for how long the stack needs to remember these mappings (in the early-mapping case), or for how long he needs to hold on to the data (in the late-mapping case).
We thus suggest to change this to the following:
Whenever a DSS-option is received on a packet such that the mapping of the subflow-sequence space does not partially cover the TCP-sequence number of the packet itself, the host MUST discard this mapping and MAY destroy the subflow with a TCP-RST. It should be noted that a DATA_FIN that does not come with data has a relative subflow-sequence number of 0 and thus should be handled separately.
Would it be possible to make these adjustments to the RFC?
Regards,
Christoph (on behalf of the MPTCP upstreaming community)
2 years, 5 months
[PATCH] selftests: remove pcap captures with 'make clean'
by Davide Caratti
use EXTRA_CLEAN variable to ensure that pcap captures are removed with
'make clean'
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
---
Notes:
squash-to: "mptcp: add basic kselftest for mptcp"
tools/testing/selftests/net/mptcp/Makefile | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index 0fc5d45055ee..93de52016dde 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -8,4 +8,6 @@ TEST_PROGS := mptcp_connect.sh
TEST_GEN_FILES = mptcp_connect
+EXTRA_CLEAN := *.pcap
+
include ../../lib.mk
--
2.23.0
2 years, 5 months
[PATCH] mptcp: fix option length of mp_capable syn/ack
by Davide Caratti
in current MPTCP, the server sends the client's key back in the syn/ack
packet. Because of that, both tcpdump and packetdrill refuse to parse
the mp_capable option in the syn/ack. Both RFC6824 and RFC6824bis don't
require this in section 3.1, only the server's key should be sent in
the syn/ack.
- change TCPOLEN_MPTCP_MPC_SYNACK from 20 (4+8+8) to 12 (4+8)
- don't write the client's key in mp_capable syn/ack
3-way handshakes obtained with
# ./mptcp_connect.sh -4 -c -e tx
[...]
# tcpdump -c3 -tnnr ns1-5de04ec0-TGs4tf-ns1-5de04ec0-TGs4tf-MPTCP-MPTCP-10.0.1.1.pcap tcp
unpatched kernel:
IP 10.0.1.1.49752 > 10.0.1.1.10000: Flags [S], seq 2594718977, win 65535, options [mss 65495,sackOK,TS val 3944347269 ecr 0,nop,wscale 8,mptcp capable {0xf931b304deb39a42}], length 0
IP 10.0.1.1.10000 > 10.0.1.1.49752: Flags [S.], seq 3996234497, ack 2594718978, win 65535, options [mss 65495,sackOK,TS val 3944347270 ecr 3944347269,nop,wscale 8,mptcp capable[bad opt]>
^^^ bad option
IP 10.0.1.1.49752 > 10.0.1.1.10000: Flags [.], ack 1, win 256, options [nop,nop,TS val 3944347270 ecr 3944347270,mptcp capable {0xf931b304deb39a42,0x96614f185e633ce}], length 0
patched kernel:
IP 10.0.1.1.50850 > 10.0.1.1.10000: Flags [S], seq 626898452, win 65495, options [mss 65495,sackOK,TS val 2266878231 ecr 0,nop,wscale 7,mptcp capable {0x7fe37bfd872b9f9f}], length 0
IP 10.0.1.1.10000 > 10.0.1.1.50850: Flags [S.], seq 3966041155, ack 626898453, win 65483, options [mss 65495,sackOK,TS val 2266878232 ecr 2266878231,nop,wscale 7,mptcp capable {0x8a167e3af39b5fc1}], length 0
^^^ no more bad option
IP 10.0.1.1.50850 > 10.0.1.1.10000: Flags [.], ack 1, win 512, options [nop,nop,TS val 2266878232 ecr 2266878232,mptcp capable {0x7fe37bfd872b9f9f,0x8a167e3af39b5fc1}], length 0
v2: more conservative behavior when parsing received options
Fixes: 8fa520f034ed ("mptcp: Handle MPTCP TCP options")
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
---
net/mptcp/options.c | 5 ++---
net/mptcp/protocol.h | 2 +-
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 98eb0281d196..cee5c3968741 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -57,7 +57,7 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
mp_opt->sndr_key = get_unaligned_be64(ptr);
ptr += 8;
- if (opsize == TCPOLEN_MPTCP_MPC_SYNACK) {
+ if (opsize == TCPOLEN_MPTCP_MPC_ACK) {
mp_opt->rcvr_key = get_unaligned_be64(ptr);
ptr += 8;
pr_debug("MP_CAPABLE flags=%x, sndr=%llu, rcvr=%llu",
@@ -650,8 +650,7 @@ void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts)
MPTCP_CAP_HMAC_SHA1);
put_unaligned_be64(opts->sndr_key, ptr);
ptr += 2;
- if ((OPTION_MPTCP_MPC_SYNACK |
- OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
+ if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
put_unaligned_be64(opts->rcvr_key, ptr);
ptr += 2;
}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index acad61c70de9..10347c9b4dff 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -34,7 +34,7 @@
/* MPTCP suboption lengths */
#define TCPOLEN_MPTCP_MPC_SYN 12
-#define TCPOLEN_MPTCP_MPC_SYNACK 20
+#define TCPOLEN_MPTCP_MPC_SYNACK 12
#define TCPOLEN_MPTCP_MPC_ACK 20
#define TCPOLEN_MPTCP_MPJ_SYN 12
#define TCPOLEN_MPTCP_MPJ_SYNACK 16
--
2.23.0
2 years, 5 months
selftests: no re-ordering makes tests very slow
by Matthieu Baerts
Hello,
When trying to reproduce the crash I had, I saw that most of the time
when there is no re-ordering set (which can happen even with the default
options) but with loss and delay, the tests can be very slow with the ns4:
ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP (duration 157ms) [ OK ]
ns1 MPTCP -> ns4 (10.0.3.1:10014 ) MPTCP (duration 33023ms) [ OK ]
ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP (duration 36485ms) [ OK ]
ns2 MPTCP -> ns1 (10.0.1.1:10016 ) MPTCP (duration 118ms) [ OK ]
(...)
ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP (duration 126ms) [ OK ]
ns2 MPTCP -> ns4 (10.0.3.1:10022 ) MPTCP (duration 15036ms) [ OK ]
ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP (duration 11507ms) [ OK ]
ns3 MPTCP -> ns1 (10.0.1.1:10024 ) MPTCP (duration 129ms) [ OK ]
(...)
ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP (duration 139ms) [ OK ]
ns3 MPTCP -> ns4 (10.0.3.1:10030 ) MPTCP (duration 4317ms) [ OK ]
ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP (duration 4321ms) [ OK ]
ns4 MPTCP -> ns1 (10.0.1.1:10032 ) MPTCP (duration 48538ms) [ OK ]
ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP (duration 39943ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.1.2:10034 ) MPTCP (duration 24599ms) [ OK ]
ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP (duration 23126ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.2.1:10036 ) MPTCP (duration 64512ms) [ OK ]
ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP (duration 37747ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.2.2:10038 ) MPTCP (duration 4766ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP (duration 4772ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10040 ) MPTCP (duration 4699ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP (duration 4324ms) [ OK ]
Time: 425 seconds
This can be easily reproduced by using:
./mptcp_connect.sh -r 0
When there is re-ordering, the tests are not ~95% of the time that slow.
I guess it depends how much re-ordering we have.
Did you also see this in your tests?
Cheers,
Matt
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
2 years, 5 months
[PATCH 0/2] Implement MPTCPv1 Handshake
by Christoph Paasch
Implement the MP_CAPABLE exchange from RFC6824bis, allowing for MPTCP to
work reliably with stateless servers (aka., those using TCP
SYN-cookies). The trick is to reliably transmit the keys from the client
to the server by combining them with data. As the data will be
retransmitted, the keys will reach the server as well.
Christoph Paasch (2):
mptcp: parse and emit MP_CAPABLE option according to v1 spec.
mptcp: process MP_CAPABLE data option.
include/linux/tcp.h | 3 +-
include/net/mptcp.h | 11 +-
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_output.c | 2 +-
net/mptcp/options.c | 246 ++++++++++++++++++++++++++++++++----------
net/mptcp/protocol.c | 24 +++--
net/mptcp/protocol.h | 14 ++-
net/mptcp/subflow.c | 42 +++++++-
8 files changed, 261 insertions(+), 83 deletions(-)
--
2.23.0
2 years, 5 months