On 29/11/17 - 12:50:24, Christoph Paasch wrote:
On 29/11/17 - 12:19:32, Mat Martineau wrote:
> On Wed, 29 Nov 2017, Christoph Paasch wrote:
>
> > On 28/11/17 - 17:58:52, Mat Martineau wrote:
> > >
> > > On Tue, 28 Nov 2017, Mat Martineau wrote:
> > >
> > > >
> > > > On Mon, 27 Nov 2017, Christoph Paasch wrote:
> > > >
> > > > > Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
> > > > > ---
> > > > > net/ipv4/tcp_input.c | 8 +++++---
> > > > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > index 4e84f299c96f..7acc1f46641c 100644
> > > > > --- a/net/ipv4/tcp_input.c
> > > > > +++ b/net/ipv4/tcp_input.c
> > > > > @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net
*net,
> > > > > break;
> > > > >
> > > > > default:
> > > > > - tcp_extra_options_parse(opcode, opsize, ptr,
> > > > > - skb, opt_rx,
> > > > > - tcp_to_sk(tp));
> > > > > + if
(static_branch_unlikely(&tcp_extra_options_enabled))
> > > >
> > > > Since this case in the switch statement is only executed for
> > > > unrecognized options, I had skipped the static branch in the initial
> > > > implementation. It is more consistent to use the static branch, and
> > > > there isn't much use in making the function call just to try to
iterate
> > > > over an empty list.
> > > >
> > > > > + tcp_extra_options_parse(opcode, opsize,
> > > > > + ptr, skb,
> > > > > + opt_rx,
> > > > > + tcp_to_sk(tp));
> > > > > break;
> > > > >
> > > > > }
> > > > > --
> > > > > 2.15.0
> > > > >
> > > > >
> > >
> > > I sent my reply and then realized I had one more point to make. We might
> > > need to take another look at the use of static branching by the extra
> > > options framework. It's one thing to inc/dec tcp_extra_options_enabled
when
> > > modules are added or removed, but we don't want to be rapidly
changing
> > > static branches as MPTCP sockets come and go. Does
tcp_extra_options_enabled
> > > still make sense for per-socket extra_option lists?
> >
> > You think that the test for whether or not the tcp_option_list is emtpy is
> > sufficient?
>
> It might be if we wrap it in a "unlikely()".
>
> >
> > I don't know about the cost of static_branch_inc/dec. Is it that bad?
> >
>
> Yeah, it's that bad. The kernel is patching machine code every time the key
> is enabled or disabled (rewriting no-ops with jump instructions and
> vice-versa).
>
> """
> * Since this relies on modifying code, the branch modifying functions
> * must be considered absolute slow paths (machine wide synchronization etc.).
> """
>
>
http://elixir.free-electrons.com/linux/v4.15-rc1/source/include/linux/jum...
>
> Not just a slow path, but an *absolute* slow path!
>
> static_key_deferred and static_key_slow_dec_deferred() are available in
> jump_label_ratelimit.h, which is one way to keep it from thrashing too much.
> This rate-limited static key is only used in one file
> (arch/x86/kvm/lapic.c). That subsystem limits jump label patching to once
> per second.
I see.
>
> >
> > One thing is that having the static branches allows for sure to avoid any
> > cost in the hot data-path. Which is definitely something netdev will want.
> >
> >
> > If we pay a cost for MPTCP, I think it's fine IMO. (at least in a first
step ;))
>
> It would be more straightforward if only MPTCP was paying the cost, but in
> this case the hit is systemwide if any extra TCP options are in use. The
> combination of a rate-limiting static_key and checking the option list
> before the call could be fast enough.
If we have rate-limited static keys, do we still need to check the
option-list before the call?
Also, what if we rather inc/dec the static key more rarely. E.g., not for
every new connection but rather for the very first one and then dec again at
the very last one.
I'm trying to cook a patch...
This is what I have in mind:
-----
commit ba22182b0b3100539e545313e0eb12a9c466c1fe
Author: Christoph Paasch <cpaasch(a)apple.com>
Date: Wed Nov 29 13:11:44 2017 -0800
tcp_extra_options: Reduce static_branch_inc/dec through a refcnt
Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6db92bbbf3e2..344a0d859b37 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2008,6 +2008,7 @@ struct tcp_extra_option_ops {
struct tcp_extra_option_store *(*move)(struct tcp_extra_option_store *from);
void (*destroy)(struct tcp_extra_option_store *store);
struct module *owner;
+ atomic_t *refcnt;
};
/* Every extra-option in a TCP-socket gets stored in a _store structure which
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2dcb12c330eb..3067e7b5ad00 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3570,7 +3570,8 @@ int tcp_register_extra_option(struct tcp_extra_option_store *store,
list_add_tail(&store->list, add_before);
pr_debug("Option kind %u registered\n", store->ops->option_kind);
- static_branch_inc(&tcp_extra_options_enabled);
+ if (atomic_inc_return(store->ops->refcnt) == 1)
+ static_branch_inc(&tcp_extra_options_enabled);
return ret;
}
@@ -3602,7 +3603,8 @@ void tcp_extra_options_copy(struct sock *listener,
list_add_tail(&new->list, to);
- static_branch_inc(&tcp_extra_options_enabled);
+ if (atomic_inc_return(new->ops->refcnt) == 1)
+ static_branch_inc(&tcp_extra_options_enabled);
}
}
@@ -3632,11 +3634,16 @@ void tcp_extra_options_destroy(struct sock *sk)
list_for_each_entry_safe(entry, tmp, lhead, list) {
struct module *owner = entry->ops->owner;
+ bool dec_static = false;
list_del(&entry->list);
+ if (atomic_dec_return(entry->ops->refcnt) == 0)
+ dec_static = true;
+
entry->ops->destroy(entry);
- static_branch_dec(&tcp_extra_options_enabled);
+ if (dec_static)
+ static_branch_dec(&tcp_extra_options_enabled);
module_put(owner);
}