Re: [LKP] rcu_read_lock lost its compiler barrier
by Alan Stern
On Sat, 8 Jun 2019, Paul E. McKenney wrote:
> On Thu, Jun 06, 2019 at 10:19:43AM -0400, Alan Stern wrote:
> > On Thu, 6 Jun 2019, Andrea Parri wrote:
> >
> > > This seems a sensible change to me: looking forward to seeing a patch,
> > > on top of -rcu/dev, for further review and testing!
> > >
> > > We could also add (to LKMM) the barrier() for rcu_read_{lock,unlock}()
> > > discussed in this thread (maybe once the RCU code and the informal doc
> > > will have settled in such direction).
> >
> > Yes. Also for SRCU. That point had not escaped me.
>
> And it does seem pretty settled. There are quite a few examples where
> there are normal accesses at either end of the RCU read-side critical
> sections, for example, the one in the requirements diffs below.
>
> For SRCU, srcu_read_lock() and srcu_read_unlock() have implied compiler
> barriers since 2006. ;-)
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
> index 5a9238a2883c..080b39cc1dbb 100644
> --- a/Documentation/RCU/Design/Requirements/Requirements.html
> +++ b/Documentation/RCU/Design/Requirements/Requirements.html
> @@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
> <li> <a href="#Hotplug CPU">Hotplug CPU</a>.
> <li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
> <li> <a href="#Tracing and RCU">Tracing and RCU</a>.
> +<li> <a href="#Accesses to User Mamory and RCU">
------------------------------------^
> +Accesses to User Mamory and RCU</a>.
---------------------^
> <li> <a href="#Energy Efficiency">Energy Efficiency</a>.
> <li> <a href="#Scheduling-Clock Interrupts and RCU">
> Scheduling-Clock Interrupts and RCU</a>.
> @@ -2521,6 +2523,75 @@ cannot be used.
> The tracing folks both located the requirement and provided the
> needed fix, so this surprise requirement was relatively painless.
>
> +<h3><a name="Accesses to User Mamory and RCU">
----------------------------------^
> +Accesses to User Mamory and RCU</a></h3>
---------------------^
Are these issues especially notable for female programmers? :-)
Alan
1 year, 7 months
Re: [LKP] CKI hackfest @Plumbers invite
by Dmitry Vyukov
On Thu, Jun 6, 2019 at 12:00 AM Shuah Khan <shuahkhan(a)gmail.com> wrote:
>
> Hi Veronika,
>
> On Wed, Jun 5, 2019 at 2:47 PM Dan Rue <dan.rue(a)linaro.org> wrote:
> >
> > On Tue, May 21, 2019 at 10:54:12AM -0400, Veronika Kabatova wrote:
> > > Hi,
> > >
> > > as some of you have heard, CKI Project is planning hackfest CI meetings after
> > > Plumbers conference this year (Sept. 12-13). We would like to invite everyone
> > > who has interest in CI for kernel to come and join us.
> > >
> > > The early agenda with summary is at the end of the email. If you think there's
> > > something important missing let us know! Also let us know in case you'd want to
> > > lead any of the sessions, we'd be happy to delegate out some work :)
> > >
> > >
> > > Please send us an email as soon as you decide to come and feel free to invite
> > > other people who should be present. We are not planning to cap the attendance
> > > right now but need to solve the logistics based on the interest. The event is
> > > free to attend, no additional registration except letting us know is needed.
> > >
>
> I am going to be there and plan to attend.
>
> > > Feel free to contact us if you have any questions,
> > > Veronika
> > > CKI Project
> >
> > Hi Veronika! Thanks for organizing this. I plan to attend, and I'm happy
> > to help out.
> >
> > With regard to the agenda, I've been following the '[Ksummit-discuss]
> > [MAINTAINERS SUMMIT] Squashing bugs!'[1] thread with interest, as it
> > relates especially to 'Getting results to developers/maintainers'. This,
> > along with result aggregation, are important areas to focus on.
> >
> >
> > [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-May/0063...
> >
>
> Good to know there is an overlap and it makes sense for me to attend. :)
Hi Shuah,
Oh, and I did not even know about
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-May/0063...
How can I be kept in the loop/provide inputs/receive
feedback/discussion summary?
Thanks
Re: [LKP] rcu_read_lock lost its compiler barrier
by Alan Stern
On Mon, 3 Jun 2019, Paul E. McKenney wrote:
> On Mon, Jun 03, 2019 at 02:42:00PM +0800, Boqun Feng wrote:
> > On Mon, Jun 03, 2019 at 01:26:26PM +0800, Herbert Xu wrote:
> > > On Sun, Jun 02, 2019 at 08:47:07PM -0700, Paul E. McKenney wrote:
> > > >
> > > > 1. These guarantees are of full memory barriers, -not- compiler
> > > > barriers.
> > >
> > > What I'm saying is that wherever they are, they must come with
> > > compiler barriers. I'm not aware of any synchronisation mechanism
> > > in the kernel that gives a memory barrier without a compiler barrier.
> > >
> > > > 2. These rules don't say exactly where these full memory barriers
> > > > go. SRCU is at one extreme, placing those full barriers in
> > > > srcu_read_lock() and srcu_read_unlock(), and !PREEMPT Tree RCU
> > > > at the other, placing these barriers entirely within the callback
> > > > queueing/invocation, grace-period computation, and the scheduler.
> > > > Preemptible Tree RCU is in the middle, with rcu_read_unlock()
> > > > sometimes including a full memory barrier, but other times with
> > > > the full memory barrier being confined as it is with !PREEMPT
> > > > Tree RCU.
> > >
> > > The rules do say that the (full) memory barrier must precede any
> > > RCU read-side critical section that begins after the synchronize_rcu
> > > and follow the end of any that began before the synchronize_rcu.
> > >
> > > All I'm arguing is that wherever that full mb is, as long as it
> > > also carries with it a barrier() (which it must do if it's done
> > > using an existing kernel mb/locking primitive), then we're fine.
> > >
> > > > Interleaving and inserting full memory barriers as per the rules above:
> > > >
> > > > CPU1: WRITE_ONCE(a, 1)
> > > > CPU1: synchronize_rcu
> > > > /* Could put a full memory barrier here, but it wouldn't help. */
> > >
> > > CPU1: smp_mb();
> > > CPU2: smp_mb();
> > >
> > > Let's put them in because I think they are critical. smp_mb() also
> > > carries with it a barrier().
> > >
> > > > CPU2: rcu_read_lock();
> > > > CPU1: b = 2;
> > > > CPU2: if (READ_ONCE(a) == 0)
> > > > CPU2: if (b != 1) /* Weakly ordered CPU moved this up! */
> > > > CPU2: b = 1;
> > > > CPU2: rcu_read_unlock
> > > >
> > > > In fact, CPU2's load from b might be moved up to race with CPU1's store,
> > > > which (I believe) is why the model complains in this case.
> > >
> > > Let's put aside my doubt over how we're even allowing a compiler
> > > to turn
> > >
> > > b = 1
> > >
> > > into
> > >
> > > if (b != 1)
> > > b = 1
Even if you don't think the compiler will ever do this, the C standard
gives compilers the right to invent read accesses if a plain (i.e.,
non-atomic and non-volatile) write is present. The Linux Kernel Memory
Model has to assume that compilers will sometimes do this, even if it
doesn't take the exact form of checking a variable's value before
writing to it.
(Incidentally, regardless of whether the compiler will ever do this, I
have seen examples in the kernel where people did exactly this
manually, in order to avoid dirtying a cache line unnecessarily.)
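The manual form of this pattern can be sketched in plain C as below. The
helper name store_if_changed() is hypothetical (it is not a kernel API);
it only shows the read-then-conditional-write shape that, as noted above,
a compiler is also entitled to produce from a plain store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API: store new_val only if it differs
 * from the current contents, so that an unchanged value never dirties the
 * cache line.  Returns true if a store was actually issued.
 */
static bool store_if_changed(int *p, int new_val)
{
	assert(p != NULL);
	if (*p != new_val) {	/* plain read ahead of the write */
		*p = new_val;	/* write only when the value changes */
		return true;
	}
	return false;
}
```

Calling store_if_changed(&b, 1) twice in a row stores on the first call
and only reads on the second. The read is what can race with a concurrent
plain write to the same variable.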
> > > Since you seem to be assuming that (a == 0) is true in this case
> >
> > I think Paul's example assumes (a == 0) is false, and maybe
>
> Yes, otherwise, P0()'s write to "b" cannot have happened.
>
> > speculative writes (by compilers) need to be taken into consideration?
On the other hand, the C standard does not allow compilers to add
speculative writes. The LKMM assumes they will never occur.
> I would instead call it the compiler eliminating needless writes
> by inventing reads -- if the variable already has the correct value,
> no write happens. So no compiler speculation.
>
> However, it is difficult to create a solid defensible example. Yes,
> from LKMM's viewpoint, the weakly reordered invented read from "b"
> can be concurrent with P0()'s write to "b", but in that case the value
> loaded would have to manage to be equal to 1 for anything bad to happen.
> This does feel wrong to me, but again, it is difficult to create a solid
> defensible example.
>
> > Please consider the following case (I added a few smp_mb()s). The case
> > may be a little bit crazy, you have been warned ;-)
> >
> > CPU1: WRITE_ONCE(a, 1)
> > CPU1: synchronize_rcu called
> >
> > CPU1: smp_mb(); /* let's assume there is one here */
> >
> > CPU2: rcu_read_lock();
> > CPU2: smp_mb(); /* let's assume there is one here */
> >
> > /* "if (b != 1) b = 1" reordered */
> > CPU2: r0 = b; /* if (b != 1) reordered here, r0 == 0 */
> > CPU2: if (r0 != 1) /* true */
> > CPU2: b = 1; /* b == 1 now, this is a speculative write
> > by compiler
> > */
> >
> > CPU1: b = 2; /* b == 2 */
> >
> > CPU2: if (READ_ONCE(a) == 0) /* false */
> > CPU2: ...
> > CPU2: else /* undo the speculative write */
> > CPU2: b = r0; /* b == 0 */
> >
> > CPU2: smp_mb();
> > CPU2: rcu_read_unlock();
> >
> > I know it is too crazy to imagine a compiler like this, but this
> > might be the reason why the model complains about this.
> >
> > Paul, did I get this right? Or you mean something else?
>
> Mostly there, except that I am not yet desperate enough to appeal to
> compilers speculating stores. ;-)
This example really does point out a weakness in the LKMM's handling of
data races. Herbert's litmus test is a great starting point:
C xu

{}

P0(int *a, int *b)
{
	WRITE_ONCE(*a, 1);
	synchronize_rcu();
	*b = 2;
}

P1(int *a, int *b)
{
	rcu_read_lock();
	if (READ_ONCE(*a) == 0)
		*b = 1;
	rcu_read_unlock();
}

exists (~b=2)
Currently the LKMM says the test is allowed and there is a data race,
but this answer clearly is wrong since it would violate the RCU
guarantee.
The problem is that the LKMM currently requires all ordering/visibility
of plain accesses to be mediated by marked accesses. But in this case,
the visibility is mediated by RCU. Technically, we need to add a
relation like
([M] ; po ; rcu-fence ; po ; [M])
into the definitions of ww-vis, wr-vis, and rw-xbstar. Doing so
changes the litmus test's result to "not allowed" and no data race.
However, I'm not certain that this single change is the entire fix;
more thought is needed.
Alan
Re: [LKP] rcu_read_lock lost its compiler barrier
by Alan Stern
On Thu, 6 Jun 2019, Andrea Parri wrote:
> This seems a sensible change to me: looking forward to seeing a patch,
> on top of -rcu/dev, for further review and testing!
>
> We could also add (to LKMM) the barrier() for rcu_read_{lock,unlock}()
> discussed in this thread (maybe once the RCU code and the informal doc
> will have settled in such direction).
Yes. Also for SRCU. That point had not escaped me.
Alan
Re: [LKP] rcu_read_lock lost its compiler barrier
by Herbert Xu
On Thu, Jun 06, 2019 at 03:58:17AM -0700, Paul E. McKenney wrote:
>
> I cannot immediately think of a way that the compiler could get this
> wrong even in theory, but similar code sequences can be messed up.
> The reason for this is that in theory, the compiler could use the
> stored-to location as temporary storage, like this:
>
> a = whatever; // Compiler uses "a" as a temporary
> do_something();
> whatever = a;
> a = 1; // Intended store
Well if the compiler is going to do this then surely it would
continue to do this even if you used WRITE_ONCE. Remember a is
not volatile, only the access of a through WRITE_ONCE is volatile.
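The distinction between a volatile object and a volatile access can be
sketched in userspace C as follows. The two macros are simplified
stand-ins written for this illustration (the kernel's real definitions do
more), and they assume the GNU C __typeof__ extension.

```c
#include <assert.h>

/*
 * Simplified stand-ins for illustration only; the kernel's real macros
 * do more.  Both rely on the GNU C __typeof__ extension.
 */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

static int a;	/* the object itself is NOT declared volatile */

static int demo(void)
{
	a = 5;			/* plain store: no volatile semantics */
	WRITE_ONCE(a, 1);	/* only this one access is volatile */
	assert(READ_ONCE(a) == 1);
	return READ_ONCE(a);
}
```

Only the accesses made through the casts are volatile; every other use of
"a" remains a plain access as far as the compiler is concerned.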
Cheers,
--
Email: Herbert Xu <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
CKI hackfest @Plumbers invite
by Veronika Kabatova
Hi,
as some of you have heard, CKI Project is planning hackfest CI meetings after
Plumbers conference this year (Sept. 12-13). We would like to invite everyone
who has interest in CI for kernel to come and join us.
The early agenda with summary is at the end of the email. If you think there's
something important missing let us know! Also let us know in case you'd want to
lead any of the sessions, we'd be happy to delegate out some work :)
Please send us an email as soon as you decide to come and feel free to invite
other people who should be present. We are not planning to cap the attendance
right now but need to solve the logistics based on the interest. The event is
free to attend, no additional registration except letting us know is needed.
Feel free to contact us if you have any questions,
Veronika
CKI Project
-----------------------------------------------------------
Here is an early agenda we put together:
- Introductions
- Common place for upstream results, result publishing in general
- The discussion on the mailing list is going strong so we might be able to
substitute this session for a different one in case everything is solved by
September.
- Test result interpretation and bug detection
- How to autodetect infrastructure failures, regressions/new bugs and test
bugs? How to handle continuous failures due to known bugs in both tests and
kernel? What's your solution? Can people always trust the results they
receive?
- Getting results to developers/maintainers
- Aimed at kernel developers and maintainers, share your feedback and
expectations.
- How much data should be sent in the initial communication vs. a click away
in a dashboard? Do you want incremental emails with new results as they come
in?
- What about adding checks to tested patches in Patchwork when patch series
are being tested?
- Providing enough data/script to reproduce the failure. What if special HW
is needed?
- Onboarding new kernel trees to test
- Aimed at kernel developers and maintainers.
- Which trees are most prone to bring in new problems? Which are the most
critical ones? Do you want them to be tested? Which tests do you feel are
most beneficial for specific trees or in general?
- Security when testing untrusted patches
- How do we merge, compile, and test patches that have untrusted code in them
and have not yet been reviewed? How do we avoid abuse of systems,
information theft, or other damage?
- Check out the original patch that sparked the discussion at
https://patchwork.ozlabs.org/patch/862123/
- Avoiding effort duplication
- Food for thought by GregKH
- X different CI systems running ${TEST} on latest stable kernel on x86_64
might look useless on the first look but is it? AMD/Intel CPUs, different
network cards, different graphic drivers, compilers, kernel configuration...
How do we distribute the workload to avoid doing the same thing all over
again while still running in enough different environments to get the most
coverage?
- Common hardware pools
- Is this something people are interested in? Would be helpful especially for
HW that's hard to access, e.g. ppc64le or s390x systems. Companies could also
sign up to share their HW for testing to ensure kernel works with their
products.
Re: [LKP] CKI hackfest @Plumbers invite
by Veronika Kabatova
Added you both to the list :)
----- Original Message -----
> From: "Shuah Khan" <shuahkhan(a)gmail.com>
> To: "Veronika Kabatova" <vkabatov(a)redhat.com>, automated-testing(a)yoctoproject.org, info(a)kernelci.org, "Tim Bird"
> <Tim.Bird(a)sony.com>, khilamn(a)baylibre.org, syzkaller(a)googlegroups.com, lkp(a)lists.01.org, "stable"
> <stable(a)vger.kernel.org>, "Laura Abbott" <labbott(a)redhat.com>, "Eliska Slobodova" <eslobodo(a)redhat.com>, "CKI
> Project" <cki-project(a)redhat.com>
> Sent: Thursday, June 6, 2019 12:00:13 AM
> Subject: Re: CKI hackfest @Plumbers invite
>
> Hi Veronika,
>
> On Wed, Jun 5, 2019 at 2:47 PM Dan Rue <dan.rue(a)linaro.org> wrote:
> >
> > On Tue, May 21, 2019 at 10:54:12AM -0400, Veronika Kabatova wrote:
> > > Hi,
> > >
> > > as some of you have heard, CKI Project is planning hackfest CI meetings
> > > after
> > > Plumbers conference this year (Sept. 12-13). We would like to invite
> > > everyone
> > > who has interest in CI for kernel to come and join us.
> > >
> > > The early agenda with summary is at the end of the email. If you think
> > > there's
> > > something important missing let us know! Also let us know in case you'd
> > > want to
> > > lead any of the sessions, we'd be happy to delegate out some work :)
> > >
> > >
> > > Please send us an email as soon as you decide to come and feel free to
> > > invite
> > > other people who should be present. We are not planning to cap the
> > > attendance
> > > right now but need to solve the logistics based on the interest. The
> > > event is
> > > free to attend, no additional registration except letting us know is
> > > needed.
> > >
>
> I am going to be there and plan to attend.
>
> > > Feel free to contact us if you have any questions,
> > > Veronika
> > > CKI Project
> >
> > Hi Veronika! Thanks for organizing this. I plan to attend, and I'm happy
> > to help out.
> >
> > With regard to the agenda, I've been following the '[Ksummit-discuss]
> > [MAINTAINERS SUMMIT] Squashing bugs!'[1] thread with interest, as it
> > relates especially to 'Getting results to developers/maintainers'. This,
> > along with result aggregation, are important areas to focus on.
> >
> >
> > [1]
> > https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-May/0063...
> >
>
> Good to know there is an overlap and it makes sense for me to attend. :)
>
I've been pointed to this thread just yesterday (thanks Laura!) and I agree
you bring up interesting topics in there. In fact, the "Getting results out"
topic Dan mentioned has the reproducibility of the failures as one of the
agenda items.
There definitely *is* an overlap in some of the topics and we'd be excited
to have you both there to talk more!
Veronika
> thanks,
> -- Shuah
>
Re: [LKP] rcu_read_lock lost its compiler barrier
by Herbert Xu
On Thu, Jun 06, 2019 at 10:38:56AM +0200, Andrea Parri wrote:
> On Mon, Jun 03, 2019 at 10:46:40AM +0800, Herbert Xu wrote:
>
> > The case we were discussing is from net/ipv4/inet_fragment.c from
> > the net-next tree:
>
> BTW, thank you for keeping me and other people who intervened in that
> discussion in Cc:...
FWIW I didn't drop you from the Cc list. The email discussion was
taken off-list by someone else and I simply kept that Cc list when
I brought it back onto lkml. On a second look I did end up dropping
Eric but I think he's had enough of this discussion :)
--
Email: Herbert Xu <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [LKP] rcu_read_lock lost its compiler barrier
by Herbert Xu
On Thu, Jun 06, 2019 at 02:06:19AM -0700, Paul E. McKenney wrote:
>
> Or is your point instead that given the initial value of "a" being
> zero and the value stored to "a" being one, there is no way that
> any possible load and store tearing (your slicing and dicing) could
> possibly mess up the test of the value loaded from "a"?
Exactly. If you can dream up a scenario where the compiler can
get this wrong I'm all ears.
> > But I do concede that in the general RCU case you must have the
> > READ_ONCE/WRITE_ONCE calls for rcu_dereference/rcu_assign_pointer.
>
> OK, good that we are in agreement on this part, at least! ;-)
Well only because we're allowing crazy compilers that can turn
a simple word-aligned word assignment (a = b) into two stores.
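A minimal model of such a two-store split, assuming a 32-bit word torn
into 16-bit halves, is sketched below. The function names are
hypothetical and the code only computes what a concurrent reader could
observe between the two halves. It does not reproduce real compiler
output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a torn store: the value a concurrent reader could
 * observe after only the low 16 bits of a 32-bit store have landed.
 */
static uint32_t after_low_half(uint32_t old_val, uint32_t new_val)
{
	return (old_val & 0xffff0000u) | (new_val & 0x0000ffffu);
}

/* Same model, but with the high 16 bits landing first. */
static uint32_t after_high_half(uint32_t old_val, uint32_t new_val)
{
	return (new_val & 0xffff0000u) | (old_val & 0x0000ffffu);
}

static void check_zero_to_one(void)
{
	/* 0 -> 1: each intermediate value is either the old or the new one. */
	assert(after_low_half(0, 1) == 1);
	assert(after_high_half(0, 1) == 0);
}
```

For the 0 to 1 transition discussed earlier in this message, neither
ordering yields a third value, which matches the point about the test of
"a". For arbitrary values such as pointers, the intermediate word can be
a mix of old and new halves, which is the general RCU case conceded
above.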
Cheers,
--
Email: Herbert Xu <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [LKP] rcu_read_lock lost its compiler barrier
by Herbert Xu
On Wed, Jun 05, 2019 at 11:05:11PM -0700, Paul E. McKenney wrote:
>
> In case you were wondering, the reason that I was giving you such
> a hard time was that from what I could see, you were pushing for no
> {READ,WRITE}_ONCE() at all. ;-)
Hmm, that's exactly what it should be in net/ipv4/inet_fragment.c.
We don't need the READ_ONCE/WRITE_ONCE (or volatile marking) at
all. Even if the compiler dices and slices the reads/writes of
"a" into a thousand pieces, it should still work if the RCU
primitives are worth their salt.
But I do concede that in the general RCU case you must have the
READ_ONCE/WRITE_ONCE calls for rcu_dereference/rcu_assign_pointer.
Cheers,
--
Email: Herbert Xu <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt