On 21/04/21 11:20, Oliver Sang wrote:
> hi, Valentin Schneider,
>
> On Wed, Apr 14, 2021 at 06:17:38PM +0100, Valentin Schneider wrote:
> > On 14/04/21 13:21, kernel test robot wrote:
> > > Greeting,
> > >
> > > FYI, we noticed a -13.8% regression of stress-ng.vm-segv.ops_per_sec due to
> > > commit: 38ac256d1c3e6b5155071ed7ba87db50a40a4b58 ("[PATCH v5 1/3] sched/fair: Ignore percpu threads for imbalance pulls")
> > > url:
> > > base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git
> > >
> > > in testcase: stress-ng
> > > on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
> > > with following parameters:
> > >
> > >	nr_threads: 10%
> > >	disk: 1HDD
> > >	testtime: 60s
> > >	fs: ext4
> > >	class: os
> > >	test: vm-segv
> > >	cpufreq_governor: performance
> > >	ucode: 0x5003006
> > That's almost exactly the same result as [1], which is somewhat annoying
> > for me because I wasn't able to reproduce those results back then. Short
> > of scrounging the exact same machine to try this out, I'm not sure what's
> > the best way forward. I guess I can re-run the workload on whatever
> > machines I have and try to spot any potentially problematic pattern in the
> > perf data.
> what's the model of the machine on which the regression could not be reproduced?
> we could check whether we have a similar model and then re-check on our machine.

I tested this on:
o Ampere eMAG (arm64, 32 cores)
o 2-socket Xeon E5-2690 (x86, 40 cores)
and found at worst a -0.3% regression and at best a 2% improvement. I know
that x86 box is somewhat ancient, but it's been my go-to "have I broken
x86?" test victim for a while :-)
> BTW, we supplied perf data in the original report, not sure if they are
> helpful, or do you have suggestions on which kind of data would be more
> helpful to you? we will continuously improve our report based on
> suggestions from the community. Thanks a lot!

Staring at it some more, I notice a huge uptick in:
- major page faults (+315.2% and +270%)
- cache misses (+125.2% and +131.0%)
I don't really get the page faults; the cache misses I could somewhat
understand: this is adding p->flags and (p->set_child_tid)->flags accesses,
which are in different cachelines than the p->se and p->cpus_mask fields
used in the load-balancing path.
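
(For reference, and paraphrasing from memory of the patch rather than
quoting the tree verbatim, the new check in the load-balance path boils
down to something like the below; at the time the struct kthread pointer
lived behind p->set_child_tid, which is where the second access comes
from.)

	/* in can_migrate_task(): pcpu kthreads are where they need to be */
	if (kthread_is_per_cpu(p))
		return 0;

	/* ...and kthread_is_per_cpu() ends up touching two task_struct fields: */
	bool kthread_is_per_cpu(struct task_struct *p)
	{
		struct kthread *kthread;

		if (!(p->flags & PF_KTHREAD))		/* access #1: p->flags */
			return false;

		kthread = (void *)p->set_child_tid;	/* access #2: p->set_child_tid */
		if (!kthread)
			return false;

		return test_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
	}
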
I think I could dig some more into this with perf, but I'd need to be able
to reproduce this locally first...
> > [1]: http://lore.kernel.org/r/20210223023004.GB25487@xsang-OptiPlex-9020