On Mon, Aug 31, 2020 at 10:16:38AM +0800, Feng Tang wrote:
> So why don't you define both variables with
> check if all your bad measurements go away this way?
For 'arch_freq_scale', there are other percpu variables in the same
smpboot.c: 'arch_prev_aperf' and 'arch_prev_mperf', and in hot path
arch_scale_freq_tick(), these 3 variables are all accessed, so I didn't
touch it. Or maybe we can align the first of these 3 variables, so
that they sit in one cacheline.
> You'd also need to check whether there's no detrimental effect from
> this change on other, i.e., !KNL platforms, and I think there won't
> be because both variables will be in separate cachelines then and all
> should be good.
Yes, these kind of changes should be verified on other platforms.
One thing still puzzles me, that the 2 variables are per-cpu things, and
there is no case of many CPU contending, why the cacheline layout matters?
I doubt it is due to the contention of the same cache set, and am trying
to find some way to test it.
Because if you have two structures that are per-cpu and not cache-aligned
then a write in one can bounce the cache line in another due to
cache coherency protocol. It's generally called "false cache line
has basic examples
(lets not get into whether wikipedia is a valid citation source, there
are books on the topic if someone really cared).
While it's in my imagination, this should happen with the page allocator
pcpu structures because the core structure is 1.5 cache lines on 64-bit
currently and not aligned. That means that not only can two CPUs interfere
with each others lists and counters but that could happen cross-node.
The hypothesis can be tested with perf looking for abnormal cache
misses. In this case, an intense allocating process bound to one CPU
with intermittent allocations on the adjacent CPU should show unexpected
cache line bounces. It would not be perfect as collisions would happen
anyway when the pcpu lists spill over on either the alloc or free side
to the the buddy lists but in that case, the cache misses would happen
on different instructions.