These 2 variables are accessed in 2 hot call stacks (for this 288
Xeon Phi platform):
This might be the key element of "weirdness" for this system. It
has 288 CPUs ... cache alignment problems are often not too bad
on "small" systems. The as you scale up to bigger machines you
suddenly hit some critical point and performance drops dramatically.
It's good that you are picking up tips on how to bisect these and diagnose
the underlying problem. Number of cores is going to keep increasing, so
we will keep finding new issues like this.