Greetings,
FYI, we noticed a 16% regression in vm-scalability.throughput due to commit:
commit: 2aa6d036b716c9242222e054d4ef34905ad45fd3 ("mm: numa: avoid waiting on freed migrated pages")
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
in testcase: vm-scalability
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with the following parameters:
    runtime: 300s
    size: 512G
    test: anon-wx-rand-mt
    cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
    git clone https://github.com/01org/lkp-tests.git
    cd lkp-tests
    bin/lkp install job.yaml  # job file is attached in this email
    bin/lkp run     job.yaml
testcase/path_params/tbox_group/run:
vm-scalability/300s-512G-anon-wx-rand-mt-performance/lkp-bdw-ep2
08cb8e5f83fd2d4f 2aa6d036b716c9242222e054d4
---------------- --------------------------
         %stddev  change           %stddev
             \       |                 \
      0.04 ± 12%    218%        0.12 ± 21%  vm-scalability.median_stddev
      0.04 ±  8%    206%        0.11 ± 20%  vm-scalability.stddev
   4561044          -16%     3832033        vm-scalability.throughput
     51928          -17%       43276        vm-scalability.median
     26584           54%       41052        vm-scalability.time.voluntary_context_switches
    152761           28%      194891        vm-scalability.time.minor_page_faults
       274           14%         313        vm-scalability.time.elapsed_time
       274           14%         313        vm-scalability.time.elapsed_time.max
     22749            9%       24858        vm-scalability.time.user_time
      8295           -4%        7926        vm-scalability.time.percent_of_cpu_this_job_got
     24.84          -34%       16.36 ±  3%  vm-scalability.time.system_time
     10072 ± 78%  -8e+03        1637 ±113%  latency_stats.sum.futex_wait_queue_me.futex_wait.do_futex.SyS_futex.entry_SYSCALL_64_fastpath
     97849                     96905        vmstat.system.in
  22591267 ±  5%     76%    39830428 ±  5%  perf-stat.iTLB-loads
      0.00           60%        0.00 ±  6%  perf-stat.dTLB-load-miss-rate%
  82808556           45%     1.2e+08 ±  4%  perf-stat.dTLB-load-misses
      0.02           38%        0.02 ±  3%  perf-stat.branch-miss-rate%
     39251           35%       52915        perf-stat.cpu-migrations
 4.301e+08           26%   5.405e+08 ±  3%  perf-stat.branch-misses
    738430           17%      866092        perf-stat.page-faults
    738428           17%      866089        perf-stat.minor-faults
 6.329e+13            9%    6.93e+13        perf-stat.cpu-cycles
     88.72                     87.58        perf-stat.cache-miss-rate%
     53.02           -8%       49.04        perf-stat.node-store-miss-rate%
 1.415e+11           -8%     1.3e+11 ±  3%  perf-stat.cache-references
 1.377e+12           -9%   1.256e+12 ±  3%  perf-stat.dTLB-stores
 3.416e+12           -9%   3.108e+12 ±  3%  perf-stat.dTLB-loads
 1.243e+13           -9%   1.131e+13 ±  3%  perf-stat.instructions
 2.863e+12           -9%   2.604e+12 ±  3%  perf-stat.branch-instructions
 1.255e+11           -9%   1.138e+11 ±  3%  perf-stat.cache-misses
     62.61 ±  4%    -15%       53.06 ±  3%  perf-stat.iTLB-load-miss-rate%
 6.642e+10          -16%   5.573e+10 ±  4%  perf-stat.node-store-misses
      0.20          -17%        0.16        perf-stat.ipc
    329433 ±  8%    -23%      252456 ±  9%  perf-stat.instructions-per-iTLB-miss
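As a sanity check on the table above, the following minimal Python sketch recomputes two of the reported figures from the raw values. It assumes the change column is the relative difference of the second commit against the first, and that perf-stat.ipc is instructions divided by cpu-cycles; the report does not state either formula explicitly, and the helper names `percent_change` and `ipc` are hypothetical.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed formula)."""
    return (new - base) / base * 100.0


def ipc(instructions: float, cycles: float) -> float:
    """Instructions per cycle (assumed to be what perf-stat.ipc reports)."""
    return instructions / cycles


# Values copied from the comparison table:
# first column = 08cb8e5f83fd2d4f, second column = 2aa6d036b716c9242222e054d4.
throughput_change = percent_change(4561044, 3832033)
median_change = percent_change(51928, 43276)
ipc_base = ipc(1.243e13, 6.329e13)
ipc_new = ipc(1.131e13, 6.93e13)

print(f"throughput change: {throughput_change:.1f}%")
print(f"median change:     {median_change:.1f}%")
print(f"ipc: {ipc_base:.2f} -> {ipc_new:.2f}")
```

Under these assumptions the recomputed values round to the -16% and -17% changes and to the 0.20 and 0.16 IPC figures shown in the table.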
[Plot: vm-scalability.throughput per bisect sample, y-axis from 3.6e+06 to 4.7e+06. Good samples lie at roughly 4.4e+06 to 4.6e+06; bad samples lie at roughly 3.7e+06 to 4.0e+06.]
[Plot: vm-scalability.median per bisect sample, y-axis from 40000 to 54000. Good samples lie at roughly 50000 to 53000; bad samples lie at roughly 41000 to 45000.]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong