Greetings,
FYI, we noticed a -31.4% regression of vm-scalability.throughput due to commit:
commit: 5246f0a7503891a7f6e6dc9e9a4e6c415eadb761 ("mm: page_alloc: High-order per-cpu page allocator v6")
https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-pagealloc-highorder-percpu-v6r1
in testcase: vm-scalability
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with following parameters:
runtime: 300s
test: lru-file-readonce
cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
In addition, the commit also has a significant impact on the following tests:
+------------------+-----------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps -5.5% regression               |
| test machine     | 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory |
| test parameters  | cluster=cs-localhost                                            |
|                  | cpufreq_governor=performance                                    |
|                  | ip=ipv4                                                         |
|                  | nr_threads=200%                                                 |
|                  | runtime=900s                                                    |
|                  | test=TCP_STREAM                                                 |
+------------------+-----------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
testcase/path_params/tbox_group/run: vm-scalability/300s-lru-file-readonce-performance/ivb43
ab947c6adc4a9cba 5246f0a7503891a7f6e6dc9e9a
---------------- --------------------------
0.49 ± 22% 1337% 7.09 ± 42% vm-scalability.stddev
15356569 -31% 10527439 ± 11% vm-scalability.throughput
12746 6% 13559 vm-scalability.time.system_time
297 6% 315 vm-scalability.time.elapsed_time
297 6% 315 vm-scalability.time.elapsed_time.max
410931 -27% 300312 ± 16% vm-scalability.time.involuntary_context_switches
121.50 -32% 82.72 ± 12% vm-scalability.time.user_time
156053 7% 167404 ± 3% interrupts.CAL:Function_call_interrupts
93.64 94.92 turbostat.%Busy
2794 2828 turbostat.Avg_MHz
16.16 -11% 14.39 ± 4% turbostat.RAMWatt
28477 ± 20% -3e+04 55 ± 31% latency_stats.max.wait_on_page_bit_killable.__lock_page_or_retry.filemap_fault.__do_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault.clear_user.load_elf_binary.search_binary_handler.do_execveat_common
6897 ± 49% 5e+04 52835 ± 21% latency_stats.sum.sigsuspend.SyS_rt_sigsuspend.entry_SYSCALL_64_fastpath
33272 ± 15% -3e+04 201 ± 16% latency_stats.sum.wait_on_page_bit_killable.__lock_page_or_retry.filemap_fault.__do_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault.clear_user.load_elf_binary.search_binary_handler.do_execveat_common
11.82 153% 29.92 ± 10% perf-stat.node-store-miss-rate%
11781 146% 28924 ± 24% perf-stat.instructions-per-iTLB-miss
8.015e+09 137% 1.902e+10 ± 5% perf-stat.node-store-misses
2.353e+09 47% 3.447e+09 ± 6% perf-stat.node-load-misses
30.49 19% 36.18 ± 4% perf-stat.node-load-miss-rate%
5.363e+09 13% 6.071e+09 perf-stat.node-loads
1.995e+12 13% 2.249e+12 perf-stat.branch-instructions
25313 9% 27481 ± 3% perf-stat.cpu-migrations
3.971e+13 7% 4.256e+13 perf-stat.cpu-cycles
581102 6% 614685 perf-stat.page-faults
581073 6% 614651 perf-stat.minor-faults
1.002e+13 5% 1.056e+13 perf-stat.instructions
2.87e+12 2.912e+12 perf-stat.dTLB-loads
0.25 0.25 perf-stat.ipc
97.16 -4% 93.66 perf-stat.iTLB-load-miss-rate%
7.415e+10 -22% 5.761e+10 ± 9% perf-stat.cache-misses
1.354e+11 -22% 1.051e+11 ± 8% perf-stat.cache-references
0.18 ± 5% -24% 0.14 ± 4% perf-stat.dTLB-store-miss-rate%
1.805e+12 -24% 1.368e+12 ± 9% perf-stat.dTLB-stores
5.978e+10 -25% 4.488e+10 ± 9% perf-stat.node-stores
5.41e+09 -28% 3.919e+09 ± 6% perf-stat.branch-misses
0.27 -36% 0.17 ± 8% perf-stat.branch-miss-rate%
2.324e+10 ± 9% -38% 1.442e+10 ± 8% perf-stat.dTLB-load-misses
0.80 ± 9% -39% 0.49 ± 8% perf-stat.dTLB-load-miss-rate%
3.262e+09 ± 5% -42% 1.893e+09 ± 13% perf-stat.dTLB-store-misses
8.506e+08 -55% 3.836e+08 ± 19% perf-stat.iTLB-load-misses
[ASCII plots of per-sample values, one per metric:
perf-stat.cpu-cycles, perf-stat.cache-references, perf-stat.cache-misses,
perf-stat.branch-instructions, perf-stat.branch-misses, perf-stat.dTLB-stores,
perf-stat.dTLB-store-misses, perf-stat.iTLB-load-misses, perf-stat.node-load-misses,
perf-stat.node-stores, perf-stat.node-store-misses, perf-stat.branch-miss-rate%,
perf-stat.node-load-miss-rate%, perf-stat.node-store-miss-rate%,
perf-stat.instructions-per-iTLB-miss, vm-scalability.time.elapsed_time,
vm-scalability.time.elapsed_time.max, turbostat.%Busy, vm-scalability.throughput]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong