Greetings,
FYI, we noticed a 90.3% improvement in vm-scalability.throughput due to commit:
commit: 66a6197c118540d454913eef24d68d7491ab5d5f ("mm: provide helper for finishing mkwrite faults")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
in testcase: vm-scalability
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with the following parameters:
	runtime: 300s
	size: 1T
	test: msync-mt
	cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
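For readers unfamiliar with the access pattern, here is a minimal Python sketch of what a multi-threaded msync workload looks like in general. It is not the vm-scalability code: the thread count, page count, and helper names (dirty_and_sync, run) are made up for illustration. Each thread stores to pages of a shared file mapping, so the first store to a clean page should take a write fault, and then calls mmap.flush(), which is msync() on Unix.

```python
import mmap
import tempfile
import threading

PAGE = mmap.PAGESIZE
THREADS = 4             # hypothetical; the report's machine has 88 threads
PAGES_PER_THREAD = 16   # hypothetical; the report's job uses size: 1T


def dirty_and_sync(mm, first_page):
    """Dirty a private range of pages in the shared mapping, then sync it."""
    for i in range(PAGES_PER_THREAD):
        offset = (first_page + i) * PAGE
        mm[offset] = 0x41  # first store to a clean shared page faults
    # flush(offset, size) is msync() on Unix; offset is page aligned here.
    mm.flush(first_page * PAGE, PAGES_PER_THREAD * PAGE)


def run():
    """Run all workers and return how many pages reached the file."""
    size = THREADS * PAGES_PER_THREAD * PAGE
    with tempfile.TemporaryFile() as f:
        f.truncate(size)  # extend the file so the whole mapping is backed
        with mmap.mmap(f.fileno(), size) as mm:  # shared mapping by default on Unix
            workers = [
                threading.Thread(target=dirty_and_sync,
                                 args=(mm, t * PAGES_PER_THREAD))
                for t in range(THREADS)
            ]
            for w in workers:
                w.start()
            for w in workers:
                w.join()
        f.seek(0)
        data = f.read()
    return sum(1 for off in range(0, size, PAGE) if data[off] == 0x41)


if __name__ == "__main__":
    print("pages written through the mapping:", run())
```

The sketch only shows the userspace side. Which kernel paths it exercises depends on the filesystem backing the temporary file.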
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml
testcase/path_params/tbox_group/run: vm-scalability/300s-1T-msync-mt-performance/lkp-bdw-ep2

997dd98dd68beb2a          66a6197c118540d454913eef24
----------------          --------------------------
         %stddev  change           %stddev
             \       |                 \
      0.00 ± 53%    282%        0.01 ± 94%  vm-scalability.stddev
   6727484           90%    12800442        vm-scalability.throughput
 4.852e+08           29%   6.237e+08        interrupts.CAL:Function_call_interrupts
     48016          144%      116970        vmstat.io.bo
    612808           24%      758937        vmstat.system.in
    556800           20%      666097        vmstat.system.cs
       613           53%         941        turbostat.Avg_MHz
     21.98           53%       33.70        turbostat.%Busy
       166 ±  4%     11%         184        turbostat.PkgWatt
     67.29            7%       71.73        turbostat.RAMWatt
  42508120 ±  5%   9e+07   1.351e+08 ±  3%  latency_stats.sum.wait_on_page_bit.__migration_entry_wait.migration_entry_wait.do_swap_page.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
   4971400 ± 11%   9e+07    94614270 ±  7%  latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_vn_update_time.file_update_time.xfs_filemap_page_mkwrite.do_page_mkwrite.do_wp_page.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
   2932983 ± 11%   5e+07    56035780 ±  7%  latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_file_iomap_begin_delay.xfs_file_iomap_begin.iomap_apply.iomap_page_mkwrite.xfs_filemap_page_mkwrite.do_page_mkwrite.do_wp_page.handle_mm_fault.__do_page_fault.do_page_fault
       429 ± 55%   6e+04       59266 ± 16%  latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_vn_update_time.file_update_time.xfs_filemap_page_mkwrite.do_page_mkwrite.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
       112 ± 88%   3e+04       32879 ± 11%  latency_stats.sum.call_rwsem_down_write_failed.xfs_ilock.xfs_file_iomap_begin_delay.xfs_file_iomap_begin.iomap_apply.iomap_page_mkwrite.xfs_filemap_page_mkwrite.do_page_mkwrite.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
         0         1e+04       11193 ± 58%  latency_stats.sum.submit_bio_wait.blkdev_issue_flush.xfs_blkdev_issue_flush.xfs_file_fsync.vfs_fsync_range.SyS_msync.entry_SYSCALL_64_fastpath
 1.857e+10           94%   3.607e+10        perf-stat.node-store-misses
 2.475e+11           88%   4.654e+11        perf-stat.cache-references
 6.691e+10           74%   1.164e+11        perf-stat.cache-misses
 6.154e+12           71%   1.055e+13        perf-stat.branch-instructions
  2.46e+13           67%    4.12e+13        perf-stat.instructions
 5.878e+12           62%   9.493e+12        perf-stat.dTLB-loads
     44015           57%       68958 ±  8%  perf-stat.instructions-per-iTLB-miss
 4.973e+13           55%   7.717e+13        perf-stat.cpu-cycles
 1.721e+12           54%   2.644e+12        perf-stat.dTLB-stores
 2.403e+09           52%   3.657e+09        perf-stat.dTLB-store-misses
  3.92e+08 ±  4%     46%   5.729e+08 ± 13%  perf-stat.node-loads
  10963820           39%    15208401 ±  4%  perf-stat.cpu-migrations
 1.336e+09           25%   1.673e+09 ±  6%  perf-stat.dTLB-load-misses
 6.689e+09           25%   8.346e+09        perf-stat.iTLB-loads
 5.175e+08           20%   6.233e+08        perf-stat.context-switches
 3.556e+09           19%   4.223e+09        perf-stat.node-stores
 1.307e+10           15%   1.508e+10        perf-stat.branch-misses
 1.449e+10           10%   1.591e+10        perf-stat.node-load-misses
      0.49            8%        0.53        perf-stat.ipc
 5.589e+08            8%   6.022e+08 ±  9%  perf-stat.iTLB-load-misses
     83.93            7%       89.52        perf-stat.node-store-miss-rate%
     27.04           -8%       25.00        perf-stat.cache-miss-rate%
      7.71          -13%        6.73 ±  8%  perf-stat.iTLB-load-miss-rate%
 4.593e+08          -16%   3.878e+08        perf-stat.minor-faults
 4.593e+08          -16%   3.878e+08        perf-stat.page-faults
      0.02          -22%        0.02 ±  5%  perf-stat.dTLB-load-miss-rate%
      0.21          -33%        0.14        perf-stat.branch-miss-rate%
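
The change column and the derived perf-stat ratios can be cross-checked with a few lines of Python. This is only a sketch: the helper names are ours, and it assumes the usual definitions (relative change against the base commit, IPC as instructions over cycles, miss rate as misses over references). Because the table prints rounded values, the recomputed ratios agree only approximately.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio, used here for IPC and miss rates."""
    return numerator / denominator


# Values copied from the comparison table (base commit, tested commit).
throughput = (6727484, 12800442)
instructions = (2.46e13, 4.12e13)
cpu_cycles = (4.973e13, 7.717e13)
cache_misses = (6.691e10, 1.164e11)
cache_refs = (2.475e11, 4.654e11)

if __name__ == "__main__":
    # Headline number: about 90.3% higher throughput.
    print(f"throughput change: {pct_change(*throughput):.1f}%")
    for label, idx in (("base", 0), ("commit", 1)):
        ipc = ratio(instructions[idx], cpu_cycles[idx])
        miss_rate = 100.0 * ratio(cache_misses[idx], cache_refs[idx])
        print(f"{label}: ipc={ipc:.2f} cache-miss-rate={miss_rate:.2f}%")
```

The same pct_change() call reproduces the other percentage rows, for example vmstat.io.bo (48016 to 116970, about 144%).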
[Per-sample ASCII trend plots, one panel per metric, in this order:
perf-stat.instructions, perf-stat.cache-references, perf-stat.cache-misses,
perf-stat.branch-instructions, perf-stat.dTLB-loads, perf-stat.dTLB-stores,
perf-stat.dTLB-store-misses, perf-stat.node-store-misses, perf-stat.page-faults,
perf-stat.context-switches, perf-stat.minor-faults, perf-stat.branch-miss-rate%,
perf-stat.node-store-miss-rate%, turbostat.RAMWatt, one untitled panel
(y-axis 0 to 140000), vm-scalability.throughput,
interrupts.CAL:Function_call_interrupts, vmstat.io.bo]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong