Greetings,
FYI, we noticed a 57.1% improvement in vm-scalability.throughput due to commit:
commit: 4b922b23ce8026a6cdd79ecd57aaa515d8144f2a ("mm/swap: Split swap cache into 64MB trunks")
git://bee.sh.intel.com/git/yhuang/linux.git swap_optimize_v4
in testcase: vm-scalability
on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory
with following parameters:
thp_enabled: never
thp_defrag: never
nr_task: 16
disk: 1pmem
test: swap-w-seq
unit_size: 96G
size: 96G
cpufreq_governor: performance
test-description: This suite exercises functions and regions of the Linux kernel's mm/ subsystem that are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
In addition, the commit has a significant impact on the following tests:
+------------------+-----------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput 57.6% improvement           |
| test machine     | 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory |
| test parameters  | cpufreq_governor=performance                                          |
|                  | disk=1pmem                                                            |
|                  | nr_task=32                                                            |
|                  | size=96G                                                              |
|                  | test=swap-w-seq                                                       |
|                  | thp_defrag=never                                                      |
|                  | thp_enabled=never                                                     |
|                  | unit_size=96G                                                         |
+------------------+-----------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml
testcase/path_params/tbox_group/run: vm-scalability/never-never-16-1pmem-swap-w-seq-96G-96G-performance/lkp-hsw-ep4
91526a72b886b53e 4b922b23ce8026a6cdd79ecd57
---------------- --------------------------
         %stddev  change           %stddev
             \       |                 \
   2977834           57%     4677757        vm-scalability.throughput
   2501424 ±  3%     -5%     2368700 ±  5%  vm-scalability.time.maximum_resident_set_size
     95.11          -27%       69.01        vm-scalability.time.elapsed_time
     95.11          -27%       69.01        vm-scalability.time.elapsed_time.max
      1427          -28%        1028        vm-scalability.time.system_time
  27580435           19%    32703586        interrupts.CAL:Function_call_interrupts
     55.27            9%       60.14        turbostat.RAMWatt
       176            5%         185        turbostat.PkgWatt
      7320 ± 83%  -6e+03         822 ± 57%  latency_stats.sum.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
       526 ± 27%     77%         928 ± 17%  vmstat.swap.si
   1430761           40%     1999866        vmstat.swap.so
    360122           49%      535772        vmstat.system.in
      4624 ± 10%     61%        7446        vmstat.system.cs
 4.084e+08 ±  4%     69%   6.886e+08        perf-stat.node-stores
  7.14e+08 ± 24%     60%   1.141e+09 ±  4%  perf-stat.node-loads
 3.818e+09 ±  6%     22%   4.653e+09 ±  4%  perf-stat.cache-misses
 1.402e+10 ± 10%     25%   1.746e+10 ±  3%  perf-stat.cache-references
    443182 ± 11%     19%      525174 ±  4%  perf-stat.context-switches
      0.41           17%        0.48        perf-stat.ipc
     38.30 ±  3%      6%       40.66        perf-stat.iTLB-load-miss-rate%
 1.837e+09 ±  6%    -11%   1.634e+09 ±  5%  perf-stat.node-load-misses
  40411593          -15%    34404698        perf-stat.iTLB-load-misses
     72.33 ±  6%    -19%       58.85 ±  3%  perf-stat.node-load-miss-rate%
      4582 ±  5%    -11%        4079 ±  5%  perf-stat.cpu-migrations
 1.041e+09 ±  3%     -8%   9.572e+08        perf-stat.branch-misses
  65214946 ±  5%    -23%    50246132 ±  3%  perf-stat.iTLB-loads
 2.208e+12           -9%   2.014e+12 ±  3%  perf-stat.instructions
 5.646e+11          -10%   5.091e+11 ±  3%  perf-stat.branch-instructions
 3.016e+08 ±  4%    -13%    2.63e+08 ±  6%  perf-stat.dTLB-store-misses
      0.19 ±  7%    -18%        0.15 ±  4%  perf-stat.dTLB-store-miss-rate%
     56.96          -23%       43.72        perf-stat.node-store-miss-rate%
 5.421e+12 ±  3%    -22%   4.229e+12 ±  4%  perf-stat.cpu-cycles
  5.85e+08 ±  9%    -25%   4.409e+08 ± 18%  perf-stat.dTLB-load-misses
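The "change" column is a relative difference between the base commit (left) and the tested commit (right). As a minimal sketch, assuming the usual (new - base) / base definition, the headline figures can be rechecked from the numbers in the table. The helper name percent_change is hypothetical and is not part of lkp-tests.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# Values copied from the comparison table in this report.
throughput = percent_change(2977834, 4677757)   # vm-scalability.throughput
elapsed = percent_change(95.11, 69.01)          # vm-scalability.time.elapsed_time
system_time = percent_change(1427, 1028)        # vm-scalability.time.system_time

print(f"throughput:   {throughput:+.1f}%")   # +57.1%
print(f"elapsed_time: {elapsed:+.1f}%")      # -27.4%
print(f"system_time:  {system_time:+.1f}%")  # -28.0%
```

The throughput result agrees with the 57.1% quoted at the top of the report; the table itself shows the same values rounded to whole percent.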
[ASCII plots of per-sample values, bisect-good vs. bisect-bad, for the following metrics:]
        perf-stat.iTLB-load-misses
        perf-stat.node-stores
        perf-stat.node-store-miss-rate%
        perf-stat.ipc
        vm-scalability.time.system_time
        vm-scalability.time.elapsed_time
        vm-scalability.time.elapsed_time.max
        turbostat.RAMWatt
        vm-scalability.throughput
        vmstat.swap.so
        vmstat.system.in
        vmstat.system.cs
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong