Greetings,
We noticed a -12.1% regression of hackbench.throughput due to commit:
commit: 11a1251e3a3cc9532f358c889deea63169bd2c65 ("x86/entry/64: Make cpu_entry_area.tss read-only")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: hackbench
on test machine: 256-thread Xeon Phi with 96G memory
with following parameters:
nr_threads: 100%
mode: threads
ipc: pipe
cpufreq_governor: performance
test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
test-url:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sc...
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml # job file is attached in this email
        bin/lkp run job.yaml
=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
gcc-7/x86_64-kexec/debian-x86_64-2016-08-31.cgz/1/vm-lkp-wsx03-2G/boot
commit:
dff71e3c0e ("x86/entry: Clean up the SYSENTER_stack code")
11a1251e3a ("x86/entry/64: Make cpu_entry_area.tss read-only")
dff71e3c0e180fed 11a1251e3a3cc9532f358c889d
---------------- --------------------------
         %stddev      %change      %stddev
84922 -12.1% 74626 hackbench.throughput
710.29 +13.8% 807.99 hackbench.time.elapsed_time
710.29 +13.8% 807.99 hackbench.time.elapsed_time.max
1.375e+09 +4.6% 1.439e+09 hackbench.time.involuntary_context_switches
170288 +14.1% 194216 hackbench.time.system_time
3.659e+09 +6.7% 3.905e+09 hackbench.time.voluntary_context_switches
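As a sanity check, the %change column in the hackbench block can be recomputed from the two absolute values. A small Python sketch (not part of the lkp tooling; values copied from the rows above):

```python
def pct_change(base, new):
    """Percent change of `new` relative to the base commit's value."""
    return (new - base) / base * 100.0

# hackbench.throughput: base commit vs. the bisected bad commit.
print(round(pct_change(84922, 74626), 1))    # -12.1 (%change column)

# hackbench.time.elapsed_time grows by the matching amount.
print(round(pct_change(710.29, 807.99), 1))  # 13.8
```

The -12.1% throughput drop and the +13.8% wall-clock increase are two views of the same slowdown: hackbench runs a fixed amount of work, so throughput scales inversely with elapsed time.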
27.95 ± 2% -4.2% 26.77 ± 2% boot-time.kernel_boot
4.67 -0.5 4.15 ± 2% mpstat.cpu.usr%
1959105 +28.6% 2519832 softirqs.SCHED
171.26 -1.4% 168.86 turbostat.PkgWatt
6810850 -17.9% 5590300 ± 3% numa-numastat.node0.local_node
6810780 -17.9% 5590339 ± 3% numa-numastat.node0.numa_hit
4390448 ± 2% -15.0% 3732717 ± 3% numa-vmstat.node0.numa_hit
4390455 ± 2% -15.0% 3732722 ± 3% numa-vmstat.node0.numa_local
710.29 +13.8% 807.99 time.elapsed_time
710.29 +13.8% 807.99 time.elapsed_time.max
170288 +14.1% 194216 time.system_time
4113 -11.0% 3663 vmstat.procs.r
7205243 -7.3% 6681584 vmstat.system.cs
1136312 -7.3% 1053720 vmstat.system.in
6826262 -17.8% 5607814 ± 3% proc-vmstat.numa_hit
6826245 -17.8% 5607797 ± 3% proc-vmstat.numa_local
6893928 -17.6% 5682332 ± 2% proc-vmstat.pgalloc_normal
1001095 +18.9% 1190215 proc-vmstat.pgfault
6814642 -17.7% 5608336 ± 2% proc-vmstat.pgfree
6.678e+12 +7.1% 7.154e+12 perf-stat.branch-instructions
9.07 -0.3 8.74 perf-stat.branch-miss-rate%
6.057e+11 +3.2% 6.251e+11 perf-stat.branch-misses
8.52 +0.6 9.11 perf-stat.cache-miss-rate%
1.736e+11 +11.6% 1.938e+11 perf-stat.cache-misses
2.037e+12 +4.5% 2.127e+12 perf-stat.cache-references
5.067e+09 +6.5% 5.398e+09 perf-stat.context-switches
8.20 +7.9% 8.84 perf-stat.cpi
2.479e+14 +13.8% 2.82e+14 perf-stat.cpu-cycles
2.687e+08 +9.5% 2.942e+08 perf-stat.cpu-migrations
1.91 -0.1 1.84 perf-stat.iTLB-load-miss-rate%
5.896e+11 +0.9% 5.947e+11 perf-stat.iTLB-load-misses
3.03e+13 +4.9% 3.178e+13 perf-stat.iTLB-loads
3.024e+13 +5.5% 3.19e+13 perf-stat.instructions
51.29 +4.6% 53.64 perf-stat.instructions-per-iTLB-miss
0.12 -7.3% 0.11 perf-stat.ipc
969235 +18.8% 1151230 perf-stat.minor-faults
969196 +18.8% 1151862 perf-stat.page-faults
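The derived perf-stat rows are consistent with the raw counters: cpi is cycles divided by instructions, and ipc is its reciprocal. A quick check in Python, with the counter values copied from the table above:

```python
# Raw counters from the perf-stat rows (base commit, bad commit).
cycles_base, cycles_new = 2.479e14, 2.82e14   # perf-stat.cpu-cycles
insns_base,  insns_new  = 3.024e13, 3.19e13   # perf-stat.instructions

cpi_base = cycles_base / insns_base  # matches perf-stat.cpi = 8.20
cpi_new  = cycles_new / insns_new    # matches perf-stat.cpi = 8.84
ipc_base = insns_base / cycles_base  # matches perf-stat.ipc = 0.12
ipc_new  = insns_new / cycles_new    # matches perf-stat.ipc = 0.11

print(round(cpi_base, 2), round(cpi_new, 2))  # 8.2 8.84
print(round(ipc_base, 2), round(ipc_new, 2))  # 0.12 0.11
```

Cycles grow 13.8% while instructions grow only 5.5%, so each instruction costs more cycles on the bad commit, which lines up with the +7.9% cpi row and the elapsed-time increase.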
11893 ± 2% +35.4% 16103 ± 4% sched_debug.cfs_rq:/.MIN_vruntime.avg
hackbench.throughput
90000 +-+-----------------------------------------------------------------+
|..+..+ +..+..+..+..+..+..+..+.+..+..+..+..+..+..+..+..+..+..+..|
80000 O-+O : : O O O |
70000 +-+ O O O O O O O O |
| : : O |
60000 +-+ : : |
50000 +-+ : : |
| : : |
40000 +-+ : : |
30000 +-+ : : |
| : : |
20000 +-+ : : |
10000 +-+ : |
| : |
0 +-+-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong