FYI, we noticed a -2.8% regression of aim9.fork_test.ops_per_sec due to commit:
commit 5903b0cc463db12dd495942d405e581783074905 ("sched: propagate load during synchronous attach/detach")
https://git.linaro.org/people/vincent.guittot/kernel.git sched/pelt
in testcase: aim9
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with following parameters:
testtime: 300s
test: fork_test
cpufreq_governor: performance
Suite IX is the "AIM Independent Resource Benchmark", the famous synthetic benchmark.
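For readers unfamiliar with it: aim9's fork_test measures how many fork-then-exit cycles the machine completes per second. A rough stand-in (my approximation, not the actual AIM9 C source, which runs for the configured 300s) can be sketched in a few lines:

```python
import os
import time

def fork_test(duration_s=1.0):
    """Count completed fork+wait cycles per second: a rough
    analogue of aim9 fork_test (assumption: the real benchmark
    measures the same fork/exit/reap loop, but in C)."""
    deadline = time.monotonic() + duration_s
    ops = 0
    while time.monotonic() < deadline:
        pid = os.fork()
        if pid == 0:
            os._exit(0)       # child exits immediately
        os.waitpid(pid, 0)    # parent reaps the child
        ops += 1
    return ops / duration_s

if __name__ == "__main__":
    print("%.0f ops/sec" % fork_test(1.0))
```

Because each cycle is dominated by copy_process() and the wakeup of the child, any extra per-task cost added by load propagation in the scheduler shows up directly in this ops/sec number.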
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-hsx04/fork_test/aim9/300s
commit:
c3c8a02759 ("sched: factorize PELT update")
5903b0cc46 ("sched: propagate load during synchronous attach/detach")
c3c8a027596a40e1 5903b0cc463db12dd495942d40
---------------- --------------------------
%stddev %change %stddev
\ | \
5463 ± 0% -2.8% 5308 ± 0% aim9.fork_test.ops_per_sec
9613 ± 0% +125.0% 21630 ± 0% aim9.time.involuntary_context_switches
2553 ± 1% -3.3% 2468 ± 0% aim9.time.maximum_resident_set_size
3265963 ± 0% -3.4% 3154181 ± 0% aim9.time.voluntary_context_switches
452695 ± 2% +3.7% 469383 ± 1% interrupts.CAL:Function_call_interrupts
937.75 ± 0% +28.4% 1204 ± 1% proc-vmstat.nr_page_table_pages
24930 ± 0% -1.4% 24586 ± 0% vmstat.system.cs
648224 ± 1% +37.4% 890843 ± 0% meminfo.Committed_AS
3736 ± 1% +28.9% 4815 ± 1% meminfo.PageTables
13.25 ± 6% -66.0% 4.50 ±100% numa-numastat.node2.other_node
5803 ± 34% +99.7% 11587 ± 20% numa-numastat.node3.numa_foreign
5803 ± 34% +99.7% 11587 ± 20% numa-numastat.node3.numa_miss
3264239 ± 7% +80.3% 5885636 ± 6% cpuidle.C1-HSW.time
2.079e+08 ± 4% +31.6% 2.736e+08 ± 2% cpuidle.C1E-HSW.time
3.677e+08 ± 0% -16.9% 3.056e+08 ± 0% cpuidle.C3-HSW.time
1722328 ± 0% -19.2% 1390958 ± 0% cpuidle.C3-HSW.usage
3.853e+09 ± 1% -55.5% 1.716e+09 ± 2% cpuidle.POLL.time
11.80 ± 1% -43.2% 6.70 ± 1% turbostat.%Busy
324.25 ± 1% -41.1% 191.00 ± 1% turbostat.Avg_MHz
0.19 ± 0% +67.1% 0.32 ± 1% turbostat.CPU%c3
20.25 ± 4% -58.4% 8.41 ± 3% turbostat.Pkg%pc2
279.46 ± 0% -4.3% 267.33 ± 0% turbostat.PkgWatt
808.25 ± 23% +65.2% 1335 ± 18% numa-meminfo.node0.PageTables
6746 ±173% +210.1% 20919 ± 48% numa-meminfo.node1.AnonHugePages
42119 ± 6% -27.7% 30432 ± 29% numa-meminfo.node2.Active
35629 ± 6% -32.6% 24019 ± 37% numa-meminfo.node2.Active(anon)
26794 ± 7% -62.2% 10127 ± 71% numa-meminfo.node2.AnonHugePages
34316 ± 5% -52.0% 16479 ± 55% numa-meminfo.node2.AnonPages
521152 ± 3% +19.9% 625041 ± 9% numa-meminfo.node3.MemUsed
13644 ± 31% -65.4% 4725 ± 86% numa-meminfo.node3.Shmem
201.75 ± 23% +62.6% 328.00 ± 17% numa-vmstat.node0.nr_page_table_pages
8909 ± 6% -32.6% 6008 ± 37% numa-vmstat.node2.nr_active_anon
8581 ± 5% -52.0% 4122 ± 55% numa-vmstat.node2.nr_anon_pages
8909 ± 6% -32.6% 6008 ± 37% numa-vmstat.node2.nr_zone_active_anon
12.00 ± 5% -70.8% 3.50 ±109% numa-vmstat.node2.numa_other
3410 ± 31% -65.4% 1181 ± 86% numa-vmstat.node3.nr_shmem
82956 ± 2% +6.7% 88479 ± 2% numa-vmstat.node3.numa_foreign
82956 ± 2% +6.7% 88479 ± 2% numa-vmstat.node3.numa_miss
1.02e+12 ± 8% -40.2% 6.097e+11 ± 6% perf-stat.branch-instructions
0.40 ± 9% +133.3% 0.93 ± 4% perf-stat.branch-miss-rate%
4.049e+09 ± 4% +40.0% 5.668e+09 ± 1% perf-stat.branch-misses
9.08 ± 3% -22.5% 7.04 ± 2% perf-stat.cache-miss-rate%
2.399e+09 ± 2% -6.8% 2.236e+09 ± 2% perf-stat.cache-misses
2.644e+10 ± 2% +20.1% 3.176e+10 ± 0% perf-stat.cache-references
7515466 ± 0% -1.4% 7410587 ± 0% perf-stat.context-switches
1.611e+13 ± 8% -38.7% 9.871e+12 ± 6% perf-stat.cpu-cycles
84523 ± 0% +26.1% 106582 ± 1% perf-stat.cpu-migrations
0.17 ± 0% +89.9% 0.32 ± 2% perf-stat.dTLB-load-miss-rate%
1.562e+09 ± 0% +16.3% 1.817e+09 ± 1% perf-stat.dTLB-load-misses
9.298e+11 ± 0% -38.8% 5.687e+11 ± 1% perf-stat.dTLB-loads
2.945e+08 ± 1% +11.0% 3.269e+08 ± 0% perf-stat.dTLB-store-misses
2.385e+11 ± 4% +7.9% 2.574e+11 ± 0% perf-stat.dTLB-stores
53.06 ± 0% +5.9% 56.17 ± 2% perf-stat.iTLB-load-miss-rate%
4.535e+08 ± 1% +10.7% 5.022e+08 ± 4% perf-stat.iTLB-load-misses
4.011e+08 ± 0% -2.4% 3.915e+08 ± 0% perf-stat.iTLB-loads
4.305e+12 ± 7% -38.1% 2.666e+12 ± 6% perf-stat.instructions
9488 ± 7% -43.9% 5318 ± 8% perf-stat.instructions-per-iTLB-miss
0.27 ± 0% +1.0% 0.27 ± 0% perf-stat.ipc
99.18 ± 0% -1.4% 97.79 ± 0% perf-stat.node-load-miss-rate%
1.427e+09 ± 1% -11.8% 1.258e+09 ± 2% perf-stat.node-load-misses
11841576 ± 0% +140.6% 28485390 ± 3% perf-stat.node-loads
78.98 ± 0% -4.9% 75.11 ± 0% perf-stat.node-store-miss-rate%
3.361e+08 ± 0% -10.5% 3.008e+08 ± 0% perf-stat.node-store-misses
89434382 ± 2% +11.4% 99662914 ± 2% perf-stat.node-stores
276.81 ± 8% +76.5% 488.51 ± 1% sched_debug.cfs_rq:/.exec_clock.min
193902 ± 41% -39.1% 118044 ± 44% sched_debug.cfs_rq:/.load.max
182.25 ± 5% +35.5% 247.03 ± 7% sched_debug.cfs_rq:/.load_avg.avg
297.25 ± 9% +55.3% 461.54 ± 2% sched_debug.cfs_rq:/.load_avg.max
142.33 ± 5% +25.0% 177.96 ± 9% sched_debug.cfs_rq:/.load_avg.min
27.50 ± 13% +86.3% 51.24 ± 13% sched_debug.cfs_rq:/.load_avg.stddev
52563 ± 12% +64.8% 86617 ± 9% sched_debug.cfs_rq:/.min_vruntime.avg
111274 ± 8% +103.8% 226757 ± 14% sched_debug.cfs_rq:/.min_vruntime.max
15219 ± 15% +126.2% 34421 ± 6% sched_debug.cfs_rq:/.min_vruntime.min
20511 ± 6% +96.3% 40265 ± 2% sched_debug.cfs_rq:/.min_vruntime.stddev
0.31 ± 5% +334.2% 1.33 ± 38% sched_debug.cfs_rq:/.runnable_load_avg.avg
28.79 ± 5% +173.1% 78.62 ± 34% sched_debug.cfs_rq:/.runnable_load_avg.max
2.55 ± 4% +219.9% 8.15 ± 35% sched_debug.cfs_rq:/.runnable_load_avg.stddev
-34683 ±-23% +81.7% -63037 ± -9% sched_debug.cfs_rq:/.spread0.avg
24059 ± 54% +220.6% 77132 ± 33% sched_debug.cfs_rq:/.spread0.max
-72035 ±-15% +60.0% -115239 ± -9% sched_debug.cfs_rq:/.spread0.min
20515 ± 6% +96.3% 40269 ± 2% sched_debug.cfs_rq:/.spread0.stddev
85.16 ± 4% +162.5% 223.52 ± 3% sched_debug.cfs_rq:/.util_avg.avg
393.46 ± 8% +117.0% 853.88 ± 10% sched_debug.cfs_rq:/.util_avg.max
48.57 ± 4% +260.1% 174.88 ± 3% sched_debug.cfs_rq:/.util_avg.stddev
179135 ± 5% +20.2% 215299 ± 4% sched_debug.cpu.avg_idle.stddev
0.30 ± 11% +262.6% 1.07 ± 45% sched_debug.cpu.cpu_load[0].avg
28.37 ± 8% +112.3% 60.25 ± 44% sched_debug.cpu.cpu_load[0].max
2.49 ± 8% +141.4% 6.00 ± 49% sched_debug.cpu.cpu_load[0].stddev
0.32 ± 7% +2166.1% 7.17 ± 13% sched_debug.cpu.cpu_load[1].avg
27.75 ± 5% +403.9% 139.83 ± 17% sched_debug.cpu.cpu_load[1].max
2.45 ± 5% +762.8% 21.16 ± 10% sched_debug.cpu.cpu_load[1].stddev
0.30 ± 7% +1627.4% 5.21 ± 11% sched_debug.cpu.cpu_load[2].avg
26.00 ± 5% +295.7% 102.88 ± 18% sched_debug.cpu.cpu_load[2].max
2.29 ± 5% +556.8% 15.02 ± 9% sched_debug.cpu.cpu_load[2].stddev
0.29 ± 7% +1185.9% 3.78 ± 10% sched_debug.cpu.cpu_load[3].avg
23.38 ± 4% +207.7% 71.92 ± 15% sched_debug.cpu.cpu_load[3].max
2.11 ± 5% +381.6% 10.15 ± 6% sched_debug.cpu.cpu_load[3].stddev
0.27 ± 8% +890.3% 2.71 ± 10% sched_debug.cpu.cpu_load[4].avg
19.08 ± 8% +152.2% 48.12 ± 15% sched_debug.cpu.cpu_load[4].max
1.81 ± 8% +268.4% 6.68 ± 5% sched_debug.cpu.cpu_load[4].stddev
195679 ± 41% -40.0% 117329 ± 45% sched_debug.cpu.load.max
4594 ± 9% +75.9% 8079 ± 1% sched_debug.cpu.nr_switches.min
-28.83 ±-48% -58.4% -12.00 ±-21% sched_debug.cpu.nr_uninterruptible.min
4.73 ± 14% -28.2% 3.40 ± 11% sched_debug.cpu.nr_uninterruptible.stddev
4317 ± 9% +80.7% 7800 ± 1% sched_debug.cpu.sched_count.min
2102 ± 9% +81.1% 3807 ± 1% sched_debug.cpu.sched_goidle.min
2333 ± 8% +68.8% 3938 ± 1% sched_debug.cpu.ttwu_count.min
102.08 ± 2% +38.4% 141.29 ± 4% sched_debug.cpu.ttwu_local.min
0.81 ± 8% +110.5% 1.70 ± 11% perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
0.58 ± 7% +78.1% 1.04 ± 10% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
1.31 ± 4% +54.4% 2.02 ± 10% perf-profile.calltrace.cycles-pp._do_fork.sys_clone.do_syscall_64.return_from_SYSCALL_64
2.25 ± 10% +68.9% 3.80 ± 7% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
1.14 ± 3% +56.7% 1.78 ± 10% perf-profile.calltrace.cycles-pp.copy_process._do_fork.sys_clone.do_syscall_64.return_from_SYSCALL_64
1.62 ± 9% +50.3% 2.44 ± 7% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.sys_exit_group.entry_SYSCALL_64_fastpath
1.64 ± 9% +50.3% 2.46 ± 7% perf-profile.calltrace.cycles-pp.do_group_exit.sys_exit_group.entry_SYSCALL_64_fastpath
0.81 ± 7% +111.1% 1.71 ± 11% perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
1.31 ± 4% +54.6% 2.02 ± 10% perf-profile.calltrace.cycles-pp.do_syscall_64.return_from_SYSCALL_64
1.83 ± 7% +51.8% 2.77 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
1.20 ± 11% +64.7% 1.98 ± 7% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.do_exit.do_group_exit.sys_exit_group
0.00 ± -1% +Inf% 0.87 ± 16% perf-profile.calltrace.cycles-pp.filemap_map_pages.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
0.69 ± 8% +111.6% 1.46 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
0.79 ± 5% +75.0% 1.38 ± 10% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter
32.78 ± 0% +15.3% 37.81 ± 2% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
0.66 ± 22% +48.7% 0.99 ± 7% perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
0.85 ± 6% +77.6% 1.50 ± 9% perf-profile.calltrace.cycles-pp.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
1.21 ± 11% +63.9% 1.99 ± 7% perf-profile.calltrace.cycles-pp.mmput.do_exit.do_group_exit.sys_exit_group.entry_SYSCALL_64_fastpath
0.81 ± 8% +110.1% 1.71 ± 11% perf-profile.calltrace.cycles-pp.page_fault
58.29 ± 0% -17.4% 48.18 ± 4% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
1.31 ± 4% +54.6% 2.02 ± 10% perf-profile.calltrace.cycles-pp.return_from_SYSCALL_64
2.09 ± 11% +69.0% 3.54 ± 7% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry
1.31 ± 4% +54.4% 2.02 ± 10% perf-profile.calltrace.cycles-pp.sys_clone.do_syscall_64.return_from_SYSCALL_64
1.64 ± 9% +50.3% 2.46 ± 7% perf-profile.calltrace.cycles-pp.sys_exit_group.entry_SYSCALL_64_fastpath
0.85 ± 7% +107.0% 1.77 ± 11% perf-profile.children.cycles-pp.__do_page_fault
0.63 ± 6% +76.6% 1.11 ± 9% perf-profile.children.cycles-pp.__hrtimer_run_queues
1.32 ± 4% +54.4% 2.03 ± 10% perf-profile.children.cycles-pp._do_fork
2.60 ± 7% +57.3% 4.09 ± 6% perf-profile.children.cycles-pp.apic_timer_interrupt
1.15 ± 3% +55.9% 1.79 ± 10% perf-profile.children.cycles-pp.copy_process
1.62 ± 9% +50.3% 2.44 ± 7% perf-profile.children.cycles-pp.do_exit
1.64 ± 9% +50.5% 2.47 ± 7% perf-profile.children.cycles-pp.do_group_exit
0.86 ± 8% +107.6% 1.78 ± 11% perf-profile.children.cycles-pp.do_page_fault
1.33 ± 3% +54.5% 2.06 ± 10% perf-profile.children.cycles-pp.do_syscall_64
1.92 ± 7% +52.2% 2.92 ± 8% perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
1.21 ± 11% +65.0% 1.99 ± 7% perf-profile.children.cycles-pp.exit_mmap
0.40 ± 16% +120.1% 0.88 ± 16% perf-profile.children.cycles-pp.filemap_map_pages
0.70 ± 8% +111.0% 1.48 ± 13% perf-profile.children.cycles-pp.handle_mm_fault
0.85 ± 5% +73.1% 1.46 ± 10% perf-profile.children.cycles-pp.hrtimer_interrupt
32.86 ± 0% +15.8% 38.04 ± 3% perf-profile.children.cycles-pp.intel_idle
0.73 ± 20% +48.3% 1.08 ± 5% perf-profile.children.cycles-pp.irq_exit
0.90 ± 5% +76.2% 1.59 ± 9% perf-profile.children.cycles-pp.local_apic_timer_interrupt
1.22 ± 12% +64.1% 2.00 ± 7% perf-profile.children.cycles-pp.mmput
0.86 ± 8% +107.6% 1.79 ± 11% perf-profile.children.cycles-pp.page_fault
58.52 ± 0% -17.4% 48.32 ± 4% perf-profile.children.cycles-pp.poll_idle
1.33 ± 3% +54.5% 2.06 ± 10% perf-profile.children.cycles-pp.return_from_SYSCALL_64
2.46 ± 8% +56.5% 3.86 ± 6% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
1.32 ± 4% +54.0% 2.02 ± 10% perf-profile.children.cycles-pp.sys_clone
1.64 ± 9% +50.5% 2.47 ± 7% perf-profile.children.cycles-pp.sys_exit_group
32.85 ± 0% +15.8% 38.04 ± 3% perf-profile.self.cycles-pp.intel_idle
58.52 ± 0% -17.4% 48.32 ± 4% perf-profile.self.cycles-pp.poll_idle
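For readers new to these comparison tables: each row shows the base commit's mean and relative stddev on the left, the %change of the means in the middle, and the new commit's mean and relative stddev on the right, over repeated runs. A minimal sketch of that summarization (my reading of the table format, not the actual lkp-tests code):

```python
import statistics

def compare(base_samples, new_samples):
    """Summarize two groups of repeated runs the way the LKP
    comparison table appears to: mean and relative stddev (%)
    per group, plus the %change between the two means."""
    b = statistics.mean(base_samples)
    n = statistics.mean(new_samples)

    def rsd(xs):
        # relative standard deviation, as a percentage of the mean
        return 100 * statistics.stdev(xs) / statistics.mean(xs) if len(xs) > 1 else 0.0

    return {
        "base": b,
        "base_stddev%": rsd(base_samples),
        "change%": 100 * (n - b) / b,
        "new": n,
        "new_stddev%": rsd(new_samples),
    }
```

For the headline row, means of roughly 5463 and 5308 ops/sec give 100 * (5308 - 5463) / 5463, which is about -2.8%, matching the table.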
perf-stat.dTLB-loads
1.2e+12 ++----------------------------------------------------------------+
| .* |
1e+12 *+ .*. + .*. .*. .*.. .*. |
| * *.*.*.* * *.*..*.* * * *.*.*..*.* *.*..*.*.*
| : : : : |
8e+11 ++: : : : |
| : : O O O : : |
6e+11 ++O: :O O O O O O O O O O : : |
O :O : O O O : : |
4e+11 ++ : : : : |
| : : : : |
| : : : : |
2e+11 ++ :: : |
| : : |
0 ++--*--------------------------------------------------*----------+
perf-stat.node-loads
3.5e+07 ++----------------------------------------------------------------+
| |
3e+07 ++ O O O O O O |
O O O O O O O O O |
2.5e+07 ++ O O O O |
| |
2e+07 ++ |
| |
1.5e+07 ++ |
*.* *.*. .*.*..*.*.*.*.*..*.*.*.*.*..*.*.*.*.*..*.* *.*..*.*.*
1e+07 ++: : * : : |
| : : : : |
5e+06 ++ : : : : |
| :: : |
0 ++--*--------------------------------------------------*----------+
perf-stat.cpu-migrations
120000 ++-----------------------------------------------------------------+
| O O O O |
100000 O+O O O O O O O O O O O O O O |
| |
*.* *.*. .*..*.*.*.*..*.*.*.*..*.*.*.*.*..*.*.*.*..* *.*..*.*.*
80000 ++: : * : : |
| : : : : |
60000 ++ : : : : |
| : : : : |
40000 ++ : : : : |
| : : : : |
| : : : : |
20000 ++ :: : |
| : : |
0 ++--*---------------------------------------------------*----------+
perf-stat.node-load-miss-rate%
100 O+O--O-O-O--O-O-O--O-O-O--O-O-O--O-O-O--O-O-*--*-*-*--*-*----*-*-*--*-*
90 ++: : : : |
| : : : : |
80 ++ : : : : |
70 ++ : : : : |
| : : : : |
60 ++ : : : : |
50 ++ : : : : |
40 ++ : : : : |
| : : : : |
30 ++ : : : : |
20 ++ :: :: |
| : : |
10 ++ : : |
0 ++---*----------------------------------------------------*-----------+
turbostat.Avg_MHz
400 ++--------------------------------------------------------------------+
| .*..* .* |
350 *+* *.*..*.* + .*.. .* + .*.*.*..*. .*..*.* *. .*
300 ++: : * *.*.*. *. * : : *.*..* |
| : : : : |
250 ++ : : : : |
| : : O O O O O : : |
200 O+O: O:O O O O O O O O O O : : |
| : : O : : |
150 ++ : : : : |
100 ++ : : : : |
| : : :: |
50 ++ : :: |
| : : |
0 ++---*----------------------------------------------------*-----------+
turbostat.%Busy
14 ++---------------------------------------------------------------------+
*. .*.*. .*. .*. |
12 ++* *.*..*.*. *. *.*..*.*. *.*..*.*.*..*.*.*..* *.*.*..*.*
| : : : : |
10 ++ : : : : |
| : : : : |
8 ++ : : O O O O O : : |
O O: O:O O O O O O O O O O : : |
6 ++ : : O : : |
| : : : : |
4 ++ : : : : |
| : : :: |
2 ++ : :: |
| : : |
0 ++---*-----------------------------------------------------*-----------+
aim9.time.involuntary_context_switches
25000 ++------------------------------------------------------------------+
| |
O O O O O O O O O O O O O O O O |
20000 ++ O O O |
| |
| |
15000 ++ |
| |
10000 ++ *. .*. .*..*. .*.* |
*.* : *.*. *.* *.*.*..*.*.*..*.*.*.*..*.* : *.*.*..*.*
| : : : : |
5000 ++ : : : : |
| : : : : |
| :: : |
0 ++---*---------------------------------------------------*----------+
[*] bisect-good sample
[O] bisect-bad sample
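As a rough illustration of how those markers could be assigned (a simplified stand-in for however the 0-day bot actually classifies runs, not its real logic), a sample can be labeled by which group mean it sits closer to:

```python
def classify(sample, good_mean, bad_mean):
    """Label a run as bisect-good (*) or bisect-bad (O) by
    nearest group mean -- a hypothetical heuristic, not the
    bot's actual decision procedure."""
    return "*" if abs(sample - good_mean) <= abs(sample - bad_mean) else "O"
```

With the fork_test means above (5463 good, 5308 bad), a run at 5450 ops/sec would land in the good cluster and one at 5310 in the bad cluster, which is the separation visible in the graphs.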
Thanks,
Xiaolong