FYI, we noticed a 95.9% improvement of pigz.throughput due to commit:
commit 0329dacce87d6741b391d73cab1ec831a3262d6b ("sched/fair: Fix effective_load()")
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent
in testcase: pigz
on test machine: Sandy Bridge-EP with 32G memory
with the following parameters: blocksize=128K/cpufreq_governor=performance/nr_threads=100%
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached to this email
bin/lkp run job.yaml
=========================================================================================
blocksize/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/testcase:
128K/gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/lkp-snb01/pigz
commit:
feb245e304 ("sched/core: Allow kthreads to fall back to online && !active cpus")
0329dacce8 ("sched/fair: Fix effective_load()")
feb245e304f343cf 0329dacce87d6741b391d73cab
---------------- --------------------------
%stddev %change %stddev
\ | \
1.689e+08 ± 0% +95.9% 3.308e+08 ± 0% pigz.throughput
404296 ± 1% +262.9% 1467248 ± 6% pigz.time.involuntary_context_switches
495105 ± 12% -26.2% 365286 ± 2% pigz.time.minor_page_faults
1411 ± 0% +102.6% 2860 ± 0% pigz.time.percent_of_cpu_this_job_got
53.11 ± 0% +96.0% 104.11 ± 0% pigz.time.system_time
4186 ± 0% +102.6% 8481 ± 0% pigz.time.user_time
1634887 ± 0% +138.2% 3894693 ± 0% pigz.time.voluntary_context_switches
13893 ± 7% +16.1% 16136 ± 6% meminfo.AnonHugePages
11155 ± 1% +17.9% 13152 ± 4% meminfo.Shmem
29496 ± 41% -66.7% 9811 ± 18% latency_stats.sum.call_rwsem_down_write_failed_killable.SyS_mprotect.entry_SYSCALL_64_fastpath
345745 ± 7% -83.7% 56432 ± 65% latency_stats.sum.call_rwsem_down_write_failed_killable.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
45562179 ± 0% +223.8% 1.475e+08 ± 4% latency_stats.sum.pipe_wait.pipe_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
262132 ± 7% +175.1% 721086 ± 6% softirqs.RCU
222907 ± 5% -13.4% 193136 ± 3% softirqs.SCHED
2242938 ± 1% +96.2% 4399919 ± 1% softirqs.TIMER
21.75 ± 1% +37.9% 30.00 ± 0% vmstat.procs.r
11427 ± 0% +123.2% 25506 ± 0% vmstat.system.cs
35564 ± 0% +6.6% 37905 ± 0% vmstat.system.in
1556 ± 6% +25.5% 1953 ± 7% numa-meminfo.node0.PageTables
3292 ±127% +185.5% 9397 ± 32% numa-meminfo.node0.Shmem
6724 ± 52% -62.8% 2498 ±140% numa-meminfo.node1.Inactive(anon)
2249 ± 4% -16.3% 1882 ± 8% numa-meminfo.node1.PageTables
24242 ± 6% -15.5% 20477 ± 13% numa-meminfo.node1.SReclaimable
8337913 ± 9% +101.4% 16794745 ± 1% numa-numastat.node0.local_node
8337914 ± 9% +101.4% 16794750 ± 1% numa-numastat.node0.numa_hit
0.75 ±110% +500.0% 4.50 ± 45% numa-numastat.node0.other_node
8752444 ± 8% +80.6% 15803230 ± 1% numa-numastat.node1.local_node
8752446 ± 8% +80.6% 15803231 ± 1% numa-numastat.node1.numa_hit
44132035 ± 1% -17.8% 36258196 ± 6% cpuidle.C1-SNB.time
57897133 ± 0% +160.8% 1.51e+08 ± 6% cpuidle.C1E-SNB.time
445452 ± 0% +151.6% 1120681 ± 6% cpuidle.C1E-SNB.usage
52296865 ± 0% +99.7% 1.045e+08 ± 6% cpuidle.C3-SNB.time
189453 ± 0% +103.8% 386071 ± 6% cpuidle.C3-SNB.usage
4.957e+09 ± 0% -85.9% 6.996e+08 ± 6% cpuidle.C7-SNB.time
5532860 ± 0% -81.4% 1030940 ± 6% cpuidle.C7-SNB.usage
12144693 ± 13% -80.7% 2338103 ± 13% cpuidle.POLL.time
4589 ± 2% +45.9% 6698 ± 5% cpuidle.POLL.usage
388.25 ± 6% +25.8% 488.25 ± 7% numa-vmstat.node0.nr_page_table_pages
822.75 ±127% +185.5% 2349 ± 32% numa-vmstat.node0.nr_shmem
4359980 ± 12% +92.1% 8377668 ± 1% numa-vmstat.node0.numa_hit
4359979 ± 12% +92.1% 8377665 ± 1% numa-vmstat.node0.numa_local
0.25 ±173% +1000.0% 2.75 ± 30% numa-vmstat.node0.numa_other
1680 ± 52% -62.9% 624.25 ±140% numa-vmstat.node1.nr_inactive_anon
561.75 ± 4% -16.3% 470.25 ± 8% numa-vmstat.node1.nr_page_table_pages
6060 ± 6% -15.5% 5119 ± 13% numa-vmstat.node1.nr_slab_reclaimable
4386634 ± 12% +84.9% 8110181 ± 1% numa-vmstat.node1.numa_hit
4386632 ± 12% +84.9% 8110180 ± 1% numa-vmstat.node1.numa_local
45.60 ± 0% +96.1% 89.40 ± 0% turbostat.%Busy
1392 ± 0% +98.7% 2766 ± 0% turbostat.Avg_MHz
47.72 ± 0% -81.4% 8.88 ± 6% turbostat.CPU%c1
1.26 ± 3% -66.7% 0.42 ± 6% turbostat.CPU%c3
5.42 ± 2% -76.2% 1.29 ± 7% turbostat.CPU%c7
103.49 ± 0% +64.6% 170.30 ± 0% turbostat.CorWatt
52.50 ± 2% +16.7% 61.25 ± 2% turbostat.CoreTmp
0.52 ± 6% -94.7% 0.03 ± 15% turbostat.Pkg%pc2
51.75 ± 2% +18.8% 61.50 ± 1% turbostat.PkgTmp
130.10 ± 0% +51.5% 197.10 ± 0% turbostat.PkgWatt
2788 ± 1% +17.9% 3287 ± 4% proc-vmstat.nr_shmem
455876 ± 13% -28.2% 327389 ± 2% proc-vmstat.numa_hint_faults
322855 ± 16% -37.3% 202339 ± 2% proc-vmstat.numa_hint_faults_local
17088791 ± 0% +90.7% 32595705 ± 0% proc-vmstat.numa_hit
17088788 ± 0% +90.7% 32595700 ± 0% proc-vmstat.numa_local
53382 ± 5% -16.2% 44721 ± 6% proc-vmstat.numa_pages_migrated
627932 ± 14% -26.1% 464325 ± 2% proc-vmstat.numa_pte_updates
1779 ± 5% +44.6% 2573 ± 9% proc-vmstat.pgactivate
1575353 ± 11% +104.0% 3214030 ± 1% proc-vmstat.pgalloc_dma32
15572701 ± 1% +89.0% 29435610 ± 0% proc-vmstat.pgalloc_normal
1133203 ± 5% -11.7% 1001016 ± 0% proc-vmstat.pgfault
17133382 ± 0% +90.5% 32634133 ± 0% proc-vmstat.pgfree
53382 ± 5% -16.2% 44721 ± 6% proc-vmstat.pgmigrate_success
3.183e+11 ± 0% +97.2% 6.277e+11 ± 0% perf-stat.L1-dcache-load-misses
4.312e+12 ± 0% +93.4% 8.339e+12 ± 0% perf-stat.L1-dcache-loads
1.881e+10 ± 0% +96.3% 3.693e+10 ± 0% perf-stat.L1-dcache-prefetch-misses
9.274e+10 ± 0% +99.5% 1.851e+11 ± 2% perf-stat.L1-dcache-store-misses
1.865e+12 ± 0% +94.6% 3.629e+12 ± 0% perf-stat.L1-dcache-stores
2.966e+09 ± 0% +57.6% 4.674e+09 ± 0% perf-stat.L1-icache-load-misses
8.855e+08 ± 0% +157.4% 2.279e+09 ± 3% perf-stat.LLC-load-misses
5.42e+10 ± 0% +116.6% 1.174e+11 ± 1% perf-stat.LLC-loads
5.889e+08 ± 2% +18.3% 6.969e+08 ± 1% perf-stat.LLC-prefetch-misses
2.012e+10 ± 0% +100.6% 4.035e+10 ± 1% perf-stat.LLC-prefetches
1.849e+09 ± 3% +57.5% 2.912e+09 ± 1% perf-stat.LLC-store-misses
9.486e+09 ± 1% +94.6% 1.846e+10 ± 0% perf-stat.LLC-stores
1.865e+12 ± 1% +94.0% 3.616e+12 ± 0% perf-stat.branch-instructions
1.111e+11 ± 0% +96.3% 2.18e+11 ± 0% perf-stat.branch-load-misses
1.848e+12 ± 0% +95.7% 3.616e+12 ± 0% perf-stat.branch-loads
1.123e+11 ± 0% +95.4% 2.194e+11 ± 0% perf-stat.branch-misses
4.26e+11 ± 0% +100.0% 8.519e+11 ± 1% perf-stat.bus-cycles
1.64e+09 ± 0% +233.7% 5.474e+09 ± 51% perf-stat.cache-misses
6.13e+10 ± 1% +118.4% 1.339e+11 ± 0% perf-stat.cache-references
3457574 ± 0% +122.7% 7698507 ± 0% perf-stat.context-switches
1.322e+13 ± 0% +99.7% 2.641e+13 ± 0% perf-stat.cpu-cycles
4.322e+12 ± 1% +93.6% 8.369e+12 ± 0% perf-stat.dTLB-loads
1.867e+12 ± 0% +94.3% 3.628e+12 ± 0% perf-stat.dTLB-stores
1.348e+13 ± 1% +95.8% 2.639e+13 ± 0% perf-stat.instructions
1120896 ± 5% -12.0% 986347 ± 0% perf-stat.minor-faults
5.064e+08 ± 11% +245.9% 1.752e+09 ± 3% perf-stat.node-load-misses
9.531e+08 ± 9% +178.3% 2.653e+09 ± 16% perf-stat.node-loads
2.826e+08 ± 3% +60.9% 4.548e+08 ± 2% perf-stat.node-prefetch-misses
5.825e+08 ± 0% +17.8% 6.859e+08 ± 2% perf-stat.node-prefetches
8.244e+08 ± 4% +138.1% 1.963e+09 ± 0% perf-stat.node-store-misses
1.881e+09 ± 2% +49.6% 2.815e+09 ± 1% perf-stat.node-stores
1120891 ± 5% -12.0% 986320 ± 0% perf-stat.page-faults
1.149e+13 ± 0% +99.2% 2.29e+13 ± 0% perf-stat.ref-cycles
5.774e+12 ± 1% +66.8% 9.628e+12 ± 0% perf-stat.stalled-cycles-backend
9.923e+12 ± 0% +86.2% 1.848e+13 ± 0% perf-stat.stalled-cycles-frontend
66057 ± 0% +102.5% 133791 ± 0% sched_debug.cfs_rq:/.exec_clock.avg
80710 ± 4% +68.0% 135631 ± 0% sched_debug.cfs_rq:/.exec_clock.max
54468 ± 7% +141.7% 131665 ± 0% sched_debug.cfs_rq:/.exec_clock.min
8796 ± 36% -86.4% 1195 ± 11% sched_debug.cfs_rq:/.exec_clock.stddev
499977 ± 5% +60.9% 804652 ± 3% sched_debug.cfs_rq:/.load.avg
507458 ± 0% -45.6% 275869 ± 14% sched_debug.cfs_rq:/.load.stddev
363.39 ± 1% +105.2% 745.57 ± 1% sched_debug.cfs_rq:/.load_avg.avg
8.12 ± 57% +5597.9% 462.96 ± 6% sched_debug.cfs_rq:/.load_avg.min
361.56 ± 3% -66.5% 121.09 ± 10% sched_debug.cfs_rq:/.load_avg.stddev
68713 ± 0% +102.4% 139061 ± 0% sched_debug.cfs_rq:/.min_vruntime.avg
92025 ± 6% +64.1% 150977 ± 1% sched_debug.cfs_rq:/.min_vruntime.max
55624 ± 7% +144.2% 135862 ± 0% sched_debug.cfs_rq:/.min_vruntime.min
10246 ± 31% -68.7% 3210 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev
0.49 ± 5% +62.3% 0.79 ± 3% sched_debug.cfs_rq:/.nr_running.avg
0.49 ± 0% -45.4% 0.27 ± 14% sched_debug.cfs_rq:/.nr_running.stddev
286.06 ± 8% +128.7% 654.30 ± 6% sched_debug.cfs_rq:/.runnable_load_avg.avg
336.03 ± 2% -43.5% 189.87 ± 9% sched_debug.cfs_rq:/.runnable_load_avg.stddev
-29098 ±-21% -57.1% -12469 ± -8% sched_debug.cfs_rq:/.spread0.min
10247 ± 31% -68.6% 3212 ± 9% sched_debug.cfs_rq:/.spread0.stddev
379.13 ± 1% +105.1% 777.77 ± 1% sched_debug.cfs_rq:/.util_avg.avg
12.42 ± 62% +3921.5% 499.33 ± 5% sched_debug.cfs_rq:/.util_avg.min
351.40 ± 2% -71.8% 99.21 ± 7% sched_debug.cfs_rq:/.util_avg.stddev
723080 ± 4% -39.2% 439473 ± 10% sched_debug.cpu.avg_idle.avg
281945 ± 3% -12.7% 246227 ± 5% sched_debug.cpu.avg_idle.stddev
1.64 ± 1% +130.3% 3.77 ± 20% sched_debug.cpu.clock.stddev
1.64 ± 1% +130.3% 3.77 ± 20% sched_debug.cpu.clock_task.stddev
273.36 ± 7% +133.2% 637.46 ± 7% sched_debug.cpu.cpu_load[0].avg
333.65 ± 2% -40.9% 197.33 ± 12% sched_debug.cpu.cpu_load[0].stddev
319.03 ± 2% +120.4% 703.30 ± 3% sched_debug.cpu.cpu_load[1].avg
0.12 ±110% +2.3e+05% 281.92 ± 23% sched_debug.cpu.cpu_load[1].min
326.18 ± 2% -58.2% 136.38 ± 14% sched_debug.cpu.cpu_load[1].stddev
309.05 ± 2% +127.6% 703.52 ± 2% sched_debug.cpu.cpu_load[2].avg
0.33 ±122% +94175.0% 314.25 ± 13% sched_debug.cpu.cpu_load[2].min
318.54 ± 2% -61.0% 124.20 ± 10% sched_debug.cpu.cpu_load[2].stddev
302.04 ± 2% +131.3% 698.60 ± 2% sched_debug.cpu.cpu_load[3].avg
0.29 ± 84% +1.1e+05% 321.58 ± 9% sched_debug.cpu.cpu_load[3].min
311.42 ± 2% -62.2% 117.84 ± 8% sched_debug.cpu.cpu_load[3].stddev
299.24 ± 2% +130.7% 690.20 ± 2% sched_debug.cpu.cpu_load[4].avg
0.21 ±131% +1.6e+05% 340.67 ± 10% sched_debug.cpu.cpu_load[4].min
306.33 ± 2% -62.8% 113.99 ± 7% sched_debug.cpu.cpu_load[4].stddev
1836 ± 8% +72.3% 3164 ± 7% sched_debug.cpu.curr->pid.avg
1948 ± 0% -38.9% 1190 ± 16% sched_debug.cpu.curr->pid.stddev
494648 ± 5% +62.4% 803296 ± 3% sched_debug.cpu.load.avg
507009 ± 0% -45.4% 277011 ± 14% sched_debug.cpu.load.stddev
92293 ± 1% +56.1% 144114 ± 0% sched_debug.cpu.nr_load_updates.avg
103158 ± 3% +42.8% 147350 ± 1% sched_debug.cpu.nr_load_updates.max
80689 ± 3% +75.2% 141389 ± 1% sched_debug.cpu.nr_load_updates.min
7246 ± 30% -74.5% 1849 ± 3% sched_debug.cpu.nr_load_updates.stddev
0.72 ± 9% +19.7% 0.86 ± 3% sched_debug.cpu.nr_running.avg
4.58 ± 14% -58.2% 1.92 ± 4% sched_debug.cpu.nr_running.max
1.04 ± 10% -60.0% 0.42 ± 7% sched_debug.cpu.nr_running.stddev
54842 ± 1% +120.8% 121113 ± 1% sched_debug.cpu.nr_switches.avg
83702 ± 6% +81.6% 151983 ± 2% sched_debug.cpu.nr_switches.max
37499 ± 5% +173.0% 102356 ± 2% sched_debug.cpu.nr_switches.min
20.96 ± 9% -37.2% 13.17 ± 15% sched_debug.cpu.nr_uninterruptible.max
-32.58 ±-18% -53.7% -15.08 ±-35% sched_debug.cpu.nr_uninterruptible.min
11.70 ± 6% -49.4% 5.92 ± 11% sched_debug.cpu.nr_uninterruptible.stddev
57158 ± 1% +114.8% 122752 ± 1% sched_debug.cpu.sched_count.avg
138444 ± 17% +32.5% 183397 ± 9% sched_debug.cpu.sched_count.max
37234 ± 6% +173.6% 101869 ± 2% sched_debug.cpu.sched_count.min
18192 ± 1% +66.3% 30260 ± 6% sched_debug.cpu.sched_goidle.avg
31689 ± 8% +39.5% 44207 ± 4% sched_debug.cpu.sched_goidle.max
11469 ± 5% +104.6% 23471 ± 6% sched_debug.cpu.sched_goidle.min
28714 ± 1% +127.8% 65424 ± 1% sched_debug.cpu.ttwu_count.avg
40305 ± 10% +112.9% 85821 ± 6% sched_debug.cpu.ttwu_count.max
20231 ± 4% +175.8% 55807 ± 1% sched_debug.cpu.ttwu_count.min
8226 ± 1% +235.8% 27625 ± 5% sched_debug.cpu.ttwu_local.avg
10307 ± 4% +224.0% 33392 ± 4% sched_debug.cpu.ttwu_local.max
6505 ± 6% +275.7% 24443 ± 4% sched_debug.cpu.ttwu_local.min
1145 ± 37% +64.4% 1882 ± 7% sched_debug.cpu.ttwu_local.stddev
0.54 ± 25% +55.9% 0.85 ± 6% sched_debug.rt_rq:/.rt_time.max
0.12 ± 34% +45.6% 0.17 ± 11% sched_debug.rt_rq:/.rt_time.stddev
2.64 ± 11% +101.9% 5.33 ± 11% perf-profile.cycles.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate
1.65 ± 15% +78.2% 2.95 ± 10% perf-profile.cycles.__alloc_pages_nodemask.alloc_pages_current.pipe_write.__vfs_write.vfs_write
1.56 ± 29% -54.7% 0.71 ± 70% perf-profile.cycles.__const_udelay.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write
1.94 ± 12% -100.0% 0.00 ± -1% perf-profile.cycles.__do_softirq.irq_exit.scheduler_ipi.smp_reschedule_interrupt.reschedule_interrupt
0.32 ±100% +391.4% 1.57 ± 22% perf-profile.cycles.__do_softirq.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
9.07 ± 7% +39.8% 12.69 ± 5% perf-profile.cycles.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
1.17 ± 28% +74.8% 2.05 ± 14% perf-profile.cycles.__put_page.anon_pipe_buf_release.pipe_read.__vfs_read.vfs_read
1.23 ± 17% -100.0% 0.00 ± -1% perf-profile.cycles.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
17.38 ± 12% +54.2% 26.80 ± 1% perf-profile.cycles.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
11.99 ± 8% +53.8% 18.43 ± 5% perf-profile.cycles.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
0.70 ± 63% +118.7% 1.52 ± 14% perf-profile.cycles.__wake_up_common.__wake_up_sync_key.pipe_read.__vfs_read.vfs_read
0.70 ± 63% +131.3% 1.61 ± 17% perf-profile.cycles.__wake_up_sync_key.pipe_read.__vfs_read.vfs_read.sys_read
1.69 ± 23% -100.0% 0.00 ± -1% perf-profile.cycles._raw_spin_lock.tick_do_update_jiffies64.tick_irq_enter.irq_enter.smp_apic_timer_interrupt
0.00 ± -1% +Inf% 1.54 ± 14% perf-profile.cycles.activate_task.ttwu_do_activate.sched_ttwu_pending.cpu_startup_entry.start_secondary
0.36 ±100% +220.8% 1.15 ± 4% perf-profile.cycles.activate_task.ttwu_do_activate.try_to_wake_up.default_wake_function.autoremove_wake_function
1.09 ± 17% +83.1% 2.00 ± 14% perf-profile.cycles.activate_task.ttwu_do_activate.try_to_wake_up.wake_up_q.futex_requeue
1.86 ± 16% +81.1% 3.37 ± 7% perf-profile.cycles.alloc_pages_current.pipe_write.__vfs_write.vfs_write.sys_write
0.45 ± 59% +180.0% 1.26 ± 11% perf-profile.cycles.anon_pipe_buf_release.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
1.22 ± 23% +75.4% 2.14 ± 12% perf-profile.cycles.anon_pipe_buf_release.pipe_read.__vfs_read.vfs_read.sys_read
11.52 ± 4% +65.0% 19.02 ± 9% perf-profile.cycles.apic_timer_interrupt
9.30 ± 9% -76.1% 2.23 ± 17% perf-profile.cycles.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
0.70 ± 63% +118.7% 1.52 ± 14% perf-profile.cycles.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.pipe_read.__vfs_read
5.55 ± 30% -49.0% 2.83 ± 56% perf-profile.cycles.call_console_drivers.constprop.23.console_unlock.vprintk_emit.vprintk_default.printk
2.76 ± 41% -87.1% 0.35 ±101% perf-profile.cycles.call_cpuidle.cpu_startup_entry.rest_init.start_kernel.x86_64_start_reservations
29.61 ± 7% -74.2% 7.63 ± 16% perf-profile.cycles.call_cpuidle.cpu_startup_entry.start_secondary
5.55 ± 30% -49.0% 2.83 ± 56% perf-profile.cycles.console_unlock.vprintk_emit.vprintk_default.printk.perf_duration_warn
6.93 ± 9% +65.5% 11.46 ± 7% perf-profile.cycles.copy_page_from_iter.pipe_write.__vfs_write.vfs_write.sys_write
9.86 ± 13% +61.9% 15.95 ± 0% perf-profile.cycles.copy_page_to_iter.pipe_read.__vfs_read.vfs_read.sys_read
6.19 ± 9% +69.5% 10.49 ± 8% perf-profile.cycles.copy_user_generic_string.copy_page_from_iter.pipe_write.__vfs_write.vfs_write
9.35 ± 15% +62.1% 15.15 ± 0% perf-profile.cycles.copy_user_generic_string.copy_page_to_iter.pipe_read.__vfs_read.vfs_read
2.84 ± 38% -82.5% 0.50 ±107% perf-profile.cycles.cpu_startup_entry.rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel
33.50 ± 6% -61.1% 13.03 ± 14% perf-profile.cycles.cpu_startup_entry.start_secondary
2.76 ± 41% -87.1% 0.35 ±101% perf-profile.cycles.cpuidle_enter.call_cpuidle.cpu_startup_entry.rest_init.start_kernel
29.51 ± 7% -74.4% 7.56 ± 16% perf-profile.cycles.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
1.27 ± 75% -100.0% 0.00 ± -1% perf-profile.cycles.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry.rest_init
19.02 ± 14% -74.7% 4.81 ± 16% perf-profile.cycles.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
0.97 ± 21% -44.0% 0.54 ± 62% perf-profile.cycles.cpuidle_select.cpu_startup_entry.start_secondary
0.68 ± 61% +124.4% 1.52 ± 14% perf-profile.cycles.default_wake_function.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.pipe_read
1.56 ± 29% -54.7% 0.71 ± 70% perf-profile.cycles.delay_tsc.__const_udelay.wait_for_xmitr.serial8250_console_putchar.uart_console_write
9.03 ± 6% +27.8% 11.54 ± 5% perf-profile.cycles.do_futex.sys_futex.entry_SYSCALL_64_fastpath
2.43 ± 11% +95.6% 4.75 ± 17% perf-profile.cycles.dump_trace.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
0.00 ± -1% +Inf% 1.48 ± 11% perf-profile.cycles.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending
3.30 ± 14% +52.4% 5.03 ± 7% perf-profile.cycles.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
0.00 ± -1% +Inf% 1.51 ± 12% perf-profile.cycles.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending.cpu_startup_entry
0.36 ±100% +217.4% 1.14 ± 4% perf-profile.cycles.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.default_wake_function
3.02 ± 7% +34.9% 4.08 ± 8% perf-profile.cycles.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.wake_up_q
39.66 ± 6% +48.3% 58.83 ± 2% perf-profile.cycles.entry_SYSCALL_64_fastpath
1.31 ± 20% -73.0% 0.35 ±104% perf-profile.cycles.find_busiest_group.load_balance.pick_next_task_fair.__schedule.schedule
1.10 ± 31% +66.7% 1.83 ± 15% perf-profile.cycles.free_hot_cold_page.__put_page.anon_pipe_buf_release.pipe_read.__vfs_read
0.36 ±103% +182.9% 1.03 ± 20% perf-profile.cycles.free_pcppages_bulk.free_hot_cold_page.__put_page.anon_pipe_buf_release.pipe_read
1.81 ± 28% +101.7% 3.66 ± 10% perf-profile.cycles.futex_requeue.do_futex.sys_futex.entry_SYSCALL_64_fastpath
4.11 ± 4% -11.5% 3.64 ± 6% perf-profile.cycles.futex_wait.do_futex.sys_futex.entry_SYSCALL_64_fastpath
3.71 ± 5% -14.8% 3.17 ± 8% perf-profile.cycles.futex_wait_queue_me.futex_wait.do_futex.sys_futex.entry_SYSCALL_64_fastpath
2.99 ± 14% +25.1% 3.75 ± 8% perf-profile.cycles.futex_wake.do_futex.sys_futex.entry_SYSCALL_64_fastpath
1.04 ± 27% +107.9% 2.17 ± 11% perf-profile.cycles.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_current.pipe_write.__vfs_write
9.14 ± 7% +61.1% 14.72 ± 9% perf-profile.cycles.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
2.44 ± 5% -63.4% 0.90 ± 20% perf-profile.cycles.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter
14.30 ± 5% -73.8% 3.74 ± 10% perf-profile.cycles.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
3.37 ± 35% -47.0% 1.78 ± 63% perf-profile.cycles.io_serial_in.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write
3.49 ± 11% -89.5% 0.36 ±103% perf-profile.cycles.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
1.96 ± 11% -93.0% 0.14 ±173% perf-profile.cycles.irq_exit.scheduler_ipi.smp_reschedule_interrupt.reschedule_interrupt.cpuidle_enter
0.77 ± 17% +150.8% 1.94 ± 24% perf-profile.cycles.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
1.62 ± 9% -100.0% 0.00 ± -1% perf-profile.cycles.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
4.66 ± 32% -39.2% 2.83 ± 56% perf-profile.cycles.irq_work_interrupt
4.66 ± 32% -39.2% 2.83 ± 56% perf-profile.cycles.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
4.66 ± 32% -39.2% 2.83 ± 56% perf-profile.cycles.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
1.63 ± 18% -54.8% 0.74 ± 28% perf-profile.cycles.load_balance.pick_next_task_fair.__schedule.schedule.futex_wait_queue_me
9.44 ± 7% +60.2% 15.12 ± 9% perf-profile.cycles.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
2.76 ± 3% -66.1% 0.93 ± 17% perf-profile.cycles.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
1.69 ± 23% -100.0% 0.00 ± -1% perf-profile.cycles.native_queued_spin_lock_slowpath._raw_spin_lock.tick_do_update_jiffies64.tick_irq_enter.irq_enter
5.55 ± 30% -49.0% 2.83 ± 56% perf-profile.cycles.perf_duration_warn.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
2.04 ± 12% -51.8% 0.99 ± 14% perf-profile.cycles.pick_next_task_fair.__schedule.schedule.futex_wait_queue_me.futex_wait
16.68 ± 12% +52.3% 25.41 ± 1% perf-profile.cycles.pipe_read.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
11.94 ± 8% +53.7% 18.35 ± 6% perf-profile.cycles.pipe_write.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
4.71 ± 72% -86.2% 0.65 ± 60% perf-profile.cycles.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
2.08 ± 2% +106.1% 4.29 ± 16% perf-profile.cycles.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity
5.55 ± 30% -49.0% 2.83 ± 56% perf-profile.cycles.printk.perf_duration_warn.irq_work_run_list.irq_work_run.smp_irq_work_interrupt
0.63 ± 57% +61.0% 1.01 ± 13% perf-profile.cycles.rcu_check_callbacks.update_process_times.tick_sched_handle.isra.17.tick_sched_timer.__hrtimer_run_queues
1.28 ± 28% -100.0% 0.00 ± -1% perf-profile.cycles.rebalance_domains.run_rebalance_domains.__do_softirq.irq_exit.scheduler_ipi
1.11 ± 16% -100.0% 0.00 ± -1% perf-profile.cycles.reschedule_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry.rest_init
0.89 ± 29% -84.6% 0.14 ±173% perf-profile.cycles.reschedule_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
2.84 ± 38% -82.5% 0.50 ±107% perf-profile.cycles.rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel
1.87 ± 12% -100.0% 0.00 ± -1% perf-profile.cycles.run_rebalance_domains.__do_softirq.irq_exit.scheduler_ipi.smp_reschedule_interrupt
2.43 ± 11% +98.3% 4.82 ± 15% perf-profile.cycles.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.activate_task
0.00 ± -1% +Inf% 1.66 ± 14% perf-profile.cycles.sched_ttwu_pending.cpu_startup_entry.start_secondary
2.00 ± 15% -93.1% 0.14 ±173% perf-profile.cycles.scheduler_ipi.smp_reschedule_interrupt.reschedule_interrupt.cpuidle_enter.call_cpuidle
4.08 ± 12% +45.2% 5.92 ± 11% perf-profile.cycles.scheduler_tick.update_process_times.tick_sched_handle.isra.17.tick_sched_timer.__hrtimer_run_queues
5.09 ± 30% -48.8% 2.61 ± 56% perf-profile.cycles.serial8250_console_putchar.uart_console_write.serial8250_console_write.univ8250_console_write.call_console_drivers.constprop.23
5.24 ± 30% -48.8% 2.68 ± 56% perf-profile.cycles.serial8250_console_write.univ8250_console_write.call_console_drivers.constprop.23.console_unlock.vprintk_emit
10.51 ± 7% +68.2% 17.68 ± 10% perf-profile.cycles.smp_apic_timer_interrupt.apic_timer_interrupt
8.47 ± 7% -75.4% 2.08 ± 18% perf-profile.cycles.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry
4.66 ± 32% -39.2% 2.83 ± 56% perf-profile.cycles.smp_irq_work_interrupt.irq_work_interrupt
2.00 ± 15% -93.1% 0.14 ±173% perf-profile.cycles.smp_reschedule_interrupt.reschedule_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry
2.84 ± 38% -82.5% 0.50 ±107% perf-profile.cycles.start_kernel.x86_64_start_reservations.x86_64_start_kernel
34.02 ± 5% -61.3% 13.16 ± 14% perf-profile.cycles.start_secondary
9.09 ± 5% +27.8% 11.62 ± 5% perf-profile.cycles.sys_futex.entry_SYSCALL_64_fastpath
17.60 ± 12% +54.3% 27.16 ± 1% perf-profile.cycles.sys_read.entry_SYSCALL_64_fastpath
12.16 ± 8% +54.0% 18.72 ± 5% perf-profile.cycles.sys_write.entry_SYSCALL_64_fastpath
2.32 ± 19% -100.0% 0.00 ± -1% perf-profile.cycles.tick_do_update_jiffies64.tick_irq_enter.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt
3.12 ± 15% -93.7% 0.20 ±173% perf-profile.cycles.tick_irq_enter.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter
1.30 ± 17% -100.0% 0.00 ± -1% perf-profile.cycles.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter
0.98 ± 21% -100.0% 0.00 ± -1% perf-profile.cycles.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt
6.01 ± 11% +38.7% 8.34 ± 6% perf-profile.cycles.tick_sched_handle.isra.17.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt
6.13 ± 10% +43.3% 8.79 ± 8% perf-profile.cycles.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt
0.68 ± 61% +124.4% 1.52 ± 14% perf-profile.cycles.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
1.58 ± 28% +98.1% 3.12 ± 10% perf-profile.cycles.try_to_wake_up.wake_up_q.futex_requeue.do_futex.sys_futex
0.00 ± -1% +Inf% 1.54 ± 14% perf-profile.cycles.ttwu_do_activate.sched_ttwu_pending.cpu_startup_entry.start_secondary
0.36 ±100% +237.5% 1.21 ± 2% perf-profile.cycles.ttwu_do_activate.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common
1.20 ± 19% +72.3% 2.07 ± 15% perf-profile.cycles.ttwu_do_activate.try_to_wake_up.wake_up_q.futex_requeue.do_futex
5.09 ± 30% -48.8% 2.61 ± 56% perf-profile.cycles.uart_console_write.serial8250_console_write.univ8250_console_write.call_console_drivers.constprop.23.console_unlock
5.24 ± 30% -48.8% 2.68 ± 56% perf-profile.cycles.univ8250_console_write.call_console_drivers.constprop.23.console_unlock.vprintk_emit.vprintk_default
5.88 ± 10% +37.5% 8.09 ± 8% perf-profile.cycles.update_process_times.tick_sched_handle.isra.17.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
1.14 ± 24% -82.9% 0.20 ±173% perf-profile.cycles.update_sd_lb_stats.find_busiest_group.load_balance.pick_next_task_fair.__schedule
17.51 ± 12% +54.4% 27.04 ± 1% perf-profile.cycles.vfs_read.sys_read.entry_SYSCALL_64_fastpath
12.16 ± 8% +53.0% 18.60 ± 5% perf-profile.cycles.vfs_write.sys_write.entry_SYSCALL_64_fastpath
5.55 ± 30% -49.0% 2.83 ± 56% perf-profile.cycles.vprintk_default.printk.perf_duration_warn.irq_work_run_list.irq_work_run
5.55 ± 30% -49.0% 2.83 ± 56% perf-profile.cycles.vprintk_emit.vprintk_default.printk.perf_duration_warn.irq_work_run_list
4.95 ± 29% -47.6% 2.59 ± 56% perf-profile.cycles.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write.univ8250_console_write
1.64 ± 28% +96.5% 3.22 ± 10% perf-profile.cycles.wake_up_q.futex_requeue.do_futex.sys_futex.entry_SYSCALL_64_fastpath
2.84 ± 38% -82.5% 0.50 ±107% perf-profile.cycles.x86_64_start_kernel
2.84 ± 38% -82.5% 0.50 ±107% perf-profile.cycles.x86_64_start_reservations.x86_64_start_kernel
pigz.throughput
3.6e+08 ++----------------------------------------------------------------+
3.4e+08 O+O O O O O O O OO O O O O O O O |
| O O O O |
3.2e+08 ++ |
3e+08 ++ |
| |
2.8e+08 ++ |
2.6e+08 ++ |
2.4e+08 ++ |
| |
2.2e+08 ++ |
2e+08 ++ |
| |
1.8e+08 *+*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.**.*.*.*.*.*.*.*.*
1.6e+08 ++----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
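
For readers who want to sanity-check the %change column in the comparison table above, here is a minimal Python sketch. It assumes %change is the relative difference between the two per-commit mean values; the helper name pct_change is hypothetical and is not part of lkp-tests. The inputs are the rounded means printed in the table, so results may differ in the last digit from figures computed on raw samples.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    if base == 0:
        # The table prints +Inf% when the parent-commit value is 0.00.
        # Returning 0.0 for the 0 -> 0 case is an assumption of this sketch.
        return float("inf") if new > 0 else 0.0
    return (new - base) / base * 100.0


# Mean values copied from the table above: (parent commit, tested commit).
rows = {
    "pigz.throughput": (1.689e08, 3.308e08),
    "pigz.time.minor_page_faults": (495105, 365286),
    "vmstat.system.cs": (11427, 25506),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+.1f}% {name}")
```

The same helper can be applied to any other row by pasting in its two mean values.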
Thanks,
Xiaolong