FYI, we noticed a 5.5% regression in will-it-scale.per_process_ops due to commit:
commit f0d9b9d4aa33f7ce9a0347256e5ffd91c8ad7c08 ("pt_regs_frame")
in https://github.com/jpoimboe/linux branch pt_regs_frame-nosave
in testcase: will-it-scale
on test machine: 16-thread Haswell high-end desktop (i7-5960X, 3.0 GHz) with 16 GB of memory
with the following parameters: cpufreq_governor=performance/test=signal1
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml  # the job file is attached to this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lituya/signal1/will-it-scale
commit:
4c195e79c9 ("x86: Fix thread_saved_pc()")
f0d9b9d4aa ("pt_regs_frame")
4c195e79c9fb2f1b f0d9b9d4aa33f7ce9a0347256e
---------------- --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         %stddev     %change         %stddev
             \          |                \
    825921 ±  0%      -5.5%     780374 ±  0%  will-it-scale.per_process_ops
    428112 ±  0%      -9.3%     388301 ±  0%  will-it-scale.per_thread_ops
      0.29 ±  0%     +14.3%       0.33 ±  0%  will-it-scale.scalability
      1377 ±  0%      -1.2%       1360 ±  0%  will-it-scale.time.system_time
     28.13 ±  0%     +60.4%      45.12 ±  0%  will-it-scale.time.user_time
     86876 ±  2%      -6.3%      81414 ±  1%  meminfo.DirectMap4k
      2219 ±  0%     -44.8%       1224 ± 57%  proc-vmstat.pgactivate
    126.75 ± 45%     +81.2%     229.67 ±  7%  slabinfo.taskstats.active_objs
    126.75 ± 45%     +81.2%     229.67 ±  7%  slabinfo.taskstats.num_objs
 9.762e+08 ±  1%      -5.3%  9.249e+08 ±  2%  perf-stat.L1-icache-load-misses
 6.878e+09 ±  1%      -2.5%  6.709e+09 ±  0%  perf-stat.LLC-stores
 4.061e+09 ±  0%      -4.4%  3.881e+09 ±  0%  perf-stat.branch-load-misses
 8.024e+11 ±  0%      -1.3%  7.919e+11 ±  0%  perf-stat.branch-loads
 4.022e+09 ±  1%      -3.1%  3.898e+09 ±  0%  perf-stat.branch-misses
      6028 ±  2%      -4.4%       5763 ±  5%  perf-stat.cpu-migrations
 1.387e+12 ±  1%      -2.6%  1.351e+12 ±  1%  perf-stat.dTLB-loads
      0.50 ±173%    +485.5%       2.94 ± 22%  sched_debug.cfs_rq:/.MIN_vruntime.avg
      8.05 ±173%    +369.3%      37.76 ±  7%  sched_debug.cfs_rq:/.MIN_vruntime.max
      1.95 ±173%    +405.1%       9.84 ±  3%  sched_debug.cfs_rq:/.MIN_vruntime.stddev
      0.50 ±173%    +485.5%       2.94 ± 22%  sched_debug.cfs_rq:/.max_vruntime.avg
      8.05 ±173%    +369.3%      37.76 ±  7%  sched_debug.cfs_rq:/.max_vruntime.max
      1.95 ±173%    +405.1%       9.84 ±  3%  sched_debug.cfs_rq:/.max_vruntime.stddev
      0.38 ±  1%     +31.5%       0.50 ± 20%  sched_debug.cpu.nr_running.stddev
    -33.08 ±-16%     -17.4%     -27.33 ± -6%  sched_debug.cpu.nr_uninterruptible.min
     10.59 ± 10%     -14.9%       9.01 ± 14%  sched_debug.cpu.nr_uninterruptible.stddev
    801.71 ± 29%     -28.3%     574.61 ± 21%  sched_debug.cpu.sched_count.min
    252.75 ± 18%     -33.4%     168.22 ±  7%  sched_debug.cpu.ttwu_local.min
     13785 ± 61%    -100.0%       0.00 ± -1%  latency_stats.avg.perf_event_alloc.SYSC_perf_event_open.SyS_perf_event_open.entry_SYSCALL_64_fastpath
    228145 ±  4%    -100.0%       0.00 ± -1%  latency_stats.hits.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
      0.00 ± -1%      +Inf%     253537 ± 15%  latency_stats.hits.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath.entry_frame_ret
     13785 ± 61%    -100.0%       0.00 ± -1%  latency_stats.max.perf_event_alloc.SYSC_perf_event_open.SyS_perf_event_open.entry_SYSCALL_64_fastpath
    560819 ±  3%    -100.0%       0.00 ± -1%  latency_stats.sum.do_wait.SyS_wait4.entry_SYSCALL_64_fastpath
      0.00 ± -1%      +Inf%     537995 ±  6%  latency_stats.sum.do_wait.SyS_wait4.entry_SYSCALL_64_fastpath.entry_frame_ret
     15770 ± 25%    -100.0%       0.00 ± -1%  latency_stats.sum.ep_poll.SyS_epoll_wait.entry_SYSCALL_64_fastpath
      0.00 ± -1%      +Inf%       9999 ± 42%  latency_stats.sum.ep_poll.SyS_epoll_wait.entry_SYSCALL_64_fastpath.entry_frame_ret
     13785 ± 61%    -100.0%       0.00 ± -1%  latency_stats.sum.perf_event_alloc.SYSC_perf_event_open.SyS_perf_event_open.entry_SYSCALL_64_fastpath
   1279350 ±  5%    -100.0%       0.00 ± -1%  latency_stats.sum.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
      0.00 ± -1%      +Inf%    1217387 ±  6%  latency_stats.sum.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath.entry_frame_ret
      1379 ±  5%    +590.6%       9527 ± 75%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_access.[nfsv4].nfs4_proc_access.[nfsv4].nfs_do_access.nfs_permission.__inode_permission.inode_permission.link_path_walk
     17.25 ±  2%     -42.5%       9.92 ± 70%  perf-profile.cycles-pp.__dequeue_signal.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop
      3.60 ±  1%     -40.0%       2.16 ± 70%  perf-profile.cycles-pp.__fpu__restore_sig.fpu__restore_sig.sys_rt_sigreturn.do_syscall_64.return_from_SYSCALL_64
     16.64 ±  1%     -42.4%       9.59 ± 70%  perf-profile.cycles-pp.__send_signal.send_signal.do_send_sig_info.do_send_specific.do_tkill
      1.46 ±  3%     -40.6%       0.87 ± 70%  perf-profile.cycles-pp.__set_current_blocked.signal_setup_done.do_signal.exit_to_usermode_loop.syscall_return_slowpath
     11.98 ±  1%     -43.7%       6.75 ± 70%  perf-profile.cycles-pp.__sigqueue_alloc.__send_signal.send_signal.do_send_sig_info.do_send_specific
     16.54 ±  2%     -42.5%       9.51 ± 70%  perf-profile.cycles-pp.__sigqueue_free.part.16.__dequeue_signal.dequeue_signal.get_signal.do_signal
      8.82 ±  3%     -42.2%       5.10 ± 70%  perf-profile.cycles-pp._atomic_dec_and_lock.free_uid.__sigqueue_free.part.16.__dequeue_signal.dequeue_signal
      1.23 ±  1%     -42.5%       0.71 ± 71%  perf-profile.cycles-pp.check_kill_permission.do_send_specific.do_tkill.sys_tgkill.entry_SYSCALL_64_fastpath
      2.87 ±  2%     -39.2%       1.74 ± 70%  perf-profile.cycles-pp.complete_signal.__send_signal.send_signal.do_send_sig_info.do_send_specific
      3.79 ±  0%     -39.3%       2.30 ± 70%  perf-profile.cycles-pp.copy_fpstate_to_sigframe.do_signal.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
     18.08 ±  2%     -42.6%      10.38 ± 70%  perf-profile.cycles-pp.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop.syscall_return_slowpath
     18.94 ±  1%     -41.7%      11.04 ± 70%  perf-profile.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.sys_tgkill.entry_SYSCALL_64_fastpath
     21.38 ±  1%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.do_send_specific.do_tkill.sys_tgkill.entry_SYSCALL_64_fastpath
     31.50 ±  0%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.do_signal.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
      9.47 ±  1%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.do_syscall_64.return_from_SYSCALL_64
     22.80 ±  0%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.do_tkill.sys_tgkill.entry_SYSCALL_64_fastpath
      1.40 ±  3%     -58.3%       0.58 ± 70%  perf-profile.cycles-pp.entry_SYSCALL_64_after_swapgs
     55.83 ±  0%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.entry_SYSCALL_64_fastpath
      1.05 ±  2%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.exit_to_usermode_loop.do_syscall_64.return_from_SYSCALL_64
     32.13 ±  0%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
      2.25 ±  3%     -37.0%       1.42 ± 70%  perf-profile.cycles-pp.fpu__clear.do_signal.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
      4.29 ±  1%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.fpu__restore_sig.sys_rt_sigreturn.do_syscall_64.return_from_SYSCALL_64
      9.99 ±  1%     -42.8%       5.72 ± 70%  perf-profile.cycles-pp.free_uid.__sigqueue_free.part.16.__dequeue_signal.dequeue_signal.get_signal
     19.88 ±  1%     -42.1%      11.51 ± 70%  perf-profile.cycles-pp.get_signal.do_signal.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
      9.72 ±  1%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.return_from_SYSCALL_64
      0.99 ±  2%     -40.9%       0.58 ± 71%  perf-profile.cycles-pp.security_task_kill.check_kill_permission.do_send_specific.do_tkill.sys_tgkill
     17.17 ±  1%     -42.2%       9.93 ± 70%  perf-profile.cycles-pp.send_signal.do_send_sig_info.do_send_specific.do_tkill.sys_tgkill
      1.54 ±  1%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.set_current_blocked.sys_rt_sigreturn.do_syscall_64.return_from_SYSCALL_64
      1.81 ±  2%     -40.3%       1.08 ± 70%  perf-profile.cycles-pp.signal_setup_done.do_signal.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
      2.45 ±  2%     -39.2%       1.49 ± 70%  perf-profile.cycles-pp.signal_wake_up_state.complete_signal.__send_signal.send_signal.do_send_sig_info
      7.74 ±  1%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.sys_rt_sigreturn.do_syscall_64.return_from_SYSCALL_64
     22.95 ±  0%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.sys_tgkill.entry_SYSCALL_64_fastpath
     32.42 ±  0%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp.syscall_return_slowpath.entry_SYSCALL_64_fastpath
      1.59 ±  3%     -37.6%       0.99 ± 70%  perf-profile.cycles-pp.try_to_wake_up.wake_up_state.signal_wake_up_state.complete_signal.__send_signal
      1.66 ±  3%     -38.0%       1.03 ± 70%  perf-profile.cycles-pp.wake_up_state.signal_wake_up_state.complete_signal.__send_signal.send_signal
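For readers re-deriving the numbers: the %change column appears to be the relative change of the mean, (new - base) / base * 100, with a zero base printed as +Inf%. The POSIX sh sketch below assumes that formula (it is not taken from lkp-tests) and recomputes a few rows from the rounded means in the table above. It also includes a hypothetical normalize_key helper: in the latency_stats rows, each -100.0% / +Inf% pair appears to be the same call chain reported under a key that gains a trailing .entry_frame_ret frame on f0d9b9d4aa, so stripping that suffix lets the two columns be compared directly.

```shell
#!/bin/sh
# Minimal sketch, not part of lkp-tests: recompute the %change column.
# Assumption: %change = (new - base) / base * 100, one decimal place,
# and a zero base is reported as +Inf%.

# pct_change BASE NEW  (hypothetical helper)
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        if (base + 0 == 0) { print "+Inf%"; exit }
        printf "%+.1f%%\n", (new - base) / base * 100
    }'
}

# normalize_key KEY  (hypothetical helper)
# Drops the trailing ".entry_frame_ret" frame seen only in the patched
# kernel's latency_stats keys, so old and new rows share one key.
normalize_key() {
    printf '%s\n' "${1%.entry_frame_ret}"
}

pct_change 825921 780374    # will-it-scale.per_process_ops
pct_change 428112 388301    # will-it-scale.per_thread_ops
pct_change 28.13 45.12      # will-it-scale.time.user_time

# latency_stats.sum.pipe_wait...: old key (base) vs. renamed key (new)
normalize_key latency_stats.sum.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath.entry_frame_ret
pct_change 1279350 1217387
```

Matched this way, the pipe_wait latency sum moves from 1279350 to 1217387 (about -4.8%) rather than disappearing. Because the sketch works from the rounded means printed in the table, rows with few significant digits will not match exactly: will-it-scale.scalability (0.29 to 0.33) gives +13.8% here versus the reported +14.3%.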
will-it-scale.scalability: [plot of bisect samples; bisect-good (*) near 0.29-0.30, bisect-bad (O) near 0.33-0.335]
will-it-scale.per_process_ops: [plot of bisect samples; bisect-good (*) near 820000-830000, bisect-bad (O) near 775000-785000]
will-it-scale.per_thread_ops: [plot of bisect samples; bisect-good (*) near 425000-430000, bisect-bad (O) near 385000-390000]
[*] bisect-good sample
[O] bisect-bad sample
Thanks,
Xiaolong