FYI, we noticed a 16.8% reduction (an improvement) in hackbench.time.user_time due to commit:
commit bedc1d70ce363e2eddab9de946bf1c724fb3337d ("richacl: Compute maximum file masks from an acl")
https://git.kernel.org/pub/scm/linux/kernel/git/agruen/linux-richacl.git richacl-wip
in testcase: hackbench
on test machine: 24 threads Westmere-EP with 16G memory
with the following parameters:
nr_threads: 1600%
mode: process
ipc: socket
Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
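As a rough illustration of how the job parameters above could translate into a hackbench invocation: nr_threads is a percentage of the machine's CPU threads, and hackbench spawns tasks in groups. The group arithmetic below (40 tasks per group, from hackbench's default of 20 fd pairs) is an assumption for illustration, not lkp's exact logic.

```python
# Hypothetical mapping of the lkp job parameters to a hackbench command
# line. The tasks-per-group figure (2 * 20 with the default -f 20) is an
# assumption; lkp's actual job scripts may compute this differently.
cpus = 24              # test machine: 24 threads Westmere-EP
nr_threads_pct = 1600  # nr_threads: 1600%

tasks = cpus * nr_threads_pct // 100  # total sender + receiver tasks
tasks_per_group = 2 * 20              # 20 senders + 20 receivers per group
groups = tasks // tasks_per_group

print(tasks)   # 384
print(groups)  # 9
# mode: process and ipc: socket are hackbench's defaults
print(f"hackbench -g {groups} --process")
```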
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/ipc/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-6/socket/x86_64-rhel-7.2/process/1600%/debian-x86_64-2016-08-31.cgz/lkp-ws02/hackbench
commit:
4e0a9d33c4 ("richacl: Permission check algorithm")
bedc1d70ce ("richacl: Compute maximum file masks from an acl")
4e0a9d33c43d7838 bedc1d70ce363e2eddab9de946
---------------- --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         %stddev     %change         %stddev
             \          |                \
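The %change column is the relative difference between the two per-commit means, and %stddev is the run-to-run spread within each commit. A minimal sketch reproducing the headline user_time number from the row below:

```python
# Recompute the %change column from the two per-commit means.
# Values are taken from the hackbench.time.user_time row of this report.
base = 656.07     # mean on 4e0a9d33c4 ("richacl: Permission check algorithm")
patched = 546.15  # mean on bedc1d70ce ("richacl: Compute maximum file masks from an acl")

pct_change = (patched - base) / base * 100
print(round(pct_change, 1))  # -16.8
```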
16725596 ± 3% -6.4% 15649069 ± 2% hackbench.time.involuntary_context_switches
13761 ± 0% -1.2% 13599 ± 0% hackbench.time.system_time
656.07 ± 1% -16.8% 546.15 ± 2% hackbench.time.user_time
166139 ± 0% -3.2% 160883 ± 0% interrupts.CAL:Function_call_interrupts
424.00 ± 0% -100.0% 0.00 ± -1% vmstat.memory.buff
229619 ± 1% -8.0% 211297 ± 7% vmstat.system.cs
32428 ± 0% -8.4% 29717 ± 1% vmstat.system.in
5962 ± 4% -11.3% 5289 ± 0% proc-vmstat.nr_active_file
5962 ± 4% -11.3% 5289 ± 0% proc-vmstat.nr_zone_active_file
835.25 ± 12% +55.5% 1298 ± 27% proc-vmstat.numa_pte_updates
32608 ± 0% -100.0% 0.00 ± -1% proc-vmstat.pgpgin
2875 ± 0% -12.4% 2518 ± 12% turbostat.Avg_MHz
1.14 ± 1% +19.0% 1.36 ± 10% turbostat.CPU%c1
0.86 ± 1% +37.3% 1.19 ± 5% turbostat.CPU%c3
0.94 ± 1% +23.9% 1.17 ± 8% turbostat.CPU%c6
23846 ± 4% -11.3% 21162 ± 0% meminfo.Active(file)
5694722 ± 15% -27.1% 4152320 ± 0% meminfo.DirectMap2M
277350 ± 7% -89.0% 30420 ± 5% meminfo.DirectMap4k
1499 ± 27% -29.6% 1056 ± 1% meminfo.Mlocked
1499 ± 27% -29.6% 1056 ± 1% meminfo.Unevictable
11934 ± 5% -11.5% 10561 ± 0% numa-meminfo.node0.Active(file)
32089 ± 23% +59.9% 51299 ± 22% numa-meminfo.node0.AnonHugePages
11901 ± 4% -11.0% 10586 ± 0% numa-meminfo.node1.Active(file)
791.25 ± 24% -36.9% 499.50 ± 13% numa-meminfo.node1.Mlocked
791.25 ± 24% -36.9% 499.50 ± 13% numa-meminfo.node1.Unevictable
30583174 ± 3% -22.6% 23672117 ± 6% cpuidle.C1-NHM.time
7133823 ± 2% +94.7% 13886928 ± 12% cpuidle.C1E-NHM.time
114014 ± 5% +88.2% 214599 ± 8% cpuidle.C1E-NHM.usage
1.569e+08 ± 1% +32.4% 2.078e+08 ± 6% cpuidle.C3-NHM.time
264229 ± 2% +29.5% 342065 ± 6% cpuidle.C3-NHM.usage
2.264e+08 ± 0% +31.0% 2.967e+08 ± 8% cpuidle.C6-NHM.time
235969 ± 0% +31.0% 309198 ± 8% cpuidle.C6-NHM.usage
2983 ± 5% -11.5% 2640 ± 0% numa-vmstat.node0.nr_active_file
2983 ± 5% -11.5% 2640 ± 0% numa-vmstat.node0.nr_zone_active_file
2975 ± 4% -11.0% 2647 ± 0% numa-vmstat.node1.nr_active_file
195.75 ± 25% -37.0% 123.25 ± 14% numa-vmstat.node1.nr_mlock
195.75 ± 25% -37.0% 123.25 ± 14% numa-vmstat.node1.nr_unevictable
2975 ± 4% -11.0% 2647 ± 0% numa-vmstat.node1.nr_zone_active_file
195.75 ± 25% -37.0% 123.25 ± 14% numa-vmstat.node1.nr_zone_unevictable
1021 ± 11% -80.7% 196.86 ± 51% sched_debug.cpu.clock.stddev
1021 ± 11% -80.7% 196.86 ± 51% sched_debug.cpu.clock_task.stddev
7313 ± 21% -18.4% 5966 ± 14% sched_debug.cpu.curr->pid.stddev
0.00 ± 11% -79.9% 0.00 ± 48% sched_debug.cpu.next_balance.stddev
2191 ± 12% -36.0% 1402 ± 6% sched_debug.cpu.nr_load_updates.stddev
61.61 ± 25% +628.6% 448.91 ± 58% sched_debug.cpu.nr_uninterruptible.max
-150.62 ±-39% +656.3% -1139 ±-86% sched_debug.cpu.nr_uninterruptible.min
42.65 ± 30% +871.0% 414.16 ± 76% sched_debug.cpu.nr_uninterruptible.stddev
0.93 ± 0% -12.9% 0.81 ± 0% perf-stat.branch-miss-rate%
4.047e+10 ± 0% -21.2% 3.19e+10 ± 10% perf-stat.branch-misses
1.633e+11 ± 0% -11.5% 1.445e+11 ± 9% perf-stat.cache-references
1.443e+08 ± 1% -8.8% 1.316e+08 ± 8% perf-stat.context-switches
4.28e+13 ± 0% -13.2% 3.717e+13 ± 13% perf-stat.cpu-cycles
2.05 ± 3% +7.1% 2.20 ± 5% perf-stat.dTLB-load-miss-rate%
0.19 ± 0% +9.4% 0.21 ± 1% perf-stat.dTLB-store-miss-rate%
0.00 ± 3% -18.9% 0.00 ± 1% perf-stat.iTLB-load-miss-rate%
8.641e+08 ± 3% -26.5% 6.35e+08 ± 10% perf-stat.iTLB-load-misses
28480 ± 3% +23.0% 35029 ± 1% perf-stat.instructions-per-iTLB-miss
0.57 ± 0% +4.6% 0.60 ± 2% perf-stat.ipc
3.041e+10 ± 0% -10.8% 2.714e+10 ± 10% perf-stat.node-stores
1.81 ± 1% -16.6% 1.51 ± 3% perf-profile.calltrace.cycles-pp.copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
6.43 ± 1% -10.4% 5.77 ± 4% perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
5.58 ± 2% -9.1% 5.08 ± 5% perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
1.34 ± 1% +12.5% 1.51 ± 7% perf-profile.calltrace.cycles-pp.security_socket_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write.vfs_write
0.93 ± 4% +13.4% 1.06 ± 10% perf-profile.calltrace.cycles-pp.selinux_socket_sendmsg.security_socket_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
4.97 ± 0% -17.6% 4.10 ± 3% perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
1.51 ± 0% +29.7% 1.96 ± 20% perf-profile.calltrace.cycles-pp.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
0.88 ± 4% +14.5% 1.00 ± 10% perf-profile.calltrace.cycles-pp.sock_has_perm.selinux_socket_sendmsg.security_socket_sendmsg.sock_sendmsg.sock_write_iter
1.78 ± 1% -27.2% 1.29 ± 4% perf-profile.children.cycles-pp.__might_fault
1.50 ± 1% +48.7% 2.23 ± 42% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
1.09 ± 1% -19.5% 0.88 ± 1% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
1.25 ± 3% -7.4% 1.16 ± 3% perf-profile.children.cycles-pp.apic_timer_interrupt
1.89 ± 1% -16.0% 1.58 ± 2% perf-profile.children.cycles-pp.copy_from_iter
1.11 ± 2% -10.4% 0.99 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt
1.13 ± 3% -10.0% 1.01 ± 5% perf-profile.children.cycles-pp.local_apic_timer_interrupt
1.40 ± 1% +12.2% 1.56 ± 6% perf-profile.children.cycles-pp.security_socket_sendmsg
1.00 ± 3% +12.7% 1.13 ± 9% perf-profile.children.cycles-pp.selinux_socket_sendmsg
5.07 ± 0% -16.8% 4.22 ± 3% perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
1.57 ± 0% +28.1% 2.01 ± 20% perf-profile.children.cycles-pp.sock_def_readable
1.23 ± 1% -20.4% 0.98 ± 8% perf-profile.self.cycles-pp.__vfs_write
1.09 ± 1% -19.5% 0.88 ± 1% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
1.26 ± 1% -14.1% 1.08 ± 2% perf-profile.self.cycles-pp.copy_from_iter
1.15 ± 0% -14.4% 0.98 ± 7% perf-profile.self.cycles-pp.kfree
2.07 ± 3% -24.5% 1.56 ± 5% perf-profile.self.cycles-pp.security_file_permission
1.06 ± 2% -11.3% 0.94 ± 2% perf-profile.self.cycles-pp.sock_wfree
682.25 ± 10% -94.3% 39.00 ± 0% slabinfo.bdev_cache.active_objs
682.25 ± 10% -94.3% 39.00 ± 0% slabinfo.bdev_cache.num_objs
1115 ± 1% -100.0% 0.00 ± -1% slabinfo.blkdev_requests.active_objs
1115 ± 1% -100.0% 0.00 ± -1% slabinfo.blkdev_requests.num_objs
433.00 ± 11% -100.0% 0.00 ± -1% slabinfo.buffer_head.active_objs
433.00 ± 11% -100.0% 0.00 ± -1% slabinfo.buffer_head.num_objs
229.50 ± 19% -100.0% 0.00 ± -1% slabinfo.ext4_extent_status.active_objs
229.50 ± 19% -100.0% 0.00 ± -1% slabinfo.ext4_extent_status.num_objs
1204 ± 0% -100.0% 0.00 ± -1% slabinfo.ext4_groupinfo_4k.active_objs
1204 ± 0% -100.0% 0.00 ± -1% slabinfo.ext4_groupinfo_4k.num_objs
703.75 ± 6% -72.6% 193.00 ± 11% slabinfo.file_lock_cache.active_objs
703.75 ± 6% -72.6% 193.00 ± 11% slabinfo.file_lock_cache.num_objs
4306 ± 0% -15.1% 3655 ± 0% slabinfo.ftrace_event_field.active_objs
4306 ± 0% -15.1% 3655 ± 0% slabinfo.ftrace_event_field.num_objs
256.00 ± 0% -100.0% 0.00 ± -1% slabinfo.jbd2_revoke_table_s.active_objs
256.00 ± 0% -100.0% 0.00 ± -1% slabinfo.jbd2_revoke_table_s.num_objs
39639 ± 0% -29.3% 28035 ± 0% slabinfo.kernfs_node_cache.active_objs
1165 ± 0% -29.3% 823.75 ± 0% slabinfo.kernfs_node_cache.active_slabs
39639 ± 0% -29.3% 28035 ± 0% slabinfo.kernfs_node_cache.num_objs
1165 ± 0% -29.3% 823.75 ± 0% slabinfo.kernfs_node_cache.num_slabs
3256 ± 1% -49.1% 1656 ± 1% slabinfo.kmalloc-128.active_objs
3256 ± 1% -49.1% 1656 ± 1% slabinfo.kmalloc-128.num_objs
4385 ± 4% -37.2% 2754 ± 0% slabinfo.kmalloc-192.active_objs
4406 ± 3% -37.1% 2772 ± 1% slabinfo.kmalloc-192.num_objs
1036 ± 3% -37.2% 651.00 ± 9% slabinfo.mnt_cache.active_objs
1036 ± 3% -37.2% 651.00 ± 9% slabinfo.mnt_cache.num_objs
1103 ± 6% -43.7% 620.50 ± 13% slabinfo.nsproxy.active_objs
1103 ± 6% -43.7% 620.50 ± 13% slabinfo.nsproxy.num_objs
2449 ± 2% -28.5% 1750 ± 3% slabinfo.shmem_inode_cache.active_objs
2449 ± 2% -28.5% 1750 ± 3% slabinfo.shmem_inode_cache.num_objs
2806 ± 1% -24.7% 2114 ± 2% slabinfo.trace_event_file.active_objs
2806 ± 1% -24.7% 2114 ± 2% slabinfo.trace_event_file.num_objs
hackbench.time.user_time
680 ++--------------------------------------------------------------------+
| .*. |
660 ++ *.*. .*. .* * |
640 ++ .*. .*. .*. .. * *. |
*.*.*. *.*.*.*.. .*.*.* *..*.*.* * |
620 ++ * |
600 ++ |
| |
580 ++ O O |
560 O+O O O O O O O O O O O O |
| O O O O O O O O O O
540 ++ O O |
520 ++ O O O O O O |
| |
500 ++--------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Thanks,
Xiaolong