Greetings,
FYI, we noticed a 2.0% improvement in will-it-scale.per_process_ops due to commit:
commit: 80340f8d5f7423c4611fa6e9b07435006aa5834f ("sched: Add saved_state for tasks blocked on sleeping locks")
https://git.kernel.org/cgit/linux/kernel/git/bigeasy/staging.git rtmutex
in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:
nr_task: 16
mode: process
test: poll1
cpufreq_governor: performance
ucode: 0x5002f01
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
test-url:
https://github.com/antonblanchard/will-it-scale
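For context, the poll1 testcase is essentially a tight loop of poll(2) calls on a single file descriptor that never becomes ready, so the per-process score is dominated by syscall-entry and fd-lookup cost (hence __fget_light showing up in the profile below). The following is a minimal illustrative sketch of that pattern, not the benchmark's actual source; the function name and iteration count are ours:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Sketch of a poll1-style loop: poll an empty pipe's read end `n`
 * times with a zero timeout and return how many calls timed out
 * (all of them, since no data is ever written). */
static long run_poll1(long n)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    long timeouts = 0;

    for (long i = 0; i < n; i++)
        if (poll(&pfd, 1, 0) == 0)  /* 0 => timed out, fd not ready */
            timeouts++;

    close(fds[0]);
    close(fds[1]);
    return timeouts;
}
```

The real benchmark runs such a loop for a fixed wall-clock interval per process and reports iterations/second, which is what will-it-scale.per_process_ops measures.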
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap3/poll1/will-it-scale/0x5002f01
commit:
35398ef3be ("locking/rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock")
80340f8d5f ("sched: Add saved_state for tasks blocked on sleeping locks")
35398ef3be1827cb 80340f8d5f7423c4611fa6e9b07
---------------- ---------------------------
%stddev %change %stddev
\ | \
6677815 +2.0% 6813971 will-it-scale.per_process_ops
1.068e+08 +2.0% 1.09e+08 will-it-scale.workload
467077 ± 42% +4879.3% 23257159 ± 98% cpuidle.C6.usage
1295308 ± 2% -12.5% 1133062 ± 4% meminfo.DirectMap4k
4.13 ± 8% -1.0 3.14 ± 9% perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.13 ± 8% -1.0 3.14 ± 9% perf-profile.children.cycles-pp.__fget_light
3.98 ± 8% -1.0 2.99 ± 9% perf-profile.self.cycles-pp.__fget_light
30899 -1.6% 30390 proc-vmstat.nr_slab_reclaimable
70023 -3.0% 67946 proc-vmstat.nr_slab_unreclaimable
63981 +5.0% 67159 ± 2% proc-vmstat.pgreuse
1927180 ± 12% +22.2% 2355975 ± 3% sched_debug.cfs_rq:/.spread0.max
-566638 -75.3% -140233 sched_debug.cfs_rq:/.spread0.min
1064 ± 7% +12.1% 1193 ± 6% sched_debug.cpu.nr_switches.min
40639 ± 4% -29.6% 28615 ± 13% sched_debug.cpu.ttwu_count.max
3004 ± 4% -20.3% 2394 ± 9% sched_debug.cpu.ttwu_count.stddev
34941 ± 3% +16.1% 40583 ± 3% softirqs.CPU0.SCHED
20504 ± 9% -26.3% 15122 ± 31% softirqs.CPU109.RCU
4872 +369.6% 22879 ± 78% softirqs.CPU109.SCHED
20332 ± 9% -28.8% 14481 ± 22% softirqs.CPU111.RCU
12995 ± 10% +25.9% 16362 ± 9% softirqs.CPU41.RCU
38366 +6.2% 40738 ± 5% softirqs.CPU43.SCHED
38584 +5.8% 40826 ± 4% softirqs.CPU68.SCHED
36431 ± 4% +10.0% 40056 ± 3% softirqs.CPU73.SCHED
12373 ± 8% +12.7% 13949 ± 8% softirqs.CPU93.RCU
11975 ± 15% -42.0% 6943 ± 20% softirqs.CPU96.SCHED
3409 ± 82% +2989.8% 105337 ± 54% numa-meminfo.node0.AnonHugePages
28904 ± 12% +485.1% 169104 ± 43% numa-meminfo.node0.AnonPages
283677 +12.5% 319149 ± 6% numa-meminfo.node0.FilePages
30171 ± 10% +468.3% 171475 ± 41% numa-meminfo.node0.Inactive
30171 ± 10% +468.3% 171475 ± 41% numa-meminfo.node0.Inactive(anon)
7508 ± 3% +12.9% 8477 ± 6% numa-meminfo.node0.KernelStack
707950 ± 5% +23.6% 874864 ± 7% numa-meminfo.node0.MemUsed
1726 ± 8% +89.8% 3276 ± 16% numa-meminfo.node0.PageTables
281724 +11.2% 313355 ± 5% numa-meminfo.node0.Unevictable
11159 ± 22% -38.1% 6909 ± 3% numa-meminfo.node1.Mapped
82531 ± 67% -90.9% 7522 ±155% numa-meminfo.node3.AnonHugePages
123663 ± 56% -82.0% 22306 ± 67% numa-meminfo.node3.AnonPages
126435 ± 56% -78.3% 27463 ± 61% numa-meminfo.node3.Inactive
126435 ± 56% -78.3% 27463 ± 61% numa-meminfo.node3.Inactive(anon)
808823 ± 12% -20.2% 645340 ± 7% numa-meminfo.node3.MemUsed
7234 ± 12% +484.5% 42283 ± 43% numa-vmstat.node0.nr_anon_pages
70919 +12.5% 79786 ± 6% numa-vmstat.node0.nr_file_pages
7551 ± 10% +467.8% 42876 ± 41% numa-vmstat.node0.nr_inactive_anon
7508 ± 3% +12.9% 8476 ± 6% numa-vmstat.node0.nr_kernel_stack
1763 +30.2% 2295 ± 28% numa-vmstat.node0.nr_mapped
431.50 ± 8% +89.5% 817.50 ± 16% numa-vmstat.node0.nr_page_table_pages
70430 +11.2% 78338 ± 5% numa-vmstat.node0.nr_unevictable
7551 ± 10% +467.8% 42876 ± 41% numa-vmstat.node0.nr_zone_inactive_anon
70430 +11.2% 78338 ± 5% numa-vmstat.node0.nr_zone_unevictable
408102 ± 10% +68.0% 685464 ± 20% numa-vmstat.node0.numa_local
2822 ± 22% -37.7% 1759 ± 6% numa-vmstat.node1.nr_mapped
30923 ± 56% -82.0% 5581 ± 67% numa-vmstat.node3.nr_anon_pages
31618 ± 56% -78.3% 6874 ± 61% numa-vmstat.node3.nr_inactive_anon
31618 ± 56% -78.3% 6874 ± 61% numa-vmstat.node3.nr_zone_inactive_anon
59517 ± 34% +110.1% 125034 numa-vmstat.node3.numa_other
7889 ± 7% -16.0% 6625 ± 5% slabinfo.Acpi-State.active_objs
7889 ± 7% -16.0% 6625 ± 5% slabinfo.Acpi-State.num_objs
2614 ± 6% -16.8% 2175 ± 10% slabinfo.PING.active_objs
2614 ± 6% -16.8% 2175 ± 10% slabinfo.PING.num_objs
5648 ± 9% -16.3% 4729 ± 8% slabinfo.files_cache.active_objs
5648 ± 9% -16.3% 4729 ± 8% slabinfo.files_cache.num_objs
1922 ± 5% -14.9% 1635 ± 12% slabinfo.khugepaged_mm_slot.active_objs
1922 ± 5% -14.9% 1635 ± 12% slabinfo.khugepaged_mm_slot.num_objs
5298 ± 4% -16.3% 4434 ± 8% slabinfo.mm_struct.active_objs
5298 ± 4% -16.3% 4434 ± 8% slabinfo.mm_struct.num_objs
12456 ± 9% -19.5% 10030 ± 17% slabinfo.pde_opener.active_objs
12456 ± 9% -19.5% 10030 ± 17% slabinfo.pde_opener.num_objs
3556 ± 3% -9.7% 3212 ± 6% slabinfo.sighand_cache.num_objs
5417 ± 5% -10.9% 4828 ± 5% slabinfo.signal_cache.num_objs
4544 ± 4% -11.5% 4022 ± 7% slabinfo.sock_inode_cache.active_objs
4544 ± 4% -11.5% 4022 ± 7% slabinfo.sock_inode_cache.num_objs
8411 ± 7% -13.5% 7276 ± 6% slabinfo.task_delay_info.active_objs
8411 ± 7% -13.5% 7276 ± 6% slabinfo.task_delay_info.num_objs
0.28 +135.9% 0.67 ± 93% perf-stat.i.MPKI
1.346e+10 +2.1% 1.375e+10 perf-stat.i.branch-instructions
61401177 +11.6% 68508179 ± 14% perf-stat.i.branch-misses
4100686 +23.1% 5047396 ± 21% perf-stat.i.cache-misses
18651514 +142.1% 45154376 ± 94% perf-stat.i.cache-references
5.593e+10 +2.1% 5.71e+10 perf-stat.i.cpu-cycles
13932 -14.2% 11959 ± 17% perf-stat.i.cycles-between-cache-misses
1.709e+10 +1.6% 1.737e+10 perf-stat.i.dTLB-loads
1.302e+10 +2.0% 1.328e+10 perf-stat.i.dTLB-stores
6.662e+10 +2.1% 6.802e+10 perf-stat.i.instructions
0.29 +2.1% 0.30 perf-stat.i.metric.GHz
0.84 ± 2% +25.9% 1.06 ± 16% perf-stat.i.metric.K/sec
227.02 +2.0% 231.48 perf-stat.i.metric.M/sec
92.68 -3.0 89.71 ± 2% perf-stat.i.node-load-miss-rate%
97.55 -8.0 89.54 ± 7% perf-stat.i.node-store-miss-rate%
3947 ± 8% +211.0% 12275 ± 60% perf-stat.i.node-stores
0.28 +137.3% 0.66 ± 94% perf-stat.overall.MPKI
13639 -14.0% 11729 ± 16% perf-stat.overall.cycles-between-cache-misses
92.39 -3.1 89.32 ± 2% perf-stat.overall.node-load-miss-rate%
1.342e+10 +2.1% 1.37e+10 perf-stat.ps.branch-instructions
61207914 +11.6% 68287200 ± 14% perf-stat.ps.branch-misses
4087407 +23.1% 5030794 ± 21% perf-stat.ps.cache-misses
18592626 +142.1% 45003748 ± 94% perf-stat.ps.cache-references
5.574e+10 +2.1% 5.691e+10 perf-stat.ps.cpu-cycles
1.703e+10 +1.6% 1.731e+10 perf-stat.ps.dTLB-loads
1.297e+10 +2.0% 1.324e+10 perf-stat.ps.dTLB-stores
6.64e+10 +2.1% 6.779e+10 perf-stat.ps.instructions
3944 ± 7% +210.4% 12241 ± 60% perf-stat.ps.node-stores
2.003e+13 +2.3% 2.048e+13 perf-stat.total.instructions
1726 ± 23% +25.6% 2168 ± 5% interrupts.CPU0.CAL:Function_call_interrupts
140.00 ± 22% -63.0% 51.75 ± 52% interrupts.CPU0.RES:Rescheduling_interrupts
1803 ± 18% +52.9% 2757 ± 16% interrupts.CPU115.CAL:Function_call_interrupts
93.25 ± 25% +41.6% 132.00 ± 12% interrupts.CPU120.NMI:Non-maskable_interrupts
93.25 ± 25% +41.6% 132.00 ± 12% interrupts.CPU120.PMI:Performance_monitoring_interrupts
113.25 ± 9% +37.7% 156.00 ± 23% interrupts.CPU121.NMI:Non-maskable_interrupts
113.25 ± 9% +37.7% 156.00 ± 23% interrupts.CPU121.PMI:Performance_monitoring_interrupts
90.00 ± 25% +39.7% 125.75 ± 12% interrupts.CPU139.NMI:Non-maskable_interrupts
90.00 ± 25% +39.7% 125.75 ± 12% interrupts.CPU139.PMI:Performance_monitoring_interrupts
94.00 ± 26% +41.5% 133.00 ± 12% interrupts.CPU144.NMI:Non-maskable_interrupts
94.00 ± 26% +41.5% 133.00 ± 12% interrupts.CPU144.PMI:Performance_monitoring_interrupts
100.25 ± 26% +42.9% 143.25 ± 22% interrupts.CPU168.NMI:Non-maskable_interrupts
100.25 ± 26% +42.9% 143.25 ± 22% interrupts.CPU168.PMI:Performance_monitoring_interrupts
107.00 ± 7% +15.9% 124.00 ± 4% interrupts.CPU17.NMI:Non-maskable_interrupts
107.00 ± 7% +15.9% 124.00 ± 4% interrupts.CPU17.PMI:Performance_monitoring_interrupts
109.00 ± 7% +17.7% 128.25 ± 11% interrupts.CPU189.NMI:Non-maskable_interrupts
109.00 ± 7% +17.7% 128.25 ± 11% interrupts.CPU189.PMI:Performance_monitoring_interrupts
18.75 ± 89% +1310.7% 264.50 ± 60% interrupts.CPU19.RES:Rescheduling_interrupts
1.25 ±131% +43040.0% 539.25 ±168% interrupts.CPU22.RES:Rescheduling_interrupts
106.25 ± 4% +23.5% 131.25 ± 12% interrupts.CPU24.NMI:Non-maskable_interrupts
106.25 ± 4% +23.5% 131.25 ± 12% interrupts.CPU24.PMI:Performance_monitoring_interrupts
107.25 ± 3% +19.3% 128.00 ± 14% interrupts.CPU26.NMI:Non-maskable_interrupts
107.25 ± 3% +19.3% 128.00 ± 14% interrupts.CPU26.PMI:Performance_monitoring_interrupts
107.75 ± 2% +19.3% 128.50 ± 9% interrupts.CPU27.NMI:Non-maskable_interrupts
107.75 ± 2% +19.3% 128.50 ± 9% interrupts.CPU27.PMI:Performance_monitoring_interrupts
111.25 ± 2% +16.6% 129.75 ± 9% interrupts.CPU28.NMI:Non-maskable_interrupts
111.25 ± 2% +16.6% 129.75 ± 9% interrupts.CPU28.PMI:Performance_monitoring_interrupts
108.25 ± 10% +19.4% 129.25 ± 15% interrupts.CPU29.NMI:Non-maskable_interrupts
108.25 ± 10% +19.4% 129.25 ± 15% interrupts.CPU29.PMI:Performance_monitoring_interrupts
90.75 ± 25% +47.4% 133.75 ± 20% interrupts.CPU30.NMI:Non-maskable_interrupts
90.75 ± 25% +47.4% 133.75 ± 20% interrupts.CPU30.PMI:Performance_monitoring_interrupts
104.50 +20.6% 126.00 ± 12% interrupts.CPU31.NMI:Non-maskable_interrupts
104.50 +20.6% 126.00 ± 12% interrupts.CPU31.PMI:Performance_monitoring_interrupts
103.25 ± 5% +34.1% 138.50 ± 25% interrupts.CPU32.NMI:Non-maskable_interrupts
103.25 ± 5% +34.1% 138.50 ± 25% interrupts.CPU32.PMI:Performance_monitoring_interrupts
109.75 ± 9% +15.3% 126.50 ± 13% interrupts.CPU33.NMI:Non-maskable_interrupts
109.75 ± 9% +15.3% 126.50 ± 13% interrupts.CPU33.PMI:Performance_monitoring_interrupts
106.75 ± 8% +19.4% 127.50 ± 14% interrupts.CPU35.NMI:Non-maskable_interrupts
106.75 ± 8% +19.4% 127.50 ± 14% interrupts.CPU35.PMI:Performance_monitoring_interrupts
89.00 ± 25% +42.7% 127.00 ± 12% interrupts.CPU38.NMI:Non-maskable_interrupts
89.00 ± 25% +42.7% 127.00 ± 12% interrupts.CPU38.PMI:Performance_monitoring_interrupts
7661 ± 24% -38.5% 4711 ± 51% interrupts.CPU4.NMI:Non-maskable_interrupts
7661 ± 24% -38.5% 4711 ± 51% interrupts.CPU4.PMI:Performance_monitoring_interrupts
104.25 ± 5% +20.9% 126.00 ± 13% interrupts.CPU41.NMI:Non-maskable_interrupts
104.25 ± 5% +20.9% 126.00 ± 13% interrupts.CPU41.PMI:Performance_monitoring_interrupts
104.25 ± 5% +20.4% 125.50 ± 11% interrupts.CPU44.NMI:Non-maskable_interrupts
104.25 ± 5% +20.4% 125.50 ± 11% interrupts.CPU44.PMI:Performance_monitoring_interrupts
103.00 ± 5% +21.4% 125.00 ± 12% interrupts.CPU45.NMI:Non-maskable_interrupts
103.00 ± 5% +21.4% 125.00 ± 12% interrupts.CPU45.PMI:Performance_monitoring_interrupts
91.50 ± 28% +36.1% 124.50 ± 12% interrupts.CPU46.NMI:Non-maskable_interrupts
91.50 ± 28% +36.1% 124.50 ± 12% interrupts.CPU46.PMI:Performance_monitoring_interrupts
92.75 ± 29% +35.3% 125.50 ± 13% interrupts.CPU47.NMI:Non-maskable_interrupts
92.75 ± 29% +35.3% 125.50 ± 13% interrupts.CPU47.PMI:Performance_monitoring_interrupts
78.00 ± 33% +65.7% 129.25 ± 8% interrupts.CPU49.NMI:Non-maskable_interrupts
78.00 ± 33% +65.7% 129.25 ± 8% interrupts.CPU49.PMI:Performance_monitoring_interrupts
108.25 ± 5% +19.4% 129.25 ± 12% interrupts.CPU50.NMI:Non-maskable_interrupts
108.25 ± 5% +19.4% 129.25 ± 12% interrupts.CPU50.PMI:Performance_monitoring_interrupts
107.50 ± 2% +28.6% 138.25 ± 11% interrupts.CPU51.NMI:Non-maskable_interrupts
107.50 ± 2% +28.6% 138.25 ± 11% interrupts.CPU51.PMI:Performance_monitoring_interrupts
7660 ± 24% -38.5% 4710 ± 51% interrupts.CPU6.NMI:Non-maskable_interrupts
7660 ± 24% -38.5% 4710 ± 51% interrupts.CPU6.PMI:Performance_monitoring_interrupts
92.50 ± 22% +35.7% 125.50 ± 11% interrupts.CPU70.NMI:Non-maskable_interrupts
92.50 ± 22% +35.7% 125.50 ± 11% interrupts.CPU70.PMI:Performance_monitoring_interrupts
106.00 ± 7% +18.4% 125.50 ± 12% interrupts.CPU71.NMI:Non-maskable_interrupts
106.00 ± 7% +18.4% 125.50 ± 12% interrupts.CPU71.PMI:Performance_monitoring_interrupts
97.75 ± 25% +37.9% 134.75 ± 14% interrupts.CPU72.NMI:Non-maskable_interrupts
97.75 ± 25% +37.9% 134.75 ± 14% interrupts.CPU72.PMI:Performance_monitoring_interrupts
108.75 ± 8% +17.9% 128.25 ± 10% interrupts.CPU79.NMI:Non-maskable_interrupts
108.75 ± 8% +17.9% 128.25 ± 10% interrupts.CPU79.PMI:Performance_monitoring_interrupts
108.75 ± 7% +19.8% 130.25 ± 10% interrupts.CPU82.NMI:Non-maskable_interrupts
108.75 ± 7% +19.8% 130.25 ± 10% interrupts.CPU82.PMI:Performance_monitoring_interrupts
97.00 ± 27% +40.5% 136.25 ± 12% interrupts.CPU83.NMI:Non-maskable_interrupts
97.00 ± 27% +40.5% 136.25 ± 12% interrupts.CPU83.PMI:Performance_monitoring_interrupts
114.25 ± 4% +19.3% 136.25 ± 10% interrupts.CPU95.NMI:Non-maskable_interrupts
114.25 ± 4% +19.3% 136.25 ± 10% interrupts.CPU95.PMI:Performance_monitoring_interrupts
2851 ± 7% +73.3% 4940 ± 19% interrupts.RES:Rescheduling_interrupts
will-it-scale.per_process_ops
6.9e+06 +----------------------------------------------------------------+
| O O |
| OO O O |
6.85e+06 |-+ O O O |
| O O O O O O O OO O O O |
| O O O O |
6.8e+06 |-+ O |
| |
6.75e+06 |-+ |
| |
| +.+. .+. .+.+.+.+.+.+ |
6.7e+06 |-+ .+ + + : |
| .+. .+ : .+. .+. .+. .+. .+ .+.+.+.+.|
|.+ + ++ + +.+ + + + |
6.65e+06 +----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen