Greetings,
FYI, we noticed a 174.4% improvement in will-it-scale.per_thread_ops due to commit:
commit: 8181789168eabdead7fb02645968fc0be58f8eb0 ("locking/rwsem: Enable count-based spinning on reader")
git://internal_merge_and_test_tree
revert-fb835fe7f0adbd7c2c074b98ec783713407f3bb3-8181789168eabdead7fb02645968fc0be58f8eb0
in testcase: will-it-scale
on test machine: 8 threads Ivy Bridge with 16G memory
with the following parameters:
nr_task: 100%
mode: thread
test: brk1
cpufreq_governor: performance
ucode: 0x20
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel
copies to see if the testcase will scale. It builds both a process-based and a thread-based
variant of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
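For context, the brk1 case in thread mode is essentially a tight brk() loop running in every
thread of a single process, so each iteration takes the shared mm's reader/writer semaphore;
that is presumably why a change to rwsem spinning behaviour moves this benchmark so much.
The sketch below illustrates such a loop; it is an assumption about the workload shape, not
the actual brk1.c from the repository above.

/*
 * Illustration only (an assumed brk1-style loop, not the upstream source):
 * each thread repeatedly grows and shrinks the program break by one page.
 * In thread mode all threads share one mm, so every brk() call serializes
 * on that mm's rwsem, the lock whose spinning behaviour the commit changes.
 *
 * Build: gcc -O2 -pthread brk1_sketch.c -o brk1_sketch
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS   8		/* matches the 8-thread test machine */
#define ITERATIONS 1000000L

static void *brk_base;			/* break toggles between base and base + page */
static pthread_barrier_t barrier;	/* start all workers together */

static void *worker(void *arg)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long ops = 0;

	(void)arg;
	pthread_barrier_wait(&barrier);
	for (long i = 0; i < ITERATIONS; i++) {
		if (brk((char *)brk_base + page))	/* grow the break */
			break;
		if (brk(brk_base))			/* shrink it back */
			break;
		ops += 2;
	}
	return (void *)ops;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	unsigned long total = 0;

	/* Park the break well above the existing heap so malloc'd memory is untouched. */
	brk_base = (char *)sbrk(0) + (64L << 20);

	pthread_barrier_init(&barrier, NULL, NTHREADS + 1);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	pthread_barrier_wait(&barrier);		/* release the workers */

	for (int i = 0; i < NTHREADS; i++) {
		void *ret;

		pthread_join(tid[i], &ret);
		total += (unsigned long)ret;
	}
	printf("total brk ops: %lu\n", total);
	return 0;
}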
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
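The commit title suggests that a waiter which finds the rwsem owned by readers now spins for a
bounded count instead of sleeping right away; that trade shows up in the data below as roughly
80% fewer voluntary context switches together with a much higher CPU busy% and system time.
The following toy sketch only illustrates the general idea of bounded spinning on a reader
count; it is not the kernel's rwsem implementation.

/*
 * Rough conceptual sketch of "count-based spinning on reader" -- NOT the
 * kernel's rwsem code.  A writer that finds the lock reader-owned spins
 * for a bounded budget, re-sampling the reader count, before a real
 * implementation would queue itself and sleep.
 *
 * Build: gcc -O2 -pthread rwsem_sketch.c -o rwsem_sketch
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define READER_SPIN_BUDGET 1024		/* arbitrary budget, for illustration */

struct toy_rwsem {
	/* count > 0: active readers; count == -1: writer; count == 0: free */
	atomic_long count;
};

/* Spin while the lock is reader-owned, up to a fixed budget. */
static bool spin_on_readers(struct toy_rwsem *sem)
{
	for (int i = 0; i < READER_SPIN_BUDGET; i++) {
		long c = atomic_load_explicit(&sem->count, memory_order_acquire);

		if (c <= 0)
			return c == 0;	/* readers drained (or a writer got in) */
		sched_yield();		/* stand-in for cpu_relax() */
	}
	return false;			/* budget spent: a real waiter would sleep */
}

static void toy_write_lock(struct toy_rwsem *sem)
{
	for (;;) {
		long expected = 0;

		if (atomic_compare_exchange_weak_explicit(&sem->count, &expected, -1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;			/* got the lock */
		if (expected > 0)
			spin_on_readers(sem);	/* readers: spin, then retry the CAS */
		else
			sched_yield();		/* writer-owned: just back off here */
	}
}

static void toy_write_unlock(struct toy_rwsem *sem)
{
	atomic_store_explicit(&sem->count, 0, memory_order_release);
}

static void toy_read_lock(struct toy_rwsem *sem)
{
	for (;;) {
		long c = atomic_load_explicit(&sem->count, memory_order_relaxed);

		if (c >= 0 &&
		    atomic_compare_exchange_weak_explicit(&sem->count, &c, c + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;
		sched_yield();
	}
}

static void toy_read_unlock(struct toy_rwsem *sem)
{
	atomic_fetch_sub_explicit(&sem->count, 1, memory_order_release);
}

static struct toy_rwsem sem;
static long shared;

static void *reader(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		toy_read_lock(&sem);
		(void)shared;
		toy_read_unlock(&sem);
	}
	return NULL;
}

static void *writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		toy_write_lock(&sem);
		shared++;
		toy_write_unlock(&sem);
	}
	return NULL;
}

int main(void)
{
	pthread_t r[3], w;

	for (int i = 0; i < 3; i++)
		pthread_create(&r[i], NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	for (int i = 0; i < 3; i++)
		pthread_join(r[i], NULL);
	pthread_join(w, NULL);
	printf("shared = %ld (expect 100000)\n", shared);
	return 0;
}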
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-ivb-d01/brk1/will-it-scale/0x20
commit:
a01ed4953d ("locking/rwsem: Enable readers spinning on writer")
8181789168 ("locking/rwsem: Enable count-based spinning on reader")
a01ed4953d180af6 8181789168eabdead7fb026459
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 dmesg.RIP:init_module[raid#_pq]
:4 25% 1:4 dmesg.RIP:mprotect_fixup
1:4 -25% :4 kmsg.c78a6c>]usb_hcd_irq
:4 25% 1:4 kmsg.c873fb>]usb_hcd_irq
:4 25% 1:4 kmsg.dc4e9cc>]usb_hcd_irq
1:4 -25% :4 kmsg.dd79181>]usb_hcd_irq
1:4 -25% :4 kmsg.usb_hcd_irq
%stddev %change %stddev
\ | \
71137 +174.4% 195166 ± 4% will-it-scale.per_thread_ops
3370 ± 18% +5133.1% 176355 ± 8% will-it-scale.time.involuntary_context_switches
164.50 +351.8% 743.25 will-it-scale.time.percent_of_cpu_this_job_got
472.05 +364.1% 2190 will-it-scale.time.system_time
24.26 +98.9% 48.25 ± 4% will-it-scale.time.user_time
1.291e+08 -80.2% 25595517 ± 5% will-it-scale.time.voluntary_context_switches
569101 +174.4% 1561331 ± 4% will-it-scale.workload
83.59 -76.2 7.37 ± 2% mpstat.cpu.idle%
15.26 +75.0 90.31 mpstat.cpu.sys%
1.14 ± 2% +1.2 2.32 ± 4% mpstat.cpu.usr%
1.00 +425.0% 5.25 ± 8% vmstat.procs.r
855678 -79.7% 173703 ± 4% vmstat.system.cs
30627 ± 2% -17.5% 25262 ± 3% vmstat.system.in
3067 ± 5% -15.1% 2604 ± 3% slabinfo.kmalloc-96.active_objs
3067 ± 5% -15.1% 2604 ± 3% slabinfo.kmalloc-96.num_objs
2431 ± 5% +13.2% 2751 ± 4% slabinfo.lsm_file_cache.active_objs
2431 ± 5% +13.2% 2751 ± 4% slabinfo.lsm_file_cache.num_objs
620.50 ± 10% +469.7% 3534 ±119% softirqs.NET_RX
99685 ± 3% +26.8% 126424 ± 6% softirqs.RCU
398049 -59.0% 163090 ± 3% softirqs.SCHED
970097 ± 2% +11.5% 1081290 softirqs.TIMER
6718 +1.5% 6822 proc-vmstat.nr_slab_unreclaimable
343140 -2.8% 333375 proc-vmstat.numa_hit
343140 -2.8% 333375 proc-vmstat.numa_local
379725 -2.5% 370227 proc-vmstat.pgalloc_normal
388517 -2.2% 380125 proc-vmstat.pgfault
367968 -2.5% 358850 proc-vmstat.pgfree
2.775e+08 ± 5% -90.2% 27180872 ± 10% cpuidle.C1.time
6348788 ± 7% -43.8% 3567322 ± 4% cpuidle.C1.usage
3.835e+08 ± 10% -91.5% 32545628 ± 8% cpuidle.C1E.time
4566503 ± 9% -81.6% 838701 ± 7% cpuidle.C1E.usage
4.618e+08 ± 30% -92.9% 32788485 ± 4% cpuidle.C3.time
3028732 ± 18% -88.8% 339744 ± 9% cpuidle.C3.usage
6.83e+08 ± 15% -92.7% 49999190 ± 11% cpuidle.C6.time
3006999 ± 9% -92.9% 212206 ± 19% cpuidle.C6.usage
26478811 ± 7% -88.4% 3066144 ± 6% cpuidle.POLL.time
88423033 ± 6% -77.9% 19530154 ± 3% cpuidle.POLL.usage
931.00 +273.0% 3472 turbostat.Avg_MHz
25.03 +69.0 94.03 turbostat.Busy%
6348723 ± 7% -43.8% 3567227 ± 4% turbostat.C1
11.46 ± 5% -10.3 1.12 ± 10% turbostat.C1%
4566488 ± 9% -81.6% 838684 ± 7% turbostat.C1E
15.84 ± 10% -14.5 1.34 ± 8% turbostat.C1E%
3028728 ± 18% -88.8% 339740 ± 9% turbostat.C3
19.09 ± 30% -17.7 1.35 ± 4% turbostat.C3%
3006980 ± 9% -92.9% 212201 ± 19% turbostat.C6
28.21 ± 15% -26.1 2.06 ± 11% turbostat.C6%
46.33 ± 2% -89.4% 4.91 ± 3% turbostat.CPU%c1
27.56 ± 5% -97.5% 0.70 ± 9% turbostat.CPU%c3
1.09 ± 94% -67.1% 0.36 ± 23% turbostat.CPU%c6
24.84 +59.0% 39.50 turbostat.CorWatt
9317828 ± 2% -17.6% 7679004 ± 3% turbostat.IRQ
65.75 +9.5% 72.00 ± 2% turbostat.PkgTmp
29.63 +49.9% 44.41 turbostat.PkgWatt
10.00 ± 3% -46.7% 5.33 perf-stat.i.MPKI
1.161e+09 ± 2% +166.1% 3.091e+09 perf-stat.i.branch-instructions
1.38 -0.9 0.46 perf-stat.i.branch-miss-rate%
16384662 ± 2% -17.9% 13459272 perf-stat.i.branch-misses
0.42 ± 22% -0.2 0.20 ± 13% perf-stat.i.cache-miss-rate%
52929464 ± 5% +50.4% 79594155 perf-stat.i.cache-references
861920 -79.7% 175141 ± 5% perf-stat.i.context-switches
1.40 ± 2% +29.1% 1.81 perf-stat.i.cpi
7.431e+09 +273.0% 2.772e+10 perf-stat.i.cpu-cycles
209.33 ± 5% +38.3% 289.44 ± 8% perf-stat.i.cpu-migrations
212025 ± 29% +125.1% 477226 ± 19% perf-stat.i.cycles-between-cache-misses
0.57 ± 13% -0.2 0.33 ± 7% perf-stat.i.dTLB-load-miss-rate%
8568642 ± 13% +74.5% 14951677 ± 7% perf-stat.i.dTLB-load-misses
1.482e+09 ± 2% +207.6% 4.558e+09 perf-stat.i.dTLB-loads
0.03 ± 48% -0.0 0.01 ± 34% perf-stat.i.dTLB-store-miss-rate%
252194 ± 47% -77.0% 57973 ± 33% perf-stat.i.dTLB-store-misses
8.818e+08 ± 2% -18.4% 7.199e+08 perf-stat.i.dTLB-stores
86.18 +10.8 96.98 perf-stat.i.iTLB-load-miss-rate%
1105085 +30.5% 1442114 ± 4% perf-stat.i.iTLB-load-misses
179975 ± 5% -67.5% 58558 ± 2% perf-stat.i.iTLB-loads
5.322e+09 ± 2% +188.1% 1.533e+10 perf-stat.i.instructions
4887 +124.3% 10963 ± 5% perf-stat.i.instructions-per-iTLB-miss
0.72 ± 2% -22.6% 0.55 perf-stat.i.ipc
1242 -1.9% 1218 perf-stat.i.minor-faults
1242 -1.9% 1218 perf-stat.i.page-faults
9.94 ± 3% -47.8% 5.19 perf-stat.overall.MPKI
1.41 -1.0 0.44 perf-stat.overall.branch-miss-rate%
0.26 ± 30% -0.1 0.16 ± 15% perf-stat.overall.cache-miss-rate%
1.40 ± 2% +29.4% 1.81 perf-stat.overall.cpi
58230 ± 21% +286.9% 225305 ± 16% perf-stat.overall.cycles-between-cache-misses
0.58 ± 13% -0.2 0.33 ± 7% perf-stat.overall.dTLB-load-miss-rate%
0.03 ± 48% -0.0 0.01 ± 31% perf-stat.overall.dTLB-store-miss-rate%
86.00 +10.1 96.09 perf-stat.overall.iTLB-load-miss-rate%
4815 +121.3% 10657 ± 5% perf-stat.overall.instructions-per-iTLB-miss
0.72 ± 2% -22.8% 0.55 perf-stat.overall.ipc
1.158e+09 ± 2% +166.0% 3.081e+09 perf-stat.ps.branch-instructions
16340733 ± 2% -17.9% 13416233 perf-stat.ps.branch-misses
52788670 ± 5% +50.3% 79338867 perf-stat.ps.cache-references
859596 -79.7% 174583 ± 5% perf-stat.ps.context-switches
7.411e+09 +272.8% 2.763e+10 perf-stat.ps.cpu-cycles
208.75 ± 5% +38.2% 288.53 ± 8% perf-stat.ps.cpu-migrations
8544594 ± 13% +74.4% 14903812 ± 7% perf-stat.ps.dTLB-load-misses
1.478e+09 ± 2% +207.5% 4.544e+09 perf-stat.ps.dTLB-loads
251406 ± 47% -77.0% 57787 ± 33% perf-stat.ps.dTLB-store-misses
8.794e+08 ± 2% -18.4% 7.176e+08 perf-stat.ps.dTLB-stores
1102106 +30.4% 1437495 ± 4% perf-stat.ps.iTLB-load-misses
179482 ± 5% -67.5% 58373 ± 2% perf-stat.ps.iTLB-loads
5.307e+09 ± 2% +187.9% 1.528e+10 perf-stat.ps.instructions
1238 -1.9% 1214 perf-stat.ps.minor-faults
1238 -1.9% 1214 perf-stat.ps.page-faults
1.604e+12 ± 2% +188.1% 4.62e+12 perf-stat.total.instructions
31796 +325.2% 135188 sched_debug.cfs_rq:/.exec_clock.avg
40766 ± 15% +234.3% 136267 sched_debug.cfs_rq:/.exec_clock.max
21244 ± 31% +532.9% 134460 sched_debug.cfs_rq:/.exec_clock.min
6932 ± 61% -91.4% 598.66 ± 23% sched_debug.cfs_rq:/.exec_clock.stddev
133200 ± 12% -19.2% 107580 ± 11% sched_debug.cfs_rq:/.load.avg
428355 ± 19% -45.5% 233398 ± 5% sched_debug.cfs_rq:/.load.max
1049 ±173% +2732.7% 29729 ± 59% sched_debug.cfs_rq:/.load.min
167434 ± 15% -59.4% 67945 ± 10% sched_debug.cfs_rq:/.load.stddev
131.83 ± 3% +53.6% 202.49 ± 7% sched_debug.cfs_rq:/.load_avg.avg
10.75 ± 30% +684.5% 84.33 ± 4% sched_debug.cfs_rq:/.load_avg.min
91234 +1027.5% 1028659 sched_debug.cfs_rq:/.min_vruntime.avg
119056 ± 11% +777.7% 1044927 sched_debug.cfs_rq:/.min_vruntime.max
61125 ± 28% +1568.9% 1020110 sched_debug.cfs_rq:/.min_vruntime.min
0.45 ± 8% +68.6% 0.76 ± 10% sched_debug.cfs_rq:/.nr_running.avg
0.04 ±173% +600.0% 0.29 ± 47% sched_debug.cfs_rq:/.nr_running.min
0.46 ± 2% -33.8% 0.31 ± 22% sched_debug.cfs_rq:/.nr_running.stddev
0.33 ± 34% +379.7% 1.60 ± 16% sched_debug.cfs_rq:/.nr_spread_over.avg
1.38 ± 36% +609.1% 9.75 ± 19% sched_debug.cfs_rq:/.nr_spread_over.max
0.51 ± 30% +531.9% 3.23 ± 16% sched_debug.cfs_rq:/.nr_spread_over.stddev
60.99 ± 4% +60.9% 98.14 ± 13% sched_debug.cfs_rq:/.runnable_load_avg.avg
0.88 ±173% +2290.5% 20.92 ± 27% sched_debug.cfs_rq:/.runnable_load_avg.min
124179 ± 10% -25.1% 93004 ± 12% sched_debug.cfs_rq:/.runnable_weight.avg
403393 ± 19% -47.8% 210725 ± 3% sched_debug.cfs_rq:/.runnable_weight.max
1043 ±173% +2259.9% 24631 ± 36% sched_debug.cfs_rq:/.runnable_weight.min
159573 ± 15% -63.2% 58712 ± 8% sched_debug.cfs_rq:/.runnable_weight.stddev
-15806 -119.9% 3149 ± 84% sched_debug.cfs_rq:/.spread0.avg
-45918 -88.2% -5402 sched_debug.cfs_rq:/.spread0.min
371.44 ± 2% +154.2% 944.31 ± 3% sched_debug.cfs_rq:/.util_avg.avg
666.54 ± 5% +128.2% 1520 ± 12% sched_debug.cfs_rq:/.util_avg.max
145.92 ± 18% +193.5% 428.33 ± 50% sched_debug.cfs_rq:/.util_avg.min
54.83 ± 18% +890.8% 543.29 ± 17% sched_debug.cfs_rq:/.util_est_enqueued.avg
286.88 ± 22% +361.0% 1322 ± 17% sched_debug.cfs_rq:/.util_est_enqueued.max
0.04 ±173% +4200.0% 1.79 ± 82% sched_debug.cfs_rq:/.util_est_enqueued.min
102.64 ± 19% +332.3% 443.73 ± 11% sched_debug.cfs_rq:/.util_est_enqueued.stddev
184670 ± 24% -32.9% 124000 ± 14% sched_debug.cpu.avg_idle.avg
0.71 ± 8% +53.0% 1.09 ± 12% sched_debug.cpu.clock.stddev
0.71 ± 8% +53.0% 1.09 ± 12% sched_debug.cpu.clock_task.stddev
63.14 ± 9% +65.0% 104.20 ± 13% sched_debug.cpu.cpu_load[0].avg
1.00 ±173% +3254.2% 33.54 ± 64% sched_debug.cpu.cpu_load[0].min
55.03 ± 5% +88.8% 103.88 ± 10% sched_debug.cpu.cpu_load[1].avg
1.46 ± 64% +2714.3% 41.04 ± 36% sched_debug.cpu.cpu_load[1].min
44.93 ± 5% +134.9% 105.54 ± 7% sched_debug.cpu.cpu_load[2].avg
166.67 ± 10% +26.8% 211.25 ± 10% sched_debug.cpu.cpu_load[2].max
3.79 ± 23% +1485.7% 60.12 ± 16% sched_debug.cpu.cpu_load[2].min
37.43 ± 5% +184.7% 106.55 ± 5% sched_debug.cpu.cpu_load[3].avg
114.21 ± 5% +53.6% 175.37 ± 6% sched_debug.cpu.cpu_load[3].max
5.42 ± 20% +1275.4% 74.50 ± 9% sched_debug.cpu.cpu_load[3].min
31.70 ± 6% +235.6% 106.41 ± 2% sched_debug.cpu.cpu_load[4].avg
82.75 ± 11% +83.2% 151.62 ± 5% sched_debug.cpu.cpu_load[4].max
5.12 ± 25% +1507.3% 82.38 ± 6% sched_debug.cpu.cpu_load[4].min
686.12 ± 6% +60.2% 1099 ± 11% sched_debug.cpu.curr->pid.avg
1034 -17.2% 856.53 ± 8% sched_debug.cpu.curr->pid.stddev
168360 ± 16% -53.8% 77820 ± 35% sched_debug.cpu.load.stddev
0.00 ± 17% +54.4% 0.00 ± 9% sched_debug.cpu.next_balance.stddev
0.55 ± 12% +77.4% 0.98 ± 10% sched_debug.cpu.nr_running.avg
15742605 -69.1% 4866675 ± 4% sched_debug.cpu.nr_switches.avg
21230040 ± 5% -75.7% 5158061 ± 3% sched_debug.cpu.nr_switches.max
10293994 ± 16% -56.3% 4500141 ± 4% sched_debug.cpu.nr_switches.min
3507942 ± 24% -93.7% 219305 ± 8% sched_debug.cpu.nr_switches.stddev
0.66 ± 11% -58.7% 0.27 ± 38% sched_debug.cpu.nr_uninterruptible.avg
15739725 -68.2% 5006936 ± 4% sched_debug.cpu.sched_count.avg
21227365 ± 5% -75.0% 5297524 ± 3% sched_debug.cpu.sched_count.max
10291630 ± 16% -55.0% 4636270 ± 4% sched_debug.cpu.sched_count.min
3508025 ± 24% -93.7% 220391 ± 7% sched_debug.cpu.sched_count.stddev
7867901 -69.6% 2388502 ± 4% sched_debug.cpu.sched_goidle.avg
10611894 ± 5% -76.1% 2534509 ± 3% sched_debug.cpu.sched_goidle.max
5143691 ± 16% -57.1% 2206080 ± 5% sched_debug.cpu.sched_goidle.min
1754105 ± 24% -93.8% 109474 ± 9% sched_debug.cpu.sched_goidle.stddev
7918259 -66.8% 2627741 ± 3% sched_debug.cpu.ttwu_count.avg
11316762 ± 14% -75.5% 2772370 ± 4% sched_debug.cpu.ttwu_count.max
2365022 ± 51% -95.7% 101878 ± 52% sched_debug.cpu.ttwu_count.stddev
158517 ± 24% -35.4% 102386 ± 3% sched_debug.cpu.ttwu_local.max
31298 ± 60% -79.5% 6427 ± 40% sched_debug.cpu.ttwu_local.stddev
will-it-scale.per_thread_ops
250000 +-+----------------------------------------------------------------+
| |
| O O O |
200000 O-+O O O O |
| O |
| |
150000 +-+ |
| |
100000 +-+ +..+.. |
| .+ +..+... .. +.. |
| +..+...+. : : + +..+...+..+..+..+..+...+..+..|
50000 +-+ : : |
| : : : |
|: : : |
0 +-+----------------------------------------------------------------+
will-it-scale.workload
1.8e+06 +-+---------------------------------------------------------------+
| O O O O |
1.6e+06 O-+O O O |
1.4e+06 +-+ O |
| |
1.2e+06 +-+ |
1e+06 +-+ |
| |
800000 +-+ ..+ .+..+... |
600000 +-++..+..+. : +..+..+. +.. .+.. .+..+... .+..|
| : : : +..+. +. +..+. |
400000 +-+ : : |
200000 +-+ : : |
|: :: |
0 +-+---------------------------------------------------------------+
will-it-scale.time.user_time
60 +-+--------------------------------------------------------------------+
| |
50 O-+ O O O O |
| O O O O |
| |
40 +-+ |
| |
30 +-+ ..+ .+..+...+..+... |
| +...+..+. : +...+. +..+..+...+..+.. ..+..+...+..|
20 +-+: : : +. |
| : : : |
| : : : |
10 +-+ : : |
|: :: |
0 +-+--------------------------------------------------------------------+
will-it-scale.time.system_time
2500 +-+------------------------------------------------------------------+
| |
O O O O O O O O O |
2000 +-+ |
| |
| |
1500 +-+ |
| |
1000 +-+ |
| |
| |
500 +-++...+..+..+ +..+..+...+..+..+..+...+..+..+...+..+..+..+...+..|
| + + .. |
|+ + . |
0 +-+------------------------------------------------------------------+
will-it-scale.time.percent_of_cpu_this_job_got
800 +-+-------------------------------------------------------------------+
O O O O O O O O O |
700 +-+ |
600 +-+ |
| |
500 +-+ |
| |
400 +-+ |
| |
300 +-+ |
200 +-+ |
| +...+..+..+. +..+...+..+..+...+..+..+...+..+..+...+..+..+...+..|
100 +-+ .. + |
|+ + |
0 +-+-------------------------------------------------------------------+
will-it-scale.time.voluntary_context_switches
1.6e+08 +-+---------------------------------------------------------------+
| .+ .+.. |
1.4e+08 +-+ .+.. .. : +..+.. .+... .+. . .+..|
1.2e+08 +-++. + : : +..+. +..+..+..+..+. +..+. |
| : : : |
1e+08 +-+ : : |
| : : : |
8e+07 +-+ : : |
| : : : |
6e+07 +-+ : : |
4e+07 +-+ : : |
|: : : |
2e+07 O-+O O O O O: O O O |
| : |
0 +-+---------------------------------------------------------------+
will-it-scale.time.involuntary_context_switches
250000 +-+----------------------------------------------------------------+
| |
| O |
200000 O-+ O O |
| O O O |
| O O |
150000 +-+ |
| |
100000 +-+ |
| |
| |
50000 +-+ |
| |
| .+.. ..+..+.. |
0 +-+----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen