Greetings,

FYI, we noticed an 8.9% improvement in netperf.Throughput_Mbps due to commit:

commit: dc86d23b330b040a0d64a9e9c0f2e6fea2dac89a ("mm: page_alloc: High-order per-cpu page allocator")
https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-pagealloc-highorder-percpu-v2r4

in testcase: netperf
on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory
with the following parameters:

        ip: ipv4
        runtime: 300s
        nr_threads: 25%
        cluster: cs-localhost
        send_size: 10K
        test: SCTP_STREAM_MANY
        cpufreq_governor: performance

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/

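The parameters listed above appear to be encoded in the path_params component of the results path further down: their values, in the order listed, joined with "-". A minimal Python sketch that rebuilds that path from the parameter list; the helper name path_params is hypothetical and not part of lkp-tests, and the ordering rule is inferred only from this one report.

```python
# Job parameters in the order they are listed in this report.
params = [
    ("ip", "ipv4"),
    ("runtime", "300s"),
    ("nr_threads", "25%"),
    ("cluster", "cs-localhost"),
    ("send_size", "10K"),
    ("test", "SCTP_STREAM_MANY"),
    ("cpufreq_governor", "performance"),
]


def path_params(pairs):
    """Hypothetical helper: join the parameter values with '-' in order."""
    return "-".join(value for _, value in pairs)


# testcase / path_params / tbox_group, as in the results table header.
result_path = "/".join(["netperf", path_params(params), "lkp-bdw-de1"])
print(result_path)
```

Going the other way is ambiguous: a value such as cs-localhost itself contains "-", so splitting the path on "-" would not reliably recover the individual parameters.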
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

testcase/path_params/tbox_group/run: netperf/ipv4-300s-25%-cs-localhost-10K-SCTP_STREAM_MANY-performance/lkp-bdw-de1

        v4.9-rc5  dc86d23b330b040a0d64a9e9c0
----------------  --------------------------
         %stddev    change           %stddev
             \        |                   \
     14333              9%       15603        netperf.Throughput_Mbps
      1574 ±  5%      131%        3630        netperf.time.voluntary_context_switches
       386             -5%         365        netperf.time.percent_of_cpu_this_job_got
      1152             -5%        1089        netperf.time.system_time
    395677              9%      429320        vmstat.system.cs
      8.25             17%        9.67        turbostat.RAMWatt
     35.55                       35.95        turbostat.PkgWatt
 2.231e+08             16%   2.583e+08 ±  5%  perf-stat.dTLB-load-misses
     14144             12%       15850        perf-stat.cpu-migrations
      0.02              9%        0.02 ±  5%  perf-stat.dTLB-load-miss-rate%
 1.203e+08              9%   1.305e+08        perf-stat.context-switches
 8.175e+11              7%   8.774e+11        perf-stat.dTLB-stores
      0.68              6%        0.72        perf-stat.ipc
 1.126e+12              6%   1.197e+12        perf-stat.dTLB-loads
 5.804e+11              5%   6.123e+11        perf-stat.branch-instructions
 3.105e+12              5%    3.27e+12        perf-stat.instructions
 2.129e+09              5%   2.226e+09        perf-stat.iTLB-loads
  2.13e+11             -6%   2.003e+11        perf-stat.cache-references
  2.13e+11             -6%   2.003e+11        perf-stat.cache-misses
 1.624e+09            -26%   1.203e+09 ±  3%  perf-stat.branch-misses
      0.28            -30%        0.20 ±  3%  perf-stat.branch-miss-rate%

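The change column is the relative difference of the commit against v4.9-rc5, and the two rate rows are consistent with a miss count divided by its event total. A minimal Python sketch that recomputes a few of them from values copied out of the table; the helper names change_pct and rate_pct are hypothetical, and the rate formula is an assumption checked only against these numbers.

```python
# Values copied from the comparison table (base = v4.9-rc5, new = dc86d23b33).
BASE, NEW = 0, 1

table = {
    "netperf.Throughput_Mbps": (14333, 15603),
    "netperf.time.voluntary_context_switches": (1574, 3630),
    "perf-stat.branch-misses": (1.624e9, 1.203e9),
    "perf-stat.branch-instructions": (5.804e11, 6.123e11),
    "perf-stat.dTLB-load-misses": (2.231e8, 2.583e8),
    "perf-stat.dTLB-loads": (1.126e12, 1.197e12),
}


def change_pct(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def rate_pct(misses, total):
    """Assumed definition of a *-rate% row: misses as a percentage of total."""
    return misses / total * 100.0


# Recompute the "change" column for a few rows.
for name in ("netperf.Throughput_Mbps",
             "netperf.time.voluntary_context_switches",
             "perf-stat.branch-misses"):
    base, new = table[name]
    print(f"{name:45s} {change_pct(base, new):+7.1f}%")

# Recompute the two rate rows for each kernel.
for side, label in ((BASE, "v4.9-rc5"), (NEW, "dc86d23b")):
    bm = rate_pct(table["perf-stat.branch-misses"][side],
                  table["perf-stat.branch-instructions"][side])
    tlb = rate_pct(table["perf-stat.dTLB-load-misses"][side],
                   table["perf-stat.dTLB-loads"][side])
    print(f"{label}: branch-miss-rate% = {bm:.2f}, dTLB-load-miss-rate% = {tlb:.2f}")
```

It reports +8.9% for netperf.Throughput_Mbps, matching the headline figure, and branch-miss rates of 0.28% and 0.20%, matching the table.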
[Per-sample plots omitted: perf-stat.branch-misses, perf-stat.context-switches, turbostat.RAMWatt, netperf.Throughput_Mbps, netperf.time.system_time, netperf.time.percent_of_cpu_this_job_got, netperf.time.voluntary_context_switches, vmstat.system.cs. Legend: * = bisect-good sample, O = bisect-bad sample.]

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong