Greetings,
0day kernel testing robot got the below dmesg and the first bad commit is
https://github.com/0day-ci/linux Borislav-Petkov/x86-Optimize-clear_page/20170210-053052
commit 0ad07c8104eb5c12dfcb86581c1cc657183496cc
Author: Borislav Petkov <bp(a)suse.de>
AuthorDate: Thu Feb 9 20:51:25 2017 +0100
Commit: 0day robot <fengguang.wu(a)intel.com>
CommitDate: Fri Feb 10 05:30:58 2017 +0800
x86: Optimize clear_page()
Currently, we CALL clear_page() which then JMPs to the proper function
chosen by the alternatives.
What we should do instead is CALL the proper function directly. (This
was something Ingo suggested a while ago). So let's do that.
Measuring our favourite kernel build workload shows that there are no
significant changes in performance.
AMD
===
--- /tmp/before 2017-02-09 18:01:46.451961188 +0100
+++ /tmp/after 2017-02-09 18:01:54.883961175 +0100
@@ -1,15 +1,15 @@
Performance counter stats for 'system wide' (5 runs):
-    1028960.373643  cpu-clock (msec)         # 6.000 CPUs utilized           ( +- 1.41% )
+    1023086.018961  cpu-clock (msec)         # 6.000 CPUs utilized           ( +- 1.20% )
-           518,744  context-switches         # 0.504 K/sec                   ( +- 1.04% )
+           518,254  context-switches         # 0.507 K/sec                   ( +- 1.01% )
-            38,112  cpu-migrations           # 0.037 K/sec                   ( +- 1.95% )
+            37,917  cpu-migrations           # 0.037 K/sec                   ( +- 1.02% )
-        20,874,266  page-faults              # 0.020 M/sec                   ( +- 0.07% )
+        20,918,897  page-faults              # 0.020 M/sec                   ( +- 0.18% )
- 2,043,646,230,667  cycles                   # 1.986 GHz                     ( +- 0.14% ) (66.67%)
+ 2,045,305,584,032  cycles                   # 1.999 GHz                     ( +- 0.16% ) (66.67%)
-   553,698,855,431  stalled-cycles-frontend  # 27.09% frontend cycles idle   ( +- 0.07% ) (66.67%)
+   555,099,401,413  stalled-cycles-frontend  # 27.14% frontend cycles idle   ( +- 0.13% ) (66.67%)
-   621,544,286,390  stalled-cycles-backend   # 30.41% backend cycles idle    ( +- 0.39% ) (66.67%)
+   621,371,430,254  stalled-cycles-backend   # 30.38% backend cycles idle    ( +- 0.32% ) (66.67%)
- 1,738,364,431,659  instructions             # 0.85 insn per cycle
+ 1,739,895,771,901  instructions             # 0.85 insn per cycle
-                                             # 0.36 stalled cycles per insn  ( +- 0.11% ) (66.67%)
+                                             # 0.36 stalled cycles per insn  ( +- 0.13% ) (66.67%)
-   391,170,943,850  branches                 # 380.161 M/sec                 ( +- 0.13% ) (66.67%)
+   391,398,551,757  branches                 # 382.567 M/sec                 ( +- 0.13% ) (66.67%)
-    22,567,810,411  branch-misses            # 5.77% of all branches         ( +- 0.11% ) (66.67%)
+    22,574,726,683  branch-misses            # 5.77% of all branches         ( +- 0.13% ) (66.67%)
-     171.480741921  seconds time elapsed                                     ( +- 1.41% )
+     170.509229451  seconds time elapsed                                     ( +- 1.20% )
Intel
=====
--- /tmp/before 2017-02-09 20:36:19.851947473 +0100
+++ /tmp/after 2017-02-09 20:36:30.151947458 +0100
@@ -1,15 +1,15 @@
Performance counter stats for 'system wide' (5 runs):
-    2207248.598126  cpu-clock (msec)         # 8.000 CPUs utilized           ( +- 0.69% )
+    2213300.106631  cpu-clock (msec)         # 8.000 CPUs utilized           ( +- 0.73% )
-           899,342  context-switches         # 0.407 K/sec                   ( +- 0.68% )
+           898,381  context-switches         # 0.406 K/sec                   ( +- 0.79% )
-            80,553  cpu-migrations           # 0.036 K/sec                   ( +- 1.13% )
+            80,979  cpu-migrations           # 0.037 K/sec                   ( +- 1.11% )
-        36,171,148  page-faults              # 0.016 M/sec                   ( +- 0.02% )
+        36,179,791  page-faults              # 0.016 M/sec                   ( +- 0.02% )
- 6,665,288,826,484  cycles                   # 3.020 GHz                     ( +- 0.07% ) (83.33%)
+ 6,671,638,410,799  cycles                   # 3.014 GHz                     ( +- 0.06% ) (83.33%)
- 5,065,975,115,197  stalled-cycles-frontend  # 76.01% frontend cycles idle   ( +- 0.11% ) (83.33%)
+ 5,076,835,183,223  stalled-cycles-frontend  # 76.10% frontend cycles idle   ( +- 0.11% ) (83.33%)
- 3,841,556,350,614  stalled-cycles-backend   # 57.64% backend cycles idle    ( +- 0.13% ) (66.67%)
+ 3,852,823,974,333  stalled-cycles-backend   # 57.75% backend cycles idle    ( +- 0.12% ) (66.67%)
- 4,148,398,171,079  instructions             # 0.62 insn per cycle
+ 4,148,997,156,059  instructions             # 0.62 insn per cycle
-                                             # 1.22 stalled cycles per insn  ( +- 0.10% ) (83.33%)
+                                             # 1.22 stalled cycles per insn  ( +- 0.11% ) (83.33%)
-   887,187,118,591  branches                 # 401.943 M/sec                 ( +- 0.09% ) (83.33%)
+   887,271,341,121  branches                 # 400.882 M/sec                 ( +- 0.11% ) (83.33%)
-    30,139,439,034  branch-misses            # 3.40% of all branches         ( +- 0.09% ) (83.33%)
+    30,134,864,997  branch-misses            # 3.40% of all branches         ( +- 0.06% ) (83.33%)
-     275.904405540  seconds time elapsed                                     ( +- 0.69% )
+     276.660352016  seconds time elapsed                                     ( +- 0.73% )
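The "no significant changes" claim can be sanity-checked against the elapsed times in the perf stat output above. The following is only a rough sketch: it treats the larger of the two reported "+-" percentages as a noise band, which is a simplification and not a proper statistical test, and the function and variable names are illustrative, not taken from the patch.

```python
# (before_seconds, after_seconds, larger of the two reported "+-" percentages),
# copied from the "seconds time elapsed" lines of the perf stat output.
ELAPSED = {
    "AMD": (171.480741921, 170.509229451, 1.41),
    "Intel": (275.904405540, 276.660352016, 0.73),
}


def relative_change_pct(before, after):
    """Signed change from before to after, as a percentage of before."""
    return (after - before) / before * 100.0


def within_noise(before, after, noise_pct):
    """True if the change is no larger than the assumed noise band."""
    return abs(relative_change_pct(before, after)) <= noise_pct


for name, (before, after, noise) in ELAPSED.items():
    delta = relative_change_pct(before, after)
    verdict = "within" if within_noise(before, after, noise) else "outside"
    print(f"{name}: {delta:+.2f}% ({verdict} the +-{noise:.2f}% band)")
```

On both machines the elapsed-time delta is smaller than the run-to-run variation, which is consistent with the statement in the commit message.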
allmodconfig vmlinux size grows by ~1KB, but that's fine: in exchange, we
optimize our calling of the clear_page variants.
   text     data      bss      dec     hex filename
9051979 23067670 27009024 59128673 3863b61 vmlinux
9053000 23067670 27009024 59129694 3863f5e vmlinux.clear_page
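As a quick check of the ~1KB figure, the two rows of the size table above can be compared directly. A minimal sketch; the dictionary and variable names are illustrative only, and the numbers are copied from the table.

```python
# Section sizes in bytes, copied from the size table above.
before = {"text": 9051979, "data": 23067670, "bss": 27009024}
after = {"text": 9053000, "data": 23067670, "bss": 27009024}

# Per-section growth in bytes; unchanged sections are left out.
growth = {
    name: after[name] - before[name]
    for name in before
    if after[name] != before[name]
}
total_growth = sum(after.values()) - sum(before.values())

print(growth)        # sections that changed, with their growth in bytes
print(total_growth)  # overall growth in bytes
```

Only the text section changes, so the growth is entirely code.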
Signed-off-by: Borislav Petkov <bp(a)suse.de>
+-------------------------------------------------------------------+------------+------------+------------+
|                                                                   | 10b9dd5686 | 0ad07c8104 | 00667aaf17 |
+-------------------------------------------------------------------+------------+------------+------------+
| boot_successes                                                    | 20         | 0          | 0          |
| boot_failures                                                     | 46         | 26         | 21         |
| Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=         | 46         |            |            |
| BUG:unable_to_handle_kernel                                       | 0          | 26         | 21         |
| Oops                                                              | 0          | 26         | 21         |
| RIP:clear_page_orig                                               | 0          | 4          |            |
| calltrace:netlink_proto_init                                      | 0          | 26         |            |
| Kernel_panic-not_syncing:Fatal_exception                          | 0          | 26         | 21         |
| RIP:clear_page_rep                                                | 0          | 20         |            |
| BUG:kernel_in_stage                                               | 0          | 1          | 5          |
| RIP:clear_page_erms                                               | 0          | 2          |            |
| BUG:kernel_reboot-without-warning_in_early-boot_stage,last_printk | 0          | 1          |            |
| BUG:kernel_hang_in_test_stage                                     | 0          | 0          | 2          |
+-------------------------------------------------------------------+------------+------------+------------+
[ 0.324616] gcov: version magic: 0x3630322a
[ 0.329304] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 0.331770] atomic64_test: passed for x86-64 platform with CX8 and with SSE
[ 0.334605] BUG: unable to handle kernel paging request at 000000001ea0d000
[ 0.336767] IP: [<ffffffff814050b7>] clear_page_rep+0x7/0x10
[ 0.338200] PGD 0
[ 0.338642]
[ 0.339102] Oops: 0002 [#1] PREEMPT SMP
[ 0.339935] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc6-00134-g0ad07c8 #1
[ 0.341521] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 0.343839] task: ffff88001e942040 task.stack: ffffc900000d0000
[ 0.345137] RIP: 0010:[<ffffffff814050b7>] [<ffffffff814050b7>] clear_page_rep+0x7/0x10
[ 0.346999] RSP: 0000:ffffc900000d3b30 EFLAGS: 00010246
[ 0.348235] RAX: 0000000000000000 RBX: ffff88001f3cc340 RCX: 0000000000000200
[ 0.349924] RDX: ffffffff83a37440 RSI: ffff88001ec24000 RDI: 000000001ea0d000
[ 0.351027] RBP: ffffc900000d3be0 R08: 00000000001d7e50 R09: ffff88001e942040
[ 0.352112] R10: 0000000000000001 R11: 000000006a64d04b R12: ffff88001e942040
[ 0.353272] R13: ffff88001e942040 R14: ffff88001f3cc400 R15: ffff88001fca4600
[ 0.354322] FS: 0000000000000000(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
[ 0.355474] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.356398] CR2: 000000001ea0d000 CR3: 000000000240c000 CR4: 00000000000006f0
[ 0.357463] Stack:
[ 0.357775] ffffffff8120f231 0000000000000008 ffff88001e801500 ffff88001fca4a90
[ 0.359005] 0000000000000014 0000000000000002 ffffffff00000000 0000000000000000
[ 0.360091] ffffc900000d3bd8 0000000000000202 ffff88001fca4600 ffff88001fca4600
[ 0.361188] Call Trace:
[ 0.361520] [<ffffffff8120f231>] ? get_page_from_freelist+0x991/0xe30
[ 0.362432] [<ffffffff812100da>] __alloc_pages_nodemask+0x2aa/0x1520
[ 0.363269] [<ffffffff81052e55>] ? unwind_next_frame+0x35/0x60
[ 0.364063] [<ffffffff81033cc9>] ? __save_stack_trace+0xe9/0x150
[ 0.364870] [<ffffffff8102c941>] ? sched_clock+0x11/0x20
[ 0.365789] [<ffffffff81a0d336>] ? proto_register+0x26/0x2f0
[ 0.366547] [<ffffffff812692b9>] alloc_page_interleave+0x49/0xc0
[ 0.367413] [<ffffffff8126bcb4>] alloc_pages_current+0x1f4/0x2b0
[ 0.368302] [<ffffffff8123463b>] kmalloc_order+0x1b/0x90
[ 0.369254] [<ffffffff82d599c1>] netlink_proto_init+0x78/0x289
[ 0.370231] [<ffffffff82d59949>] ? netlink_net_init+0x90/0x90
[ 0.371251] [<ffffffff82ca03a4>] do_one_initcall+0x113/0x28a
[ 0.372374] [<ffffffff82ca09b4>] kernel_init_freeable+0x499/0x630
[ 0.374274] [<ffffffff81c19af0>] ? rest_init+0x120/0x120
[ 0.375786] [<ffffffff81c19b01>] kernel_init+0x11/0x1d0
[ 0.377166] [<ffffffff81c25b45>] ret_from_fork+0x25/0x30
[ 0.378583] Code: 8d 44 24 18 4c 89 4c 24 40 c7 04 24 10 00 00 00 48 89 44 24 10 e8 3a f1 ff ff c9 c3 90 90 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00
[ 0.385298] RIP [<ffffffff814050b7>] clear_page_rep+0x7/0x10
[ 0.386686] RSP <ffffc900000d3b30>
[ 0.387493] CR2: 000000001ea0d000
[ 0.388277] ---[ end trace a7f1f3ffa0ea1d28 ]---
[ 0.389358] Kernel panic - not syncing: Fatal exception
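For readers decoding the Oops above by hand: error code 0002 has only the write bit set, so this is a kernel-mode write to a not-present page. RCX is still 0x200, meaning the "rep stos" at the faulting instruction had not yet stored a single quadword of the 4096-byte page. CR2 equals RDI, and that value lacks the ffff8800... prefix carried by the other kernel pointers in the dump. The sketch below decodes those fields. The direct-map base is an assumption inferred from the other register values, not something this report states, and the helper name is made up for illustration.

```python
# Assumption: direct-map base inferred from pointers such as ffff88001e942040
# in the register dump; it is not stated anywhere in the report.
PAGE_OFFSET_ASSUMED = 0xFFFF880000000000


def decode_pf_error_code(code):
    """Split an x86 page-fault error code into its architectural flag bits."""
    return {
        "protection_violation": bool(code & 0x01),  # clear: page not present
        "write": bool(code & 0x02),                 # set: access was a write
        "user_mode": bool(code & 0x04),             # clear: fault in kernel mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


# Values copied from the Oops output.
info = decode_pf_error_code(0x0002)
cr2 = 0x000000001EA0D000
rcx = 0x200

print(info)
print(rcx * 8)                         # bytes still to be cleared (8 per quadword)
print(hex(PAGE_OFFSET_ASSUMED + cr2))  # address with the assumed direct-map base
```

If the assumption holds, the last line prints the direct-map address that would correspond to the faulting value, which hints that the new call path handed clear_page_rep the wrong value in RDI. The report itself does not confirm this.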
git bisect start 00667aaf171632d226c7c7fca267a079f73b9931 d5adbfcd5f7bcc6fa58a41c5c5ada0e5c826ce2c --
git bisect good 0feccff1c5df3fd2a3479a336a4728829685386a # 20:27 21+ 21 Merge 'pci/pci/aer' into devel-spot-201702101513
git bisect bad 38ef0935885d068ebfe6fa006209f5f28b2730b0 # 20:40 0- 2 Merge 'linux-review/Or-Gerlitz/net-mlx5e-Add-preemption-enable-disable-around-TC-statistics-upcall/20170210-005814' into devel-spot-201702101513
git bisect bad eeb64cd65c7de9ca4ee56377703a0b80349cc2ea # 20:58 0- 12 Merge 'arm64/for-next/core' into devel-spot-201702101513
git bisect good 4d039269f20b4a151b4055f214c1ef83b069e981 # 03:14 22+ 19 Merge 'linux-review/Ben-Gardner/eeprom-at24-use-device_property_-functions-instead-of-of_get_property/20170210-060909' into devel-spot-201702101513
git bisect bad 93cfd48226219b636fa1b7ea0f5570c69a529e8a # 03:57 0- 4 Merge 'linux-review/Philippe-Reynes/net-micrel-ksz884x-use-new-api-ethtool_-get-set-_link_ksettings/20170210-052528' into devel-spot-201702101513
git bisect good a6859423fcceb31c4f420a0e7d992c0969d94991 # 04:15 21+ 22 Merge 'linux-review/Avraham-Shukron/staging-omap4iss-fix-multiline-comment-style/20170210-050947' into devel-spot-201702101513
git bisect bad 5a4dc79a5a99841901be08ff8ce58f3ab70426b6 # 04:27 0- 5 Merge 'linux-review/Borislav-Petkov/x86-Optimize-clear_page/20170210-053052' into devel-spot-201702101513
git bisect bad 0ad07c8104eb5c12dfcb86581c1cc657183496cc # 04:50 0- 1 x86: Optimize clear_page()
# first bad commit: [0ad07c8104eb5c12dfcb86581c1cc657183496cc] x86: Optimize clear_page()
git bisect good 10b9dd56860e93f11cd352e8c75a33357b80b70b # 05:29 66+ 46 Merge tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs
# extra tests with CONFIG_DEBUG_INFO_REDUCED
git bisect bad 0ad07c8104eb5c12dfcb86581c1cc657183496cc # 05:42 0- 32 x86: Optimize clear_page()
# extra tests on HEAD of linux-devel/devel-spot-201702101513
git bisect bad 00667aaf171632d226c7c7fca267a079f73b9931 # 05:42 0- 21 0day head guard for 'devel-spot-201702101513'
# extra tests on tree/branch linux-review/Borislav-Petkov/x86-Optimize-clear_page/20170210-053052
git bisect bad 0ad07c8104eb5c12dfcb86581c1cc657183496cc # 05:44 0- 26 x86: Optimize clear_page()
# extra tests with first bad commit reverted
git bisect good d9d04c9a587a9963e6df6b6b6622cd0eaa0a6d2a # 06:25 66+ 35 Revert "x86: Optimize clear_page()"
# extra tests on tree/branch linus/master
git bisect good 3d88460dbd285e7f32437b530d5bb7cb916142fa # 06:55 64+ 46 Merge tag 'drm-fixes-for-v4.10-rc8' of git://people.freedesktop.org/~airlied/linux
# extra tests on tree/branch linux-next/master
git bisect good 632571b1bee00494aef749512d9f3290dfba0ead # 07:10 63+ 38 Add linux-next specific files for 20170210
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/lkp Intel Corporation