Hi Matt,
FYI, this may be the result you expected from the debug patch.
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/debug
commit 4aa9e831f4b4662b26f4bebc1f3d3be30179d12a
Author: Matt Fleming <matt(a)codeblueprint.co.uk>
AuthorDate: Wed Sep 21 14:38:13 2016 +0100
Commit: Peter Zijlstra <peterz(a)infradead.org>
CommitDate: Fri Dec 16 21:24:39 2016 +0100
sched/core: Add debug code to catch missing update_rq_clock()
There are no diagnostic checks for figuring out when we've accidentally
missed update_rq_clock() calls. Let's add some by piggybacking on the
rq_*pin_lock() wrappers.
The idea behind the diagnostic checks is that upon pinning rq lock the
rq clock should be updated, via update_rq_clock(), before anybody
reads the clock with rq_clock() or rq_clock_task().
The exception to this rule is when updates have explicitly been
disabled with the rq_clock_skip_update() optimisation.
There are some functions that only unpin the rq lock in order to grab
some other lock and avoid deadlock. In that case we don't need to
update the clock again and the previous diagnostic state can be
carried over in rq_repin_lock() by saving the state in the rq_flags
context.
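The pin/update/read rule described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel code: the numeric flag values, the RQCF_REQ_SKIP and RQCF_UPDATED names, and every model_* helper and *_model struct are hypothetical and chosen for illustration. Only RQCF_ACT_SKIP and the "rq->clock_update_flags < RQCF_ACT_SKIP" test appear in this report.

```c
#include <stdbool.h>

/* Assumed flag layout for this sketch only. */
#define RQCF_REQ_SKIP 0x01u  /* a skip of the next clock update was requested */
#define RQCF_ACT_SKIP 0x02u  /* the skip is active */
#define RQCF_UPDATED  0x04u  /* the clock was updated since the lock was pinned */

struct rq_model { unsigned int clock_update_flags; unsigned long long clock; };
struct rq_flags_model { unsigned int clock_update_flags; };

/* Pinning forgets any earlier "updated" state but keeps the skip bits. */
static void model_pin(struct rq_model *rq, struct rq_flags_model *rf)
{
    rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
    rf->clock_update_flags = 0;
}

/* Stand-in for update_rq_clock(): record that the clock is fresh. */
static void model_update_clock(struct rq_model *rq, unsigned long long now)
{
    rq->clock = now;
    rq->clock_update_flags |= RQCF_UPDATED;
}

/* The diagnostic: a read is fine if the clock was updated or a skip is active. */
static bool model_clock_read_ok(const struct rq_model *rq)
{
    return rq->clock_update_flags >= RQCF_ACT_SKIP;
}

/* Unpinning saves the "updated" state in the caller's context... */
static void model_unpin(struct rq_model *rq, struct rq_flags_model *rf)
{
    if (rq->clock_update_flags > RQCF_ACT_SKIP)
        rf->clock_update_flags = RQCF_UPDATED;
}

/* ...so that repinning can carry it over without another clock update. */
static void model_repin(struct rq_model *rq, struct rq_flags_model *rf)
{
    rq->clock_update_flags |= rf->clock_update_flags;
}
```

In this model a read straight after model_pin() fails the check, which is the situation the warning in this report flags, while a read after model_update_clock() or after model_repin() with saved state passes.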
Since this patch adds a new clock update flag and some already exist
in rq::clock_skip_update, that field has now been renamed. An attempt
has been made to keep the flag manipulation code small and fast since
it's used in the heart of the __schedule() fast path.
For the !CONFIG_SCHED_DEBUG case the only object code change (other
than addresses) is the following change to reset RQCF_ACT_SKIP inside
of __schedule(),
- c7 83 38 09 00 00 00 movl $0x0,0x938(%rbx)
- 00 00 00
+ 83 a3 38 09 00 00 fc andl $0xfffffffc,0x938(%rbx)
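The immediate in the new instruction, 0xfffffffc, is a 32-bit mask with only the two lowest bits cleared, so the field is no longer zeroed outright and any higher flag bits survive. A small C sketch of that arithmetic follows, assuming the two low bits are the skip flags; the flag values and the RQCF_REQ_SKIP name are assumptions here, and clear_skip_flags() is a hypothetical helper rather than a kernel function.

```c
#include <stdint.h>

/* Assumed flag layout for this sketch only. */
#define RQCF_REQ_SKIP 0x01u
#define RQCF_ACT_SKIP 0x02u

/* Clear both skip bits and leave every other bit untouched. */
static uint32_t clear_skip_flags(uint32_t flags)
{
    /* ~(0x01 | 0x02) is 0xfffffffc when truncated to 32 bits. */
    return flags & ~(RQCF_REQ_SKIP | RQCF_ACT_SKIP);
}
```

Under these assumptions, the old movl $0x0 would also have wiped a flag stored in bit 2 or above, whereas the andl form preserves it.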
Cc: Yuyang Du <yuyang.du(a)intel.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Luca Abeni <luca.abeni(a)unitn.it>
Cc: Wanpeng Li <wanpeng.li(a)hotmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: Byungchul Park <byungchul.park(a)lge.com>
Cc: Frederic Weisbecker <fweisbec(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com>
Suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Matt Fleming <matt(a)codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.uk
+-------------------------------------------------------+------------+------------+------------+
|                                                       | 527bb8647b | 4aa9e831f4 | abdf6489a5 |
+-------------------------------------------------------+------------+------------+------------+
| boot_successes                                        | 66         | 0          | 0          |
| boot_failures                                         | 0          | 22         | 13         |
| WARNING:at_kernel/sched/sched.h:#attach_entity_cfs_rq | 0          | 22         | 13         |
+-------------------------------------------------------+------------+------------+------------+
[ 0.658154] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.660410] Freeing SMP alternatives memory: 16K
[ 0.663773] ------------[ cut here ]------------
[ 0.664276] WARNING: CPU: 0 PID: 0 at kernel/sched/sched.h:804 attach_entity_cfs_rq+0xd31/0x11b0
[ 0.665396] rq->clock_update_flags < RQCF_ACT_SKIP
[ 0.665854] Modules linked in:
[ 0.666201] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-02686-g4aa9e83 #1
[ 0.666897] Call Trace:
[ 0.667179] dump_stack+0x83/0xb3
[ 0.667568] ? attach_entity_cfs_rq+0xd31/0x11b0
[ 0.668034] __warn+0x13d/0x160
git bisect start abdf6489a5f073e70c6fa6d6bf420eb579e08381 69973b830859bc6529a7a0468ba0d80ee5117826 --
git bisect bad 0afefecb9930e9e849a9d98b83a82eb248f0cfe7 # 17:50 0- 5 Merge 'jpirko-mlxsw/combined_queue' into devel-spot-201612171441
git bisect good ca7d27b9a6bbafade95bcf24fdf4741e3c30e50b # 18:03 22+ 0 Merge 'linux-review/Sudip-Mukherjee/usb-atm-cxacru-remove-impossible-condition/20161217-081754' into devel-spot-201612171441
git bisect good 64f3ae7e6a84f5c315b786e2ecffccc82a3e6a38 # 18:12 22+ 0 Merge 'miklos-vfs/next' into devel-spot-201612171441
git bisect bad ade1719dd918143d65cb2b22ebd9610d1a2f7a72 # 18:20 0- 1 Merge 'peterz-queue/sched/debug' into devel-spot-201612171441
git bisect good 597fa42b6a80f7179dfdb13c7a9231efbbc25baf # 18:29 22+ 0 Merge 'linux-review/Radim-Kr-m/KVM-x86-minor-irqchip-improvements-API-change/20161217-043342' into devel-spot-201612171441
git bisect good beb3aa7a2adc1714ecead2aff44d28dc2accdca8 # 18:39 21+ 0 Merge 'tile/master' into devel-spot-201612171441
git bisect good e9d28f9d9f7c8fcf58e0d6a631689d53fe003abb # 18:51 22+ 0 Merge 'peterz-queue/master' into devel-spot-201612171441
git bisect good f90df7c568ab27b934952ea6b840fbec6580e024 # 19:03 22+ 0 sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
git bisect bad 4aa9e831f4b4662b26f4bebc1f3d3be30179d12a # 19:11 0- 7 sched/core: Add debug code to catch missing update_rq_clock()
git bisect good 527bb8647b2c15275495474cb8b5db4533ca3bff # 19:21 20+ 0 sched/fair: Push rq lock pin/unpin into idle_balance()
# first bad commit: [4aa9e831f4b4662b26f4bebc1f3d3be30179d12a] sched/core: Add debug code to catch missing update_rq_clock()
git bisect good 527bb8647b2c15275495474cb8b5db4533ca3bff # 19:24 66+ 0 sched/fair: Push rq lock pin/unpin into idle_balance()
# extra tests with CONFIG_DEBUG_INFO_REDUCED
git bisect bad 4aa9e831f4b4662b26f4bebc1f3d3be30179d12a # 19:35 0- 8 sched/core: Add debug code to catch missing update_rq_clock()
# extra tests on HEAD of linux-devel/devel-spot-201612171441
git bisect bad abdf6489a5f073e70c6fa6d6bf420eb579e08381 # 19:36 0- 13 0day head guard for 'devel-spot-201612171441'
# extra tests on tree/branch peterz-queue/sched/debug
git bisect good 597f6fc362a11b362050939a734b954fcb285795 # 19:47 66+ 0 sched: Avoid double update_rq_clock()
# extra tests on tree/branch linus/master
git bisect good 59331c215daf600a650e281b6e8ef3e1ed1174c2 # 19:57 66+ 0 Merge tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client
# extra tests on tree/branch linux-next/master
git bisect good bf579a3afa46c74b7e89930974ba119d4c76bab2 # 20:07 66+ 0 Add linux-next specific files for 20161216
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/lkp Intel Corporation