[do_execve] attempted to set unsupported pgprot
by Fengguang Wu
Hello,
FYI this happens in mainline kernel 4.17.0-rc1.
It looks like a new regression.
It occurs in 4 out of 4 boots.
[ 12.345562] Write protecting the kernel text: 14376k
[ 12.346649] Write protecting the kernel read-only data: 4740k
[ 12.347584] rodata_test: all tests were successful
[ 12.348499] ------------[ cut here ]------------
[ 12.349193] attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff
[ 12.350792] WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/pgtable.h:540 handle_mm_fault+0xfc1/0xfe0:
check_pgprot at arch/x86/include/asm/pgtable.h:535
(inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549
(inlined by) do_anonymous_page at mm/memory.c:3169
(inlined by) handle_pte_fault at mm/memory.c:3961
(inlined by) __handle_mm_fault at mm/memory.c:4087
(inlined by) handle_mm_fault at mm/memory.c:4124
[ 12.352294] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #172
[ 12.353357] EIP: handle_mm_fault+0xfc1/0xfe0:
check_pgprot at arch/x86/include/asm/pgtable.h:535
(inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549
(inlined by) do_anonymous_page at mm/memory.c:3169
(inlined by) handle_pte_fault at mm/memory.c:3961
(inlined by) __handle_mm_fault at mm/memory.c:4087
(inlined by) handle_mm_fault at mm/memory.c:4124
[ 12.354047] EFLAGS: 00210296 CPU: 0
[ 12.354581] EAX: 00000068 EBX: 80000000 ECX: 00000002 EDX: 00000038
[ 12.362768] ESI: 00000025 EDI: cc679000 EBP: cf02de70 ESP: cf02ddf0
[ 12.363657] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 12.364458] CR0: 80050033 CR2: c1e1e110 CR3: 024a6000 CR4: 000006b0
[ 12.365413] Call Trace:
[ 12.365885] ? touch_atime+0x5e/0xa0:
touch_atime at include/linux/fs.h:1502
[ 12.366451] __get_user_pages+0x104/0x3a0:
faultin_page at mm/gup.c:503
(inlined by) __get_user_pages at mm/gup.c:699
[ 12.374172] get_user_pages_remote+0xf3/0x1c0:
__get_user_pages_locked at mm/gup.c:872
(inlined by) get_user_pages_remote at mm/gup.c:1062
[ 12.374959] copy_strings+0x12d/0x380:
get_arg_page at fs/exec.c:218
(inlined by) copy_strings at fs/exec.c:557
[ 12.375478] copy_strings_kernel+0x26/0x40:
set_fs at arch/x86/include/asm/uaccess.h:32
(inlined by) copy_strings_kernel at fs/exec.c:603
[ 12.376087] do_execveat_common+0x486/0x6a0:
do_execveat_common at fs/exec.c:1803
[ 12.376718] ? rest_init+0x100/0x100:
kernel_init at init/main.c:1050
[ 12.377275] do_execve+0x14/0x20:
do_execve at fs/exec.c:1863
[ 12.377769] run_init_process+0x1c/0x20:
run_init_process at init/main.c:1006
[ 12.378364] kernel_init+0x46/0x100:
kernel_init at init/main.c:1067
[ 12.378876] ret_from_fork+0x2e/0x40:
ret_from_fork at arch/x86/entry/entry_32.S:311
[ 12.379408] Code: ff c6 05 55 fc 3b c2 01 52 50 8b 5d b0 89 4d a0 31 f3 89 d8 8b 5d b4 31 cb 53 50 ff 75 b4 ff 75 b0 68 18 b0 15 c2 e8 bf cc f0 ff <0f> 0b 8b 4d a0 83 c4 1c 89 75 b0 89 4d b4 e9 ee fa ff ff be 02
[ 12.382173] ---[ end trace 0276fe2191187186 ]---
[ 12.383485] ------------[ cut here ]------------
[ 12.383485] ------------[ cut here ]------------
[ 12.384245] attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff
[ 12.391613] WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/pgtable.h:540 change_protection_range+0x721/0x8a0:
check_pgprot at arch/x86/include/asm/pgtable.h:535
(inlined by) pte_modify at arch/x86/include/asm/pgtable.h:573
(inlined by) change_pte_range at mm/mprotect.c:114
(inlined by) change_pmd_range at mm/mprotect.c:210
(inlined by) change_pud_range at mm/mprotect.c:238
(inlined by) change_p4d_range at mm/mprotect.c:258
(inlined by) change_protection_range at mm/mprotect.c:283
[ 12.393324] CPU: 0 PID: 1 Comm: init Tainted: G W 4.17.0-rc1 #172
[ 12.394400] EIP: change_protection_range+0x721/0x8a0:
check_pgprot at arch/x86/include/asm/pgtable.h:535
(inlined by) pte_modify at arch/x86/include/asm/pgtable.h:573
(inlined by) change_pte_range at mm/mprotect.c:114
(inlined by) change_pmd_range at mm/mprotect.c:210
(inlined by) change_pud_range at mm/mprotect.c:238
(inlined by) change_p4d_range at mm/mprotect.c:258
(inlined by) change_protection_range at mm/mprotect.c:283
[ 12.395106] EFLAGS: 00010292 CPU: 0
[ 12.395607] EAX: 00000068 EBX: 00000000 ECX: 00000003 EDX: 00000054
[ 12.396527] ESI: b7f40000 EDI: cc67ea00 EBP: cf02df10 ESP: cf02de60
[ 12.397472] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 12.398286] CR0: 80050033 CR2: b7f40dc4 CR3: 0c698000 CR4: 000006b0
[ 12.399207] Call Trace:
[ 12.399582] mprotect_fixup+0xba/0x240:
mprotect_fixup at mm/mprotect.c:389
[ 12.400109] sys_mprotect+0x13d/0x220:
do_mprotect_pkey at mm/mprotect.c:507
(inlined by) __do_sys_mprotect at mm/mprotect.c:531
(inlined by) __se_sys_mprotect at mm/mprotect.c:528
[ 12.400637] do_int80_syscall_32+0x49/0x140:
do_syscall_32_irqs_on at arch/x86/entry/common.c:331
(inlined by) do_int80_syscall_32 at arch/x86/entry/common.c:346
[ 12.401250] entry_INT80_32+0x31/0x31:
restore_all at arch/x86/entry/entry_32.S:551
[ 12.401802] EIP: 0xb7f858fd
[ 12.402237] EFLAGS: 00000206 CPU: 0
[ 12.402761] EAX: ffffffda EBX: b7f40000 ECX: 00001000 EDX: 00000001
[ 12.403681] ESI: b7f7d8b0 EDI: b7f7d8b0 EBP: bff7cd94 ESP: bff7cd88
[ 12.404633] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 12.405416] Code: 89 9d 74 ff ff ff 8b 45 88 89 85 78 ff ff ff ff b5 78 ff ff ff ff b5 74 ff ff ff ff 75 0c ff 75 08 68 18 b0 15 c2 e8 9f 48 f0 ff <0f> 0b 83 c4 1c 8b 85 70 ff ff ff 8b 95 6c ff ff ff e9 27 fe ff
[ 12.408185] ---[ end trace 0276fe2191187187 ]---
[ 12.422790] init: Console is alive
Attached the full dmesg, kconfig and reproduce scripts.
Thanks,
Fengguang
4 years, 2 months
BUG: Bad page map in process python2 pte:10000000000 pmd:17e8be067
by Fengguang Wu
Hello,
FYI this happens in mainline kernel and dates back at least to
v4.16-rc5 . It's rather rare and beyond bisect.
[ 716.494065] PASS concurrent_autogo_5ghz_ht40 4.803608 2018-03-23 09:57:21.586794
[ 716.494069]
[ 716.496923] passed all 1 test case(s)
[ 716.496926]
[ 716.511702] swap_info_get: Bad swap file entry 04000000
[ 716.512731] BUG: Bad page map in process python2 pte:10000000000 pmd:17e8be067
[ 716.513844] addr:00000000860ba23b vm_flags:00000070 anon_vma: (null) mapping:000000004c76fece index:1e2
[ 716.515160] file:libpcre.so.3.13.3 fault:filemap_fault mmap:generic_file_mmap readpage:simple_readpage
[ 716.516418] CPU: 2 PID: 8907 Comm: python2 Not tainted 4.16.0-rc5 #1
[ 716.517533] Hardware name: /DH67GD, BIOS BLH6710H.86A.0132.2011.1007.1505 10/07/2011
[ 716.518717] Call Trace:
[ 716.519647] dump_stack+0x5c/0x7c:
dump_stack at lib/dump_stack.c:55
[ 716.520619] print_bad_pte+0x1de/0x290:
print_bad_pte at mm/memory.c:776 (discriminator 12)
[ 716.521609] ? __swap_info_get+0x8e/0xa0:
__swap_info_get at mm/swapfile.c:1063
[ 716.522596] unmap_page_range+0x803/0xa20:
zap_pte_range at mm/memory.c:1384
(inlined by) zap_pmd_range at mm/memory.c:1441
(inlined by) zap_pud_range at mm/memory.c:1470
(inlined by) zap_p4d_range at mm/memory.c:1491
(inlined by) unmap_page_range at mm/memory.c:1512
[ 716.523578] unmap_vmas+0x4c/0xa0:
unmap_vmas at mm/memory.c:1586 (discriminator 3)
[ 716.524517] exit_mmap+0x82/0x150:
constant_test_bit at arch/x86/include/asm/bitops.h:328
(inlined by) mm_is_oom_victim at include/linux/oom.h:75
(inlined by) exit_mmap at mm/mmap.c:3022
[ 716.525447] mmput+0x67/0x160:
__mmput at kernel/fork.c:961
(inlined by) mmput at kernel/fork.c:981
[ 716.526354] do_exit+0x2a5/0xb80:
constant_test_bit at arch/x86/include/asm/bitops.h:328
(inlined by) test_ti_thread_flag at include/linux/thread_info.h:79
(inlined by) exit_mm at kernel/exit.c:545
(inlined by) do_exit at kernel/exit.c:852
[ 716.527278] ? __do_page_fault+0x27e/0x4f0:
__do_page_fault at arch/x86/mm/fault.c:1449
[ 716.528250] do_group_exit+0x3a/0xa0:
__read_once_size at include/linux/compiler.h:188
(inlined by) list_empty at include/linux/list.h:203
(inlined by) thread_group_empty at include/linux/sched/signal.h:594
(inlined by) do_group_exit at kernel/exit.c:953
[ 716.529199] SyS_exit_group+0x10/0x10
[ 716.530143] do_syscall_64+0x74/0x190:
do_syscall_64 at arch/x86/entry/common.c:287
[ 716.531089] entry_SYSCALL_64_after_hwframe+0x3d/0xa2:
entry_SYSCALL_64_after_hwframe at arch/x86/entry/entry_64.S:246
[ 716.532116] RIP: 0033:0x7fde0df8fa38
[ 716.533070] RSP: 002b:00007ffccf97ba28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 716.534235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fde0df8fa38
[ 716.535373] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 716.536522] RBP: 00007fde0e26e8e0 R08: 00000000000000e7 R09: ffffffffffffff98
[ 716.537663] R10: 00007fde0816f178 R11: 0000000000000246 R12: 00007fde0e26e8e0
[ 716.538799] R13: 00007fde0e273c20 R14: 0000000000000000 R15: 0000000000000000
[ 716.539948] Disabling lock debugging due to kernel taint
[ 716.548523] 2018-03-23 09:57:21 ./run-tests.py concurrent_autogo_crossconnect
[ 716.548526]
Attached the full dmesg and kconfig.
Thanks,
Fengguang
4 years, 2 months
[i915] WARNING: CPU: 2 PID: 183 at drivers/gpu/drm/i915/intel_display.c:14584 intel_modeset_init+0x3be/0x1060
by Fengguang Wu
Hello,
FYI this happens in mainline kernel and at least dates back to v4.13 .
[ 75.245840]
[ 75.247783] /opt/deb/gawk_1%3a4.1.4+dfsg-1_amd64.deb
[ 75.247785]
[ 75.248145] ------------[ cut here ]------------
[ 75.248446] Could not determine valid watermarks for inherited state
[ 75.248889] WARNING: CPU: 2 PID: 183 at drivers/gpu/drm/i915/intel_display.c:14584 intel_modeset_init+0x3be/0x1060 [i915]
[ 75.249372] /opt/deb/sysstat_11.6.0-1_amd64.deb
[ 75.249617] Modules linked in:
[ 75.249619]
[ 75.249883] crc32c_intel(+) ghash_clmulni_intel pcbc wmi_bmof ahci libahci snd_hda_intel i915(+) uvcvideo videobuf2_vmalloc aesni_intel crypto_simd cryptd videobuf2_memops glue_helper snd_hda_codec videobuf2_v4l2 pcspkr videobuf2_common serio_raw drm_kms_helper snd_hda_core snd_hwdep libata snd_pcm syscopyarea sysfillrect sysimgblt fb_sys_fops bcma snd_timer snd soundcore drm videodev shpchp ideapad_laptop wmi sparse_keymap rfkill video ip_tables
[ 75.252346] CPU: 2 PID: 183 Comm: systemd-udevd Not tainted 4.17.0-rc1 #1
[ 75.252718] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
[ 75.253140] (Reading database ... 2202 files and directories currently installed.)
[ 75.253229] RIP: 0010:intel_modeset_init+0x3be/0x1060 [i915]
[ 75.253230]
[ 75.254044] RSP: 0018:ffffc900009f7af0 EFLAGS: 00010282
[ 75.254339] RAX: 0000000000000038 RBX: ffff88011bfc0000 RCX: 0000000000000006
[ 75.254731] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff88011f296910
[ 75.255121] RBP: ffff88011933e400 R08: 0000000000000363 R09: 000000000000000a
[ 75.255512] R10: ffffc900009f7988 R11: ffffffff82d4e10d R12: ffff88011933fc00
[ 75.255898] R13: 00000000ffffffea R14: 0000000000000000 R15: ffff88011bfc0358
[ 75.256290] FS: 00007f0b39a7d8c0(0000) GS:ffff88011f280000(0000) knlGS:0000000000000000
[ 75.256731] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.257050] CR2: 00007f599dcec750 CR3: 000000011bdcc004 CR4: 00000000001606e0
[ 75.257111] Preparing to unpack .../opt/deb/debconf_1.5.66_all.deb ...
[ 75.257449] Call Trace:
[ 75.257450]
[ 75.258115] i915_driver_load+0xa99/0xee0 [i915]
[ 75.258388] ? acpi_dev_found+0x5f/0x70:
acpi_dev_found at drivers/acpi/utils.c:737
[ 75.258616] local_pci_probe+0x42/0xa0:
local_pci_probe at drivers/pci/pci-driver.c:304
[ 75.258835] ? _cond_resched+0x19/0x30:
_cond_resched at kernel/sched/core.c:4982
[ 75.259055] pci_device_probe+0x11d/0x180:
pci_call_probe at drivers/pci/pci-driver.c:358
(inlined by) __pci_device_probe at drivers/pci/pci-driver.c:383
(inlined by) pci_device_probe at drivers/pci/pci-driver.c:423
[ 75.259287] driver_probe_device+0x30b/0x480:
really_probe at drivers/base/dd.c:449
(inlined by) driver_probe_device at drivers/base/dd.c:590
[ 75.259531] __driver_attach+0xb8/0xe0:
__driver_attach at drivers/base/dd.c:824
[ 75.259661] Unpacking debconf (1.5.66) over (1.5.59) ...
[ 75.259748] ? driver_probe_device+0x480/0x480:
__driver_attach at drivers/base/dd.c:794
[ 75.259749]
[ 75.260394] bus_for_each_dev+0x65/0x90:
bus_for_each_dev at drivers/base/bus.c:310
[ 75.260624] bus_add_driver+0x161/0x260:
bus_add_driver at drivers/base/bus.c:668
[ 75.260849] ? 0xffffffffa0553000
[ 75.261042] driver_register+0x57/0xc0:
driver_register at drivers/base/driver.c:167
[ 75.261259] ? 0xffffffffa0553000
[ 75.261451] do_one_initcall+0x36/0x1bb:
do_one_initcall at init/main.c:883
[ 75.261676] ? _cond_resched+0x19/0x30:
_cond_resched at kernel/sched/core.c:4982
[ 75.261898] ? kmem_cache_alloc_trace+0x3e/0x1e0:
slab_pre_alloc_hook at mm/slab.h:423
(inlined by) slab_alloc_node at mm/slub.c:2667
(inlined by) slab_alloc at mm/slub.c:2749
(inlined by) kmem_cache_alloc_trace at mm/slub.c:2766
[ 75.262166] do_init_module+0x5b/0x1f8:
do_init_module at kernel/module.c:3459
[ 75.262379] load_module+0x1465/0x1b50:
load_module at kernel/module.c:3782
[ 75.262598] ? kernel_read_file+0x165/0x180:
kernel_read_file at fs/exec.c:946
[ 75.262838] ? __do_sys_finit_module+0xb7/0xd0:
__do_sys_finit_module at kernel/module.c:3876
[ 75.263093] __do_sys_finit_module+0xb7/0xd0:
__do_sys_finit_module at kernel/module.c:3876
[ 75.263338] do_syscall_64+0x5b/0x180:
do_syscall_64 at arch/x86/entry/common.c:287
[ 75.263554] entry_SYSCALL_64_after_hwframe+0x44/0xa9:
entry_SYSCALL_64_after_hwframe at arch/x86/entry/entry_64.S:247
[ 75.263843] RIP: 0033:0x7f0b388f8d49
[ 75.264046] RSP: 002b:00007ffe26652b48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 75.264473] RAX: ffffffffffffffda RBX: 000055c9b781efa0 RCX: 00007f0b388f8d49
[ 75.264871] RDX: 0000000000000000 RSI: 00007f0b392142d5 RDI: 0000000000000014
[ 75.265268] RBP: 00007f0b392142d5 R08: 0000000000000000 R09: 00007ffe266530c0
[ 75.265672] R10: 0000000000000014 R11: 0000000000000246 R12: 0000000000000000
[ 75.266074] R13: 000055c9b7833400 R14: 0000000000020000 R15: 000055c9b64d95b3
[ 75.266469] Code: 00 f7 c6 00 00 18 00 0f 84 d7 0a 00 00 85 c9 0f 94 c1 41 88 8c 24 8d 02 00 00 e9 f6 08 00 00 48 c7 c7 60 c5 4b a0 e8 b2 54 c4 e0 <0f> 0b f0 41 ff 0c 24 0f 88 7d 8e 04 00 75 08 4c 89 e7 e8 8b 89
[ 75.266657] Preparing to unpack .../libtext-charwidth-perl_0.04-7.1_amd64.deb ...
[ 75.267544] ---[ end trace 9612bbbc31991c71 ]---
[ 75.268238]
Attached the full dmesg, kconfig and reproduce scripts.
Thanks,
Fengguang
4 years, 2 months
[cfs_trace_lock_tcd] BUG: unable to handle kernel NULL pointer dereference at 00000050
by Fengguang Wu
Hello,
FYI this happens in mainline kernel 4.17.0-rc1.
It looks like a new regression.
It occurs in 5 out of 5 boots.
[ 6.524361] ledtrig-cpu: registered to indicate activity on CPUs
[ 6.527658] NET: Registered protocol family 4
[ 6.528191] comedi: version 0.7.76 - http://www.comedi.org
[ 6.528851] LNetError: 1:0:(module.c:546:libcfs_init()) misc_register: error -16
[ 7.220272] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[ 7.586283] BUG: unable to handle kernel NULL pointer dereference at 00000050
[ 7.586962] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
[ 7.587002] Oops: 0000 [#1] PREEMPT
[ 7.587002] CPU: 0 PID: 1 Comm: swapper Not tainted 4.17.0-rc1 #1
[ 7.587002] EIP: cfs_trace_lock_tcd+0xb/0xa0:
cfs_trace_lock_tcd at drivers/staging/lustre/lnet/libcfs/linux/linux-tracefile.c:149
[ 7.587002] EFLAGS: 00210246 CPU: 0
[ 7.587002] EAX: 00000000 EBX: 00000000 ECX: 81fcb588 EDX: 00000000
[ 7.587002] ESI: 00001800 EDI: 8f5d1e08 EBP: 8f5d1d7c ESP: 8f5d1d70
[ 7.587002] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 7.587002] CR0: 80050033 CR2: 00000050 CR3: 022f0000 CR4: 000006b0
[ 7.587002] Call Trace:
[ 7.587002] libcfs_debug_vmsg2+0x8f/0x82f:
libcfs_debug_vmsg2 at drivers/staging/lustre/lnet/libcfs/tracefile.c:317
[ 7.587002] ? trace_irq_enable_rcuidle+0x25/0x62:
static_key_false at include/linux/jump_label.h:206
(inlined by) trace_irq_enable_rcuidle at include/trace/events/preemptirq.h:40
[ 7.587002] ? slob_free+0x249/0x251:
slob_free at mm/slob.c:421
[ 7.587002] libcfs_debug_msg+0x19/0x1b:
libcfs_debug_msg at drivers/staging/lustre/lnet/libcfs/tracefile.c:287
[ 7.587002] ksocknal_startup+0xe77/0x12b2:
ksocknal_startup at drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c:2845 (discriminator 3)
[ 7.587002] ? lock_release+0x135/0x1ec:
lock_release at kernel/locking/lockdep.c:3942
[ 7.587002] ? _raw_spin_unlock+0x3c/0x4b:
__raw_spin_unlock at include/linux/spinlock_api_smp.h:152
(inlined by) _raw_spin_unlock at kernel/locking/spinlock.c:176
[ 7.587002] lnet_startup_lndni+0x4cd/0x9ec:
lnet_startup_lndni at drivers/staging/lustre/lnet/lnet/api-ni.c:1304
[ 7.587002] LNetNIInit+0x880/0xa00:
lnet_startup_lndnis at drivers/staging/lustre/lnet/lnet/api-ni.c:1385
(inlined by) LNetNIInit at drivers/staging/lustre/lnet/lnet/api-ni.c:1543
[ 7.587002] ? read_seqcount_retry+0x1b/0x22:
read_seqcount_retry at include/linux/seqlock.h:222
read_seqcount_retry+0x1b/0x22:
read_seqcount_retry at include/linux/seqlock.h:222
read_seqcount_retry+0x1b/0x22:
read_seqcount_retry at include/linux/seqlock.h:222
[ 7.587002] srpc_startup+0x84/0x381:
srpc_startup at drivers/staging/lustre/lnet/selftest/rpc.c:1613
[ 7.587002] lnet_selftest_init+0x2c4/0x5d9:
lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:134
[ 7.587002] ? lnet_selftest_exit+0x8d/0x8d:
lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:90
[ 7.587002] do_one_initcall+0x76/0x1d7:
__read_once_size at include/linux/compiler.h:188
(inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
(inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
(inlined by) static_key_count at include/linux/jump_label.h:194
(inlined by) static_key_false at include/linux/jump_label.h:206
(inlined by) trace_initcall_finish at include/trace/events/initcall.h:44
(inlined by) do_one_initcall at init/main.c:884
[ 7.587002] ? do_early_param+0x78/0x78:
repair_env_string at init/main.c:250
[ 7.587002] ? rcu_read_unlock_sched_notrace+0x1e/0x2d:
rcu_read_unlock_sched_notrace at include/linux/rcupdate.h:778
rcu_read_unlock_sched_notrace+0x1e/0x2d:
rcu_read_unlock_sched_notrace at include/linux/rcupdate.h:778
rcu_read_unlock_sched_notrace+0x1e/0x2d:
rcu_read_unlock_sched_notrace at include/linux/rcupdate.h:778
rcu_read_unlock_sched_notrace+0x1e/0x2d:
rcu_read_unlock_sched_notrace at include/linux/rcupdate.h:778
rcu_read_unlock_sched_notrace+0x1e/0x2d:
rcu_read_unlock_sched_notrace at include/linux/rcupdate.h:778
rcu_read_unlock_sched_notrace+0x1e/0x2d:
rcu_read_unlock_sched_notrace at include/linux/rcupdate.h:778
rcu_read_unlock_sched_notrace+0x1e/0x2d:
rcu_read_unlock_sched_notrace at include/linux/rcupdate.h:778
[ 7.587002] ? trace_initcall_level+0x3e/0x64:
trace_initcall_level at include/trace/events/initcall.h:10 (discriminator 18)
[ 7.587002] kernel_init_freeable+0xfc/0x177:
do_initcall_level at init/main.c:951
(inlined by) do_initcalls at init/main.c:959
(inlined by) do_basic_setup at init/main.c:977
(inlined by) kernel_init_freeable at init/main.c:1127
[ 7.587002] ? rest_init+0xbe/0xbe:
kernel_init at init/main.c:1050
[ 7.587002] kernel_init+0xd/0xd0:
kernel_init at init/main.c:1055
[ 7.587002] ret_from_fork+0x2e/0x38:
ret_from_fork at arch/x86/entry/entry_32.S:311
[ 7.587002] Code: 16 94 ff 8b 15 68 ab 03 82 f7 c2 00 00 0f 00 b8 02 00 00 00 75 08 31 c0 80 e6 ff 0f 95 c0 5d c3 55 89 e5 57 56 53 e8 99 16 94 ff <8b> 78 50 89 d6 6a 00 31 d2 89 c3 b8 50 f0 1b 82 66 83 ff 02 0f
[ 7.587002] EIP: cfs_trace_lock_tcd+0xb/0xa0:
cfs_trace_lock_tcd at drivers/staging/lustre/lnet/libcfs/linux/linux-tracefile.c:149 SS:ESP: 0068:8f5d1d70
[ 7.587002] CR2: 0000000000000050
[ 7.587002] ---[ end trace 4fbdcb515c61525e ]---
[ 7.587002] Kernel panic - not syncing: Fatal exception
Attached the full dmesg, kconfig and reproduce scripts.
Thanks,
Fengguang
4 years, 2 months
[console_unlock] BUG: KASAN: use-after-scope in console_unlock+0x9cd/0xd10
by Fengguang Wu
Hello,
FYI this happens in mainline kernel 4.17.0-rc1.
It at least dates back to v4.15-rc1 .
The regression was reported before
https://lkml.org/lkml/2017/11/30/33
Where the last message from Dmitry mentions that use-after-scope has
known false positives with CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y
If so, what would be the best way to workaround such false positives
in boot testing? Disable the above config?
0day bisects produce diverged results, with 2 of them converge to
commit d17a1d97dc ("x86/mm/kasan: don't use vmemmap_populate() to
initialize shadow") and 1 bisected to the earlier a4a3ede213 ("mm:
zero reserved and unavailable struct pages"). I'll send the bisect
reports in follow up emails.
This occurs in 6 out of 6 boots.
[ 0.001000] RCU CPU stall warnings timeout set to 100 (rcu_cpu_stall_timeout).
[ 0.001000] Tasks RCU enabled.
[ 0.001000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.001000] NR_IRQS: 4352, nr_irqs: 48, preallocated irqs: 16
[ 0.001000] ==================================================================
[ 0.001000] BUG: KASAN: use-after-scope in console_unlock+0x9cd/0xd10:
console_unlock at kernel/printk/printk.c:2396
[ 0.001000] Write of size 1 at addr ffffffff84c07998 by task swapper/0
[ 0.001000]
[ 0.001000] CPU: 0 PID: 0 Comm: swapper Tainted: G T 4.17.0-rc1 #1
[ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 0.001000] Call Trace:
[ 0.001000] ? dump_stack+0x11a/0x20b:
dump_stack at lib/dump_stack.c:115
[ 0.001000] ? show_regs_print_info+0x5/0x5:
dump_stack at lib/dump_stack.c:89
[ 0.001000] ? do_raw_spin_trylock+0xee/0x190:
arch_atomic_cmpxchg at arch/x86/include/asm/atomic.h:191
(inlined by) atomic_cmpxchg at include/asm-generic/atomic-instrumented.h:58
(inlined by) queued_spin_trylock at include/asm-generic/qspinlock.h:72
(inlined by) do_raw_spin_trylock at kernel/locking/spinlock_debug.c:119
[ 0.001000] ? print_address_description+0x24/0x330:
print_address_description at mm/kasan/report.c:257
[ 0.001000] ? kasan_report+0x22a/0x380:
kasan_report_error at mm/kasan/report.c:355
(inlined by) kasan_report at mm/kasan/report.c:412
[ 0.001000] ? console_unlock+0x9cd/0xd10:
console_unlock at kernel/printk/printk.c:2396
[ 0.001000] ? lock_acquire+0x4d0/0x560:
get_current at arch/x86/include/asm/current.h:15
(inlined by) lock_acquire at kernel/locking/lockdep.c:3922
[ 0.001000] ? vprintk_emit+0x555/0x880:
console_trylock_spinning at kernel/printk/printk.c:1643
(inlined by) vprintk_emit at kernel/printk/printk.c:1906
[ 0.001000] ? wake_up_klogd+0x110/0x110:
console_unlock at kernel/printk/printk.c:2289
[ 0.001000] ? lock_release+0xf80/0xf80:
lock_acquire at kernel/locking/lockdep.c:3909
[ 0.001000] ? do_raw_spin_trylock+0x190/0x190:
do_raw_spin_unlock at kernel/locking/spinlock_debug.c:133
[ 0.001000] ? trace_hardirqs_on+0x3f0/0x400:
trace_hardirqs_on at kernel/trace/trace_irqsoff.c:795
[ 0.001000] ? vprintk_emit+0x555/0x880:
console_trylock_spinning at kernel/printk/printk.c:1643
(inlined by) vprintk_emit at kernel/printk/printk.c:1906
[ 0.001000] ? vprintk_emit+0x813/0x880:
__preempt_count_sub at arch/x86/include/asm/preempt.h:81
(inlined by) vprintk_emit at kernel/printk/printk.c:1908
[ 0.001000] ? console_unlock+0xd10/0xd10:
vprintk_emit at kernel/printk/printk.c:1830
[ 0.001000] ? memblock_add+0x163/0x163:
memblock_reserve at mm/memblock.c:716
[ 0.001000] ? lock_release+0xf23/0xf80:
lock_release at kernel/locking/lockdep.c:3929
[ 0.001000] ? memblock_virt_alloc_internal+0x191/0x2ef:
memblock_virt_alloc_internal at mm/memblock.c:1277 (discriminator 1)
[ 0.001000] ? memset+0x1f/0x40:
memset at mm/kasan/kasan.c:287
[ 0.001000] ? zero_pud_populate+0x5b1/0x936:
set_pmd at arch/x86/include/asm/paravirt.h:468
(inlined by) pmd_populate_kernel at arch/x86/include/asm/pgalloc.h:80
(inlined by) zero_pmd_populate at mm/kasan/kasan_init.c:76
(inlined by) zero_pud_populate at mm/kasan/kasan_init.c:109
[ 0.001000] ? printk+0x9c/0xc3:
printk at kernel/printk/printk.c:1975
[ 0.001000] ? kmsg_dump_rewind+0x134/0x134:
printk at kernel/printk/printk.c:1975
[ 0.001000] ? kasan_init+0x413/0x4af:
__flush_tlb_global at arch/x86/include/asm/paravirt.h:299
(inlined by) __flush_tlb_all at arch/x86/include/asm/tlbflush.h:433
(inlined by) kasan_init at arch/x86/mm/kasan_init_64.c:390
[ 0.001000] ? setup_arch+0x1fdf/0x225a:
setup_arch at arch/x86/kernel/setup.c:1216
[ 0.001000] ? reserve_standard_io_resources+0x88/0x88:
setup_arch at arch/x86/kernel/setup.c:816
[ 0.001000] ? debug_check_no_locks_freed+0x241/0x280:
debug_check_no_locks_freed at kernel/locking/lockdep.c:4422 (discriminator 1)
[ 0.001000] ? printk+0x9c/0xc3:
printk at kernel/printk/printk.c:1975
[ 0.001000] ? kmsg_dump_rewind+0x134/0x134:
printk at kernel/printk/printk.c:1975
[ 0.001000] ? do_device_not_available+0x60/0x60:
idt_setup_from_table at arch/x86/kernel/idt.c:220
[ 0.001000] ? start_kernel+0xf3/0xfd4:
add_latent_entropy at include/linux/random.h:26
(inlined by) start_kernel at init/main.c:556
[ 0.001000] ? early_idt_handler_common+0x3b/0x52:
early_idt_handler_common at arch/x86/kernel/head_64.S:327
[ 0.001000] ? mem_encrypt_init+0x33/0x33
[ 0.001000] ? memcpy_orig+0x54/0x110:
memcpy_orig at arch/x86/lib/memcpy_64.S:106
[ 0.001000] ? secondary_startup_64+0xa5/0xb0:
secondary_startup_64 at arch/x86/kernel/head_64.S:242
[ 0.001000]
[ 0.001000] Memory state around the buggy address:
[ 0.001000] ffffffff84c07880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.001000] ffffffff84c07900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1
[ 0.001000] >ffffffff84c07980: f1 f1 f1 f8 f2 f2 f2 f2 f2 f2 f2 01 f2 f2 f2 f2
Attached the full dmesg, kconfig and reproduce scripts.
Thanks,
Fengguang
4 years, 2 months
[string_selftest_init] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
by Fengguang Wu
Hi Geert,
FYI this happens in mainline kernel 4.17.0-rc1.
It dates back to v4.14-rc1 , perhaps since the test was introduced.
It occurs in 1 out of 1 boots.
[ 83.615461] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 246)
[ 83.627616] io scheduler noop registered
[ 83.634396] io scheduler deadline registered (default)
[ 83.643012] io scheduler mq-deadline registered (default)
[ 83.652509] io scheduler bfq registered
[ 108.580122] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
[ 108.580122] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #210
[ 108.580122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 108.580122] RIP: 0010:check_kcov_mode+0x24/0x37:
__read_once_size at include/linux/compiler.h:188 (discriminator 1)
(inlined by) check_kcov_mode at kernel/kcov.c:69 (discriminator 1)
[ 108.580122] RSP: 0000:ffff88000fd43e48 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 108.580122] RAX: 0000000000000000 RBX: ffff88000fd3e000 RCX: 0000000000000000
[ 108.580122] RDX: 0000000000000000 RSI: ffff88000fd3e000 RDI: 0000000000000002
[ 108.580122] RBP: ffffffff853b9f96 R08: 00000000014000c0 R09: 00000000ffffffff
[ 108.580122] R10: ffff88000fd3e000 R11: 0000000000000002 R12: ffff88001a6c0458
[ 108.580122] R13: 0000000000000011 R14: ffff88001a6c0530 R15: 00000000a1a1a1a1
[ 108.580122] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
[ 108.580122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 108.580122] CR2: 0000000000000000 CR3: 0000000003c6a000 CR4: 00000000000006b0
[ 108.580122] Call Trace:
[ 108.580122] __sanitizer_cov_trace_pc+0x2b/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:101
[ 108.580122] string_selftest_init+0x25e/0x5c0:
memset32_selftest at lib/test_string.c:60
(inlined by) string_selftest_init at lib/test_string.c:124
[ 108.580122] ? prandom_reseed+0x52/0x52:
string_selftest_init at lib/test_string.c:115
[ 108.580122] do_one_initcall+0x18e/0x3df:
do_one_initcall at init/main.c:883
[ 108.580122] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 108.580122] ? __sanitizer_cov_trace_pc+0x2b/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:101
[ 108.580122] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 108.580122] kernel_init_freeable+0x250/0x3a5:
do_initcall_level at init/main.c:951
(inlined by) do_initcalls at init/main.c:959
(inlined by) do_basic_setup at init/main.c:977
(inlined by) kernel_init_freeable at init/main.c:1127
[ 108.580122] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 108.580122] kernel_init+0x17/0x218:
kernel_init at init/main.c:1053
[ 108.580122] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 108.580122] ret_from_fork+0x35/0x40:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 108.580122] Code: 41 5c 41 5d 41 5e c3 e8 e8 c6 a3 01 48 ff 05 71 33 fd 04 31 c0 65 8b 15 78 00 e5 7e 81 e2 00 01 1f 00 75 19 48 ff 05 61 33 fd 04 <8b> 86 18 1c 00 00 39 c7 0f 94 c0 48 ff 05 57 33 fd 04 c3 e8 b1
[ 108.580122] Kernel panic - not syncing: softlockup: hung tasks
[ 108.580122] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G L 4.17.0-rc1 #210
[ 108.580122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 108.580122] Call Trace:
[ 108.580122] <IRQ>
[ 108.580122] dump_stack+0xf3/0x135:
__dump_stack at lib/dump_stack.c:77
(inlined by) dump_stack at lib/dump_stack.c:113
[ 108.580122] panic+0x173/0x440:
panic at kernel/panic.c:195
[ 108.580122] watchdog_timer_fn+0x343/0x382:
watchdog_timer_fn at kernel/watchdog.c:444
[ 108.580122] __hrtimer_run_queues+0x2a0/0x4c3:
__run_hrtimer at kernel/time/hrtimer.c:1386
(inlined by) __hrtimer_run_queues at kernel/time/hrtimer.c:1448
[ 108.580122] hrtimer_run_queues+0xcb/0xed:
hrtimer_run_queues at kernel/time/hrtimer.c:1621
[ 108.580122] run_local_timers+0x2d/0x85:
run_local_timers at kernel/time/timer.c:1704
[ 108.580122] update_process_times+0x30/0x7e:
update_process_times at kernel/time/timer.c:1636
[ 108.580122] tick_nohz_handler+0x141/0x19a:
tick_sched_handle at kernel/time/tick-sched.c:173
(inlined by) tick_nohz_handler at kernel/time/tick-sched.c:1182
[ 108.580122] smp_apic_timer_interrupt+0x1a5/0x2c4:
local_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1025
(inlined by) smp_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1050
[ 108.580122] apic_timer_interrupt+0xf/0x20:
apic_timer_interrupt at arch/x86/entry/entry_64.S:863
[ 108.580122] </IRQ>
[ 108.580122] RIP: 0010:check_kcov_mode+0x24/0x37:
__read_once_size at include/linux/compiler.h:188 (discriminator 1)
(inlined by) check_kcov_mode at kernel/kcov.c:69 (discriminator 1)
[ 108.580122] RSP: 0000:ffff88000fd43e48 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 108.580122] RAX: 0000000000000000 RBX: ffff88000fd3e000 RCX: 0000000000000000
[ 108.580122] RDX: 0000000000000000 RSI: ffff88000fd3e000 RDI: 0000000000000002
[ 108.580122] RBP: ffffffff853b9f96 R08: 00000000014000c0 R09: 00000000ffffffff
[ 108.580122] R10: ffff88000fd3e000 R11: 0000000000000002 R12: ffff88001a6c0458
[ 108.580122] R13: 0000000000000011 R14: ffff88001a6c0530 R15: 00000000a1a1a1a1
[ 108.580122] ? string_selftest_init+0x25e/0x5c0:
memset32_selftest at lib/test_string.c:60
(inlined by) string_selftest_init at lib/test_string.c:124
[ 108.580122] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 108.580122] __sanitizer_cov_trace_pc+0x2b/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:101
[ 108.580122] string_selftest_init+0x25e/0x5c0:
memset32_selftest at lib/test_string.c:60
(inlined by) string_selftest_init at lib/test_string.c:124
[ 108.580122] ? prandom_reseed+0x52/0x52:
string_selftest_init at lib/test_string.c:115
[ 108.580122] do_one_initcall+0x18e/0x3df:
do_one_initcall at init/main.c:883
[ 108.580122] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 108.580122] ? __sanitizer_cov_trace_pc+0x2b/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:101
[ 108.580122] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 108.580122] kernel_init_freeable+0x250/0x3a5:
do_initcall_level at init/main.c:951
(inlined by) do_initcalls at init/main.c:959
(inlined by) do_basic_setup at init/main.c:977
(inlined by) kernel_init_freeable at init/main.c:1127
[ 108.580122] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 108.580122] kernel_init+0x17/0x218:
kernel_init at init/main.c:1053
[ 108.580122] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 108.580122] ret_from_fork+0x35/0x40:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 108.580122] Kernel Offset: disabled
Attached the full dmesg, kconfig and reproduce scripts.
Thanks,
Fengguang
4 years, 2 months
WARNING: stack going in the wrong direction? ip=__schedule+0x489/0x830
by Fengguang Wu
Hello,
FYI this warning dates back to v4.16-rc5 . It's rather rare and often
happen together with other errors. For example,
[ 168.976238] perf: interrupt took too long (10016 > 9973), lowering kernel.perf_event_max_sample_rate to 19000
[ 171.793224] WARNING: stack going in the wrong direction? ip=__schedule+0x489/0x830
[ 225.573912] BUG: Bad page map in process sort pte:00000002 pmd:1b6303067
[ 225.574302] addr:00000000efb51519 vm_flags:00000070 anon_vma: (null) mapping:00000000c7c7d07a index:12f
[ 171.556542] perf: interrupt took too long (9849 > 9811), lowering kernel.perf_event_max_sample_rate to 20000
[ 172.667037] WARNING: stack going in the wrong direction? ip=sched_slice+0x51/0xa0
[ 350.325279] BUG: Bad page map in process wc pte:00000002 pmd:167b93067
[ 350.325595] addr:0000000006ece489 vm_flags:00000070 anon_vma: (null) mapping:00000000e8941173 index:1cd
[ 133.751073] WARNING: stack going in the wrong direction? ip=__schedule+0x489/0x830
[ 134.048965] perf: interrupt took too long (9682 > 9626), lowering kernel.perf_event_max_sample_rate to 20000
[ 134.472390] perf: interrupt took too long (12178 > 12102), lowering kernel.perf_event_max_sample_rate to 16000
[ 234.324541] 2018-04-17 16:08:50 umount /fs/pmem0
[ 234.324546]
[ 240.185400] WARNING: CPU: 0 PID: 6954 at kernel/workqueue.c:4142 destroy_workqueue+0x64/0x1e0
[ 174.376074] perf: interrupt took too long (7722 > 7721), lowering kernel.perf_event_max_sample_rate to 25000
[ 178.761072] WARNING: stack going in the wrong direction? ip=__schedule+0x489/0x830
[ 304.683193] usemem invoked oom-killer: gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[ 43.869050] perf: interrupt took too long (6180 > 6147), lowering kernel.perf_event_max_sample_rate to 32000
[ 48.272805] perf: interrupt took too long (7733 > 7725), lowering kernel.perf_event_max_sample_rate to 25000
[ 49.568211] WARNING: stack going in the wrong direction? ip=__slab_free+0x14b/0x2c0
[ 53.576116] perf: page allocation failure: order:2, mode:0x108c020(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO), nodemask=(null)
[ 168.465169] perf: interrupt took too long (5016 > 4992), lowering kernel.perf_event_max_sample_rate to 39000
[ 168.529886] perf: interrupt took too long (6301 > 6270), lowering kernel.perf_event_max_sample_rate to 31000
[ 168.657802] perf: interrupt took too long (7979 > 7876), lowering kernel.perf_event_max_sample_rate to 25000
[ 168.976238] perf: interrupt took too long (10016 > 9973), lowering kernel.perf_event_max_sample_rate to 19000
[ 171.793224] WARNING: stack going in the wrong direction? ip=__schedule+0x489/0x830:
perf_sw_event_sched at include/linux/perf_event.h:1062
(inlined by) perf_event_task_sched_out at include/linux/perf_event.h:1100
(inlined by) prepare_task_switch at kernel/sched/core.c:2636
(inlined by) context_switch at kernel/sched/core.c:2813
(inlined by) __schedule at kernel/sched/core.c:3490
[ 225.573912] BUG: Bad page map in process sort pte:00000002 pmd:1b6303067
[ 225.574302] addr:00000000efb51519 vm_flags:00000070 anon_vma: (null) mapping:00000000c7c7d07a index:12f
[ 225.574820] file:libpthread-2.23.so fault:filemap_fault mmap:generic_file_mmap readpage:simple_readpage
[ 225.575327] CPU: 5 PID: 29228 Comm: sort Not tainted 4.17.0-rc1 #1
[ 225.575643] Hardware name: Dell Inc. Studio XPS 8000/0X231R, BIOS A01 08/11/2009
[ 225.576038] Call Trace:
[ 225.576207] dump_stack+0x5c/0x7b:
dump_stack at lib/dump_stack.c:115
[ 225.576409] print_bad_pte+0x1de/0x290:
print_bad_pte at mm/memory.c:776 (discriminator 12)
[ 225.576628] unmap_page_range+0x803/0xa20:
zap_pte_range at mm/memory.c:1384
(inlined by) zap_pmd_range at mm/memory.c:1441
(inlined by) zap_pud_range at mm/memory.c:1470
(inlined by) zap_p4d_range at mm/memory.c:1491
(inlined by) unmap_page_range at mm/memory.c:1512
[ 225.576855] unmap_vmas+0x4c/0xa0:
unmap_vmas at mm/memory.c:1586 (discriminator 3)
[ 225.577060] exit_mmap+0x82/0x150:
constant_test_bit at arch/x86/include/asm/bitops.h:328
(inlined by) mm_is_oom_victim at include/linux/oom.h:75
(inlined by) exit_mmap at mm/mmap.c:3040
[ 225.577263] mmput+0x67/0x160:
__mmput at kernel/fork.c:963
(inlined by) mmput at kernel/fork.c:983
[ 225.577453] do_exit+0x2a5/0xb80:
constant_test_bit at arch/x86/include/asm/bitops.h:328
(inlined by) test_ti_thread_flag at include/linux/thread_info.h:79
(inlined by) exit_mm at kernel/exit.c:545
(inlined by) do_exit at kernel/exit.c:852
[ 225.577652] ? __fput+0x18d/0x220:
__fput at fs/file_table.c:229
[ 225.577856] ? _cond_resched+0x19/0x30:
_cond_resched at kernel/sched/core.c:4982
[ 225.578078] do_group_exit+0x3a/0xa0:
__read_once_size at include/linux/compiler.h:188
(inlined by) list_empty at include/linux/list.h:203
(inlined by) thread_group_empty at include/linux/sched/signal.h:594
(inlined by) do_group_exit at kernel/exit.c:953
[ 225.578291] __x64_sys_exit_group+0x14/0x20:
__x64_sys_exit_group at kernel/exit.c:979
[ 225.578528] do_syscall_64+0x5b/0x180:
do_syscall_64 at arch/x86/entry/common.c:287
[ 225.578745] entry_SYSCALL_64_after_hwframe+0x44/0xa9:
entry_SYSCALL_64_after_hwframe at arch/x86/entry/entry_64.S:247
[ 225.579017] RIP: 0033:0x7f36a8f501c8
[ 225.579230] RSP: 002b:00007ffcf2d32458 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 225.579635] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f36a8f501c8
[ 225.579988] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 225.580343] RBP: 00007f36a92308e0 R08: 00000000000000e7 R09: ffffffffffffff98
[ 225.580697] R10: 00007f36a9452250 R11: 0000000000000246 R12: 00007f36a92308e0
[ 225.581046] R13: 00007f36a9235c40 R14: 0000000000000000 R15: 0000000000000000
[ 225.581429] Disabling lock debugging due to kernel taint
[ 225.581791] BUG: Bad rss-counter state mm:00000000997e66fa idx:2 val:-1
[ 225.602808] general protection fault: 0000 [#1] SMP PTI
[ 225.603092] Modules linked in: netconsole sr_mod cdrom sd_mod sg snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic snd_hda_codec_hdmi coretemp uas snd_hda_intel kvm_intel ata_generic dcdbas pata_acpi snd_hda_codec dell_smm_hwmon snd_hda_core kvm snd_hwdep snd_pcm firewire_ohci irqbypass crc32c_intel usb_storage pcspkr snd_timer serio_raw firewire_core ata_piix crc_itu_t snd i7core_edac soundcore libata shpchp acpi_cpufreq ip_tables broadcom bcm_phy_lib
[ 225.604945] CPU: 3 PID: 29384 Comm: tee Tainted: G B 4.17.0-rc1 #1
[ 225.605370] Hardware name: Dell Inc. Studio XPS 8000/0X231R, BIOS A01 08/11/2009
[ 225.606395] RIP: 0010:kmem_cache_alloc+0xa0/0x1e0:
prefetch_freepointer at mm/slub.c:275
(inlined by) slab_alloc_node at mm/slub.c:2734
(inlined by) slab_alloc at mm/slub.c:2749
(inlined by) kmem_cache_alloc at mm/slub.c:2754
[ 225.606648] RSP: 0018:ffffc90003f3bc68 EFLAGS: 00010202
[ 225.606927] RAX: 0000000000000000 RBX: 0003ffff88018f94 RCX: 00000000004d265b
[ 225.607312] RDX: 00000000004d265a RSI: 00000000014080c0 RDI: 0000000000027360
[ 225.607665] RBP: ffff88018f94dff2 R08: ffff8801bfce7360 R09: ffff8801b7c44100
[ 225.608018] R10: ffffc90003f3bef0 R11: 8080808080808080 R12: 00000000014080c0
[ 225.608372] R13: ffffffff813b13f2 R14: ffff8801bf15ac00 R15: ffff8801bf15ac00
[ 225.608726] FS: 00007fcfb09a3700(0000) GS:ffff8801bfcc0000(0000) knlGS:0000000000000000
[ 225.609151] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 225.609450] CR2: 00007fcfb082a000 CR3: 00000001b7cf6000 CR4: 00000000000006e0
[ 225.609803] Call Trace:
[ 225.609975] selinux_file_alloc_security+0x32/0x50:
file_alloc_security at security/selinux/hooks.c:392
(inlined by) selinux_file_alloc_security at security/selinux/hooks.c:3549
[ 225.610239] security_file_alloc+0x22/0x40:
security_file_alloc at security/security.c:879 (discriminator 19)
[ 225.610474] get_empty_filp+0x8d/0x1b0:
get_empty_filp at fs/file_table.c:129
[ 225.610694] path_openat+0x2d/0x1710:
path_openat at fs/namei.c:3478
[ 225.610907] ? page_add_file_rmap+0x13/0x200:
page_add_file_rmap at mm/rmap.c:1184
[ 225.611149] ? alloc_set_pte+0x25e/0x520:
set_pte_at at arch/x86/include/asm/paravirt.h:458
(inlined by) alloc_set_pte at mm/memory.c:3449
[ 225.611376] ? filemap_map_pages+0x30a/0x320:
filemap_map_pages at mm/filemap.c:2681
[ 225.611616] do_filp_open+0x8c/0xf0:
do_filp_open at fs/namei.c:3536
[ 225.611825] ? __handle_mm_fault+0xd69/0x10a0:
do_fault_around at mm/memory.c:3611
(inlined by) do_read_fault at mm/memory.c:3627
(inlined by) do_fault at mm/memory.c:3732
(inlined by) handle_pte_fault at mm/memory.c:3963
(inlined by) __handle_mm_fault at mm/memory.c:4087
[ 225.612112] ? _cond_resched+0x19/0x30:
_cond_resched at kernel/sched/core.c:4982
[ 225.612331] ? __alloc_fd+0x44/0x180:
__alloc_fd at fs/file.c:505
[ 225.612542] ? do_sys_open+0x1a6/0x230:
do_sys_open at fs/open.c:1094
[ 225.612758] do_sys_open+0x1a6/0x230:
do_sys_open at fs/open.c:1094
[ 225.612971] do_syscall_64+0x5b/0x180:
do_syscall_64 at arch/x86/entry/common.c:287
[ 225.613187] entry_SYSCALL_64_after_hwframe+0x44/0xa9:
entry_SYSCALL_64_after_hwframe at arch/x86/entry/entry_64.S:247
[ 225.613454] RIP: 0033:0x7fcfb041191c
[ 225.613662] RSP: 002b:00007fff8fb09f40 EFLAGS: 00000202 ORIG_RAX: 0000000000000002
[ 225.614060] RAX: ffffffffffffffda RBX: 0000000000aca200 RCX: 00007fcfb041191c
[ 225.614407] RDX: 0000000000000001 RSI: 0000000000080000 RDI: 0000000000aca1d0
[ 225.614756] RBP: 00007fff8fb0a020 R08: 0000000000aca160 R09: 0000000000000300
[ 225.615102] R10: 00007fcfb04645b0 R11: 0000000000000202 R12: 0000000000000000
[ 225.615449] R13: 0000000000000000 R14: 00007fcfb05512a0 R15: 0000000000000002
[ 225.615797] Code: 01 00 00 41 8b 46 20 49 8b 3e 48 8d 4a 01 48 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 ba 48 85 db 74 0b 41 8b 46 20 <48> 8b 04 03 0f 18 08 41 f7 c4 00 80 00 00 0f 85 08 01 00 00 66
[ 225.616765] RIP: kmem_cache_alloc+0xa0/0x1e0:
prefetch_freepointer at mm/slub.c:275
(inlined by) slab_alloc_node at mm/slub.c:2734
(inlined by) slab_alloc at mm/slub.c:2749
(inlined by) kmem_cache_alloc at mm/slub.c:2754 RSP: ffffc90003f3bc68
[ 225.617102] ---[ end trace 8032327fef00e4ff ]---
[ 225.617353] Kernel panic - not syncing: Fatal exception
Attached the full dmesg and kconfig.
Thanks,
Fengguang
4 years, 2 months
[btrfs_destroy_workqueue] WARNING: CPU: 0 PID: 6954 at kernel/workqueue.c:4142 destroy_workqueue+0x64/0x1e0
by Fengguang Wu
Hello,
FYI this happens in mainline kernel 4.17.0-rc1.
It at least dates back to v4.14-rc2 .
It's triggered when running fio tests. It's really hard to reproduce
(only happened once in 4.17-rc1 and several times in v4.14-rc2) and
all bisects failed so far.
[ 133.751073] WARNING: stack going in the wrong direction? ip=__schedule+0x489/0x830:
perf_sw_event_sched at include/linux/perf_event.h:1062
(inlined by) perf_event_task_sched_out at include/linux/perf_event.h:1100
(inlined by) prepare_task_switch at kernel/sched/core.c:2636
(inlined by) context_switch at kernel/sched/core.c:2813
(inlined by) __schedule at kernel/sched/core.c:3490
[ 134.048965] perf: interrupt took too long (9682 > 9626), lowering kernel.perf_event_max_sample_rate to 20000
[ 134.472390] perf: interrupt took too long (12178 > 12102), lowering kernel.perf_event_max_sample_rate to 16000
[ 234.324541] 2018-04-17 16:08:50 umount /fs/pmem0
[ 234.324546]
[ 240.185400] WARNING: CPU: 0 PID: 6954 at kernel/workqueue.c:4142 destroy_workqueue+0x64/0x1e0:
destroy_workqueue at kernel/workqueue.c:4142 (discriminator 1)
[ 240.197915] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod sr_mod cdrom intel_rapl sd_mod sg sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 kvm_intel ttm kvm irqbypass crct10dif_pclmul crc32_pclmul drm_kms_helper crc32c_intel ghash_clmulni_intel syscopyarea nd_pmem(O) dax_pmem(O) snd_pcm pcbc sysfillrect device_dax(O) nd_btt(O) snd_timer sysimgblt aesni_intel fb_sys_fops nd_e820(O) crypto_simd ipmi_si libnvdimm(O) snd soundcore ahci mxm_wmi cryptd ipmi_devintf wdat_wdt dcdbas nfit_test_iomap(O) libahci pcspkr drm megaraid_sas glue_helper libata ipmi_msghandler wmi acpi_power_meter shpchp ip_tables
[ 240.267813] CPU: 0 PID: 6954 Comm: umount Tainted: G O 4.17.0-rc1 #1
[ 240.277473] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.1.7 06/16/2016
[ 240.286967] RIP: 0010:destroy_workqueue+0x64/0x1e0:
destroy_workqueue at kernel/workqueue.c:4142 (discriminator 1)
[ 240.293463] RSP: 0018:ffffc90021b0fde0 EFLAGS: 00010202
[ 240.300462] RAX: ffff884072f4c058 RBX: ffff88407a17dc00 RCX: ffff884072f4c000
[ 240.309628] RDX: ffff884072f4c058 RSI: 0000000000000000 RDI: ffffffff820cfa30
[ 240.318802] RBP: ffff88407a17dc20 R08: ffffc90021b0fd40 R09: 0000000000000000
[ 240.327990] R10: ffffc90021b0fdb8 R11: 0000000000000000 R12: ffff8820347f4fc0
[ 240.337306] R13: ffff884079c23138 R14: ffffffff82d47aa0 R15: 0000000000000000
[ 240.346546] FS: 00007f941fee2840(0000) GS:ffff882067600000(0000) knlGS:0000000000000000
[ 240.356876] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 240.364602] CR2: 00007ff793645258 CR3: 0000004074f4a005 CR4: 00000000001606f0
[ 240.373917] Call Trace:
[ 240.378060] btrfs_destroy_workqueue+0x40/0x110 [btrfs]
[ 240.385322] btrfs_stop_all_workers+0x2d/0xf0 [btrfs]
[ 240.392397] close_ctree+0x133/0x2f0 [btrfs]
[ 240.398581] generic_shutdown_super+0x6c/0x120:
__read_once_size at include/linux/compiler.h:188
(inlined by) list_empty at include/linux/list.h:203
(inlined by) generic_shutdown_super at fs/super.c:442
[ 240.404956] kill_anon_super+0xe/0x20:
kill_anon_super at fs/super.c:1038
[ 240.410482] btrfs_kill_super+0x13/0x100 [btrfs]
[ 240.417076] deactivate_locked_super+0x3f/0x70:
deactivate_locked_super at fs/super.c:320
[ 240.423483] cleanup_mnt+0x3b/0x70:
cleanup_mnt at fs/namespace.c:1174
[ 240.428737] task_work_run+0xa3/0xe0:
task_work_run at kernel/task_work.c:115 (discriminator 1)
[ 240.434199] exit_to_usermode_loop+0x9e/0xa0:
tracehook_notify_resume at include/linux/tracehook.h:191
(inlined by) exit_to_usermode_loop at arch/x86/entry/common.c:166
[ 240.440456] do_syscall_64+0x16c/0x180:
prepare_exit_to_usermode at arch/x86/entry/common.c:196
(inlined by) syscall_return_slowpath at arch/x86/entry/common.c:265
(inlined by) do_syscall_64 at arch/x86/entry/common.c:290
[ 240.446133] entry_SYSCALL_64_after_hwframe+0x44/0xa9:
entry_SYSCALL_64_after_hwframe at arch/x86/entry/entry_64.S:247
[ 240.453285] RIP: 0033:0x7f941f7c7277
[ 240.458782] RSP: 002b:00007fffb9a17cf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 240.468778] RAX: 0000000000000000 RBX: 000000000158b6e0 RCX: 00007f941f7c7277
[ 240.478299] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000158b8c0
[ 240.487824] RBP: 000000000158b8c0 R08: 0000000000000000 R09: 0000000000000015
[ 240.497317] R10: 00000000000006b0 R11: 0000000000000246 R12: 00007f941fcc9e44
[ 240.506810] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fffb9a17f80
[ 240.516315] Code: c2 74 19 8b 30 85 f6 74 f1 0f 0b 48 89 ef e8 84 9f 8c 00 5b 5d 41 5c e9 cb fa ff ff 48 39 8b a0 00 00 00 74 0a 83 79 18 01 7e 04 <0f> 0b eb dc 8b 41 58 85 c0 0f 85 3d 01 00 00 48 8b 41 60 48 8d
[ 240.540655] ---[ end trace faf649c5bf432714 ]---
[ 240.547594] Showing busy workqueues and worker pools:
Attached the full dmesg and kconfig.
Thanks,
Fengguang
4 years, 2 months
[test_parman_move] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]
by Fengguang Wu
Hello,
FYI this happens in mainline kernel 4.15.0-rc1.
It at least dates back to v4.12 .
It occurs in 2 out of 2 boots.
[ 95.347718] test passed
[ 95.359028] test_printf: all 260 tests passed
[ 96.656859] test_bitmap: test 13: input is '0-2047:128/256' OK, Time: 3332
[ 96.770961] test_bitmap: all 460506 tests passed
[ 96.783469] test_uuid: all 18 tests passed
[ 124.230111] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]
[ 124.230111] CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc1 #360
[ 124.230111] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 124.230111] task: d3430000 task.stack: d342c000
[ 124.230111] EIP: test_parman_move+0xe/0xb0:
test_parman_move at lib/test_parman.c:119
[ 124.230111] EFLAGS: 00200212 CPU: 0
[ 124.230111] EAX: d0c00000 EBX: 00000004 ECX: 00000930 EDX: 0000090c
[ 124.230111] ESI: d0740324 EDI: d359f680 EBP: d342dec4 ESP: d342debc
[ 124.230111] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 124.230111] CR0: 80050033 CR2: 00000000 CR3: 02b2b000 CR4: 000006b0
[ 124.230111] Call Trace:
[ 124.230111] parman_lsort_item_add+0x9d/0x190:
__list_del_entry at include/linux/list.h:117
(inlined by) list_move_tail at include/linux/list.h:182
(inlined by) parman_prio_shift_down at lib/parman.c:144
(inlined by) parman_lsort_item_add at lib/parman.c:200
[ 124.230111] parman_item_add+0x9/0x10:
parman_item_add at lib/parman.c:356
[ 124.230111] test_parman_init+0x358/0x3b5:
test_parman_run at lib/test_parman.c:284
(inlined by) test_parman_lsort at lib/test_parman.c:369
(inlined by) test_parman_init at lib/test_parman.c:383
[ 124.230111] ? test_uuid_init+0x1a9/0x1a9:
test_parman_init at lib/test_parman.c:382
[ 124.230111] do_one_initcall+0x7c/0x10e:
do_one_initcall at init/main.c:826
[ 124.230111] ? kernel_init_freeable+0xc3/0x15b:
do_initcall_level at init/main.c:891
(inlined by) do_initcalls at init/main.c:900
(inlined by) do_basic_setup at init/main.c:918
(inlined by) kernel_init_freeable at init/main.c:1066
[ 124.230111] kernel_init_freeable+0xe3/0x15b:
do_initcall_level at init/main.c:892
(inlined by) do_initcalls at init/main.c:900
(inlined by) do_basic_setup at init/main.c:918
(inlined by) kernel_init_freeable at init/main.c:1066
[ 124.230111] ? rest_init+0xc0/0xc0:
kernel_init at init/main.c:990
[ 124.230111] kernel_init+0xb/0xf0:
kernel_init at init/main.c:995
[ 124.230111] ? schedule_tail_wrapper+0x9/0xc:
schedule_tail_wrapper at arch/x86/entry/entry_32.S:267
[ 124.230111] ret_from_fork+0x19/0x24:
ret_from_fork at arch/x86/entry/entry_32.S:299
[ 124.230111] Code: e8 48 5f c2 00 5d 85 c0 0f 94 c0 0f b6 c0 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 57 53 8b 5d 08 8b 40 04 c1 e3 02 <8d> 3c 90 89 fa 8d 04 88 89 d9 e8 a3 be c2 00 83 fb 08 89 da 73
[ 124.230111] Kernel panic - not syncing: softlockup: hung tasks
[ 124.230111] CPU: 0 PID: 1 Comm: swapper Tainted: G L 4.15.0-rc1 #360
[ 124.230111] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 124.230111] Call Trace:
[ 124.230111] dump_stack+0x16/0x19:
dump_stack at lib/dump_stack.c:64
[ 124.230111] panic+0x86/0x1bd:
panic at kernel/panic.c:194
[ 124.230111] watchdog_timer_fn+0x1e7/0x1f0:
watchdog_timer_fn at kernel/watchdog.c:443
[ 124.230111] ? watchdog+0x20/0x20:
watchdog_timer_fn at kernel/watchdog.c:340
[ 124.230111] hrtimer_run_queues+0xf3/0x1d0:
__run_hrtimer at kernel/time/hrtimer.c:1211
(inlined by) __hrtimer_run_queues at kernel/time/hrtimer.c:1275
(inlined by) hrtimer_run_queues at kernel/time/hrtimer.c:1414
[ 124.230111] run_local_timers+0x8/0x40:
run_local_timers at kernel/time/timer.c:1700
[ 124.230111] update_process_times+0x22/0x50:
update_process_times at kernel/time/timer.c:1630
[ 124.230111] tick_nohz_handler+0x66/0x100:
tick_nohz_handler at kernel/time/tick-sched.c:1068
[ 124.230111] smp_apic_timer_interrupt+0x40/0x80:
exiting_irq at arch/x86/include/asm/apic.h:540
(inlined by) smp_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1052
[ 124.230111] ? irq_exit+0x2f/0xb0:
tick_irq_exit at kernel/softirq.c:385
(inlined by) irq_exit at kernel/softirq.c:407
[ 124.230111] apic_timer_interrupt+0x35/0x3c:
apic_timer_interrupt at arch/x86/include/asm/entry_arch.h:36
[ 124.230111] EIP: test_parman_move+0xe/0xb0:
test_parman_move at lib/test_parman.c:119
[ 124.230111] EFLAGS: 00200212 CPU: 0
[ 124.230111] EAX: d0c00000 EBX: 00000004 ECX: 00000930 EDX: 0000090c
[ 124.230111] ESI: d0740324 EDI: d359f680 EBP: d342dec4 ESP: d342debc
[ 124.230111] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 124.230111] parman_lsort_item_add+0x9d/0x190:
__list_del_entry at include/linux/list.h:117
(inlined by) list_move_tail at include/linux/list.h:182
(inlined by) parman_prio_shift_down at lib/parman.c:144
(inlined by) parman_lsort_item_add at lib/parman.c:200
[ 124.230111] parman_item_add+0x9/0x10:
parman_item_add at lib/parman.c:356
[ 124.230111] test_parman_init+0x358/0x3b5:
test_parman_run at lib/test_parman.c:284
(inlined by) test_parman_lsort at lib/test_parman.c:369
(inlined by) test_parman_init at lib/test_parman.c:383
[ 124.230111] ? test_uuid_init+0x1a9/0x1a9:
test_parman_init at lib/test_parman.c:382
[ 124.230111] do_one_initcall+0x7c/0x10e:
do_one_initcall at init/main.c:826
[ 124.230111] ? kernel_init_freeable+0xc3/0x15b:
do_initcall_level at init/main.c:891
(inlined by) do_initcalls at init/main.c:900
(inlined by) do_basic_setup at init/main.c:918
(inlined by) kernel_init_freeable at init/main.c:1066
[ 124.230111] kernel_init_freeable+0xe3/0x15b:
do_initcall_level at init/main.c:892
(inlined by) do_initcalls at init/main.c:900
(inlined by) do_basic_setup at init/main.c:918
(inlined by) kernel_init_freeable at init/main.c:1066
[ 124.230111] ? rest_init+0xc0/0xc0:
kernel_init at init/main.c:990
[ 124.230111] kernel_init+0xb/0xf0:
kernel_init at init/main.c:995
[ 124.230111] ? schedule_tail_wrapper+0x9/0xc:
schedule_tail_wrapper at arch/x86/entry/entry_32.S:267
[ 124.230111] ret_from_fork+0x19/0x24:
ret_from_fork at arch/x86/entry/entry_32.S:299
[ 124.230111] Kernel Offset: disabled
Attached the full dmesg, kconfig and reproduce scripts.
Thanks,
Fengguang
4 years, 2 months
[drm_wait_one_vblank] WARNING: CPU: 0 PID: 269 at drivers/gpu/drm/drm_vblank.c:1084 drm_wait_one_vblank+0x160/0x170 [drm]
by Fengguang Wu
Hello,
FYI this happens in mainline kernel and at least dates back to v4.13 .
It has been reported before, well sorry, if cannot be fixed or
work-around, it'll keep popping up in our tests and consume our time.
The last resort would be to blacklist this error from future 0day
reports.
[ 13.867832] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 13.874320] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 14.037671] fbcon: inteldrmfb (fb0) is primary device
[ 14.176079] ------------[ cut here ]------------
[ 14.176081] vblank wait timed out on crtc 0
[ 14.176118] WARNING: CPU: 0 PID: 269 at drivers/gpu/drm/drm_vblank.c:1084 drm_wait_one_vblank+0x160/0x170 [drm]
[ 14.176119] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic kvm_intel i915 kvm snd_hda_intel irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec crc32c_intel ghash_clmulni_intel drm_kms_helper pcbc ahci syscopyarea snd_hda_core aesni_intel libahci snd_hwdep crypto_simd sysfillrect sysimgblt fb_sys_fops dcdbas snd_pcm cryptd glue_helper libata pcspkr snd_timer shpchp drm snd soundcore video ip_tables
[ 14.176139] CPU: 0 PID: 269 Comm: kworker/u16:7 Not tainted 4.17.0-rc1 #1
[ 14.176140] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
[ 14.176144] Workqueue: events_unbound async_run_entry_fn
[ 14.176154] RIP: 0010:drm_wait_one_vblank+0x160/0x170 [drm]
[ 14.176155] RSP: 0018:ffffc900013cf918 EFLAGS: 00010296
[ 14.176157] RAX: 000000000000001f RBX: ffff88021bfe8000 RCX: 0000000000000000
[ 14.176157] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000202
[ 14.176158] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000001f
[ 14.176159] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 14.176160] R13: 0000000000000002 R14: ffff88021b98a008 R15: ffff88021b64fc00
[ 14.176161] FS: 0000000000000000(0000) GS:ffff88021ea00000(0000) knlGS:0000000000000000
[ 14.176162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.176163] CR2: 0000557ac4b38e50 CR3: 000000021d40a002 CR4: 00000000001606f0
[ 14.176164] Call Trace:
[ 14.176171] ? finish_wait+0x80/0x80:
autoremove_wake_function at kernel/sched/wait.c:372
[ 14.176216] hsw_enable_crt+0x3d/0x90 [i915]
[ 14.176253] intel_encoders_enable+0x55/0x80 [i915]
[ 14.176285] haswell_crtc_enable+0x2a2/0x7f0 [i915]
[ 14.176315] intel_update_crtc+0x39/0x90 [i915]
[ 14.176343] intel_update_crtcs+0x47/0x60 [i915]
[ 14.176369] intel_atomic_commit_tail+0x1d2/0xc50 [i915]
[ 14.176394] intel_atomic_commit+0x215/0x2f0 [i915]
[ 14.176402] restore_fbdev_mode_atomic+0x19c/0x1f0 [drm_kms_helper]
[ 14.176409] drm_fb_helper_restore_fbdev_mode_unlocked+0x45/0x90 [drm_kms_helper]
[ 14.176414] drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
[ 14.176446] intel_fbdev_set_par+0x16/0x60 [i915]
[ 14.176450] fbcon_init+0x4d7/0x680:
fbcon_init at drivers/video/fbdev/core/fbcon.c:1138
[ 14.176453] visual_init+0xd5/0x130:
visual_init at drivers/tty/vt/vt.c:748
[ 14.176456] do_bind_con_driver+0x1f4/0x400:
do_bind_con_driver at drivers/tty/vt/vt.c:3184
[ 14.176458] do_take_over_console+0x7b/0x190:
do_take_over_console at drivers/tty/vt/vt.c:3744
[ 14.176460] do_fbcon_takeover+0x58/0xb0:
do_fbcon_takeover at drivers/video/fbdev/core/fbcon.c:547
[ 14.176462] notifier_call_chain+0x47/0x70:
notifier_call_chain at kernel/notifier.c:95 (discriminator 1)
[ 14.176464] blocking_notifier_call_chain+0x3e/0x60:
__blocking_notifier_call_chain at kernel/notifier.c:319
(inlined by) blocking_notifier_call_chain at kernel/notifier.c:328
[ 14.176467] ? _cond_resched+0x19/0x30:
_cond_resched at kernel/sched/core.c:4982
[ 14.176469] register_framebuffer+0x224/0x330:
unlock_fb_info at include/linux/fb.h:657
(inlined by) do_register_framebuffer at drivers/video/fbdev/core/fbmem.c:1713
(inlined by) register_framebuffer at drivers/video/fbdev/core/fbmem.c:1806
[ 14.176475] __drm_fb_helper_initial_config_and_unlock+0x209/0x430 [drm_kms_helper]
[ 14.176505] intel_fbdev_initial_config+0x14/0x30 [i915]
[ 14.176508] async_run_entry_fn+0x39/0x160:
async_run_entry_fn at kernel/async.c:128
[ 14.176510] process_one_work+0x182/0x3a0:
arch_static_branch at kernel/workqueue.c:2145
(inlined by) static_key_false at include/linux/jump_label.h:142
(inlined by) trace_workqueue_execute_end at include/trace/events/workqueue.h:112
(inlined by) process_one_work at kernel/workqueue.c:2150
[ 14.176511] worker_thread+0x3a/0x3b0:
__read_once_size at include/linux/compiler.h:188
(inlined by) list_empty at include/linux/list.h:203
(inlined by) worker_thread at kernel/workqueue.c:2280
[ 14.176513] ? process_one_work+0x3a0/0x3a0:
worker_thread at kernel/workqueue.c:2222
[ 14.176515] kthread+0x11e/0x140:
kthread at kernel/kthread.c:238
[ 14.176517] ? kthread_associate_blkcg+0xb0/0xb0:
kthread at kernel/kthread.c:198
[ 14.176520] ret_from_fork+0x35/0x40:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 14.176522] Code: fc e0 0f 0b e9 39 ff ff ff 48 89 e6 4c 89 f7 e8 f7 79 01 e1 45 85 e4 0f 85 1b ff ff ff 89 ee 48 c7 c7 c8 23 0d a0 e8 40 ea fc e0 <0f> 0b e9 06 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 14.176545] ---[ end trace 25747e44b68a53bc ]---
[ 14.280077] ------------[ cut here ]------------
[ 14.280077] ------------[ cut here ]------------
[ 14.280079] vblank wait timed out on crtc 0
[ 14.280119] WARNING: CPU: 0 PID: 269 at drivers/gpu/drm/drm_vblank.c:1084 drm_wait_one_vblank+0x160/0x170 [drm]
[ 14.280119] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic kvm_intel i915 kvm snd_hda_intel irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec crc32c_intel ghash_clmulni_intel drm_kms_helper pcbc ahci syscopyarea snd_hda_core aesni_intel libahci snd_hwdep crypto_simd sysfillrect sysimgblt fb_sys_fops dcdbas snd_pcm cryptd glue_helper libata pcspkr snd_timer shpchp drm snd soundcore video ip_tables
[ 14.280140] CPU: 0 PID: 269 Comm: kworker/u16:7 Tainted: G W 4.17.0-rc1 #1
[ 14.280141] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
[ 14.280145] Workqueue: events_unbound async_run_entry_fn
[ 14.280156] RIP: 0010:drm_wait_one_vblank+0x160/0x170 [drm]
[ 14.280157] RSP: 0018:ffffc900013cf918 EFLAGS: 00010296
[ 14.280159] RAX: 000000000000001f RBX: ffff88021bfe8000 RCX: 0000000000000000
[ 14.280159] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000202
[ 14.280160] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000001f
[ 14.280161] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 14.280162] R13: 0000000000000002 R14: ffff88021b98a008 R15: ffff88021b64fc00
[ 14.280163] FS: 0000000000000000(0000) GS:ffff88021ea00000(0000) knlGS:0000000000000000
[ 14.280164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.280165] CR2: 0000557ac4b38e50 CR3: 000000021d40a002 CR4: 00000000001606f0
[ 14.280166] Call Trace:
[ 14.280174] ? finish_wait+0x80/0x80:
autoremove_wake_function at kernel/sched/wait.c:372
[ 14.280226] hsw_enable_crt+0x47/0x90 [i915]
[ 14.280265] intel_encoders_enable+0x55/0x80 [i915]
[ 14.280300] haswell_crtc_enable+0x2a2/0x7f0 [i915]
[ 14.280331] intel_update_crtc+0x39/0x90 [i915]
[ 14.280360] intel_update_crtcs+0x47/0x60 [i915]
[ 14.280386] intel_atomic_commit_tail+0x1d2/0xc50 [i915]
[ 14.280413] intel_atomic_commit+0x215/0x2f0 [i915]
[ 14.280421] restore_fbdev_mode_atomic+0x19c/0x1f0 [drm_kms_helper]
[ 14.280428] drm_fb_helper_restore_fbdev_mode_unlocked+0x45/0x90 [drm_kms_helper]
[ 14.280434] drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
[ 14.280468] intel_fbdev_set_par+0x16/0x60 [i915]
[ 14.280471] fbcon_init+0x4d7/0x680:
fbcon_init at drivers/video/fbdev/core/fbcon.c:1138
[ 14.280475] visual_init+0xd5/0x130:
visual_init at drivers/tty/vt/vt.c:748
[ 14.280478] do_bind_con_driver+0x1f4/0x400:
do_bind_con_driver at drivers/tty/vt/vt.c:3184
[ 14.280480] do_take_over_console+0x7b/0x190:
do_take_over_console at drivers/tty/vt/vt.c:3744
[ 14.280482] do_fbcon_takeover+0x58/0xb0:
do_fbcon_takeover at drivers/video/fbdev/core/fbcon.c:547
[ 14.280484] notifier_call_chain+0x47/0x70:
notifier_call_chain at kernel/notifier.c:95 (discriminator 1)
[ 14.280486] blocking_notifier_call_chain+0x3e/0x60:
__blocking_notifier_call_chain at kernel/notifier.c:319
(inlined by) blocking_notifier_call_chain at kernel/notifier.c:328
[ 14.280490] ? _cond_resched+0x19/0x30:
_cond_resched at kernel/sched/core.c:4982
[ 14.280492] register_framebuffer+0x224/0x330:
unlock_fb_info at include/linux/fb.h:657
(inlined by) do_register_framebuffer at drivers/video/fbdev/core/fbmem.c:1713
(inlined by) register_framebuffer at drivers/video/fbdev/core/fbmem.c:1806
[ 14.280499] __drm_fb_helper_initial_config_and_unlock+0x209/0x430 [drm_kms_helper]
[ 14.280530] intel_fbdev_initial_config+0x14/0x30 [i915]
[ 14.280533] async_run_entry_fn+0x39/0x160:
async_run_entry_fn at kernel/async.c:128
[ 14.280535] process_one_work+0x182/0x3a0:
arch_static_branch at kernel/workqueue.c:2145
(inlined by) static_key_false at include/linux/jump_label.h:142
(inlined by) trace_workqueue_execute_end at include/trace/events/workqueue.h:112
(inlined by) process_one_work at kernel/workqueue.c:2150
[ 14.280537] worker_thread+0x3a/0x3b0:
__read_once_size at include/linux/compiler.h:188
(inlined by) list_empty at include/linux/list.h:203
(inlined by) worker_thread at kernel/workqueue.c:2280
[ 14.280539] ? process_one_work+0x3a0/0x3a0:
worker_thread at kernel/workqueue.c:2222
[ 14.280541] kthread+0x11e/0x140:
kthread at kernel/kthread.c:238
[ 14.280543] ? kthread_associate_blkcg+0xb0/0xb0:
kthread at kernel/kthread.c:198
[ 14.280546] ret_from_fork+0x35/0x40:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 14.280548] Code: fc e0 0f 0b e9 39 ff ff ff 48 89 e6 4c 89 f7 e8 f7 79 01 e1 45 85 e4 0f 85 1b ff ff ff 89 ee 48 c7 c7 c8 23 0d a0 e8 40 ea fc e0 <0f> 0b e9 06 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 14.280571] ---[ end trace 25747e44b68a53bd ]---
[ 16.741085] ata1: link is slow to respond, please be patient (ready=0)
Attached the full dmesg, kconfig and reproduce scripts.
Thanks,
Fengguang
4 years, 2 months