Greetings,
FYI, we noticed the following commit (built with gcc-9):
commit: f2404ec06294324cf106a3aac9635756bc2be6ab ("locking/qspinlock: Introduce CNA into the slow path of qspinlock")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git locking/wip-cna
in testcase: will-it-scale
with the following parameters:
nr_task: 100%
mode: thread
test: pthread_mutex1
cpufreq_governor: performance
ucode: 0x2006906
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel
copies to see if the testcase will scale. It builds both a process and threads based test
in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
on test machine: 72 threads Intel(R) Xeon(R) Gold 6139 CPU @ 2.30GHz with 128G memory
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
(Please note that this does not always happen: we hit the issue in 5 out of 8 runs, but we did not
see it in the same tests on the parent commit 2eb32196ab.)
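For context on the trace below: the qspinlock slow path queues contending CPUs on per-CPU MCS
nodes, and each waiter spins on its own node until its predecessor hands the lock over; the CNA
commit changes how that handoff picks a successor (it prefers waiters on the same NUMA node).
The Code: bytes at the faulting RIP (f3 90 <8b> 45 08 85 c0 74 f7) decode to
"pause; mov 0x8(%rbp),%eax; test %eax,%eax; je" back to the pause, and RBP (ffff88903f82be80 on
CPU#0, i.e. GS base plus 0x2be80) is a per-CPU address. So the stuck CPUs appear to be waiting on
the 32-bit field at offset 8 of their own queue node, which in struct mcs_spinlock is `locked`.
The following is a minimal user-space MCS lock in C11 atomics that illustrates that wait. It is a
sketch only, not the kernel's qspinlock or CNA code; mcs_node, mcs_acquire, mcs_release and
run_mcs_demo are hypothetical names, and sched_yield() stands in for the kernel's cpu_relax().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical queue node; layout mirrors struct mcs_spinlock (next pointer
 * first, then a 32-bit locked flag at offset 8 on x86_64). */
struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;	/* set to 1 by the predecessor on handoff */
};

struct mcs_lock {
	_Atomic(struct mcs_node *) tail;	/* last waiter, NULL if free */
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

	/* Join the queue; the previous tail (if any) is our predecessor. */
	struct mcs_node *prev =
		atomic_exchange_explicit(&lock->tail, node, memory_order_acq_rel);
	if (prev == NULL)
		return;		/* queue was empty: we own the lock */

	/* Link behind the predecessor, then wait on our OWN node. If the
	 * predecessor never sets node->locked, this loop never exits. */
	atomic_store_explicit(&prev->next, node, memory_order_release);
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		sched_yield();	/* user-space stand-in for cpu_relax() (PAUSE) */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		/* No visible successor: try to mark the lock free. */
		struct mcs_node *expected = node;
		if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected,
				NULL, memory_order_acq_rel, memory_order_acquire))
			return;
		/* A successor swapped the tail but has not linked in yet. */
		while ((next = atomic_load_explicit(&node->next,
				memory_order_acquire)) == NULL)
			sched_yield();
	}
	/* Hand the lock directly to the successor. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}

/* Demo: several threads increment a counter under the MCS lock. */
static struct mcs_lock demo_lock;	/* static zero-init: tail == NULL */
static unsigned long demo_counter;	/* protected by demo_lock */

static void *demo_worker(void *arg)
{
	unsigned long loops = *(const unsigned long *)arg;
	struct mcs_node node = { NULL, 0 };	/* one node per waiter */

	for (unsigned long i = 0; i < loops; i++) {
		mcs_acquire(&demo_lock, &node);
		demo_counter++;
		mcs_release(&demo_lock, &node);
	}
	return NULL;
}

/* Run up to 16 workers and return the final counter value. */
static unsigned long run_mcs_demo(int nr_threads, unsigned long loops)
{
	pthread_t tids[16];
	int started = 0;

	if (nr_threads > 16)
		nr_threads = 16;
	demo_counter = 0;
	for (; started < nr_threads; started++)
		if (pthread_create(&tids[started], NULL, demo_worker, &loops) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return demo_counter;
}
```

With correct handoffs, run_mcs_demo(4, 2000) returns 8000; a handoff that never reaches a queued
waiter would instead leave that thread in the acquire loop indefinitely.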
[ 119.597280] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [pthread_mutex1_:2229]
[ 119.597290] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [pthread_mutex1_:2230]
[ 119.597300] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [pthread_mutex1_:2267]
[ 119.597301] Modules linked in: btrfs blake2b_generic xor zstd_decompress zstd_compress
raid6_pq libcrc32c sd_mod t10_pi sg
[ 119.597309] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [pthread_mutex1_:2268]
[ 119.597309] intel_rapl_msr
[ 119.597309] Modules linked in:
[ 119.597310] intel_rapl_common
[ 119.597311] btrfs
[ 119.597311] skx_edac
[ 119.597312] blake2b_generic
[ 119.597312] nfit
[ 119.597313] xor
[ 119.597313] libnvdimm
[ 119.597314] zstd_decompress
[ 119.597314] x86_pkg_temp_thermal
[ 119.597315] zstd_compress
[ 119.597317] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [pthread_mutex1_:2269]
[ 119.597317] intel_powerclamp
[ 119.597318] raid6_pq
[ 119.597318] Modules linked in:
[ 119.597319] coretemp
[ 119.597320] libcrc32c
[ 119.597320] btrfs
[ 119.597321] kvm_intel
[ 119.597322] sd_mod
[ 119.597322] blake2b_generic
[ 119.597323] kvm
[ 119.597323] t10_pi
[ 119.597324] xor
[ 119.597325] irqbypass
[ 119.597325] sg
[ 119.597326] zstd_decompress
[ 119.597326] crct10dif_pclmul
[ 119.597327] intel_rapl_msr
[ 119.597328] zstd_compress
[ 119.597328] crc32_pclmul
[ 119.597329] intel_rapl_common
[ 119.597330] raid6_pq
[ 119.597330] crc32c_intel
[ 119.597331] skx_edac
[ 119.597331] libcrc32c
[ 119.597332] ghash_clmulni_intel
[ 119.597332] nfit
[ 119.597333] sd_mod
[ 119.597334] ipmi_ssif
[ 119.597334] libnvdimm
[ 119.597335] t10_pi
[ 119.597335] intel_cstate
[ 119.597336] x86_pkg_temp_thermal
...
[ 119.605785] Modules linked in: btrfs blake2b_generic xor zstd_decompress zstd_compress
raid6_pq libcrc32c sd_mod t10_pi sg intel_rapl_msr intel_rapl_common skx_edac nfit
libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ipmi_ssif intel_cstate ast
drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt
fb_sys_fops mei_me libahci acpi_ipmi drm intel_uncore ioatdma ipmi_si libata mei joydev
wmi dca ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter ip_tables
[ 119.614252] Modules linked in: btrfs blake2b_generic xor zstd_decompress zstd_compress
raid6_pq libcrc32c sd_mod t10_pi sg intel_rapl_msr intel_rapl_common skx_edac nfit
libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ipmi_ssif intel_cstate ast
drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt
fb_sys_fops mei_me libahci acpi_ipmi drm intel_uncore ioatdma ipmi_si libata mei joydev
wmi dca ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter ip_tables
[ 119.622694] CPU: 0 PID: 2229 Comm: pthread_mutex1_ Tainted: G L
5.8.0-rc2-00016-gf2404ec062943 #2
[ 119.634437] CPU: 1 PID: 2230 Comm: pthread_mutex1_ Tainted: G L
5.8.0-rc2-00016-gf2404ec062943 #2
[ 119.642867] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS
SE5C620.86B.00.01.0015.110720180833 11/07/2018
[ 119.646343] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS
SE5C620.86B.00.01.0015.110720180833 11/07/2018
[ 119.650057] RIP: 0010:__cna_queued_spin_lock_slowpath+0x1d9/0x290
[ 119.653743] RIP: 0010:__cna_queued_spin_lock_slowpath+0x1d9/0x290
[ 119.656367] Code: c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 05 48 63 d2 48 05 80 be 02 00 48
03 04 d5 a0 59 43 82 48 89 28 8b 45 08 85 c0 75 09 f3 90 <8b> 45 08 85 c0 74 f7 4c
8b 6d 00 4d 85 ed 0f 84 47 ff ff ff 41 0f
[ 119.659244] Code: c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 05 48 63 d2 48 05 80 be 02 00 48
03 04 d5 a0 59 43 82 48 89 28 8b 45 08 85 c0 75 09 f3 90 <8b> 45 08 85 c0 74 f7 4c
8b 6d 00 4d 85 ed 0f 84 47 ff ff ff 41 0f
[ 119.662717] RSP: 0018:ffffc9000a033e20 EFLAGS: 00000246
[ 119.665223] RSP: 0018:ffffc9000a31fe20 EFLAGS: 00000246
[ 119.667626] RAX: 0000000000000000 RBX: ffffc90007533184 RCX: 0000000000000001
[ 119.667627] RDX: 000000000000003a RSI: 0000000000000000 RDI: ffffc90007533184
[ 119.670522] RAX: 0000000000000000 RBX: ffffc90007533184 RCX: 0000000000000001
[ 119.670523] RDX: 0000000000000040 RSI: 0000000000000001 RDI: ffffc90007533184
[ 119.673918] RBP: ffff88903f82be80 R08: 0000000000000000 R09: 0000000000000000
[ 119.673919] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000040000
[ 119.677727] RBP: ffff88903f86be80 R08: 0000000000000000 R09: 0000000000000000
[ 119.677728] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000080000
[ 119.680913] R13: 0000000000000001 R14: ffffc90007533188 R15: ffffc90007533184
[ 119.680915] FS: 00007ffff7bfd700(0000) GS:ffff88903f800000(0000)
knlGS:0000000000000000
[ 119.689118] R13: 0000000000000001 R14: ffffc90007533188 R15: ffffc90007533184
[ 119.689119] FS: 00007ffff73fc700(0000) GS:ffff88903f840000(0000)
knlGS:0000000000000000
[ 119.692537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 119.692539] CR2: 00007f5bdc8de4f4 CR3: 000000206fe2c002 CR4: 00000000007606f0
[ 119.695265] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 119.695266] CR2: 00007f9ba02b9a6c CR3: 000000206fe2c002 CR4: 00000000007606e0
[ 119.698764] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 119.701478] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 119.701479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 119.704269] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 119.704270] PKRU: 55555554
[ 119.706708] PKRU: 55555554
[ 119.706709] Call Trace:
[ 119.709496] Call Trace:
[ 119.709500] _raw_spin_lock+0x21/0x30
[ 119.712010] _raw_spin_lock+0x21/0x30
[ 119.715298] futex_wake+0xba/0x170
[ 119.717549] futex_wake+0xba/0x170
[ 119.720057] do_futex+0x157/0x1d0
[ 119.722300] do_futex+0x157/0x1d0
[ 119.725063] __x64_sys_futex+0x137/0x170
[ 119.727204] __x64_sys_futex+0x137/0x170
[ 119.730462] ? __prepare_exit_to_usermode+0xa4/0x180
[ 119.733811] ? __prepare_exit_to_usermode+0xa4/0x180
[ 119.736977] do_syscall_64+0x47/0x80
[ 119.740044] do_syscall_64+0x47/0x80
[ 119.743016] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 119.746417] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 119.749042] RIP: 0033:0x7ffff7f7633a
[ 119.752016] RIP: 0033:0x7ffff7f7633a
[ 119.754645] Code: Bad RIP value.
[ 119.757351] Code: Bad RIP value.
[ 119.757352] RSP: 002b:00007ffff73fbe20 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca
[ 119.760934] RSP: 002b:00007ffff7bfce20 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca
[ 119.763197] RAX: ffffffffffffffda RBX: 000055555555c1c0 RCX: 00007ffff7f7633a
[ 119.763198] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 000055555555c1c0
[ 119.765620] RAX: ffffffffffffffda RBX: 000055555555c1c0 RCX: 00007ffff7f7633a
[ 119.765621] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 000055555555c1c0
[ 119.768286] RBP: 00007ffff7fca080 R08: 0000000000000000 R09: 00007fffe8000b20
[ 119.768287] R10: 0000000000000000 R11: 0000000000000206 R12: 00007fffffffb33e
[ 119.770928] RBP: 00007ffff7fca000 R08: 0000000000000000 R09: 00007ffff0000b20
[ 119.770929] R10: fffffffffffffb8e R11: 0000000000000206 R12: 00007fffffffb33e
[ 119.773293] R13: 00007fffffffb33f R14: 00007ffff73fc700 R15: 000055555556b330
[ 120.664329] Shutting down cpus with NMI
[ 120.665585] R13: 00007fffffffb33f R14: 00007ffff7bfd700 R15: 000055555556a9e0
[ 124.309818] Kernel Offset: disabled
ACPI MEMORY or I/O RESET_REG.
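The user-space halves of the two register dumps above can be decoded by hand. ORIG_RAX 0xca is
202, the x86_64 futex syscall number; RSI 0x81 is FUTEX_WAKE (1) combined with
FUTEX_PRIVATE_FLAG (128); RDX 1 asks to wake at most one waiter; and RDI/RBX hold the same
address, 0x55555555c1c0, in both threads, so both stuck CPUs were waking waiters on the same
futex word. RAX reads ffffffffffffffda (-38, -ENOSYS) because the x86_64 syscall entry code
stores -ENOSYS in pt_regs->ax until the call returns, so it does not indicate a failed call.
A small C sketch of that decode follows. The constants are copied from <linux/futex.h> and the
x86_64 syscall table under hypothetical LINUX_* names so it builds on any host, and
struct futex_call, is_private_futex_wake and errno_from_rax are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from <linux/futex.h> and the x86_64 syscall table; the
 * LINUX_ prefix avoids clashing with the real headers. */
#define LINUX_X86_64_NR_FUTEX      202
#define LINUX_FUTEX_WAKE           1
#define LINUX_FUTEX_PRIVATE_FLAG   128
#define LINUX_FUTEX_CLOCK_REALTIME 256
#define LINUX_FUTEX_CMD_MASK \
	(~(uint64_t)(LINUX_FUTEX_PRIVATE_FLAG | LINUX_FUTEX_CLOCK_REALTIME))
#define LINUX_ENOSYS               38
#define LINUX_MAX_ERRNO            4095

/* User-space registers of a futex() call as shown in the dump. */
struct futex_call {
	uint64_t orig_rax;	/* syscall number */
	uint64_t rdi;		/* uaddr: address of the futex word */
	uint64_t rsi;		/* futex_op */
	uint64_t rdx;		/* val: for FUTEX_WAKE, max waiters to wake */
};

/* True if the registers describe futex(uaddr, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, val). */
static int is_private_futex_wake(const struct futex_call *c)
{
	return c->orig_rax == LINUX_X86_64_NR_FUTEX &&
	       (c->rsi & LINUX_FUTEX_CMD_MASK) == LINUX_FUTEX_WAKE &&
	       (c->rsi & LINUX_FUTEX_PRIVATE_FLAG) != 0;
}

/* Return the positive errno encoded in RAX, or 0 if RAX is outside the
 * -1..-MAX_ERRNO range the kernel uses for error returns. */
static int errno_from_rax(uint64_t rax)
{
	if (rax < (uint64_t)0 - LINUX_MAX_ERRNO)
		return 0;
	return (int)(~rax + 1);	/* two's-complement negation */
}
```

With the values from either dump (ORIG_RAX 0xca, RSI 0x81, RDX 1), is_private_futex_wake()
returns true, and errno_from_rax(0xffffffffffffffda) returns 38 (ENOSYS).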
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@intel.com>
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml  # job file is attached to this email
bin/lkp run job.yaml
Thanks,
Oliver Sang