> >> > In the meantime, you can stop the thermal daemon from using
> >> > powerclamp by removing
> >> > thermal-cpu-cdev-order.xml (depending on your build it will
> >> > be in ../etc/thermald/..)
> >>
> >> Thanks for the workaround, I will try it and in case something else
> >> appears I will most surely ask for help.
> > Let me know if this works, then we can know if power clamp caused
> > this issue or not.
>
> So far no issues, but the problem is sporadic and I don't know exactly
> how to trigger it. The common ground is that I was always using the
> computer when it froze; just leaving it idle never made it freeze.
> It usually never took more than a day to freeze, and so far it is
> working without problems.
>
Could you try explicitly setting an idle injection ratio for
powerclamp and see if it causes a hang? You need some workload running.
e.g.
echo 25 > /sys/class/thermal/cooling_deviceXXX (powerclamp type), or
use the tmon tool to set it. tmon is in the kernel source under
tools/thermal/tmon; press TAB and it will show the cooling devices to set.
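The cooling device number differs from machine to machine, so it can help to look it up by type rather than guess it. The following is only a rough sketch: the function name find_powerclamp and its optional directory argument are made up for illustration, and it assumes the value is written to the cur_state attribute of the matching device.

```shell
#!/bin/sh
# Print the sysfs directory of every cooling device whose type is
# intel_powerclamp. The optional first argument (hypothetical, for
# testing) overrides the directory that is searched.
find_powerclamp() {
    root="${1:-/sys/class/thermal}"
    for dev in "$root"/cooling_device*; do
        # Skip entries without a readable type file (including the
        # unexpanded glob when no cooling device exists).
        [ -r "$dev/type" ] || continue
        if [ "$(cat "$dev/type")" = "intel_powerclamp" ]; then
            printf '%s\n' "$dev"
        fi
    done
}

# Usage (as root, with some workload running):
#   dev=$(find_powerclamp | head -n 1)
#   echo 25 > "$dev/cur_state"   # request idle injection
#   echo 0 > "$dev/cur_state"    # should stop the injection again
```

Looking the device up by type avoids hard-coding a number that may change between boots or kernel versions.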
This is my cooling device:
root@xen:~# cat /sys/class/thermal/cooling_device10/type
intel_powerclamp
I read parts of
https://www.kernel.org/doc/Documentation/thermal/intel_powerclamp.txt
and found the full command:
echo 25 > /sys/class/thermal/cooling_device10/cur_state
And, well, that worked like a bomb: the system froze the moment I hit
the Enter key. Once again, I SSHed into the machine, and here is an
excerpt from syslog:
Apr 17 10:33:02 localhost kernel: [119601.040365] intel_powerclamp:
Start idle injection to reduce power
Apr 17 10:33:02 localhost kernel: [119601.117426] invalid opcode: 0000 [#1] SMP
Apr 17 10:33:02 localhost kernel: [119601.118350] Modules linked in:
cpuid uas usb_storage ebtable_filter ebtables ip6table_filter i$
Apr 17 10:33:02 localhost kernel: [119601.124954] CPU: 0 PID: 16813
Comm: kidle_inject/0 Not tainted 4.0.0-040000-generic #201504121$
Apr 17 10:33:02 localhost kernel: [119601.126157] Hardware name:
System manufacturer System Product Name/P8Z77-V DELUXE, BIOS 2104 0$
Apr 17 10:33:02 localhost kernel: [119601.127379] task:
ffff880059f8d000 ti: ffff88002bcac000 task.ti: ffff88002bcac000
Apr 17 10:33:02 localhost kernel: [119601.128612] RIP:
e030:[<ffffffffc0c9ab44>] [<ffffffffc0c9ab44>]
clamp_thread+0x244/0x360 [int$
Apr 17 10:33:02 localhost kernel: [119601.129869] RSP:
e02b:ffff88002bcafe08 EFLAGS: 00010246
Apr 17 10:33:02 localhost kernel: [119601.131131] RAX:
ffff88002bcac010 RBX: 0000000101c638b6 RCX: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.132397] RDX:
0000000000000000 RSI: ffff88002bcac000 RDI: 0000000000000200
Apr 17 10:33:02 localhost kernel: [119601.133655] RBP:
ffff88002bcafeb8 R08: 0000000000000001 R09: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.134914] R10:
0000000000007ff0 R11: 0000000000000000 R12: ffff88002bcac000
Apr 17 10:33:02 localhost kernel: [119601.136179] R13:
0000000000000001 R14: 0000000000000000 R15: ffff88002bcac010
Apr 17 10:33:02 localhost kernel: [119601.137444] FS:
0000000000000000(0000) GS:ffff880080000000(0000)
knlGS:0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.138713] CS: e033 DS: 0000
ES: 0000 CR0: 0000000080050033
Apr 17 10:33:02 localhost kernel: [119601.139988] CR2:
00007f04fa815000 CR3: 0000000059488000 CR4: 0000000000042660
Apr 17 10:33:02 localhost kernel: [119601.141182] Stack:
Apr 17 10:33:02 localhost kernel: [119601.142335] ffff88002bcac000
ffff880059f8d000 0000000000000000 0000002000000002
Apr 17 10:33:02 localhost kernel: [119601.143498] ffff880058ffda00
ffffffff81f12298 ffffffff81f12298 0000000101c638b6
Apr 17 10:33:02 localhost kernel: [119601.144654] ffffffff81f11700
ffffffffc0c9a190 0000000000000000 ffffffffffffffff
Apr 17 10:33:02 localhost kernel: [119601.145800] Call Trace:
Apr 17 10:33:02 localhost kernel: [119601.146932]
[<ffffffffc0c9a190>] ? pkg_state_counter+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.148079]
[<ffffffffc0c9a900>] ? powerclamp_adjust_controls+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.149229]
[<ffffffff81098869>] kthread+0xc9/0xe0
Apr 17 10:33:02 localhost kernel: [119601.150370]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.151512]
[<ffffffff817efc18>] ret_from_fork+0x58/0x90
Apr 17 10:33:02 localhost kernel: [119601.152654]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.153795] Code: 4e 12 20 48 8b
46 10 a8 08 75 37 eb 16 0f ae f0 65 48 8b 04 25 c8 b8 00 00 0$
Apr 17 10:33:02 localhost kernel: [119601.155041] RIP
[<ffffffffc0c9ab44>] clamp_thread+0x244/0x360 [intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.156283] RSP <ffff88002bcafe08>
Apr 17 10:33:02 localhost kernel: [119601.163592] invalid opcode: 0000 [#2] SMP
Apr 17 10:33:02 localhost kernel: [119601.163663] ---[ end trace
259f03a9be003c0a ]---
Apr 17 10:33:02 localhost kernel: [119601.166539] Modules linked in:
cpuid uas usb_storage ebtable_filter ebtables ip6table_filter i$
Apr 17 10:33:02 localhost kernel: [119601.176312] CPU: 1 PID: 16814
Comm: kidle_inject/1 Tainted: G D 4.0.0-040000-gene$
Apr 17 10:33:02 localhost kernel: [119601.178035] Hardware name:
System manufacturer System Product Name/P8Z77-V DELUXE, BIOS 2104 0$
Apr 17 10:33:02 localhost kernel: [119601.179767] task:
ffff880059f8b200 ti: ffff88000b528000 task.ti: ffff88000b528000
Apr 17 10:33:02 localhost kernel: [119601.181300] RIP:
e030:[<ffffffffc0c9ab44>] [<ffffffffc0c9ab44>]
clamp_thread+0x244/0x360 [int$
Apr 17 10:33:02 localhost kernel: [119601.182831] RSP:
e02b:ffff88000b52be08 EFLAGS: 00010246
Apr 17 10:33:02 localhost kernel: [119601.184362] RAX:
ffff88000b528010 RBX: 0000000101c638b6 RCX: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.185895] RDX:
0000000000000000 RSI: ffff88000b528000 RDI: 0000000000000200
Apr 17 10:33:02 localhost kernel: [119601.187449] RBP:
ffff88000b52beb8 R08: 0000000000000001 R09: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.188961] R10:
0000000000007ff0 R11: 0000000000000000 R12: ffff88000b528000
Apr 17 10:33:02 localhost kernel: [119601.190498] R13:
0000000000000001 R14: 0000000000000001 R15: ffff88000b528010
Apr 17 10:33:02 localhost kernel: [119601.192002] FS:
0000000000000000(0000) GS:ffff880080040000(0000)
knlGS:ffff880080040000
Apr 17 10:33:02 localhost kernel: [119601.193505] CS: e033 DS: 0000
ES: 0000 CR0: 0000000080050033
Apr 17 10:33:02 localhost kernel: [119601.195034] CR2:
00007f5dc0006bc0 CR3: 000000005a1be000 CR4: 0000000000042660
Apr 17 10:33:02 localhost kernel: [119601.196556] Stack:
Apr 17 10:33:02 localhost kernel: [119601.198072] ffff88000b528000
ffff880059f8b200 0000000000000001 0000002000000002
Apr 17 10:33:02 localhost kernel: [119601.199626] ffff88005b92aa00
ffff88005cf6cb98 ffff88005cf6cb98 0000000101c638b6
Apr 17 10:33:02 localhost kernel: [119601.201169] ffff88005cf6c000
ffffffffc0c9a190 0000000000000000 ffffffffffffffff
Apr 17 10:33:02 localhost kernel: [119601.202717] Call Trace:
Apr 17 10:33:02 localhost kernel: [119601.204308]
[<ffffffffc0c9a190>] ? pkg_state_counter+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.205871]
[<ffffffffc0c9a900>] ? powerclamp_adjust_controls+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.207444]
[<ffffffff81098869>] kthread+0xc9/0xe0
Apr 17 10:33:02 localhost kernel: [119601.209018]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.210599]
[<ffffffff817efc18>] ret_from_fork+0x58/0x90
Apr 17 10:33:02 localhost kernel: [119601.212183]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.213769] Code: 4e 12 20 48 8b
46 10 a8 08 75 37 eb 16 0f ae f0 65 48 8b 04 25 c8 b8 00 00 0$
Apr 17 10:33:02 localhost kernel: [119601.215481] RIP
[<ffffffffc0c9ab44>] clamp_thread+0x244/0x360 [intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.217126] RSP <ffff88000b52be08>
Apr 17 10:33:02 localhost kernel: [119601.218754] invalid opcode: 0000 [#3]
Apr 17 10:33:02 localhost kernel: [119601.218775] ---[ end trace
259f03a9be003c0b ]---
Apr 17 10:33:02 localhost kernel: [119601.220861] SMP
Apr 17 10:33:02 localhost kernel: [119601.224855] Modules linked in:
cpuid uas usb_storage ebtable_filter ebtables ip6table_filter i$
Apr 17 10:33:02 localhost kernel: [119601.237575] CPU: 3 PID: 16816
Comm: kidle_inject/3 Tainted: G D 4.0.0-040000-gene$
Apr 17 10:33:02 localhost kernel: [119601.239753] Hardware name:
System manufacturer System Product Name/P8Z77-V DELUXE, BIOS 2104 0$
Apr 17 10:33:02 localhost kernel: [119601.241493] task:
ffff880059956e00 ti: ffff88002dc60000 task.ti: ffff88002dc60000
Apr 17 10:33:02 localhost kernel: [119601.243143] RIP:
e030:[<ffffffffc0c9ab44>] [<ffffffffc0c9ab44>]
clamp_thread+0x244/0x360 [int$
Apr 17 10:33:02 localhost kernel: [119601.244809] RSP:
e02b:ffff88002dc63e08 EFLAGS: 00010246
Apr 17 10:33:02 localhost kernel: [119601.246472] RAX:
ffff88002dc60010 RBX: 0000000101c638b6 RCX: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.248118] RDX:
0000000000000000 RSI: ffff88002dc60000 RDI: 0000000000000200
Apr 17 10:33:02 localhost kernel: [119601.249790] RBP:
ffff88002dc63eb8 R08: 0000000000000001 R09: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.251485] R10:
0000000000007ff0 R11: 0000000000000000 R12: ffff88002dc60000
Apr 17 10:33:02 localhost kernel: [119601.253132] R13:
0000000000000001 R14: 0000000000000003 R15: ffff88002dc60010
Apr 17 10:33:02 localhost kernel: [119601.254831] FS:
0000000000000000(0000) GS:ffff8800800c0000(0000)
knlGS:ffff8800800c0000
Apr 17 10:33:02 localhost kernel: [119601.256477] CS: e033 DS: 0000
ES: 0000 CR0: 0000000080050033
Apr 17 10:33:02 localhost kernel: [119601.258169] CR2:
00007f9d5ae3e000 CR3: 000000000b8ff000 CR4: 0000000000042660
Apr 17 10:33:02 localhost kernel: [119601.259819] Stack:
Apr 17 10:33:02 localhost kernel: [119601.261483] ffff88002dc60000
ffff880059956e00 0000000000000003 0000002000000002
Apr 17 10:33:02 localhost kernel: [119601.263136] 0000000000000000
ffff88005c804b98 ffff88005c804b98 0000000101c638b6
Apr 17 10:33:02 localhost kernel: [119601.264832] ffff88005c804000
ffffffffc0c9a190 0000000000000000 ffffffffffffffff
Apr 17 10:33:02 localhost kernel: [119601.266511] Call Trace:
Apr 17 10:33:02 localhost kernel: [119601.268162]
[<ffffffffc0c9a190>] ? pkg_state_counter+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.269878]
[<ffffffffc0c9a900>] ? powerclamp_adjust_controls+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.271502]
[<ffffffff81098869>] kthread+0xc9/0xe0
Apr 17 10:33:02 localhost kernel: [119601.273082]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.274685]
[<ffffffff817efc18>] ret_from_fork+0x58/0x90
Apr 17 10:33:02 localhost kernel: [119601.276286]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.277895] Code: 4e 12 20 48 8b
46 10 a8 08 75 37 eb 16 0f ae f0 65 48 8b 04 25 c8 b8 00 00 0$
Apr 17 10:33:02 localhost kernel: [119601.279548] RIP
[<ffffffffc0c9ab44>] clamp_thread+0x244/0x360 [intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.281163] RSP <ffff88002dc63e08>
Apr 17 10:33:02 localhost kernel: [119601.282791] invalid opcode: 0000 [#4] SMP
Apr 17 10:33:02 localhost kernel: [119601.282814] ---[ end trace
259f03a9be003c0c ]---
Apr 17 10:33:02 localhost kernel: [119601.289149]
Apr 17 10:33:02 localhost kernel: [119601.292189] Modules linked in:
cpuid uas usb_storage ebtable_filter ebtables ip6table_filter i$
Apr 17 10:33:02 localhost kernel: [119601.305848] CPU: 2 PID: 16815
Comm: kidle_inject/2 Tainted: G D 4.0.0-040000-gene$
Apr 17 10:33:02 localhost kernel: [119601.307478] Hardware name:
System manufacturer System Product Name/P8Z77-V DELUXE, BIOS 2104 0$
Apr 17 10:33:02 localhost kernel: [119601.309115] task:
ffff880059f8c600 ti: ffff88000b5b8000 task.ti: ffff88000b5b8000
Apr 17 10:33:02 localhost kernel: [119601.310753] RIP:
e030:[<ffffffffc0c9ab44>] [<ffffffffc0c9ab44>]
clamp_thread+0x244/0x360 [int$
Apr 17 10:33:02 localhost kernel: [119601.312410] RSP:
e02b:ffff88000b5bbe08 EFLAGS: 00010246
Apr 17 10:33:02 localhost kernel: [119601.314063] RAX:
ffff88000b5b8010 RBX: 0000000101c638b6 RCX: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.315714] RDX:
0000000000000000 RSI: ffff88000b5b8000 RDI: 0000000000000200
Apr 17 10:33:02 localhost kernel: [119601.317373] RBP:
ffff88000b5bbeb8 R08: 0000000000000001 R09: 0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.319035] R10:
0000000000007ff0 R11: 0000000000000000 R12: ffff88000b5b8000
Apr 17 10:33:02 localhost kernel: [119601.320684] R13:
0000000000000001 R14: 0000000000000002 R15: ffff88000b5b8010
Apr 17 10:33:02 localhost kernel: [119601.322327] FS:
0000000000000000(0000) GS:ffff880080080000(0000)
knlGS:0000000000000000
Apr 17 10:33:02 localhost kernel: [119601.323978] CS: e033 DS: 0000
ES: 0000 CR0: 0000000080050033
Apr 17 10:33:02 localhost kernel: [119601.325643] CR2:
00007ff983039000 CR3: 000000005a1be000 CR4: 0000000000042660
Apr 17 10:33:02 localhost kernel: [119601.327295] Stack:
Apr 17 10:33:02 localhost kernel: [119601.328936] ffff88000b5b8000
ffff880059f8c600 0000000000000002 0000002000000002
Apr 17 10:33:02 localhost kernel: [119601.330644] 0000000000000000
ffff88005cfccb98 ffff88005cfccb98 0000000101c638b6
Apr 17 10:33:02 localhost kernel: [119601.332301] ffff88005cfcc000
ffffffffc0c9a190 0000000000000000 ffffffffffffffff
Apr 17 10:33:02 localhost kernel: [119601.335617]
[<ffffffffc0c9a190>] ? pkg_state_counter+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.337244]
[<ffffffffc0c9a900>] ? powerclamp_adjust_controls+0x100/0x100
[intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.338828]
[<ffffffff81098869>] kthread+0xc9/0xe0
Apr 17 10:33:02 localhost kernel: [119601.340408]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.341984]
[<ffffffff817efc18>] ret_from_fork+0x58/0x90
Apr 17 10:33:02 localhost kernel: [119601.343553]
[<ffffffff810987a0>] ? flush_kthread_worker+0x90/0x90
Apr 17 10:33:02 localhost kernel: [119601.345125] Code: 4e 12 20 48 8b
46 10 a8 08 75 37 eb 16 0f ae f0 65 48 8b 04 25 c8 b8 00 00 0$
Apr 17 10:33:02 localhost kernel: [119601.346770] RIP
[<ffffffffc0c9ab44>] clamp_thread+0x244/0x360 [intel_powerclamp]
Apr 17 10:33:02 localhost kernel: [119601.348370] RSP <ffff88000b5bbe08>
Apr 17 10:33:02 localhost kernel: [119601.349968] ---[ end trace
259f03a9be003c0d ]---
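
The excerpt shows the same invalid opcode in clamp_thread on each kidle_inject thread. One quick way to confirm which threads appear in the full log is sketched below. It is an illustration only: the function name oopsed_threads is made up, /var/log/syslog is an assumed default location, and it relies on a grep that supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# List each kidle_inject thread named in a "Comm:" line of the log.
# The optional first argument is the log file to scan; the default
# path is an assumption and may differ on your distribution.
oopsed_threads() {
    log="${1:-/var/log/syslog}"
    # -o prints only the matching part of each line; sort -u removes
    # duplicates if a thread shows up more than once.
    grep -o 'Comm: kidle_inject/[0-9]*' "$log" | sort -u
}

# Usage:
#   oopsed_threads /var/log/syslog
```

Note that this only works on the unwrapped log file, where "CPU: ... Comm: ..." sits on a single line, not on the line-wrapped copy pasted into this mail.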