2015-11-02 23:07 GMT+03:00 Dave Hansen <dave.hansen(a)intel.com>:
> On 11/02/2015 11:34 AM, Andrey Ryabinin wrote:
> >>> >>
> >>> >> [ 1.159450] augmented rbtree testing -> 23675 cycles
> >>> >> [ 1.864996]
> >>> >> It took less than a second, meanwhile in your case it didn't
> >>> >> finish in 22 seconds.
> >>> >>
> >>> >> This makes me think that your host is overloaded and the problem is
> >>> >> on your side.
> >> >
> >> > It's probably just a matter of putting some cond_resched()s in the test
> >> > code.
> > Yes, but is it worthwhile? It's very likely that lockup will just
> > trigger in another place.
> I'm guessing that the lockup here was because the tests were running for
> too long. If we cond_resched() in there often enough, the kernel won't
> detect a softlockup at all.
Sure, but why are these tests running so long?
In my setup it takes less than a second to finish these tests, on the
same kernel version and config of course. Although I might have more
powerful hardware, that doesn't explain such a huge difference.
So these are actually fast tests. I guess that the host is overloaded
and the KVM guest runs so slowly that even these simple tests start
triggering softlockups.
> It won't shift somewhere else.
That's not what I mean. Sure, the cond_resched() in rbtree_test_init()
will fix this particular softlockup.
But if even such normally fast tests are now running too long, then a
lot of other kernel code, which normally runs fast, likely becomes too
slow on Ying's setup and will trigger another softlockup.
rbtree_test_init() is just the first such place.
In that case, sticking cond_resched()s across the whole kernel is not
a solution.
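
To make it concrete what "putting some cond_resched()s in the test code"
means, here is a rough userspace sketch, not the actual rbtree_test.c
code. cond_resched_stub(), yield_count and run_test() are hypothetical
names made up for illustration; the stub only counts calls and invokes
sched_yield(), whereas in the kernel the loop would call cond_resched()
itself. The inner loop body is a placeholder for the insert/erase work.

```c
#include <sched.h>

/*
 * Hypothetical userspace stand-in for the kernel's cond_resched().
 * It only counts calls and yields the CPU; it is not how the kernel
 * implements cond_resched().
 */
static unsigned long yield_count;

static void cond_resched_stub(void)
{
	yield_count++;
	sched_yield();
}

/* Hypothetical test body: 'loops' passes over 'nodes' items of busy work. */
static unsigned long run_test(unsigned long loops, unsigned long nodes)
{
	unsigned long i, j, sum = 0;

	for (i = 0; i < loops; i++) {
		for (j = 0; j < nodes; j++)
			sum += i ^ j;	/* placeholder for the real tree work */

		/*
		 * One scheduling point per outer pass bounds how long the
		 * loop runs without giving the scheduler a chance.
		 */
		cond_resched_stub();
	}
	return sum;
}
```

This only bounds the time between scheduling points in this one loop.
It does nothing for any other code path that becomes slow on the same
overloaded guest, which is the concern above.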