Hi all,
What are the possible origins of messages of the following type?
Apr 2 07:28:42 kernel: Lustre: Service thread pid 10573 was inactive for 0.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 2 07:28:42 kernel: Lustre: Service thread pid 10573 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Pid: 10573, comm: ll_ost_92
Apart from the enormous duration of 0.00s, these threads were running on
an unused test system with Lustre v1.8.9_intel. At the time in question,
the system was also isolated in its own IB subnet: all other traffic had
been cut off by a general failure of the surrounding network ;-)
Each of these messages is accompanied by a trace.
I'm quite accustomed to seeing such messages, but normally they appear
on the heavily used production system, the reported times are around
1000s, and the overload on the affected machine is clearly visible, too.
But here?
Regards,
Thomas
--
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de