Hi,

We sometimes see the following error message on our OSSs:
May 5 20:47:16 lustre-oss03 kernel: Lustre: scratch-OST0006: Client 511ae429-07b7-f9ca-22b6-f0f8839b8029 (at 192.168.102.37@o2ib) refused reconnection, still busy with 1 active RPCs

On the client whose reconnection was refused, the errors are as follows:
May 5 20:47:03 nodem37 kernel: Lustre: 2424:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1367804814/real 0] req@ffff881849d84800 x1433750448809719/t0(0) o101->scratch-OST0008-osc-ffff880c3fe37400@192.168.100.3@o2ib:28/4 lens 296/352 e 0 to 1 dl 1367804823 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
May 5 20:47:03 nodem37 kernel: Lustre: 2424:0:(client.c:1780:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
May 5 20:47:03 nodem37 kernel: Lustre: scratch-OST0008-osc-ffff880c3fe37400: Connection to scratch-OST0008 (at 192.168.100.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
May 5 20:47:03 nodem37 kernel: Lustre: Skipped 1 previous similar message
May 5 20:47:04 nodem37 kernel: Lustre: 2424:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1367804815/real 0] req@ffff880a86515400 x1433750448809779/t0(0) o101->scratch-OST0008-osc-ffff880c3fe37400@192.168.100.3@o2ib:28/4 lens 296/352 e 0 to 1 dl 1367804824 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
May 5 20:47:04 nodem37 kernel: Lustre: 2424:0:(client.c:1780:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
May 5 20:47:05 nodem37 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.100.3@o2ib. The ost_connect operation failed with -16
May 5 20:47:05 nodem37 kernel: LustreError: Skipped 1 previous similar message
May 5 20:47:05 nodem37 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.100.3@o2ib. The ost_connect operation failed with -16
May 5 20:47:28 nodem37 kernel: Lustre: scratch-OST0007-osc-ffff880c3fe37400: Connection restored to scratch-OST0007 (at 192.168.100.3@o2ib)
May 5 20:47:28 nodem37 kernel: Lustre: Skipped 1 previous similar message
May 5 20:49:09 nodem37 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.100.3@o2ib. The ost_destroy operation failed with -107
May 5 20:49:09 nodem37 kernel: LustreError: Skipped 1 previous similar message
May 5 20:49:09 nodem37 kernel: Lustre: scratch-OST0008-osc-ffff880c3fe37400: Connection to scratch-OST0008 (at 192.168.100.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
May 5 20:49:09 nodem37 kernel: Lustre: Skipped 2 previous similar messages
May 5 20:49:09 nodem37 kernel: LustreError: 167-0: This client was evicted by scratch-OST0008; in progress operations using this service will fail.
May 5 20:49:09 nodem37 kernel: LustreError: 2422:0:(client.c:1060:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff88184061d400 x1433750448823924/t0(0) o4->scratch-OST0008-osc-ffff880c3fe37400@192.168.100.3@o2ib:6/4 lens 456/416 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
May 5 20:49:09 nodem37 kernel: LustreError: 2422:0:(client.c:1060:ptlrpc_import_delay_req()) Skipped 5687 previous similar messages
May 5 20:49:09 nodem37 kernel: LustreError: 5585:0:(osc_lock.c:809:osc_ldlm_completion_ast()) lock@ffff88128c19b698[2 2 0 1 1 00000000] W(2):[0, 0]@[0x100080000:0xcdb5aed5:0x0] {
May 5 20:49:09 nodem37 kernel: LustreError: 5585:0:(osc_lock.c:809:osc_ldlm_completion_ast()) lovsub@ffff880db54ec860: [0 ffff8810e95d6e30 W(2):[0, 0]@[0x201c50c90:0x16927:0x0]]
May 5 20:49:09 nodem37 kernel: LustreError: 5585:0:(osc_lock.c:809:osc_ldlm_completion_ast()) osc@ffff88169bf71d78: ffff881344ac6240 40120002 0x7293132dc153773c 2 (null) size: 0 mtime: 1367804804 atime: 1367804804 ctime: 1367804804 blocks: 0
May 5 20:49:09 nodem37 kernel: LustreError: 5585:0:(osc_lock.c:809:osc_ldlm_completion_ast()) } lock@ffff88128c19b698
May 5 20:49:09 nodem37 kernel: LustreError: 5585:0:(osc_lock.c:809:osc_ldlm_completion_ast()) dlmlock returned -5
May 5 20:49:09 nodem37 kernel: Lustre: scratch-OST0008-osc-ffff880c3fe37400: Connection restored to scratch-OST0008 (at 192.168.100.3@o2ib)
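
For reference when reading the client log: the negative return codes (-16, -107, -5) are Linux errno values, and each ptlrpc timeout line carries both the send time ("sent") and the request deadline ("dl") as Unix epoch seconds. Below is a minimal Python sketch that decodes them, assuming a Linux host (errno numbering can differ on other platforms) and assuming the field layout shown in the lines above. The helper names decode_rc and rpc_window are hypothetical, not part of any Lustre tool.

```python
import errno
import os
import re

# Return codes copied from the client log above (Lustre reports them as negative errno values).
LOGGED_RCS = {"ost_connect": -16, "ost_destroy": -107, "dlmlock": -5}

# One ptlrpc_expire_one_request() message from the log, with the syslog prefix removed.
# Assumed layout: "[sent <epoch>/real 0]" is the send time, "dl <epoch>" the deadline.
REQ_LINE = ("@@@ Request sent has timed out for sent delay: [sent 1367804814/real 0] "
            "req@ffff881849d84800 x1433750448809719/t0(0) "
            "o101->scratch-OST0008-osc-ffff880c3fe37400@192.168.100.3@o2ib:28/4 "
            "lens 296/352 e 0 to 1 dl 1367804823 ref 2 fl Rpc:X/0/ffffffff rc 0/-1")


def decode_rc(rc):
    """Map a negative errno (as logged) to its symbolic name and the C library message."""
    code = -rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def rpc_window(line):
    """Return (sent, deadline, seconds between them) parsed from a ptlrpc request line."""
    sent = int(re.search(r"\[sent (\d+)/", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)\b", line).group(1))
    return sent, deadline, deadline - sent


if __name__ == "__main__":
    for op, rc in LOGGED_RCS.items():
        name, msg = decode_rc(rc)
        print(f"{op}: {rc} = {name} ({msg})")
    sent, dl, window = rpc_window(REQ_LINE)
    print(f"request sent at {sent}, deadline {dl}, window {window}s")
```

On Linux the ost_connect failure decodes to EBUSY, which lines up with the OSS message that the server was "still busy with 1 active RPCs". The first timed-out request had a 9-second window between send and deadline.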
Has anybody seen this? Any advice is appreciated.
Thanks
Mi