Hi all,
I would like to discuss an error-handling issue in ldlm_cli_cancel_req().
In the Lustre-2.3.64 code, ldlm_cli_cancel_req() never resends the ldlm_cancel
request when it gets -EAGAIN from ptlrpc_queue_wait(). ptlrpc_queue_wait() returns
-EAGAIN when the import object targeted by the ldlm_cancel request is in a
recovery state such as DISCON or CONNECTING. In my experience this happens quite
often, especially on large-scale systems.
I therefore suggest that ldlm_cli_cancel_req() resend the ldlm_cancel request when it
gets -EAGAIN. Otherwise, a client that failed to send the ldlm_cancel request will be
evicted by the server after the server sends a blocking callback request to that client.
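To make the proposal concrete, here is a minimal sketch of the retry behaviour, written as a standalone simulation rather than as a patch. Everything in it is a hypothetical stand-in: struct fake_import, fake_queue_wait(), cancel_req_with_retry() and the max_retries bound are illustrative names and are not part of Lustre. The only assumption taken from the description above is that the send returns -EAGAIN while the import is still recovering.

```c
#include <errno.h>

/* Hypothetical stand-in for an import that is still in a recovery
 * state (DISCON, CONNECTING, ...). It is not the Lustre obd_import. */
struct fake_import {
    int failures_left; /* sends that will still fail with -EAGAIN */
};

/* Hypothetical stand-in for ptlrpc_queue_wait(): returns -EAGAIN
 * while the simulated import is recovering, 0 once it has recovered. */
static int fake_queue_wait(struct fake_import *imp)
{
    if (imp->failures_left > 0) {
        imp->failures_left--;
        return -EAGAIN;
    }
    return 0;
}

/* Sketch of the suggested handling: resend the cancel request while
 * the send returns -EAGAIN, up to max_retries resends after the first
 * attempt. A real implementation would presumably wait for the import
 * to recover between attempts instead of looping immediately. */
static int cancel_req_with_retry(struct fake_import *imp,
                                 int max_retries, int *attempts)
{
    int rc;
    int tries = 0;

    do {
        rc = fake_queue_wait(imp);
        tries++;
    } while (rc == -EAGAIN && tries <= max_retries);

    if (attempts)
        *attempts = tries;
    return rc;
}
```

With an import that fails twice before recovering and a bound of five resends, the third attempt succeeds. With a bound of one resend, the function gives up and returns -EAGAIN, which corresponds to the current behaviour where the cancel is never delivered.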
If anyone agrees with this idea, I would like to open a Jira ticket for the issue.
Best regards,
-----------------------------------
Hiroya Nozaki nozaki.hiroya(a)jp.fujitsu.com
Next Generation Technical Computing Unit
Fujitsu, Ltd
Tel: 044-754-8769
Ext: 7103-8594
-----------------------------------