Dear All,
Lustre v2.4.3, RHEL 6.4, file system: ldsikfs
We have seen this error [(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on
lock cancel] on all OSTs, and after a while the MDS gets very busy and takes forever to
respond to RPC requests.
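For reference, the -2 in that message appears to be ENOENT rather than ENOSPC, assuming it is an ordinary negated Linux errno value. A quick Python sketch to check this (the helper name describe_errno is only illustrative):

```python
import errno
import os


def describe_errno(rc):
    """Map a kernel-style negative return code to (symbolic name, description)."""
    code = abs(rc)
    # errno.errorcode maps numeric codes to names such as "ENOENT".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # -2 is the code from the OST log message; -28 is shown for comparison.
    for rc in (-2, -28):
        print(rc, describe_errno(rc))
```

The numeric values are platform dependent, so this should be run on a Linux host.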
I found similar messages reported in these Jira tickets:
https://jira.hpdd.intel.com/browse/LU-3421
https://jira.hpdd.intel.com/browse/LU-6664
However, I did not see any ENOSPC errors on the OSTs, as reported in one of the tickets
above, which has since been resolved. This past week I had to bring down the entire file
system to recover from long lockups. I use the ltop utility to monitor IOPS, and I noticed
that the locks held by each OST were in the range of 6,000-10,000.
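To cross-check the ltop numbers, the lock counts can also be read on each OSS with `lctl get_param ldlm.namespaces.*.lock_count`. Below is a rough sketch of summarizing that output in Python, assuming the usual name=value output format. The sample lines and the helper name parse_lock_counts are made up for illustration and are not taken from our system.

```python
def parse_lock_counts(text):
    """Parse 'ldlm.namespaces.<ns>.lock_count=<n>' lines into {namespace: count}."""
    prefix = "ldlm.namespaces."
    suffix = ".lock_count"
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.strip().rpartition("=")
        value = value.strip()
        # Skip anything that is not a well-formed lock_count line.
        if not sep or not key.endswith(suffix) or not value.isdigit():
            continue
        namespace = key[:-len(suffix)]
        if namespace.startswith(prefix):
            namespace = namespace[len(prefix):]
        counts[namespace] = int(value)
    return counts


# Hypothetical sample output; the namespace names and values are illustrative only.
SAMPLE = """\
ldlm.namespaces.filter-ldsikfs-OST0000_UUID.lock_count=6210
ldlm.namespaces.filter-ldsikfs-OST0001_UUID.lock_count=9875
"""

if __name__ == "__main__":
    counts = parse_lock_counts(SAMPLE)
    for namespace, count in sorted(counts.items()):
        print(namespace, count)
    print("total", sum(counts.values()))
```

On a live server the text would come from running the lctl command above rather than from SAMPLE.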
I am a bit lost: if this was fixed under LU-3421 in version 2.4.1, why am I seeing
similar messages in v2.4.3? Could it be something else? I can provide logs if required.
Please let me know if you think I should add my case to one of the Jira tickets.
Thank you,
Amit