I hope somebody can help me with this. ltop now shows LOCKS in the range of 25,000, and the system is really slowing down. Any help here is greatly appreciated.

Thank you,
Amit

From: HPDD-discuss [mailto:hpdd-discuss-bounces@lists.01.org] On Behalf Of Kumar, Amit
Sent: Thursday, August 13, 2015 12:26 PM
To: hpdd-discuss@lists.01.org
Subject: [HPDD-discuss] (ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel

Dear All,

Lustre v2.4.3, RHEL6.4, file system: ldiskfs

We have seen these errors [(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel] on all OSTs, and after a while the MDS becomes very busy and takes forever to respond to RPC requests.
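
For reference, Lustre follows the Linux kernel convention of returning negative errno values, so "Error -2" most likely means -ENOENT ("No such file or directory"). Below is a minimal sketch for decoding such codes from log lines. It assumes Linux errno numbering, and the `decode_rc` helper and its regex are hypothetical, written only for illustration. It does not identify the cause of the error, only what the code means.

```python
import errno
import os
import re

# Hypothetical pattern: matches the return code in messages like
# "Error -2 syncing data on lock cancel".
RC_PATTERN = re.compile(r"Error (-?\d+)")


def decode_rc(message: str):
    """Return (rc, symbolic name, description) for the first 'Error N' in
    message, or None if the message contains no such code."""
    m = RC_PATTERN.search(message)
    if m is None:
        return None
    rc = int(m.group(1))
    code = abs(rc)  # kernel-style code returns negative errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return rc, name, os.strerror(code)


if __name__ == "__main__":
    line = "(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel"
    print(decode_rc(line))
```

On Linux, running it prints `(-2, 'ENOENT', 'No such file or directory')`.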


I found similar messages reported in these Jira tickets:

https://jira.hpdd.intel.com/browse/LU-3421

https://jira.hpdd.intel.com/browse/LU-6664

However, I did not see any ENOSPC errors on the OSTs, as reported in one of the above tickets (which is marked as resolved). Even so, this past week I had to bring down the entire file system to recover from long lockups. I use the ltop utility to monitor IOPS, and I noticed that the LOCKS count for each OST was in the range of 6,000-10,000.

I am a bit lost: if this was fixed in v2.4.1 as per LU-3421, why am I seeing similar messages in v2.4.3? Could it be something else? I can provide logs if required.

Please let me know if you think I should add my report to the existing Jira ticket.

Thank you,
Amit