On Aug 13, 2015, at 1:26 PM, Kumar, Amit <ahkumar(a)mail.smu.edu> wrote:
I did not see any ENOSPC errors on the OSTs like those reported in one of the requests above (which has been solved). However, this past week I had to bring down the entire file system to recover from long lockups. I use the ltop utility to monitor IOPS, and I noticed that the LOCKS held by each OST were in the range of 6,000-10,000.
How many clients do you have using the file system? If there are quite a few, you might
need to look into limiting the number of locks each client will cache. By default, Lustre
picks this limit based on the amount of memory on the system, and the result can be quite
large for each client. With many clients, the corresponding locks on the server side can
consume a significant amount of memory.
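As a rough back-of-the-envelope sketch of why the client count matters: the server has to track every lock its clients cache, so the total grows as clients × locks per client. The function name is hypothetical, and the numbers in the example (client count, locks per client, and especially the per-lock size) are illustrative assumptions, not measurements from this thread; the real per-lock size can be read from the objsize column of the ldlm lock cache in /proc/slabinfo.

```python
def estimate_lock_memory(num_clients: int, locks_per_client: int, bytes_per_lock: int) -> int:
    """Rough upper bound (in bytes) on server memory used by cached client locks.

    Assumes every client fills its lock cache and each lock costs
    bytes_per_lock on the server. Other ldlm structures (resources, etc.)
    are ignored, so treat the result as an order-of-magnitude guide only.
    """
    total_locks = num_clients * locks_per_client
    return total_locks * bytes_per_lock


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    clients = 500
    locks_each = 10_000
    per_lock = 512  # assumed bytes per lock; check objsize in /proc/slabinfo
    total = estimate_lock_memory(clients, locks_each, per_lock)
    print(f"{clients * locks_each:,} locks -> {total / 2**30:.2f} GiB")
    # prints: 5,000,000 locks -> 2.38 GiB
```

Because the total is linear in locks_per_client, capping the per-client cache (Lustre's lru_size tunable under ldlm.namespaces) reduces the server-side total proportionally.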
I don’t know if this is actually causing the issue you are seeing, but I have had a couple
of cases of crashing (or slow-performing) MDS servers caused by large numbers of locks, so
this is usually one of the first things I check. If you want to get an idea of how much
memory is being used by locks, just look at /proc/slabinfo on the server and find any
lines with “ldlm” in them.
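A small sketch of that check, assuming the standard slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...). It reports num_objs × objsize for every cache whose name contains "ldlm", which slightly underestimates real usage because per-slab overhead is ignored. The script and its function names are illustrative and not part of Lustre; exact cache names vary between Lustre versions, and reading /proc/slabinfo normally requires root.

```python
import sys
from typing import Dict, Iterable


def ldlm_slab_usage(lines: Iterable[str]) -> Dict[str, int]:
    """Return approximate bytes used per slab cache whose name contains 'ldlm'.

    Expects the slabinfo 2.x layout:
        name <active_objs> <num_objs> <objsize> ...
    and estimates usage as num_objs * objsize (slab overhead ignored).
    """
    usage: Dict[str, int] = {}
    for line in lines:
        # Skip the version banner and the '# name ...' column header.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or "ldlm" not in fields[0]:
            continue
        num_objs = int(fields[2])
        objsize = int(fields[3])
        usage[fields[0]] = num_objs * objsize
    return usage


def main() -> None:
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/slabinfo"
    try:
        with open(path) as f:
            usage = ldlm_slab_usage(f)
    except OSError as exc:
        # /proc/slabinfo is usually readable only by root.
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    # Largest caches first, reported in MiB.
    for name, nbytes in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<24} {nbytes / 2**20:10.1f} MiB")
    print(f"{'total':<24} {sum(usage.values()) / 2**20:10.1f} MiB")


if __name__ == "__main__":
    main()
```

Run it on the MDS or OSS as root; passing a saved copy of slabinfo as the first argument also works, which makes it easy to compare snapshots taken before and after changing the client lock limits.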
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu