On Oct 7, 2014, at 8:00 AM, Michael Kluge <michael.kluge(a)tu-dresden.de>
wrote:
we recently reactivated an OST, and now the 450 nodes send this OST
many more I/O requests than they send to the other OSTs. We have 4 servers and 48 OSTs.
The other OSS nodes have a load of about 100. The server with this OST has 1500 and logs many
"filter_commitrw_write()) scratch-OST002f: slow i_mutex 30s" messages.
Are all the "slow i_mutex" messages for the new OST? Also, have you increased
the number of OSS threads started on the server? I thought the OSS would
dynamically create more threads up to a limit of 512, so I would expect the load to max
out around that same point unless the thread count has been raised on purpose.
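To check whether the service has already hit that thread ceiling, something like the following should work on the OSS (a sketch, assuming a Lustre version where the I/O service tunables live under ost.OSS.ost_io; parameter names may differ on older releases):

```shell
# Inspect the OSS I/O thread limits and the current thread count.
# threads_started at threads_max means the service is already at its ceiling.
lctl get_param ost.OSS.ost_io.threads_min
lctl get_param ost.OSS.ost_io.threads_max
lctl get_param ost.OSS.ost_io.threads_started

# If you decide you want more service threads, the limit can be raised
# at runtime (768 here is just an illustrative value):
lctl set_param ost.OSS.ost_io.threads_max=768
```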
Thus, I set qos_prio_free to 0 in
/proc/fs/lustre/lov/scratch-MDT0000-mdtlov
Does anyone have a sense of how long it will take for the load on this server to
go down? Hours? Days?
Here are a couple of things you could try to see if they help your situation. I can't
tell whether these changes will alleviate the problem, but it shouldn't hurt
anything to try them.
1) Use "lctl" to disable the new OST on the MDS. This won't stop
clients from using the new OST if they are accessing a file that already resides on that
OST, but it should stop the MDS from allocating objects on that OST for newly created files.
When things calm down, you can re-enable the OST.
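For step 1, a minimal sketch run on the MDS might look like this (the grep pattern assumes the OST index 0x2f from the log message above; <devno> is a placeholder for the device number that "lctl dl" prints):

```shell
# Find the MDS-side device entry for the problem OST:
lctl dl | grep OST002f

# Deactivate it so the MDS stops placing new objects there.
# Existing files on the OST remain readable/writable by clients.
lctl --device <devno> deactivate

# Later, once the load has settled, re-enable allocations:
lctl --device <devno> activate
```

Note that deactivation done this way is not persistent; it reverts at the next mount unless made permanent.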
2) Disable the QOS allocator by setting qos_threshold_rr=100. This should force the
round-robin allocator to be used all the time and spread out the requests. Then you can
gradually lower the parameter to send more allocations to the new OST. (Note: you
might not want to try this if any of your OSTs are very close to full capacity.)
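Step 2 can be done through the same /proc directory mentioned earlier in this thread (a sketch; adjust the path for your MDT, and note the default threshold value is an assumption from my memory of the Lustre manual):

```shell
# 100 means "always use round-robin allocation":
echo 100 > /proc/fs/lustre/lov/scratch-MDT0000-mdtlov/qos_threshold_rr

# Equivalent via lctl, if you prefer:
lctl set_param lov.scratch-MDT0000-mdtlov.qos_threshold_rr=100

# Later, step it back toward the default (17, I believe) so the QOS
# allocator can again favor the emptier OST:
echo 17 > /proc/fs/lustre/lov/scratch-MDT0000-mdtlov/qos_threshold_rr
```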
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu