On Oct 8, 2014, at 1:38 AM, Michael Kluge <michael.kluge(a)tu-dresden.de>
wrote:
>> 2) Disable the QOS allocator by setting qos_threshold_rr=100. This
>> should force the round-robin allocator to be used all the time and
>> spread out the requests. Then you can gradually tune down the
>> parameter to send more allocations to the new ost. (Note: You might
>> not want to try this if you have any osts that are very close to full
>> capacity.)
>
> Ahh. OK. I thought that qos_prio_free is responsible for this. The file
> system is at 60% of its capacity and otherwise very balanced. So I'll
> give this a try.
The qos_threshold_rr parameter controls when Lustre uses the QOS vs. the RR (round-robin)
OST allocation method. Once Lustre decides it needs to use QOS, it uses the
qos_prio_free parameter to control how much importance it places on free space when
picking the OSTs to allocate on.
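
As a rough illustration of that switch, here is a toy Python model. It is not Lustre code:
the function name and the sample numbers are made up, the imbalance test is a simplification
of what the allocator does, and the default of 17 is my assumption about the documented
default for qos_threshold_rr. It does not model qos_prio_free at all.

```python
def choose_allocator(free_space, qos_threshold_rr=17):
    """Toy model: pick 'rr' or 'qos' from per-OST free space (any unit).

    Assumption: the file system counts as balanced when the relative gap
    between the most-free and least-free OST is below qos_threshold_rr percent.
    """
    most_free = max(free_space)
    least_free = min(free_space)
    # Integer form of: (most_free - least_free) / most_free * 100 < threshold
    if (most_free - least_free) * 100 < qos_threshold_rr * most_free:
        return "rr"   # balanced enough: plain round-robin
    return "qos"      # imbalanced: weighted allocation takes over


# Hypothetical free space per OST; the last one is a newly added, empty OST.
osts = [400, 380, 390, 1000]
print(choose_allocator(osts))                        # default threshold
print(choose_allocator(osts, qos_threshold_rr=100))  # threshold raised to 100
```

In this model, a new empty OST pushes the gap past the default threshold, so QOS is selected;
with the threshold at 100 the test passes whenever every OST has some free space, so
round-robin is always selected.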
The default value for qos_threshold_rr seems to keep a pretty tight window on the min-max
OST usage, which leads to a more uniform distribution of free space across OSTs. However, as
you have seen, this can lead to more contention on some OSTs. I have seen this affect the
overall aggregate I/O rate on one of our file systems, and I ended up increasing the value
a bit to improve throughput. The only side effect is that the spread of OST usage is
somewhat larger.
> Thanks a lot, that was what I was looking for. Now things look much
> more relaxed.
Glad to hear that helped.
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu