Hello Andreas,
The workaround (always use round-robin (RR) for file OST allocation):
# echo 100 > /proc/fs/lustre/lod/lundwork-MDT0000-mdtlov/qos_threshold_rr
seems to be working for us.
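One caveat (our assumption, we have not verified it against the manual):
a value echoed into /proc like this does not survive an MDS restart, so
it has to be re-applied after a remount. The same tunable can also be
set and read back through lctl, e.g.:
# lctl set_param lod.lundwork-MDT0000-mdtlov.qos_threshold_rr=100
# lctl get_param lod.lundwork-MDT0000-mdtlov.qos_threshold_rr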
thanks!
-k
PS: Special thanks to Aurelien Degremont from CEA for the workaround.
On Fri, 7 Nov 2014, Dilger, Andreas wrote:
See
https://jira.hpdd.intel.com/browse/LU-5778 for discussion and a
potential fix. It isn't tested yet, so feedback is welcome.
Cheers, Andreas
On Nov 7, 2014, at 11:47, "Kaizaad Bilimorya"
<kaizaad@sharcnet.ca> wrote:
OSSs & MDS
==========
Lustre 2.5.3
CentOS 6.5 kernel 2.6.32-431.23.3.el6_lustre.x86_64
Clients
=======
Lustre 1.8.x and 2.5.3
We had to disable a crashed OST, so on our combined MDS/MGS we ran:
lctl conf_param lundwork-OST000e.osc.active=0
and that seems to have worked fine.
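As a sanity check (a sketch only; the client-side device name includes a
per-mount identifier, hence the wildcard), the deactivation can be
confirmed from a client with:
# lctl get_param osc.lundwork-OST000e-*.active
A value of 0 means the OST is marked inactive on that node.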
We now have an issue where the file OST allocation algorithm no longer
seems to work. It was working fine before the OST crash (we use the
default filesystem striping parameters).

Now we notice that when we move (rsync) large amounts of data to this
Lustre filesystem, new files are only allocated on the OSTs that appear
before the failed one in the "lfs df -h" listing. So we deactivated those
OSTs once they started to fill up. Now the next OST in the list (the one
after the failed OST) is the only one being written to, until we
deactivate it as well.
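In case it helps the investigation: the allocation pattern is visible
with lfs getstripe on freshly created files (the path below is just an
example):
# lfs getstripe /mnt/lundwork/newly-rsynced-file
The obdidx column shows the OST index each object was allocated on; for
us it is always the same index until we deactivate that OST.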
thanks
-k
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss