On 2014/09/04, 10:59 AM, "Robin Humble" <rjh+lustre(a)cita.utoronto.ca>
wrote:
> has anyone used 'lfs migrate [--block]' to live migrate lots
> of data?
> it worked ok?
> any hints for best usage? (how many migrates running per OST etc.)
> the context is that we've doubled our number of OSTs and now need to
> rebalance our ~1 PB of data by moving roughly 1/2 of it onto the new
> empty OSTs.
> I have yet to chat to anyone who's used 'lfs migrate' (either directly
> or via lfs_migrate) in production, so I'm being paranoid and looking
> for comforting war stories where it's been used to shift around a lot
> of data without problems...
> documentation is a bit scarce. maybe just
> https://jira.hpdd.intel.com/browse/LU-2445

Note that there were some bugs in "lfs migrate" that leaked inodes on
the MDS (see LU-3969), which I doubt are fixed in the version you have.

> and 'lfs help migrate'. but with --block it sounds pretty
> amazing.
> it should be able to do the rebalance live and without a downtime
> (with some delays to file access).
> we're using the latest(?) Intel Enterprise Lustre version 2.0.1.1
> (which appears to be 2.5.2 based). we've heard via Intel support that
> 'lfs migrate' runs a verify pass, which sounds nice.

Note that the "lfs_migrate" script does a before and after checksum pass,
but "lfs migrate" (which it calls internally on systems that support it)
does not.
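
To illustrate what a before and after checksum pass looks like, here is a
rough Python sketch. It is not the lfs_migrate script's actual code: the
function names are made up for illustration, SHA-256 is my assumption (any
digest would do), and `migrate` is a caller-supplied callable, for example
a wrapper around "lfs migrate".

```python
import hashlib
import os
import shutil


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_verify(path, migrate):
    """Checksum `path`, call `migrate(path)`, then checksum it again.

    `migrate` is a hypothetical callable that moves the file's data in
    place. Raises RuntimeError if the content differs afterwards.
    """
    before = file_sha256(path)
    migrate(path)
    after = file_sha256(path)
    if before != after:
        raise RuntimeError(f"checksum mismatch after migrating {path}")
    return after


def copy_and_replace(path):
    """Stand-in migration for trying this out: copy, then rename over."""
    tmp = path + ".migrate_tmp"
    shutil.copyfile(path, tmp)
    os.replace(tmp, path)
```

A file that an application modifies between the two checksum passes will
also show up as a mismatch, so a check like this is only meaningful for
files that are not being written during the migration.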
I agree that the single-threaded copy in lfs_migrate may be sub-optimal
performance wise, so if people are interested to improve this it would
be great if they did it in a manner that could be included back into
lfs_migrate.
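
As a starting point for discussion, running several migrations at once
could be sketched roughly as below. This is Python rather than the shell
that lfs_migrate is written in, the function names are hypothetical, and
the default command is simply the 'lfs migrate --block' form quoted in this
thread; it may well need layout options on a real system. The worker count
of 4 is an arbitrary placeholder, not a tuning recommendation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_migrate(path, command=("lfs", "migrate", "--block")):
    """Run the migrate command on one file; return (path, exit code).

    The default command is an assumption taken from the thread.
    """
    result = subprocess.run([*command, path])
    return path, result.returncode


def migrate_parallel(paths, worker=run_migrate, max_workers=4):
    """Apply `worker` to every path using at most `max_workers` threads.

    Returns the paths whose worker reported a non-zero exit code,
    in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, paths))
    return [path for path, code in results if code != 0]
```

Threads are enough here because each worker just waits on an external
process; the list of failed paths can be fed back in for a retry.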
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division