On 2014/01/08, 8:00 PM, "Bob Ball" <ball(a)umich.edu> wrote:
We are running Lustre 2.1.6 on SL6.4 systems. Most OSTs date back to
Lustre 1.8.4 under SL5.x.
I now find it necessary to drain and reformat the underlying RAID volume
of one of these OSTs. I have done this several times in the past, under
Lustre 1.8.4, and was highly satisfied with the outcome. However, I
find this somewhat more problematic under 2.1.6. In both attempts so
far, corrupted files have resulted.
I have used lfs_migrate to first drain, then refill the OST after it is
reformatted. It is much faster now than under 1.8.4, which is nice. Do
I have to do this on an idle file system, though, to avoid the
corruption? In both previous cases the file system was still live, so
the corrupted files may have been in use at the time.
Could this have been the cause of the problems?
What am I missing in doing this now under 2.1.6?
The lfs_migrate man page and script for 2.1 (I thought) made it pretty
clear that this tool is not safe for files that may be in use/modified:
# lfs_migrate /mnt/lustre/foo
lfs_migrate is currently NOT SAFE for moving in-use files.
Use it only when you are sure migrated files are unused.
If emptying OST(s) that are not disabled on the MDS, new
files may use them. To prevent MDS allocating any files on
OSTNNNN run 'lctl --device %{fsname}-OSTNNNN-osc deactivate'
on the MDS.
Continue? (y/n)
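The warning above implies two steps for 2.1: deactivate the OST on the MDS so no new files land on it, then migrate only files known to be unused. A minimal sketch that just prints the commands (it runs nothing against Lustre) might look like the following. The values of FSNAME, OST_INDEX and MOUNT are hypothetical placeholders, and the "lfs find --obd ... | lfs_migrate -y" pipeline is the usual drain idiom rather than something stated in the prompt text above, so check it against the man pages for your release.

```shell
#!/bin/sh
# Dry-run sketch: print the drain commands for one OST instead of running them.
# FSNAME, OST_INDEX and MOUNT are hypothetical placeholders; substitute your own.
FSNAME="${FSNAME:-lustre}"
OST_INDEX="${OST_INDEX:-0004}"
MOUNT="${MOUNT:-/mnt/lustre}"

drain_plan() {
    ost="${FSNAME}-OST${OST_INDEX}"
    # Step 1, on the MDS: stop the MDS allocating new files on this OST
    # (device name form taken from the lfs_migrate warning text).
    echo "lctl --device ${ost}-osc deactivate"
    # Step 2, on a client: move files that have objects on this OST elsewhere.
    # Assumption: files are not in use; -y skips the confirmation prompt.
    echo "lfs find --obd ${ost}_UUID ${MOUNT} | lfs_migrate -y"
}

drain_plan
```

Reviewing the printed commands before running them by hand leaves room to confirm that the files involved are really idle, which is the condition the 2.1 tool depends on.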
This situation is improved in Lustre 2.4 and 2.5 - open files are migrated
"in place" and transparently to applications, though it isn't yet able to
migrate files that are actively being modified (it should leave the file in
place if it detects the file is modified during migration).
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division