Thanks, Andreas. I gave everyone fair warning, and provided a list in
advance of files to be migrated. Given the state of the underlying RAID
though, it will be dicey one way or the other. :-(
bob
On 1/9/2014 8:08 PM, Dilger, Andreas wrote:
On 2014/01/08, 8:00 PM, "Bob Ball" <ball(a)umich.edu> wrote:
> We are running Lustre 2.1.6 on SL6.4 systems. Most OSTs date back to
> Lustre 1.8.4 under SL5.x.
>
> I now find it necessary to drain and reformat the underlying RAID volume
> of one of these OSTs. I have done this several times in the past, under
> Lustre 1.8.4, and was highly satisfied with the outcome. However, I
> find this somewhat more problematic under 2.1.6. In the two examples
> so far, corrupted files have resulted.
>
> I have used lfs_migrate to first drain, then refill the OST after it is
> reformatted. It is much faster now than under 1.8.4, which is nice. Do
> I have to do this on an idle file system, though, to avoid the
> corruption? The file system was still live in the two previous examples,
> so it is possible that the corrupted files were being accessed at the
> time. Could this have been the cause of the problems?
>
> What am I missing in doing this now under 2.1.6?
The lfs_migrate man page and script for 2.1 (I thought) made it pretty clear
that this tool is not safe for files that may be in use/modified:
# lfs_migrate /mnt/lustre/foo
lfs_migrate is currently NOT SAFE for moving in-use files.
Use it only when you are sure migrated files are unused.
If emptying OST(s) that are not disabled on the MDS, new
files may use them. To prevent MDS allocating any files on
OSTNNNN run 'lctl --device %{fsname}-OSTNNNN-osc deactivate'
on the MDS.
Continue? (y/n)
This situation is improved in Lustre 2.4 and 2.5 - open files are migrated
"in place" and transparently to applications, though it isn't yet able to
migrate files that are actively being modified (it should leave the file in
place if it detects the file is modified during migration).
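As a rough illustration of the order of operations the warning above
implies, here is a dry-run sketch that only prints the commands and runs
nothing. The file system name "lustre", OST index 4, and mount point
/mnt/lustre are placeholders, not values from this thread. The
"-osc" device name follows the form given in the lfs_migrate warning and
may differ on a given system (check "lctl dl" on the MDS), and the
"lfs find --obd ... | lfs_migrate -y" pipeline is assumed from the usual
lfs_migrate usage rather than taken from this thread.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for draining one OST; nothing is executed.
# All names below are placeholders chosen for illustration.

ost_name() {
    # OST names carry a four-digit hexadecimal index, e.g. OST000a for index 10.
    printf '%s-OST%04x' "$1" "$2"
}

drain_plan() {
    fsname=$1; index=$2; mnt=$3
    ost=$(ost_name "$fsname" "$index")
    echo "# on the MDS: stop new files from being allocated on $ost"
    echo "lctl --device ${ost}-osc deactivate"
    echo "# on a client, only once the files are known to be unused:"
    echo "lfs find --obd ${ost}_UUID $mnt | lfs_migrate -y"
}

drain_plan lustre 4 /mnt/lustre
```

Printing the plan first makes it easy to check the device name against
what the MDS actually reports before deactivating anything.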
Cheers, Andreas