I ran into a checksum error on a directory and the move process exited. Any workarounds
for that?
Thank you,
Amit
-----Original Message-----
From: hpdd-discuss-bounces(a)lists.01.org [mailto:hpdd-discuss-bounces@lists.01.org] On
Behalf Of Kumar, Amit
Sent: Wednesday, October 16, 2013 1:21 AM
To: Dilger, Andreas
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] lfs find stuck
Andreas,
Thank you for your reply. I looked at the strace output, and I see that the lfs find
command tries to open a file, does a number of things, and then moves on to the next file.
However, the script never reaches the "Done" section of the code for a single
file.
I just ran fsck on a few of the OSTs that I want to move data off of and have mounted them
again. I will run the migration scripts again and find out whether I can get any further.
Regards,
Amit
-----Original Message-----
From: Dilger, Andreas [mailto:andreas.dilger@intel.com]
Sent: Tuesday, October 15, 2013 1:22 PM
To: Kumar, Amit
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] lfs find stuck
On 2013/10/11 3:50 PM, "Kumar, Amit" <ahkumar(a)mail.smu.edu> wrote:
Here is where the script is stuck: Script is the one in this document:
http://wiki.lustre.org/manual/LustreManual18_HTML/LustreOperatingTips.html
# ./MOVE.sh -O hpc-OST0031_UUID /lustre
+ CKSUM=md5sum
+ getopts O: opt -O hpc-OST0031_UUID /lustre
+ case $opt in
+ OST_PARAM=' -O hpc-OST0031_UUID'
+ getopts O: opt -O hpc-OST0031_UUID /lustre
+ shift 2
+ MVDIR=/lustre
+ '[' 1 -ne 1 -o '!' -d /lustre ']'
+ read OLDNAME
+ lfs find -type f -O hpc-OST0031_UUID /lustre
The above process does not proceed any further?
It is traversing the filesystem looking for files on this OST. If you stop and start the
script, it has to restart the scan of the filesystem, so it will not find any files at
the beginning.
You could also verify it is actually doing something by stracing this
process:
strace -p $(pidof lfs)
and CTRL-C to stop the strace (that doesn't affect lfs).
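If more than one migration is running, pidof lfs may print several PIDs, and it can be
easier to sample each process separately for a few seconds. A minimal sketch, assuming
GNU timeout and strace are installed; sample_lfs and the DRY_RUN switch are hypothetical
names used only for illustration:

```shell
#!/bin/sh
# Sketch: attach strace to each given PID for a fixed number of seconds
# and print a syscall summary (-c) when it detaches.
# sample_lfs and DRY_RUN are hypothetical names, not part of Lustre or strace.
# Set DRY_RUN=1 to print the commands instead of running them.
sample_lfs() {
    secs=$1
    shift
    for pid in "$@"; do
        # timeout sends SIGINT after $secs seconds, which makes strace
        # detach from the process, like pressing CTRL-C.
        cmd="timeout -s INT $secs strace -c -p $pid"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

# Example (needs permission to trace the processes):
#   sample_lfs 5 $(pidof lfs)
```

A non-empty syscall summary for a PID would suggest that this lfs is still walking the
filesystem rather than hanging.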
I have multiple OSTs being migrated at the same time, could this be an issue?
You can specify multiple source OSTs for migrating at one time, to avoid the need to scan
the filesystem multiple times. That reduces load on the MDS and speeds up the scanning.
Unfortunately, this would also require restarting your scan, so it is unlikely to be faster
than leaving the current scans running.
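As a rough illustration of a single scan over several OSTs: the trace above suggests the
script collects its -O options in OST_PARAM and passes them to lfs find. The sketch below
builds such a list by hand. build_ost_params is a hypothetical helper, and
hpc-OST0032_UUID is a made-up second OST name. It only prints the resulting command, so
it can be checked before anything is run:

```shell
#!/bin/sh
# Sketch: turn a list of OST UUIDs into repeated "-O <uuid>" options,
# so that one lfs find pass can look for files on any of them.
# build_ost_params is a hypothetical helper, not part of Lustre.
build_ost_params() {
    params=""
    for ost in "$@"; do
        params="$params -O $ost"
    done
    # Drop the leading space left by the first append.
    echo "${params# }"
}

# hpc-OST0032_UUID is an assumed example name for a second OST.
OST_PARAM=$(build_ost_params hpc-OST0031_UUID hpc-OST0032_UUID)

# Print the command instead of running it, so it can be reviewed first.
echo "lfs find -type f $OST_PARAM /lustre"
```

If the migration script accumulates repeated -O options the way the trace suggests,
passing several -O arguments to it directly should have the same effect.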
Cheers, Andreas
From: hpdd-discuss-bounces(a)lists.01.org
[mailto:hpdd-discuss-bounces@lists.01.org]
On Behalf Of Kumar, Amit
Sent: Friday, October 11, 2013 4:25 PM
To: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] lfs find stuck
Another strange thing: while I am running the migrate script, the
usage on the OST is increasing instead of decreasing. I have not seen this
behavior before; I hope somebody can shed some light on this.
Regards,
Amit
From: hpdd-discuss-bounces@lists.01.org
[mailto:hpdd-discuss-bounces@lists.01.org]
On Behalf Of Kumar, Amit
Sent: Friday, October 11, 2013 3:44 PM
To: hpdd-discuss(a)lists.01.org
Subject: [HPDD-discuss] lfs find stuck
Dear All,
I have deactivated an OST so that I can migrate data off of it.
For some strange reason, when I run the migration script against
this one OST, I do not see it doing anything for almost an eternity.
When I run the same process with strace, I see it is trying to open files
and move to the next file. But I do not see it moving any files at all.
This OST is 60% full, so I know for sure that files exist.
Can anybody help me understand this?
Q) Do you think it can take hours to find the first file to move?
Any tips to debug this issue?
Thank you,
Amit
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss