On Mon, May 06, 2013 at 02:51:16PM -0500, Mi Zhou wrote:
Hi,
We're using Whamcloud Lustre version 2.1.2 on CentOS 6.2.
The load on the MDS is ~150, and it is throwing this error:
May 6 10:15:31 lustre-mds01 kernel: LustreError:
6535:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile
0x200da5:0xc9cbdad6: rc -116
May 6 10:15:31 lustre-mds01 kernel: LustreError:
6535:0:(llog_cat.c:174:llog_cat_id2handle()) error opening log id
0x200da5:c9cbdad6: rc -116
May 6 10:15:31 lustre-mds01 kernel: LustreError:
6535:0:(llog_cat.c:335:llog_cat_cancel_records()) Cannot find log 0x200da5
May 6 10:15:31 lustre-mds01 kernel: LustreError:
6535:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile
0x200da5:0xc9cbdad6: rc -116
May 6 10:15:31 lustre-mds01 kernel: LustreError:
6535:0:(llog_cat.c:174:llog_cat_id2handle()) error opening log id
0x200da5:c9cbdad6: rc -116
May 6 10:15:31 lustre-mds01 kernel: LustreError:
6535:0:(llog_cat.c:335:llog_cat_cancel_records()) Cannot find log 0x200da5
May 6 10:15:37 lustre-mds01 kernel: LustreError:
6509:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile
0x200da5:0xc9cbdad6: rc -116
May 6 10:15:37 lustre-mds01 kernel: LustreError:
6509:0:(llog_lvfs.c:616:llog_lvfs_create()) Skipped 1 previous similar
message
May 6 10:15:37 lustre-mds01 kernel: LustreError:
6509:0:(llog_cat.c:174:llog_cat_id2handle()) error opening log id
0x200da5:c9cbdad6: rc -116
May 6 10:15:37 lustre-mds01 kernel: LustreError:
6509:0:(llog_cat.c:174:llog_cat_id2handle()) Skipped 1 previous similar
message
May 6 10:15:37 lustre-mds01 kernel: LustreError:
6509:0:(llog_cat.c:335:llog_cat_cancel_records()) Cannot find log 0x200da5
May 6 10:15:37 lustre-mds01 kernel: LustreError:
6509:0:(llog_cat.c:335:llog_cat_cancel_records()) Skipped 1 previous
similar message
Does this mean the log is corrupted?
This means a catalog record being cancelled refers to a file that is
no longer on disk. We had a similar issue:
https://jira.hpdd.intel.com/browse/LU-1749
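To check the excerpt above quickly, the rc value can be decoded and the affected log ids tallied. The following is only a minimal sketch: it assumes Linux errno numbering (2 = ENOENT, 116 = ESTALE) and that each wrapped syslog entry has been joined back onto one line. The regex and the names LINUX_ERRNO, PATTERN and summarize are illustrative and not part of Lustre.

```python
import re
from collections import Counter

# Assumption: Linux errno numbering; only the two codes discussed in this thread.
LINUX_ERRNO = {2: "ENOENT", 116: "ESTALE"}

# Matches e.g. "(llog_cat.c:174:llog_cat_id2handle()) error opening log id
# 0x200da5:c9cbdad6: rc -116" once the entry is on a single line.
PATTERN = re.compile(
    r"\((?P<file>\w+\.c):(?P<line>\d+):(?P<func>\w+)\(\)\).*?"
    r"(?P<logid>0x[0-9a-f]+):(?:0x)?(?P<gen>[0-9a-f]+): rc (?P<rc>-?\d+)"
)

def summarize(lines):
    """Count (function, log id, errno name) triples found in LustreError lines."""
    counts = Counter()
    for text in lines:
        m = PATTERN.search(text)
        if m is None:
            # Lines without an rc, such as "Cannot find log ..." or
            # "Skipped N previous similar message", are ignored.
            continue
        rc = abs(int(m.group("rc")))
        name = LINUX_ERRNO.get(rc, f"errno {rc}")
        counts[(m.group("func"), m.group("logid"), name)] += 1
    return counts

sample = [
    "May 6 10:15:31 lustre-mds01 kernel: LustreError: "
    "6535:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile "
    "0x200da5:0xc9cbdad6: rc -116",
    "May 6 10:15:31 lustre-mds01 kernel: LustreError: "
    "6535:0:(llog_cat.c:174:llog_cat_id2handle()) error opening log id "
    "0x200da5:c9cbdad6: rc -116",
    "May 6 10:15:31 lustre-mds01 kernel: LustreError: "
    "6535:0:(llog_cat.c:335:llog_cat_cancel_records()) Cannot find log 0x200da5",
]

for key, n in summarize(sample).items():
    print(key, n)
```

On the lines quoted in this thread, every match points at the same log id (0x200da5) with ESTALE, which fits a single missing log file rather than widespread damage.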
There are several possible causes for this (e.g. disk corruption +
fsck, bugs, etc.). In any case the MDT should continue to function in
the face of such missing files. It appears that
llog_cat_cancel_records() will not cancel the record when it gets back
ENOENT or ESTALE (the rc -116 in your log is ESTALE), so I'm guessing
you will continue to see these messages. We should open a new issue so
a patch can be created to handle the errors.
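To illustrate why the messages would keep repeating, and what handling the errors could look like, here is a toy model in Python. It is not the Lustre code and not the eventual patch: ToyCatalog, open_log, cancel_record and the tolerate_missing flag are all hypothetical names invented for this sketch. The only thing it borrows from the discussion above is the idea that ENOENT or ESTALE on open means the log file is already gone.

```python
import errno

class ToyCatalog:
    """Hypothetical stand-in for an llog catalog (not a Lustre structure)."""

    def __init__(self, records, logs_on_disk):
        self.records = set(records)            # log ids the catalog still references
        self.logs_on_disk = set(logs_on_disk)  # log ids whose files actually exist

    def open_log(self, log_id):
        # Model the failure seen in the thread: the file behind the id is gone.
        if log_id not in self.logs_on_disk:
            return -errno.ESTALE
        return 0

def cancel_record(cat, log_id, tolerate_missing):
    """Try to cancel the catalog record for log_id; return 0 or a negative errno."""
    rc = cat.open_log(log_id)
    if rc in (-errno.ENOENT, -errno.ESTALE):
        if not tolerate_missing:
            # Behaviour as described above: give up, the record stays in the
            # catalog, so the next cancel attempt fails the same way.
            return rc
        # Hypothetical fix: the log is already gone, so drop the stale record.
        cat.records.discard(log_id)
        return 0
    if rc == 0:
        cat.records.discard(log_id)
    return rc

cat = ToyCatalog(records={"0x200da5"}, logs_on_disk=set())
print(cancel_record(cat, "0x200da5", tolerate_missing=False), cat.records)
print(cancel_record(cat, "0x200da5", tolerate_missing=True), cat.records)
```

In the first call the record survives and the same error would be logged on every retry; in the second the stale record is dropped and the retries stop. Whether a real patch should drop the record silently or log once is a design question for the new issue.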
Ned
Thanks
Mi
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss