Forgot the attachment; here is the messages file. Look for line number 3442:
http://tinyurl.com/kagz38t
Thank you,
Amit
-----Original Message-----
From: hpdd-discuss-bounces(a)lists.01.org [mailto:hpdd-discuss-bounces@lists.01.org] On
Behalf Of Kumar, Amit
Sent: Thursday, August 15, 2013 8:59 AM
To: Adrian Ulrich
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Redo for recoverable error ?
Hi Adrian,
Thank you for your quick response. No, I am not running Snowbird Lustre with sw-raid6;
all of our arrays are configured with hardware RAID6.
I found the disk array with the problem. Although I can see when it remounted the file
system as read-only, looking into the logs before the remount does not reveal anything
about what went wrong. I see common processing errors, but nothing I can identify as
critical enough to result in a read-only mount.
The remount happened a week ago; attached is a copy of the log file. Can you please look
into it and see if you notice anything unusual?
***Another quick question.
My assumption is that I can unmount the OST with the problem while the others remain in
production, and then run fsck on the OST volume.
Are there any specific options I need to pass when I run fsck?
The version of e2fsprogs we have is: e2fsprogs-1.41.10.sun2-0redhat
Is there anything else I need to watch out for before I bring this OST back online?
Please advise.
Amit
-----Original Message-----
From: Adrian Ulrich [mailto:adrian@blinkenlights.ch]
Sent: Thursday, August 15, 2013 1:02 AM
To: Kumar, Amit
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Redo for recoverable error ?
LustreError: 10144:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start())
error starting handle for op 8 (71 credits): rc -30
./asm-generic/errno-base.h:#define EROFS 30
mmap IO isn't your issue: it seems that (at least) one OST on your OSS got
automatically remounted read-only due to a filesystem error.
Older log entries on the OST should tell you what went wrong (grep remounting
/var/log/messages*).
The only way to recover from this is to:
- unmount the affected OST(s)
- run a full fsck on the volume(s) (using the fsck from the latest e2fsprogs)
- mount the volume again, IF you are sure that your storage array didn't go bananas
  and will not cause any new corruption (btw: are you running a 'snowbird-lustre'
  with linux sw-raid6?)
Oh: and tell your users that they should check the return value of write() calls ;-)
Regards,
Adrian
--
RFC 1925:
(11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss