Dear Adrian,
Thanks for the insight. I will check whether firmware updates are available for the
storage arrays and controller.
Please let me know if you find anything else in the syslog that is worth paying attention
to.
Best Regards,
Amit
-----Original Message-----
From: Adrian Ulrich [mailto:adrian@blinkenlights.ch]
Sent: Friday, August 16, 2013 12:23 AM
To: Kumar, Amit
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Redo for recoverable error ?
Hello Amit,
> Although I see when it remounted file system as read-only, looking
> into the logs before remount does not reveal anything about what went
> wrong.
No: The syslog clearly shows that ldiskfs found some filesystem corruption:
Aug 6 18:57:00 diskarray2 kernel: LDISKFS-fs error (device sdb):
ldiskfs_valid_block_bitmap: Invalid block bitmap - block_group = 30176, block = 988807170
Aug 6 18:57:00 diskarray2 kernel: LDISKFS-fs error (device sdb):
ldiskfs_valid_block_bitmap: Invalid block bitmap - block_group = 30176, block = 988807170
Aug 6 18:57:00 diskarray2 kernel: Aborting journal on device sdb-8.
Aug 6 18:57:00 diskarray2 kernel: LDISKFS-fs (sdb): Remounting filesystem read-only
This corruption might be old, but you should still investigate why it happened.
A normal kernel crash shouldn't cause such an error. This is most likely a bug in your
storage controller.
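
If you want to look for more entries of this kind in the syslog, a small script can pick them out. The following is only a rough sketch in Python: the three patterns are copied from the messages quoted above, the names PATTERNS and scan_syslog are made up for illustration, and it assumes each syslog entry sits on a single line (the excerpt above is wrapped by the mail client).

```python
import re

# Patterns taken from the kernel messages quoted in this thread.
# Assumption: each syslog entry is on one line in the real log file.
PATTERNS = {
    "fs_error": re.compile(r"LDISKFS-fs error \(device (?P<dev>\w+)\)"),
    "journal_abort": re.compile(r"Aborting journal on device (?P<dev>[\w-]+)"),
    "remount_ro": re.compile(
        r"LDISKFS-fs \((?P<dev>\w+)\): Remounting filesystem read-only"
    ),
}


def scan_syslog(lines):
    """Return a (kind, device, line) tuple for every line matching a pattern."""
    hits = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((kind, match.group("dev"), line.rstrip("\n")))
                break  # one classification per line is enough
    return hits
```

It can be fed an open file, for example `scan_syslog(open("/var/log/messages", errors="replace"))`, where the path is an assumption and depends on the distribution. Anything it reports before the first read-only remount is worth a closer look.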
> I got the OST back to healthy state now, by running e2fsck and fixing
> 3 minor errors.
Glad it worked for you, but I would still be very sceptical about the sanity of your
storage device :-)
Regards,
Adrian
--
RFC 1925:
(11) Every old idea will be proposed again with a different name and
a different presentation, regardless of whether it works.