It seems something is wrong with sdc2. What does SMART tell you? Are there
any notices about it in dmesg?
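As a side note on reading the log quoted below: the trailing -30 in the LustreError lines is a negated errno, and on Linux errno 30 is EROFS (read-only file system). Those messages are therefore most likely a consequence of the read-only remount, and the line to chase is the earlier "IO failure" in ldiskfs_mb_free_blocks. A minimal sketch for decoding such codes follows; the helper name decode_rc and the sample list are my own illustration, not part of any Lustre tooling, and the sample lines are copied from the log with the wrapped lines joined.

```python
import errno
import os
import re

# Sample lines copied from the quoted log (wrapped lines joined).
LOG_LINES = [
    "May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_mb_free_blocks: IO failure",
    "May 7 10:28:48 kernel: LustreError: 2436:0:(filter_log.c:174:filter_recov_log_unlink_cb()) error destroying object 2760722: -30",
    "May 7 10:28:58 kernel: LustreError: 8791:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30",
]

# Matches a negative integer at the very end of a line, e.g. "-30".
RC_PATTERN = re.compile(r"-(\d+)\s*$")


def decode_rc(line):
    """Return (errno name, description) for a trailing negative return code.

    Hypothetical helper: returns None when the line does not end in a
    negative integer.
    """
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    for line in LOG_LINES:
        print(decode_rc(line), "<-", line)
```

Lines that do not end in a return code, such as the "IO failure" one, come back as None, which makes them easy to separate from the follow-on errors.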
On Thu, May 7, 2015 at 8:53 AM, Kurt Strosahl <strosahl(a)jlab.org> wrote:
Good Morning,
We recently had an OST encounter an issue with what appears to be its
journal. The OST sits on a partition atop a RAID6 array, which was
rebuilding due to a failed disk. The OST has a journal on an external
mirrored disk. We unmounted the OST and ran the following:

e2fsck -y -C 0 /dev/sdc2 -j /dev/sdd5

After that we remounted the OST, and as soon as the first client
tried to write to it after recovery it went back to read-only. We unmounted
it again, ran e2fsck again, and again it flipped to read-only the moment
writes tried to go to it. (I had set it to read-only on the MDS and let it
sit for a few minutes before setting it back to read/write, to make sure
that the problem happened only on a write.)
May 7 10:28:48 kernel:
May 7 10:28:48 kernel: Aborting journal on device sdd5.
May 7 10:28:48 kernel: LDISKFS-fs (sdc2): Remounting filesystem read-only
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_mb_free_blocks: IO failure
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_remove_space: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_orphan_del: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_truncate: Journal has aborted
May 7 10:28:48 kernel: LustreError: 2436:0:(filter_log.c:174:filter_recov_log_unlink_cb()) error destroying object 2760722: -30
May 7 10:28:48 kernel: LustreError: 2434:0:(llog_cat.c:441:llog_cat_process_thread()) llog_cat_process() failed -30
May 7 10:28:58 kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) can't get handle for 47 credits: rc = -30
May 7 10:28:58 kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) Skipped 54 previous similar messages
May 7 10:28:58 kernel: LustreError: 8791:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
May 7 10:28:59 kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) error starting handle for op 4 (108 credits): rc -30
May 7 10:28:59 kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) Skipped 18 previous similar messages
May 7 10:29:03 kernel: LustreError: 8793:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
May 7 10:29:07 kernel: LustreError: 8711:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
w/r,
Kurt J. Strosahl
System Administrator
Scientific Computing Group, Thomas Jefferson National Accelerator Facility