Good Morning,
We recently had an OST encounter what appears to be a problem with its journal. The OST is
a partition on a RAID6 array that was rebuilding after a disk failure, and its journal lives
on an external mirrored disk. We unmounted the OST and ran the following:

e2fsck -y -C 0 /dev/sdc2 -j /dev/sdd5

After that we remounted the OST. As soon as the first client tried to write to it after
recovery, it went back to read-only. We unmounted it and ran e2fsck again, and again it
flipped to read-only the moment writes were sent to it. (I had set it to read-only in the
MDS and let it sit for a few minutes before setting it back to read/write, to confirm that
the problem only happened on a write.)
May 7 10:28:48 kernel:
May 7 10:28:48 kernel: Aborting journal on device sdd5.
May 7 10:28:48 kernel: LDISKFS-fs (sdc2): Remounting filesystem read-only
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_mb_free_blocks: IO failure
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_remove_space: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_orphan_del: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May 7 10:28:48 kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_truncate: Journal has aborted
May 7 10:28:48 kernel: LustreError: 2436:0:(filter_log.c:174:filter_recov_log_unlink_cb()) error destroying object 2760722: -30
May 7 10:28:48 kernel: LustreError: 2434:0:(llog_cat.c:441:llog_cat_process_thread()) llog_cat_process() failed -30
May 7 10:28:58 kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) can't get handle for 47 credits: rc = -30
May 7 10:28:58 kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) Skipped 54 previous similar messages
May 7 10:28:58 kernel: LustreError: 8791:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
May 7 10:28:59 kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) error starting handle for op 4 (108 credits): rc -30
May 7 10:28:59 kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) Skipped 18 previous similar messages
May 7 10:29:03 kernel: LustreError: 8793:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
May 7 10:29:07 kernel: LustreError: 8711:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
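Every LustreError line in the log ends with the same return code, -30. Lustre reports failures as negated Linux errno values. On Linux, errno 30 is EROFS ("Read-only file system"). That suggests these errors follow from the journal abort and read-only remount at 10:28:48 rather than being separate faults. Below is a minimal Python sketch for decoding such codes. It assumes the code is the last token on the line, and `decode_rc` is a hypothetical helper, not part of any Lustre tool.

```python
import errno
import os
import re

# A few LustreError lines from the log above, with wrapped lines rejoined.
LOG_LINES = [
    "LustreError: 2436:0:(filter_log.c:174:filter_recov_log_unlink_cb()) error destroying object 2760722: -30",
    "LustreError: 2434:0:(llog_cat.c:441:llog_cat_process_thread()) llog_cat_process() failed -30",
    "LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) can't get handle for 47 credits: rc = -30",
    "LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) error starting handle for op 4 (108 credits): rc -30",
]


def decode_rc(line):
    """Hypothetical helper: map a trailing negative return code to (errno name, message).

    Assumes the code is the last token on the line; returns None if there is none.
    """
    match = re.search(r"-(\d+)\s*$", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to symbolic names for the running platform.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


for line in LOG_LINES:
    print(decode_rc(line))
```

On a Linux host, each of these lines should decode to EROFS.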
w/r,
Kurt J. Strosahl
System Administrator
Scientific Computing Group, Thomas Jefferson National Accelerator Facility