It seems something is wrong with sdc2. What does SMART tell you? Are there any notices about it in dmesg?
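One thing worth noting: every rc in your log is -30. That is -EROFS ("Read-only file system"), so those LustreError lines look like fallout from the read-only remount rather than its cause. The more interesting lines are at the top of the excerpt: the journal abort on sdd5 and the IO failure in ldiskfs_mb_free_blocks on sdc2. Below is a rough Python sketch for pulling those out of a saved dmesg or syslog dump. The script name, the summarize() helper and the regexes are my own assumptions, tuned only to the lines you posted. Treat it as a starting point, not a general Lustre log parser.

```python
import errno
import os
import re
import sys
from collections import Counter

# Trailing return codes in the forms seen in the posted log:
# "rc = -30", "rc -30", "failed -30" and "...: -30".
# Assumption: these patterns only cover the lines in this excerpt.
RC_RE = re.compile(r"(?:rc\s*=?\s*|failed\s+|:\s+)(-\d+)\s*$")
# Device names in the "(device sdc2)" and "on device sdd5." forms.
DEV_RE = re.compile(r"\(device (\w+)\)|on device (\w+)")


def summarize(lines):
    """Hypothetical helper: find the first journal abort, count the
    lines naming each device, and decode negative return codes."""
    first_abort = None
    devices = Counter()
    codes = Counter()
    for line in lines:
        if first_abort is None and "Aborting journal" in line:
            first_abort = line.strip()
        dev = DEV_RE.search(line)
        if dev:
            devices[dev.group(1) or dev.group(2)] += 1
        rc = RC_RE.search(line)
        if rc:
            codes[int(rc.group(1))] += 1
    # Kernel return codes are negated errno values, e.g. -30 -> EROFS.
    decoded = {
        code: (errno.errorcode.get(-code, "unknown"), os.strerror(-code))
        for code in codes
    }
    return first_abort, devices, codes, decoded


if __name__ == "__main__":
    # Usage (hypothetical file names): dmesg > dmesg.txt; python3 summarize_log.py dmesg.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            abort, devices, codes, decoded = summarize(fh)
        print("first journal abort:", abort)
        print("lines per device:", dict(devices))
        for code, count in codes.items():
            name, text = decoded[code]
            print(f"rc {code} ({name}: {text}) seen {count} times")
```

If the first abort in a longer dump is on the journal device rather than on sdc2, it would be worth checking the mirrored disk behind sdd5 as well as the RAID6 array.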

On Thu, May 7, 2015 at 8:53 AM, Kurt Strosahl <strosahl@jlab.org> wrote:
Good Morning,

     We recently had an OST run into a problem with what appears to be its journal. The OST is a partition on top of a RAID6 array, which was rebuilding after a disk failure, and its journal is on an external mirrored disk. We unmounted the OST and ran the following:

     e2fsck -y -C 0 /dev/sdc2 -j /dev/sdd5

     After that we remounted the OST, and as soon as the first client tried to write to it after recovery, it went back to read-only. We unmounted it again and reran e2fsck, and again it flipped to read-only the moment writes reached it. (I had set it to read-only on the MDS and let it sit for a few minutes before setting it back to read/write, to confirm that the problem only happens on a write.)

May  7 10:28:48  kernel:
May  7 10:28:48  kernel: Aborting journal on device sdd5.
May  7 10:28:48  kernel: LDISKFS-fs (sdc2): Remounting filesystem read-only
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_mb_free_blocks: IO failure
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_remove_space: Journal has aborted
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_orphan_del: Journal has aborted
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_truncate: Journal has aborted
May  7 10:28:48  kernel: LustreError: 2436:0:(filter_log.c:174:filter_recov_log_unlink_cb()) error destroying object 2760722: -30
May  7 10:28:48  kernel: LustreError: 2434:0:(llog_cat.c:441:llog_cat_process_thread()) llog_cat_process() failed -30
May  7 10:28:58  kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) can't get handle for 47 credits: rc = -30
May  7 10:28:58  kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) Skipped 54 previous similar messages
May  7 10:28:58  kernel: LustreError: 8791:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
May  7 10:28:59  kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) error starting handle for op 4 (108 credits): rc -30
May  7 10:28:59  kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) Skipped 18 previous similar messages
May  7 10:29:03  kernel: LustreError: 8793:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
May  7 10:29:07  kernel: LustreError: 8711:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30

w/r,
Kurt J. Strosahl
System Administrator
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss