All,
We had a power failure last evening, and both our MDS (combined MDT /
MGT) and OSS servers went down.
Upon power-up, the MGT and all MDTs mounted correctly. Some of the OSTs
mounted but not all.
I unmounted everything and then ran e2fsck on the OSTs that didn't
mount (just a basic "e2fsck <device>"). On one of those OSTs, e2fsck
corrected an inode bitmap difference:
Pass 5: Checking group summary information
Inode bitmap differences: -76225610
Fix<y>? yes
However, the OSTs still wouldn't mount and I was seeing these messages
in the log:
May 21 11:28:06 oss2 lrmd: [3351]: info: RA output:
(lustre-ost5:start:stderr) mount.lustre: mount /dev/mapper/ost_home_5 at
/lustre/home/ost_home_5 failed: No such device or address The target
service failed to start (bad config log?) (/dev/mapper/ost_home_5). See
/var/log/messages.
So I then unmounted everything again and ran "tunefs.lustre
--writeconf" on each device.
Now on mount, I'm seeing the following:
May 21 14:00:19 oss1 kernel: LDISKFS-fs (dm-5): mounted filesystem with
ordered data mode
May 21 14:00:19 oss1 multipathd: dm-5: umount map (uevent)
May 21 14:00:19 oss1 kernel: JBD: barrier-based sync failed on dm-5-8 -
disabling barriers
May 21 14:00:19 oss1 kernel: LDISKFS-fs (dm-5): mounted filesystem with
ordered data mode
May 21 14:00:19 oss1 kernel: Lustre: MGC172.16.11.5@o2ib: Reactivating
import
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1156:server_start_targets()) no server named
home-OST0002 was started
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -6
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1453:server_put_super()) no obd home-OST0002
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from
cancel RPC: canceling anyway
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(ldlm_request.c:1597:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -108
May 21 14:00:20 oss1 kernel: JBD: barrier-based sync failed on dm-5-8 -
disabling barriers
May 21 14:00:20 oss1 multipathd: dm-5: umount map (uevent)
May 21 14:00:20 oss1 kernel: Lustre: server umount home-OST0002 complete
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-6)
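For anyone reading the log above, the negative numbers look like Linux
errno values. Below is a minimal sketch that pulls them out of the
LustreError lines, assuming the standard Linux numbering (6 = ENXIO,
108 = ESHUTDOWN). The function name decode_lustre_errors and the small
lookup table are my own, for illustration only, and not part of any
Lustre tool:

```python
import re

# Assumed standard Linux errno numbering; only the two codes seen above.
ERRNO_NAMES = {
    6: "ENXIO (No such device or address)",
    108: "ESHUTDOWN (Cannot send after transport endpoint shutdown)",
}

# "PID:0:(file.c:LINE:function())" followed by the message text.
ENTRY = re.compile(r"\s*\d+:\d+:\(\w+\.c:\d+:(\w+)\(\)\)(.*)", re.S)
# A negative code written as "rc -108", ": -6" or "(-6)".
CODE = re.compile(r"(?:rc |: |\()-(\d+)\b")
# Start of the next syslog entry, e.g. "May 21 14:00:20 ".
TIMESTAMP = re.compile(r"[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")


def decode_lustre_errors(log_text):
    """Return (function, code, meaning) for each LustreError carrying a code."""
    # Rejoin lines that the mail client wrapped.
    flat = " ".join(log_text.split())
    found = []
    for chunk in flat.split("LustreError:")[1:]:
        entry = ENTRY.match(chunk)
        if not entry:
            continue
        func, rest = entry.group(1), entry.group(2)
        # Keep only this entry's message, not the following log lines.
        rest = TIMESTAMP.split(rest, maxsplit=1)[0]
        code = CODE.search(rest)
        if code:
            num = int(code.group(1))
            found.append((func, -num, ERRNO_NAMES.get(num, "unknown")))
    return found


if __name__ == "__main__":
    sample = (
        "May 21 14:00:20 oss1 kernel: LustreError:\n"
        "4702:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -6\n"
    )
    for func, code, meaning in decode_lustre_errors(sample):
        print(func, code, meaning)
```

Read this way, the -6 in the mount failures is the same "No such device
or address" that mount.lustre printed earlier, and the -108 comes from
the lock cancel during teardown.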
At this point, I'm not sure what to do next. Any suggestions?
Thanks,
Brian