I have a Lustre server running Lustre 1.6.3. Yesterday I had to replace a failed disk
in the OST's RAID5 array and rebuild the array.
Since then, whenever I mount the OST, I get the following messages in the system log and
the system crashes with a kernel panic:
Apr 3 16:40:56 fn3 kernel: kjournald starting. Commit interval 5 seconds
Apr 3 16:40:56 fn3 kernel: LDISKFS FS on sdc1, internal journal
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: recovery complete.
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 3 16:40:56 fn3 kernel: kjournald starting. Commit interval 5 seconds
Apr 3 16:40:56 fn3 kernel: LDISKFS FS on sdc1, internal journal
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: file extents enabled
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: mballoc enabled
Apr 3 16:40:56 fn3 kernel: Lustre: ost_num_threads module parameter is deprecated, use oss_num_threads instead or unset both for dynamic thread startup
Apr 3 16:40:57 fn3 kernel: Lustre: Filtering OBD driver; info@clusterfs.com
Apr 3 16:40:57 fn3 kernel: LustreError: 134-6: Trying to start OBD home1fs-OST0000_UUID
using the wrong disk
xV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RmX4^RxV4^RmX4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^R@M| n.
Were the /dev/ assignments rearranged?
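A side note on the unreadable string in the LustreError message above: it is mostly the same four characters, "xV4^R", repeated. The short Python sketch below decodes them, assuming that "^R" is syslog's caret notation for the control byte 0x12 rather than two literal characters. The helper names decode_caret and as_le_words are made up for this illustration and are not part of any Lustre tool. Under that assumption the bytes read as the little-endian 32-bit value 0x12345678, which looks more like a fill or test pattern than an identifier Lustre would have written. It does not show where the pattern came from.

```python
import struct

def decode_caret(text):
    """Turn caret notation (e.g. '^R') back into raw bytes.

    Assumption: '^X' in the log stands for the control character Ctrl-X.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text):
            out.append(ord(text[i + 1]) ^ 0x40)  # '^R' -> 0x12
            i += 2
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

def as_le_words(raw):
    """Interpret the bytes as little-endian 32-bit words (trailing bytes dropped)."""
    usable = len(raw) - len(raw) % 4
    return ["0x%08x" % w for (w,) in struct.iter_unpack("<I", raw[:usable])]

# A short sample copied from the log line above, including one "mX4^R" variant.
sample = "xV4^RxV4^RmX4^R"
words = as_le_words(decode_caret(sample))
print(words)
```

The occasional "mX4^R" group differs from the rest in its two low-order bytes only.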
I tried to run fsck on the OST:
fsck /dev/sdc1
and got the following message:
fsck 1.40.11.sun1 (17-June-2008)
fsck: fsck.ext4: not found
fsck: Error 2 while executing fsck.ext4 for /dev/sdc1
It looks like fsck treats the filesystem as ext4 although it is ext3, so I ran e2fsck
manually:
e2fsck -y /dev/sdc1
It completed successfully except for this message: "Primary superblock features
different from backup".
I then tried to mount the OST again, without success. The system still crashes with a
kernel panic and the same Lustre error message when trying to start the OBD.
I get the following from tunefs.lustre:
tunefs.lustre /dev/sdc1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: home1fs-OST0000
Index: 0
Lustre FS: home1fs
Mount type: ldiskfs
Flags: 0x442
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.15.35@tcp
Permanent disk data:
Target: home1fs-OST0000
Index: 0
Lustre FS: home1fs
Mount type: ldiskfs
Flags: 0x442
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.15.35@tcp
Writing CONFIGS/mountdata
Could this be superblock corruption?
Denis Charland, B. Ing.
Administrateur de Systèmes Linux / Linux Systems Administrator
Automobile / Automotive
Conseil national de recherches Canada / National Research Council Canada
75 de Mortagne, Boucherville, Québec, Canada, J4B 6Y4
Tél. / Phone : (450) 641-5078, Téléc. / Fax : (450) 641-5106
denis.charland@cnrc-nrc.gc.ca