I have a Lustre server running Lustre 1.6.3. Yesterday, on the OST (RAID5), I had to replace and rebuild a failed disk.

 

Since then, whenever I mount the OST, I get the following messages in the system log and the system crashes with a kernel panic.

 

Apr  3 16:40:56 fn3 kernel: kjournald starting.  Commit interval 5 seconds

Apr  3 16:40:56 fn3 kernel: LDISKFS FS on sdc1, internal journal

Apr  3 16:40:56 fn3 kernel: LDISKFS-fs: recovery complete.

Apr  3 16:40:56 fn3 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.

Apr  3 16:40:56 fn3 kernel: kjournald starting.  Commit interval 5 seconds

Apr  3 16:40:56 fn3 kernel: LDISKFS FS on sdc1, internal journal

Apr  3 16:40:56 fn3 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.

Apr  3 16:40:56 fn3 kernel: LDISKFS-fs: file extents enabled

Apr  3 16:40:56 fn3 kernel: LDISKFS-fs: mballoc enabled

Apr  3 16:40:56 fn3 kernel: Lustre: ost_num_threads module parameter is deprecated, use oss_num_threads instead or unset both for dynamic thread startup

Apr  3 16:40:57 fn3 kernel: Lustre: Filtering OBD driver; info@clusterfs.com

Apr  3 16:40:57 fn3 kernel: LustreError: 134-6: Trying to start OBD home1fs-OST0000_UUID using the wrong disk xV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RmX4^RxV4^RmX4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4

^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4

^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4

^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4

^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^R@M| n. Were the /dev/ assignments rearranged?

 

I tried to run fsck on the OST

 

fsck /dev/sdc1

 

and got the following message

 

fsck 1.40.11.sun1 (17-June-2008)

fsck: fsck.ext4: not found

fsck: Error 2 while executing fsck.ext4 for /dev/sdc1

 

It looks like fsck treats the filesystem as ext4 although it is ext3, so I ran e2fsck manually:

 

e2fsck -y /dev/sdc1

 

which completed successfully except for this message: "Primary superblock features different from backup".
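
To see which features actually differ, the header of the primary superblock and of a backup could be saved with dumpe2fs and compared. This is only a minimal sketch: features_of and feature_diff are hypothetical helper names, and the backup location 32768 in the comments is an assumption that holds only for a 4 KiB block size.

```shell
# Assumed way to capture the two headers (not run here):
#   dumpe2fs -h /dev/sdc1 > primary.txt
#   dumpe2fs -o superblock=32768 -o blocksize=4096 -h /dev/sdc1 > backup.txt

# features_of (hypothetical helper): reads dumpe2fs -h output on stdin and
# prints the flags from the "Filesystem features:" line, one per line, sorted.
features_of() {
    sed -n 's/^Filesystem features:[[:space:]]*//p' | tr -s ' ' '\n' | sort
}

# feature_diff (hypothetical helper): takes two files with saved dumpe2fs -h
# output and prints the flags found in only one of them, prefixed with
# "<" (first file only) or ">" (second file only).
feature_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    features_of < "$1" > "$a"
    features_of < "$2" > "$b"
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}
```

Calling feature_diff primary.txt backup.txt prints nothing when both superblocks list the same features.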

 

I tried to mount the OST again, without success. The system still crashes with a kernel panic and logs the same Lustre error message when trying to start the OBD.

 

I get the following from tunefs.lustre:

 

tunefs.lustre /dev/sdc1

 

checking for existing Lustre data: found CONFIGS/mountdata

Reading CONFIGS/mountdata

 

   Read previous values:

Target:     home1fs-OST0000

Index:      0

Lustre FS:  home1fs

Mount type: ldiskfs

Flags:      0x442

              (OST  )

Persistent mount opts: errors=remount-ro,extents,mballoc

Parameters: mgsnode=172.17.15.35@tcp

 

 

   Permanent disk data:

Target:     home1fs-OST0000

Index:      0

Lustre FS:  home1fs

Mount type: ldiskfs

Flags:      0x442

              (OST  )

Persistent mount opts: errors=remount-ro,extents,mballoc

Parameters: mgsnode=172.17.15.35@tcp

 

Writing CONFIGS/mountdata
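
The target name in CONFIGS/mountdata can be compared with the OBD name in the LustreError line to confirm that the mount configuration itself points at the expected target. A minimal sketch, assuming the tunefs.lustre output and the syslog line have the format shown above; extract_target and extract_obd are hypothetical helper names.

```shell
# extract_target (hypothetical helper): reads tunefs.lustre output on stdin
# and prints the value of the first "Target:" line.
extract_target() {
    awk '/^Target:/ { print $2; exit }'
}

# extract_obd (hypothetical helper): reads syslog lines on stdin and prints
# the OBD name that follows "Trying to start OBD", without the _UUID suffix.
extract_obd() {
    sed -n 's/.*Trying to start OBD \([^ ]*\)_UUID .*/\1/p'
}
```

With the output above, both helpers yield home1fs-OST0000, so the name stored in mountdata matches the OBD being started; the "wrong disk" string in the log must therefore come from somewhere other than mountdata.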

 

Could this be caused by superblock corruption?

 

Denis Charland, B. Ing.

Administrateur de Systèmes Linux / Linux Systems Administrator

Automobile / Automotive

Conseil national de recherches Canada / National Research Council Canada

75 de Mortagne, Boucherville, Québec, Canada, J4B 6Y4

Tél. / Phone : (450) 641-5078, Téléc. / Fax : (450) 641-5106

denis.charland@cnrc-nrc.gc.ca