On Lustre 1.6.x, kernel panics when mounting an OST can often be fixed by using the
"Dilger Procedure". As I recall, it amounts to mounting the OST as ldiskfs and
carefully truncating the last received (last_rcvd) file. A discussion is available in the
lustre-discuss list archives. I'd copy and paste a link, but that crashes my phone...
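
A minimal sketch of the truncation step only, assuming the OST is already mounted as ldiskfs at a hypothetical mount point such as /mnt/ost and that the file meant above is last_rcvd. The helper name and the number of bytes to keep are illustrative assumptions, not values confirmed in this thread; check the list discussion before touching a real target. The helper saves a backup copy before it changes anything.

```python
import os
import shutil


def truncate_with_backup(path, keep_bytes, backup_suffix=".sav"):
    """Copy `path` to `path + backup_suffix`, then cut `path` to `keep_bytes`.

    Returns the backup path. Refuses to overwrite an existing backup and
    never extends a file that is already shorter than `keep_bytes`.
    """
    if keep_bytes < 0:
        raise ValueError("keep_bytes must be non-negative")
    backup = path + backup_suffix
    if os.path.exists(backup):
        raise FileExistsError("backup already exists: " + backup)
    shutil.copy2(path, backup)          # keep the original contents safe
    if os.path.getsize(path) > keep_bytes:
        os.truncate(path, keep_bytes)   # drop everything after keep_bytes
    return backup


# Hypothetical call (path and size are assumptions, not from this thread):
#   truncate_with_backup("/mnt/ost/last_rcvd", keep_bytes=8192)
```

Unmount the ldiskfs mount before trying to mount the target as Lustre again.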
Dan
"Charland, Denis" <Denis.Charland(a)imi.cnrc-nrc.gc.ca> wrote:
I have a Lustre server running Lustre 1.6.3. Yesterday, I had to replace
and rebuild a failed disk on the OST (RAID5).
Since then, whenever I mount the OST, I get the following messages in the
system log and the system crashes with a kernel panic.
Apr 3 16:40:56 fn3 kernel: kjournald starting. Commit interval 5 seconds
Apr 3 16:40:56 fn3 kernel: LDISKFS FS on sdc1, internal journal
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: recovery complete.
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 3 16:40:56 fn3 kernel: kjournald starting. Commit interval 5 seconds
Apr 3 16:40:56 fn3 kernel: LDISKFS FS on sdc1, internal journal
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: file extents enabled
Apr 3 16:40:56 fn3 kernel: LDISKFS-fs: mballoc enabled
Apr 3 16:40:56 fn3 kernel: Lustre: ost_num_threads module parameter is deprecated, use oss_num_threads instead or unset both for dynamic thread startup
Apr 3 16:40:57 fn3 kernel: Lustre: Filtering OBD driver; info(a)clusterfs.com
Apr 3 16:40:57 fn3 kernel: LustreError: 134-6: Trying to start OBD home1fs-OST0000_UUID using the wrong disk
xV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RmX4^RxV4^RmX4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4
^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^RxV4^R@M|
n. Were the /dev/ assignments rearranged?
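
The repeated xV4^R run in the error above is easier to judge as numbers. A small sketch, assuming the log shows control characters in caret notation (^R standing for byte 0x12) and that the data are 32-bit little-endian words; both are assumptions on my part, and the function names are illustrative.

```python
import struct


def caret_to_bytes(text):
    """Turn ASCII text with caret escapes (e.g. '^R') back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text):
            # Caret notation: '^X' is the character X with bit 0x40 flipped.
            out.append(ord(text[i + 1]) ^ 0x40)
            i += 2
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def le32_words(text):
    """Decode the text as consecutive little-endian 32-bit words."""
    raw = caret_to_bytes(text)
    usable = len(raw) - len(raw) % 4   # ignore a trailing partial word
    return [w for (w,) in struct.iter_unpack("<I", raw[:usable])]


# Short excerpt from the log line above:
print([hex(w) for w in le32_words("xV4^RxV4^RmX4^R")])
# -> ['0x12345678', '0x12345678', '0x1234586d']
```

Under those assumptions most of the run decodes to 0x12345678, which looks more like a fill pattern than a UUID string.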
I tried to run fsck on the OST
fsck /dev/sdc1
and got the following message
fsck 1.40.11.sun1 (17-June-2008)
fsck: fsck.ext4: not found
fsck: Error 2 while executing fsck.ext4 for /dev/sdc1
It looks like fsck treats the filesystem as ext4 although it is
ext3, so I ran e2fsck manually:
e2fsck -y /dev/sdc1
which completed successfully except for this message: "Primary
superblock features different from backup".
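
The "fsck.ext4: not found" error is consistent with the fsck front end looking for a per-type helper named fsck.<fstype> and not finding one on this node. A small sketch for checking which helpers are installed; the function name and the list of types are illustrative, not part of any Lustre or e2fsprogs tool.

```python
import shutil


def find_fsck_helpers(fstypes, path=None):
    """Map each filesystem type to the full path of fsck.<type>, or None.

    `path` is an optional search path string; by default the PATH
    environment variable is used.
    """
    return {fs: shutil.which("fsck." + fs, path=path) for fs in fstypes}


# Example: see which helpers exist on this machine.
#   find_fsck_helpers(["ext3", "ext4"], path="/sbin:/usr/sbin")
```

A None entry for a type means the fsck front end would have no helper to run for it, which is why calling e2fsck directly works around the error.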
I tried again to mount the OST, with no success. The system still
crashes with a kernel panic and the
same Lustre error message when trying to start the OBD.
I get the following from tunefs.lustre:
tunefs.lustre /dev/sdc1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: home1fs-OST0000
Index: 0
Lustre FS: home1fs
Mount type: ldiskfs
Flags: 0x442
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.15.35@tcp
Permanent disk data:
Target: home1fs-OST0000
Index: 0
Lustre FS: home1fs
Mount type: ldiskfs
Flags: 0x442
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.15.35@tcp
Writing CONFIGS/mountdata
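
To confirm that the two blocks in this output really are identical, the "Key: value" lines can be compared mechanically. A minimal sketch that parses only the plain-text layout shown above; the parser is an illustration written for this output, not a tool shipped with Lustre.

```python
def parse_tunefs_sections(text):
    """Group 'Key: value' lines under the section header that precedes them.

    A section header is a line that ends with ':' and contains no other
    colon, such as 'Read previous values:'. Lines seen before the first
    header, and lines without a colon, are ignored.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and line.count(":") == 1:
            current = sections.setdefault(line[:-1], {})
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return sections


SAMPLE = """\
Reading CONFIGS/mountdata
Read previous values:
Target: home1fs-OST0000
Flags: 0x442
Parameters: mgsnode=172.17.15.35@tcp
Permanent disk data:
Target: home1fs-OST0000
Flags: 0x442
Parameters: mgsnode=172.17.15.35@tcp
Writing CONFIGS/mountdata
"""

parsed = parse_tunefs_sections(SAMPLE)
unchanged = parsed["Read previous values"] == parsed["Permanent disk data"]
```

Here `unchanged` is true, which matches the output above: tunefs.lustre read the configuration back and rewrote the same values.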
Could this be superblock corruption?
Denis Charland, B. Ing.
Administrateur de Systèmes Linux / Linux Systems Administrator
Automobile / Automotive
Conseil national de recherches Canada / National Research Council Canada
75 de Mortagne, Boucherville, Québec, Canada, J4B 6Y4
Tél. / Phone : (450) 641-5078, Téléc. / Fax : (450) 641-5106
denis.charland@cnrc-nrc.gc.ca
------------------------------------------------------------------------
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss