Hi,
I would like to report a strange issue I have seen with HP's HPC
SFS G3.2-3 running an early version of Lustre 1.8 (I believe it was
1.8.3) on CentOS 5.3.
lustre_config was able to create the ldiskfs file systems at the ext2
level, but then exited with the following message on all OSSes:
mkfs.lustre FATAL: failed to write local files
I could mount the OSTs as ldiskfs, but there was no CONFIGS directory.
When I tried to create one manually, the node crashed immediately, but I
found the following entry in /var/log/messages:
Mar 18 10:51:15 sfs3 kernel: LDISKFS-fs error (device dm-1):
ldiskfs_ext_find_extent: :463: bad header in inode #229302273: invalid
magic - magic 0, entries 0, max 0(0), depth 0(0)
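
For anyone who wants to check their own logs for the same symptom, here
is a minimal Python sketch that pulls the device, function, inode number
and reason out of such an entry. The regular expression is only an
assumption based on the single message quoted above (unwrapped onto one
line, as it would appear in the log file), and the name
parse_ldiskfs_error is made up for this example; other LDISKFS-fs
messages may be laid out differently.

```python
import re

# Pattern modelled on the one message quoted above; this layout is an
# assumption, not a documented kernel log format.
LDISKFS_ERROR = re.compile(
    r"LDISKFS-fs error \(device (?P<device>[^)]+)\):\s*"
    r"(?P<function>\w+):.*?"
    r"inode #(?P<inode>\d+):\s*(?P<reason>.*)"
)


def parse_ldiskfs_error(line):
    """Return a dict with device, function, inode and reason, or None."""
    match = LDISKFS_ERROR.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["inode"] = int(info["inode"])
    return info


# The entry from /var/log/messages, joined back onto a single line.
sample = (
    "Mar 18 10:51:15 sfs3 kernel: LDISKFS-fs error (device dm-1): "
    "ldiskfs_ext_find_extent: :463: bad header in inode #229302273: invalid "
    "magic - magic 0, entries 0, max 0(0), depth 0(0)"
)

print(parse_ldiskfs_error(sample))
```

To scan a whole log, the same function can be applied to each line of
/var/log/messages, keeping only the results that are not None.
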
I upgraded e2fsprogs to the latest version, but lustre_config failed
again. I ran e2fsck on the OSTs, which seemed to fix the problem
according to its output, and a second run of e2fsck confirmed that
everything was OK. However, when I mounted the OST as ldiskfs again and
tried to create the CONFIGS directory, the node crashed again.
The solution to this problem was installing Lustre 1.8.9 from
Whamcloud. After that I hit another minor issue
(https://jira.hpdd.intel.com/browse/LU-4789), but after fixing that one
lustre_config completed successfully. Either this was a bug which was
fixed between 1.8.3 and 1.8.9, or there was an issue with an expired
license which caused the lustre shipped with the SFS to behave
strangely. I believe the second explanation is the more likely one,
because the exact same procedure worked some years ago and stopped
working after the support contract ended (the installed software has
remained unchanged since then).
Best regards,
Martin