Hi all,
specifying several '--mgsnode=' failover NIDs when formatting an OST does not
work in my test case. Using ':' to separate the NIDs works.
This seems to be the first time I have had a failover pair of MDSes with Lustre 2.x.
I formatted with:
MDS:~# mkfs.lustre --mgs --mdt --fsname=testfs --index=0 --servicenode=10.20.0.2@o2ib0 --servicenode=10.20.0.3@o2ib0 /dev/md0
No trouble mounting (the MGS/MDT is on 10.20.0.2).
My test OSS is a single box, so I did:
OSS:~# mkfs.lustre --reformat --ost --backfstype=zfs --fsname=testfs --index=0 --param ost.quota_type=ug --mgsnode=10.20.0.2@o2ib --mgsnode=10.20.0.3@o2ib lpool-oss/ost1 raidz2 dm-1 dm-2 dm-3 dm-4 ...
Permanent disk data:
Target: testfs:OST0000
Index: 0
Lustre FS: testfs
Mount type: zfs
Flags: 0x62
(OST first_time update )
Persistent mount opts:
Parameters: ost.quota_type=ug mgsnode=10.20.0.2@o2ib mgsnode=10.20.0.3@o2ib
mkfs_cmd = zfs create -o canmount=off -o xattr=sa lpool-oss/ost0
Writing lpool-oss/ost0 properties
lustre:version=1
lustre:flags=98
lustre:index=0
lustre:fsname=testfs
lustre:svname=testfs:OST0000
lustre:ost.quota_type=ug
lustre:mgsnode=10.20.0.2@o2ib
lustre:mgsnode=10.20.0.3@o2ib
This didn't mount, complaining that the MGS on 10.20.0.3@o2ib wasn't up, which is
true: the MGS/MDT is mounted on 10.20.0.2.
I checked with tunefs.lustre:
OSS:~# tunefs.lustre --dryrun lpool-oss/ost0
checking for existing Lustre data: found
Read previous values:
Target: testfs-OST0000
Index: 0
Lustre FS: testfs
Mount type: zfs
Flags: 0x62
(OST first_time update )
Persistent mount opts:
Parameters: mgsnode=10.20.0.3@o2ib ost.quota_type=ug
...
So the first MGS NID (--mgsnode=10.20.0.2@o2ib) was indeed lost somewhere between
mkfs.lustre printing its parameters and the properties being stored on the target.
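For anyone who wants to compare what was requested with what ended up on the target, here is a minimal sketch. list_mgsnodes is a hypothetical helper of mine, not part of Lustre; it only assumes that the tool output contains a "Parameters:" line in the format shown above.

```shell
# list_mgsnodes: hypothetical helper, not part of Lustre.
# Reads mkfs.lustre / tunefs.lustre output on stdin and prints every
# mgsnode value found on a "Parameters:" line, one value per line.
list_mgsnodes() {
    awk '/^[ \t]*Parameters:/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^mgsnode=/) {
                sub(/^mgsnode=/, "", $i)   # strip the key, keep the NID(s)
                print $i
            }
    }'
}

# Usage on the OSS (dataset name taken from the output above):
#   tunefs.lustre --dryrun lpool-oss/ost0 | list_mgsnodes
```

Fed the tunefs.lustre output above, this prints only 10.20.0.3@o2ib, while the mkfs.lustre output yields both NIDs.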
Changing the failover NID specification to the colon syntax does the trick:
OSS:~# mkfs.lustre --reformat --ost --backfstype=zfs --fsname=testfs --index=0 --param ost.quota_type=ug --mgsnode=10.20.0.2@o2ib:10.20.0.3@o2ib lpool-oss/ost0 raidz2 dm-1 dm-2 dm-3 ...
Mounts smoothly.
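If the NIDs come from a script, the colon form is easy to build. A small sketch follows; mgsnode_arg is a hypothetical helper, not a Lustre tool, and it only joins its arguments with ':' without checking that they are valid NIDs.

```shell
# mgsnode_arg: hypothetical helper, not part of Lustre.
# Joins all NIDs given as arguments with ':' into a single --mgsnode option.
mgsnode_arg() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local to this function.
    ( IFS=:; printf '%s%s\n' '--mgsnode=' "$*" )
}

# Usage:
#   mgsnode_arg 10.20.0.2@o2ib 10.20.0.3@o2ib
#   prints: --mgsnode=10.20.0.2@o2ib:10.20.0.3@o2ib
```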
The example in the manual suggests that repeating '--mgsnode' should still
work:
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifa...
Is this a flaw in the manual, or did I miss something?
Regards,
Thomas