Hi all,
We are running Lustre 2.5.3 with two MDSes, 10.20.0.25@o2ib1 and 10.20.0.20@o2ib1,
both attached to the same shared storage.
On 10.20.0.25, I formatted an MGS and an MDT with
mkfs.lustre --mgs /dev/mapper/mpathb
and
mkfs.lustre --mdt --fsname=nyx1 --index=0 \
  --mgsnode=10.20.0.25@o2ib1 --mgsnode=10.20.0.20@o2ib1 \
  /dev/mapper/mpathd
Both mounted cleanly.
On an OSS, I formatted an OST with
mkfs.lustre --reformat --ost --backfstype=zfs --fsname=nyx1 \
  --mgsnode=10.20.0.25@o2ib1:10.20.0.20@o2ib1 --index=$IND \
  osspool0/ost0 raidz2 /dev/mapper/...
It mounted cleanly, as did the file system on a client.
Then I unmounted both the MDT and the MGS on 10.20.0.25 and mounted them on the
failover node, 10.20.0.20.
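To make the switch-over explicit, it amounts to roughly the following. This is only a dry-run sketch that prints the commands instead of running them. The mount points /mnt/mgs and /mnt/mdt are placeholders (assumptions, not the real paths), and the ordering (MDT down before MGS, MGS up before MDT) is an assumption as well; the devices are the ones formatted above.

```shell
#!/bin/sh
# Dry-run sketch of the manual MGS/MDT failover described above.
# It only PRINTS commands; nothing is unmounted or mounted.
# ASSUMPTION: /mnt/mgs and /mnt/mdt are hypothetical mount points.
# ASSUMPTION: stop MDT before MGS, start MGS before MDT.

MGS_DEV=/dev/mapper/mpathb   # MGS device from the mkfs.lustre --mgs call
MDT_DEV=/dev/mapper/mpathd   # MDT device from the mkfs.lustre --mdt call
MGS_MNT=/mnt/mgs             # hypothetical
MDT_MNT=/mnt/mdt             # hypothetical

# Commands for the currently active MDS (10.20.0.25).
print_release_cmds() {
    echo "umount $MDT_MNT"
    echo "umount $MGS_MNT"
}

# Commands for the failover MDS (10.20.0.20).
print_takeover_cmds() {
    echo "mount -t lustre $MGS_DEV $MGS_MNT"
    echo "mount -t lustre $MDT_DEV $MDT_MNT"
}

echo "# on 10.20.0.25:"
print_release_cmds
echo "# on 10.20.0.20:"
print_takeover_cmds
```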
This seems to have worked, although ptlrpc_expire_one_request() keeps complaining
about network errors, all of which mention only the 'dead' NID 10.20.0.25@o2ib1.
But the log also states
Lustre: nyx1-MDT0000: used disk, loading
and the listing in /proc/fs/lustre/devices is complete.
The OSS also reconnected; its log says
Evicted from MGS (at MGC10.20.0.25@o2ib1_1) after server handle changed
MGC10.20.0.25@o2ib1: Connection restored to MGS (at 10.20.0.20@o2ib1)
This seems to be OK.
However, the client is stuck. Its log shows the same two messages,
Evicted from MGS (at MGC10.20.0.25@o2ib1_1) after server handle changed
MGC10.20.0.25@o2ib1: Connection restored to MGS (at 10.20.0.20@o2ib1)
but followed by
mgc: cannot find uuid by nid 10.20.0.20@o2ib1
Process recover log nyx1-cliir error -2
Correspondingly, there is no Lustre access on this client:
nyx1-MDT0000-mdc-ffff880536afec00: check error: Resource temporarily unavailable
I have of course experimented with how the NIDs are specified on the mkfs.lustre
command line: a colon-separated list of mgsnodes, and adding both NIDs as
servicenodes when formatting the MGS. Nothing helped.
The only Jira ticket I could find for the error message "mgc: cannot find uuid by nid"
is LU-5950. It has "Fix Version/s: Lustre 2.7.0". No failover prior to that? Hard to
believe ;-)
Any idea where I messed up?
Thanks,
Thomas
--
--------------------------------------------------------------------
Thomas Roth IT-HPC-Linux
Location: SB3 1.262 Phone: +49-6159-71 1453
http://twitter.com/gsi_it