Thanks, Andreas.
That did the trick.
My format command lines now read:
> mkfs.lustre --mgs --servicenode=10.20.0.20@o2ib1 \
>     --servicenode=10.20.0.25@o2ib1 /dev/mapper/mpathb
> mkfs.lustre --mdt --fsname=nyx1 --index=0 \
>     --servicenode=10.20.0.20@o2ib1 --servicenode=10.20.0.25@o2ib1 \
>     --mgsnode=10.20.0.25@o2ib1 --mgsnode=10.20.0.20@o2ib1 \
>     /dev/mapper/mpathd
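
Andreas's advice applies to the OST as well, so here is a minimal sketch of what the matching OST format might look like. It is a dry run that only prints the command line and formats nothing. The OSS NIDs 10.20.0.30@o2ib1 and 10.20.0.31@o2ib1 are hypothetical (the thread never names the OSS pair), the helper names nid_opts and build_ost_cmd are made up for illustration, and the ZFS pool osspool0 is assumed to exist already, so no vdev list is given.

```shell
#!/bin/sh
# Dry run: print the mkfs.lustre command for an OST with failover NIDs.
# Nothing is formatted; the command line is only echoed for review.

MGS_NIDS="10.20.0.25@o2ib1 10.20.0.20@o2ib1"   # MGS NIDs from this thread
OSS_NIDS="10.20.0.30@o2ib1 10.20.0.31@o2ib1"   # hypothetical OSS failover pair

# nid_opts FLAG NID...: print " FLAG=NID" once for every NID given.
nid_opts() {
    flag=$1; shift
    for nid in "$@"; do
        printf ' %s=%s' "$flag" "$nid"
    done
}

# build_ost_cmd INDEX TARGET: print the full command line on one line.
build_ost_cmd() {
    printf 'mkfs.lustre --ost --backfstype=zfs --fsname=nyx1 --index=%s' "$1"
    # The NID lists are left unquoted on purpose so the shell splits them.
    nid_opts --mgsnode $MGS_NIDS
    nid_opts --servicenode $OSS_NIDS
    printf ' %s\n' "$2"
}

build_ost_cmd 0 osspool0/ost0
```

The printed line carries one --mgsnode and one --servicenode option per NID, mirroring the MDT command above. Once it looks right for the real OSS NIDs, it can be pasted into a shell on the OSS.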
I guess one should try to get such an extended example into the manual.
The Lustre User Manual is open for user contributions. Please see:
On 02/19/2015 07:29 AM, Dilger, Andreas wrote:
> While it looks like you specified multiple interface NIDs for the MGS,
> you didn't specify the backup NID for the MDT and OST. You need to use
> the --servicenode or --failnode option to register the backup server in
> advance, so clients will know to check there when the primary is gone.
>
> Cheers, Andreas
>
>> On Feb 18, 2015, at 12:18, Thomas Roth <t.roth(a)gsi.de> wrote:
>>
>> Hi all,
>>
>> running Lustre 2.5.3, we have two MDSes at 10.20.0.25@o2ib1 and
>> 10.20.0.20@o2ib1, accessing some shared storage.
>> On 10.20.0.25, I have formatted an MGS and MDT with
>>> mkfs.lustre --mgs /dev/mapper/mpathb
>> and
>>> mkfs.lustre --mdt --fsname=nyx1 --index=0 --mgsnode=10.20.0.25@o2ib1 \
>>>     --mgsnode=10.20.0.20@o2ib1 /dev/mapper/mpathd
>>
>> Both mounted cleanly.
>> On an OSS, I formatted an OST
>>> mkfs.lustre --reformat --ost --backfstype=zfs --fsname=nyx1 \
>>>     --mgsnode=10.20.0.25@o2ib1:10.20.0.20@o2ib1 --index=$IND \
>>>     osspool0/ost0 raidz2 /dev/mapper/...
>> Mounted cleanly, as did the fs on a client.
>>
>>
>> Then I unmounted both MDT and MGS on 10.20.0.25 and mounted them on the
>> failover 10.20.0.20.
>> This seems to have worked, although ptlrpc_expire_one_request() keeps
>> complaining about network errors which only mention the 'dead' nid
>> 10.20.0.25@o2ib1. But the log also states
>>> Lustre: nyx1-MDT0000: used disk, loading
>> and the listing /proc/fs/lustre/devices is complete.
>> That is, the OSS also reconnected; its log says
>>> Evicted from MGS (at MGC10.20.0.25@o2ib1_1) after server handle changed
>>> MGC10.20.0.25@o2ib1: Connection restored to MGS (at 10.20.0.20@o2ib1)
>> This seems to be o.k.
>>
>> However, the client is stuck. Its log shows the same two messages,
>>> Evicted from MGS (at MGC10.20.0.25@o2ib1_1) after server handle changed
>>> MGC10.20.0.25@o2ib1: Connection restored to MGS (at 10.20.0.20@o2ib1)
>>
>> but followed by
>>
>>> mgc: cannot find uuid by nid 10.20.0.20@o2ib1
>>> Process recover log nyx1-cliir error -2
>>
>> Correspondingly, there is no Lustre access on this client,
>>> nyx1-MDT0000-mdc-ffff880536afec00: check error: Resource temporarily unavailable
>>
>>
>>
>> I have of course played around with the specification of nids on the
>> mkfs command line: colon-separated mgsnodes, adding both nids as
>> servicenodes to the MGS format. Nada.
>>
>> The only Jira I could find for this error message "mgc: cannot find
>> uuid by nid" is LU-5950.
>> This has "Fix Version/s: Lustre 2.7.0" - no failover prior to that? -
>> hard to believe ;-)
>>
>>
>> Any idea where I messed up?
>>
>>
>> Thanks,
>> Thomas
>>
>> --
>> --------------------------------------------------------------------
>> Thomas Roth IT-HPC-Linux
>> Location: SB3 1.262 Phone: +49-6159-71 1453
>>
>>
>>
>> http://twitter.com/gsi_it
>> _______________________________________________
>> HPDD-discuss mailing list
>> HPDD-discuss(a)lists.01.org
>>
>> https://lists.01.org/mailman/listinfo/hpdd-discuss
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de
Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528
Geschäftsführung: Professor Dr. Dr. h.c. Horst Stöcker,
Dr.-Ing. Jürgen Henschel
Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division