Thanks, Andreas.
That did the trick.
My format command lines now read:
mkfs.lustre --mgs --servicenode=10.20.0.20@o2ib1 --servicenode=10.20.0.25@o2ib1 \
    /dev/mapper/mpathb
mkfs.lustre --mdt --fsname=nyx1 --index=0 \
    --servicenode=10.20.0.20@o2ib1 --servicenode=10.20.0.25@o2ib1 \
    --mgsnode=10.20.0.25@o2ib1 --mgsnode=10.20.0.20@o2ib1 \
    /dev/mapper/mpathd
I guess an extended example like this should go into the manual.
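For illustration, here is a minimal sketch of how such an example could be scripted. It assumes the two MDS NIDs from this thread and only prints the mkfs.lustre command lines instead of running them. The helper names build_opts and format_cmd are hypothetical, not part of Lustre. The sketch lists the --mgsnode NIDs in the same order as the --servicenode NIDs.

```shell
#!/bin/sh
# Sketch only: prints (does not run) mkfs.lustre command lines that register
# every failover server via --servicenode. Helper names are hypothetical.

# NIDs of the two MDS nodes, taken from this thread.
SERVICE_NIDS="10.20.0.20@o2ib1 10.20.0.25@o2ib1"

# build_opts OPTION NID...  ->  "--OPTION=NID1 --OPTION=NID2 ..."
build_opts() {
    opt=$1
    shift
    out=""
    for nid in "$@"; do
        out="$out --$opt=$nid"
    done
    # Strip the leading space left by the loop.
    echo "${out# }"
}

# format_cmd DEVICE TARGET_OPTIONS...
# Appends one --servicenode per NID in SERVICE_NIDS, then the device.
format_cmd() {
    dev=$1
    shift
    # Word splitting of $SERVICE_NIDS is intended here.
    echo "mkfs.lustre $* $(build_opts servicenode $SERVICE_NIDS) $dev"
}

# MGS: both servers registered as service nodes.
format_cmd /dev/mapper/mpathb --mgs

# MDT: same service nodes, plus both MGS NIDs.
format_cmd /dev/mapper/mpathd --mdt --fsname=nyx1 --index=0 \
    "$(build_opts mgsnode $SERVICE_NIDS)"
```

Printing the commands first makes it easy to review the option list before anything is formatted.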
Cheers,
Thomas
On 02/19/2015 07:29 AM, Dilger, Andreas wrote:
While it looks like you specified multiple interface NIDs for the
MGS, you didn't specify the backup NID for the MDT and OST. You need to use the
--servicenode or --failnode option to register the backup server in advance, so clients
will know to check there when the primary is gone.
Cheers, Andreas
> On Feb 18, 2015, at 12:18, Thomas Roth <t.roth(a)gsi.de> wrote:
>
> Hi all,
>
> Running Lustre 2.5.3, we have two MDSes at 10.20.0.25@o2ib1 and 10.20.0.20@o2ib1,
> accessing some shared storage.
> On 10.20.0.25, I have formatted an MGS and MDT with
>> mkfs.lustre --mgs /dev/mapper/mpathb
> and
>> mkfs.lustre --mdt --fsname=nyx1 --index=0 --mgsnode=10.20.0.25@o2ib1 \
>>     --mgsnode=10.20.0.20@o2ib1 /dev/mapper/mpathd
>
> Both mounted cleanly.
> On an OSS, I formatted an OST
>> mkfs.lustre --reformat --ost --backfstype=zfs --fsname=nyx1 \
>>     --mgsnode=10.20.0.25@o2ib1:10.20.0.20@o2ib1 --index=$IND osspool0/ost0 raidz2 \
>>     /dev/mapper/...
> Mounted cleanly, as did the fs on a client.
>
>
> Then I unmounted both MDT and MGS on 10.20.0.25 and mounted them on the failover node,
> 10.20.0.20.
> This seems to have worked, although ptlrpc_expire_one_request() keeps complaining about
> network errors which only mention the 'dead' NID 10.20.0.25@o2ib1. But the log also states
>> Lustre: nyx1-MDT0000: used disk, loading
> and the listing in /proc/fs/lustre/devices is complete.
> The OSS also reconnected; its log says
>> Evicted from MGS (at MGC10.20.0.25@o2ib1_1) after server handle changed
>> MGC10.20.0.25@o2ib1: Connection restored to MGS (at 10.20.0.20@o2ib1)
> This seems to be o.k.
>
> However the client is stuck. Its log shows the same two messages,
>> Evicted from MGS (at MGC10.20.0.25@o2ib1_1) after server handle changed
>> MGC10.20.0.25@o2ib1: Connection restored to MGS (at 10.20.0.20@o2ib1)
>
> but followed by
>
>> mgc: cannot find uuid by nid 10.20.0.20@o2ib1
>> Process recover log nyx1-cliir error -2
>
> Correspondingly, there is no Lustre access on this client,
>> nyx1-MDT0000-mdc-ffff880536afec00: check error: Resource temporarily unavailable
>
>
>
> I have of course played around with the specification of NIDs on the mkfs command line:
> colon-separated mgsnodes, adding both NIDs as servicenodes to the MGS format. Nada.
>
> The only Jira ticket I could find for the error message "mgc: cannot find uuid by nid"
> is LU-5950. It has "Fix Version/s: Lustre 2.7.0". No failover prior to that? Hard to
> believe ;-)
>
>
> Any idea where I messed up?
>
>
> Thanks,
> Thomas
>
> --
> --------------------------------------------------------------------
> Thomas Roth IT-HPC-Linux
> Location: SB3 1.262 Phone: +49-6159-71 1453
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de