Am I correct in assuming that there were no network changes during the upgrade, and that
all the clients/servers were configured with the same NIDs before and after the upgrade?
--Rick
On Jan 22, 2015, at 12:41 PM, Kevin M. Hildebrand <kevin(a)umd.edu> wrote:
The only message I see occurs on the MDS at the time I'm mounting
the client:
Lustre: MGS: non-config logname received: params
Kevin
-----Original Message-----
From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rmohr@utk.edu]
Sent: Thursday, January 22, 2015 12:37 PM
To: Kevin M. Hildebrand
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Lustre networking issues, multi-homed servers
Are there any Lustre error messages on the server side?
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
On Jan 22, 2015, at 11:03 AM, Kevin M. Hildebrand <kevin(a)umd.edu> wrote:
> Hello, I just upgraded a Lustre 1.8.7 installation to version 2.5.3.
>
> The Lustre servers are connected via IB and Ethernet. Some of the clients have both
> networks, and some of the clients have Ethernet only.
>
> I'm having a problem where the Ethernet-only clients appear to be attempting to
> contact the servers via their IB addresses, and are failing to do so. As far as I can
> tell, the NIDs are correct on servers and clients, so I'm not sure where things are
> going wrong. The Ethernet networks are 10.100.* and the IB network is 192.168.* below.
>
> MDS/MGS:
> # lctl list_nids
> 192.168.129.250@o2ib
> 10.100.129.250@tcp
>
> OSSes:
> # lctl list_nids
> 192.168.129.252@o2ib
> 10.100.129.252@tcp
> # lctl list_nids
> 192.168.129.249@o2ib
> 10.100.129.249@tcp
> # lctl list_nids
> 192.168.129.251@o2ib
> 10.100.129.251@tcp
>
> Client:
> # lctl list_nids
> 10.100.135.131@tcp
>
> On the client:
> # mount -t lustre 10.100.129.250@tcp:/lustre_1 /lustre
> # df
> <HANGS>
>
> Jan 22 10:49:01 compute-f09-1 kernel: Lustre: Lustre: Build Version: 2.5.3-RC1--PRISTINE-2.6.32-504.3.3.el6.x86_64
> Jan 22 10:49:01 compute-f09-1 kernel: Lustre: client wants to enable acl, but mdt not!
> Jan 22 10:49:01 compute-f09-1 kernel: Lustre: Layout lock feature supported.
> Jan 22 10:49:01 compute-f09-1 kernel: Lustre: Mounted lustre_1-client
> Jan 22 10:49:06 compute-f09-1 kernel: Lustre: 5545:0:(client.c:1918:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1421941741/real 0] req@ffff8808042bac00 x1491013983010920/t0(0) o8->lustre_1-OST000a-osc-ffff880823397400@192.168.129.249@tcp:28/4 lens 400/544 e 0 to 1 dl 1421941746 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
> Jan 22 10:49:14 compute-f09-1 kernel: LustreError: 5630:0:(llite_lib.c:1624:ll_statfs_internal()) obd_statfs fails: rc = -5
>
> For some reason the client appears to be trying to connect to the 192.168 (IB)
> address, even though that address is not on any of the client's networks.
>
> Can someone please shed some light as to what I'm missing?
>
> Thanks,
> Kevin
> ---
> Kevin Hildebrand
> University of Maryland Division of IT
> _______________________________________________
> HPDD-discuss mailing list
> HPDD-discuss(a)lists.01.org
> https://lists.01.org/mailman/listinfo/hpdd-discuss
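
One detail in the client log quoted above is easy to miss: the timed-out request targets 192.168.129.249@tcp, while the OSS itself lists that address as 192.168.129.249@o2ib and its Ethernet NID as 10.100.129.249@tcp. The following is only a rough sketch of how that comparison could be automated. It assumes IPv4-style NIDs with simple network names, the function name classify_target and the three status strings are made up for illustration, and the log text is a fragment of the line from the thread. It flags the mismatch and does not say anything about its cause.

```python
import re

# Matches an IPv4-style Lustre NID such as 10.100.129.250@tcp or
# 192.168.129.249@o2ib. Other NID forms are not handled (assumption).
NID_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})@([a-z][a-z0-9]*)")

# Server NIDs as reported by `lctl list_nids` in the thread.
SERVER_NIDS = [
    "192.168.129.250@o2ib", "10.100.129.250@tcp",  # MDS/MGS
    "192.168.129.252@o2ib", "10.100.129.252@tcp",  # OSS
    "192.168.129.249@o2ib", "10.100.129.249@tcp",  # OSS
    "192.168.129.251@o2ib", "10.100.129.251@tcp",  # OSS
]

# Fragment of the ptlrpc_expire_one_request() line from the client log.
LOG_LINE = (
    "o8->lustre_1-OST000a-osc-ffff880823397400@192.168.129.249@tcp:28/4 "
    "lens 400/544"
)


def classify_target(log_line, server_nids):
    """Return (nid, status) for the first NID found in log_line, or None.

    status is "match" if the NID is one a server reports,
    "network-mismatch" if the address is known but on a different
    LNet network, and "unknown" otherwise. (Hypothetical helper.)
    """
    match = NID_RE.search(log_line)
    if match is None:
        return None
    addr, net = match.groups()

    # Build the set of (address, network) pairs the servers advertise.
    known = set()
    for nid in server_nids:
        m = NID_RE.fullmatch(nid)
        if m is None:
            raise ValueError(f"unrecognized NID: {nid!r}")
        known.add(m.groups())

    if (addr, net) in known:
        status = "match"
    elif any(a == addr for a, _ in known):
        status = "network-mismatch"
    else:
        status = "unknown"
    return f"{addr}@{net}", status


if __name__ == "__main__":
    print(classify_target(LOG_LINE, SERVER_NIDS))
```

Run against the fragment above, the helper reports the target as a known address paired with a network the server does not advertise for it, which matches what Kevin observed by eye.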