The only message I see occurs on the MDS at the time I'm mounting the client:
Lustre: MGS: non-config logname received: params
Kevin
-----Original Message-----
From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rmohr@utk.edu]
Sent: Thursday, January 22, 2015 12:37 PM
To: Kevin M. Hildebrand
Cc: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Lustre networking issues, multi-homed servers
Are there any Lustre error messages on the server side?
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
On Jan 22, 2015, at 11:03 AM, Kevin M. Hildebrand <kevin(a)umd.edu> wrote:
Hello, I just upgraded a Lustre 1.8.7 installation to version 2.5.3.
The Lustre servers are connected via both IB and Ethernet. Some of the clients have both
networks, and some have Ethernet only.
I'm having a problem where the Ethernet-only clients appear to be attempting to
contact the servers via their IB addresses, and are failing to do so. As far as I can
tell, the NIDs are correct on servers and clients, so I'm not sure where things are
going wrong. The Ethernet networks are 10.100.* and the IB network is 192.168.* below.
MDS/MGS:
# lctl list_nids
192.168.129.250@o2ib
10.100.129.250@tcp
OSSes:
# lctl list_nids
192.168.129.252@o2ib
10.100.129.252@tcp
# lctl list_nids
192.168.129.249@o2ib
10.100.129.249@tcp
# lctl list_nids
192.168.129.251@o2ib
10.100.129.251@tcp
Client:
# lctl list_nids
10.100.135.131@tcp
On the client:
# mount -t lustre 10.100.129.250@tcp:/lustre_1 /lustre
# df
<HANGS>
Jan 22 10:49:01 compute-f09-1 kernel: Lustre: Lustre: Build Version: 2.5.3-RC1--PRISTINE-2.6.32-504.3.3.el6.x86_64
Jan 22 10:49:01 compute-f09-1 kernel: Lustre: client wants to enable acl, but mdt not!
Jan 22 10:49:01 compute-f09-1 kernel: Lustre: Layout lock feature supported.
Jan 22 10:49:01 compute-f09-1 kernel: Lustre: Mounted lustre_1-client
Jan 22 10:49:06 compute-f09-1 kernel: Lustre: 5545:0:(client.c:1918:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1421941741/real 0] req@ffff8808042bac00 x1491013983010920/t0(0) o8->lustre_1-OST000a-osc-ffff880823397400@192.168.129.249@tcp:28/4 lens 400/544 e 0 to 1 dl 1421941746 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Jan 22 10:49:14 compute-f09-1 kernel: LustreError: 5630:0:(llite_lib.c:1624:ll_statfs_internal()) obd_statfs fails: rc = -5
For some reason the client appears to be trying to connect to the 192.168 (IB) address,
even though it's not one of its networks.
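To make the mismatch concrete, here is a rough Python sketch (not part of Lustre; the helper names `target_nids` and `classify` are made up for illustration) that pulls the NID out of the timeout message above and compares it against the `lctl list_nids` output from the servers. It only assumes NIDs of the form `a.b.c.d@net`, as they appear in this message.

```python
import re

# NIDs reported by `lctl list_nids` on the MDS/MGS and the three OSSes,
# copied from the listing earlier in this message.
SERVER_NIDS = {
    "192.168.129.250@o2ib", "10.100.129.250@tcp",
    "192.168.129.252@o2ib", "10.100.129.252@tcp",
    "192.168.129.249@o2ib", "10.100.129.249@tcp",
    "192.168.129.251@o2ib", "10.100.129.251@tcp",
}

# The relevant part of the client-side timeout message.
LOG_LINE = (
    "o8->lustre_1-OST000a-osc-ffff880823397400@192.168.129.249@tcp:28/4 "
    "lens 400/544 e 0 to 1 dl 1421941746 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1"
)

# Matches an IPv4 address followed by "@" and an LNet network name.
NID_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})@([a-z0-9]+)")


def target_nids(log_line):
    """Return every ip@net NID found in a kernel log line."""
    return [f"{ip}@{net}" for ip, net in NID_RE.findall(log_line)]


def classify(nid, server_nids):
    """Say whether a NID matches what the servers actually advertise."""
    ip = nid.split("@")[0]
    if nid in server_nids:
        return "configured"
    if any(known.split("@")[0] == ip for known in server_nids):
        return "address known, but on a different LNet network"
    return "unknown"


for nid in target_nids(LOG_LINE):
    print(f"{nid}: {classify(nid, SERVER_NIDS)}")
```

Run against the log line above, this reports 192.168.129.249@tcp as an address that is known but on a different LNet network: the IB address is being paired with the tcp network, a combination that none of the servers lists.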
Can someone please shed some light on what I'm missing?
Thanks,
Kevin
---
Kevin Hildebrand
University of Maryland Division of IT