Hello list,
We're attempting to set up a multi-rail Lustre configuration so we can
provide access to the same Lustre filesystem from two different
clusters. Both clusters are IB-connected, and we're using o2ib as the
protocol. The Lustre servers each have multiple IB cards and are
connected to a separate IB switch on each cluster.
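To summarize the addressing (my own summary, pulled together from the configs further down):

```shell
# Rail to cluster 1 (172.16.1.x):  mdt-3-40 = 172.16.1.113@o2ib  (ib0)
#                                  oss-0-19 = 172.16.1.103@o2ib  (ib0)
# Rail to cluster 2 (172.16.2.x):  mdt-3-40 = 172.16.2.113@o2ib2 (ib2)
#                                  oss-0-19 = 172.16.2.103@o2ib1 (ib1)
```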
On cluster 1, we've got the following (simplified) setup:
[root@mdt-3-40 ~]# lctl list_nids
172.16.1.113@o2ib
172.16.2.113@o2ib2
[root@oss-0-19 ~]# lctl list_nids
172.16.1.103@o2ib
172.16.2.103@o2ib1
Clients from cluster 1 can ping these servers:
[root@compute-1-5 data]# ping -c 1 172.16.1.113
PING 172.16.1.113 (172.16.1.113) 56(84) bytes of data.
64 bytes from 172.16.1.113: icmp_seq=1 ttl=64 time=2.04 ms
--- 172.16.1.113 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.049/2.049/2.049/0.000 ms
[root@compute-1-5 data]# ping -c 1 172.16.1.103
PING 172.16.1.103 (172.16.1.103) 56(84) bytes of data.
64 bytes from 172.16.1.103: icmp_seq=1 ttl=64 time=2.07 ms
--- 172.16.1.103 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.079/2.079/2.079/0.000 ms
Clients from cluster 1 can also ping them via LNet:
[root@compute-1-5 data]# lctl ping mdt-3-40.ib@o2ib0
12345-0@lo
12345-172.16.1.113@o2ib
12345-172.16.2.113@o2ib2
[root@compute-1-5 data]# lctl ping oss-0-19.ib@o2ib0
12345-0@lo
12345-172.16.1.103@o2ib
12345-172.16.2.103@o2ib1
Finally, clients from cluster 1 can mount the Lustre filesystem:
[root@compute-1-5 data]# lfs check servers
data-MDT0000-mdc-ffff81021df32800: active
data-OST0000-osc-ffff81021df32800: active
On cluster 2, clients can ping the IPoIB addresses of these servers:
[root@compute-1-1 ~]# ping -c 1 172.16.2.113
PING 172.16.2.113 (172.16.2.113) 56(84) bytes of data.
64 bytes from 172.16.2.113: icmp_seq=1 ttl=64 time=0.096 ms
--- 172.16.2.113 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.096/0.096/0.096/0.000 ms
[root@compute-1-1 ~]# ping -c 1 172.16.2.103
PING 172.16.2.103 (172.16.2.103) 56(84) bytes of data.
64 bytes from 172.16.2.103: icmp_seq=1 ttl=64 time=0.083 ms
--- 172.16.2.103 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.083/0.083/0.083/0.000 ms
An lctl ping fails, though:
[root@compute-1-1 ~]# lctl ping 172.16.2.103@o2ib
failed to ping 172.16.2.103@o2ib: Input/output error
[root@compute-1-1 ~]# lctl ping 172.16.2.113@o2ib
failed to ping 172.16.2.113@o2ib: Input/output error
Pinging the cluster 2 client from itself works, though, so LNet itself
is up and working:
[root@compute-1-1 ~]# lctl ping compute-1-1.ib@o2ib
12345-0@lo
12345-172.16.2.247@o2ib
The net result of all of this is that I get messages like this when I
try to mount the Lustre filesystem from cluster 2:
LustreError: 1998:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect()) Can't accept 172.16.2.247@o2ib on 172.16.1.113@o2ib (ib2:1:172.16.2.113): bad dst nid 172.16.2.113@o2ib
(this message repeats many times)
Here are the configs for the MDS server:
echo 'options lnet ip2nets="o2ib0(ib0),o2ib2(ib2) 172.16.[1-2].*"' > /etc/modprobe.d/lustre.conf
modprobe lnet
mkfs.lustre --reformat --mgs /dev/MDT340/mgs
mount -t lustre /dev/MDT340/mgs /lustre/mgs
mkfs.lustre --reformat --mdt --fsname=data --mgsnode=mdt-3-40.ib@o2ib,mdt-3-40.coma-ib@o2ib2 /dev/MDT340/data
mount -t lustre /dev/MDT340/data /lustre/data
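For what it's worth, here is my reading of that ip2nets rule (based on my understanding of the LNet module-parameter syntax; corrections welcome):

```shell
# options lnet ip2nets="o2ib0(ib0),o2ib2(ib2) 172.16.[1-2].*"
#
# As I read it: on any host whose IPoIB address matches 172.16.1.* or
# 172.16.2.*, start LNet network o2ib0 on interface ib0 and o2ib2 on
# interface ib2.
```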
These are the configs for the OSS server:
echo 'options lnet ip2nets="o2ib0(ib0),o2ib1(ib1) 172.16.[1-2].*"' > /etc/modprobe.d/lustre.conf
mkfs.lustre --reformat --fsname data --mgsnode=mdt-3-40.ib@o2ib,mdt-3-40.coma-ib@o2ib1 --ost /dev/sdb1
mount -t lustre /dev/sdb1 /lustre/data-sdb1
Here's the config for the non-working client:
[root@compute-1-1 ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks=o2ib0(ib0)
[root@compute-1-1 ~]# mount -t lustre warp-mdt-3-40.ib@o2ib0:/data /root/data
mount.lustre: mount warp-mdt-3-40.ib@o2ib0:/data at /root/data failed: Invalid argument
This may have multiple causes.
Is 'data' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.
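For completeness, here's the explicit-NID variant of the mount I believe should be equivalent (my guess at the syntax per my reading of mount.lustre; comma-separated NIDs are, as I understand it, alternate NIDs for the same MGS node):

```shell
# Hypothetical: name the MGS by NID instead of hostname, listing both of
# its NIDs (addresses as shown above).
mount -t lustre 172.16.1.113@o2ib0,172.16.2.113@o2ib2:/data /root/data
```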
Maybe I'm missing something obvious here. Any suggestions would be
appreciated.
Thanks much.
-Ed Walter
Carnegie Mellon University