Hi Ed,
The way you have your networks named is a little weird. You should think
of each o2ib network as a separate fabric. Since you have two fabrics,
you should have two o2ib networks, say o2ib for the 172.16.1.x network
and o2ib1 for the 172.16.2.x network.
On the MDS, you could just have:
options lnet networks="o2ib(ib0), o2ib1(ib2)"
OSS:
options lnet networks="o2ib(ib0), o2ib1(ib1)"
Clients connected to 172.16.1.x network:
options lnet networks="o2ib(ib0)"
Clients connected to 172.16.2.x network:
options lnet networks="o2ib1(ib0)"
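If it helps, those per-node files could also be collapsed into a single
modprobe config shared by every node using ip2nets. Just a sketch, assuming
the interface names from your mail; the server-specific rules go before the
client wildcard, since the first match for a given network wins:

```
# /etc/modprobe.d/lustre.conf (sketch -- same file on every node)
# o2ib = 172.16.1.x fabric, o2ib1 = 172.16.2.x fabric
options lnet ip2nets="o2ib0(ib0) 172.16.1.*; o2ib1(ib2) 172.16.2.113; o2ib1(ib1) 172.16.2.103; o2ib1(ib0) 172.16.2.*"
```

On the MDS the 172.16.2.113 rule selects ib2, on the OSS the 172.16.2.103
rule selects ib1, and the remaining 172.16.2.* clients pick up o2ib1 on ib0.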
Then you'd call the clients on the 172.16.2.x network compute-x-y.ib@o2ib1
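A cluster-2 client would then mount through the o2ib1 network too; a sketch,
reusing the MDS hostname from your configs (the mount point is just an
example):

```
# on a 172.16.2.x client (sketch): point at the MGS NID on the o2ib1 fabric
mount -t lustre mdt-3-40.coma-ib@o2ib1:/data /mnt/data
```

One caveat: if you rename the networks on the servers, the NIDs stored in
the targets' config logs go stale, so you'd likely need to refresh them
(tunefs.lustre --writeconf on the targets) before clients can connect.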
HTH,
Kit
On 03/07/2013 02:02 PM, Edward Walter wrote:
Hello list,
We're attempting to set up a multi-rail lustre configuration so we can
provide access to the same lustre filesystem from two different
clusters. Both clusters are IB-connected and we're using o2ib as the
protocol. The lustre servers each have multiple IB cards and are
connected to a separate IB switch on each cluster.
On cluster 1: we've got the following (simplified) setup:
> [root@mdt-3-40 ~]# lctl list_nids
> 172.16.1.113@o2ib
> 172.16.2.113@o2ib2
> [root@oss-0-19 ~]# lctl list_nids
> 172.16.1.103@o2ib
> 172.16.2.103@o2ib1
Clients from cluster 1 can ping these servers:
> [root@compute-1-5 data]# ping -c 1 172.16.1.113
> PING 172.16.1.113 (172.16.1.113) 56(84) bytes of data.
> 64 bytes from 172.16.1.113: icmp_seq=1 ttl=64 time=2.04 ms
>
> --- 172.16.1.113 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 2.049/2.049/2.049/0.000 ms
> [root@compute-1-5 data]# ping -c 1 172.16.1.103
> PING 172.16.1.103 (172.16.1.103) 56(84) bytes of data.
> 64 bytes from 172.16.1.103: icmp_seq=1 ttl=64 time=2.07 ms
>
> --- 172.16.1.103 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 2.079/2.079/2.079/0.000 ms
Clients from cluster 1 can also ping them using lnet:
> [root@compute-1-5 data]# lctl ping mdt-3-40.ib@o2ib0
> 12345-0@lo
> 12345-172.16.1.113@o2ib
> 12345-172.16.2.113@o2ib2
> [root@compute-1-5 data]# lctl ping oss-0-19.ib@o2ib0
> 12345-0@lo
> 12345-172.16.1.103@o2ib
> 12345-172.16.2.103@o2ib1
Finally, clients from cluster 1 can mount the lustre filesystem:
> [root@compute-1-5 data]# lfs check servers
> data-MDT0000-mdc-ffff81021df32800: active
> data-OST0000-osc-ffff81021df32800: active
On cluster 2: clients can ping the IPoIB addresses for these servers:
> [root@compute-1-1 ~]# ping -c 1 172.16.2.113
> PING 172.16.2.113 (172.16.2.113) 56(84) bytes of data.
> 64 bytes from 172.16.2.113: icmp_seq=1 ttl=64 time=0.096 ms
>
> --- 172.16.2.113 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.096/0.096/0.096/0.000 ms
> [root@compute-1-1 ~]# ping -c 1 172.16.2.103
> PING 172.16.2.103 (172.16.2.103) 56(84) bytes of data.
> 64 bytes from 172.16.2.103: icmp_seq=1 ttl=64 time=0.083 ms
>
> --- 172.16.2.103 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.083/0.083/0.083/0.000 ms
An lctl ping fails, though:
> [root@compute-1-1 ~]# lctl ping 172.16.2.103@o2ib
> failed to ping 172.16.2.103@o2ib: Input/output error
> [root@compute-1-1 ~]# lctl ping 172.16.2.113@o2ib
> failed to ping 172.16.2.113@o2ib: Input/output error
Pinging the cluster 2 client from itself succeeds, though (so lnet is
up and working):
> [root@compute-1-1 ~]# lctl ping compute-1-1.ib@o2ib
> 12345-0@lo
> 12345-172.16.2.247@o2ib
The net result of all of this is that I'm getting messages like these
when I try to mount the lustre filesystem from cluster 2:
> LustreError: 1998:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect()) Can't accept 172.16.2.247@o2ib on 172.16.1.113@o2ib (ib2:1:172.16.2.113): bad dst nid 172.16.2.113@o2ib
> (same message repeated several more times)
Here are the configs for the MDS server:
> echo 'options lnet ip2nets="o2ib0(ib0),o2ib2(ib2) 172.16.[1-2].*"' > /etc/modprobe.d/lustre.conf
> modprobe lnet
> mkfs.lustre --reformat --mgs /dev/MDT340/mgs
> mount -t lustre /dev/MDT340/mgs /lustre/mgs
> mkfs.lustre --reformat --mdt --fsname=data --mgsnode=mdt-3-40.ib@o2ib,mdt-3-40.coma-ib@o2ib2 /dev/MDT340/data
> mount -t lustre /dev/MDT340/data /lustre/data
These are the configs for the OSS server:
> echo 'options lnet ip2nets="o2ib0(ib0),o2ib1(ib1) 172.16.[1-2].*"' > /etc/modprobe.d/lustre.conf
> mkfs.lustre --reformat --fsname data --mgsnode=mdt-3-40.ib@o2ib,mdt-3-40.coma-ib@o2ib1 --ost /dev/sdb1
> mount -t lustre /dev/sdb1 /lustre/data-sdb1
Here's the config for the non-working client:
> [root@compute-1-1 ~]# cat /etc/modprobe.d/lustre.conf
> options lnet networks=o2ib0(ib0)
> [root@compute-1-1 ~]# mount -t lustre warp-mdt-3-40.ib@o2ib0:/data /root/data
> mount.lustre: mount warp-mdt-3-40.ib@o2ib0:/data at /root/data failed: Invalid argument
> This may have multiple causes.
> Is 'data' the correct filesystem name?
> Are the mount options correct?
> Check the syslog for more info.
Maybe I'm missing something obvious here. Any suggestions would be
appreciated.
Thanks much.
-Ed Walter
Carnegie Mellon University
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss