That helps quite a bit actually. I had configured the o2ib network
numbers to match the physical interface numbers, i.e. o2ib0=ib0,
o2ib1=ib1, etc. (at least one of these servers has 3 IB interfaces, so
the network numbers didn't line up). It wasn't readily apparent that
the o2ib network numbers needed to match among peers for them to
communicate.
A quick lnet test with updated module settings (so that the o2ib network
numbers match properly) has lctl ping working as expected. I suspect
that all the other errors were related to the peers' inability to
communicate.
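For the archives, the gist of the change was roughly this (illustrative
lines only; the interface layout follows my earlier per-port numbering,
not necessarily any one server's exact config):

    # Before: network numbers follow the local interface numbers,
    # so two peers on the same fabric can end up on different o2ib nets
    options lnet networks="o2ib0(ib0),o2ib2(ib2)"   # server using ib0/ib2
    options lnet networks="o2ib0(ib0),o2ib1(ib1)"   # server using ib0/ib1

    # After: network numbers follow the fabric, so every peer on a given
    # fabric shares the same o2ib network number regardless of which
    # local port it uses
    options lnet networks="o2ib0(ib0),o2ib1(ib2)"   # server using ib0/ib2
    options lnet networks="o2ib0(ib0),o2ib1(ib1)"   # server using ib0/ib1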
Thanks much!
-Ed
On 03/07/2013 03:28 PM, Kit Westneat wrote:
Hi Ed,
The way you have your networks named is a little weird. You should think
of each o2ib network as a separate fabric. Since you have two fabrics,
you should have two o2ib networks, say o2ib for the 172.16.1.x network
and o2ib1 for the 172.16.2.x network.
On the MDS, you could just have:
options lnet networks="o2ib(ib0), o2ib1(ib2)"
OSS:
options lnet networks="o2ib(ib0), o2ib1(ib1)"
Clients connected to 172.16.1.x network:
options lnet networks="o2ib(ib0)"
Clients connected to 172.16.2.x network:
options lnet networks="o2ib1(ib0)"
Then you'd refer to the clients on the 172.16.2.x network as
compute-x-y.ib@o2ib1.
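After reloading the modules with those settings, a quick sanity check
(hostnames here are just the ones from this thread) would look something
like this; each side should see both of the server's NIDs if the network
numbers agree:

    # on a 172.16.1.x client
    lctl ping mdt-3-40.ib@o2ib
    # on a 172.16.2.x client
    lctl ping 172.16.2.113@o2ib1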
HTH,
Kit
On 03/07/2013 02:02 PM, Edward Walter wrote:
> Hello list,
>
> We're attempting to set up a multi-rail lustre configuration so we can
> provide access to the same lustre filesystem from two different
> clusters. Both clusters are IB connected and we're using o2ib as the
> protocol. The lustre servers have multiple IB cards each and are
> connected to a separate IB switch on each cluster.
>
> On cluster 1: we've got the following (simplified) setup:
>
>> [root@mdt-3-40 ~]# lctl list_nids
>> 172.16.1.113@o2ib
>> 172.16.2.113@o2ib2
>> [root@oss-0-19 ~]# lctl list_nids
>> 172.16.1.103@o2ib
>> 172.16.2.103@o2ib1
> Clients from cluster 1 can ping these servers:
>
>> [root@compute-1-5 data]# ping -c 1 172.16.1.113
>> PING 172.16.1.113 (172.16.1.113) 56(84) bytes of data.
>> 64 bytes from 172.16.1.113: icmp_seq=1 ttl=64 time=2.04 ms
>>
>> --- 172.16.1.113 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 2.049/2.049/2.049/0.000 ms
>> [root@compute-1-5 data]# ping -c 1 172.16.1.103
>> PING 172.16.1.103 (172.16.1.103) 56(84) bytes of data.
>> 64 bytes from 172.16.1.103: icmp_seq=1 ttl=64 time=2.07 ms
>>
>> --- 172.16.1.103 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 2.079/2.079/2.079/0.000 ms
> Clients from cluster 1 can also ping them using lnet:
>
>> [root@compute-1-5 data]# lctl ping mdt-3-40.ib@o2ib0
>> 12345-0@lo
>> 12345-172.16.1.113@o2ib
>> 12345-172.16.2.113@o2ib2
>> [root@compute-1-5 data]# lctl ping oss-0-19.ib@o2ib0
>> 12345-0@lo
>> 12345-172.16.1.103@o2ib
>> 12345-172.16.2.103@o2ib1
> Finally clients from cluster1 can mount the lustre filesystem:
>
>> [root@compute-1-5 data]# lfs check servers
>> data-MDT0000-mdc-ffff81021df32800: active
>> data-OST0000-osc-ffff81021df32800: active
> On cluster 2: clients can ping the IPOIB addresses for these servers:
>
>> [root@compute-1-1 ~]# ping -c 1 172.16.2.113
>> PING 172.16.2.113 (172.16.2.113) 56(84) bytes of data.
>> 64 bytes from 172.16.2.113: icmp_seq=1 ttl=64 time=0.096 ms
>>
>> --- 172.16.2.113 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 0.096/0.096/0.096/0.000 ms
>> [root@compute-1-1 ~]# ping -c 1 172.16.2.103
>> PING 172.16.2.103 (172.16.2.103) 56(84) bytes of data.
>> 64 bytes from 172.16.2.103: icmp_seq=1 ttl=64 time=0.083 ms
>>
>> --- 172.16.2.103 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 0.083/0.083/0.083/0.000 ms
> Doing an lctl ping fails though:
>
>> [root@compute-1-1 ~]# lctl ping 172.16.2.103@o2ib
>> failed to ping 172.16.2.103@o2ib: Input/output error
>> [root@compute-1-1 ~]# lctl ping 172.16.2.113@o2ib
>> failed to ping 172.16.2.113@o2ib: Input/output error
> Pinging the client on cluster 2 (from itself) works though (so lnet is
> up and working):
>
>> [root@compute-1-1 ~]# lctl ping compute-1-1.ib@o2ib
>> 12345-0@lo
>> 12345-172.16.2.247@o2ib
> The net result of all of this is that I'm getting messages like these
> when I try to mount the lustre filesystem from cluster 2:
>
>> LustreError: 1998:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect())
>> Can't accept 172.16.2.247@o2ib on 172.16.1.113@o2ib
>> (ib2:1:172.16.2.113): bad dst nid 172.16.2.113@o2ib
>> [the same message repeated several more times]
> Here are the configs for the MDS server:
>
>> echo 'options lnet ip2nets="o2ib0(ib0),o2ib2(ib2) 172.16.[1-2].*"' > /etc/modprobe.d/lustre.conf
>> modprobe lnet
>> mkfs.lustre --reformat --mgs /dev/MDT340/mgs
>> mount -t lustre /dev/MDT340/mgs /lustre/mgs
>> mkfs.lustre --reformat --mdt --fsname=data
>> --mgsnode=mdt-3-40.ib@o2ib,mdt-3-40.coma-ib@o2ib2 /dev/MDT340/data
>> mount -t lustre /dev/MDT340/data /lustre/data
> These are the configs for the OSS server:
>
>> echo 'options lnet ip2nets="o2ib0(ib0),o2ib1(ib1) 172.16.[1-2].*"' > /etc/modprobe.d/lustre.conf
>> mkfs.lustre --reformat --fsname data
>> --mgsnode=mdt-3-40.ib@o2ib,mdt-3-40.coma-ib@o2ib1 --ost /dev/sdb1
>> mount -t lustre /dev/sdb1 /lustre/data-sdb1
> Here's the config for the non-working client:
>
>> [root@compute-1-1 ~]# cat /etc/modprobe.d/lustre.conf
>> options lnet networks=o2ib0(ib0)
>> [root@compute-1-1 ~]# mount -t lustre warp-mdt-3-40.ib@o2ib0:/data
>> /root/data
>> mount.lustre: mount warp-mdt-3-40.ib@o2ib0:/data at /root/data
>> failed: Invalid argument
>> This may have multiple causes.
>> Is 'data' the correct filesystem name?
>> Are the mount options correct?
>> Check the syslog for more info.
> Maybe I'm missing something obvious here. Any suggestions would be
> appreciated.
>
> Thanks much.
>
> -Ed Walter
> Carnegie Mellon University
>
>
> _______________________________________________
> HPDD-discuss mailing list
> HPDD-discuss@lists.01.org
> https://lists.01.org/mailman/listinfo/hpdd-discuss