I see, this is indeed an issue. For me it worked at 4x QDR, so 40 Gbps. I can
only suggest checking and trying to fix it with the vendor's openib stack if
you can't get the Mellanox one to compile with your kernel.
Also be sure to check the changelogs on
https://wiki.hpdd.intel.com/display/PUB/HPDD+Wiki+Front+Page to find out which
version of OFED should work with your Lustre kernel. If it is a custom kernel,
this will not help. For the client kernels you have to check the Mellanox
OFED release notes.
Regards
Enrico
On Tuesday 07 May 2013 16:59:47 linux freaker wrote:
Just to let you know: when I use the default openib that comes with RHEL
6.3, it shows only 10 Gbps and not the 40 Gbps that QDR should show.
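For reference, the nominal InfiniBand signalling rate is the link width multiplied by the per-lane rate (2.5 Gbps for SDR, 5 for DDR, 10 for QDR), so a reported 10 Gbps would be consistent with a link that came up as 4x SDR or 1x QDR rather than 4x QDR. A minimal sketch of that arithmetic follows; the function name ib_rate_gbps is made up for illustration and is not part of any InfiniBand tool.

```shell
#!/bin/sh
# Nominal InfiniBand signalling rate = link width x per-lane rate.
# ib_rate_gbps is a hypothetical helper, not a real IB utility.
ib_rate_gbps() {
    width=$1   # number of lanes, e.g. 1 or 4
    speed=$2   # SDR, DDR or QDR
    # Per-lane rates in tenths of a Gbps, to stay in integer arithmetic.
    case "$speed" in
        SDR) tenths=25 ;;
        DDR) tenths=50 ;;
        QDR) tenths=100 ;;
        *) echo "unknown speed: $speed" >&2; return 1 ;;
    esac
    total=$((width * tenths))
    if [ $((total % 10)) -eq 0 ]; then
        echo $((total / 10))
    else
        echo "$((total / 10)).$((total % 10))"
    fi
}

ib_rate_gbps 4 QDR   # prints 40
ib_rate_gbps 4 SDR   # prints 10
ib_rate_gbps 1 QDR   # prints 10
```

Comparing this with the width and speed the port actually negotiated (ibstat prints a Rate line per port) may help tell a link that trained low from a driver problem.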
On Tue, May 7, 2013 at 4:57 PM, linux freaker <linuxfreaker(a)gmail.com> wrote:
Hi Enrico,
I did run mlnx_add_kernel_support.sh first, but it still didn't work.
Then I tried recompiling lustre-modules with o2ib, but no luck.
I can try it once more, but I really feel it's painful.
Can't I have 40 Gbps with openib (which comes by default in RHEL 6.3)?
On Tue, May 7, 2013 at 4:46 PM, Enrico Tagliavini <enrico.tagliavini(a)ichec.ie>
wrote:
Hi,
your procedure is correct, but you first need to recompile Mellanox OFED
against your kernel using mlnx_add_kernel_support.sh from the .iso.
This will generate a new iso which will install correctly against the lustre
kernel. After that you have to recompile (at least) lustre-modules using the
--with-o2ib=/usr/src/ofa_kernel configure switch, as you did.
I don't want to question your choice, but since this procedure is a pain [you
have to redo everything again for security updates] you should consider again
if you can work with the stock IB stack. On my side I never saw a performance
difference between the two, but if someone did, feel free to share :).
Kind regards
Enrico Tagliavini
On Tuesday 07 May 2013 16:34:41 linux freaker wrote:
Hi,
I installed RHEL 6.3, which by default comes with openib (InfiniBand support
software). I needed MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso to be
installed, so I extracted it and ran the ./mlnxofedinstall script. It spent a
while removing the old Mellanox packages and drivers but then couldn't install
because of the Lustre kernel. Out of curiosity, I installed it on the default
RHEL kernel, and that went fine. When I rebooted and tried to bring Lustre up
through lctl network up, it threw errors on the MDS:
[root@oss2 ~]# modprobe lustre
WARNING: Error inserting fld
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko):
Input/output error
WARNING: Error inserting fid
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko):
Input/output error
WARNING: Error inserting mdc
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko):
Input/output error
WARNING: Error inserting osc
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko):
Input/output error
WARNING: Error inserting lov
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko):
Input/output error
FATAL: Error inserting lustre
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko):
Input/output error
[root@oss2 ~]# modprobe lnet
[root@oss2 ~]# lctl network up
LNET configure error 100: Network is down
Now it seemed I needed to build OFED for Lustre, and I found this
thread:
http://thr3ads.net/lustre-discuss/2012/12/2164117-problem-with-installing...
It suggested the following steps, and I followed them line by line:
2. boot into the lustre kernel
3. in our /usr/src/lustre-2.1.2 directory built lustre against the
Mellanox "Module.symvers" information (which is why you see the
"Input/Output" errors on fid.ko, mdc.ko, osc.ko, lov.ko and, because of the
aforementioned items, the lustre.ko). The MLNX version 1.8.5 that we needed
was in the /usr/src/ofa_kernel directory (with the Module.symvers etc.). We
used the defaults other than the o2ib so our command in the
/usr/src/lustre-2.1.2 directory looked like
"./configure --with-o2ib=/usr/src/ofa_kernel"
4. next we issued "make"
5. next we chose to run a "make rpms" command so that we could have rpms for
our system for cluster re-building
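The quoted steps can be summarised as a dry run that only prints the commands rather than executing them. This is a sketch, not a tested build script: the two paths are the ones quoted above, while the names LUSTRE_SRC, OFA_KERNEL, has_symvers and print_rebuild_steps are made up for illustration.

```shell
#!/bin/sh
# Dry run of the rebuild steps quoted above: prints commands, runs nothing.
# Paths come from the quoted steps; the variable and function names are
# hypothetical. Adjust both paths to your own system.
LUSTRE_SRC=${LUSTRE_SRC:-/usr/src/lustre-2.1.2}
OFA_KERNEL=${OFA_KERNEL:-/usr/src/ofa_kernel}

# Succeeds if the given OFED source tree contains Module.symvers,
# which the Lustre build is pointed at via --with-o2ib.
has_symvers() {
    [ -f "$1/Module.symvers" ]
}

print_rebuild_steps() {
    echo "cd $LUSTRE_SRC"
    echo "./configure --with-o2ib=$OFA_KERNEL"
    echo "make"
    echo "make rpms"
}

if ! has_symvers "$OFA_KERNEL"; then
    echo "note: no Module.symvers found in $OFA_KERNEL" >&2
fi
print_rebuild_steps
```

Printing first makes it easy to check the paths while booted into the Lustre kernel before committing to a long build.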
But even these steps failed to bring Lustre up. modprobe lnet works, but lctl network up doesn't.
lctl list_nids
IOC_LIBCFS_GET_NI error 100: Network is down
My ifconfig shows:
ib0 Link encap:InfiniBand HWaddr
80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::202:c903:b:8b85/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
RX packets:2628 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1024
RX bytes:155819 (152.1 KiB) TX bytes:3503 (3.4 KiB)
uname -arn
Linux oss2 2.6.32-279.14.1.el6_lustre.x86_64 #1 SMP Fri Dec 14 23:22:17 PST
2012 x86_64 x86_64 x86_64 GNU/Linux
I also tried running the kernel support script from the MLNX directory, and it
did install the RPMs, but still no luck with lctl list_nids. Can anyone suggest
how to fix it?