Just to let you know: when I use the default openib stack that ships with
RHEL 6.3, it reports only 10 Gbps, not the 40 Gbps that QDR should show.
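
For anyone checking the same thing: the kernel exposes the negotiated port rate in sysfs, so the figure can be read without vendor tools. Below is a minimal sketch, assuming the usual /sys/class/infiniband layout; the function names rate_gbps and show_ib_rates are made up for illustration. A port that negotiated only 1X width at QDR speed, or 4X width at SDR speed, reports 10 Gb/sec, so the width and speed shown in parentheses are worth reading too.

```shell
#!/bin/sh
# Extract the numeric rate from a sysfs rate string such as
# "40 Gb/sec (4X QDR)". rate_gbps is a hypothetical helper name.
rate_gbps() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Print the rate of every InfiniBand port the kernel knows about.
# Assumes the standard /sys/class/infiniband/<dev>/ports/<n>/rate files.
show_ib_rates() {
    for f in /sys/class/infiniband/*/ports/*/rate; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

show_ib_rates
```

If a port reports 10 Gb/sec here, the cable, the switch port, and the negotiated link width are worth checking before blaming the software stack.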
On Tue, May 7, 2013 at 4:57 PM, linux freaker <linuxfreaker(a)gmail.com> wrote:
Hi Enrico,
I did run mlnx_add_kernel_support.sh first, but it still didn't work.
Then I tried recompiling lustre-modules with o2ib, but no luck.
I can try it once more, but I really find it painful.
Can't I get 40 Gbps with openib (which comes by default in RHEL 6.3)?
On Tue, May 7, 2013 at 4:46 PM, Enrico Tagliavini <
enrico.tagliavini(a)ichec.ie> wrote:
> Hi,
>
> your procedure is correct, but you first need to recompile Mellanox OFED
> against your kernel using mlnx_add_kernel_support.sh from the .iso.
>
>
>
> This will generate a new iso which will install correctly against the
> lustre kernel. After that you have to recompile (at least) lustre-modules
> using --with-o2ib=/usr/src/ofa_kernel configure switch as you did.
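
A small pre-flight check can save a failed build at this point. The sketch below only verifies that the OFED source tree passed to --with-o2ib contains a Module.symvers file, and prints the running kernel so it can be compared with the kernel OFED was rebuilt for. The function name have_ofed_symvers is invented for illustration, and /usr/src/ofa_kernel is simply the path quoted in this thread.

```shell
#!/bin/sh
# Return success if the given OFED kernel source tree has a Module.symvers
# file, which the thread describes Lustre's o2ib build as relying on.
have_ofed_symvers() {
    [ -f "$1/Module.symvers" ]
}

# Path taken from this thread; override with OFA_DIR=... if yours differs.
OFA_DIR=${OFA_DIR:-/usr/src/ofa_kernel}

echo "running kernel: $(uname -r)"
if have_ofed_symvers "$OFA_DIR"; then
    echo "ok: $OFA_DIR/Module.symvers found"
else
    echo "missing: $OFA_DIR/Module.symvers; rebuild OFED for this kernel first"
fi
```

Running it once on the Lustre kernel before ./configure gives a quick yes/no answer on whether the OFED rebuild actually landed where Lustre expects it.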
>
>
>
> I don't want to question your choice, but since this procedure is a pain
> [you have to redo everything again for security updates] you should
> consider again if you can work with the stock IB stack. On my side I never
> saw a performance difference between the two, but if someone did, feel free
> to share :).
>
>
>
> Kind regards
>
> Enrico Tagliavini
>
>
>
> On Tuesday 07 May 2013 16:34:41 linux freaker wrote:
>
> Hi,
>
>
> I installed RHEL 6.3, which by default comes with openib (InfiniBand
> support software). I needed MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso
> installed, so I extracted it and ran the ./mlnxofedinstall script. It
> spent a while removing the old Mellanox packages and drivers but couldn't
> install because of the Lustre kernel. Out of curiosity, I installed it on
> the default RHEL kernel, where it went fine. When I rebooted and tried to
> bring Lustre up with lctl network up, it threw errors on the MDS:
>
>
> [root@oss2 ~]# modprobe lustre
>
> WARNING: Error inserting fld
> (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko):
> Input/output error
>
> WARNING: Error inserting fid
> (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko):
> Input/output error
>
> WARNING: Error inserting mdc
> (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko):
> Input/output error
>
> WARNING: Error inserting osc
> (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko):
> Input/output error
>
> WARNING: Error inserting lov
> (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko):
> Input/output error
>
> FATAL: Error inserting lustre
> (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko):
> Input/output error
>
> [root@oss2 ~]# modprobe lnet
>
> [root@oss2 ~]# lctl network up
>
> LNET configure error 100: Network is down
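
modprobe's "Input/output error" says little by itself; the kernel log usually carries the actual reason, for example unresolved or version-mismatched symbols when the Lustre modules were built against a different IB stack than the one now loaded. Here is a sketch that filters those lines out of the kernel log; symbol_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only kernel log lines that report symbol resolution problems.
# Reads the log on stdin so it can be fed from dmesg or a saved file.
symbol_errors() {
    grep -E 'Unknown symbol|disagrees about version of symbol' || true
}

# Typical use right after a failed "modprobe lustre" (dmesg may need root).
dmesg 2>/dev/null | symbol_errors
```

If this prints lines naming IB-related symbols, the Lustre modules and the installed OFED do not match and one of them needs rebuilding against the other.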
>
>
> It then seemed I needed to build OFED for Lustre, and I found this thread:
>
> http://thr3ads.net/lustre-discuss/2012/12/2164117-problem-with-installing...
>
>
> It suggested the following steps, and I followed them line by line:
>
>
> 2. boot into the lustre kernel
>
> 3. in our /usr/src/lustre-2.1.2 directory built lustre against the
> Mellanox "Module.symvers" information (which is why you see the
> "Input/Output" errors on fid.ko, mdc.ko, osc.ko, lov.ko and, because of
> the aforementioned items, the lustre.ko). The MLNX version 1.8.5 that we
> needed was in the /usr/src/ofa_kernel directory (with the
> Module.symvers etc.). We used the defaults other than the o2ib so
> our command in the /usr/src/lustre-2.1.2 directory looked like
> "./configure --with-o2ib=/usr/src/ofa_kernel"
>
> 4. next we issued "make"
>
> 5. next we chose to run a "make rpms" command so that we could have rpms
> for our system for cluster re-building
>
>
> But even this failed to get my Lustre up. modprobe lnet works, but
> lctl network up doesn't.
>
>
> lctl list_nids
>
> IOC_LIBCFS_GET_NI error 100: Network is down
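
"Network is down" from lctl usually means LNet could not start any of its configured networks, so it is worth confirming which network the lnet module was told to use. The sketch below assumes the option lives in a modprobe configuration file such as /etc/modprobe.d/lustre.conf (a common convention, not something stated in this thread) and looks like "options lnet networks=o2ib0(ib0)". lnet_networks is a hypothetical helper name.

```shell
#!/bin/sh
# Print the value of the "networks" option from "options lnet ..." lines
# in a modprobe configuration file.
lnet_networks() {
    sed -n 's/^[[:space:]]*options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*/\1/p' "$1"
}

# Assumed location; override with CONF=... if your options live elsewhere.
CONF=${CONF:-/etc/modprobe.d/lustre.conf}

if [ -r "$CONF" ]; then
    lnet_networks "$CONF"
else
    echo "no $CONF found; check the other files under /etc/modprobe.d"
fi
```

An o2ib entry naming ib0 would match the interface in the ifconfig output; an empty result or a tcp entry would be a different problem from the OFED build.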
>
>
>
>
> My ifconfig shows:
>
>
> ib0 Link encap:InfiniBand HWaddr
> 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>
> inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
>
> inet6 addr: fe80::202:c903:b:8b85/64 Scope:Link
>
> UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
>
> RX packets:2628 errors:0 dropped:0 overruns:0 frame:0
>
> TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
>
> collisions:0 txqueuelen:1024
>
> RX bytes:155819 (152.1 KiB) TX bytes:3503 (3.4 KiB)
>
>
> uname -arn
>
> Linux oss2 2.6.32-279.14.1.el6_lustre.x86_64 #1 SMP Fri Dec 14 23:22:17
> PST 2012 x86_64 x86_64 x86_64 GNU/Linux
>
>
>
>
> I did try running the kernel support script from the MLNX directory, and it
> did install the RPMs, but still no luck with lctl list_nids. Can anyone
> suggest how to fix it?