Hi,
I installed RHEL 6.3, which by default ships with openib (InfiniBand support
software). I needed to install MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso,
so I extracted it and ran the ./mlnxofedinstall script. It spent a while
removing the old Mellanox packages and drivers, but then could not install
because of the Lustre kernel. Out of curiosity, I installed it on the default
RHEL kernel, where it went fine. When I rebooted and tried to bring Lustre
up through lctl network up, it threw errors on the MDS:
[root@oss2 ~]# modprobe lustre
WARNING: Error inserting fld
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko):
Input/output error
WARNING: Error inserting fid
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko):
Input/output error
WARNING: Error inserting mdc
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko):
Input/output error
WARNING: Error inserting osc
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko):
Input/output error
WARNING: Error inserting lov
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko):
Input/output error
FATAL: Error inserting lustre
(/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko):
Input/output error
[root@oss2 ~]# modprobe lnet
[root@oss2 ~]# lctl network up
LNET configure error 100: Network is down
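A minimal sketch for narrowing down the Input/output errors above, assuming they come from a symbol mismatch between the Lustre modules and the installed OFED modules (that cause is an assumption, not something the output above proves). The helper name symbol_errors is made up for illustration; it only filters kernel log lines for the two standard module-loader messages:

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that report a
# module symbol problem. Reads the log text from standard input.
symbol_errors() {
    grep -E 'disagrees about version of symbol|Unknown symbol'
}

# Example use, right after a failed "modprobe lustre":
#   dmesg | symbol_errors
```

If this prints lines naming the Lustre modules, they were built against a different set of InfiniBand symbols than the one currently loaded.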
It now seemed that I needed to build OFED for Lustre, and I found this thread:
http://thr3ads.net/lustre-discuss/2012/12/2164117-problem-with-installing...
It suggested the following steps, and I followed them line by line:
2. boot into the lustre kernel
3. in our /usr/src/lustre-2.1.2 directory built lustre against the
Mellanox "Module.symvers" information (which is why you see the
"Input/Output" errors on fid.ko, mdc.ko, osc.ko, lov.ko and because of
the aforementioned items, the lustre.ko. The MLNX version 1.8.5 that
we needed was in the /usr/src/ofa_kernel directory (with the
Module.symvers etc....) We used the defaults other than the o2ib so
our command in the /usr/src/lustre-2.1.2 directory looked like
"./configure --with-o2ib=/usr/src/ofa_kernel"
4. next we issued "make"
5. next we chose to run a "make rpms" command so that we could have
rpms for our system for cluster re-building
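Steps 3 to 5 as quoted can be sketched as a small shell script. The paths and the configure option are the ones from the thread; the function names check_ofed_tree and build_lustre_o2ib and the Module.symvers pre-check are illustrative additions, not part of the original instructions:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the OFED source tree contains
# the Module.symvers file that the o2ib build resolves symbols from.
check_ofed_tree() {
    ofed_dir=$1
    if [ ! -f "$ofed_dir/Module.symvers" ]; then
        echo "missing $ofed_dir/Module.symvers" >&2
        return 1
    fi
}

# Hypothetical helper: configure and build Lustre against the OFED tree,
# then build RPMs. Runs in a subshell so the caller's directory is kept.
build_lustre_o2ib() {
    lustre_src=$1
    ofed_dir=$2
    check_ofed_tree "$ofed_dir" || return 1
    (
        cd "$lustre_src" &&
        ./configure --with-o2ib="$ofed_dir" &&
        make &&
        make rpms
    )
}

# Example invocation, run while booted into the Lustre kernel:
#   build_lustre_o2ib /usr/src/lustre-2.1.2 /usr/src/ofa_kernel
```

The pre-check is there only to fail early with a clear message if the OFED tree is not where the build expects it.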
But even this failed to bring my Lustre up: modprobe lnet works, but lctl
network up does not.
lctl list_nids
IOC_LIBCFS_GET_NI error 100: Network is down
My ifconfig shows:
ib0 Link encap:InfiniBand HWaddr
80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::202:c903:b:8b85/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
RX packets:2628 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1024
RX bytes:155819 (152.1 KiB) TX bytes:3503 (3.4 KiB)
uname -arn
Linux oss2 2.6.32-279.14.1.el6_lustre.x86_64 #1 SMP Fri Dec 14 23:22:17 PST
2012 x86_64 x86_64 x86_64 GNU/Linux
I also tried running the kernel support script in the MLNX directory, and it
did install an RPM, but I still have no luck with lctl list_nids. Can anyone
suggest how to fix this?