I knew I had it somewhere:
http://lists.lustre.org/pipermail/lustre-discuss/2012-November/016988.html
Mike
On Tue, Sep 30, 2014 at 10:32 AM, Mike Ware <charnobyl3000@gmail.com> wrote:
I had a similar issue using the Mellanox packages. If I remember correctly, I had to recompile the drivers against the Lustre kernel for the install. I believe Mellanox had an article on this, but I don't have the link.
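A quick way to see whether a module was built for the kernel that is actually running is to compare its vermagic string with the kernel release. This is only a rough sketch: the helper name built_for_kernel is made up for illustration, and it assumes the first word of the vermagic string is the kernel release the module was built for.

```sh
#!/bin/sh
# Sketch: report whether a module's vermagic matches a given kernel release.
# built_for_kernel is a hypothetical helper name, not part of Lustre or MLNX_OFED.
# Usage: built_for_kernel "<vermagic string>" "<kernel release>"
built_for_kernel() {
    # Keep only the first word of the vermagic string (the kernel release).
    built=${1%% *}
    [ "$built" = "$2" ]
}

# Example, to be run on the Lustre node itself (ib_core is the OFED core
# module, ko2iblnd is the Lustre o2ib LND module):
#   for m in ib_core ko2iblnd; do
#       built_for_kernel "$(modinfo -F vermagic "$m")" "$(uname -r)" \
#           && echo "$m: built for the running kernel" \
#           || echo "$m: built for a different kernel"
#   done
```

If either module reports a different kernel, that would point toward rebuilding it against the kernel in use.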
Mike
On Tue, Sep 30, 2014 at 8:07 AM, Parinay Kondekar <parinay.kondekar@seagate.com> wrote:
IMO you should try running the command under strace to see whether anything stands out.
"Write failed: Broken pipe" is quite a common message, and it is difficult to conclude anything from it alone.
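As a rough sketch of what that could look like: since the login session itself is lost when the error appears, it may help to write the trace to a file on local disk rather than to the terminal. The path in TRACE_FILE is only a placeholder, and trace_cmd is a made-up helper that prints the command instead of running it.

```sh
#!/bin/sh
# Sketch: build an strace command line for "lctl network up".
# TRACE_FILE is a hypothetical path; any local (non-Lustre) location will do.
TRACE_FILE=${TRACE_FILE:-/var/tmp/lctl-network-up.trace}

trace_cmd() {
    # -f follows child processes, -tt adds timestamps, -o writes to a file
    echo "strace -f -tt -o $TRACE_FILE lctl network up"
}

# Print the command so it can be reviewed before running it as root.
trace_cmd
```

Running the printed command from a console session, if one is available, avoids losing the output when the remote session drops.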
Regards
parinay
On Tue, Sep 30, 2014 at 8:16 PM, aayush agrawal <aayush.agrawal@calsoftinc.com> wrote:
Hi Parinay,
Yes, I see ib0 in the output of ifconfig -a.
I also tried options lnet networks=o2ib0(ib0), but no luck.
While loading lnet I do see errors in /var/log/messages:
kernel: LNet: HW CPU cores: 32, npartitions: 4
alg: No test for crc32 (crc32-table)
kernel: alg: No test for adler32 (adler32-zlib)
kernel: alg: No test for crc32 (crc32-pclmul)
kernel: padlock: VIA PadLock Hash Engine not detected.
modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32_358/kernel/drivers/crypto/padlock-sha.ko): No such device
But according to the link below, this should not be a problem?
https://jira.hpdd.intel.com/browse/LU-1599
modprobe lnet completes successfully, but I see "Write failed: Broken pipe" after running "lctl network up", and after this my session gets logged out from the server.
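To make the remaining messages easier to read, the crypto self-test and padlock lines can be filtered out of the log. This is a minimal sketch that assumes, as the LU-1599 reference suggests, that those lines are harmless; drop_crypto_noise is a made-up helper name.

```sh
#!/bin/sh
# Sketch: drop the "alg: No test for ..." and padlock lines from a log stream
# so that LNet messages stand out. Reads stdin, writes stdout.
drop_crypto_noise() {
    grep -v -E 'alg: No test for|padlock'
}

# Example, on the server:
#   drop_crypto_noise < /var/log/messages | tail -n 50
```

Whatever is left around the time of "lctl network up" is more likely to be relevant to the failure.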
Thanks,
Aayush.
On 9/30/2014 7:21 PM, Parinay Kondekar wrote:
- What is the output of 'ifconfig -a'? Do you see ib0 there? Specifying 'options lnet networks=o2ib0(ib0)' should be enough.
- Anything in syslog?
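The first check can also be scripted. The sketch below reads the interface state from sysfs instead of parsing ifconfig output; iface_state is a made-up helper name, and the optional second argument exists only so the function can be tried against a scratch directory.

```sh
#!/bin/sh
# Sketch: report the state of a network interface from sysfs.
# $1 = interface name, $2 = sysfs root (defaults to /sys/class/net)
iface_state() {
    root=${2:-/sys/class/net}
    if [ ! -d "$root/$1" ]; then
        echo "missing"                  # no such interface
    elif [ -r "$root/$1/operstate" ]; then
        cat "$root/$1/operstate"        # e.g. up, down, unknown
    else
        echo "unknown"
    fi
}

# Example, on the server:
#   iface_state ib0
#   grep -rh '^options lnet' /etc/modprobe.d/
```

The grep in the example shows which lnet options line modprobe will actually pick up.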
HTH
On Tue, Sep 30, 2014 at 6:03 PM, aayush agrawal <aayush.agrawal@calsoftinc.com> wrote:
Hi,
I am trying to build Lustre 2.5.0 against MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on CentOS 6.4 with kernel version 2.6.32-358.
But I am not able to get the lnet configuration right. I used the settings suggested in the Lustre 2.x manual, but I am still not able to bring the network up using lctl.
Details:
I have two server machines, one for mgs+mdt and a second for the oss, plus one client machine. I want to set up InfiniBand on all of these machines.
I could run the steps below successfully on all three machines:
1. Run script mlnxofedinstall
# ./mlnxofedinstall -vvv --add-kernel-support --without-32bit --without-fw-update --hpc
2. Restart openibd service
# /etc/init.d/openibd restart
3. Configure the ib0 interface.
4. Configure Lustre with o2ib
# ./configure --with-linux=Path_to_linux-2.6.32-358.18.1.el6 --with-o2ib=/usr/src/ofa_kernel/default/
5. Make the Lustre rpms:
# make rpms
This gave me a compilation error.
I looked online for this error and found a bug registered for it: https://jira.hpdd.intel.com/browse/LU-4266
The patch below, from the above link, solved the problem, and so I could build the Lustre rpms:
http://review.whamcloud.com/#/c/8451/1
Now, first I want to do the InfiniBand setup for the mgs and mdt on a single machine, which also has an Ethernet IP. Then I want to format and mount the mgs and mdt.
So I installed the Lustre rpms created above and then added the line below to /etc/modprobe.d/lustre.conf:
options lnet networks=o2ib(ib0)
Then I rebooted the machine to remove all Lustre-related modules, including lnet, ran modprobe lnet to pick up the above parameters, and then ran lctl network up, which gives me the error below:
LNET configure error 100: Network is down
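For repeated attempts, the reload sequence can be written down as a small script rather than rebooting each time. This is only a sketch: it assumes the lustre_rmmod script was installed by the Lustre rpms, reload_lnet is a made-up helper name, and by default it only prints the commands.

```sh
#!/bin/sh
# Sketch: reload LNet after editing /etc/modprobe.d/lustre.conf.
# By default the commands are only printed; run with RUN= (empty) as root
# to execute them for real.
reload_lnet() {
    run=${RUN-echo}
    $run lustre_rmmod        # unload Lustre and LNet modules (assumed installed)
    $run modprobe lnet       # re-read the options from /etc/modprobe.d
    $run lctl network up     # same command as above
    $run lctl list_nids      # lists the local NIDs if the network came up
}

reload_lnet
```

If the network comes up, the last command would be expected to show a NID ending in @o2ib.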
I looked online and found the discussion below on the same error:
http://lists.lustre.org/pipermail/lustre-discuss/2010-June/013510.html
As suggested in the above mail, I tried the line below in /etc/modprobe.d/lustre.conf. Here IB_IP stands for the InfiniBand IP address.
options lnet networks=o2ib(ib0) routes="tcp0 IB_IP@o2ib"
With this setting, the command hangs for around 2 to 3 minutes and then gives the error: Write failed: Broken pipe. The same happens with "options lnet networks=o2ib(ib0)".
But if I set: options lnet networks=tcp0(eth0),o2ib(ib0) routes="tcp1 IB_IP@o2ib" then it gives: LNET configure error 100: Network is down.
It seems that with networks=o2ib(ib0) I get the error Write failed: Broken pipe.
Am I missing anything in the above steps? Or how do I resolve the above error?
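Since several variants of the networks= value are in play, a small check that every interface named in it really exists can rule out a typo (for example ibo instead of ib0). This is a rough sketch; lnet_ifaces and missing_ifaces are made-up helper names, and grep -o is assumed to be available (it is in GNU grep).

```sh
#!/bin/sh
# Sketch: list the interfaces named in an lnet "networks" value and report
# any that do not exist on this machine.

# Print each interface found in parentheses, one per line.
lnet_ifaces() {
    printf '%s\n' "$1" | grep -o '([^)]*)' | tr -d '()' | tr ',' '\n'
}

# $1 = networks value, $2 = sysfs root (defaults to /sys/class/net)
missing_ifaces() {
    root=${2:-/sys/class/net}
    for ifc in $(lnet_ifaces "$1"); do
        [ -d "$root/$ifc" ] || echo "$ifc"
    done
}

# Example, on the server:
#   missing_ifaces 'tcp0(eth0),o2ib(ib0)'
```

No output means every interface in the value was found under /sys/class/net.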
Thanks,
Aayush.
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss