I would start by checking the network layer to make sure that everything is set up and
functioning as expected.
1) Make sure all nodes have the correct IP address configured
2) Run "lctl list_nids" on all nodes to make sure that the LNET layer has set up
all the interfaces
3) Run "ping $IPADDR" and "lctl ping $IPADDR@tcp0" tests between nodes
to check that communication is working
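
The connectivity checks above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and it assumes every node is on the tcp0 network, as in the commands above.

```shell
#!/bin/sh
# Sketch of checks 2) and 3). Function names are illustrative assumptions,
# not part of any Lustre tooling.

# Build the LNET NID for an IP address on the tcp0 network.
nid_for() {
    printf '%s@tcp0\n' "$1"
}

# Run the IP-level and LNET-level ping against one node.
# Returns non-zero as soon as one of them fails.
check_node() {
    ip=$1
    ping -c 1 "$ip" >/dev/null 2>&1 || { echo "IP ping failed: $ip"; return 1; }
    lctl ping "$(nid_for "$ip")" >/dev/null 2>&1 || { echo "LNET ping failed: $(nid_for "$ip")"; return 1; }
    echo "ok: $ip"
}

# Show the local NIDs, then test every IP address given as an argument.
check_all() {
    lctl list_nids
    for node in "$@"; do
        check_node "$node"
    done
}
```

After sourcing the file on a node, something like `check_all 132.65.56.201` would test the MGS address that appears in the client log below; add the other nodes' addresses as further arguments.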
You may have already checked those things, and it may seem like I am pointing out the
obvious. But I have seen several Lustre problems that ended up being caused by simple
mistakes, such as an IP address configured on the wrong interface or a typo in a modprobe
config file. So the above checks are almost always my starting point whenever I encounter
a Lustre problem.
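
One way to catch the "wrong interface" case is to compare the interface named in the modprobe config with the interface that actually holds the node's IP address. The sketch below assumes the config uses the usual `options lnet networks=tcp0(eth0)` form; `eth0`, the /24 prefix, and the function names are assumptions made for illustration, and only the first network in the option is examined.

```shell
#!/bin/sh
# Sketch only: both functions read their input from stdin.

# Print the interface named in an "options lnet networks=tcp0(eth0)" line.
lnet_iface() {
    sed -n 's/^options[[:space:]]*lnet[[:space:]].*networks="\{0,1\}[^(]*(\([^)]*\)).*/\1/p'
}

# Given "ip -o -4 addr show" output, print the interface that holds IP $1.
iface_for_ip() {
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Example comparison on a live node (not run here):
#   conf_if=$(lnet_iface < /etc/modprobe.d/lustre.conf)
#   real_if=$(ip -o -4 addr show | iface_for_ip 132.65.56.201)
#   [ "$conf_if" = "$real_if" ] || echo "mismatch: $conf_if vs $real_if"
```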
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
On Mar 18, 2014, at 7:51 AM, Dennis Zheleznyak <dennis(a)eshkol.com.co> wrote:
Hi all,
I mounted the MGS/MDT and the OSTs successfully using "service lustre start".
When I try to mount the filesystem on the client, I receive an error (shown below).
The nodes are:
hms-01 - MGS(/dev/sda5), MDT(/dev/sdb)
hms-03 - OST(/dev/sda5-14)
hms-05 - client
All have lnet module loaded.
All have /etc/modprobe/lustre.conf updated.
Command:
mkdir -p /mnt/lustre/fs01
mount.lustre hms-01@tcp0:/fs01 /mnt/lustre/fs01/
Error:
[root@hms-05 ~]# mount.lustre hms-01@tcp0:/fs01 /mnt/lustre/fs01/
mount.lustre: mount hms-01@tcp0:/fs01 at /mnt/lustre/fs01 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
/var/log/messages:
Mar 18 13:19:41 hms-05 kernel: Lustre: Lustre: Build Version: 2.4.3-RC1--PRISTINE-2.6.32-358.23.2.el6.x86_64
Mar 18 13:19:41 hms-05 kernel: LustreError: 1211:0:(ldlm_lib.c:429:client_obd_setup()) can't add initial connection
Mar 18 13:19:41 hms-05 kernel: LustreError: 1211:0:(obd_config.c:572:class_setup()) setup fs01-OST0005-osc-ffff88001cd24c00 failed (-2)
Mar 18 13:19:41 hms-05 kernel: LustreError: 1211:0:(obd_config.c:1553:class_config_llog_handler()) MGC132.65.56.201@tcp: cfg command failed: rc = -2
Mar 18 13:19:41 hms-05 kernel: Lustre: cmd=cf003 0:fs01-OST0005-osc 1:fs01-OST0005_UUID 2:0@<0:0>
Mar 18 13:19:41 hms-05 kernel: LustreError: 15c-8: MGC132.65.56.201@tcp: The configuration from log 'fs01-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Mar 18 13:19:41 hms-05 kernel: LustreError: 1191:0:(llite_lib.c:1042:ll_fill_super()) Unable to process log: -2
Mar 18 13:19:41 hms-05 kernel: LustreError: 1191:0:(obd_config.c:619:class_cleanup()) Device 4 not setup
Mar 18 13:19:41 hms-05 kernel: Lustre: Unmounted fs01-client
Mar 18 13:19:41 hms-05 kernel: LustreError: 1191:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount (-2)
Mar 18 13:21:22 hms-05 kernel: LustreError: 1242:0:(ldlm_lib.c:429:client_obd_setup()) can't add initial connection
Mar 18 13:21:22 hms-05 kernel: LustreError: 1242:0:(obd_config.c:572:class_setup()) setup fs01-OST0005-osc-ffff88001f9b5000 failed (-2)
Mar 18 13:21:22 hms-05 kernel: LustreError: 1242:0:(obd_config.c:1553:class_config_llog_handler()) MGC132.65.56.201@tcp: cfg command failed: rc = -2
Mar 18 13:21:22 hms-05 kernel: Lustre: cmd=cf003 0:fs01-OST0005-osc 1:fs01-OST0005_UUID 2:0@<0:0>
Mar 18 13:21:22 hms-05 kernel: LustreError: 15c-8: MGC132.65.56.201@tcp: The configuration from log 'fs01-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Mar 18 13:21:22 hms-05 kernel: LustreError: 1234:0:(llite_lib.c:1042:ll_fill_super()) Unable to process log: -2
Mar 18 13:21:22 hms-05 kernel: LustreError: 1234:0:(obd_config.c:619:class_cleanup()) Device 4 not setup
Mar 18 13:21:22 hms-05 kernel: Lustre: Unmounted fs01-client
Mar 18 13:21:22 hms-05 kernel: LustreError: 1234:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount (-2)
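
To pick the failing device out of a log like the one above, a small filter can help. This is a sketch with a made-up function name; it assumes each entry sits on a single line, as it does in /var/log/messages itself. The -2 return code is ENOENT, which is why mount.lustre reports "No such file or directory".

```shell
#!/bin/sh
# Sketch: read syslog lines on stdin and print "<device> <return code>"
# for every class_setup() failure. The function name is illustrative.
failed_setups() {
    sed -n 's/.*setup \([^ ]*\) failed (\(-[0-9]*\)).*/\1 \2/p'
}

# Example on the client (not run here):
#   failed_setups < /var/log/messages
```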
I tried the following:
• tunefs.lustre /dev/sdb --writeconf # For the MDT.
• tunefs.lustre /dev/sda5 to 14 --writeconf # For the OSTs (once per device).
• /etc/hosts and /etc/modprobe.d/lustre.conf are identical across all the machines.
• Ping works between all the machines.
• All machines can ssh to each other without a password.
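
For reference, the per-device writeconf step for /dev/sda5 through /dev/sda14 can be written as a loop. The sketch below only prints the commands rather than running them, and the function name is made up for illustration. It uses the option-before-device order given in the tunefs.lustre documentation. A writeconf is normally done with all targets and clients unmounted, and the MGS/MDT is remounted before the OSTs.

```shell
#!/bin/sh
# Dry run: print (do not execute) one writeconf command per OST partition.
# Arguments are the first and last partition numbers on /dev/sda.
writeconf_cmds() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "tunefs.lustre --writeconf /dev/sda$i"
        i=$((i + 1))
    done
}

writeconf_cmds 5 14
```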
Thank you,
Dennis.
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss