We're using this init script on our lustre clients:
#!/bin/bash
#
# Bring up/down the kernel lustre stack
#
# chkconfig: - 05 05
# description: Unloads lustre
# config: /etc/sysconfig/lustre.conf
#
### BEGIN INIT INFO
# Provides: lustre
# Default-Stop: 0 1 2 3 4 5 6
# Required-Start:
# Required-Stop:
# Short-Description: unloads the lustre kernel modules
# Description: unloads the lustre kernel modules
### END INIT INFO
. /etc/rc.d/init.d/functions
lockfile="/var/lock/subsys/lustre"
umount_lustre ()
{
    # Unmount every filesystem that mount(8) reports as type lustre.
    for i in $(mount | grep "type lustre" | awk '{print $3}'); do
        umount "$i"
    done
}
# See how we were called.
case "$1" in
    start)
        touch "$lockfile"
        ;;
    stop)
        echo "unmounting lustre filesystems"
        umount_lustre
        echo "running lustre_rmmod"
        /usr/sbin/lustre_rmmod
        rm -f "$lockfile"
        ;;
    *)
        echo $"Usage: lustre start|stop"
        exit 1
        ;;
esac
exit 0
I believe we derived this from the LLNL lustre scripts. It's pretty
sloppy (there should be some error handling to verify that the volumes
really unmounted), but for the simple case where you're just rebooting,
this gets the job done on our lustre clients. Given that it unmounts any
lustre filesystems that it detects, it should work on your OSS servers
as well.
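As a rough idea of what that missing error handling could look like, here
is a minimal sketch, not something we run in production. It reads
/proc/mounts rather than parsing the output of mount, and the function
names plus the MOUNTS_FILE and UMOUNT_CMD overrides are hypothetical,
there only so the logic can be dry-run without touching real mounts.

```shell
# Print the mount point of every entry whose filesystem type is "lustre".
# /proc/mounts fields: device mountpoint fstype options dump pass
# (whitespace inside a mount point is escaped there, so word splitting is safe).
list_lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "${MOUNTS_FILE:-/proc/mounts}"
}

# Try to unmount each Lustre mount point, then re-read the mount table.
# Returns non-zero if any Lustre filesystem is still listed afterwards.
umount_lustre_checked() {
    for mnt in $(list_lustre_mounts); do
        # UMOUNT_CMD is a hypothetical override; it defaults to plain umount.
        ${UMOUNT_CMD:-umount} "$mnt" || echo "umount failed: $mnt" >&2
    done
    remaining=$(list_lustre_mounts)
    if [ -n "$remaining" ]; then
        echo "still mounted:" $remaining >&2
        return 1
    fi
    return 0
}
```

In the stop) branch you could then call umount_lustre_checked and skip
lustre_rmmod (or exit non-zero) when it fails, instead of carrying on
blindly.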
Note: we mount lustre via fstab so our workflow diverges a bit from the
LLNL scripts.
Hope this helps.
-Ed Walter
Carnegie Mellon University
On 12/17/2013 04:29 PM, Andrew Wagner wrote:
In my case, I'm seeing this occur on OSSes that fail to unload their IB
modules. The errors we're seeing are the IB modules themselves failing
to unload at shutdown/rdma service stop because remnant Lustre modules
leave them in use. The OSTs seem to unmount without issue.
Andrew Wagner
Research System Administrator
Technical Computing
UW-Space Science and Engineering
AOSS Room 439
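One way to confirm the symptom described above is to check whether any
Lustre-related modules are still loaded before the IB stack is torn
down. The following is only a sketch: the module name list is an
assumption and is not exhaustive, and the function names and the
MODULES_FILE override are hypothetical.

```shell
# Print the name of every loaded module that looks like part of the Lustre stack.
# /proc/modules fields: name size use-count used-by state address
# The name list below is an assumption; extend it to match your build.
lustre_modules_loaded() {
    awk '$1 ~ /^(lustre|lnet|libcfs|ko2iblnd|ptlrpc|obdclass)$/ { print $1 }' \
        "${MODULES_FILE:-/proc/modules}"
}

# Return 0 only when none of the listed modules remains loaded.
lustre_stack_unloaded() {
    leftover=$(lustre_modules_loaded)
    if [ -n "$leftover" ]; then
        echo "Lustre modules still loaded:" $leftover >&2
        return 1
    fi
}
```

Running lustre_stack_unloaded after lustre_rmmod and before the rdma
service stops would show which modules are likely still holding the IB
modules in use.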
On 12/17/13 2:10 PM, Patrick Farrell wrote:
> Andrew, Rob,
>
> We may be seeing some related behavior here as well.
>
> Can you give any more details about the errors you're getting on
> shutdown/unmount?
>
> Our problem manifested itself as the MDS not being able to unmount,
> because it's waiting for communication from the clients, while
> unmounting/shutting down. (Eventually, messages about hung threads
> appear on the MDS.) It may not be the same thing (we're seeing it
> with 2.5 and have only begun seeing it recently), but it is similar
> and happening on systems using IB.
>
> --
> Patrick Farrell
> Developer, IO File Systems
> Cray, Inc.
> _______________________________________________
> HPDD-discuss mailing list
> HPDD-discuss(a)lists.01.org
> https://lists.01.org/mailman/listinfo/hpdd-discuss