Hello all,
I've recently started working with Lustre and am setting up a couple of
new filesystems on RHEL 6.4 with Lustre 2.4 from the ZFS repository
(we're using Lustre on ZFS). The network is InfiniBand, using OpenIB
from the Red Hat repositories.
I've run into a problem and I'm curious whether anyone else has seen it.
When shutting down machines with Lustre OSTs mounted on them, the
default shutdown scripts hang when the OpenIB modules begin to unload.
This is because the Lustre/LNET stop scripts do not completely unload
the Lustre modules. While investigating, I found that the following
sequence unloads the Lustre modules cleanly, so that the IB modules can
unload as well:
1. Stop Lustre
2. Stop LNET (outputs "ERROR: Module osc has non-zero reference count.")
3. Run lustre_rmmod (outputs "Modules still loaded:
   lnet/klnds/o2iblnd/ko2iblnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o")
4. Stop LNET again to unload the three remaining modules.
I've written this into a shutdown script, which works as a workaround
but does not address the underlying problem.
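
For reference, the four steps above can be sketched as a small shell
function. This is an illustration, not the exact script in use. It
assumes the lustre and lnet init scripts are driven through `service`
and that lustre_rmmod is on the PATH. The RUN variable and the function
name unload_lustre are hypothetical, added only so the sequence can be
dry-run with RUN=echo.

```shell
#!/bin/sh
# Sketch of the unload sequence described above.
# Assumptions: "service lustre stop" and "service lnet stop" invoke the
# stock init scripts, and lustre_rmmod is on the PATH.
# RUN is a hypothetical hook: set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

unload_lustre() {
    # 1. Stop Lustre.
    $RUN service lustre stop
    # 2. First LNET stop; in the case above this reports that osc
    #    still has a non-zero reference count, so its exit status is ignored.
    $RUN service lnet stop
    # 3. Remove the Lustre modules that step 2 left behind.
    $RUN lustre_rmmod
    # 4. Second LNET stop, to unload ko2iblnd, lnet and libcfs.
    $RUN service lnet stop
}
```

Calling unload_lustre from a shutdown script that runs before the
OpenIB stop script should reproduce the manual sequence. With RUN=echo
it only prints the four commands in order.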
Has anyone else seen this behavior?
--
Andrew Wagner
Research System Administrator
Technical Computing
UW-Space Science and Engineering
AOSS Room 439