Thanks, Andrew. Unfortunately, that's not the same thing we're seeing.
We have not seen the exact issue you described. However, Cray uses custom shutdown
scripts, which may be the difference there. Still, a look at the lustre-start.in
script suggests the shutdown process boils down to just unmounting the targets,
which isn't any different from what we do. It's possible the order in which we
unmount the various servers is different.
Also, Cray is not using ZFS, which could well be a factor.
It'd be interesting to know if Rob is using ZFS as well.
--
Patrick Farrell
Developer, IO File Systems
Cray, Inc.
________________________________________
From: Andrew Wagner [andrew.wagner(a)ssec.wisc.edu]
Sent: Tuesday, December 17, 2013 3:29 PM
To: Patrick Farrell; hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Infiniband & Lustre Module Unloading on RHEL 6.4
In my case, I'm seeing this on OSSes that fail to unload their IB modules. The
errors we're seeing are the IB modules themselves failing to unload at
shutdown/rdma service stop because remnant Lustre modules leave them in
use. The OSTs seem to unmount without issue.
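One way to see which modules are still pinning the IB stack is to read the
"used by" column of /proc/modules. What follows is only a minimal sketch, not
something taken from either site's shutdown scripts. The function names
(parse_proc_modules, holders) are hypothetical, and the sample text uses
made-up sizes, reference counts and addresses purely to illustrate the file
format.

```python
import os


def parse_proc_modules(text):
    """Parse /proc/modules-style text into {name: (refcount, [users])}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, raw_count, used_by = fields[0], fields[2], fields[3]
        # The refcount column is "-" on kernels built without module unloading.
        refcount = int(raw_count) if raw_count.isdigit() else None
        # The "used by" column is "-" or a comma-terminated list of module names;
        # bracketed markers such as "[permanent]" are not module names.
        users = [u for u in used_by.split(",")
                 if u and u != "-" and not u.startswith("[")]
        modules[name] = (refcount, users)
    return modules


def holders(modules, target):
    """Return every module that directly or indirectly uses `target`."""
    seen = set()
    stack = list(modules.get(target, (None, []))[1])
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(modules.get(name, (None, []))[1])
    return sorted(seen)


# Illustrative sample only: the numbers and addresses are invented.
SAMPLE = """\
ko2iblnd 100000 1 - Live 0xffffffffa0300000
lnet 200000 3 ko2iblnd, Live 0xffffffffa0200000
rdma_cm 30000 1 ko2iblnd, Live 0xffffffffa0100000
ib_core 70000 2 rdma_cm,ko2iblnd, Live 0xffffffffa0000000
"""

if __name__ == "__main__":
    path = "/proc/modules"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE  # fall back to the invented sample on non-Linux hosts
    mods = parse_proc_modules(text)
    print("modules holding ib_core:", holders(mods, "ib_core"))
```

Run on an OSS just before the rdma service is stopped, a listing like this
would show whether a Lustre networking module such as ko2iblnd is still among
the users of ib_core after the OSTs are unmounted.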
Andrew Wagner
Research System Administrator
Technical Computing
UW-Space Science and Engineering
AOSS Room 439
On 12/17/13 2:10 PM, Patrick Farrell wrote:
Andrew, Rob,
We may be seeing some related behavior here as well.
Can you give any more details about the errors you're getting on
shutdown/unmount?
Our problem manifested itself as the MDS not being able to unmount,
because it's waiting for communication from the clients, while
unmounting/shutting down. (Eventually, messages about hung threads
appear on the MDS.) It may not be the same thing (we're seeing it
with 2.5 and have only begun seeing it recently), but it is similar
and happening on systems using IB.
--
Patrick Farrell
Developer, IO File Systems
Cray, Inc.
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss