I think we've found the root cause of the issue we've been seeing at
LLNL, so before I waste your time gathering debug data, I'd like to
point you at the discussion on the JIRA ticket. There's a good
explanation of the issue in this comment:
https://jira.hpdd.intel.com/browse/LU-3029?focusedCommentId=56903&pag...
Are you running old 1.8 clients which do *not* support FMODE_32BITHASH?
My guess is you are running a mix of clients, some with FMODE_32BITHASH
and some with FMODE_64BITHASH. If that is the case, you are likely
seeing the same issue as us. A fix will be proposed soon, and applying
it would require MDS downtime.
--
Cheers, Prakash
On Wed, Apr 24, 2013 at 01:22:43PM +0200, Götz Waschk wrote:
> On Tue, Apr 23, 2013 at 6:26 PM, Prakash Surya
> <surya1(a)llnl.gov> wrote:
> > Understood. Are you willing to run a systemtap script on the machine?
> > Since I can't umount our production FS easily, I've been using systemtap
> > to get additional information out of the system.
> Dear Prakash,
> yes, please send me the script.
> Regards, Götz