On Tue, Sep 16, 2014 at 09:28:30AM -0400, Gary Molenkamp wrote:
Using Lustre 2.5.3, 1 combined MDS/MDT, 44 OSTs. Currently containing 120TB of
data, over 35M files.
On the weekend, our MDS server crashed due to an IO hang. After restarting the
server, we started hitting the LU-5040 bug during recovery:
kernel BUG at fs/jbd2/transaction.c:1033!
kernel: invalid opcode: 0000 [#1] SMP
I attempted a restart of all OST and MDT mounts with abort_recov, and the
filesystem was able to mount on a client and all OSTs connected on a client. The
first access to any files or metadata caused the MDS to panic and also show
indications of LU-5392.
Is this indicating a corrupted quota subsystem? I was trying to find a means
of rebuilding the quota records. However, "lfs quotacheck" is no longer
supported as it states "since space accounting is always enabled".
If the quotas are corrupted, how can I recover them? Likewise, how can I
recover from the two bugs mentioned above? I have some time flexibility to
resolve it, if that would assist in getting the bugs addressed and my filesystem
back online.
if you just want to work around quota bugs for a while, then you can turn
off quotas with
tune2fs -Q ^usrquota /dev/mdt
tune2fs -Q ^grpquota /dev/mdt
and turn them on again later with
tunefs.lustre --quota /dev/mdt
we did this recently on our 2.5.ish filesystem
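A rough sketch of the two steps above as shell functions, in case it is useful to script them. This is only an illustration: /dev/mdt is the placeholder device from the commands above, the MDT_DEV and DRY_RUN variables and the function names are made up for this sketch, and it assumes the MDT is unmounted while the commands run, which is not stated above. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch only. MDT_DEV, DRY_RUN and the function names are hypothetical.
# /dev/mdt is a placeholder; substitute the real MDT block device.
MDT_DEV="${MDT_DEV:-/dev/mdt}"
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Turn off the user and group quota features on the MDT's ldiskfs.
# The ^ is quoted so no shell treats it specially.
disable_quotas() {
    run tune2fs -Q '^usrquota' "$MDT_DEV"
    run tune2fs -Q '^grpquota' "$MDT_DEV"
}

# Turn quotas back on later.
enable_quotas() {
    run tunefs.lustre --quota "$MDT_DEV"
}
```

Source the file and call disable_quotas to see the commands, then repeat with DRY_RUN=0 once the device name is right; enable_quotas is for later, when quotas should come back.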
HTH
cheers,
robin
Any assistance would be appreciated.
Gary.
--
Gary Molenkamp SHARCNET
Systems Administrator University of Western Ontario
Compute/Calcul Canada
http://www.computecanada.org
gary(a)sharcnet.ca
http://www.sharcnet.ca
(519) 661-2111 x88429 (519) 661-4000
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss