We are running Lustre 2.7.0.
# uname -r
2.6.32-504.8.1.el6_lustre.x86_64
The load on our combined MGS/MDT server jumped up yesterday and has
stayed high since, with a couple of really outrageous peaks. I ended up
power cycling it, as the MDT would not umount. It seems to be performing
fine now, but while watching the logs I am seeing a fair number of
these in /var/log/messages:
2015-06-10T20:25:51-04:00 mdtmgs.aglt2.org kernel: [ 936.535168]
LustreError: 3932:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk:
17180113246, new: 17180113245 replay: 0. see LU-617.
2015-06-10T20:27:24-04:00 mdtmgs.aglt2.org kernel: [ 1029.720722]
LustreError: 4038:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk:
17180141036, new: 17180141035 replay: 0. see LU-617.
2015-06-10T20:33:54-04:00 mdtmgs.aglt2.org kernel: [ 1419.740272]
LustreError: 3892:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk:
17180255177, new: 17180255176 replay: 0. see LU-617.
2015-06-10T20:35:38-04:00 mdtmgs.aglt2.org kernel: [ 1524.040242]
LustreError: 3926:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk:
17180285251, new: 17180285250 replay: 0. see LU-617.
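In each of the four entries above, the on-disk transno is exactly one larger than the new one. To check whether that holds across the whole log, here is a minimal Python sketch. The function name summarize_transno_errors and the regular expression are my own assumptions based only on the message text shown above; nothing here ships with Lustre. The pattern tolerates the line wrapping seen in this mail.

```python
import re

# Hypothetical pattern, derived only from the message text quoted above.
# \s+ and \s* allow for line breaks introduced by mail wrapping.
PATTERN = re.compile(
    r"(?P<target>\S+):\s+trying\s+to\s+overwrite\s+bigger\s+transno:"
    r"\s*on-disk:\s*(?P<on_disk>\d+),\s*new:\s*(?P<new>\d+)"
    r"\s+replay:\s*(?P<replay>\d+)"
)


def summarize_transno_errors(text):
    """Return one (target, on_disk, new, on_disk - new) tuple per message."""
    rows = []
    for m in PATTERN.finditer(text):
        on_disk = int(m.group("on_disk"))
        new = int(m.group("new"))
        rows.append((m.group("target"), on_disk, new, on_disk - new))
    return rows


if __name__ == "__main__":
    # First entry from the log excerpt above, wrapped as in the mail.
    sample = (
        "LustreError: 3932:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())\n"
        "umt3B-MDT0000: trying to overwrite bigger transno:on-disk:\n"
        "17180113246, new: 17180113245 replay: 0. see LU-617.\n"
    )
    for row in summarize_transno_errors(sample):
        print(row)
```

Feeding it the full contents of /var/log/messages (read into one string) would give one tuple per occurrence, which makes it easy to count them and to see whether the difference is ever anything other than 1.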
I found a couple of LU tickets that seem relevant, but this older one
best shows the same kind of errors.
https://jira.hpdd.intel.com/browse/LU-5283
This one also popped up in a search.
https://jira.hpdd.intel.com/browse/LU-5939
It bothers me in particular because it is marked as a Critical bug in
2.7.0, solved for 2.8.0.
What, if anything, should I be doing about this? Should I worry
that I will lose my mdt? I might not ever be able to return to my
office if that happens.
Thanks,
bob