On Apr 3, 2017, at 10:43, Kumar, Amit <ahkumar(a)mail.smu.edu> wrote:
Dear Lustre,
Lustre v2.4.3
This past weekend our changelogs filled up, as reported by: “changelog failed: rc=-28”
Looking at the changelog_users showed the following:
# cat /proc/fs/lustre/mdd/scratch-MDT0000/changelog_users
current index: 8822156503
ID index
cl1 4676806604
It seems like we had about 4 billion entries. We use changelogs mainly for Robinhood and LMT.
Note that the Lustre ChangeLogs are not meant as a permanent record of all changes to the
filesystem. They are intended as short-term records of recent changes that should be
processed by the ChangeLog reader as quickly as possible and then cancelled. If you have
4B records then it seems that your ChangeLog readers are not cancelling the records
correctly, and the reader(s) should be removed from the system.
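As a rough illustration of the point above (this is not part of any Lustre tool): the
number of records a reader has not yet cancelled is the difference between "current index"
and that reader's index in changelog_users. Below is a minimal Python sketch, assuming the
changelog_users format quoted earlier in this thread. The function name changelog_backlog
and the SAMPLE string are hypothetical and chosen only for illustration.

```python
def changelog_backlog(text):
    """Return {reader_id: pending_record_count} parsed from changelog_users text.

    Assumes the layout quoted in this thread: a "current index: N" line,
    an "ID index" header line, then one "clN <index>" line per reader.
    """
    current = None
    readers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("current index:"):
            current = int(line.split(":", 1)[1])
        elif line.startswith("cl"):
            reader_id, index = line.split()[:2]
            readers[reader_id] = int(index)
    if current is None:
        raise ValueError("no 'current index' line found")
    # Pending records = newest record index minus the reader's last cleared index.
    return {rid: current - idx for rid, idx in readers.items()}


# Sample copied from the changelog_users output quoted in this thread.
SAMPLE = """current index: 8822156503
ID index
cl1 4676806604
"""

if __name__ == "__main__":
    for reader_id, pending in changelog_backlog(SAMPLE).items():
        print(f"{reader_id}: {pending} records not yet cleared")
```

For the values quoted in this thread the difference is 8822156503 - 4676806604 =
4145349899, i.e. the roughly 4 billion records mentioned above. A check like this could be
run periodically so that a stalled reader is noticed long before the changelog fills up.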
There have been a large number of bugs related to ChangeLogs that have been fixed since
2.4.3 was released. I'd recommend at least upgrading your servers to a newer
release.
I have some observations and questions as follows:
(a) I could not run the deregister command on either the MDS or the Robinhood server. Here
is what happens on the Robinhood server, even though the file system is mounted:
# lctl --device scratch-MDT0000 changelog_deregister cl1
No device found for name scratch-MDT0000: Invalid argument
(b) However, I was able to run “lfs changelog_clear scratch-MDT0000 cl1 0” on the
Robinhood server.
Q1) I am not sure why I was not able to run “lctl --device scratch-MDT0000
changelog_deregister cl1”. On the MDS it fails with: “error: changelog_deregister:
No such file or directory”
Q2)*** When we hit this issue, files that were in a transaction or being edited showed up
as empty after I cleared the changelog. Is this expected behavior, or is something else
going on here?
Q3)*** It seems that every file that was being edited during this issue now shows up
empty, and we get “Bad address” if we try to save or remove it. My understanding is that
the changelog is mainly for stats and profiling, so data should not be affected if we
clear changelogs. Is that correct?
Q4) We are setting up our new Lustre-ZFS solution in the next couple of weeks, along with
Robinhood and LMT. Assuming we may run into this issue again in the future, what would you
recommend as far as setting up changelogs? Is there a system variable that can be used to
increase or decrease the number of changelog records that can be stored?
Any help/insight here is greatly appreciated.
Thank you,
Amit
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation