We saw similar behaviour after an MDT upgrade from 1.8.9 to 2.1.5, but
with somewhat different error messages. We discovered that we have to
synchronize user UIDs and group GIDs between our clients and MDT servers
to honour file ownership and access controls.
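
A minimal sketch of how such a UID/GID consistency check could be done,
assuming you have saved passwd-style (or group-style) listings from a
client and from the MDS, for example the output of "getent passwd" on
each host. The function names parse_ids and id_mismatches are
illustrative only and not part of any Lustre tool.

```python
def parse_ids(text):
    """Map name -> numeric id from passwd- or group-style text.

    Both formats keep the numeric id in the third colon-separated field
    (name:password:id:...). Malformed or comment lines are skipped.
    """
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        ids[fields[0]] = int(fields[2])
    return ids


def id_mismatches(client_text, server_text):
    """Return {name: (client_id, server_id)} for names whose ids differ.

    server_id is None when the name exists on the client but is
    missing on the server.
    """
    client = parse_ids(client_text)
    server = parse_ids(server_text)
    problems = {}
    for name, client_id in client.items():
        server_id = server.get(name)
        if server_id != client_id:
            problems[name] = (client_id, server_id)
    return problems
```

Passing the two saved listings to id_mismatches() gives the accounts
that need attention; an empty result means every client account has the
same numeric id on the server.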
regards,
chris hunter
All,
we have a test file system that was created with Lustre 1.8 (or even
1.6), then briefly updated to 2.3 and 2.4.1, and now to 2.4.2. On this
file system we now have a few directories that are inaccessible after
the latest upgrade. I believe they were accessible when we were still
running 2.4.1, but I'm not sure.
All clients are currently running 1.8.9.
Trying to ls one of the directories produces an error on the command
line, but nothing in any of the system logs that I could find.
[bnh65367@p60-storage ~]$ ls -l /mnt/play01 |grep p60
ls: cannot access /mnt/play01/p45: No such file or directory
ls: cannot access /mnt/play01/p60: No such file or directory
d?????????? ? ? ? ? ? p60
[bnh65367@p60-storage ~]$ ls -l /mnt/play01/p60
ls: cannot access /mnt/play01/p60: No such file or directory
[bnh65367@p60-storage ~]$
Trying to touch one of the missing directories results in the following
message on the MDS and an input/output error on the client command line:
Feb 11 19:13:23 cs04r-sc-mds02-03 kernel: LustreError:
14367:0:(mdt_open.c:1694:mdt_reint_open()) play01-MDT0000: name p60
present, but fid [0x45828f:0x7f3b41ef:0x0] invalid
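
As a rough aid for reading the MDS message above, here is a small sketch
that pulls the FID out of such a log line and checks whether its
sequence number falls in the IGIF range, i.e. the FIDs that Lustre 2.x
derives from the inode number and generation of files created under 1.x.
The range constants are quoted from memory of lustre_idl.h and should be
treated as an assumption; parse_fid and is_igif are illustrative names.

```python
import re

# A FID is printed as [seq:oid:ver], each field in hexadecimal.
FID_RE = re.compile(
    r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]"
)

# Assumed IGIF sequence range (believed to match FID_SEQ_IGIF and
# FID_SEQ_IGIF_MAX in lustre_idl.h; verify against your source tree).
FID_SEQ_IGIF = 0xC
FID_SEQ_IGIF_MAX = 0xFFFFFFFF


def parse_fid(message):
    """Return (seq, oid, ver) as integers, or None if no FID is found."""
    match = FID_RE.search(message)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


def is_igif(fid):
    """True if the sequence lies in the assumed IGIF range.

    For an IGIF, seq carries the inode number and oid the generation.
    """
    seq, _oid, _ver = fid
    return FID_SEQ_IGIF <= seq <= FID_SEQ_IGIF_MAX
```

Under that assumption, the FID in the message above would be an IGIF,
which would be consistent with the directory having been created before
the upgrade to 2.x.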
I'm currently trying to understand whether this is expected. Is it
something we're likely to see if we upgrade directly from 1.8 to 2.4.2
on our production file systems? And of course we need to fix it. To me
it looks like LU-3934 could be related, though if I understand that bug
correctly, it should already be fixed. Maybe it will fix itself (by
automatically starting OI scrub)?
Is this sufficiently different from LU-3934 and unexpected that I should
open a new ticket?
The file system was upgraded a few hours ago. Running lctl get_param
'osd-ldiskfs.*.oi_scrub' on the MDS reports the status init for both MDT
and MGT (see below). Does this mean it hasn't been started and I should
start it? How would I start it?