Hi Gary, Hi Malcolm,
On 02/05/14 20:04, Gary Hagensen wrote:
On 02/04/2014 07:26 PM, Cowe, Malcolm J wrote:
> Hi Gary,
>
> Robinhood should pick up the default archive ID, which can be found
> on the MDT:
>
> lctl get_param mdt.*.hsm.default_archive_id
>
> Otherwise, RH will use archive_id = 0. If it's not picking up the
> Lustre default, then there may be a bug in Robinhood.
>
> An HSM archive ID can be recorded in the Robinhood configuration in
> FileClass definitions, e.g.:
>
> FileClass mjc_lustre_files {
>     definition {
>         tree == "/lustre/fs01/demo"
>         and
>         owner == "mjc"
>     }
>     archive_num = 4;
> }
Note the parameter name has been changed to "archive_id" to make it
consistent with the name used in Lustre.
"archive_num" is still supported for backward compatibility, but I
suggest you use "archive_id".
>
> This is required when there is more than one archive available to
> Lustre and you want to define sets of files that are archived
> somewhere other than the default. Not sure if this would be enough in
> the case you presented.
>
> The archive ID is an attribute of the file, so if the file is
> deleted, so is the archive ID reference. I wouldn't expect the
> changelog to necessarily record the archive ID when a file is deleted
> but for a deferred removal policy to work, the archive ID would need
> to persist somewhere in Robinhood after the original file is deleted.
I was thinking the changelog could have the ID when the file was
archived, but I don't see it in the HSM changelog record. Of course,
Robinhood could remember which archive it requested for the requests it
makes itself. But there seems to be no path for the ID to get from an
"lfs hsm_archive" command into Robinhood unless Robinhood looks it up.
And again, I didn't see the archive ID in any Robinhood database table
after doing an "lfs hsm_archive". My main concern is that I don't see a
place for the archive ID in the table that records the deferred removals
of deleted files. Like the filesystem, the database appears to delete
all information about a file when it is deleted, and then creates a
record in the SOFT_RM table of the MySQL database so that the
hsm_remove can be done later.
Indeed, RH performs HSM_REMOVE with archive_id=0, which
is not appropriate.
The problem is that RH may not know the archive ID of the deleted file
(for example, if it has not yet processed the corresponding HSM ARCHIVE
changelog record), and it has no way to retrieve it from Lustre because
the entry no longer exists.
In this case, how about having the coordinator broadcast the hsm_remove
to all archives when a specific archive_id (0 or -1) is used in the
hsm_remove request?
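To make the proposal concrete, here is a minimal sketch of how a coordinator might route a remove request under this scheme. It is not Lustre code: the function name, the set of "wildcard" IDs, and the way registered archives are passed in are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical sketch of the proposed coordinator behavior, not Lustre code.
# IDs that would mean "archive unknown, send to every archive" (from the proposal).
WILDCARD_IDS = {0, -1}


def route_hsm_remove(fid, archive_id, registered_archives):
    """Return the archive IDs that a remove request for `fid` should go to."""
    if archive_id in WILDCARD_IDS:
        # Archive unknown (e.g. the file is already deleted): broadcast to all.
        return sorted(registered_archives)
    if archive_id in registered_archives:
        # Archive known: send the request only to that archive.
        return [archive_id]
    raise ValueError(f"unknown archive_id {archive_id} for {fid}")
```

For example, with archives {1, 2, 4} registered, `route_hsm_remove("example-fid", 0, {1, 2, 4})` returns `[1, 2, 4]`, while an explicit archive_id of 4 returns `[4]`. Each copytool would then have to treat a remove for an object it never archived as a harmless no-op.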
Also, I think some things need to be done in Lustre to handle file
removals properly:
- by default, deleting a file should trigger an hsm_remove request to the
copytool, so that the entry is automatically cleaned from the archive.
- a tunable should allow disabling this, so that the removal can be
delayed using a policy engine.
>> Another thing to note is that if you have a deferred_remove_delay of
>> say 1 day, then delete a file, then change the deferred_remove_delay
>> to 1 hour, only files deleted after the change (using rbh-lhsm -d)
>> will get the new delay. The previously deleted file will wait 1 day to
>> be removed. Not totally unexpected, but should be documented.
I agree.
The soft_rm_time field is not really needed, as it can be computed
when the hsm_rm policy is applied,
using the current parameter value.
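A minimal sketch of the difference, assuming the deletion time of each entry is recorded (the record layout and field names below are hypothetical, not Robinhood's actual schema). Freezing a deadline at deletion time reproduces the behavior Gary describes, while recomputing it at policy time picks up a changed deferred_remove_delay.

```python
from dataclasses import dataclass

DAY = 86400
HOUR = 3600


@dataclass
class SoftRmEntry:
    # Hypothetical record for a deleted file awaiting hsm_remove.
    fid: str
    rm_time: int          # time the file was deleted (assumed to be recorded)
    stored_deadline: int  # deadline frozen when the file was deleted


def due_with_stored_deadline(entry, now):
    # Uses the delay that was in effect when the file was deleted.
    return now >= entry.stored_deadline


def due_with_current_delay(entry, now, deferred_remove_delay):
    # Recomputes the deadline from the delay configured right now.
    return now >= entry.rm_time + deferred_remove_delay


# File deleted at t=0 with a 1-day delay; the delay is later lowered to 1 hour.
entry = SoftRmEntry(fid="example-fid", rm_time=0, stored_deadline=0 + DAY)
now = 2 * HOUR
print(due_with_stored_deadline(entry, now))      # False: still waits the full day
print(due_with_current_delay(entry, now, HOUR))  # True: the new 1-hour delay applies
```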
Regards,
Thomas