To me the biggest gain with Robinhood for scans is that it can use the
Lustre changelog.
So you pay the "scan price" once, then it's incremental updates as
changes happen. This gives near-realtime data without much pain.
It doesn't update stuff like atime by default though - it only does that
when something like a purge may take place. So it sort of does its own
'lazy' for some stuff. Not sure about file size.
The CEA robinhood folks can give better details probably.
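
To illustrate the "scan once, then incremental updates" idea, here is a minimal sketch. It is not Robinhood's implementation and does not use the real Lustre changelog record format: the record layout (op, fid, path, size) and the op names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    size: int


def full_scan(files):
    """Pay the "scan price" once: build the index from (fid, path, size) tuples."""
    return {fid: Entry(path, size) for fid, path, size in files}


def apply_changelog(index, records):
    """Apply hypothetical (op, fid, path, size) records in order.

    The op names below are invented for this sketch; they are not the
    record types that Lustre actually emits.
    """
    for op, fid, path, size in records:
        if op == "CREATE":
            index[fid] = Entry(path, size)
        elif op == "CLOSE" and fid in index:
            index[fid].size = size      # size as reported when the file was closed
        elif op == "RENAME" and fid in index:
            index[fid].path = path
        elif op == "UNLINK":
            index.pop(fid, None)
    return index
```

After the initial full_scan(), only the records that arrive later need to be processed, so the cost of keeping the index current scales with the rate of change rather than with the number of files.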
Regards,
Scott
On 2/12/2015 1:39 PM, Meghan McClelland wrote:
Lazy size is interesting. Exposing it only to certain tools would
probably be good as we don't want people depending on it for
production work. That said, I'm curious how Robinhood gets these data?
Does it stat every file, or does it do something more scalable /
integrated? We wouldn't want to use lazy size for purging but I do
wonder if it would help with policy engine database at least some of
the time.
On Tue, Feb 10, 2015 at 1:47 PM, Dilger, Andreas
<andreas.dilger(a)intel.com> wrote:
> I think the "lazy" size is what would be stored directly in the MDS inodes
> (possibly in struct som_attrs to avoid confusing i_size on the on-disk
> inode as happened with 1.8).
>
> The lazy size would be available via tools like e2scan, lester
> (https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ORNL-2DTe...),
> and other specialized tools (possibly "lfs find"), but not exposed directly
> as the file size to applications via stat, read, write-append.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Software Architect
> Intel High Performance Data Division
>
> On 2015/02/09, 3:16 PM, "Nathan Rutman" <nathan.rutman@seagate.com> wrote:
>
> I think "best effort" would be sufficient in 99% of cases, but when it is
> not, how would you ask for the "real" vs. the "lazy" size?
>
>
> --
> Nathan Rutman · Principal Systems Architect
> Seagate Technology · +1 503 877-9507 · PST
>
> On Sat, Feb 7, 2015 at 5:56 PM, Dilger, Andreas <andreas.dilger@intel.com> wrote:
> On 2015/02/06, 5:08 PM, "Meghan McClelland" <meghan.mcclelland@seagate.com> wrote:
>
>> Oh no! I had just been talking about using this feature :(
>>
>> Even in its current form, which I agree isn't ideal, I think it could
>> be helpful for a project like fsstats. Fsstats was an open effort to
>> gather and collect filesystem data (see
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.pdsi-2Dscidac.org...
>> ). It has unfortunately somewhat
>> died off due to loss of funding, but I think it was a good idea that
>> helped the community, especially researchers.
>
> Meghan, I am also a fan of fsstats, though I'm not sure how that is
> related to this change, unless you had some mechanism to fetch the stats
> directly from the MDT?
>
>> I'd really like to try and revive the open data initiative, and saw
>> som as a possible avenue to start collecting the data.
>
> Ah, so you would fetch the stats directly from the MDT?
>
>> I don't think statahead will provide the information needed but haven't
>> had a chance to look at it.
>
> It would be possible to store a "lazy size on the MDS", which is updated
> on a best-effort basis (and could be repaired in the background by LFSCK).
> This would be sufficient for most operations like filesystem stats, purge
> policies, etc. since it won't matter one way or the other in a histogram
> if the size of some files is off by a few KB, but that matters a great
> deal for reading the data.
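
The histogram point can be made concrete with a small sketch. The power-of-two bucketing and the function names are assumptions chosen for illustration; they are not taken from fsstats or from any Lustre tool.

```python
from collections import Counter


def size_bucket(size_bytes):
    """Return the power-of-two bucket index: bucket k covers [2**k, 2**(k+1)).

    Sizes 0 and 1 both land in bucket 0.
    """
    return max(size_bytes, 1).bit_length() - 1


def histogram(sizes):
    """Count how many files fall into each power-of-two size bucket."""
    return Counter(size_bucket(s) for s in sizes)


# Three example files (5 MiB, 300 MiB, 40 GiB) and "lazy" sizes that are
# each stale by 4 KiB.
exact = [5 * 2**20, 300 * 2**20, 40 * 2**30]
lazy = [s - 4096 for s in exact]
```

For these inputs histogram(exact) and histogram(lazy) are identical. A file whose size sits right at a bucket boundary can shift by one bucket, which is usually harmless for statistics but is exactly the kind of error that cannot be tolerated when reading data.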
>
> Implementing a "lazy size on MDS" would be pretty straightforward, I think
> - the client just sends its current file size at close time, and the MDS
> picks the largest one (as it does with atimes). If there is a truncate
> the MDS will get an RPC for this also, so the only risk is during a crash
> and that can be fixed up by LFSCK.
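
The update rule described above can be sketched as a toy in-memory table. This is only an illustration of the rule (largest size reported at close wins, truncate overrides); the class and method names are hypothetical and do not correspond to Lustre code or RPCs.

```python
class LazySizeTable:
    """Toy MDS-side table of best-effort file sizes, keyed by a file identifier."""

    def __init__(self):
        self._size = {}

    def on_close(self, fid, client_size):
        # Each client reports its view of the size at close time;
        # keep the largest value seen so far.
        self._size[fid] = max(self._size.get(fid, 0), client_size)

    def on_truncate(self, fid, new_size):
        # A truncate is sent to the MDS explicitly, so it overrides the
        # stored value even when the new size is smaller.
        self._size[fid] = new_size

    def lazy_size(self, fid):
        # None means no size has been recorded; a background checker
        # (LFSCK in the discussion above) would be expected to fill it in.
        return self._size.get(fid)
```

Nothing here is persisted or recovered after a crash, which matches the best-effort intent: a stale or missing value is repaired later rather than prevented.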
>
> The main difference with the current SOM code is that it wouldn't have any
> complexity during recovery, and it doesn't need to care too much if the
> client crashed or was later evicted by an OST before it wrote all the data
> to disk.
>
> Cheers, Andreas
>
>
>> On Fri, Feb 6, 2015 at 3:33 PM, Dilger, Andreas <andreas.dilger@intel.com> wrote:
>>> The Size on MDT (SOM) feature has been in a prototype state for several
>>> years, with no signs of moving beyond this prototype stage.
>>>
>>> Several problems exist in the code today, primarily that recovery is not
>>> really implemented, yet the existing code adds complexity on the clients
>>> and servers. Without proper recovery, the current code risks file data
>>> loss if the SOM data isn't updated on the MDS consistently with data
>>> writes to the OST.
>>>
>>>
>>> We're planning to remove the SOM code from the master branch as a result,
>>> tracked under
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__jira.hpdd.intel.com_browse_LU-2D6047-3A&d=AwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=VKuhI1_CodqTbWyBgNk0Z5da-Cpzi6WFMl6RJ0M1EeM&m=J0jMwPGPIPnotoqE8FhyhDf07Rh6V4BAMet6Wfh-bqM&s=TzuY05kGc01qUMWZZluqhikn49_0zzLDVoT8e7igolQ&e=
>>> - https://urldefense.proofpoint.com/v2/url?u=http-3A__review.whamcloud.com_13126&d=AwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=VKuhI1_CodqTbWyBgNk0Z5da-Cpzi6WFMl6RJ0M1EeM&m=J0jMwPGPIPnotoqE8FhyhDf07Rh6V4BAMet6Wfh-bqM&s=sc9FrYH8KyW_9Un3Z-E_HXPRv5DPicF-Mc92PbET6hc&e=
>>> - https://urldefense.proofpoint.com/v2/url?u=http-3A__review.whamcloud.com_13169&d=AwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=VKuhI1_CodqTbWyBgNk0Z5da-Cpzi6WFMl6RJ0M1EeM&m=J0jMwPGPIPnotoqE8FhyhDf07Rh6V4BAMet6Wfh-bqM&s=Qf7HJzAInFeGIR7l4VRWc7OEgUX77nhkrptHRFS8Q04&e=
>>> - https://urldefense.proofpoint.com/v2/url?u=http-3A__review.whamcloud.com_13442&d=AwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=VKuhI1_CodqTbWyBgNk0Z5da-Cpzi6WFMl6RJ0M1EeM&m=J0jMwPGPIPnotoqE8FhyhDf07Rh6V4BAMet6Wfh-bqM&s=v0_uCOoz-ipW11UJPZlvLk-REnsF_T1P3KPvKMzQ_lE&e=
>>> - https://urldefense.proofpoint.com/v2/url?u=http-3A__review.whamcloud.com_13443&d=AwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=VKuhI1_CodqTbWyBgNk0Z5da-Cpzi6WFMl6RJ0M1EeM&m=J0jMwPGPIPnotoqE8FhyhDf07Rh6V4BAMet6Wfh-bqM&s=MhKEsBRfdjlwrxCQhqhW51tqRDhWMPk5OkfiKD8vOFM&e=
>>>
>>> Some of the performance improvements of SOM have been implemented by
>>> statahead.
>>>
>>> I think a case could be made for a very stripped down SOM to be
>>> implemented in the future, that only deals with single-client writers and
>>> synchronously invalidates the file size on open-for-write, which isn't so
>>> bad with flash storage for the MDT as is typical today. The size of files
>>> that do not get set at initial write or are invalidated by an open can be
>>> updated asynchronously by LFSCK doing a periodic scan in the background.
>>> Since this stripped-down implementation would have very little to do with
>>> the current implementation, there isn't much benefit to even trying to fix
>>> the current code in place.
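
A minimal sketch of the stripped-down scheme described above, under stated assumptions: sizes are tracked in memory, an open-for-write invalidates the stored size, a lone writer sets it again at close, and anything left invalid waits for a background scan. All names are hypothetical, and the handling of concurrent writers (leave the size invalid) is one possible reading of "only deals with single-client writers", not a description of any existing or planned code.

```python
class StrippedSom:
    """Toy model of a size cache that is only trusted for single-writer files."""

    def __init__(self):
        self._size = {}      # fid -> size, present only while considered valid
        self._writers = {}   # fid -> number of clients holding it open for write
        self._multi = set()  # fids that had more than one concurrent writer

    def open_for_write(self, fid):
        self._size.pop(fid, None)          # synchronous invalidation
        count = self._writers.get(fid, 0) + 1
        self._writers[fid] = count
        if count > 1:
            self._multi.add(fid)

    def close(self, fid, client_size):
        # Assumes a matching open_for_write() happened earlier.
        count = self._writers[fid] - 1
        if count > 0:
            self._writers[fid] = count
            return
        del self._writers[fid]
        if fid in self._multi:
            self._multi.discard(fid)       # leave invalid for the background scan
        else:
            self._size[fid] = client_size  # single writer: its size is trusted

    def size(self, fid):
        return self._size.get(fid)         # None: size must be obtained elsewhere

    def background_scan(self, authoritative):
        # Stand-in for a periodic background pass: fill in sizes that are
        # missing, skipping files that are currently open for write.
        for fid, size in authoritative.items():
            if fid not in self._size and fid not in self._writers:
                self._size[fid] = size
```

The point of the sketch is that the only synchronous cost is the invalidation at open; everything else is either trusted from a single closing writer or repaired later.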
>>>
>>> I definitely prefer presenting about new features going into Lustre, but I
>>> also think it is important that people are aware when a semi-feature like
>>> this is being removed. I don't believe that anyone is actually using this
>>> feature today, and the reduction in code maintenance and complexity will
>>> help both ongoing maintenance and bug fixing, as well as make it that
>>> much easier for new developers to understand the code.
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>>
>>> Lustre Software Architect
>>> Intel High Performance Data Division
>>>
>>>
>>> _______________________________________________
>>> HPDD-discuss mailing list
>>> HPDD-discuss@lists.01.org
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.01.org_mailman_listinfo_hpdd-2Ddiscuss&d=AwICAg&c=IGDlg0lD0b-nebmJJ0Kp8A&r=VKuhI1_CodqTbWyBgNk0Z5da-Cpzi6WFMl6RJ0M1EeM&m=J0jMwPGPIPnotoqE8FhyhDf07Rh6V4BAMet6Wfh-bqM&s=Sr4-fH-6Rrr9PkCrgC4vhb7ZL-gRj1qm_uDFiT8AV3w&e=
>>
>>
>>
>> --
>> Meghan McClelland · Senior Product Manager
>> Seagate Technology, LLC
>> mobile: +1 (505) 695 0065
>>
>> www.seagate.com
>>
>
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel@lists.lustre.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.lustre.org_mail...