Hello!
"Namespace changes" means things like the file being renamed or deleted.
None of that depends on client reads/writes in any way, so the MDS simply does not need
to know about them. It is not necessary to hold a metadata lock on a file during IO
operations either, though the lock is needed at the start of the IO to ensure the layout
(the EA holding striping information) is valid.
But it's not released by the client proactively; rather, it's the metadata server
that lets the client know the lock is no longer valid, and then the client contacts the
metadata server again to obtain the new layout info and a new lock the next time
there's IO on that file. The cached data in this case would not change because it is
guarded by OST locks at that point.
(A client cannot monopolize a lock for a long time: once the server asks for it, the
client has a finite amount of time to release the lock.)
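As a rough illustration of the layout-lock behaviour described above, here is a toy
Python model. It is not Lustre code: the class names (ToyMDS, ToyClient) and every method
are made up for this sketch, and the timeout on lock release is not modelled. It only
shows that the server initiates the invalidation and the client refetches the layout the
next time it starts IO.

```python
class ToyMDS:
    """Hypothetical stand-in for the metadata server (not a Lustre API)."""

    def __init__(self, layout):
        self.layout = list(layout)    # striping info, i.e. the layout EA
        self.lock_holders = set()     # clients currently holding the layout lock

    def get_layout(self, client):
        # Grant the layout lock together with the current layout.
        self.lock_holders.add(client)
        return list(self.layout)

    def change_layout(self, new_layout):
        # The server, not the client, initiates the invalidation.
        for client in list(self.lock_holders):
            client.on_layout_lock_revoked()
        self.lock_holders.clear()
        self.layout = list(new_layout)


class ToyClient:
    """Hypothetical client that caches the layout under the layout lock."""

    def __init__(self, mds):
        self.mds = mds
        self.layout = None            # cached layout, valid only while locked
        self.fetches = 0              # how many times we asked the MDS

    def on_layout_lock_revoked(self):
        # Drop the cached layout; cached file data is untouched because it is
        # guarded by separate OST locks.
        self.layout = None

    def start_io(self):
        # The layout lock matters only at the start of an IO.
        if self.layout is None:
            self.layout = self.mds.get_layout(self)
            self.fetches += 1
        return self.layout


mds = ToyMDS(["ost0", "ost1"])
client = ToyClient(mds)
client.start_io()                     # first IO: fetch layout and lock
client.start_io()                     # second IO: cached layout is reused
mds.change_layout(["ost2", "ost3"])   # server invalidates the lock
client.start_io()                     # next IO: client refetches
```

Note that the client never tells the MDS it is "done": it simply keeps the cached layout
until the server says otherwise.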
There is a write data cache on the clients. It's guarded by ldlm locks issued by
OSTs. Should an OST decide it needs to ensure no cached data remains on the client, it
revokes the locks, prompting the client to flush its cache (any dirty data on the client
would be written to the OSTs at that time too).
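A minimal sketch of that client write cache, again as a toy Python model rather than real
Lustre code. ToyOST, CachingClient and their methods are invented names, and the model
assumes a single OST and one lock per client to keep it short.

```python
class ToyOST:
    """Hypothetical object storage target (not a Lustre API)."""

    def __init__(self):
        self.objects = {}             # object id -> bytes actually stored
        self.lock_holder = None       # client holding the data lock, if any

    def grant_lock(self, client):
        # A conflicting holder must give the lock up first.
        if self.lock_holder is not None and self.lock_holder is not client:
            self.revoke_lock()
        self.lock_holder = client

    def revoke_lock(self):
        holder, self.lock_holder = self.lock_holder, None
        if holder is not None:
            holder.on_data_lock_revoked(self)

    def write(self, obj_id, data):
        self.objects[obj_id] = self.objects.get(obj_id, b"") + data


class CachingClient:
    """Hypothetical client with a write-back cache guarded by the OST lock."""

    def __init__(self):
        self.dirty = []               # (object id, data) not yet on the OST
        self.has_lock = False

    def write(self, ost, obj_id, data):
        if not self.has_lock:
            ost.grant_lock(self)
            self.has_lock = True
        # The application sees the write as complete at this point,
        # although nothing has reached the OST yet.
        self.dirty.append((obj_id, data))

    def on_data_lock_revoked(self, ost):
        # Flush all dirty data before the lock is given up.
        for obj_id, data in self.dirty:
            ost.write(obj_id, data)
        self.dirty.clear()
        self.has_lock = False


ost = ToyOST()
writer = CachingClient()
writer.write(ost, "obj1", b"hello")
cached_only = dict(ost.objects)       # still empty: data sits in the client cache
ost.revoke_lock()                     # OST wants the cache gone
```

After the revoke, the data is on the OST and the client cache is empty.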
So a restriping operation is a multi-step process:
1. Revoke the MDS EA lock and hold it -> this makes all clients unable to start any new
IO operations (they'd block trying to get this lock).
2. For all stripes that would need to change, revoke OST data locks -> this causes
all clients holding caches for these objects to flush those caches.
3. Now it's possible to do whatever is needed with the striping, as nobody has any
in-progress operations.
4. Release the OST and MDS locks from above; the blocked clients will then refetch the
new striping information and resume IO against the new striping layout.
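The four steps can be sketched as a single function. This is a toy Python model under
loud assumptions: the dictionaries standing in for the MDS, OSTs and clients, the
function name restripe, and the log strings are all hypothetical, and the actual data
movement in step 3 is reduced to swapping the layout.

```python
def restripe(mds, osts, clients, new_layout):
    """Hypothetical orchestration of the four restriping steps (not Lustre code)."""
    log = []

    # 1. Revoke the MDS layout (EA) lock and hold it: clients lose their cached
    #    layout and cannot start new IO while the server holds the lock.
    mds["layout_locked_by_server"] = True
    for client in clients:
        client["layout"] = None
    log.append("layout lock held")

    # 2. Revoke the OST data locks for the affected stripes: every client
    #    flushes its dirty data for those objects to the OSTs.
    for ost_name in mds["layout"]:
        for client in clients:
            dirty = client["dirty"].pop(ost_name, [])
            osts[ost_name].extend(dirty)
    log.append("caches flushed")

    # 3. Nobody has in-progress operations now, so the striping can change.
    #    (Real data migration is omitted from this sketch.)
    mds["layout"] = list(new_layout)
    for ost_name in new_layout:
        osts.setdefault(ost_name, [])
    log.append("layout changed")

    # 4. Release the locks; blocked clients refetch the layout on their next IO.
    mds["layout_locked_by_server"] = False
    log.append("locks released")
    return log


mds = {"layout": ["ost0", "ost1"], "layout_locked_by_server": False}
osts = {"ost0": [], "ost1": []}
client = {"layout": ["ost0", "ost1"], "dirty": {"ost0": [b"abc"]}}
log = restripe(mds, osts, [client], ["ost2", "ost3"])
```

The ordering is the point: the layout lock is taken before the data locks are revoked, so
no client can start new IO on the old layout while the caches are being flushed.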
Hopefully this helps.
Bye,
Oleg
On Apr 7, 2015, at 3:09 PM, Akhilesh Gadde wrote:
Hi Oleg,
Thanks for the response.
From your response, I understand that neither the client nor the OST informs the MDS/MDT
about the write completion. Also, you mentioned that there is no metadata locking while
writing.
I seem to get a bit confused here. Sorry for that. :(
Say there is a file already striped across multiple OSTs and now some client wants to
write to that file. It sends a request to the MDS to get the EA attributes for that
file and, based on that, the client would write directly to the corresponding OSTs. So
once the client has completed writing to the file, how does the MDS/MDT know that it has
to release the lock it has created on the metadata of that file?
The manual states that:
"→ In Lustre, creating a new file causes the client to contact a metadata server,
which creates an inode for the file and then contacts the OSTs to create objects that will
actually hold file data. Metadata for the objects is held in the inode as extended
attributes for the file.
→ Within the OST, data is actually read and written to underlying storage known as
Object-Based Disks (OBDs). Subsequent I/O to the newly created file is done directly
between the client and the OST, which interacts with the underlying OBDs to read and write
data. The metadata server is only updated when additional namespace changes associated
with the new file are required."
I am trying to understand how the MDS knows about the completion of clients'
read/write operations on a new/existing file. Also, is the write cache you mentioned part
of the client or the OSS node?
Can you please help me in understanding these questions? I am trying to understand the
Lustre File system replication design document that is being implemented by Intel. Some
confusion in the basic concepts is making it difficult for me to understand that
document.
Thanks,
Akhilesh Gadde.
On Sun, Apr 5, 2015 at 12:42 AM, Drokin, Oleg <oleg.drokin(a)intel.com> wrote:
Hello!
On Apr 4, 2015, at 5:14 PM, Akhilesh Gadde wrote:
> Hi,
>
> I am pretty new to Lustre and trying to understand a few things wrt the File
> Read/Write operations.
>
> 1. When the client wants to read a file, it obtains the EA layout information for
> that file from the MDT and then accesses the file directly from the OST(s).
>
> 2. When the client wants to write a file, it contacts the MDT and the MDT provides
> the list of OSTs across which the file could be striped. (The MDT picks OSTs based on
> the available free space in OSTs - round robin or weighted, as given in the manual.)
>
> --> Once the client completes the write operation, would the client inform the
> MDS about the completion and so release the locks on file metadata, or would the
> OSS/OST communicate this information to the MDS/MDT?
Client does not inform MDS about write completion because MDS has no idea (and currently
does not care) about any such data activity.
Moreover, data and metadata locking are separate so there's no metadata locking while
writing.
The only bit of data the MDS holds for a client that does IO is an open file handle, but
in fact the client can close the file before the IO is actually finished (since there's
write caching, the app might think it is done writing while in reality the data is still
flowing from the cache to the OSTs).
OSTs don't inform the MDS about any write completion either, because the MDS really would
not be able to do anything with this info anyway, and also OSTs don't really know
if the client has genuinely stopped writing or is just pausing before a new burst of
data comes in.
Bye,
Oleg