Hi Oleg,
Thanks for the response.
From your response, I understand that neither the client nor the OST
informs the MDS/MDT about the write completion. Also, you mentioned that
there is no metadata locking while writing.
I seem to get a bit confused here. Sorry for that. :(
Say there is a file already striped across multiple OSTs, and now some
client wants to write to that file. It sends a request to the MDS to
get the EA (layout) information for that file and, based on that, writes
directly to the corresponding OSTs. So once the client has completed
writing to the file, how does the MDS/MDT know that it has to release the
lock it has created on the metadata of that file?
The manual states that:
"→ In Lustre, creating a new file causes the client to contact a metadata
server, which creates an inode for the file and then contacts the OSTs to
create objects that will actually hold file data. Metadata for the objects
is held in the inode as extended attributes for the file.
→ Within the OST, data is actually read and written to underlying storage
known as Object-Based Disks (OBDs). Subsequent I/O to the newly created
file is done directly between the client and the OST, which interacts with
the underlying OBDs to read and write data. *The metadata server is only
updated when additional namespace changes associated with the new file are
required.*"
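To make the flow in the quoted passage concrete, here is a toy Python model of it. This is only a sketch: `ToyMDS`, `ToyOST`, `client_write` and the round-robin chunk placement are hypothetical names and simplifications made up for illustration, not Lustre APIs. It shows the one point the manual makes: the metadata server is contacted for the create, and the subsequent writes go to the OSTs without touching it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOST:
    """Hypothetical stand-in for an OST; holds object data keyed by object id."""
    objects: dict = field(default_factory=dict)

    def create_object(self, obj_id):
        self.objects[obj_id] = bytearray()

    def write(self, obj_id, data):
        self.objects[obj_id] += data


@dataclass
class ToyMDS:
    """Hypothetical stand-in for the MDS; tracks inodes and their layout EA."""
    osts: list
    inodes: dict = field(default_factory=dict)
    requests_seen: int = 0

    def create(self, name):
        # Namespace change: the MDS is contacted, creates the inode,
        # and asks each OST to create an object that will hold file data.
        self.requests_seen += 1
        layout = []
        for idx, ost in enumerate(self.osts):
            obj_id = f"{name}-obj{idx}"
            ost.create_object(obj_id)
            layout.append((idx, obj_id))
        # The object list is kept with the inode as an extended attribute.
        self.inodes[name] = {"layout_ea": layout}
        return layout


def client_write(osts, layout, chunks):
    # Data path: chunks go straight to the OSTs named in the layout.
    # The MDS object is not referenced anywhere in this function.
    for i, chunk in enumerate(chunks):
        ost_idx, obj_id = layout[i % len(layout)]
        osts[ost_idx].write(obj_id, chunk)


osts = [ToyOST(), ToyOST()]
mds = ToyMDS(osts)
layout = mds.create("file1")
client_write(osts, layout, [b"aa", b"bb", b"cc"])
print(mds.requests_seen)  # 1: only the create reached the toy MDS
```

In this model the MDS request counter stays at 1 no matter how many chunks are written, which is the behaviour the last sentence of the manual excerpt describes.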
I am trying to understand how the MDS knows about the completion of a
client's read/write operations on a new/existing file. Also, is the write
cache you mentioned part of the client or of the OSS node?
Could you please help me understand these points? I am trying to
understand the Lustre File system replication design document that is being
implemented by Intel. Some confusion in the basic concepts is making it
difficult for me to understand that document.
Thanks,
Akhilesh Gadde.
On Sun, Apr 5, 2015 at 12:42 AM, Drokin, Oleg <oleg.drokin(a)intel.com> wrote:
Hello!
On Apr 4, 2015, at 5:14 PM, Akhilesh Gadde wrote:
> Hi,
>
> I am pretty new to Lustre and trying to understand a few things wrt
> the file read/write operations.
>
> 1. When the client wants to read a file, it obtains the EA layout
> information for that file from the MDT and then accesses the file directly
> from the OST(s).
>
> 2. When the client wants to write a file, it contacts the MDT, and the MDT
> provides the list of OSTs across which the file could be striped.
> (The MDT picks OSTs based on the available free space in the OSTs - round
> robin or weighted, as given in the manual.)
>
> --> Once the client completes the write operation, would the client
> inform the MDS about the completion and so release the locks on the file
> metadata, or would the OSS/OST communicate this information to the MDS/MDT?
Client does not inform MDS about write completion because MDS has no idea
(and currently does not care) about any such data activity.
Moreover, data and metadata locking are separate so there's no metadata
locking while writing.
The only bit of state the MDS holds for a client that does IO is the open
file handle, but in fact the client can close the file before the IO is
actually finished (since there's write caching, the app might think it is
done writing, but in reality the data is still flowing from the cache to
the OSTs).
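As a small illustration of the write-caching point, here is a Python sketch of how an application can ask for its cached data to be pushed out before it closes the file. It uses only the standard `os.write` and `os.fsync` calls; the helper name `write_durably` and the temporary path are made up for the example, and it assumes the file system honours `fsync` in the usual POSIX way. Without the `fsync`, a successful `write` followed by `close` only means the data reached the cache.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and flush it before closing (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # At this point the data may still sit in the write cache.
        # fsync blocks until the cached data for this fd has been flushed.
        os.fsync(fd)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.dat")
    write_durably(target, b"hello lustre")
    size = os.path.getsize(target)
    print(size)
```

Note that this only changes when the application itself considers the data written; it does not involve the MDS in any way.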
OSTs don't inform the MDS about any write completion either, because the MDS
really would not be able to do anything with this info anyway, and also OSTs
don't really know if the client genuinely stopped writing or if it is just
pausing before a new burst of data comes in.
Bye,
Oleg