Hi Oleg,

Thanks for the response.
From your response, I understand that neither the client nor the OST informs the MDS/MDT about write completion. You also mentioned that there is no metadata locking while writing.
I am still a bit confused here. Sorry for that. :(
Say there is a file already striped across multiple OSTs, and a client now wants to write to that file. It sends a request to the MDS to get the EA attributes for that file and, based on those, writes directly to the corresponding OSTs. Once the client has finished writing to the file, how does the MDS/MDT know that it has to release the lock it created on the metadata of that file?
The manual states that:
"→ In Lustre, creating a new file causes the client to contact a metadata server, which creates an inode for the file and then contacts the OSTs to create objects that will actually hold file data. Metadata for the objects is held in the inode as extended attributes for the file.
→ Within the OST, data is actually read and written to underlying storage known as Object-Based Disks (OBDs). Subsequent I/O to the newly created file is done directly between the client and the OST, which interacts with the underlying OBDs to read and write data. The metadata server is only updated when additional namespace changes associated with the new file are required."

I am trying to understand how the MDS knows about the completion of a client's read/write operations on a new or existing file. Also, is the write cache you mentioned part of the client or the OSS node?

Could you please help me understand these points? I am trying to understand the Lustre file system replication design document that is being implemented by Intel. Some confusion about the basic concepts is making it difficult for me to follow that document.

Thanks,
Akhilesh Gadde.





On Sun, Apr 5, 2015 at 12:42 AM, Drokin, Oleg <oleg.drokin@intel.com> wrote:
Hello!

On Apr 4, 2015, at 5:14 PM, Akhilesh Gadde wrote:

> Hi,
>
> I am pretty new to Lustre and trying to understand a few things with respect to the file read/write operations.
>
> 1. When the client wants to read a file, it obtains the EA layout information for that file from the MDT and then accesses the file directly from OST(s).
>
> 2. When the client wants to write a file, it contacts the MDT and MDT would provide the list of OSTs on which the file could be striped across. (MDT gives OSTs based on the available free space in OSTs - round robin or weighted as given in manual).
>
> --> Once the client completes the write operation, does the client inform the MDS about the completion so that the locks on the file metadata are released, or does the OSS/OST communicate this information to the MDS/MDT?

Client does not inform MDS about write completion because MDS has no idea (and currently does not care) about any such data activity.
Moreover, data and metadata locking are separate so there's no metadata locking while writing.
The only bit of data the MDS holds for a client that does IO is an open file handle, but in fact the client can close the file before the IO is actually finished (since there's write caching,
the app might think it is done writing while in reality the data is still flowing from the cache to the OSTs).
OSTs don't inform the MDS about any write completion either, because the MDS really would not be able to do anything with this info anyway, and also OSTs don't really know
if the client genuinely stopped writing or if it is just pausing before a new burst of data comes in.
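
To make the ordering above concrete, here is a rough toy model in Python. It is only a sketch of the idea, not Lustre code: the names ToyMDS, ToyOST, ToyClient and demo are hypothetical, and real locking, striping and RPCs are left out entirely. It only shows that the metadata side sees open and close, while the data can reach the OST after the close.

```python
# Toy model only: every class and method name here is hypothetical,
# chosen for illustration. None of this is a real Lustre API.

class ToyMDS:
    """Metadata side: tracks open file handles and never sees data writes."""

    def __init__(self):
        self.open_handles = set()

    def open(self, client_id, path):
        handle = (client_id, path)
        self.open_handles.add(handle)
        return handle

    def close(self, handle):
        self.open_handles.discard(handle)


class ToyOST:
    """Data side: stores bytes per object and never reports back to the MDS."""

    def __init__(self):
        self.objects = {}

    def write(self, path, data):
        self.objects[path] = self.objects.get(path, b"") + data


class ToyClient:
    """Client with a write cache; data goes to the OST only on flush."""

    def __init__(self, client_id, mds, ost):
        self.client_id = client_id
        self.mds = mds
        self.ost = ost
        self.write_cache = []

    def open(self, path):
        return self.mds.open(self.client_id, path)

    def write(self, path, data):
        # The application sees this as a completed write.
        self.write_cache.append((path, data))

    def close(self, handle):
        # Close only touches the MDS; cached data may still be pending.
        self.mds.close(handle)

    def flush(self):
        for path, data in self.write_cache:
            self.ost.write(path, data)
        self.write_cache.clear()


def demo():
    mds, ost = ToyMDS(), ToyOST()
    client = ToyClient("c1", mds, ost)

    handle = client.open("/f")
    client.write("/f", b"hello")
    client.close(handle)
    after_close = (len(mds.open_handles), ost.objects.get("/f"))

    client.flush()
    after_flush = (len(mds.open_handles), ost.objects.get("/f"))
    return after_close, after_flush
```

Calling demo() returns the state right after close and then after the flush: in the first snapshot the MDS has no open handles left although the OST has not received anything yet, and in the second the data has arrived without the MDS being involved at all.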

Bye,
    Oleg