On Thu, Oct 12, 2017 at 11:57 PM, Christoph Hellwig <hch(a)lst.de> wrote:
On Thu, Oct 12, 2017 at 10:41:39AM -0700, Dan Williams wrote:
> So, you're jumping into this review at v9 where I've split the patches
> that take an initial MAP_DIRECT lease out from the patches that take
> FL_LAYOUT leases at memory registration time. You can see a previous
> attempt in "[PATCH v8 00/14] MAP_DIRECT for DAX RDMA and userspace
> flush" which should be in your inbox.
The point is that your problem has absolutely nothing to do with mmap,
and all with get_user_pages.
get_user_pages on DAX doesn't give the same guarantees as on pagecache
or anonymous memory, and that is the prbolem we need to fix. In fact
I'm pretty sure if we try hard enough (and we might have to try
very hard) we can see the same problem with plain direct I/O and without
any RDMA involved, e.g. do a larger direct I/O write to memory that is
mmap()ed from a DAX file, then truncate the DAX file and reallocate
the blocks, and we might corrupt that new file. We'll probably need
a special setup where there is little other chance but to reallocate
those used blocks.
I'll take a harder look at this...
So what we need to do first is to fix get_user_pages vs unmapping
DAX mmap()ed blocks, be that from a hole punch, truncate, COW
operation, etc.
Then we need to look into the special case of a long-living non-transient
get_user_pages that RDMA does - we can't just reject any truncate or
other operation for that, so that's where something like me layout
lease suggestion comes into play - but the call that should get the
least is not the mmap - it's the memory registration call that does
the get_user_pages.
Yes, mmap is not the place to get the lease for a later
get_user_pages, and my patches do take an additional lease at
get_user_pages / MR init time. However, the mmap call has the
file-descriptor for SIGIO the MR-init call does not. If we delay all
of the setup it to MR time then we need to invent a notification
scheme specific to RDMA which seems like a waste to me when we can
generically signal an event on the fd for any event that effects any
of the vma's on the file. The FL_LAYOUT lease impacts the entire file,
so as far as I can see delaying the notification until MR-init is too
late, too granular, and too RDMA specific.