On Thu, Oct 12, 2017 at 7:23 AM, Christoph Hellwig <hch(a)lst.de> wrote:
Sorry for chiming in so late, been extremely busy lately.
From quickly glancing over what the now finally described use case is
(which contradicts the subject btw - it's not about flushing, it's
about not removing block mapping under a MR) and the previous comments
I think that mmap is simply the wrong kind of interface for this.
What we want is support for a new kind of userspace memory registration in the
RDMA code that uses the pnfs export interface, both getting the block (or
rather byte in this case) mapping, and also gets the FL_LAYOUT lease for the
That btw is exactly what I do for the pNFS RDMA layout, just in-kernel.
...and this is exactly my plan.
So, you're jumping into this review at v9 where I've split the patches
that take an initial MAP_DIRECT lease out from the patches that take
FL_LAYOUT leases at memory registration time. You can see a previous
attempt in "[PATCH v8 00/14] MAP_DIRECT for DAX RDMA and userspace
flush" which should be in your inbox.
I'm not proposing mmap as the memory registration interface, it's the
"register for notification of lease break" interface. Here's my
addr = mmap(..., MAP_DIRECT.., fd); <- register a vma for "direct"
memory registrations with an FL_LAYOUT lease that at a lease break
event sends SIGIO on the fd used for mmap.
ibv_reg_mr(..., addr, ...); <- check for a valid MAP_DIRECT vma, and
take out another FL_LAYOUT lease. This lease force revokes the RDMA
mapping when it expires, and it relies on the process receiving SIGIO
as the 'break' notification.
fallocate(fd, PUNCH_HOLE...) <- breaks all the FL_LAYOUT leases, the
vma owner gets notified by fd.
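
To illustrate the notification half of that flow from user space, here is a minimal sketch that uses the lease API already in mainline, fcntl(F_SETLEASE), as a stand-in. MAP_DIRECT is only a flag proposed in this series, so it does not appear in the code, and an ordinary read lease is not the same thing as an FL_LAYOUT lease. The helper names (install_break_handler, take_read_lease) are invented for the example.

```c
#define _GNU_SOURCE		/* F_SETLEASE is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set from the handler; a real application would start tearing down
 * its memory registrations once it sees this. */
static volatile sig_atomic_t lease_break_seen;

static void on_lease_break(int signo)
{
	(void)signo;
	lease_break_seen = 1;
}

/* Hypothetical helper: route lease-break notifications to a handler.
 * SIGIO is the default signal for fcntl() leases. */
int install_break_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_lease_break;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGIO, &sa, NULL);
}

/* Hypothetical helper: take an ordinary read lease on a read-only fd.
 * Assumption: this only stands in for the FL_LAYOUT lease that the
 * proposed MAP_DIRECT mmap() would take. */
int take_read_lease(int fd)
{
	assert(fd >= 0);
	return fcntl(fd, F_SETLEASE, F_RDLCK);
}
```

With a regular read lease the kernel raises the signal when another opener wants write access or truncates the file. In the proposal the trigger would instead be an operation such as the fallocate(PUNCH_HOLE) in the last step.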
Al rightly points out that the fd may be closed by the time the event
fires since the lease follows the vma lifetime. I see two ways to
solve this: document that the process may get notifications on a stale
fd if close() happens before munmap(), or, similar to how we call
locks_remove_posix() in filp_close(), add a routine to disable any
lease notifiers on close(). I'll investigate the second option because
this seems to be a general problem with leases.
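
The lifetime mismatch behind the stale-fd concern can be shown with plain mmap(), no leases involved: a mapping stays usable after the descriptor that created it has been closed. The sketch below is only a demonstration of that ordering; the function name and the scratch-file path are arbitrary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the first byte read through a shared mapping *after* the
 * backing fd has been closed, or -1 on error. */
int read_after_close(void)
{
	char path[] = "/tmp/stale-fd-XXXXXX";	/* scratch file, name is arbitrary */
	int fd = mkstemp(path);
	unsigned char *map;
	int byte;

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, "A", 1) != 1) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, 1, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	assert(fcntl(fd, F_GETFD) == -1);	/* the descriptor really is gone ... */
	if (map == MAP_FAILED)
		return -1;

	byte = map[0];		/* ... but the vma, and anything tied to it, lives on */
	munmap(map, 1);
	return byte;
}
```

Anything whose lifetime follows the vma, such as the lease described above, is therefore still live between close() and munmap(), which is the window in which a notification would name a stale fd.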
For RDMA I am presently re-working the implementation. Inspired by
a discussion with Jason, I am going to add something like
ib_umem_ops to allow drivers to override the default policy of what
happens when a lease expires. The default action is to invalidate
device access to the memory with iommu_unmap(), but I want to allow
for drivers to do something smarter or choose to not support DAX
mappings at all.
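
One possible shape for such an override hook, written as a user-space mock so it can be compiled and run. The mail only says "something like ib_umem_ops", so every struct, field, and function name below is an assumption made for illustration, and the default handler merely flips a flag where the kernel version would call iommu_unmap().

```c
#include <assert.h>
#include <stddef.h>

struct umem_sketch;

/* Hypothetical per-driver hooks; not an existing kernel structure. */
struct umem_ops_sketch {
	/* Called when the lease covering the registration expires. */
	int (*lease_expired)(struct umem_sketch *umem);
};

struct umem_sketch {
	const struct umem_ops_sketch *ops;	/* NULL means "use the default" */
	int device_mapped;			/* 1 while the device can reach the memory */
};

/* Default policy: cut off device access. This mock only clears a flag;
 * it does not touch any IOMMU. */
int umem_default_lease_expired(struct umem_sketch *umem)
{
	umem->device_mapped = 0;
	return 0;
}

/* Dispatch to the driver override if there is one, else to the default. */
int umem_lease_expired(struct umem_sketch *umem)
{
	assert(umem != NULL);
	if (umem->ops && umem->ops->lease_expired)
		return umem->ops->lease_expired(umem);
	return umem_default_lease_expired(umem);
}
```

A driver that wants to do something smarter would supply its own lease_expired callback; one that cannot support DAX mappings at all would more likely refuse at registration time, which this sketch does not model.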