On Mon, Oct 16, 2017 at 12:26 AM, Christoph Hellwig <hch(a)lst.de> wrote:
> On Fri, Oct 13, 2017 at 11:31:45AM -0600, Jason Gunthorpe wrote:
> > I don't think that really represents how lots of apps actually use
> > RDMA.
> >
> > RDMA is often buried down in the software stack (e.g. in an MPI), and by
> > the time a mapping gets used for an RDMA transfer, the link between the
> > FD, the mmap and the MR is totally opaque.
> >
> > Having an MR-specific notification means the low-level RDMA libraries
> > have a chance to deal with everything for the app.
> >
> > E.g. consider an HPC app using MPI that uses some DAX-aware library to
> > get DAX-backed mmaps. It then passes memory in those mmaps to the
> > MPI library to do transfers. The MPI creates the MR on demand.
> >
> I suspect one of the more interesting use cases might be a file server,
> for which that's not the case. But otherwise I agree with the above,
> and also think that notifying the MR handle is the only way to go for
> another very important reason: fencing. What if the application/library
> does not react to the notification? With a per-MR notification we
> can unregister the MR in kernel space and have a rock-solid fencing
> mechanism. And that is the most important bit here.
While I agree with the need for a per-MR notification mechanism, one
thing we lose by walking away from MAP_DIRECT is a way for a
hypervisor to coordinate pass-through of a DAX mapping to an RDMA
device in a guest. That will remain a case where we will still need to
use device-dax. I'm fine if that's the answer, but just want to be
clear about all the places we need to protect a DAX mapping against
RDMA from a non-ODP device.
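A minimal user-space model of the per-MR fencing argument above, assuming a notify-then-grace-period scheme: the kernel notifies the owner of one specific MR, a well-behaved library deregisters it, and if nothing happens before the deadline the kernel unregisters the MR itself, so the filesystem can safely reuse the DAX blocks either way. `struct mr_model` and every `mr_*` function are hypothetical illustrations; they are not part of the RDMA verbs API or any kernel interface, and the grace-period policy is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a registered memory region (MR); not a real verbs/kernel struct. */
struct mr_model {
    int  id;
    bool registered;     /* device may still DMA into the DAX mapping */
    bool revoke_pending; /* invalidation notification sent, awaiting the app */
    long deadline;       /* time by which the app must release the MR */
};

/* Step 1 (assumed policy): the filesystem wants the DAX blocks back, so the
 * kernel notifies the owner of this specific MR and starts a grace period. */
static void mr_notify_invalidate(struct mr_model *mr, long now, long grace)
{
    assert(mr != NULL);
    if (!mr->registered)
        return;
    mr->revoke_pending = true;
    mr->deadline = now + grace;
}

/* Step 2a: a well-behaved library reacts by deregistering the MR itself. */
static void mr_app_release(struct mr_model *mr)
{
    assert(mr != NULL);
    mr->registered = false;
    mr->revoke_pending = false;
}

/* Step 2b: fencing. If the app ignored the notification and the grace period
 * has expired, the kernel unregisters the MR on its own.
 * Returns true only when the kernel had to force the revocation. */
static bool mr_fence_if_expired(struct mr_model *mr, long now)
{
    assert(mr != NULL);
    if (mr->registered && mr->revoke_pending && now >= mr->deadline) {
        mr->registered = false;
        mr->revoke_pending = false;
        return true;
    }
    return false;
}

/* The filesystem may reuse the blocks only once no MR can reach them. */
static bool mr_blocks_reusable(const struct mr_model *mr)
{
    assert(mr != NULL);
    return !mr->registered;
}
```

The point the model tries to capture is that correctness does not depend on the application cooperating: whichever of `mr_app_release` or `mr_fence_if_expired` runs first, the MR ends up unregistered before the blocks are reused.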