On Fri, Oct 13, 2017 at 10:01:04AM -0700, Dan Williams wrote:
> On Fri, Oct 13, 2017 at 9:38 AM, Jason Gunthorpe wrote:
>> On Fri, Oct 13, 2017 at 08:14:55AM -0700, Dan Williams wrote:
>>> scheme specific to RDMA which seems like a waste to me when we can
>>> generically signal an event on the fd for any event that affects any
>>> of the vma's on the file. The FL_LAYOUT lease impacts the entire file,
>>> so as far as I can see delaying the notification until MR-init is too
>>> late, too granular, and too RDMA specific.
>> But for RDMA a FD is not what we care about - we want the MR handle so
>> the app knows which MR needs fixing.
> I'd rather put the onus on userspace to remember where it used a
> MAP_DIRECT mapping and be aware that all the mappings of that file are
> subject to a lease break. Sure, we could build up a pile of kernel
> infrastructure to notify on a per-MR basis, but I think that would
> only be worth it if leases were range based. As it is, the entire file
> is covered by a lease instance and all MRs that might reference that
> file get one notification. That said, we can always arrange for a
> per-driver callback at lease-break time so that it can do something
> above and beyond the default notification.
I don't think that really represents how lots of apps actually use
RDMA. RDMA is often buried down in the software stack (eg in a MPI), and by
the time a mapping gets used for RDMA transfer the link between the
FD, mmap and the MR is totally opaque.
Having a MR specific notification means the low level RDMA libraries
have a chance to deal with everything for the app.
Eg consider a HPC app using MPI that uses some DAX aware library to
get DAX backed mmap's. It then passes memory in those mmaps to the
MPI library to do transfers. The MPI creates the MR on demand.
So, who should be responsible for MR coherency? Today we say the MPI
is responsible. But we can't really expect the MPI
to hook SIGIO and somehow try to reverse engineer what MRs are
impacted from a FD that may not even still be open.
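
To make the "reverse engineer" problem concrete, here is a rough sketch of
the bookkeeping a library would have to carry if it only got a file-wide
notification. Everything in it is hypothetical: mr_track(),
mr_lookup_by_file() and the table layout are made up for illustration and
are not part of libibverbs or of the MAP_DIRECT patches. It keys on
(st_dev, st_ino) rather than the FD, on the assumption that the FD may
already be closed by the time the lease breaks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping; none of these names exist in libibverbs
 * or in the MAP_DIRECT patches under discussion. */
#define MAX_MRS 64

struct mr_record {
    uint64_t file_dev;   /* st_dev, assumed captured via fstat() at mmap time */
    uint64_t file_ino;   /* st_ino, assumed captured via fstat() at mmap time */
    uint32_t mr_handle;  /* opaque MR identifier (illustrative only) */
    int      in_use;
};

static struct mr_record mr_table[MAX_MRS];

/* Record that mr_handle covers memory backed by file (dev, ino).
 * Returns 0 on success, -1 if the table is full. */
int mr_track(uint64_t dev, uint64_t ino, uint32_t mr_handle)
{
    for (size_t i = 0; i < MAX_MRS; i++) {
        if (!mr_table[i].in_use) {
            mr_table[i] = (struct mr_record){ dev, ino, mr_handle, 1 };
            return 0;
        }
    }
    return -1;
}

/* On a file-wide lease break, collect every MR that references the file.
 * Stores at most cap handles in out and returns the total number of
 * matches, which may be larger than cap. */
size_t mr_lookup_by_file(uint64_t dev, uint64_t ino,
                         uint32_t *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_MRS; i++) {
        if (mr_table[i].in_use &&
            mr_table[i].file_dev == dev &&
            mr_table[i].file_ino == ino) {
            if (n < cap)
                out[n] = mr_table[i].mr_handle;
            n++;
        }
    }
    return n;
}
```

The catch is that the layer creating the MR (the MPI) only sees a pointer
and a length, so it has no (dev, ino) to pass to mr_track() unless the DAX
aware library and the MPI agree on some side channel.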
I think, if you want to build a uAPI for notification of MR lease
break, then you need to show how it fits into the above software model:
- How it can be hidden in a RDMA specific library
- How lease break can be done hitlessly, so the library user never
needs to know it is happening or see failed/missed transfers
- How the required fast path checking avoids killing performance
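
As a strawman for the last point, the cheapest fast path check I can
think of is one atomic load per transfer against a per-MR state word.
This is only a sketch under that assumption: struct tracked_mr and the
mr_* helpers are invented names, not an existing verbs API, and how the
lease break actually reaches mr_mark_revoked() is left open.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-MR state kept by the RDMA library. */
enum mr_state { MR_VALID = 0, MR_REVOKED = 1 };

struct tracked_mr {
    _Atomic int state;
};

void mr_init(struct tracked_mr *mr)
{
    atomic_init(&mr->state, MR_VALID);
}

/* Slow path: called by whatever delivers the lease break. */
void mr_mark_revoked(struct tracked_mr *mr)
{
    atomic_store_explicit(&mr->state, MR_REVOKED, memory_order_release);
}

/* Slow path: called once the library has re-established the MR. */
void mr_mark_valid(struct tracked_mr *mr)
{
    atomic_store_explicit(&mr->state, MR_VALID, memory_order_release);
}

/* Fast path: a single atomic load before posting a transfer. */
bool mr_usable(struct tracked_mr *mr)
{
    return atomic_load_explicit(&mr->state, memory_order_acquire) == MR_VALID;
}
```

Note this does not close the window between the check and the transfer
actually being posted, and it says nothing about transfers already in
flight, so by itself it is not hitless. It only bounds what the per
transfer cost could look like.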