On Fri, Oct 13, 2017 at 11:22:21AM -0700, Dan Williams wrote:
> So, here's a strawman: can ibv_poll_cq() start returning a completion with
> status == IBV_WC_LOC_PROT_ERR when file coherency is lost? This would make
> the solution generic across DAX and non-DAX. What's your feeling for
> how well applications are prepared to deal with that status return?

The problem isn't local protection errors, but remote protection errors
when we modify an MR with an rkey that the remote side accesses.

> - How lease break can be done hitlessly, so the library user
>   never needs to know it is happening or see failed/missed transfers
> iommu redirect should be hitless and behave like the page cache case
> where RDMA targets pages that are no longer part of the file.

But systems that care about performance (e.g. the usual RDMA users) usually
don't use an IOMMU due to the performance impact. Especially as HCAs
already have their own built-in iommus (aka the MR mechanism).
Note that file systems already have a mechanism like you mention above
to keep extents that are busy from being reallocated. E.g. take a look at
fs/xfs/xfs_extent_busy.c. The downside is that this could lock down
a massive amount of space in the busy list if we for example have an MR
covering a huge file that is truncated down. So even if we'd want that
scheme we'd need some sort of ulimit for the amount of DAX pages locked
down in get_user_pages.
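
To make the limit idea concrete, here is a rough userspace sketch of the
accounting I have in mind, in the spirit of RLIMIT_MEMLOCK. None of this is
existing kernel code: struct dax_pin_account, dax_pin_charge() and
dax_pin_uncharge() are made-up names for illustration, and a real version
would need locking and would hang off whatever per-user or per-mm structure
we pick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting of DAX pages pinned via get_user_pages. */
struct dax_pin_account {
	size_t pinned;	/* pages currently pinned */
	size_t limit;	/* maximum pages that may be pinned at once */
};

/*
 * Try to charge npages against the limit.  Returns false and leaves the
 * account untouched if the pin would exceed the limit.  The comparison
 * is written so that it cannot overflow, given pinned <= limit.
 */
static bool dax_pin_charge(struct dax_pin_account *acct, size_t npages)
{
	if (npages > acct->limit - acct->pinned)
		return false;
	acct->pinned += npages;
	return true;
}

/* Release npages previously charged with dax_pin_charge(). */
static void dax_pin_uncharge(struct dax_pin_account *acct, size_t npages)
{
	assert(npages <= acct->pinned);
	acct->pinned -= npages;
}
```

MR registration would call dax_pin_charge() before pinning and fail the
registration when it returns false; deregistration would call
dax_pin_uncharge(). That bounds how much space a truncate can leave stuck
on the busy list.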