On Wed, May 6, 2015 at 5:19 PM, Linus Torvalds wrote:
On Wed, May 6, 2015 at 4:47 PM, Dan Williams wrote:
> Conceptually better, but certainly more difficult to audit if the fake
> struct page is initialized in a subtle way that breaks when/if it
> leaks to some unwitting context.
Maybe. It could go either way, though. In particular, with the
"dynamically allocated struct page" approach, if somebody uses it past
the supposed lifetime of the use, things like poisoning the temporary
"struct page" could be fairly effective. You can't really poison the
pfn - it's just a number, and if somebody uses it later than you think
(and you have re-used that physical memory for something else), you'll
never ever know.
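A minimal userspace sketch of the poisoning point, not kernel code: it assumes a hypothetical "struct tmp_page" handed out from a fixed pool, with made-up marker values TMP_PAGE_LIVE and TMP_PAGE_POISON. Once the descriptor's lifetime ends, a late user is caught, while a bare pfn copied out earlier looks exactly as valid as before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker values; not taken from any kernel header. */
#define TMP_PAGE_LIVE   0x50414745u
#define TMP_PAGE_POISON 0xdeadbeefu
#define TMP_PAGE_SLOTS  4

/* Hypothetical stand-in for a short-lived, dynamically allocated struct page. */
struct tmp_page {
	uint32_t magic;
	uint64_t pfn;
};

/* Fixed pool, so a stale pointer still refers to valid (poisoned) storage. */
static struct tmp_page tmp_pool[TMP_PAGE_SLOTS];

/* Hand out a descriptor for pfn, or NULL when every slot is live. */
struct tmp_page *tmp_page_get(uint64_t pfn)
{
	for (size_t i = 0; i < TMP_PAGE_SLOTS; i++) {
		if (tmp_pool[i].magic != TMP_PAGE_LIVE) {
			tmp_pool[i].magic = TMP_PAGE_LIVE;
			tmp_pool[i].pfn = pfn;
			return &tmp_pool[i];
		}
	}
	return NULL;
}

/* End the descriptor's lifetime and poison it so later use is detectable. */
void tmp_page_put(struct tmp_page *page)
{
	assert(page->magic == TMP_PAGE_LIVE);	/* catches a double put */
	page->magic = TMP_PAGE_POISON;
	page->pfn = 0;
}

/* Read the pfn; fails on a poisoned or never-initialized descriptor. */
bool tmp_page_to_pfn(const struct tmp_page *page, uint64_t *pfn)
{
	if (page->magic != TMP_PAGE_LIVE)
		return false;
	*pfn = page->pfn;
	return true;
}
```

The check only helps until the slot is handed out again; after reuse a stale pointer looks live, which is the same limit any poisoning scheme has.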
True, but there's little need to poison a __pfn_t because it's
permanent once discovered via ->direct_access() on the hosting struct
block_device. Sure, kmap_atomic_pfn_t() may fail when the pmem driver
unbinds from a device, but the __pfn_t is still valid. Obviously, we
can only support atomic kmap(s) with this property, and it would be
nice to fault if someone continued to use the __pfn_t after the
hosting device was disabled. To be clear, DAX has this same problem
today. Nothing stops whoever called ->direct_access() from continuing
to use the pfn after the backing device has been disabled.
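A rough userspace sketch of that property, under stated assumptions: "sketch_pfn_t", "struct sketch_pmem_dev", and "sketch_kmap_pfn()" are hypothetical names invented for illustration and are not the patchset's API. The pfn value stays a perfectly good number after the device goes away; only the attempt to map it fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for __pfn_t: just a wrapped page frame number. */
typedef struct { uint64_t val; } sketch_pfn_t;

/* Hypothetical stand-in for a pmem device and its driver binding state. */
struct sketch_pmem_dev {
	bool enabled;		/* false once the driver has unbound */
	unsigned char *base;	/* start of the device's memory range */
	uint64_t first_pfn;	/* pfn of the first page in the range */
	uint64_t nr_pages;	/* number of pages in the range */
};

/*
 * Loosely modeled on the kmap_atomic_pfn_t() idea: return a usable address
 * for pfn, or NULL when the device is disabled or the pfn is out of range.
 */
void *sketch_kmap_pfn(const struct sketch_pmem_dev *dev, sketch_pfn_t pfn)
{
	assert(dev != NULL);
	if (!dev->enabled)
		return NULL;
	if (pfn.val < dev->first_pfn ||
	    pfn.val - dev->first_pfn >= dev->nr_pages)
		return NULL;
	return dev->base + (pfn.val - dev->first_pfn) * SKETCH_PAGE_SIZE;
}
```

A caller that cached the address instead of re-mapping would not notice the disable at all, which is the DAX problem described above.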
I'd *assume* that most users of the dynamic "struct page"
have very clear lifetime rules. Those things would presumably normally
get looked-up by some extended version of "get_user_pages()", and
there's a clear use of the result, with no longer lifetime. Also, you
do need to have some higher-level locking when you do this, to make
sure that the persistent pages don't magically get re-assigned. We're
presumably talking about having a filesystem in that persistent
memory, so we cannot be doing IO to the pages (from some other source
- whether RDMA or some special zero-copy model) while the underlying
filesystem is reassigning the storage because somebody deleted the file.
IOW, there had better be other external rules about when - and how
long - you can use a particular persistent page. No? So the whole
"when/how to allocate the temporary 'struct page'" is just another
detail in that whole thing.
And yes, some uses may not ever actually see that. If the whole of
persistent memory is just assigned to a database or something, and the
DB just wants to do a "flush this range of persistent memory to
long-term disk storage", then there may not be much of a "lifetime"
issue for the persistent memory. But even then you're going to have IO
completion callbacks etc to let the DB know that it has hit the disk,
What is the primary thing that is driving this need? Do we have a very
concrete example?
My pet concrete example is covered by __pfn_t. Referencing persistent
memory in an md/dm hierarchical storage configuration. Setting aside
the thrash to get existing block users to do "bvec_set_page(page)"
instead of "bvec->page = page", the onus is on that md/dm
implementation and backing storage device driver to operate on
__pfn_t. That use case is simple because there is no use of page
locking or refcounting in that path, just dma_map_page() and
kmap_atomic(). The more difficult use case is precisely what Al
picked up on, O_DIRECT and RDMA. This patchset does nothing to
address those use cases outside of not needing a struct page when they
eventually craft a bio.
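To illustrate why the accessor conversion matters, here is a userspace sketch, not the patchset's code. It assumes a hypothetical "struct sketch_bio_vec" that stores a wrapped pfn instead of a page pointer, plus a toy "sketch_mem_map" array standing in for the kernel's page array. Callers that go through the setter keep working when the field changes type, and persistent memory with no struct page can be attached by pfn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __pfn_t. */
typedef struct { uint64_t val; } sketch_pfn_t;

/* Toy page descriptor and page array; index in the array is the pfn. */
struct sketch_page { unsigned long flags; };

#define SKETCH_NR_PAGES 8
struct sketch_page sketch_mem_map[SKETCH_NR_PAGES];

/* Caller must pass a pointer into sketch_mem_map. */
sketch_pfn_t sketch_page_to_pfn(const struct sketch_page *page)
{
	return (sketch_pfn_t){ .val = (uint64_t)(page - sketch_mem_map) };
}

/* Hypothetical bio_vec that holds a pfn rather than a struct page pointer. */
struct sketch_bio_vec {
	sketch_pfn_t bv_pfn;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Existing users call this instead of assigning a page field directly. */
void sketch_bvec_set_page(struct sketch_bio_vec *bvec,
			  const struct sketch_page *page)
{
	assert(bvec != NULL && page != NULL);
	bvec->bv_pfn = sketch_page_to_pfn(page);
}

/* Memory without a struct page is attached by pfn alone. */
void sketch_bvec_set_pfn(struct sketch_bio_vec *bvec, sketch_pfn_t pfn)
{
	assert(bvec != NULL);
	bvec->bv_pfn = pfn;
}
```

Nothing here takes a page lock or reference, which matches the simple md/dm path described above and is exactly what the O_DIRECT and RDMA cases would still need.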
I know Matthew Wilcox has explored the idea of "get_user_sg()" and
letting the scatterlist hold the reference count and locks, but I'll let him
speak to that.
I still see __pfn_t as generally useful for the simple in-kernel
stacked-block-i/o use case.