On Thu, May 7, 2015 at 8:00 AM, Linus Torvalds
On Wed, May 6, 2015 at 7:36 PM, Dan Williams
> My pet concrete example is covered by __pfn_t. Referencing persistent
> memory in an md/dm hierarchical storage configuration. Setting aside
> the thrash to get existing block users to do "bvec_set_page(page)"
> instead of "bvec->page = page" the onus is on that md/dm
> implementation and backing storage device driver to operate on
> __pfn_t. That use case is simple because there is no use of page
> locking or refcounting in that path, just dma_map_page() and
So clarify for me: are you trying to make the IO stack in general be
able to use the persistent memory as a source (or destination) for IO
to _other_ devices, or are you talking about just internally shuffling
things around for something like RAID on top of persistent memory?
Because I think those are two very different things.
Yes, they are, and I am referring to the former, persistent memory as
a source/destination to other devices.
For example, one of the things I worry about is for people doing IO
from persistent memory directly to some "slow stable storage" (aka
disk). That was what I thought you were aiming for: infrastructure so
that you can make a bio for a *disk* device contain a page list that
is the persistent memory.
And I think that is a very dangerous operation to do, because the
persistent memory itself is going to have some filesystem on it, so
anything that looks up the persistent memory pages is *not* going to
have a stable pfn: the pfn will point to a fixed part of the
persistent memory, but the file that was there may be deleted and the
memory reassigned to something else.
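
The hazard described above can be sketched with a small userspace model in C. This is only an illustration under stated assumptions, not kernel code: `toy_pmem`, `toy_alloc()`, `toy_unlink()`, and `toy_io_to_pfn()` are hypothetical names, and the "lowest free pfn first" allocation policy is assumed purely so that the reuse is easy to see.

```c
#include <stdio.h>

#define TOY_NPFNS 4
#define TOY_FREE  (-1)

/* Hypothetical toy model of a pmem region; not a kernel structure. */
struct toy_pmem {
	int  owner[TOY_NPFNS];      /* file id that owns each pfn, or TOY_FREE */
	char data[TOY_NPFNS][16];   /* contents stored at each pfn */
};

void toy_pmem_init(struct toy_pmem *pm)
{
	for (int i = 0; i < TOY_NPFNS; i++) {
		pm->owner[i] = TOY_FREE;
		pm->data[i][0] = '\0';
	}
}

/* Give the lowest free pfn to file_id; returns the pfn, or -1 if full. */
int toy_alloc(struct toy_pmem *pm, int file_id)
{
	for (int i = 0; i < TOY_NPFNS; i++) {
		if (pm->owner[i] == TOY_FREE) {
			pm->owner[i] = file_id;
			return i;
		}
	}
	return -1;
}

/* Models unlink()/truncate(): every pfn of file_id becomes reusable. */
void toy_unlink(struct toy_pmem *pm, int file_id)
{
	for (int i = 0; i < TOY_NPFNS; i++)
		if (pm->owner[i] == file_id)
			pm->owner[i] = TOY_FREE;
}

/* IO aimed at a bare pfn: nothing here checks who owns the pfn now. */
void toy_io_to_pfn(struct toy_pmem *pm, int pfn, const char *buf)
{
	snprintf(pm->data[pfn], sizeof(pm->data[pfn]), "%s", buf);
}
```

If a caller records the pfn handed to file 1, file 1 is then unlinked, and `toy_alloc()` gives the same pfn to file 2, a later `toy_io_to_pfn()` on the recorded pfn lands in file 2's data. Nothing in the bare pfn carries the serialization or indirection needed to notice the reassignment.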
Indeed, truncate() in the absence of struct page has been a major
hurdle for persistent memory enabling. But it does not impact this
specific md/dm use case. md/dm will have taken an exclusive claim on
an entire pmem block device (or partition), so there will be no
competing with a filesystem.
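
As a rough illustration of why an exclusive claim rules out a competing filesystem, here is a minimal userspace sketch in C. It is a toy model under stated assumptions, not the block layer's implementation: `toy_bdev`, `toy_claim()`, and `toy_release()` are hypothetical names that mimic only the idea of a single holder per device, with a second claimant getting -EBUSY.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device; not the kernel's struct. */
struct toy_bdev {
	const char *name;
	const void *holder;   /* NULL when nobody holds an exclusive claim */
};

/*
 * Take an exclusive claim on bdev for holder.
 * Returns 0 on success, or -EBUSY if a different holder already owns it.
 */
int toy_claim(struct toy_bdev *bdev, const void *holder)
{
	if (bdev->holder != NULL && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	return 0;
}

/* Drop the claim, but only if holder is the current owner. */
void toy_release(struct toy_bdev *bdev, const void *holder)
{
	if (bdev->holder == holder)
		bdev->holder = NULL;
}
```

In this model, once an md/dm-like holder has claimed the device, a filesystem-like holder calling `toy_claim()` gets -EBUSY until the first holder releases it, so nothing else can reassign the underlying pmem while md/dm is issuing IO against it.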
That's the kind of thing that "struct page" helps with for normal IO
devices. It's both a source of serialization and indirection, so that
when somebody does a "truncate()" on a file, we don't end up doing IO
to random stale locations on the disk that got reassigned to another
So "struct page" is very fundamental. It's *not* just a "this is the
physical source/drain of the data you are doing IO on".
So if you are looking at some kind of "zero-copy IO", where you can do
IO from a filesystem on persistent storage to *another* filesystem on
(say, a big rotational disk used for long-term storage) by just doing
a bio that targets the disk, but has the persistent memory as the
source memory, I really want to understand how you are going to
So *that* is what I meant by "What is the primary thing that is
driving this need? Do we have a very concrete example?"
I absolutely do *not* want to teach the bio subsystem to just
randomly be able to take the source/destination of the IO as being
some random pfn without knowing what the actual uses are and how these
IO's are generated in the first place.
blkdev_get(FMODE_EXCL) is the protection in this case.
I was assuming that you wanted to do something where you mmap() the
persistent memory, and then write it out to another device (possibly
using aio_write()). But that really does require some kind of
serialization at a higher level, because you can't just look up the
pfn's in the page table and assume they are stable: they are *not*
We want to get there eventually, but this patchset does not address that case.