On Wed, Jul 26, 2017 at 4:46 PM, Rik van Riel <riel(a)redhat.com> wrote:
> On Wed, 2017-07-26 at 14:40 -0700, Dan Williams wrote:
> > On Wed, Jul 26, 2017 at 2:27 PM, Rik van Riel <riel(a)redhat.com>
> > wrote:
> > > On Wed, 2017-07-26 at 09:47 -0400, Pankaj Gupta wrote:
> > > >
> > > > Just want to summarize here (high level):
> > > >
> > > > This will require implementing a new 'virtio-pmem' device which
> > > > presents a DAX address range (like pmem) to the guest with
> > > > read/write (direct access) & device flush functionality. Also,
> > > > QEMU should implement corresponding support for flush using
> > > > virtio.
> > > >
> > >
> > > Alternatively, we could keep the existing pmem code, with a
> > > flush-only block device on the side that is somehow associated
> > > with the pmem device.
> > >
> > > I wonder which alternative leads to the least code duplication,
> > > and the least maintenance hassle going forward.
> >
> > I'd much prefer to have another driver. I.e. a driver that
> > refactors out some common pmem details into a shared object and
> > can attach to ND_DEVICE_NAMESPACE_{IO,PMEM}. A control device on
> > the side seems like a recipe for confusion.
> At that point, would it make sense to expose these special
> virtio-pmem areas to the guest in a slightly different way,
> so the regions that need virtio flushing are not bound by
> the regular driver, and the regular driver can continue to
> work for memory regions that are backed by actual pmem in
> the host?

Hmm, yes, that could be feasible, especially if it uses the ACPI NFIT
mechanism. It would basically involve defining a new SPA (System
Physical Address) range GUID type, and then teaching libnvdimm to
treat that as a new pmem device type.
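
Something along these lines on the nfit side (a rough, untested
sketch; the GUID value and the NFIT_SPA_VIRTIO name are placeholders
I'm just making up here, mirroring how the existing SPA range GUIDs
are defined):

/* drivers/acpi/nfit/nfit.h: new SPA range GUID next to the existing ones */
#define UUID_VIRTIO_PMEM "00000000-0000-0000-0000-000000000000"	/* TBD */

enum nfit_uuids {
	/* ...existing entries (NFIT_SPA_VOLATILE, NFIT_SPA_PM, ...)... */
	NFIT_SPA_VIRTIO,
	NFIT_UUID_MAX,
};

The driver would register UUID_VIRTIO_PMEM the same way it registers
UUID_PERSISTENT_MEMORY, so nfit_spa_type() can classify the new
ranges, and the region setup path could then treat NFIT_SPA_VIRTIO
like NFIT_SPA_PM apart from the flag mentioned below.
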
See usage of UUID_PERSISTENT_MEMORY in drivers/acpi/nfit/ and the
eventual region description sent to nvdimm_pmem_region_create(). We
would then need to plumb a new flag so that nd_region_to_nstype() in
libnvdimm returns a different namespace type number for this virtio
use case, but otherwise the rest of libnvdimm should treat the region
as pmem.
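
On the libnvdimm side it might look roughly like this (again untested;
ND_REGION_VIRTIO_FLUSH and ND_DEVICE_NAMESPACE_VPMEM are names I'm
just making up here):

/* drivers/acpi/nfit/core.c: when registering a region for the new SPA type */
	if (nfit_spa_type(spa) == NFIT_SPA_VIRTIO)
		set_bit(ND_REGION_VIRTIO_FLUSH, &ndr_desc->flags);
	/* ...then hand ndr_desc to nvdimm_pmem_region_create() as today... */

/* drivers/nvdimm/region_devs.c: at the top of nd_region_to_nstype() */
	if (test_bit(ND_REGION_VIRTIO_FLUSH, &nd_region->flags))
		return ND_DEVICE_NAMESPACE_VPMEM;
	/* ...existing PMEM vs IO selection unchanged... */

The new namespace type number would then bind to the new virtio-aware
driver, while ND_DEVICE_NAMESPACE_{IO,PMEM} keep binding to the stock
pmem driver.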