On Thu, Jan 18, 2018 at 8:53 AM, David Hildenbrand <david@redhat.com> wrote:
On 24.11.2017 13:40, Pankaj Gupta wrote:
>
> Hello,
>
> Thank you all for all the useful suggestions.
> I want to summarize the discussions so far in the
> thread. Please see below:
>
>>>>
>>>>> We can go with the "best" interface for what
>>>>> could be a relatively slow flush (fsync on a
>>>>> file on ssd/disk on the host), which requires
>>>>> that the flushing task wait on completion
>>>>> asynchronously.
>>>>
>>>>
>>>> I'd like to clarify the interface of "wait on completion
>>>> asynchronously" and KVM async page fault a bit more.
>>>>
>>>> The current design of async page fault only works on RAM rather
>>>> than MMIO, i.e., if the page fault is caused by accessing the
>>>> device memory of an emulated device, it needs to go to
>>>> userspace (QEMU), which emulates the operation in the vCPU's
>>>> thread.
>>>>
>>>> As I mentioned before, the memory region used for the vNVDIMM
>>>> flush interface should be MMIO and, considering its support on
>>>> other hypervisors, we had better push this async mechanism into
>>>> the flush interface design itself rather than depend on KVM
>>>> async page fault.
>>>
>>> I would expect this interface to be virtio-ring based to queue flush
>>> requests asynchronously to the host.
>>
>> Could we reuse the virtio-blk device, only with a different device id?
>
> As per previous discussions, there were suggestions on two main parts
> of the project:
>
> 1] Expose vNVDIMM memory range to KVM guest.
>
> - Add a flag in the ACPI NFIT table for this new memory type. Do we
> need NVDIMM spec changes for this?
>
> - Guest should be able to add this memory to its system memory map.
> The name of the added memory in '/proc/iomem' should be different
> (shared memory?) from persistent memory, as it does not satisfy the
> exact definition of persistent memory (it requires an explicit flush).
>
> - Guest should not allow 'device-dax' and other fancy features which
> are not virtualization friendly.
>
> 2] Flushing interface to persist guest changes.
>
> - As per the suggestion by ChristophH (CCed), we explored options
> other than virtio, like MMIO etc. Most of these options are not
> use-case friendly: we want to do an fsync on a file on ssd/disk on the
> host, and we cannot make the guest vCPUs wait for that time.
>
> - Adding a new driver (virtio-pmem) looks like duplicated work and is
> not needed, so we can go with the existing pmem driver and add a flush
> mechanism specific to this new memory type.
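
To make the flush part concrete, here is a rough sketch of how such a
flush could be queued to the host over a virtio ring (as suggested
earlier in the thread), so the guest task sleeps instead of blocking a
vCPU while the host does the fsync. The request/response layout, the
virtqueue, and all names below are assumptions for illustration only,
not a defined interface:

/* Illustrative only: queue a flush request to the host over virtio. */
#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/wait.h>
#include <linux/slab.h>

struct virtio_pmem_req {          /* hypothetical request layout */
        __le32 type;              /* 0 == flush */
};

struct virtio_pmem_resp {         /* hypothetical response layout */
        __le32 ret;               /* 0 on success */
};

struct virtio_pmem_request {
        struct virtio_pmem_req req;
        struct virtio_pmem_resp resp;
        wait_queue_head_t acked;  /* host completion is signalled here */
        bool done;
};

/*
 * Queue a flush request and sleep until the host acknowledges it, so
 * the vCPU is not blocked while the host fsync()s the backing file.
 * 'req_vq' would be a virtqueue set up at probe time; its callback is
 * expected to set 'done' and wake 'acked' when the host is finished.
 */
static int virtio_pmem_flush(struct virtqueue *req_vq)
{
        struct virtio_pmem_request *r;
        struct scatterlist out, in, *sgs[2];
        int err;

        r = kmalloc(sizeof(*r), GFP_KERNEL);
        if (!r)
                return -ENOMEM;

        r->req.type = cpu_to_le32(0);
        r->done = false;
        init_waitqueue_head(&r->acked);

        sg_init_one(&out, &r->req, sizeof(r->req));
        sgs[0] = &out;
        sg_init_one(&in, &r->resp, sizeof(r->resp));
        sgs[1] = &in;

        /* one out buffer (request), one in buffer (response) */
        err = virtqueue_add_sgs(req_vq, sgs, 1, 1, r, GFP_KERNEL);
        if (!err) {
                virtqueue_kick(req_vq);
                wait_event(r->acked, r->done);
                err = le32_to_cpu(r->resp.ret);
        }

        kfree(r);
        return err;
}

Locking around the virtqueue and error handling are omitted for
brevity; the completion callback on the used ring would set 'done' and
wake 'acked' once the host has acknowledged the flush.
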
I'd like to emphasize again that I would prefer a virtio-pmem-only
solution.
There are architectures out there (e.g. s390x) that don't support
NVDIMMs - there is no HW interface to expose anything of the sort.
However, with virtio-pmem we could also make this work on architectures
that don't have ACPI and friends.
ACPI and virtio-only can share the same pmem driver. There are two
parts to this: region discovery and setting up the pmem driver. For
discovery you can either have an NFIT-bus-defined range, or have a new
virtio-pmem bus define it. As far as the pmem driver itself is
concerned, it's agnostic to how the range is discovered.

In other words, pmem consumes 'regions' from libnvdimm, and a bus
provider like nfit, e820, or a new virtio mechanism produces 'regions'.
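
For illustration, a minimal sketch of what the "produce a region" side
could look like for a hypothetical virtio provider, following the
pattern of the existing e820/nfit providers. The function name, the
resource label, and how start/size are discovered from the host are
assumptions here, and the flush wiring is left out:

/*
 * Illustrative only: a hypothetical virtio-based region provider.  How
 * the start/size of the host-backed range is discovered is hand-waved;
 * the point is that a provider just hands libnvdimm a region, and the
 * existing pmem driver attaches to it as usual.
 */
#include <linux/libnvdimm.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/ioport.h>

static struct nvdimm_bus_descriptor nd_desc = {
        .provider_name = "virtio_pmem",   /* name is an assumption */
        .module = THIS_MODULE,
};

static int virtio_pmem_register_region(struct device *dev,
                                        resource_size_t start,
                                        resource_size_t size)
{
        struct nvdimm_bus *nvdimm_bus;
        struct nd_region_desc ndr_desc = { };
        struct resource res = {
                .start = start,
                .end   = start + size - 1,
                .name  = "virtio-pmem",   /* a distinct label, per the
                                             naming point above */
        };

        nvdimm_bus = nvdimm_bus_register(dev, &nd_desc);
        if (!nvdimm_bus)
                return -ENXIO;

        ndr_desc.res = &res;
        ndr_desc.numa_node = dev_to_node(dev);
        /* allow a struct page memmap for this range (DAX paths) */
        set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);

        /* produce the region; the pmem driver consumes it from here */
        if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc)) {
                nvdimm_bus_unregister(nvdimm_bus);
                return -ENXIO;
        }

        return 0;
}

Once the region is registered, the existing pmem driver binds to it the
same way it does for NFIT- or e820-described ranges; only the flush
path would need to know that this region is backed by a host file.
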