On 3/24/21 8:37 AM, David Gibson wrote:
> On Tue, Mar 23, 2021 at 09:47:38AM -0400, Shivaprasad G Bhat wrote:
> > This patch adds support for the SCM flush hcall for nvdimm devices,
> > to be made available to the guest by the next patch.
> > The hcall semantics are such that when the operation is expected to
> > take longer, the flush returns H_BUSY along with a continue_token,
> > and the hcall is to be called again with that continue_token to get
> > the status. So, all fresh requests are put into a 'pending' list and
> > a flush worker is submitted to the thread pool. The thread pool
> > completion callbacks move the requests to a 'completed' list, which
> > is cleaned up after the status has been reported to the guest in
> > subsequent hcalls.
> > These semantics make it necessary to preserve the continue_tokens
> > and their return status even across migrations. So, the pre_save
> > handler for the device waits for the flush worker to complete and
> > collects all the hcall states from the 'completed' list. The
> > necessary nvdimm flush specific vmstate structures are added to the
> > spapr machine vmstate.
> > Signed-off-by: Shivaprasad G Bhat <sbhat(a)linux.ibm.com>
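Roughly, the semantics described in the quoted commit message amount to
something like the self-contained sketch below. It is only an illustration,
not the actual QEMU code: 'FlushState', 'h_scm_flush', 'flush_worker' and the
H_* values are made-up names, and plain pthreads/usleep stand in for QEMU's
thread pool and the real flush of the backing file.

/*
 * Minimal self-contained sketch (NOT the actual QEMU implementation) of
 * the bookkeeping described above: a fresh flush goes on a 'pending'
 * list and returns H_BUSY with a continue_token; a worker thread does
 * the flush and moves the entry to a 'completed' list; the guest polls
 * with the token until it gets the final status.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum { H_SUCCESS = 0, H_BUSY = 1, H_P2 = -55 };   /* illustrative values */

typedef struct FlushState {
    uint64_t token;
    int64_t ret;                       /* valid once completed */
    struct FlushState *next;
} FlushState;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static FlushState *pending, *completed;
static uint64_t next_token = 1;

/* Worker: do the (simulated) flush, move the entry pending -> completed. */
static void *flush_worker(void *opaque)
{
    FlushState *s = opaque;

    usleep(100 * 1000);                /* stand-in for the real fsync() */

    pthread_mutex_lock(&lock);
    for (FlushState **p = &pending; *p; p = &(*p)->next) {
        if (*p == s) {
            *p = s->next;
            break;
        }
    }
    s->ret = H_SUCCESS;
    s->next = completed;
    completed = s;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * token == 0: start a new flush and hand back a continue_token.
 * Otherwise: H_BUSY while still pending, final status once completed
 * (the entry is dropped as soon as it has been reported).
 */
static int64_t h_scm_flush(uint64_t token, uint64_t *token_out)
{
    pthread_mutex_lock(&lock);
    if (token == 0) {
        FlushState *s = calloc(1, sizeof(*s));
        pthread_t tid;

        s->token = *token_out = next_token++;
        s->next = pending;
        pending = s;
        pthread_mutex_unlock(&lock);
        pthread_create(&tid, NULL, flush_worker, s);
        pthread_detach(tid);
        return H_BUSY;
    }
    for (FlushState *s = pending; s; s = s->next) {
        if (s->token == token) {
            pthread_mutex_unlock(&lock);
            return H_BUSY;             /* worker not done yet, poll again */
        }
    }
    for (FlushState **p = &completed; *p; p = &(*p)->next) {
        if ((*p)->token == token) {
            FlushState *s = *p;
            int64_t ret = s->ret;
            *p = s->next;              /* reported once, then cleaned up */
            free(s);
            pthread_mutex_unlock(&lock);
            return ret;
        }
    }
    pthread_mutex_unlock(&lock);
    return H_P2;                       /* token we never handed out */
}

int main(void)
{
    uint64_t token = 0;
    int64_t ret = h_scm_flush(0, &token);

    while (ret == H_BUSY) {            /* the guest retries with the token */
        usleep(10 * 1000);
        ret = h_scm_flush(token, &token);
    }
    printf("flush %llu -> %lld\n", (unsigned long long)token, (long long)ret);
    return 0;
}

The two lists exist because a completed status has to survive until the guest
has actually polled for it, which is also why the continue_tokens and their
statuses have to be preserved across migration.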
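Building on the same hypothetical lists and lock, the pre_save step described
above, waiting for outstanding flush workers and collecting the completed
(continue_token, status) pairs for the migration stream, might look roughly
like this (again just a sketch; 'nvdimm_flush_pre_save' and 'FlushHcallState'
are made-up names, and the real state goes into the spapr machine vmstate):

/* Drain outstanding flushes and snapshot the completed hcall states so
 * the destination can answer later polls for those continue_tokens.
 * Purely illustrative. */
typedef struct FlushHcallState {
    uint64_t token;
    int64_t ret;
} FlushHcallState;

static int nvdimm_flush_pre_save(FlushHcallState **out, size_t *n_out)
{
    size_t n = 0, i = 0;
    FlushHcallState *snap;

    pthread_mutex_lock(&lock);
    while (pending) {                  /* wait for every worker to finish */
        pthread_mutex_unlock(&lock);
        usleep(1000);                  /* real code would block properly */
        pthread_mutex_lock(&lock);
    }
    for (FlushState *s = completed; s; s = s->next) {
        n++;
    }
    snap = calloc(n ? n : 1, sizeof(*snap));
    for (FlushState *s = completed; s; s = s->next, i++) {
        snap[i].token = s->token;      /* preserved across the migration */
        snap[i].ret = s->ret;
    }
    pthread_mutex_unlock(&lock);

    *out = snap;
    *n_out = n;
    return 0;
}

Draining in pre_save keeps the pending list empty at the point of migration,
so only the already-completed (token, status) pairs need to be carried over
to the destination.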
> An overall question: surely the same issue must arise on x86 with
> file-backed NVDIMMs. How do they handle this case?
On x86 we have different ways an nvdimm can be discovered: ACPI NFIT, the
e820 map and virtio_pmem. Among these, virtio_pmem always operates with
synchronous dax disabled, and neither ACPI nor e820 has the ability
to differentiate support for synchronous dax.
With that I would expect users to use virtio_pmem when using file