Dave Chinner <david(a)fromorbit.com> writes:
>> Another potential issue is that MAP_PMEM_AWARE is not enough on its
>> own. If the filesystem or inode does not support DAX the application
>> needs to assume page cache semantics. At a minimum MAP_PMEM_AWARE
>> requests would need to fail if DAX is not available.
>
> They will always still need to call msync()/fsync() to guarantee
> data integrity, because the filesystem metadata that indexes the
> data still needs to be committed before data integrity can be
> guaranteed. i.e. MAP_PMEM_AWARE by itself is not sufficient for data
> integrity, and so the app will have to be written like any other app
> that uses page cache based mmap().
>
> Indeed, the application cannot even assume that a fully allocated
> file does not require msync/fsync because the filesystem may be
> doing things like dedupe, defrag, copy on write, etc behind the back
> of the application and so file metadata changes may still be in
> volatile RAM even though the application has flushed its data.
Once you hand out a persistent memory mapping, you sure as heck can't
switch blocks around behind the back of the application.
But even if we're not dealing with persistent memory, you seem to imply
that applications need to fsync just in case the file system did
something behind its back. In other words, an application opening a
fully allocated file and using fdatasync will also need to call fsync,
just in case. Is that really what you're suggesting?
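
To make that case concrete, here is a minimal sketch, assuming Linux and plain POSIX I/O. The helper names create_fully_allocated() and overwrite_in_place() are made up for illustration. It only spells out the sequence of calls I am asking about; it does not settle whether the fdatasync() is enough.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write zeroes over the whole file so every block
 * is allocated, then fsync() so data and metadata are both committed. */
int create_fully_allocated(const char *path, size_t size)
{
    char zeros[4096] = {0};
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    for (size_t done = 0; done < size; ) {
        size_t chunk = size - done < sizeof(zeros) ? size - done : sizeof(zeros);
        ssize_t n = write(fd, zeros, chunk);
        if (n <= 0) {           /* treat short-circuit failures as fatal in this sketch */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: overwrite bytes inside the already allocated
 * file and ask only for data integrity with fdatasync(). The file size
 * does not change and no new blocks are requested by the application. */
int overwrite_in_place(const char *path, off_t off, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    if (pwrite(fd, buf, len, off) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The question is whether the second helper would also need an fsync() after the fdatasync().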
> Applications have no idea what the underlying filesystem and storage
> is doing and so they cannot assume that complete data integrity is
> provided by userspace driven CPU cache flush instructions on their own.
This is surprising to me, and goes completely against the proposed
programming model. In fact, this is a very basic tenet of the operation
of the nvml libraries on pmem.io.
That aside, let me see if I understand you correctly.
An application creates a file and writes to every single block in the
thing, sync's it, closes it. It then opens it back up, calls mmap with
this new MAP_DAX flag or on a file system mounted with -o dax, and
proceeds to access the file using loads and stores. It persists its
data by using non-temporal stores, flushing and fencing cpu
If I understand you correctly, you're saying that that application is
not written correctly, because it needs to call fsync to persist
metadata (that it presumably did not modify). Is that right?
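
For reference, a minimal sketch of the mapping step in that scenario, assuming Linux and an ordinary MAP_SHARED mapping. The function name store_through_mapping() is hypothetical, and the proposed MAP_DAX / MAP_PMEM_AWARE flags are deliberately not used since they are only proposals at this point. The sketch takes the conservative path and calls msync(); the application I describe above would flush and fence from user space at that point instead, and whether that substitution is valid is exactly what I am asking.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an existing, fully allocated file, store
 * into it with ordinary CPU stores, and make the stores durable with
 * msync(MS_SYNC). Returns 0 on success, -1 on any failure. */
int store_through_mapping(const char *path, size_t off, const void *src, size_t len)
{
    struct stat st;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Refuse stores that would fall outside the existing file. */
    if (fstat(fd, &st) != 0 ||
        off > (size_t)st.st_size || len > (size_t)st.st_size - off) {
        close(fd);
        return -1;
    }

    size_t maplen = (size_t)st.st_size;
    char *base = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(base + off, src, len);   /* plain loads and stores, no write() */

    /* Conservative path: let the kernel write back the data. A pmem-aware
     * application would flush the touched cache lines and fence here. */
    int rc = msync(base, maplen, MS_SYNC);

    munmap(base, maplen);
    close(fd);
    return rc;
}
```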