On Thu, Apr 5, 2018 at 12:23 AM, Christoph Hellwig <hch(a)infradead.org> wrote:
> On Wed, Apr 04, 2018 at 05:03:07PM -0700, Dan Williams wrote:
>> "Currently, fsdax applications can assume that if they call fsync or
>> msync on a dax-mapped file, any pending writes that have been
>> flushed out of the CPU cache will also be flushed to the lowest
>> possible persistence / failure domain available on the platform. In
>> typical scenarios the platform ADR capability handles marshaling
>> writes that have reached global visibility to persistence. In
>> exceptional cases where ADR fails to complete its operation, software
>> can detect that scenario via the "last shutdown" health status check
>> and otherwise mitigate the effects of an ADR failure by protecting
>> metadata with the WPQ flush. In other words, enabling device-dax to
>> optionally trigger a WPQ flush on msync() allows applications to have
>> a common implementation for persistence domain handling across fs-dax
>> and device-dax."
> This sounds totally bogus. Either ADR is reliable and we can rely on
> it all the time (as we assume for, say, capacitors on SSDs with
> non-volatile write caches), or we can't rely on it and the
> write-through store model is a blatant lie. In other words, msync/fsync
> is what we use for normal persistence, not for working around broken
> hardware.
Yes, I think it is unfortunate that the failure mode is exposed to
software at all. The problem is that ADR is a platform feature that
depends on power supply requirements external to the NVDIMM device. An
SSD is different: it is a self-contained system that can arrange for
the whole device to fail if the internal energy source fails, and
otherwise hide this detail from software. My personal take: a system
designer that can specify and qualify an entire stack of components
can certainly opt out of advertising the flush capability to the OS
because, like the SSD vendor, they control the integrated solution. A
platform vendor that allows off-the-shelf power supplies would, in my
opinion, be remiss not to give the OS the option to compensate for the
unknown quality of some random power supply. It then follows that if
the OS has the ability to mitigate ADR failure, it should do so through
a common interface between fsdax and devdax.
> In many ways this sounds like a plot to make normal programs that
> don't listen to the pmem.io hype look bad in benchmarks.
No, just no.
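For readers less familiar with the fsdax contract quoted in the thread, the application side is simple: store through a shared mapping, then call msync() and treat its successful return as the persistence point. A minimal C sketch of that pattern follows. It is not code from the thread. `persist_bytes` is a hypothetical helper, and the sketch assumes Linux with `path` on a filesystem mounted with DAX. On an ordinary filesystem the same calls work, but the page cache backs the mapping. The thread is arguing about whether msync() should additionally trigger a WPQ flush, including on device-dax.

```c
/* Sketch only: durably update a file through a shared mapping.
 * Assumption: on a DAX-mounted filesystem the mapping targets persistent
 * memory directly; elsewhere the same calls go through the page cache. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* persist_bytes: hypothetical helper for illustration.
 * Sets the file at `path` to exactly `len` bytes, maps it, copies `data`
 * in with ordinary CPU stores, then calls msync(MS_SYNC) so the kernel
 * writes the update back before returning. Returns 0 on success, -1 on
 * any failure. */
int persist_bytes(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Make the file exactly `len` bytes so the whole mapping is backed. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return -1;

    memcpy(addr, data, len); /* plain stores into the mapped file */

    /* The application's persistence point. Per the quoted changelog, on
     * fsdax this is where pending writes are pushed to the platform's
     * lowest persistence domain. */
    int rc = msync(addr, len, MS_SYNC);

    munmap(addr, len);
    return rc == 0 ? 0 : -1;
}
```

Calling msync() on the exact range written, rather than relying on munmap() or process exit, keeps the durability point explicit. That explicit call is the hook where the thread debates adding the WPQ flush.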