On Sun, Feb 21, 2016 at 02:03:43PM -0800, Dan Williams wrote:
> On Sun, Feb 21, 2016 at 1:23 PM, Boaz Harrosh
> <boaz(a)plexistor.com> wrote:
>> On 02/21/2016 10:57 PM, Dan Williams wrote:
>>> On Sun, Feb 21, 2016 at 12:24 PM, Boaz Harrosh <boaz(a)plexistor.com> wrote:
>>>> On 02/21/2016 09:51 PM, Dan Williams wrote:
>> Sure, please have a look. What happens is that the legacy app
>> adds the page to the radix tree, and come the fsync it will be
>> flushed. That holds even if a "new-type" app, which does not add
>> the page to the radix tree, faults on the same page before or after.
>> So yes, all pages faulted by legacy apps will be flushed.
>>
>> I have manually tested all this and it seems to work. Can you see
>> a theoretical scenario where it would not?
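The tagging scheme Boaz describes can be sketched as a toy model. This is a hypothetical illustration under stated assumptions, not the kernel implementation or the actual patch. A boolean array stands in for the radix tree's dirty tags, and "flushed" stands in for writing back a page's cachelines. struct toy_mapping, enum fault_kind, toy_write_fault() and toy_fsync() are invented names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: one dirty tag per page index
 * stands in for the radix tree entry a legacy write fault would insert. */
#define NPAGES 8

struct toy_mapping {
    bool dirty_tag[NPAGES]; /* set by legacy faults, cleared by fsync */
    bool flushed[NPAGES];   /* page written back by a toy fsync */
};

enum fault_kind { FAULT_LEGACY, FAULT_PMEM_AWARE };

/* A write fault on page idx. Only legacy faults tag the page; a
 * pmem-aware fault leaves the tree alone because that application
 * has promised to flush its own cachelines. */
static void toy_write_fault(struct toy_mapping *m, size_t idx,
                            enum fault_kind kind)
{
    if (idx >= NPAGES)
        return;
    if (kind == FAULT_LEGACY)
        m->dirty_tag[idx] = true;
}

/* Toy fsync: flush every tagged page, clear its tag, and return the
 * number of pages flushed. Untagged pages are not touched. */
static size_t toy_fsync(struct toy_mapping *m)
{
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        if (m->dirty_tag[i]) {
            m->flushed[i] = true;
            m->dirty_tag[i] = false;
            n++;
        }
    }
    return n;
}
```

In this model, if a pmem-aware fault and a legacy fault both hit page 3, the legacy fault tags it and the next toy_fsync() flushes it. A page written only through a pmem-aware mapping (page 5) is never tagged, so flushing it is left entirely to the application.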
> I'm worried about the scenario where the pmem-aware app assumes that
> none of the cachelines in its mapping are dirty when it goes to issue
> pcommit. We'll have two applications with different perceptions of
> when writes are durable. Maybe it's not a problem in practice; at
> least current-generation x86 CPUs flush existing dirty cachelines when
> performing non-temporal stores. However, it bothers me that there are
> CPUs where a pmem-unaware app could prevent a pmem-aware app from
> making writes durable. It seems that if one app has established a
> MAP_PMEM_AWARE mapping, it needs guarantees that all apps participating
> in that shared mapping have the same awareness.
Which, in practice, cannot work. Think cp, rsync, or any other
program a user can run that can read the file the MAP_PMEM_AWARE
application is using.
> Another potential issue is that MAP_PMEM_AWARE is not enough on its
> own. If the filesystem or inode does not support DAX, the application
> needs to assume page cache semantics. At a minimum, MAP_PMEM_AWARE
> requests would need to fail if DAX is not available.
They will always still need to call msync()/fsync() to guarantee
data integrity, because the filesystem metadata that indexes the
data still needs to be committed before data integrity can be
guaranteed. That is, MAP_PMEM_AWARE by itself is not sufficient for
data integrity, and so the app will have to be written like any other
app that uses page-cache-based mmap().
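The pattern described here, storing through a shared mmap() and then calling msync()/fsync() before treating the data as durable, might look like the following minimal sketch. write_durably() and the test path are hypothetical. The calls themselves are standard POSIX, and the sketch deliberately makes no use of MAP_PMEM_AWARE, which is only a proposal in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to the start of path through a
 * shared mapping, then ask the kernel to make both data and metadata
 * durable. The file is resized to exactly len bytes.
 * Returns 0 on success, -1 on any failure. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Size the file so the mapping is fully backed (extends or shrinks). */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* mmap() fails with EINVAL for len == 0, which we report as -1. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, buf, len); /* plain stores; no user-space cache flushing */

    int rc = 0;
    /* Synchronously write back the mapped range. */
    if (msync(p, len, MS_SYNC) != 0)
        rc = -1;
    if (munmap(p, len) != 0)
        rc = -1;
    /* Commit the file's metadata too (e.g. the size set by ftruncate). */
    if (fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling both msync() and fsync() is deliberately conservative. msync(MS_SYNC) covers the mapped range, while fsync() also commits the file's metadata, which is exactly the part an application cannot see or flush with CPU cache instructions.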
Indeed, the application cannot even assume that a fully allocated
file does not require msync/fsync, because the filesystem may be
doing things like dedupe, defrag, copy on write, etc., behind the back
of the application, and so file metadata changes may still be in
volatile RAM even though the application has flushed its data.
Applications have no idea what the underlying filesystem and storage
is doing and so they cannot assume that complete data integrity is
provided by userspace driven CPU cache flush instructions on their
file data.
This "pmem aware applications only need to commit their data"
thinking is what got us into this mess in the first place. It's
wrong, and we need to stop trying to make pmem work this way because
it's a fundamentally broken concept.
Cheers,
Dave.
--
Dave Chinner
david(a)fromorbit.com