On Wed, Sep 02, 2015 at 01:04:01PM -0600, Ross Zwisler wrote:
On Tue, Sep 01, 2015 at 03:18:41PM +0300, Boaz Harrosh wrote:
> So the approach we took was a bit different, to solve exactly these
> problems and also to avoid over-flushing. Here is what we did.
>
> * At vm_operations_struct we also override the .close vector (say, call it
> dax_vm_close)
>
> * At dax_vm_close() on writable files call ->fsync(,vma->vm_start,
> vma->vm_end,)
> (We have an inode flag if the file was actually dirtied, but even if not, that
> will not be that bad: a file that was opened for write, mmapped, but never
> actually modified. There are not a lot of these, and the do-nothing
> cl_flushing is very fast)
>
> * At ->fsync() do the actual cl_flush for all cases, but only if the file is
> mapped, i.e. bail out early with:
>	if (mapping_mapped(inode->i_mapping) == 0)
>		return 0;
>
> This is because data written not through mmap is already persistent and we
> do not need the cl_flushing
>
> Apps expect all these to work:
> 1. open mmap m-write msync ... close
> 2. open mmap m-write fsync ... close
> 3. open mmap m-write unmap ... fsync close
>
> 4. open mmap m-write sync ...
So basically you made close have an implicit fsync? What about the flow that
looks like this:
5. open mmap close m-write
This guy definitely needs an msync/fsync at the end to make sure that the
m-write becomes durable.
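
For illustration, flow 5 from user space might look like the sketch below. The function name write_after_close() and the caller-supplied path are hypothetical, and this shows only the generic POSIX calls, not anything DAX-specific. The point is that the mapping outlives close(2), so the store happens after the descriptor is gone and only the explicit msync() covers it.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Flow 5: open, mmap, close, m-write, then msync.
 * Hypothetical helper for illustration; returns 0 on success, -1 on error.
 */
int write_after_close(const char *path, const char *msg)
{
	size_t len = strlen(msg) + 1;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	close(fd);		/* the mapping stays valid after close(2) */
	if (p == MAP_FAILED)
		return -1;

	memcpy(p, msg, len);	/* the m-write happens after the close */

	/* Without this, nothing in the flow asks for durability. */
	int rc = msync(p, len, MS_SYNC);

	if (munmap(p, len) < 0)
		rc = -1;
	return rc;
}
```

A caller would invoke it as write_after_close("/some/file", "payload") and treat a non-zero return as "data may not be durable".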

We can sync on pte_dirty() during zap_page_range(): it's practically free,
since we walk the page tables anyway.
With this approach it probably makes sense to go back to a page walk on the
msync() side too, to be consistent wrt the meaning of pte_dirty().

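
To make the idea concrete, here is a user-space toy model, not kernel code. The struct toy_pte type, the TOY_PTE_* flags and both functions are made up for this sketch, and the "flush" is just a counter standing in for the real cache-line flush. It only shows the shape of the argument: both the unmap path and the msync path walk the entries anyway, so checking the dirty bit along the way costs almost nothing.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PTE_PRESENT	0x1u	/* hypothetical flags for this model only */
#define TOY_PTE_DIRTY	0x2u

struct toy_pte {
	uint32_t flags;
};

/* Stand-in for flushing one page worth of cache lines; it only counts. */
static void toy_flush_page(size_t idx, size_t *flushed)
{
	(void)idx;
	(*flushed)++;
}

/*
 * Unmap-side walk over [start, end): flush dirty entries, then tear every
 * entry down. Returns the number of pages flushed.
 */
size_t toy_zap_range(struct toy_pte *ptes, size_t start, size_t end)
{
	size_t flushed = 0;

	for (size_t i = start; i < end; i++) {
		if (ptes[i].flags & TOY_PTE_DIRTY)
			toy_flush_page(i, &flushed);
		ptes[i].flags = 0;	/* entry goes away either way */
	}
	return flushed;
}

/*
 * msync-side walk over [start, end): flush dirty entries and clear only
 * the dirty bit, so the same bit means the same thing on both paths.
 */
size_t toy_msync_range(struct toy_pte *ptes, size_t start, size_t end)
{
	size_t flushed = 0;

	for (size_t i = start; i < end; i++) {
		if (ptes[i].flags & TOY_PTE_DIRTY) {
			toy_flush_page(i, &flushed);
			ptes[i].flags &= ~TOY_PTE_DIRTY;
		}
	}
	return flushed;
}
```

In this model a page that msync already cleaned is not flushed again at unmap time, which is the consistency being asked for.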
Also, the CLOSE(2) man page specifically says that a flush does not occur at
close:

    A successful close does not guarantee that the data has been
    successfully saved to disk, as the kernel defers writes. It
    is not common for a filesystem to flush the buffers when the stream is
    closed. If you need to be sure that the data is physically stored,
    use fsync(2). (It will depend on the disk hardware at this point.)

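
In application terms, the man page wording implies a pattern roughly like the following sketch. The helper name save_durably() is hypothetical, and EINTR handling is left out to keep it short. Durability is requested with fsync(2) before close(2), and both return values are checked.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a string to a file and ask for it to reach stable storage before
 * closing. Hypothetical helper; returns 0 on success, -1 on error.
 */
int save_durably(const char *path, const char *data)
{
	size_t len = strlen(data);
	size_t off = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;

	/* write(2) may be short, so loop until everything is handed over. */
	while (off < len) {
		ssize_t n = write(fd, data + off, len - off);

		if (n < 0) {
			close(fd);
			return -1;
		}
		off += (size_t)n;
	}

	/* close(2) alone promises nothing about durability. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```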
I don't think that adding an implicit fsync to close is the right solution -
we just need to get msync and fsync correctly working.

It doesn't mean we can't sync if we can do so without noticeable performance
degradation.

--
Kirill A. Shutemov