On Fri, Aug 26, 2016 at 03:29:34PM -0600, Ross Zwisler wrote:
These changes don't remove the things in XFS needed by the old
I/O and fault
paths (e.g. xfs_get_blocks_direct() is still there an unchanged). Is the
correct way forward to get buy-in from ext2/ext4 so that they also move to
supporting an iomap based I/O path (xfs_file_iomap_begin(),
xfs_iomap_write_direct(), etc?). That would allow us to have parallel I/O and
fault paths for a while, then remove the old buffer_head based versions when
the three supported filesystems have moved to iomap.
If ext2 and ext4 don't choose to move to iomap, though, I don't think we want
to have a separate I/O & fault path for iomap/XFS. That seems too painful,
and the old buffer_head version should continue to work, ugly as it may be.
We're going to move forward killing buffer_heads in XFS. I think ext4
would dramatically benefit from this a well, as would ext2 (although I
think all that DAX work in ext2 is a horrible idea to start with).
If I don't get buy-in for the iomap DAX work in the dax code we'll just
have to keep it separate. That buffer_head mess just isn't maintainable
the long run.
1) In your mail above you say "It also gets rid of the other
warts of the DAX
path due to pretending to be like direct I/O". I assume by this you mean
the code in dax_do_io() around DIO_LOCKING, inode_dio_begin(), etc?
Perhaps there are other things as well in XFS, but this is what I
the DAX code. If so, yep, this seems like a win. I don't understand how
DIO_LOCKING is relevant to the DAX I/O path, as we never mix buffered and
It's related to doing stupid copy and paste from direct I/O in the DAX
The comment in dax_do_io() for the inode_dio_begin() call says
prevents the I/O from races with truncate. Am I correct that we now get
this protection via the xfs_rw_ilock()/xfs_rw_iunlock() calls in
Yes, XFS always has a lock over reads that serializes with truncate.
Currenrly it's the XFS i_iolock, but I'll remove that soon and use the
VFS i_rwsem instead. For ext2/4 we could go straight to i_rwsem in
2) Just a nit, I noticed that you used "~(PAGE_SIZE - 1)"
in several places in
iomap_dax_actor() and iomap_dax_fault() instead of PAGE_MASK. Was this
Mostly because that's how I think. I'm fine using PAGE_MASK, though.
3) It's kind of weird having iomap_dax_fault() in fs/dax.c but
iomap_dax_actor() and iomap_dax_rw() in fs/iomap.c? I'm guessing the
latter is placed where it is because it uses iomap_apply(), which is local
to fs/iomap.c? Anyway, it would be nice if we could keep them together, if
It's still work in progress and could use a few cleanups.
4) In iomap_dax_actor() you do this check:
WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
If we hit this we should bail with -EIO, yea? Otherwise we could write to
unmapped space or something horrible.
Fine with me.
5) In iomap_dax_fault, I think the "I/O beyond the end of the
might have been broken. Take for example an I/O to the second page of a
file, where the file has size one page. So:
sure, I can fix this up.
6) Regarding the "we don't even have the size hole
problem" comment in your
mail, the current PMD logic requires us to know the size of the hole.
And a big part of the iomap interface is proper reporting of holes.