DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
locking. This series allows DAX PMDs to participate in the DAX radix tree
based locking scheme so that they can be re-enabled.
Jan and Christoph, can you please help review these changes?
Andrew, when the time is right can you please push these changes to Linus via
the -mm tree?
In some simple mmap I/O testing with FIO, the use of PMD faults more than
doubles I/O performance compared with PTE faults. Here is the FIO script I
used for my testing:
Here are the performance results with XFS using only PTE faults:
READ: io=1022.7MB, aggrb=557610KB/s, minb=557610KB/s, maxb=557610KB/s, mint=1878msec, maxt=1878msec
WRITE: io=1025.4MB, aggrb=559084KB/s, minb=559084KB/s, maxb=559084KB/s, mint=1878msec, maxt=1878msec
Here are performance numbers for that same test using PMD faults:
READ: io=1022.7MB, aggrb=1406.7MB/s, minb=1406.7MB/s, maxb=1406.7MB/s, mint=727msec, maxt=727msec
WRITE: io=1025.4MB, aggrb=1410.4MB/s, minb=1410.4MB/s, maxb=1410.4MB/s, mint=727msec, maxt=727msec
This was done on a random lab machine with a PMEM device made from memmap'd
RAM. To get XFS to use PMD faults, I did the following:
mkfs.xfs -f -d su=2m,sw=1 /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem0
xfs_io -c "extsize 2m" /mnt/pmem0
Changes since v2:
- Removed the struct buffer_head + get_block_t based dax_pmd_fault() handler.
All DAX PMD faults will now happen via the new struct iomap based PMD fault path.
- Added a new struct iomap based PMD path which is now used by XFS.
- Now that the DAX PMD code uses struct iomap, ext2 no longer needs to be
modified so that ext2_get_block() will give us the size of a hole.
- Removed support for DAX PMD faults for ext2. I can't get them to happen
reliably in my testing.
- Removed unused xfs_get_blocks_dax_fault() wrapper.
- Added a bunch of comments around my changes in dax.c.
This was built upon xfs/for-next with PMD performance fixes from Toshi Kani and
Dan Williams. Dan's patch has already been merged for v4.8, and Toshi's
patches are currently queued in Andrew Morton's mm tree for v4.9 inclusion.
Here is a tree containing my changes and all the fixes that I've been testing:
Ross Zwisler (11):
ext4: allow DAX writeback for hole punch
ext4: tell DAX the size of allocation holes
dax: remove buffer_size_valid()
ext2: remove support for DAX PMD faults
dax: make 'wait_table' global variable static
dax: consistent variable naming for DAX entries
dax: coordinate locking for offsets in PMD range
dax: remove dax_pmd_fault()
dax: add struct iomap based DAX PMD support
xfs: use struct iomap based DAX PMD fault path
dax: remove "depends on BROKEN" from FS_DAX_PMD
fs/Kconfig          |   1 -
fs/dax.c            | 696 +++++++++++++++++++++++++++++-----------------------
fs/ext2/file.c      |  24 +-
fs/ext4/inode.c     |   7 +-
fs/xfs/xfs_aops.c   |  25 +-
fs/xfs/xfs_aops.h   |   3 -
fs/xfs/xfs_file.c   |   2 +-
include/linux/dax.h |  37 ++-
mm/filemap.c        |   6 +-
9 files changed, 434 insertions(+), 367 deletions(-)
The last patch is what started the series: XFS currently uses the
direct I/O locking strategy for DAX because DAX was overloaded onto
the direct I/O path. For XFS this means that, for writes, we take a
shared inode lock instead of the normal exclusive one if and only if
they are properly aligned. While this is fine for O_DIRECT, which
requires explicit opt-in from the application, it's not fine for DAX,
where we'll suddenly lose expected and required synchronization if the
file system happens to use DAX underneath.
Patches 1-7 just untangle the code so that we can easily deal with DAX
on its own.
Patch 1 changes the default behaviour on machine check exceptions (MCEs)
so that we just add the error address to badblocks accounting instead of
starting a full ARS (Address Range Scrub). The old behaviour can be
enabled via sysfs.
Patches 2 and 3 fix a problem where stale badblocks could show up after an
on-demand ARS or an MCE-triggered scrub, because when clearing poison we
didn't clear the internal nvdimm_bus->poison_list.
Vishal Verma (3):
nfit: don't start a full scrub by default for an MCE
pmem: reduce kmap_atomic sections to the memcpys only
libnvdimm: clear the internal poison_list when clearing badblocks
drivers/acpi/nfit/core.c  | 23 +++++++++++++--
drivers/acpi/nfit/mce.c   | 24 ++++++++++++----
drivers/acpi/nfit/nfit.h  |  6 ++++
drivers/nvdimm/bus.c      |  2 ++
drivers/nvdimm/core.c     | 73 ++++++++++++++++++++++++++++++++++++++++++++---
drivers/nvdimm/pmem.c     | 28 ++++++++++++++----
include/linux/libnvdimm.h |  2 ++
7 files changed, 141 insertions(+), 17 deletions(-)