[PATCH 0/7] Fix DM DAX handling
by Ross Zwisler
This series fixes a few issues that I found with DM's handling of DAX
devices. Here are some of the issues I found:
* We can create a dm-stripe or dm-linear device which is made up of an
fsdax PMEM namespace and a raw PMEM namespace but which can hold a
filesystem mounted with the -o dax mount option. DAX operations to
the raw PMEM namespace part lack struct page and can fail in
interesting/unexpected ways when doing things like fork(), examining
memory with gdb, etc.
* We can create a dm-stripe or dm-linear device which is made up of an
fsdax PMEM namespace and a BRD ramdisk which can hold a filesystem
mounted with the -o dax mount option. All I/O to this filesystem
will fail.
* In DM you can't transition a dm target which could possibly support
DAX (mode DM_TYPE_DAX_BIO_BASED) to one which can't support DAX
(mode DM_TYPE_BIO_BASED), even if you never use DAX.
The first 2 patches in this series are prep work from Darrick and Dave
which improve bdev_dax_supported(). The last 5 patches fix the
above-mentioned problems in DM. I feel that this series simplifies the
handling of DAX devices in DM, and the last 5 DM-related patches have a
net code reduction of 50 lines.
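As an illustration of the rule the issues above point toward, here is a minimal user-space sketch, not the kernel code from this series: a stacked device should only advertise DAX when every component device supports it. The struct, its fields, and the function names are all hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device underneath a DM table. */
struct model_dev {
	const char *name;
	bool has_direct_access;	/* assumption: device can do DAX at all */
	bool has_struct_page;	/* assumption: true for fsdax, false for raw */
};

/* A single component is DAX-capable only if both properties hold. */
static bool model_dev_supports_dax(const struct model_dev *dev)
{
	return dev->has_direct_access && dev->has_struct_page;
}

/* The stacked device supports DAX only if every component does. */
static bool model_table_supports_dax(const struct model_dev *devs, size_t n)
{
	size_t i;

	assert(devs != NULL || n == 0);
	if (n == 0)
		return false;	/* nothing to map, so nothing to DAX */

	for (i = 0; i < n; i++)
		if (!model_dev_supports_dax(&devs[i]))
			return false;
	return true;
}
```

In this model, a table mixing an fsdax namespace with a raw namespace returns false, which corresponds to refusing the -o dax mount instead of failing later at runtime.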
Darrick J. Wong (1):
fs: allow per-device dax status checking for filesystems
Dave Jiang (1):
dax: change bdev_dax_supported() to support boolean returns
Ross Zwisler (5):
dm: fix test for DAX device support
dm: prevent DAX mounts if not supported
dm: remove DM_TYPE_DAX_BIO_BASED dm_queue_mode
dm-snap: remove unnecessary direct_access() stub
dm-error: remove unnecessary direct_access() stub
drivers/dax/super.c | 44 +++++++++++++++++++++----------------------
drivers/md/dm-ioctl.c | 16 ++++++----------
drivers/md/dm-snap.c | 8 --------
drivers/md/dm-table.c | 29 +++++++++++-----------------
drivers/md/dm-target.c | 7 -------
drivers/md/dm.c | 7 ++-----
fs/ext2/super.c | 3 +--
fs/ext4/super.c | 3 +--
fs/xfs/xfs_ioctl.c | 3 ++-
fs/xfs/xfs_iops.c | 30 ++++++++++++++++++++++++-----
fs/xfs/xfs_super.c | 10 ++++++++--
include/linux/dax.h | 12 ++++--------
include/linux/device-mapper.h | 8 ++++++--
13 files changed, 88 insertions(+), 92 deletions(-)
--
2.14.3
3 years, 12 months
[PATCH resend 0/7] Fix DM DAX handling
by Ross Zwisler
No code changes from v1. Just CCing the xfs mailing list & adding one
Reviewed-by from Darrick.
---
This series fixes a few issues that I found with DM's handling of DAX
devices. Here are some of the issues I found:
* We can create a dm-stripe or dm-linear device which is made up of an
fsdax PMEM namespace and a raw PMEM namespace but which can hold a
filesystem mounted with the -o dax mount option. DAX operations to
the raw PMEM namespace part lack struct page and can fail in
interesting/unexpected ways when doing things like fork(), examining
memory with gdb, etc.
* We can create a dm-stripe or dm-linear device which is made up of an
fsdax PMEM namespace and a BRD ramdisk which can hold a filesystem
mounted with the -o dax mount option. All I/O to this filesystem
will fail.
* In DM you can't transition a dm target which could possibly support
DAX (mode DM_TYPE_DAX_BIO_BASED) to one which can't support DAX
(mode DM_TYPE_BIO_BASED), even if you never use DAX.
The first 2 patches in this series are prep work from Darrick and Dave
which improve bdev_dax_supported(). The last 5 patches fix the
above-mentioned problems in DM. I feel that this series simplifies the
handling of DAX devices in DM, and the last 5 DM-related patches have a
net code reduction of 50 lines.
Darrick J. Wong (1):
fs: allow per-device dax status checking for filesystems
Dave Jiang (1):
dax: change bdev_dax_supported() to support boolean returns
Ross Zwisler (5):
dm: fix test for DAX device support
dm: prevent DAX mounts if not supported
dm: remove DM_TYPE_DAX_BIO_BASED dm_queue_mode
dm-snap: remove unnecessary direct_access() stub
dm-error: remove unnecessary direct_access() stub
drivers/dax/super.c | 44 +++++++++++++++++++++----------------------
drivers/md/dm-ioctl.c | 16 ++++++----------
drivers/md/dm-snap.c | 8 --------
drivers/md/dm-table.c | 29 +++++++++++-----------------
drivers/md/dm-target.c | 7 -------
drivers/md/dm.c | 7 ++-----
fs/ext2/super.c | 3 +--
fs/ext4/super.c | 3 +--
fs/xfs/xfs_ioctl.c | 3 ++-
fs/xfs/xfs_iops.c | 30 ++++++++++++++++++++++++-----
fs/xfs/xfs_super.c | 10 ++++++++--
include/linux/dax.h | 12 ++++--------
include/linux/device-mapper.h | 8 ++++++--
13 files changed, 88 insertions(+), 92 deletions(-)
--
2.14.3
3 years, 12 months
[PATCH] uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
by Dan Williams
Add a common Kconfig CONFIG_ARCH_HAS_UACCESS_MCSAFE that archs can
optionally select, and fixup the declaration of _copy_to_iter_mcsafe().
Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
Ingo, Thomas, fyi, here is a trivial compilation fixup I will carry in
my tree based on tip/x86/dax. I missed defining the Kconfig symbol, and
my unit test was only dependent on the low level implementation.
include/linux/uio.h | 2 +-
lib/Kconfig | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index f5766e853a77..409c845d4cd3 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -155,7 +155,7 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
#endif
#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE
-size_t _copy_to_iter_mcsafe(void *addr, size_t bytes, struct iov_iter *i);
+size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i);
#else
#define _copy_to_iter_mcsafe _copy_to_iter
#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 5fe577673b98..907f6e4f1cf2 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -586,6 +586,9 @@ config ARCH_HAS_PMEM_API
config ARCH_HAS_UACCESS_FLUSHCACHE
bool
+config ARCH_HAS_UACCESS_MCSAFE
+ bool
+
config STACKDEPOT
bool
select STACKTRACE
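The fallback pattern touched by the uio.h hunk above can be sketched in plain user-space C. This is only an illustration under stated assumptions: DEMO_ARCH_HAS_MCSAFE, demo_copy() and demo_copy_mcsafe() are hypothetical stand-ins, not kernel symbols. The point is that when the optional variant is just a macro alias for the generic helper, callers pass the same arguments either way, so the two prototypes presumably need to agree, including the const qualifier on the source buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic helper: the source buffer is const-qualified. */
static size_t demo_copy(const void *src, size_t bytes, void *dst)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, bytes);
	return bytes;
}

#ifdef DEMO_ARCH_HAS_MCSAFE
/*
 * An arch-provided variant would be declared here. Its prototype
 * matches demo_copy() so callers compile unchanged in both configs.
 */
size_t demo_copy_mcsafe(const void *src, size_t bytes, void *dst);
#else
/* No arch support selected: fall back to the generic helper. */
#define demo_copy_mcsafe demo_copy
#endif
```

With the symbol left undefined, a call such as demo_copy_mcsafe(src, len, dst) resolves to demo_copy() at preprocessing time.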
3 years, 12 months
Questions about vNVDIMM on qemu/KVM
by Yasunori Goto
Hello,
I'm investigating the status of vNVDIMM on qemu/KVM,
and I have some questions about it. I would be glad if anyone could answer them.
In my understanding, qemu/KVM has a feature to expose an NFIT to the guest,
and it is still being updated, for platform capabilities, with this patch set.
https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04756.html
And libvirt also supports this feature with <memory model='nvdimm'>
https://libvirt.org/formatdomain.html#elementsMemory
However, virtio-pmem is under development now, and it is better
for architectures that detect NVDIMM regions without ACPI (like s390x).
In addition, it is also necessary to flush guest contents on a vNVDIMM
that has a backend file.
Q1) Does the ACPI.NFIT bus of qemu/kvm remain with virtio-pmem?
What would the role of each be if both NFIT and virtio-pmem are available?
If my understanding is correct, both NFIT and virtio-pmem are used to
detect vNVDIMM regions, but only one seems to be necessary....
Otherwise, is the NFIT bus just for keeping compatibility,
and is virtio-pmem the promising way?
Q2) What bus is (or will be) created for virtio-pmem?
I could confirm that the NFIT bus is created with <memory model='nvdimm'>,
and I heard that another bus will be created for virtio-pmem, but I could not
find concretely which bus is created.
---
# ndctl list -B
{
"provider":"ACPI.NFIT",
"dev":"ndbus0"
}
---
I think it affects which operations the user will be able to perform, and which
notifications are necessary for vNVDIMM.
ACPI defines some operations like namespace control, and notifications
for NVDIMM health status and others.
(I suppose that other status notifications might be necessary for vNVDIMM,
but I'm not sure yet...)
If my understanding is wrong, please correct me.
Thanks,
---
Yasunori Goto
4 years
[ndctl PATCH] test: add a MADV_HWPOISON test
by Dan Williams
Check that injecting soft-poison to a dax mapping results in SIGBUS with
the expected BUS_MCEERR_AR siginfo data.
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
test/dax-pmd.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 126 insertions(+), 4 deletions(-)
diff --git a/test/dax-pmd.c b/test/dax-pmd.c
index 65bee6ffe907..abff4f9fd199 100644
--- a/test/dax-pmd.c
+++ b/test/dax-pmd.c
@@ -12,6 +12,7 @@
*/
#include <stdio.h>
#include <unistd.h>
+#include <setjmp.h>
#include <sys/mman.h>
#include <linux/mman.h>
#include <sys/types.h>
@@ -192,15 +193,130 @@ int test_dax_directio(int dax_fd, unsigned long align, void *dax_addr, off_t off
return rc;
}
+static sigjmp_buf sj_env;
+static int sig_mcerr_ao, sig_mcerr_ar, sig_count;
+
+static void sigbus_hdl(int sig, siginfo_t *si, void *ptr)
+{
+ switch (si->si_code) {
+ case BUS_MCEERR_AO:
+ fprintf(stderr, "%s: BUS_MCEERR_AO addr: %p len: %d\n",
+ __func__, si->si_addr, 1 << si->si_addr_lsb);
+ sig_mcerr_ao++;
+ break;
+ case BUS_MCEERR_AR:
+ fprintf(stderr, "%s: BUS_MCEERR_AR addr: %p len: %d\n",
+ __func__, si->si_addr, 1 << si->si_addr_lsb);
+ sig_mcerr_ar++;
+ break;
+ default:
+ sig_count++;
+ break;
+ }
+
+ siglongjmp(sj_env, 1);
+}
+
+static int test_dax_poison(int dax_fd, unsigned long align, void *dax_addr,
+ off_t offset)
+{
+ unsigned char *addr = MAP_FAILED;
+ struct sigaction act;
+ unsigned x = x;
+ void *buf;
+ int rc;
+
+ if (posix_memalign(&buf, 4096, 4096) != 0)
+ return -ENOMEM;
+
+ memset(&act, 0, sizeof(act));
+ act.sa_sigaction = sigbus_hdl;
+ act.sa_flags = SA_SIGINFO;
+
+ if (sigaction(SIGBUS, &act, 0)) {
+ fail();
+ rc = -errno;
+ goto out;
+ }
+
+ /* dirty the block on disk to bypass the default zero page */
+ rc = pwrite(dax_fd, buf, 4096, offset + align / 2);
+ if (rc < 4096) {
+ fail();
+ rc = -ENXIO;
+ goto out;
+ }
+ fsync(dax_fd);
+
+ addr = mmap(dax_addr, 2*align, PROT_READ|PROT_WRITE,
+ MAP_SHARED_VALIDATE|MAP_POPULATE|MAP_SYNC, dax_fd, offset);
+ if (addr == MAP_FAILED) {
+ fail();
+ rc = -errno;
+ goto out;
+ }
+
+ if (sigsetjmp(sj_env, 1)) {
+ if (sig_mcerr_ar) {
+ fprintf(stderr, "madvise triggered 'action required' sigbus\n");
+ goto clear_error;
+ } else if (sig_count) {
+ fail();
+ return -ENXIO;
+ }
+ }
+
+ rc = madvise(addr + align / 2, 4096, MADV_HWPOISON);
+ if (rc) {
+ fail();
+ rc = -errno;
+ goto out;
+ }
+
+ /* clear the error */
+clear_error:
+ if (!sig_mcerr_ar) {
+ fail();
+ rc = -ENXIO;
+ goto out;
+ }
+
+ rc = fallocate(dax_fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,
+ offset + align / 2, 4096);
+ if (rc) {
+ fail();
+ rc = -errno;
+ goto out;
+ }
+
+ rc = pwrite(dax_fd, buf, 4096, offset + align / 2);
+ if (rc < 4096) {
+ fail();
+ rc = -ENXIO;
+ goto out;
+ }
+ fsync(dax_fd);
+
+ /* check that we can fault in the poison page */
+ x = *(volatile unsigned *) (addr + align / 2);
+ rc = 0;
+
+out:
+ if (addr != MAP_FAILED)
+ munmap(addr, 2 * align);
+ free(buf);
+ return rc;
+}
+
/* test_pmd assumes that fd references a pre-allocated + dax-capable file */
static int test_pmd(int fd)
{
- unsigned long long m_align, p_align;
+ unsigned long long m_align, p_align, pmd_off;
struct fiemap_extent *ext;
+ void *base, *pmd_addr;
struct fiemap *map;
int rc = -ENXIO;
unsigned long i;
- void *base;
if (fd < 0) {
fail();
@@ -249,9 +365,15 @@ static int test_pmd(int fd)
m_align = ALIGN(base, HPAGE_SIZE) - ((unsigned long) base);
p_align = ALIGN(ext->fe_physical, HPAGE_SIZE) - ext->fe_physical;
- rc = test_dax_directio(fd, HPAGE_SIZE, (char *) base + m_align,
- ext->fe_logical + p_align);
+ pmd_addr = (char *) base + m_align;
+ pmd_off = ext->fe_logical + p_align;
+ rc = test_dax_directio(fd, HPAGE_SIZE, pmd_addr, pmd_off);
+ if (rc)
+ goto err_directio;
+
+ rc = test_dax_poison(fd, HPAGE_SIZE, pmd_addr, pmd_off);
+ err_directio:
err_extent:
err_mmap:
free(map);
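The SIGBUS recovery pattern that test_dax_poison() relies on in the patch above can be tried without poisoning any memory. The following is a minimal sketch under stated assumptions: it uses raise(SIGBUS) as a stand-in for a real memory-failure event (madvise(2) documents MADV_HWPOISON as requiring CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE), so the si_code seen here will not be BUS_MCEERR_AR. All demo_* names are hypothetical.

```c
#define _GNU_SOURCE	/* assumption: glibc-style feature macro */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf demo_env;
static volatile sig_atomic_t demo_si_code;

/* Record why SIGBUS fired, then jump back out of the handler. */
static void demo_sigbus_hdl(int sig, siginfo_t *si, void *ctx)
{
	(void) sig;
	(void) ctx;
	demo_si_code = si->si_code;
	siglongjmp(demo_env, 1);
}

/* True for the "action required" code that the test above expects. */
static int demo_is_action_required(int si_code)
{
	return si_code == BUS_MCEERR_AR;
}

/* Returns 0 and stores the observed si_code, or -1 on failure. */
static int demo_catch_sigbus(int *si_code_out)
{
	struct sigaction act, old;
	int rc;

	assert(si_code_out != NULL);
	memset(&act, 0, sizeof(act));
	act.sa_sigaction = demo_sigbus_hdl;
	act.sa_flags = SA_SIGINFO;
	if (sigaction(SIGBUS, &act, &old))
		return -1;

	/* savemask=1 so the signal mask is restored after the jump */
	if (sigsetjmp(demo_env, 1) == 0) {
		raise(SIGBUS);	/* stand-in for touching a poisoned page */
		rc = -1;	/* handler did not jump back */
	} else {
		*si_code_out = demo_si_code;
		rc = 0;
	}

	sigaction(SIGBUS, &old, NULL);	/* restore previous disposition */
	return rc;
}
```

A caller would invoke demo_catch_sigbus(&code) and then pass code to demo_is_action_required(); with a real MADV_HWPOISON injection on a privileged system, the patch above expects that check to be true.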
4 years