[ndctl PATCH 0/2] ndctl: dax device support
by Dan Williams
Enable the "dax" mode of a pfn device to allow persistent memory to be
accessed through /dev/daxX.Y rather than /dev/pmemX.
---
Dan Williams (2):
ndctl: add library support for 'dax' devices
ndctl: utility support for dax devices
Documentation/ndctl-create-namespace.txt | 12 +
builtin-xaction-namespace.c | 33 ++-
configure.ac | 26 ++
lib/libndctl.c | 367 ++++++++++++++++++++++++++++--
lib/libndctl.sym | 25 ++
lib/ndctl/libndctl.h.in | 38 +++
util/json.c | 17 +
7 files changed, 494 insertions(+), 24 deletions(-)
4 years, 11 months
[PATCH v4 0/7] dax: handling media errors
by Vishal Verma
Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.
The first three patches from Dan re-enable dax even when media
errors are present.
The fourth patch from Matthew removes the zeroout path from dax
entirely, making zeroout operations always go through the driver.
(The motivation is that if a backing device has media errors,
and we create a sparse file on it, we don't want the initial
zeroing to happen via dax; we want to give the block driver a
chance to clear the errors.)
The fifth patch changes how DAX IO is re-routed as direct IO.
We add a new iocb flag for DAX to distinguish it from actual
direct IO, and if we're in O_DIRECT, use the regular direct_IO
path instead of DAX. This gives us an opportunity to do recovery
by doing O_DIRECT writes that will go through the driver to clear
errors from bad sectors.
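As a rough illustration of the routing described for the fifth patch, the decision can be sketched in user-space C. The enum and the function name choose_io_path() are hypothetical and exist only for this sketch; the series itself uses an iocb flag, which is not modeled here.

```c
#include <stdbool.h>

/* Hypothetical labels for the three I/O paths; not kernel identifiers. */
enum io_path {
	IO_PATH_BUFFERED,	/* regular page-cache I/O */
	IO_PATH_DIRECT,		/* O_DIRECT: goes through the block driver */
	IO_PATH_DAX,		/* direct load/store access to the media */
};

/*
 * Pick the path for one request. O_DIRECT is checked first, so that
 * such writes reach the driver, which gets a chance to clear errors.
 */
enum io_path choose_io_path(bool inode_is_dax, bool o_direct)
{
	if (o_direct)
		return IO_PATH_DIRECT;
	if (inode_is_dax)
		return IO_PATH_DAX;
	return IO_PATH_BUFFERED;
}
```

The point of the ordering is that an O_DIRECT write on a DAX inode is never serviced by the DAX path, which is what makes O_DIRECT usable as a recovery mechanism.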
Patch 6 reduces our calls to clear_pmem from dax in the
truncate/hole-punch cases. We check if the range being truncated
is sector aligned/sized, and if so, send blkdev_issue_zeroout
instead of clear_pmem so that errors can be handled better by
the driver.
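The alignment test that patch 6 relies on can be sketched as follows. This is a minimal user-space illustration, assuming 512-byte sectors; range_is_sector_aligned() is a made-up name, not a function from the series.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed logical sector size */

/*
 * Return true when [offset, offset + length) starts on a sector
 * boundary and covers a whole number of sectors. Only such ranges
 * can be handed to the driver as a zeroout request; anything else
 * would still need a byte-granular clear.
 */
bool range_is_sector_aligned(uint64_t offset, uint64_t length)
{
	return length != 0 &&
	       offset % SECTOR_SIZE == 0 &&
	       length % SECTOR_SIZE == 0;
}
```

For example, a hole punch of 1024 bytes at offset 0 qualifies, while one of 100 bytes at offset 0 does not.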
Patch 7 fixes a redundant comment in DAX and is mostly unrelated
to the rest of this series.
This series also depends on/is based on Jan Kara's DAX Locking
fixes series [1].
[1]: http://www.spinics.net/lists/linux-mm/msg105819.html
v4:
- Remove the dax->direct_IO fallbacks entirely. Instead, go through
the usual direct_IO path when we're in O_DIRECT, and use dax_IO
for other, non O_DIRECT IO. (Dan, Christoph)
v3:
- Wrapper-ize the direct_IO fallback again and make an exception
for -EIOCBQUEUED (Jeff, Dan)
- Reduce clear_pmem usage in DAX to the minimum
Dan Williams (3):
block, dax: pass blk_dax_ctl through to drivers
dax: fallback from pmd to pte on error
dax: enable dax in the presence of known media errors (badblocks)
Matthew Wilcox (1):
dax: use sb_issue_zerout instead of calling dax_clear_sectors
Vishal Verma (3):
fs: prioritize and separate direct_io from dax_io
dax: for truncate/hole-punch, do zeroing through the driver if
possible
dax: fix a comment in dax_zero_page_range and dax_truncate_page
arch/powerpc/sysdev/axonram.c | 10 +++---
block/ioctl.c | 9 -----
drivers/block/brd.c | 9 ++---
drivers/block/loop.c | 2 +-
drivers/nvdimm/pmem.c | 17 +++++++---
drivers/s390/block/dcssblk.c | 12 +++----
fs/block_dev.c | 19 ++++++++---
fs/dax.c | 78 +++++++++++++++----------------------------
fs/ext2/inode.c | 23 ++++++++-----
fs/ext4/file.c | 2 +-
fs/ext4/inode.c | 19 +++++++----
fs/xfs/xfs_aops.c | 20 +++++++----
fs/xfs/xfs_bmap_util.c | 15 +++------
fs/xfs/xfs_file.c | 4 +--
include/linux/blkdev.h | 3 +-
include/linux/dax.h | 1 -
include/linux/fs.h | 15 +++++++--
mm/filemap.c | 4 +--
18 files changed, 134 insertions(+), 128 deletions(-)
--
2.5.5
4 years, 11 months
[PATCH] libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure
by Dan Williams
I had relied on the kbuild robot for cross build coverage, however it
only builds alpha_defconfig. Switch from HPAGE_SIZE to PMD_SIZE, which
is more widely defined.
Fixes: 658922e57b84 ("libnvdimm, pfn: fix memmap reservation sizing")
Cc: <stable(a)vger.kernel.org>
Reported-by: Guenter Roeck <guenter(a)roeck-us.net>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/nvdimm/pmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5101f3ab4f29..e5a8bf032ec9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -404,7 +404,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
* vmemmap_populate_hugepages() allocates the memmap array in
* HPAGE_SIZE chunks.
*/
- memmap_size = ALIGN(64 * npfns, HPAGE_SIZE);
+ memmap_size = ALIGN(64 * npfns, PMD_SIZE);
offset = ALIGN(start + SZ_8K + memmap_size, nd_pfn->align)
- start;
} else if (nd_pfn->mode == PFN_MODE_RAM)
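For readers unfamiliar with the sizing rule behind this change, the arithmetic can be sketched in user-space C. The sketch assumes a 2MB PMD_SIZE (as on x86_64) and the 64 bytes of 'struct page' per 4K pfn used in the hunk above; align_up() and memmap_reserve() are illustrative names, not kernel functions.

```c
#include <stdint.h>

#define SZ_2M 0x200000ULL	/* assumed PMD_SIZE; architecture dependent */

/* Round x up to a multiple of a; a must be a power of two. */
uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Bytes to reserve for the memmap array: 64 bytes per pfn,
 * rounded up to the 2MB granularity the allocator works in.
 */
uint64_t memmap_reserve(uint64_t npfns)
{
	return align_up(64 * npfns, SZ_2M);
}
```

With these assumptions, 32768 pfns need exactly 2MB, while 32769 pfns need 4MB, because the one extra 64-byte entry spills into a second 2MB chunk.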
4 years, 11 months
[PATCH v3] test: Add a unit test for dax error handling
by Vishal Verma
When we have a namespace with media errors, DAX should fail when trying
to map the bad blocks for direct access, but a regular write() to the
same sector should go through the driver and clear the error.
This test checks for all of the above happening - failure for a read()
on a file with a bad block, failure on an mmap-read for the same, and
finally a successful write that clears the bad block.
It also tests that a hole punch to a badblock (if the hole-punch is
sector aligned and sized) clears the error.
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
v3: Disable the parts that test error clearing with writes through DAX
v2: Also test that punching a hole clears poison.
Makefile.am | 5 +-
test/dax-errors.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++++
test/dax-errors.sh | 134 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 283 insertions(+), 2 deletions(-)
create mode 100644 test/dax-errors.c
create mode 100755 test/dax-errors.sh
diff --git a/Makefile.am b/Makefile.am
index 3f7dca3..27b06a6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -145,8 +145,8 @@ EXTRA_DIST += lib/libndctl.pc.in
CLEANFILES += lib/libndctl.pc
TESTS = test/libndctl test/dpa-alloc test/parent-uuid test/create.sh \
- test/clear.sh
-check_PROGRAMS = test/libndctl test/dpa-alloc test/parent-uuid
+ test/clear.sh test/dax-errors.sh
+check_PROGRAMS = test/libndctl test/dpa-alloc test/parent-uuid test/dax-errors
if ENABLE_DESTRUCTIVE
TESTS += test/blk-ns test/pmem-ns test/pcommit
@@ -179,3 +179,4 @@ test_dax_dev_LDADD = lib/libndctl.la
test_dax_pmd_SOURCES = test/dax-pmd.c
test_mmap_SOURCES = test/mmap.c
+test_dax_errors_SOURCES = test/dax-errors.c
diff --git a/test/dax-errors.c b/test/dax-errors.c
new file mode 100644
index 0000000..11d0031
--- /dev/null
+++ b/test/dax-errors.c
@@ -0,0 +1,146 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/ioctl.h>
+#include <stdlib.h>
+#include <linux/fs.h>
+#include <linux/fiemap.h>
+#include <setjmp.h>
+
+#define fail() fprintf(stderr, "%s: failed at: %d\n", __func__, __LINE__)
+
+static sigjmp_buf sj_env;
+static int sig_count;
+
+static void sigbus_hdl(int sig, siginfo_t *siginfo, void *ptr)
+{
+ fprintf(stderr, "** Received a SIGBUS **\n");
+ sig_count++;
+ siglongjmp(sj_env, 1);
+}
+
+static int test_dax_read_err(int fd)
+{
+ void *base, *buf;
+ int rc = 0;
+
+ if (fd < 0) {
+ fail();
+ return -ENXIO;
+ }
+
+ if (posix_memalign(&buf, 4096, 4096) != 0)
+ return -ENOMEM;
+
+ base = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ if (base == MAP_FAILED) {
+ perror("mmap");
+ rc = -ENXIO;
+ goto err_mmap;
+ }
+
+ if (sigsetjmp(sj_env, 1)) {
+ if (sig_count == 1) {
+ fprintf(stderr, "Failed to read from mapped file\n");
+ free(buf);
+ if (base) {
+ if (munmap(base, 4096) < 0) {
+ fail();
+ return 1;
+ }
+ }
+ return 1;
+ }
+ return sig_count;
+ }
+
+ /* read a page through DAX (should fail due to a bad block) */
+ memcpy(buf, base, 4096);
+
+ err_mmap:
+ free(buf);
+ return rc;
+}
+
+/* TODO: disabled till we get clear-on-write in the kernel */
+#if 0
+static int test_dax_write_clear(int fd)
+{
+ void *buf;
+ int rc = 0;
+
+ if (fd < 0) {
+ fail();
+ return -ENXIO;
+ }
+
+ if (posix_memalign(&buf, 4096, 4096) != 0)
+ return -ENOMEM;
+ memset(buf, 0, 4096);
+
+ /*
+ * Attempt to write zeroes to the first page of the file using write()
+ * This should clear the pmem errors/bad blocks
+ */
+ printf("Attempting to write\n");
+ if (write(fd, buf, 4096) < 0)
+ rc = errno;
+
+ free(buf);
+ return rc;
+}
+#endif
+
+int main(int argc, char *argv[])
+{
+ int fd, rc;
+ struct sigaction act;
+
+ if (argc < 2)
+ return -EINVAL;
+
+ memset(&act, 0, sizeof(act));
+ act.sa_sigaction = sigbus_hdl;
+ act.sa_flags = SA_SIGINFO;
+
+ if (sigaction(SIGBUS, &act, 0)) {
+ fail();
+ return 1;
+ }
+
+ fd = open(argv[1], O_RDWR | O_DIRECT);
+
+ /* Start the test. First, we do an mmap-read, and expect it to fail */
+ rc = test_dax_read_err(fd);
+ if (rc == 0) {
+ fprintf(stderr, "Expected read to fail, but it succeeded\n");
+ rc = -ENXIO;
+ goto out;
+ }
+ if (rc > 1) {
+ fprintf(stderr, "Received a second SIGBUS, exiting.\n");
+ rc = -ENXIO;
+ goto out;
+ }
+ printf(" mmap-read failed as expected\n");
+ rc = 0;
+
+ /* Next, do a regular (O_DIRECT) write() */
+ /* TODO: Disable this till we have clear-on-write in the kernel
+ * rc = test_dax_write_clear(fd);
+ *
+ * if (rc)
+ * perror("write");
+ */
+
+ out:
+ if (fd >= 0)
+ close(fd);
+ return rc;
+}
diff --git a/test/dax-errors.sh b/test/dax-errors.sh
new file mode 100755
index 0000000..cf9dd3a
--- /dev/null
+++ b/test/dax-errors.sh
@@ -0,0 +1,134 @@
+#!/bin/bash -x
+
+DEV=""
+NDCTL="./ndctl"
+BUS="-b nfit_test.0"
+BUS1="-b nfit_test.1"
+MNT=test_dax_mnt
+FILE=image
+json2var="s/[{}\",]//g; s/:/=/g"
+rc=77
+
+err() {
+ rc=1
+ echo "test/dax-errors: failed at line $1"
+ rm -f $FILE
+ rm -f $MNT/$FILE
+ if [ -n "$blockdev" ]; then
+ umount /dev/$blockdev
+ else
+ rc=77
+ fi
+ rmdir $MNT
+ exit $rc
+}
+
+set -e
+mkdir -p $MNT
+trap 'err $LINENO' ERR
+
+# setup (reset nfit_test dimms)
+modprobe nfit_test
+$NDCTL disable-region $BUS all
+$NDCTL zero-labels $BUS all
+$NDCTL enable-region $BUS all
+
+rc=1
+
+# create pmem
+dev="x"
+json=$($NDCTL create-namespace $BUS -t pmem -m raw)
+eval $(echo $json | sed -e "$json2var")
+[ $dev = "x" ] && echo "fail: $LINENO" && exit 1
+[ $mode != "raw" ] && echo "fail: $LINENO" && exit 1
+
+# check for expected errors in the middle of the namespace
+read sector len < /sys/block/$blockdev/badblocks
+[ $((sector * 2)) -ne $((size /512)) ] && echo "fail: $LINENO" && exit 1
+if dd if=/dev/$blockdev of=/dev/null iflag=direct bs=512 skip=$sector count=$len; then
+ echo "fail: $LINENO" && exit 1
+fi
+
+# check that writing clears the errors
+if ! dd of=/dev/$blockdev if=/dev/zero oflag=direct bs=512 seek=$sector count=$len; then
+ echo "fail: $LINENO" && exit 1
+fi
+
+if read sector len < /sys/block/$blockdev/badblocks; then
+ # fail if reading badblocks returns data
+ echo "fail: $LINENO" && exit 1
+fi
+
+#mkfs.xfs /dev/$blockdev -b size=4096 -f
+mkfs.ext4 /dev/$blockdev -b 4096
+mount /dev/$blockdev $MNT -o dax
+
+# prepare an image file with random data
+dd if=/dev/urandom of=$FILE bs=4096 count=4
+test -s $FILE
+
+# copy it to the dax file system
+cp $FILE $MNT/$FILE
+
+# Get the start sector for the file
+start_sect=$(filefrag -v -b512 $MNT/$FILE | grep -E "^[ ]+[0-9]+.*" | head -1 | awk '{ print $4 }' | cut -d. -f1)
+test -n "$start_sect"
+echo "start sector of the file is $start_sect"
+
+# inject badblocks for one page at the start of the file
+echo $start_sect 8 > /sys/block/$blockdev/badblocks
+
+# make sure reading the first block of the file fails as expected
+: The following 'dd' is expected to hit an I/O Error
+dd if=$MNT/$FILE of=/dev/null iflag=direct bs=4096 count=1 && err $LINENO || true
+
+# run the dax-errors test
+test -x test/dax-errors
+test/dax-errors $MNT/$FILE
+
+# TODO: disable this check till we have clear-on-write in the kernel
+#if read sector len < /sys/block/$blockdev/badblocks; then
+# # fail if reading badblocks returns data
+# echo "fail: $LINENO" && exit 1
+#fi
+
+# TODO Due to the above, we have to clear the existing badblock manually
+read sector len < /sys/block/$blockdev/badblocks
+if ! dd of=/dev/$blockdev if=/dev/zero oflag=direct bs=512 seek=$sector count=$len; then
+ echo "fail: $LINENO" && exit 1
+fi
+
+
+# test that a hole punch to a dax file also clears errors
+dd if=/dev/urandom of=$MNT/$FILE oflag=direct bs=4096 count=4
+start_sect=$(filefrag -v -b512 $MNT/$FILE | grep -E "^[ ]+[0-9]+.*" | head -1 | awk '{ print $4 }' | cut -d. -f1)
+test -n "$start_sect"
+echo "holepunch test: start sector: $start_sect"
+
+# inject a badblock at the second sector of the first page
+echo $((start_sect + 1)) 1 > /sys/block/$blockdev/badblocks
+
+# verify badblock by reading
+: The following 'dd' is expected to hit an I/O Error
+dd if=$MNT/$FILE of=/dev/null iflag=direct bs=4096 count=1 && err $LINENO || true
+
+# hole punch the first two sectors (which cover the bad second sector),
+# and verify it clears the badblock (and doesn't fail)
+if ! fallocate -p -o 0 -l 1024 $MNT/$FILE; then
+ echo "fail: $LINENO" && exit 1
+fi
+[ -n "$(cat /sys/block/$blockdev/badblocks)" ] && echo "error: $LINENO" && exit 1
+
+# cleanup
+rm -f $FILE
+rm -f $MNT/$FILE
+if [ -n "$blockdev" ]; then
+ umount /dev/$blockdev
+fi
+rmdir $MNT
+
+$NDCTL disable-region $BUS all
+$NDCTL disable-region $BUS1 all
+modprobe -r nfit_test
+
+exit 0
--
2.5.5
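The script above reads "sector len" pairs from the sysfs badblocks file. A small C sketch of the same parsing, plus an overlap test against an I/O range, is shown below. It assumes one "start-sector length" pair per line, which is what the script's 'read sector len' relies on; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One bad range, in 512-byte sectors. */
struct badblock_range {
	uint64_t sector;
	uint64_t len;
};

/* Parse a "sector len" line; returns 0 on success, -1 on bad input. */
int parse_badblocks_line(const char *line, struct badblock_range *bb)
{
	unsigned long long sector, len;

	if (sscanf(line, "%llu %llu", &sector, &len) != 2)
		return -1;
	bb->sector = sector;
	bb->len = len;
	return 0;
}

/* True if the I/O range [start, start + count) touches the bad range. */
bool badblock_overlaps(const struct badblock_range *bb,
		       uint64_t start, uint64_t count)
{
	return start < bb->sector + bb->len && bb->sector < start + count;
}
```

A read that overlaps a parsed range is the case in which the script expects 'dd' to hit an I/O error.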
4 years, 11 months
[PATCH v11 0/5] libnvdimm, nfit: dimm command marshaling
by Dan Williams
Jerry and I have been working towards a way to support the ACPI DSM
command set needed by HPE DIMMs. The HPE command sets differ
from the original Intel-defined command set already upstream.
Ideally the kernel would only implement a single standard command
format, however the standard is not yet available and devices
implementing an alternate command set are already shipping.
This rework of Jerry's initial patches [1] aims to support shipping
devices while encouraging future / follow-on command definitions to wait
for the standardization process to complete by:
1/ Requiring public documentation of commands
2/ Providing a mechanism to disable vendor-specific functionality
See patch 2 for more details. This patch passes the existing nvdimm
unit tests, but I have yet to extend the tests to target this new
mechanism.
Changes since v10: [1]
1/ Rewrote the commit message for the patch that introduces ND_CMD_CALL
2/ Replace 'nfit_cmd_family_tbl' with nfit_mem->family to clean up some
lookup code.
3/ Squash and reorganize the 7 patches into a smaller set. Commit
8467ba4fc94a from my for-4.7/dsm branch [2] was also squashed.
4/ Add sysfs attributes for the dimm family and DSM function-supported
mask.
5/ Add a module parameter to disable vendor specific commands
[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-April/005484.html
[2]: https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=fo...
---
Dan Williams (5):
nfit, libnvdimm: clarify "commands" vs "_DSMs"
nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism
nfit: disable vendor specific commands
tools/testing/nvdimm: ND_CMD_CALL support
nfit: add sysfs dimm 'family' and 'dsm_mask' attributes
drivers/acpi/nfit.c | 145 +++++++++++++++++++++++++++++++++-----
drivers/acpi/nfit.h | 18 ++++-
drivers/nvdimm/bus.c | 47 +++++++++++-
drivers/nvdimm/core.c | 2 -
drivers/nvdimm/dimm_devs.c | 18 +++--
drivers/nvdimm/nd-core.h | 2 -
include/linux/libnvdimm.h | 5 +
include/uapi/linux/ndctl.h | 42 +++++++++++
tools/testing/nvdimm/test/nfit.c | 46 ++++++++----
9 files changed, 275 insertions(+), 50 deletions(-)
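As a hedged sketch of how a function-supported mask and a "disable vendor specific commands" switch could interact, consider the user-space C below. The function names are invented for illustration, and the bit number 9 is an assumption meant to mirror ND_CMD_VENDOR; the actual logic lives in the patches themselves.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_VENDOR_SKETCH 9u	/* assumption: mirrors ND_CMD_VENDOR */

/* Drop the vendor-specific function bit when the switch is set. */
uint64_t effective_dsm_mask(uint64_t dsm_mask, bool disable_vendor_specific)
{
	if (disable_vendor_specific)
		dsm_mask &= ~(UINT64_C(1) << CMD_VENDOR_SKETCH);
	return dsm_mask;
}

/* A function may be issued only if its bit is set in the mask. */
bool dsm_func_supported(uint64_t dsm_mask, unsigned int func)
{
	return func < 64 && (dsm_mask & (UINT64_C(1) << func)) != 0;
}
```

Checking every request against the mask before it reaches firmware is what makes the mechanism a whitelist rather than a pass-through.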
4 years, 11 months
[GIT PULL] libnvdimm fixes for 4.6-rc7
by Williams, Dan J
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes
...to receive:
1/ A fix for the persistent memory 'struct page' driver. The
implementation overlooked the fact that pages are allocated in 2MB
units leading to -ENOMEM when establishing some configurations. It's
tagged for -stable as the problem was introduced with the initial
implementation in 4.5.
2/ The new "error status translation" routine, introduced with the 4.6
updates to the nfit driver, missed a necessary path in acpi_nfit_ctl().
End result is that we are falsely assuming commands complete
successfully when the embedded status says otherwise.
Full changelog and diff below, these have received a positive build
notification from the kbuild robot over 107 configs.
---
The following changes since commit 02da2d72174c61988eb4456b53f405e3ebdebce4:
Linux 4.6-rc5 (2016-04-24 16:17:05 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes
for you to fetch changes up to 2eea65829dc6c20dccbe79726fd0f3fe7f8aa43b:
nfit: fix translation of command status results (2016-05-02 09:11:53 -0700)
----------------------------------------------------------------
Dan Williams (2):
libnvdimm, pfn: fix memmap reservation sizing
nfit: fix translation of command status results
drivers/acpi/nfit.c | 5 ++++-
drivers/nvdimm/pmem.c | 13 ++++++++++---
2 files changed, 14 insertions(+), 4 deletions(-)
commit 658922e57b847bb7112aa67f6441b6bbc6554412
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Sat Apr 30 13:07:06 2016 -0700
libnvdimm, pfn: fix memmap reservation sizing
When configuring a pfn-device instance to allocate the memmap array it
needs to account for the fact that vmemmap_populate_hugepages()
allocates struct page blocks in HPAGE_SIZE chunks. We need to align the
reserved area size to 2MB otherwise arch_add_memory() runs out of memory
while establishing the memmap:
WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0
[..]
Call Trace:
[<ffffffff8148bdb3>] dump_stack+0x85/0xc2
[<ffffffff810a749b>] __warn+0xcb/0xf0
[<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20
[<ffffffff8106a497>] arch_add_memory+0xe7/0xf0
[<ffffffff811d2097>] devm_memremap_pages+0x287/0x450
[<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450
[<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
[<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem]
[<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem]
[<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm]
[..]
ndbus0: nd_pmem.probe(pfn3.0) = -12
nd_pmem: probe of pfn3.0 failed with error -12
libndctl: ndctl_pfn_enable: pfn3.0: failed to enable
Reported-by: Namratha Kothapalli <namratha.n.kothapalli(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f798899338ed..5101f3ab4f29 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -397,10 +397,17 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
*/
start += start_pad;
npfns = (pmem->size - start_pad - end_trunc - SZ_8K) / SZ_4K;
- if (nd_pfn->mode == PFN_MODE_PMEM)
- offset = ALIGN(start + SZ_8K + 64 * npfns, nd_pfn->align)
+ if (nd_pfn->mode == PFN_MODE_PMEM) {
+ unsigned long memmap_size;
+
+ /*
+ * vmemmap_populate_hugepages() allocates the memmap array in
+ * HPAGE_SIZE chunks.
+ */
+ memmap_size = ALIGN(64 * npfns, HPAGE_SIZE);
+ offset = ALIGN(start + SZ_8K + memmap_size, nd_pfn->align)
- start;
- else if (nd_pfn->mode == PFN_MODE_RAM)
+ } else if (nd_pfn->mode == PFN_MODE_RAM)
offset = ALIGN(start + SZ_8K, nd_pfn->align) - start;
else
goto err;
commit 2eea65829dc6c20dccbe79726fd0f3fe7f8aa43b
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon May 2 09:11:53 2016 -0700
nfit: fix translation of command status results
When transportation of the command completes successfully, it indicates
that the 'status' result is valid. Fix the missed checking and
translation of the status field at the end of acpi_nfit_ctl().
Otherwise, we fail to handle reported errors and assume commands
complete successfully.
Reported-by: Linda Knippers <linda.knippers(a)hpe.com>
Reviewed-by: Johannes Thumshirn <jthumshirn(a)suse.de>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index d0f35e63640b..63cc9dbe4f3b 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -287,8 +287,11 @@ static int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc,
offset);
rc = -ENXIO;
}
- } else
+ } else {
rc = 0;
+ if (cmd_rc)
+ *cmd_rc = xlat_status(buf, cmd);
+ }
out:
ACPI_FREE(out_obj);
4 years, 11 months
[PATCH v2-UPDATE 3/3] xfs: Add alignment check for DAX mount
by Toshi Kani
When a partition is not aligned to 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata updates.
Call bdev_direct_access to check the alignment when -o dax is
specified.
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Reviewed-by: Boaz Harrosh <boaz(a)plexistor.com>
Reviewed-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Boaz Harrosh <boaz(a)plexistor.com>
---
v2-UPDATE
- Add a period and fix a typo in error messages (Ross Zwisler)
---
fs/xfs/xfs_super.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 187e14b..ac18fae 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1557,15 +1557,30 @@ xfs_fs_fill_super(
sb->s_flags |= MS_I_VERSION;
if (mp->m_flags & XFS_MOUNT_DAX) {
+ struct blk_dax_ctl dax = {
+ .sector = 0,
+ .size = PAGE_SIZE,
+ };
xfs_warn(mp,
- "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+ "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
if (sb->s_blocksize != PAGE_SIZE) {
xfs_alert(mp,
- "Filesystem block size invalid for DAX Turning DAX off.");
+ "Filesystem block size invalid for DAX. Turning DAX off.");
mp->m_flags &= ~XFS_MOUNT_DAX;
- } else if (!sb->s_bdev->bd_disk->fops->direct_access) {
- xfs_alert(mp,
- "Block device does not support DAX Turning DAX off.");
+ } else if ((error = bdev_direct_access(sb->s_bdev, &dax)) < 0) {
+ switch (error) {
+ case -EOPNOTSUPP:
+ xfs_alert(mp,
+ "Block device does not support DAX. Turning DAX off.");
+ break;
+ case -EINVAL:
+ xfs_alert(mp,
+ "Partition alignment invalid for DAX. Turning DAX off.");
+ break;
+ default:
+ xfs_alert(mp,
+ "DAX access failed (%d). Turning DAX off.", error);
+ }
mp->m_flags &= ~XFS_MOUNT_DAX;
}
}
4 years, 11 months
[PATCH v2 0/3] Add alignment check for DAX mount
by Toshi Kani
When a partition is not aligned to 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata updates.
Add an alignment check to ext4, ext2, and xfs.
v2:
- Use a helper function via ->direct_access for the check.
(Christoph Hellwig)
- Call bdev_direct_access() with sector 0 for the check.
(Boaz Harrosh)
---
Toshi Kani (3):
1/3 ext4: Add alignment check for DAX mount
2/3 ext2: Add alignment check for DAX mount
3/3 xfs: Add alignment check for DAX mount
---
fs/ext2/super.c | 21 +++++++++++++++++++--
fs/ext4/super.c | 20 ++++++++++++++++++--
fs/xfs/xfs_super.c | 23 +++++++++++++++++++----
3 files changed, 56 insertions(+), 8 deletions(-)
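The kind of check this series asks the block layer to perform can be approximated in user space as follows. This is only a sketch under stated assumptions (512-byte sectors, 4KB pages); dax_alignment_check() is a hypothetical name and is not the bdev_direct_access() implementation.

```c
#include <errno.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed sector size */
#define PAGE_SIZE_SKETCH 4096u	/* assumed page size */

/*
 * part_start_sector: where the partition begins on the whole device.
 * sector: offset within the partition (the series probes sector 0).
 * Returns 0 if the absolute position is page aligned, -EINVAL if not.
 */
int dax_alignment_check(uint64_t part_start_sector, uint64_t sector)
{
	uint64_t abs_sector = part_start_sector + sector;

	if (abs_sector % (PAGE_SIZE_SKETCH / SECTOR_SIZE))
		return -EINVAL;
	return 0;
}
```

A partition starting at sector 2048 passes, whereas a legacy layout starting at sector 63 fails, which is the case where mounting with -o dax should be refused up front.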
4 years, 11 months
[PATCH v2 0/5] dax: handling of media errors
by Vishal Verma
Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.
The first three patches from Dan re-enable dax even when media
errors are present.
The fourth patch from Matthew removes the
zeroout path from dax entirely, making zeroout operations always
go through the driver. (The motivation is that if a backing device
has media errors, and we create a sparse file on it, we don't
want the initial zeroing to happen via dax; we want to give the
block driver a chance to clear the errors.)
One pending item is addressing clear_pmem usages in dax.c. clear_pmem is
'unsafe' as it attempts to simply memcpy, and does not go through the driver.
We have a few options for solving this:
1. Remove all usages of clear_pmem that are not sector-aligned. For the
ones that are aligned, replace them with a bio submission that goes
through the driver to clear errors.
2. Export from the block layer, either an API to zero sub-sector ranges,
or in general, clear errors in a range. The dax attempts to clear_pmem
can then use either of these and not be hit by media errors.
I'll send out a v3 with a crack at option 1, but I wanted to get these
changes (especially the ones in xfs) out for review.
The fifth patch changes all the callers of dax_do_io to check for
EIO, and fallback to direct_IO as needed. This forces the IO to
go through the block driver, and can attempt to clear the error.
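The caller-side pattern described for the fifth patch (try DAX, fall back on EIO) can be sketched in user-space C. Everything here is illustrative: io_with_fallback() and the demo_* stand-ins are hypothetical and do not correspond to dax_do_io() or any filesystem's direct_IO method.

```c
#include <errno.h>
#include <stddef.h>

/* An I/O routine returns bytes transferred, or a negative errno. */
typedef long (*io_fn)(void *buf, size_t len);

/* Try the DAX path first; on -EIO retry through the driver path. */
long io_with_fallback(io_fn dax_io, io_fn direct_io, void *buf, size_t len)
{
	long ret = dax_io(buf, len);

	if (ret == -EIO)
		ret = direct_io(buf, len);
	return ret;
}

/* Stand-ins used only to demonstrate the pattern. */
int demo_direct_calls;

long demo_dax_io_bad_media(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -EIO;		/* pretend the range has a media error */
}

long demo_dax_io_ok(void *buf, size_t len)
{
	(void)buf;
	return (long)len;
}

long demo_direct_io(void *buf, size_t len)
{
	(void)buf;
	demo_direct_calls++;	/* count how often the fallback ran */
	return (long)len;
}
```

Only -EIO triggers the retry; any other error, or success, is returned to the caller unchanged.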
v2:
- Use blkdev_issue_zeroout in xfs instead of sb_issue_zeroout (Christoph)
- Un-wrapper-ize dax_do_io and leave the fallback to direct_IO to callers
(Christoph)
- Rebase to v4.6-rc1 (fixup a couple of conflicts in ext4 and xfs)
Dan Williams (3):
block, dax: pass blk_dax_ctl through to drivers
dax: fallback from pmd to pte on error
dax: enable dax in the presence of known media errors (badblocks)
Vishal Verma (2):
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: handle media errors in dax_do_io
arch/powerpc/sysdev/axonram.c | 10 +++++-----
block/ioctl.c | 9 ---------
drivers/block/brd.c | 9 +++++----
drivers/nvdimm/pmem.c | 17 +++++++++++++----
drivers/s390/block/dcssblk.c | 12 ++++++------
fs/block_dev.c | 19 +++++++++++++++----
fs/dax.c | 36 ++----------------------------------
fs/ext2/inode.c | 29 ++++++++++++++++++-----------
fs/ext4/indirect.c | 18 +++++++++++++-----
fs/ext4/inode.c | 21 ++++++++++++++-------
fs/xfs/xfs_aops.c | 14 ++++++++++++--
fs/xfs/xfs_bmap_util.c | 15 ++++-----------
include/linux/blkdev.h | 3 +--
include/linux/dax.h | 1 -
14 files changed, 108 insertions(+), 105 deletions(-)
--
2.5.5
4 years, 11 months
[PATCH] nfit: fix translation of command status results
by Dan Williams
When transportation of the command completes successfully, it indicates
that the 'status' result is valid. Fix the missed checking and
translation of the status field at the end of acpi_nfit_ctl().
Otherwise, we fail to handle reported errors and assume commands
complete successfully.
Reported-by: Linda Knippers <linda.knippers(a)hpe.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/acpi/nfit.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index d0f35e63640b..63cc9dbe4f3b 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -287,8 +287,11 @@ static int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc,
offset);
rc = -ENXIO;
}
- } else
+ } else {
rc = 0;
+ if (cmd_rc)
+ *cmd_rc = xlat_status(buf, cmd);
+ }
out:
ACPI_FREE(out_obj);
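The distinction the commit message draws, between the transport succeeding and the command itself succeeding, can be sketched in user-space C. Both function names are hypothetical, and the rule "any non-zero value in the low 16 bits means failure" is a simplifying assumption for this sketch, not the kernel's xlat_status() logic.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified translation: assume non-zero low 16 bits mean failure. */
int xlat_status_sketch(uint32_t status)
{
	return (status & 0xffff) ? -EIO : 0;
}

/*
 * transport_rc: result of delivering the command to firmware.
 * status: the status word embedded in the returned payload.
 * cmd_rc: optional out-parameter for the translated command result.
 *
 * The embedded status is meaningful only when transport succeeded,
 * so it is translated on that path and left untouched otherwise.
 */
int finish_command(int transport_rc, uint32_t status, int *cmd_rc)
{
	if (transport_rc)
		return transport_rc;
	if (cmd_rc)
		*cmd_rc = xlat_status_sketch(status);
	return 0;
}
```

Skipping the translation step is the failure mode the patch describes: the function returns 0 and the caller never learns that firmware reported an error.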
4 years, 11 months