[PATCH 00/11] mm: sub-section memory hotplug support
by Dan Williams
Quoting "[PATCH 09/11] mm: support section-unaligned ZONE_DEVICE memory
ranges":
---
The initial motivation for this change is persistent memory platforms
that, unfortunately, align the pmem range on a boundary less than a full
section (64M vs 128M), and may change the alignment from one boot to the
next. A secondary motivation is the arrival of prospective ZONE_DEVICE
users that want devm_memremap_pages() to map PCI-E device memory ranges
to enable peer-to-peer DMA.
Currently the nvdimm core injects padding when 'pfn' (struct page
mapping configuration) instances are created. However, not all users of
devm_memremap_pages() have the opportunity to inject such padding. Users
of the memmap=ss!nn kernel command line option can trigger the following
failure with unaligned parameters like "memmap=0xfc000000!8G":
WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
[..]
Call Trace:
[<ffffffff814c0393>] dump_stack+0x86/0xc3
[<ffffffff810b173b>] __warn+0xcb/0xf0
[<ffffffff810b17bf>] warn_slowpath_fmt+0x5f/0x80
[<ffffffff811eb105>] devm_memremap_pages+0x3b5/0x4c0
[<ffffffffa006f308>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
[<ffffffffa00e231a>] pmem_attach_disk+0x19a/0x440 [nd_pmem]
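To make the failing example concrete: "memmap=0xfc000000!8G" describes the range 0x200000000-0x2fbffffff, which starts on a 128M section boundary but ends 64M into a section. The sketch below computes the padding needed to trim such a range to whole sections. It assumes x86_64's 128M sections, and the names start_pad and end_trunc are used for illustration only. It is a simplification, not the pfn_devs.c code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86_64's 128M memory sections (SECTION_SIZE_BITS == 27). */
#define SKETCH_SECTION_SIZE (UINT64_C(1) << 27)

/* Power-of-two alignment helpers. */
static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a) { return align_down(x + a - 1, a); }

/* Bytes skipped at the front / dropped at the end to reach whole sections. */
struct sketch_pad {
	uint64_t start_pad;
	uint64_t end_trunc;
};

static struct sketch_pad section_padding(uint64_t start, uint64_t size)
{
	struct sketch_pad pad;
	uint64_t end = start + size;

	assert((SKETCH_SECTION_SIZE & (SKETCH_SECTION_SIZE - 1)) == 0);
	pad.start_pad = align_up(start, SKETCH_SECTION_SIZE) - start;
	pad.end_trunc = end - align_down(end, SKETCH_SECTION_SIZE);
	return pad;
}
```

For the memmap example, section_padding(0x200000000, 0xfc000000) yields no start padding but a 64M end truncation: the trailing half-section that devm_memremap_pages() rejects as a "mixed region".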
Without this change, a user could inadvertently lose access to nvdimm
namespaces by adding or removing other DIMMs in the platform, leading the
BIOS to change the base alignment of the namespace in an incompatible
fashion. With this support we can accommodate a BIOS changing the
namespace to any alignment provided it is >= SECTION_ACTIVE_SIZE.
---
Andrew, yes, this is rather late for 4.10, but it is ostensibly a fix
for devm_memremap_pages(). Both the memmap=ss!nn and qemu-kvm methods of
defining persistent memory can generate the misaligned configuration.
However, in those cases the existing devm_memremap_pages() would have
failed, so no one could be relying on that.
The greater concern is new misalignment injected by the BIOS after the
libnvdimm sub-system already recorded that the namespace does not need
alignment padding. In that case the user would need to figure out how to
undo the BIOS change to regain access to their nvdimm device.
The patches have received a build success notification from the
0day-kbuild robot across 177 configs and pass the ndctl unit test suite.
They merge cleanly on top of current -next (test merge with
next-20161201).
---
Dan Williams (11):
mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
mm: introduce struct mem_section_usage to track partial population of a section
mm: introduce common definitions for the size and mask of a section
mm: cleanup sparse_init_one_section() return value
mm: track active portions of a section at boot
mm: fix register_new_memory() zone type detection
mm: convert kmalloc_section_memmap() to populate_section_memmap()
mm: prepare for hot-{add,remove} of sub-section ranges
mm: support section-unaligned ZONE_DEVICE memory ranges
mm: enable section-unaligned devm_memremap_pages()
libnvdimm, pfn, dax: stop padding pmem namespaces to section alignment
arch/x86/mm/init_64.c | 15 +
drivers/base/memory.c | 26 +-
drivers/nvdimm/pfn_devs.c | 40 +---
include/linux/memory.h | 4
include/linux/memory_hotplug.h | 6 -
include/linux/mm.h | 3
include/linux/mmzone.h | 26 ++
kernel/memremap.c | 75 ++++---
mm/Kconfig | 1
mm/memory_hotplug.c | 95 ++++----
mm/page_alloc.c | 6 -
mm/sparse-vmemmap.c | 24 +-
mm/sparse.c | 454 +++++++++++++++++++++++++++++-----------
13 files changed, 509 insertions(+), 266 deletions(-)
4 years, 1 month
[PATCH 0/5] acpi, nfit: acpi_nfit_ctl() corner case fixes + tests
by Dan Williams
From [PATCH 5/5] tools/testing/nvdimm: unit test acpi_nfit_ctl():
---
A recent flurry of bug discoveries in the nfit driver's DSM marshalling
routine has highlighted the fact that we do not have unit test coverage
for this routine. Add a self-test of the acpi_nfit_ctl() routine before
probing the "nfit_test.0" device. This feeds mocked stimulus to
acpi_nfit_ctl(), and if any of the tests fail, "nfit_test.0" will be
unavailable, causing the rest of the tests to not run or to fail.
This unit test will also be a place to land reproductions of quirky
BIOS behavior discovered in the field and ensure the kernel does not
regress against implementations it has seen in practice.
---
The problems addressed in this round of fixes are concentrated
around the variable-length output from the ARS (Address Range Scrub)
Status command, and the proper handling of per-command extended status
values.
They are urgent because improper handling of ARS commands can lead to a
platform that fails to boot. If there are media errors in the boot path
on a platform that does not support machine-check recovery, we rely on
ARS to inform the pmem driver which address ranges may trigger a
machine-check when read.
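Because the ARS Status payload is variable length, the firmware-reported output length has to be checked both against the buffer that was actually supplied and against the number of records it claims to contain. The sketch below shows one way such a check could look. The structure layouts are simplified stand-ins modeled on the ARS Status payload, assumed here for illustration; they are not the uapi definitions, and ars_status_valid() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layouts only; not the kernel's uapi structures. */
struct sketch_ars_record {
	uint32_t handle;
	uint32_t reserved;
	uint64_t err_address;
	uint64_t length;
};

struct sketch_ars_status {
	uint32_t status;
	uint32_t out_length;	/* bytes of valid output reported by firmware */
	uint64_t address;
	uint64_t length;
	uint64_t restart_address;
	uint64_t restart_length;
	uint16_t type;
	uint16_t flags;
	uint32_t num_records;
	struct sketch_ars_record records[];
};

/*
 * Accept the payload only if the fixed header fits in the buffer, the
 * reported out_length does not exceed the buffer, and out_length covers
 * every record the firmware claims to have returned.
 */
static int ars_status_valid(const struct sketch_ars_status *st, size_t buf_len)
{
	size_t hdr = offsetof(struct sketch_ars_status, records);
	uint64_t need;

	if (buf_len < hdr)
		return 0;	/* cannot even read the header safely */
	if (st->out_length > buf_len || st->out_length < hdr)
		return 0;
	need = hdr + (uint64_t)st->num_records * sizeof(struct sketch_ars_record);
	return need <= st->out_length;
}
```

Rejecting a short or oversized out_length up front means a consumer walking records[] can never read past the data the firmware actually produced.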
---
Dan Williams (4):
acpi, nfit, libnvdimm: fix / harden ars_status output length handling
acpi, nfit: validate ars_status output buffer size
acpi, nfit: fix bus vs dimm confusion in xlat_status
tools/testing/nvdimm: unit test acpi_nfit_ctl()
Vishal Verma (1):
acpi, nfit: fix extended status translations for ACPI DSMs
drivers/acpi/nfit/core.c | 54 +++++---
drivers/acpi/nfit/nfit.h | 2
drivers/nvdimm/bus.c | 25 +++
include/linux/libnvdimm.h | 2
tools/testing/nvdimm/Kbuild | 1
tools/testing/nvdimm/test/iomap.c | 23 +++
tools/testing/nvdimm/test/nfit.c | 236 ++++++++++++++++++++++++++++++++-
tools/testing/nvdimm/test/nfit_test.h | 8 +
8 files changed, 324 insertions(+), 27 deletions(-)
[ndctl PATCH] test, device-dax: test read-only mappings
by Dan Williams
Hugh notes:
"I think that is more restrictive than you intended: haven't tried, but
I believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving
no way to mmap /dev/dax without write permission to it."
Reported-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
test/device-dax.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/test/device-dax.c b/test/device-dax.c
index 82154d5c1fff..75b17ed63088 100644
--- a/test/device-dax.c
+++ b/test/device-dax.c
@@ -201,22 +201,22 @@ static int test_device_dax(int loglevel, struct ndctl_test *test,
}
sprintf(path, "/dev/%s", daxctl_dev_get_devname(dev));
- fd = open(path, O_RDWR);
+ fd = open(path, O_RDONLY);
if (fd < 0) {
- fprintf(stderr, "%s: failed to open device-dax instance\n",
+ fprintf(stderr, "%s: failed to open(O_RDONLY) device-dax instance\n",
daxctl_dev_get_devname(dev));
rc = -ENXIO;
goto out;
}
- buf = mmap(NULL, VERIFY_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+ buf = mmap(NULL, VERIFY_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
if (buf != MAP_FAILED) {
fprintf(stderr, "%s: expected MAP_PRIVATE failure\n", path);
rc = -ENXIO;
goto out;
}
- buf = mmap(NULL, VERIFY_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ buf = mmap(NULL, VERIFY_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (buf == MAP_FAILED) {
fprintf(stderr, "%s: expected MAP_SHARED success\n", path);
return -ENXIO;
@@ -226,6 +226,24 @@ static int test_device_dax(int loglevel, struct ndctl_test *test,
if (rc)
goto out;
+ /* upgrade to a writable mapping */
+ close(fd);
+ munmap(buf, VERIFY_SIZE);
+ fd = open(path, O_RDWR);
+ if (fd < 0) {
+ fprintf(stderr, "%s: failed to open(O_RDWR) device-dax instance\n",
+ daxctl_dev_get_devname(dev));
+ rc = -ENXIO;
+ goto out;
+ }
+
+ buf = mmap(NULL, VERIFY_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ if (buf == MAP_FAILED) {
+ fprintf(stderr, "%s: expected PROT_WRITE + MAP_SHARED success\n",
+ path);
+ return -ENXIO;
+ }
+
/*
* Prior to 4.8-final these tests cause crashes, or are
* otherwise not supported.
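Hugh's point rests on the generic mmap() rule that a file descriptor opened read-only may back a PROT_READ + MAP_SHARED mapping but not a writable shared one, which fails with EACCES. The minimal user-space sketch below exercises that rule against an ordinary scratch file instead of /dev/dax, an assumption made so it runs anywhere. The helper names are hypothetical and are not part of the ndctl test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 0 if a read-only fd allows PROT_READ + MAP_SHARED but refuses
 * PROT_WRITE + MAP_SHARED with EACCES; -1 otherwise.
 */
static int check_readonly_shared_mmap(const char *path, size_t len)
{
	int fd, rc = -1;
	void *buf;

	assert(len > 0);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	buf = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		goto out;
	munmap(buf, len);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf != MAP_FAILED) {
		munmap(buf, len);
		goto out;
	}
	rc = (errno == EACCES) ? 0 : -1;
out:
	close(fd);
	return rc;
}

/* Create a scratch file of len bytes; path_template must end in XXXXXX. */
static int make_scratch_file(char *path_template, size_t len)
{
	int fd = mkstemp(path_template);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

The updated test follows the same order: prove the read-only shared mapping works first, then reopen O_RDWR before asking for PROT_WRITE.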
[PATCH] device-dax: fix private mapping restriction, permit read-only
by Dan Williams
Hugh notes in response to commit 4cb19355ea19 "device-dax: fail all
private mapping attempts":
"I think that is more restrictive than you intended: haven't tried, but I
believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no
way to mmap /dev/dax without write permission to it."
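Indeed it does restrict read-only mappings: mmap() clears VM_SHARED (along
with VM_MAYWRITE) for a MAP_SHARED mapping of a file that was not opened
for writing, while VM_MAYSHARE stays set. Switch to checking VM_MAYSHARE
instead of VM_SHARED.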
Cc: <stable(a)vger.kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Pawel Lebioda <pawel.lebioda(a)intel.com>
Fixes: 4cb19355ea19 ("device-dax: fail all private mapping attempts")
Reported-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/dax/dax.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 3d94ff20fdca..286447a83dab 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -271,7 +271,7 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
return -ENXIO;
/* prevent private mappings from being established */
- if ((vma->vm_flags & VM_SHARED) != VM_SHARED) {
+ if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
dev_info(dev, "%s: %s: fail, attempted private mapping\n",
current->comm, func);
return -EINVAL;
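The effect of the one-line change can be modeled outside the kernel. The sketch below mirrors the relevant vm_flags bits from include/linux/mm.h, for illustration only, and simplifies how mmap() derives them for a file mapping. It then evaluates the old and new check_vma() predicates.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror include/linux/mm.h for illustration only. */
#define SK_VM_SHARED   0x00000008UL
#define SK_VM_MAYWRITE 0x00000020UL
#define SK_VM_MAYSHARE 0x00000080UL

/*
 * Simplified model of mmap()'s flag setup for a file mapping: MAP_SHARED
 * sets VM_SHARED | VM_MAYSHARE, but VM_SHARED and VM_MAYWRITE are dropped
 * again when the fd was not opened for writing.
 */
static unsigned long sketch_vm_flags(bool map_shared, bool fd_writable)
{
	unsigned long flags = SK_VM_MAYWRITE;

	if (map_shared) {
		flags |= SK_VM_SHARED | SK_VM_MAYSHARE;
		if (!fd_writable)
			flags &= ~(SK_VM_MAYWRITE | SK_VM_SHARED);
	}
	return flags;
}

/* check_vma() before the fix: also rejects read-only shared mappings. */
static bool old_check_rejects(unsigned long flags)
{
	return (flags & SK_VM_SHARED) != SK_VM_SHARED;
}

/* check_vma() after the fix: rejects only genuinely private mappings. */
static bool new_check_rejects(unsigned long flags)
{
	return (flags & SK_VM_MAYSHARE) != SK_VM_MAYSHARE;
}
```

A MAP_SHARED mapping of an O_RDONLY fd ends up with VM_MAYSHARE but not VM_SHARED. The old predicate rejects it; the new one accepts it, while MAP_PRIVATE is still refused.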
[PATCH] device-dax: fail all private mapping attempts
by Dan Williams
The device-dax implementation originally tried to be tricky and allow
private read-only mappings, but in the process allowed writable
MAP_PRIVATE + MAP_NORESERVE mappings. For simplicity and predictability,
just fail all private mapping attempts, since device-dax memory is
statically allocated and will never support overcommit.
Cc: <stable(a)vger.kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Reported-by: Pawel Lebioda <pawel.lebioda(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/dax/dax.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 0e499bfca41c..3d94ff20fdca 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -270,8 +270,8 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
if (!dax_dev->alive)
return -ENXIO;
- /* prevent private / writable mappings from being established */
- if ((vma->vm_flags & (VM_NORESERVE|VM_SHARED|VM_WRITE)) == VM_WRITE) {
+ /* prevent private mappings from being established */
+ if ((vma->vm_flags & VM_SHARED) != VM_SHARED) {
dev_info(dev, "%s: %s: fail, attempted private mapping\n",
current->comm, func);
return -EINVAL;
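The hole in the original predicate is easiest to see by evaluating it on a few flag combinations. The flag values below mirror include/linux/mm.h for illustration only, and the two predicates are transcriptions of the removed and added conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror include/linux/mm.h for illustration only. */
#define SK_VM_WRITE     0x00000002UL
#define SK_VM_SHARED    0x00000008UL
#define SK_VM_NORESERVE 0x00200000UL

/* Removed condition: reject only writable, private, *reserved* mappings. */
static bool old_rejects(unsigned long flags)
{
	return (flags & (SK_VM_NORESERVE | SK_VM_SHARED | SK_VM_WRITE)) == SK_VM_WRITE;
}

/* Added condition: reject anything that is not a shared mapping. */
static bool new_rejects(unsigned long flags)
{
	return (flags & SK_VM_SHARED) != SK_VM_SHARED;
}
```

A writable MAP_PRIVATE mapping is rejected by the old check, but adding MAP_NORESERVE (VM_NORESERVE) makes the masked value differ from VM_WRITE, so it slipped through. The new check ignores both bits and only asks whether the mapping is shared.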
[PATCH v2] nfit: Fix extended status translations for ACPI DSMs
by Vishal Verma
ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine,
where we translate the command statuses, we were returning an error for
a non-zero extended status, even if the primary status indicated success.
Return from each command's 'case' once we have verified that both its
status and extended status are good.
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
drivers/acpi/nfit/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 71a7d07..60acbb1 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -113,7 +113,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
flags = ND_ARS_PERSISTENT | ND_ARS_VOLATILE;
if ((status >> 16 & flags) == 0)
return -ENOTTY;
- break;
+ return 0;
case ND_CMD_ARS_START:
/* ARS is in progress */
if ((status & 0xffff) == NFIT_ARS_START_BUSY)
@@ -122,7 +122,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
/* Command failed */
if (status & 0xffff)
return -EIO;
- break;
+ return 0;
case ND_CMD_ARS_STATUS:
ars_status = buf;
/* Command failed */
@@ -154,7 +154,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
/* Unknown status */
if (status >> 16)
return -EIO;
- break;
+ return 0;
case ND_CMD_CLEAR_ERROR:
clear_err = buf;
if (status & 0xffff)
@@ -163,7 +163,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
return -EIO;
if (clear_err->length > clear_err->cleared)
return clear_err->cleared;
- break;
+ return 0;
default:
break;
}
--
2.7.4
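A DSM status word carries the primary status in its low 16 bits and command-specific extended status in its high 16 bits. The sketch below shows the v2 structure in miniature: each case checks what matters for its command and then returns 0, so a non-zero extended status by itself never falls through to a generic error. The command IDs and capability mask are hypothetical stand-ins, not the nfit definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical commands and capability bits, for illustration only. */
enum sketch_cmd { SKETCH_CMD_CAP, SKETCH_CMD_OTHER };
#define SKETCH_CAP_BITS 0x3u	/* stands in for ND_ARS_PERSISTENT | ND_ARS_VOLATILE */

static int sketch_xlat_status(enum sketch_cmd cmd, uint32_t status)
{
	uint16_t primary = status & 0xffff;	/* low 16 bits */
	uint16_t extended = status >> 16;	/* high 16 bits */

	switch (cmd) {
	case SKETCH_CMD_CAP:
		if (primary)
			return -EIO;
		/* here the extended status is a capability mask */
		if ((extended & SKETCH_CAP_BITS) == 0)
			return -ENOTTY;
		return 0;
	case SKETCH_CMD_OTHER:
		if (primary)
			return -EIO;
		return 0;	/* extended status is informational */
	}
	/* unknown command: only the primary status decides */
	return primary ? -EIO : 0;
}
```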
[PATCH] nfit: Fix extended status translations for ACPI DSMs
by Vishal Verma
ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine,
where we translate the command statuses, we were returning an error for
a non-zero extended status, even if the primary status indicated success.
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
drivers/acpi/nfit/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 71a7d07..d14f09b 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
}
/* all other non-zero status results in an error */
- if (status)
+ if (status & 0xffff)
return -EIO;
return 0;
}
--
2.7.4
[PATCH] e820: use module_platform_driver
by Johannes Thumshirn
Use module_platform_driver for the e820 driver instead of open-coding it.
Signed-off-by: Johannes Thumshirn <jthumshirn(a)suse.de>
---
drivers/nvdimm/e820.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c
index 11ea901..6f9a6ff 100644
--- a/drivers/nvdimm/e820.c
+++ b/drivers/nvdimm/e820.c
@@ -84,18 +84,8 @@ static struct platform_driver e820_pmem_driver = {
},
};
-static __init int e820_pmem_init(void)
-{
- return platform_driver_register(&e820_pmem_driver);
-}
-
-static __exit void e820_pmem_exit(void)
-{
- platform_driver_unregister(&e820_pmem_driver);
-}
+module_platform_driver(e820_pmem_driver);
MODULE_ALIAS("platform:e820_pmem*");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Intel Corporation");
-module_init(e820_pmem_init);
-module_exit(e820_pmem_exit);
--
2.10.2
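module_platform_driver() removes exactly the init/exit boilerplate deleted above by generating it from the driver name. The user-space sketch below imitates that idea with a hypothetical SKETCH_MODULE_DRIVER() macro and stand-in register/unregister functions. The kernel's real helpers (module_platform_driver() built on module_driver()) differ in detail, including the __init/__exit annotations and the module_init()/module_exit() hookup.

```c
#include <assert.h>

/* Stand-ins for the driver core; purely illustrative. */
struct sketch_driver {
	const char *name;
	int registered;
};

static int sketch_driver_register(struct sketch_driver *drv)
{
	drv->registered = 1;
	return 0;
}

static void sketch_driver_unregister(struct sketch_driver *drv)
{
	drv->registered = 0;
}

/*
 * Hypothetical analogue of module_driver(): generate the init/exit pair
 * that e820.c previously open-coded as e820_pmem_init()/e820_pmem_exit().
 */
#define SKETCH_MODULE_DRIVER(drv, reg, unreg)			\
static int drv##_init(void) { return reg(&(drv)); }		\
static void drv##_exit(void) { unreg(&(drv)); }

static struct sketch_driver e820_pmem_driver = { .name = "e820_pmem" };

SKETCH_MODULE_DRIVER(e820_pmem_driver, sketch_driver_register,
		     sketch_driver_unregister)
```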
[PATCH V2 1/3 libnvdimm-pending] libnvdimm: remove else after return in nsio_rw_bytes()
by Fabian Frederick
An else after a return is not needed.
Signed-off-by: Fabian Frederick <fabf(a)skynet.be>
---
V2: applied on top of libnvdimm-pending
drivers/nvdimm/claim.c | 37 ++++++++++++++++++-------------------
1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 4638b9e..75c36c3 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -242,29 +242,28 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align)))
return -EIO;
return memcpy_from_pmem(buf, nsio->addr + offset, size);
- } else {
-
- if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
- if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512)) {
- long cleared;
-
- cleared = nvdimm_clear_poison(&ndns->dev,
- offset, size);
- if (cleared != size) {
- size = cleared;
- rc = -EIO;
- }
-
- badblocks_clear(&nsio->bb, sector,
- cleared >> 9);
- } else
+ }
+
+ if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
+ if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512)) {
+ long cleared;
+
+ cleared = nvdimm_clear_poison(&ndns->dev,
+ offset, size);
+ if (cleared != size) {
+ size = cleared;
rc = -EIO;
- }
+ }
- memcpy_to_pmem(nsio->addr + offset, buf, size);
- nvdimm_flush(to_nd_region(ndns->dev.parent));
+ badblocks_clear(&nsio->bb, sector,
+ cleared >> 9);
+ } else
+ rc = -EIO;
}
+ memcpy_to_pmem(nsio->addr + offset, buf, size);
+ nvdimm_flush(to_nd_region(ndns->dev.parent));
+
return rc;
}
--
2.7.4
[PATCH V2 3/3 libnvdimm-pending] libnvdimm, namespace: use octal for permissions
by Fabian Frederick
Use octal instead of symbolic permissions, as checkpatch suggests since
commit f90774e1fd27 ("checkpatch: look for symbolic permissions and suggest octal instead").
Signed-off-by: Fabian Frederick <fabf(a)skynet.be>
---
V2: applied on top of libnvdimm-pending
Use the calculated value instead of a | b (suggested by Dan Williams)
drivers/nvdimm/namespace_devs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 7569ba7..6bf60f3 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1132,7 +1132,7 @@ static ssize_t size_show(struct device *dev,
return sprintf(buf, "%llu\n", (unsigned long long)
nvdimm_namespace_capacity(to_ndns(dev)));
}
-static DEVICE_ATTR(size, S_IRUGO, size_show, size_store);
+static DEVICE_ATTR(size, 0444, size_show, size_store);
static u8 *namespace_to_uuid(struct device *dev)
{
@@ -1456,7 +1456,7 @@ static umode_t namespace_visible(struct kobject *kobj,
if (is_namespace_pmem(dev) || is_namespace_blk(dev)) {
if (a == &dev_attr_size.attr)
- return S_IWUSR | S_IRUGO;
+ return 0644;
if (is_namespace_pmem(dev) && a == &dev_attr_sector_size.attr)
return 0;
--
2.7.4
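The octal constants are exact replacements for the symbolic ones. The user-space check below confirms this with the POSIX mode bits. SKETCH_S_IRUGO is a local spelling of the kernel-only S_IRUGO (read for user, group and other), defined here as an assumption for illustration.

```c
#include <assert.h>
#include <sys/stat.h>

/* Local spelling of the kernel's S_IRUGO: read for user, group, other. */
#define SKETCH_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Compile-time proof that the octal forms used in the patch are equivalent. */
_Static_assert(SKETCH_S_IRUGO == 0444, "S_IRUGO should equal 0444");
_Static_assert((S_IWUSR | SKETCH_S_IRUGO) == 0644,
	       "S_IWUSR | S_IRUGO should equal 0644");
```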