[PATCH 1/3] ndctl, create-namespace: Allow 64K and 16M alignments
by Oliver O'Halloran
These are needed on powerpc since 64K is the default page size and 16MB
is the PMD size when using the hash MMU.
Signed-off-by: Oliver O'Halloran <oohall(a)gmail.com>
---
ndctl/builtin-xaction-namespace.c | 2 ++
util/size.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/ndctl/builtin-xaction-namespace.c b/ndctl/builtin-xaction-namespace.c
index 46d651e86153..d6c38dc15984 100644
--- a/ndctl/builtin-xaction-namespace.c
+++ b/ndctl/builtin-xaction-namespace.c
@@ -494,7 +494,9 @@ static int validate_namespace_options(struct ndctl_region *region,
switch (p->align) {
case SZ_4K:
+ case SZ_64K:
case SZ_2M:
+ case SZ_16M:
case SZ_1G:
break;
default:
diff --git a/util/size.h b/util/size.h
index 4af14eb7d150..f1bfd1a30438 100644
--- a/util/size.h
+++ b/util/size.h
@@ -3,6 +3,7 @@
#define SZ_1K 0x00000400
#define SZ_4K 0x00001000
+#define SZ_64K 0x00010000
#define SZ_1M 0x00100000
#define SZ_2M 0x00200000
#define SZ_4M 0x00400000
--
2.9.3
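
For readers who want to see the effect of the change in isolation, here is a minimal userspace sketch of the alignment check after this patch. The function name align_is_supported() is hypothetical, and the SZ_16M and SZ_1G values are assumed to already exist in util/size.h (the diff above only adds SZ_64K); this is an illustration, not the ndctl code itself.

```c
#include <stdbool.h>

/* Size constants in the style of util/size.h. Only SZ_64K is added by
 * the patch; the others are assumed to be defined there already. */
#define SZ_4K  0x00001000
#define SZ_64K 0x00010000
#define SZ_2M  0x00200000
#define SZ_16M 0x01000000
#define SZ_1G  0x40000000

/* Hypothetical helper: returns true for the alignments that the patched
 * switch statement accepts, false for everything else. */
bool align_is_supported(unsigned long long align)
{
	switch (align) {
	case SZ_4K:
	case SZ_64K:	/* default page size on powerpc */
	case SZ_2M:
	case SZ_16M:	/* PMD size with the powerpc hash MMU */
	case SZ_1G:
		return true;
	default:
		return false;
	}
}
```

With this sketch, align_is_supported(SZ_64K) and align_is_supported(SZ_16M) are accepted, while a value such as 1M (0x00100000) is still rejected.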
3 years, 10 months
[PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
by Dan Williams
The nvdimm_flush() mechanism helps to reduce the impact of an ADR
(asynchronous-dram-refresh) failure. The ADR mechanism handles flushing
platform WPQ (write-pending-queue) buffers when power is removed. The
nvdimm_flush() mechanism performs that same function on-demand.
When a pmem namespace is associated with a block device, an
nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
request. However, when a namespace is in device-dax mode, or namespaces
are disabled, userspace needs another path.
The new 'flush' attribute is visible when it can be determined that the
interleave-set either does, or does not have DIMMs that expose WPQ-flush
addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
flushes DIMMs, or returns "0" when the flush operation is a platform nop.
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/nvdimm/region_devs.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8de5a04644a1..3495b4c23941 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
}
static DEVICE_ATTR_RO(size);
+static ssize_t flush_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nd_region *nd_region = to_nd_region(dev);
+
+	if (nvdimm_has_flush(nd_region)) {
+		nvdimm_flush(nd_region);
+		return sprintf(buf, "1\n");
+	}
+	return sprintf(buf, "0\n");
+}
+static DEVICE_ATTR_RO(flush);
+
static ssize_t mappings_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -474,6 +487,7 @@ static DEVICE_ATTR_RO(resource);
static struct attribute *nd_region_attributes[] = {
&dev_attr_size.attr,
+ &dev_attr_flush.attr,
&dev_attr_nstype.attr,
&dev_attr_mappings.attr,
&dev_attr_btt_seed.attr,
@@ -508,6 +522,9 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
if (!is_nd_pmem(dev) && a == &dev_attr_resource.attr)
return 0;
+ if (a == &dev_attr_flush.attr && nvdimm_has_flush(nd_region) < 0)
+ return 0;
+
if (a != &dev_attr_set_cookie.attr
&& a != &dev_attr_available_size.attr)
return a->mode;
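
As a rough illustration of how userspace might consume the attribute described in this patch, the sketch below reads a 'flush' file and maps it to 1, 0, or a negative errno. The helper name read_flush_attr() is hypothetical, and the sysfs path in the usage note is an assumption about where the region device appears, not something the patch states.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a region's 'flush' attribute.
 * Returns 1 if the kernel reported that it flushed the DIMMs, 0 if the
 * flush is a platform nop, or a negative errno if the attribute is
 * missing (for example, hidden by region_visible()) or malformed.
 * Note that, per the patch, the read itself is what triggers the flush.
 */
int read_flush_attr(const char *path)
{
	FILE *f = fopen(path, "r");
	int val, rc;

	if (!f)
		return -errno;
	rc = fscanf(f, "%d", &val);
	fclose(f);
	if (rc != 1 || (val != 0 && val != 1))
		return -EINVAL;
	return val;
}
```

A caller would pass something like "/sys/bus/nd/devices/region0/flush" (assumed path) and treat a negative return as "flush capability could not be determined".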
3 years, 10 months
[PATCH] daxctl: delete duplicate "-R, --regions" in "daxctl list --help"
by Yi Zhang
Signed-off-by: Yi Zhang <yizhan(a)redhat.com>
---
Documentation/daxctl-list.txt | 4 ----
1 file changed, 4 deletions(-)
diff --git a/Documentation/daxctl-list.txt b/Documentation/daxctl-list.txt
index 168d410..3e559e5 100644
--- a/Documentation/daxctl-list.txt
+++ b/Documentation/daxctl-list.txt
@@ -69,10 +69,6 @@ OPTIONS
--regions::
Include region info in the listing
--R::
---regions::
- Include region info in the listing
-
-i::
--idle::
Include idle (not enabled / zero-sized) devices in the listing
--
2.9.3
3 years, 10 months
[PATCH v2 00/33] dax: introduce dax_operations
by Dan Williams
Changes since v1 [1] and the dax-fs RFC [2]:
* rename struct dax_inode to struct dax_device (Christoph)
* rewrite arch_memcpy_to_pmem() in C with inline asm
* use QUEUE_FLAG_WC to gate dax cache management (Jeff)
* add device-mapper plumbing for the ->copy_from_iter() and ->flush()
dax_operations
* kill struct blk_dax_ctl and bdev_direct_access (Christoph)
* cleanup the ->direct_access() calling convention to be page based
(Christoph)
* introduce dax_get_by_host() and don't pollute struct super_block with
dax_device details (Christoph)
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008586.html
[2]: https://lwn.net/Articles/713064/
---
A few months back, in the course of reviewing the memcpy_nocache()
proposal from Brian, Linus proposed that the pmem-specific
memcpy_to_pmem() routine be implemented at the driver level [3]:
"Quite frankly, the whole 'memcpy_nocache()' idea or (ab-)using
copy_user_nocache() just needs to die. It's idiotic.
As you point out, it's also fundamentally buggy crap.
Throw it away. There is no possible way this is ever valid or
portable. We're not going to lie and claim that it is.
If some driver ends up using 'movnt' by hand, that is up to that
*driver*. But no way in hell should we care about this one whit in
the sense of <linux/uaccess.h>."
This feedback also dovetails with another fs/dax.c design wart of being
hard coded to assume the backing device is pmem. We call the pmem
specific copy, clear, and flush routines even if the backing device
driver is one of the other 3 dax drivers (axonram, dcssblk, or brd).
There is no reason to spend cpu cycles flushing the cache after writing
to brd, for example, since it is using volatile memory for storage.
Moreover, the pmem driver might be fronting a volatile memory range
published by the ACPI NFIT, or the platform might have arranged to flush
cpu caches on power fail. This latter capability is a feature that has
appeared in embedded storage appliances (pre-ACPI-NFIT nvdimm
platforms).
So, this series:
1/ moves what was previously named "the pmem api" out of the global
namespace and into drivers that need to be concerned with
architecture specific persistent memory considerations.
2/ arranges for dax to stop abusing __copy_user_nocache() and implements
a libnvdimm-local memcpy that uses 'movnt' on x86_64. This might be
expanded in the future to use 'movntdqa' if the copy size is above
some threshold, or expanded with support for other architectures [4].
3/ makes cache maintenance optional by arranging for dax to call driver
specific copy and flush operations only if the driver publishes them.
4/ allows filesystem-dax cache management to be controlled by the block
device write-cache queue flag. The pmem driver is updated to clear
that flag by default when pmem is driving volatile memory.
[3]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[4]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009478.html
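
To make points 3/ and 4/ above concrete, here is a small userspace model of "optional" driver operations. Everything in it is hypothetical (the toy_* names, the struct layout, the flush_calls counter); it is not the struct dax_operations from this series, only a sketch of the dispatch idea: the generic layer calls a driver's copy or flush hook only if the driver publishes one, and skips flushing entirely when the write-cache flag is clear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_dax_dev;

/* Hypothetical, simplified operations table; both hooks are optional. */
struct toy_dax_ops {
	void (*copy)(struct toy_dax_dev *dev, void *dst, const void *src,
			size_t len);
	void (*flush)(struct toy_dax_dev *dev, void *addr, size_t len);
};

struct toy_dax_dev {
	const struct toy_dax_ops *ops;
	bool write_cache;		/* models the write-cache queue flag */
	unsigned int flush_calls;	/* instrumentation for the example */
};

/* Generic copy: use the driver hook if published, else plain memcpy. */
void toy_dax_copy(struct toy_dax_dev *dev, void *dst, const void *src,
		size_t len)
{
	if (dev->ops && dev->ops->copy)
		dev->ops->copy(dev, dst, src, len);
	else
		memcpy(dst, src, len);
}

/* Generic flush: a nop unless the flag is set and a hook is published. */
void toy_dax_flush(struct toy_dax_dev *dev, void *addr, size_t len)
{
	if (dev->write_cache && dev->ops && dev->ops->flush)
		dev->ops->flush(dev, addr, len);
}

/* A pmem-like driver publishes a flush hook; here it only counts calls. */
static void toy_pmem_flush(struct toy_dax_dev *dev, void *addr, size_t len)
{
	(void)addr;
	(void)len;
	dev->flush_calls++;
}

/* pmem-like driver: flush published, copy left to the generic memcpy. */
const struct toy_dax_ops toy_pmem_ops = { NULL, toy_pmem_flush };
/* brd-like driver: volatile memory, so no hooks are published at all. */
const struct toy_dax_ops toy_brd_ops = { NULL, NULL };
```

In this model a brd-like device never pays for a flush, and a pmem-like device fronting a volatile range avoids it by leaving write_cache false.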
These patches have been through a round of build regression fixes
notified by the 0day robot. All review welcome, but the patches that
need extra attention are the device-mapper and uio changes
(copy_from_iter_ops).
This series is based on a merge of char-misc-next (for cdev api reworks)
and libnvdimm-fixes (dax locking and __copy_user_nocache fixes).
---
Dan Williams (33):
device-dax: rename 'dax_dev' to 'dev_dax'
dax: refactor dax-fs into a generic provider of 'struct dax_device' instances
dax: add a facility to lookup a dax device by 'host' device name
dax: introduce dax_operations
pmem: add dax_operations support
axon_ram: add dax_operations support
brd: add dax_operations support
dcssblk: add dax_operations support
block: kill bdev_dax_capable()
dax: introduce dax_direct_access()
dm: add dax_device and dax_operations support
dm: teach dm-targets to use a dax_device + dax_operations
ext2, ext4, xfs: retrieve dax_device for iomap operations
Revert "block: use DAX for partition table reads"
filesystem-dax: convert to dax_direct_access()
block, dax: convert bdev_dax_supported() to dax_direct_access()
block: remove block_device_operations ->direct_access()
x86, dax, pmem: remove indirection around memcpy_from_pmem()
dax, pmem: introduce 'copy_from_iter' dax operation
dm: add ->copy_from_iter() dax operation support
filesystem-dax: convert to dax_copy_from_iter()
dax, pmem: introduce an optional 'flush' dax_operation
dm: add ->flush() dax operation support
filesystem-dax: convert to dax_flush()
x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush
x86, dax, libnvdimm: move wb_cache_pmem() to libnvdimm
x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm
x86, libnvdimm, dax: stop abusing __copy_user_nocache
uio, libnvdimm, pmem: implement cache bypass for all copy_from_iter() operations
libnvdimm, pmem: fix persistence warning
libnvdimm, nfit: enable support for volatile ranges
filesystem-dax: gate calls to dax_flush() on QUEUE_FLAG_WC
libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
MAINTAINERS | 2
arch/powerpc/platforms/Kconfig | 1
arch/powerpc/sysdev/axonram.c | 45 +++-
arch/x86/Kconfig | 1
arch/x86/include/asm/pmem.h | 141 ------------
arch/x86/include/asm/string_64.h | 1
block/Kconfig | 1
block/partition-generic.c | 17 -
drivers/Makefile | 2
drivers/acpi/nfit/core.c | 15 +
drivers/block/Kconfig | 1
drivers/block/brd.c | 52 +++-
drivers/dax/Kconfig | 10 +
drivers/dax/Makefile | 5
drivers/dax/dax.h | 15 -
drivers/dax/device-dax.h | 25 ++
drivers/dax/device.c | 415 +++++++++++------------------------
drivers/dax/pmem.c | 10 -
drivers/dax/super.c | 445 ++++++++++++++++++++++++++++++++++++++
drivers/md/Kconfig | 1
drivers/md/dm-core.h | 1
drivers/md/dm-linear.c | 53 ++++-
drivers/md/dm-snap.c | 6 -
drivers/md/dm-stripe.c | 65 ++++--
drivers/md/dm-target.c | 6 -
drivers/md/dm.c | 112 ++++++++--
drivers/nvdimm/Kconfig | 6 +
drivers/nvdimm/Makefile | 1
drivers/nvdimm/bus.c | 10 -
drivers/nvdimm/claim.c | 9 -
drivers/nvdimm/core.c | 2
drivers/nvdimm/dax_devs.c | 2
drivers/nvdimm/dimm_devs.c | 2
drivers/nvdimm/namespace_devs.c | 9 -
drivers/nvdimm/nd-core.h | 9 +
drivers/nvdimm/pfn_devs.c | 4
drivers/nvdimm/pmem.c | 82 +++++--
drivers/nvdimm/pmem.h | 26 ++
drivers/nvdimm/region_devs.c | 39 ++-
drivers/nvdimm/x86.c | 155 +++++++++++++
drivers/s390/block/Kconfig | 1
drivers/s390/block/dcssblk.c | 44 +++-
fs/block_dev.c | 117 +++-------
fs/dax.c | 302 ++++++++++++++------------
fs/ext2/inode.c | 9 +
fs/ext4/inode.c | 9 +
fs/iomap.c | 3
fs/xfs/xfs_iomap.c | 10 +
include/linux/blkdev.h | 19 --
include/linux/dax.h | 43 +++-
include/linux/device-mapper.h | 14 +
include/linux/iomap.h | 1
include/linux/libnvdimm.h | 10 +
include/linux/pmem.h | 165 --------------
include/linux/string.h | 8 +
include/linux/uio.h | 4
lib/Kconfig | 6 -
lib/iov_iter.c | 25 ++
tools/testing/nvdimm/Kbuild | 11 +
tools/testing/nvdimm/pmem-dax.c | 21 +-
60 files changed, 1584 insertions(+), 1042 deletions(-)
delete mode 100644 arch/x86/include/asm/pmem.h
create mode 100644 drivers/dax/device-dax.h
rename drivers/dax/{dax.c => device.c} (60%)
create mode 100644 drivers/dax/super.c
create mode 100644 drivers/nvdimm/x86.c
delete mode 100644 include/linux/pmem.h
3 years, 10 months
Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
by Dan Williams
On Sun, Apr 23, 2017 at 10:31 PM, Masayoshi Mizuma
<m.mizuma(a)jp.fujitsu.com> wrote:
> On Fri, 21 Apr 2017 16:48:57 -0700 Dan Williams wrote:
>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
>> platform WPQ (write-pending-queue) buffers when power is removed. The
>> nvdimm_flush() mechanism performs that same function on-demand.
>>
>> When a pmem namespace is associated with a block device, an
>> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
>> request. However, when a namespace is in device-dax mode, or namespaces
>> are disabled, userspace needs another path.
>>
>> The new 'flush' attribute is visible when it can be determined that the
>> interleave-set either does, or does not have DIMMs that expose WPQ-flush
>> addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
>> flushes DIMMs, or returns "0" the flush operation is a platform nop.
>>
>> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
>> ---
>> drivers/nvdimm/region_devs.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> index 8de5a04644a1..3495b4c23941 100644
>> --- a/drivers/nvdimm/region_devs.c
>> +++ b/drivers/nvdimm/region_devs.c
>> @@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
>> }
>> static DEVICE_ATTR_RO(size);
>>
>> +static ssize_t flush_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct nd_region *nd_region = to_nd_region(dev);
>> +
>> + if (nvdimm_has_flush(nd_region)) {
>
> nvdimm_has_flush() also returns as -ENXIO, so
>
> if (nvdimm_has_flush(nd_region) == 1)
If it returns -ENXIO then region_visible() will hide the attribute.
>
>> + nvdimm_flush(nd_region);
>> + return sprintf(buf, "1\n");
>> + }
>> + return sprintf(buf, "0\n");
>> +}
>> +static DEVICE_ATTR_RO(flush);
>> +
>
> I think separating show and store is better because
> users may only check whether the device has the flush capability or not.
Makes sense, I'll separate. Thanks for the review.
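
The exchange above about -ENXIO can be modeled in a few lines of userspace C. The toy_* names and the struct are hypothetical stand-ins, not kernel code; the sketch only shows why the plain truthiness test in the show routine is sufficient as long as the visibility check hides the attribute whenever the has-flush query returns a negative value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a region. flush_state models the three
 * results discussed above: -ENXIO (cannot be determined), 0, or 1. */
struct toy_region {
	int flush_state;
	unsigned int flushes;	/* counts simulated flush operations */
};

int toy_has_flush(const struct toy_region *r)
{
	return r->flush_state;
}

/* Mirrors the visibility rule: hide the attribute on a negative result. */
bool toy_flush_visible(const struct toy_region *r)
{
	return toy_has_flush(r) >= 0;
}

/* Mirrors the show routine; only meaningful when the attribute is
 * visible, so the state seen here is always 0 or 1. */
int toy_flush_show(struct toy_region *r, char *buf, size_t len)
{
	if (toy_has_flush(r)) {
		r->flushes++;
		return snprintf(buf, len, "1\n");
	}
	return snprintf(buf, len, "0\n");
}
```

A caller that checks toy_flush_visible() first never reaches toy_flush_show() with a negative state, which is the argument made in the reply.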
3 years, 10 months
qemu-kvm hangs with DAX
by Yigal Korman
Hi everyone,
I have an interesting issue with DAX and KVM - I'm trying to boot a VM
with its memory mapped to a DAX-mounted file (kernel 4.9).
The use case is a bit wacky but I'm trying to recreate something
similar to what clearlinux[1] described (although they don't use this
method anymore).
When mapping the memory to a regular ext4 file, the VM boots fine.
But when mapping to ext4+dax, the VM won't boot or perhaps boots
extremely slowly.
In both cases the FS is on a memory pmem device.
Here's a snippet of how I load things:
mkfs.ext4 /dev/pmem0
mount /dev/pmem0 /mnt
fallocate -l 512M /mnt/mem
qemu-system-x86_64 -nodefconfig -nodefaults \
-drive if=virtio,file=centos7.qcow2,index=0,media=disk \
--enable-kvm -serial telnet:localhost:4443,server,nowait \
-device sga -m 512 -smp 1,sockets=1,cores=1,threads=1 \
-object memory-backend-file,prealloc=yes,mem-path=/mnt/mem,share=on,size=512M,id=ram \
-numa node,nodeid=0,cpus=0,memdev=ram \
-net nic,model=virtio,vlan=0 \
-net user,vlan=0,hostname=vm,hostfwd=tcp:127.0.0.1:8001-:22 \
-name test -monitor telnet:localhost:4444,server,nowait
I use a headless host so I usually connect to the VM with 'telnet
localhost 4443'.
The above works and the VM boots in seconds.
When adding '-o dax' to the mount command, I can catch the grub menu
during boot but it gets stuck.
Sometimes if I wait about 20 minutes, I see some kernel boot messages
appear, but no errors.
Any thoughts?
Regards,
Yigal
[1] https://lwn.net/Articles/644675/
3 years, 10 months