[GIT PULL] libnvdimm for v5.8
by Dan Williams
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-for-5.8
...to receive a smattering of cleanups for v5.8. I was considering at
least one more late breaking topic for -rc1 (papr_scm device health
reporting), but a last minute kbuild-robot report kicked it out. I
might bring that back for -rc2 since it was nearly ready save for that
late breaking test report.
Otherwise, this has appeared in -next with no reported issues.
---
The following changes since commit ae83d0b416db002fe95601e7f97f64b59514d936:
Linux 5.7-rc2 (2020-04-19 14:35:30 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-for-5.8
for you to fetch changes up to 6ec26b8b2d70b41d7c2affd8660d94ce78b3823c:
nvdimm/pmem: stop using ->queuedata (2020-05-13 15:15:37 -0700)
----------------------------------------------------------------
libnvdimm for 5.8
- Small collection of cleanups to rework usage of ->queuedata and the
GUID api.
----------------------------------------------------------------
Andy Shevchenko (1):
libnvdimm: Replace guid_copy() with import_guid() where it makes sense
Christoph Hellwig (3):
nvdimm/blk: stop using ->queuedata
nvdimm/btt: stop using ->queuedata
nvdimm/pmem: stop using ->queuedata
drivers/acpi/nfit/core.c | 2 +-
drivers/nvdimm/blk.c | 5 ++---
drivers/nvdimm/btt.c | 3 +--
drivers/nvdimm/pmem.c | 6 +++---
4 files changed, 7 insertions(+), 9 deletions(-)
7 months, 1 week
[PATCH v11 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
Changes since v10 [1]:
* Changed the definition of 'struct nd_papr_pdsm_health' to a maximal
struct 184 bytes which can be extended in future with newly
introduced 'extension_flags'
* Fixed a suspicious conversion from u64 to u8 in papr_pdsm_health
that was preventing correct initialization of 'struct
nd_papr_pdsm_health'.
* Moved in-lines 'nd_pdsm_cmd_pkg()' and 'pdsm_cmd_to_payload()' from
'papr_pdsm.h' header to 'papr_scm.c'. The avoids a potential license
incompatibility issue with non-GPL-2.0 user-space code trying to
include the header in its code.
* Fixed the control flow in papr_scm_ndctl() to return '0' in case
nd_cmd is handled. In case of unknown nd-cmd return -EINVAL.
[1] https://lore.kernel.org/linux-nvdimm/20200604234136.253703-1-vaibhav@linu...
---
The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patch updates papr_scm_ndctl() to handle a possible error case
and also improve debug logging.
Fifth patch deals with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v11
---
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 127 ++++++
arch/powerpc/platforms/pseries/papr_scm.c | 373 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 564 insertions(+), 11 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
7 months, 1 week
[PATCH v12 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
Changes since v11 [1]:
* Minor update to 'papr_pdsm.h' fixing a misleading comment about
'possible' padding being added by GCC which doesn't apply in case
structs are marked as __packed.
* Fix the order of initialization of 'struct nd_papr_pdsm_health' in
papr_pdsm_health().
* Added acks from Ira for various patches.
[1] https://lore.kernel.org/linux-nvdimm/20200607131339.476036-1-vaibhav@linu...
---
The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patch updates papr_scm_ndctl() to handle a possible error case
and also improve debug logging.
Fifth patch deals with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v12
---
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 125 ++++++
arch/powerpc/platforms/pseries/papr_scm.c | 373 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 562 insertions(+), 11 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
7 months, 1 week
[PATCH AUTOSEL 5.6 186/606] device-dax: don't leak kernel memory to user space after unloading kmem
by Sasha Levin
From: David Hildenbrand <david(a)redhat.com>
commit 60858c00e5f018eda711a3aa84cf62214ef62d61 upstream.
Assume we have kmem configured and loaded:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory$
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM
Assume we try to unload kmem. This force-unloading will work, even if
memory cannot get removed from the system.
[root@localhost ~]# rmmod kmem
[ 86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
...
[ 86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot
Now, we can reconfigure the namespace:
[root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
[ 131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]dax
[ 131.410147] nd_pmem: probe of namespace0.0 failed with error -16namespace0.0 --mode=devdax
...
This fails as expected due to the busy memory resource, and the memory
cannot be used. However, the dax0.0 device is removed, and along its
name.
The name of the memory resource now points at freed memory (name of the
device):
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
150000000-33fffffff : System RAM
We have to make sure to duplicate the string. While at it, remove the
superfluous setting of the name and fixup a stale comment.
Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org> [5.3]
Link: http://lkml.kernel.org/r/20200508084217.9160-2-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dax/kmem.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..1e678bdf5aed 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -22,6 +22,7 @@ int dev_dax_kmem_probe(struct device *dev)
resource_size_t kmem_size;
resource_size_t kmem_end;
struct resource *new_res;
+ const char *new_res_name;
int numa_node;
int rc;
@@ -48,11 +49,16 @@ int dev_dax_kmem_probe(struct device *dev)
kmem_size &= ~(memory_block_size_bytes() - 1);
kmem_end = kmem_start + kmem_size;
- /* Region is permanently reserved. Hot-remove not yet implemented. */
- new_res = request_mem_region(kmem_start, kmem_size, dev_name(dev));
+ new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
+ if (!new_res_name)
+ return -ENOMEM;
+
+ /* Region is permanently reserved if hotremove fails. */
+ new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
if (!new_res) {
dev_warn(dev, "could not reserve region [%pa-%pa]\n",
&kmem_start, &kmem_end);
+ kfree(new_res_name);
return -EBUSY;
}
@@ -63,12 +69,12 @@ int dev_dax_kmem_probe(struct device *dev)
* unknown to us that will break add_memory() below.
*/
new_res->flags = IORESOURCE_SYSTEM_RAM;
- new_res->name = dev_name(dev);
rc = add_memory(numa_node, new_res->start, resource_size(new_res));
if (rc) {
release_resource(new_res);
kfree(new_res);
+ kfree(new_res_name);
return rc;
}
dev_dax->dax_kmem_res = new_res;
@@ -83,6 +89,7 @@ static int dev_dax_kmem_remove(struct device *dev)
struct resource *res = dev_dax->dax_kmem_res;
resource_size_t kmem_start = res->start;
resource_size_t kmem_size = resource_size(res);
+ const char *res_name = res->name;
int rc;
/*
@@ -102,6 +109,7 @@ static int dev_dax_kmem_remove(struct device *dev)
/* Release and free dax resources */
release_resource(res);
kfree(res);
+ kfree(res_name);
dev_dax->dax_kmem_res = NULL;
return 0;
--
2.25.1
7 months, 2 weeks
[RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
by Aneesh Kumar K.V
With POWER10, architecture is adding new pmem flush and sync instructions.
The kernel should prevent the usage of MAP_SYNC if applications are not using
the new instructions on newer hardware.
This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. The kernel config option is added to allow the user
to control whether MAP_SYNC should be enabled by default or not.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
---
include/linux/sched/coredump.h | 13 ++++++++++---
include/uapi/linux/prctl.h | 3 +++
kernel/fork.c | 8 +++++++-
kernel/sys.c | 18 ++++++++++++++++++
mm/Kconfig | 3 +++
mm/mmap.c | 4 ++++
6 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..9ba6b3d5f991 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */
-#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC 27 /* disable THP for all VMAs */
+#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_DISABLE_MAP_SYNC_MASK (1 << MMF_DISABLE_MAP_SYNC)
-#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
- MMF_DISABLE_THP_MASK)
+#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \
+ MMF_DISABLE_THP_MASK | MMF_DISABLE_MAP_SYNC_MASK)
+
+static inline bool map_sync_enabled(struct mm_struct *mm)
+{
+ return !(mm->flags & MMF_DISABLE_MAP_SYNC_MASK);
+}
#endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..ee4cde32d5cf 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
#define PR_SET_IO_FLUSHER 57
#define PR_GET_IO_FLUSHER 58
+#define PR_SET_MAP_SYNC_ENABLE 59
+#define PR_GET_MAP_SYNC_ENABLE 60
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d5a9a363e81e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
static int __init coredump_filter_setup(char *s)
{
default_dump_filter =
@@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->flags = current->mm->flags & MMF_INIT_MASK;
mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
} else {
- mm->flags = default_dump_filter;
+ mm->flags = default_dump_filter | default_map_sync_mask;
mm->def_flags = 0;
}
diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..f6127cf4128b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
clear_bit(MMF_DISABLE_THP, &me->mm->flags);
up_write(&me->mm->mmap_sem);
break;
+
+ case PR_GET_MAP_SYNC_ENABLE:
+ if (arg2 || arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = !test_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+ break;
+ case PR_SET_MAP_SYNC_ENABLE:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ if (down_write_killable(&me->mm->mmap_sem))
+ return -EINTR;
+ if (arg2)
+ clear_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+ else
+ set_bit(MMF_DISABLE_MAP_SYNC, &me->mm->flags);
+ up_write(&me->mm->mmap_sem);
+ break;
+
case PR_MPX_ENABLE_MANAGEMENT:
case PR_MPX_DISABLE_MANAGEMENT:
/* No longer implemented: */
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..38fd7cfbfca8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
config MAPPING_DIRTY_HELPERS
bool
+config ARCH_MAP_SYNC_DISABLE
+ bool
+
endmenu
diff --git a/mm/mmap.c b/mm/mmap.c
index f609e9ec4a25..613e5894f178 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1464,6 +1464,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
case MAP_SHARED_VALIDATE:
if (flags & ~flags_mask)
return -EOPNOTSUPP;
+
+ if ((flags & MAP_SYNC) && !map_sync_enabled(mm))
+ return -EOPNOTSUPP;
+
if (prot & PROT_WRITE) {
if (!(file->f_mode & FMODE_WRITE))
return -EACCES;
--
2.26.2
7 months, 2 weeks
[PATCH v10 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
Changes since v9 [1]:
* Addressed review comments from Ira and Dan Williams.
* Removed the contentious 'payload_version' field from struct
nd_pdsm_cmd_pkg.
* Also removed code/defines related to handling of different version
of a pdsm payload struct.
* Consolidated validation checks for nd_pdsm_cmd_pkg in
is_cmd_valid().
* Added a check in is_cmd_valid() to ensure reserved fields in struct
nd_pdsm_cmd_pkg are set to '0'.
* Reworked papr_pdsm_health() to avoid removing code that was added in
initial part of this patch-series.
* Added a new patch to the series to move out some proposed changes to
papr_scm_ndctl() in an independent patch.
* Reworked papr_pdsm_health() to ensure correct payload_size in the
pdsm command package.
[1] https://lore.kernel.org/linux-nvdimm/20200602101438.73929-1-vaibhav@linux...
---
The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patch updates papr_scm_ndctl() to handle a possible error case
and also improve debug logging.
Fifth patch deals with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v10
---
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 131 +++++++
arch/powerpc/platforms/pseries/papr_scm.c | 361 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 554 insertions(+), 13 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
7 months, 2 weeks
[PATCH v5 0/2] Renovate memcpy_mcsafe with copy_mc_to_{user, kernel}
by Dan Williams
Changes since v4 [1]:
- Fix up .gitignore for PowerPC test artifacts (Michael)
- Collect Michael's Ack.
[1]: http://lore.kernel.org/r/159010126119.975921.6614194205409771984.stgit@dw...
---
The primary motivation to go touch memcpy_mcsafe() is that the existing
benefit of doing slow "handle with care" copies is obviated on newer
CPUs. With that concern lifted it also obviates the need to continue to
update the MCA-recovery capability detection code currently gated by
"mcsafe_key". Now the old "mcsafe_key" opt-in to perform the copy with
concerns for recovery fragility can instead be made an opt-out from the
default fast copy implementation (enable_copy_mc_fragile()).
The discussion with Linus on the first iteration of this patch
identified that memcpy_mcsafe() was misnamed relative to its usage. The
new names copy_mc_to_user() and copy_mc_to_kernel() clearly indicate the
intended use case and lets the architecture organize the implementation
accordingly.
For both powerpc and x86 a copy_mc_generic() implementation is added as
the backend for these interfaces.
Patches are relative to tip/master.
---
Dan Williams (2):
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user,kernel}()
x86/copy_mc: Introduce copy_mc_generic()
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/string.h | 2
arch/powerpc/include/asm/uaccess.h | 40 +++--
arch/powerpc/lib/Makefile | 2
arch/powerpc/lib/copy_mc_64.S | 4
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 2
arch/x86/include/asm/copy_mc_test.h | 75 +++++++++
arch/x86/include/asm/mcsafe_test.h | 75 ---------
arch/x86/include/asm/string_64.h | 32 ----
arch/x86/include/asm/uaccess.h | 21 +++
arch/x86/include/asm/uaccess_64.h | 20 --
arch/x86/kernel/cpu/mce/core.c | 8 -
arch/x86/kernel/quirks.c | 9 -
arch/x86/lib/Makefile | 1
arch/x86/lib/copy_mc.c | 64 ++++++++
arch/x86/lib/copy_mc_64.S | 165 ++++++++++++++++++++
arch/x86/lib/memcpy_64.S | 115 --------------
arch/x86/lib/usercopy_64.c | 21 ---
drivers/md/dm-writecache.c | 15 +-
drivers/nvdimm/claim.c | 2
drivers/nvdimm/pmem.c | 6 -
include/linux/string.h | 9 -
include/linux/uaccess.h | 9 +
include/linux/uio.h | 10 +
lib/Kconfig | 7 +
lib/iov_iter.c | 43 +++--
tools/arch/x86/include/asm/mcsafe_test.h | 13 --
tools/arch/x86/lib/memcpy_64.S | 115 --------------
tools/objtool/check.c | 5 -
tools/perf/bench/Build | 1
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 ---
tools/testing/nvdimm/test/nfit.c | 48 +++---
.../testing/selftests/powerpc/copyloops/.gitignore | 2
tools/testing/selftests/powerpc/copyloops/Makefile | 6 -
.../selftests/powerpc/copyloops/copy_mc_64.S | 1
.../selftests/powerpc/copyloops/memcpy_mcsafe_64.S | 1
37 files changed, 451 insertions(+), 526 deletions(-)
rename arch/powerpc/lib/{memcpy_mcsafe_64.S => copy_mc_64.S} (98%)
create mode 100644 arch/x86/include/asm/copy_mc_test.h
delete mode 100644 arch/x86/include/asm/mcsafe_test.h
create mode 100644 arch/x86/lib/copy_mc.c
create mode 100644 arch/x86/lib/copy_mc_64.S
delete mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
delete mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c
create mode 120000 tools/testing/selftests/powerpc/copyloops/copy_mc_64.S
delete mode 120000 tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S
base-commit: 229aaa8c059f2c908e0561453509f996f2b2d5c4
7 months, 2 weeks
[RESEND PATCH v9 0/5] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
Changes since v9 [1]:
* Added acks from Aneesh and Steven Steven Rostedt.
Changes since v8 [2]:
* Updated proposed changes to remove usage of term 'SCM' due to
ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
[ Aneesh ]
* Renamed NVDIMM health defines from PAPR_SCM_DIMM_* to PAPR_PMEM_* .
* Updates to various newly introduced identifiers in 'papr_scm.c'
removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH PAPR_PDSM_HEALTH
* Renamed uapi header 'papr_scm_pdsm.h' to 'papr_pdsm.h'.
* Renamed sysfs ABI document from sysfs-bus-papr-scm to
sysfs-bus-papr-pmem.
* No behavioural changes from v8 [1].
[1] https://lore.kernel.org/linux-nvdimm/20200529214719.223344-1-vaibhav@linu...
[2] https://lore.kernel.org/linux-nvdimm/20200527041244.37821-1-vaibhav@linux...
---
The PAPR standard[3][5] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[4]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[4]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[6]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patches deal with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[3] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[4] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[5] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[6] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v9
---
Vaibhav Jain (5):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 175 +++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 363 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 600 insertions(+), 13 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
7 months, 2 weeks
[RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink
by Shiyang Ruan
This patchset is a try to resolve the shared 'page cache' problem for
fsdax.
In order to track multiple mappings and indexes on one page, I
introduced a dax-rmap rb-tree to manage the relationship. A dax entry
will be associated more than once if is shared. At the second time we
associate this entry, we create this rb-tree and store its root in
page->private(not used in fsdax). Insert (->mapping, ->index) when
dax_associate_entry() and delete it when dax_disassociate_entry().
We can iterate the dax-rmap rb-tree before any other operations on
mappings of files. Such as memory-failure and rmap.
Same as before, I borrowed and made some changes on Goldwyn's patchsets.
These patches makes up for the lack of CoW mechanism in fsdax.
The rests are dax & reflink support for xfs.
(Rebased to 5.7-rc2)
Shiyang Ruan (8):
fs/dax: Introduce dax-rmap btree for reflink
mm: add dax-rmap for memory-failure and rmap
fs/dax: Introduce dax_copy_edges() for COW
fs/dax: copy data before write
fs/dax: replace mmap entry in case of CoW
fs/dax: dedup file range to use a compare function
fs/xfs: handle CoW for fsdax write() path
fs/xfs: support dedupe for fsdax
fs/dax.c | 343 +++++++++++++++++++++++++++++++++++++----
fs/ocfs2/file.c | 2 +-
fs/read_write.c | 11 +-
fs/xfs/xfs_bmap_util.c | 6 +-
fs/xfs/xfs_file.c | 10 +-
fs/xfs/xfs_iomap.c | 3 +-
fs/xfs/xfs_iops.c | 11 +-
fs/xfs/xfs_reflink.c | 79 ++++++----
include/linux/dax.h | 11 ++
include/linux/fs.h | 9 +-
mm/memory-failure.c | 63 ++++++--
mm/rmap.c | 54 +++++--
12 files changed, 498 insertions(+), 104 deletions(-)
--
2.26.2
7 months, 2 weeks
[ndctl PATCH v5 0/6] Add support for reporting papr nvdimm health
by Vaibhav Jain
Changes since v4 [1]:
* Updated proposed changes to remove usage of term 'SCM' due to
ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
[ Aneesh ]
* Updates to various newly introduced identifiers in 'papr.c'
removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH PAPR_PDSM_HEALTH
* Renamed 'papr_scm.c' to 'papr.c'
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
[1] https://lore.kernel.org/linux-nvdimm/20200527044737.40615-1-vaibhav@linux...
---
This patch-set proposes changes to libndctl to add support for reporting
health for nvdimms that support the PAPR standard[2]. The standard defines
machenism (HCALL) through which a guest kernel can query and fetch health
and performance stats of an nvdimm attached to the hypervisor[3]. Until
now 'ndctl' was unable to report these stats for papr_scm dimms on PPC64
guests due to absence of ACPI/NFIT, a limitation which this patch-set tries
to address.
The patch-set introduces support for the new PAPR PDSM family
defined at [4] & [5] via a new dimm-ops named
'papr_dimm_ops'. Infrastructure to probe and distinguish papr-scm
dimms from other dimm families that may support ACPI/NFIT is
implemented by updating the 'struct ndctl_dimm' initialization
routines to bifurcate based on the nvdimm type. We also introduce two
new dimm-ops member for handling initialization of dimm specific data
for specific DSM families.
These changes coupled with proposed kernel changes located at Ref[1] should
provide a way for the user to retrieve NVDIMM health status using ndtcl for
pseries guests. Below is a sample output using proposed kernel + ndctl
changes:
# ndctl list -DH
[
{
"dev":"nmem0",
"flag_smart_event":true,
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Structure of the patchset
=========================
We start with a re-factoring patch that splits the 'add_dimm()' function
into two functions one that take care of allocating and initializing
'struct ndctl_dimm' and another that takes care of initializing nfit
specific dimm attributes.
Patch-2 introduces probe function of papr nvdimms and assigning
'papr_dimm_ops' defined in 'papr.c' to 'dimm->ops' if
needed. The patch also code to parse the dimm flags specific to
papr nvdimms
Patch-3 introduces new dimm ops 'dimm_init()' & 'dimm_uninit()' to handle
DSM family specific initialization of 'struct ndctl_dimm'.
Patches-4,5 implements scaffolding to add support for PAPR PDSM
requests and pull in their definitions from the kernel.
Finally Patch-6 add support for issuing and handling the result of
'struct ndctl_cmd' to request dimm health stats from papr_scm kernel module
and returning appropriate health status to libndctl for reporting.
References
==========
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] "Hypercall Op-codes (hcalls)"
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_...
[4] "powerpc/papr_scm: Add support for reporting nvdimm health"
https://lore.kernel.org/linux-nvdimm/20200529214719.223344-1-vaibhav@linu...
[5] "ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods"
https://lore.kernel.org/linux-nvdimm/20200529214719.223344-5-vaibhav@linu...
Vaibhav Jain (6):
libndctl: Refactor out add_dimm() to handle NFIT specific init
libncdtl: Add initial support for NVDIMM_FAMILY_PAPR nvdimm family
libndctl: Introduce new dimm-ops dimm_init() & dimm_uninit()
libndctl,papr_scm: Add definitions for PAPR nvdimm specific methods
papr: Add scaffolding to issue and handle PDSM requests
libndctl,papr_scm: Implement support for PAPR_PDSM_HEALTH
ndctl/lib/Makefile.am | 1 +
ndctl/lib/libndctl.c | 260 ++++++++++++++++++++++++++----------
ndctl/lib/papr.c | 301 ++++++++++++++++++++++++++++++++++++++++++
ndctl/lib/papr_pdsm.h | 175 ++++++++++++++++++++++++
ndctl/lib/private.h | 9 ++
ndctl/libndctl.h | 1 +
ndctl/ndctl.h | 1 +
7 files changed, 678 insertions(+), 70 deletions(-)
create mode 100644 ndctl/lib/papr.c
create mode 100644 ndctl/lib/papr_pdsm.h
--
2.26.2
7 months, 2 weeks