[PATCH] tools/testing/nvdimm: Replace zero-length array with
flexible-array
by Gustavo A. R. Silva
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
---
tools/testing/nvdimm/test/nfit_test.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/nvdimm/test/nfit_test.h b/tools/testing/nvdimm/test/nfit_test.h
index db3c07beb9d1..b5f7a996c4d0 100644
--- a/tools/testing/nvdimm/test/nfit_test.h
+++ b/tools/testing/nvdimm/test/nfit_test.h
@@ -51,7 +51,7 @@ struct nd_cmd_translate_spa {
__u32 nfit_device_handle;
__u32 _reserved;
__u64 dpa;
- } __packed devices[0];
+ } __packed devices[];
} __packed;
@@ -74,7 +74,7 @@ struct nd_cmd_ars_err_inj_stat {
struct nd_error_stat_query_record {
__u64 err_inj_stat_spa_range_base;
__u64 err_inj_stat_spa_range_length;
- } __packed record[0];
+ } __packed record[];
} __packed;
#define ND_INTEL_SMART 1
@@ -180,7 +180,7 @@ struct nd_intel_fw_send_data {
__u32 context;
__u32 offset;
__u32 length;
- __u8 data[0];
+ __u8 data[];
/* this field is not declared due ot variable data from input */
/* __u32 status; */
} __packed;
2 years
[PATCH v3 0/3] mm/memory_hotplug: Interface to add driver-managed system ram
by David Hildenbrand
Third time is the charm? Let's see ... :)
This is the follow up of [1]:
[PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with
kexec-tools
and [2]:
[PATCH v2 0/3] mm/memory_hotplug: Allow to not create firmware memmap
entries
kexec (via kexec_load()) can currently not properly handle memory added via
dax/kmem, and will have similar issues with virtio-mem. kexec-tools will
currently add all memory to the fixed-up initial firmware memmap. In case
of dax/kmem, this means that - in contrast to a proper reboot - how that
persistent memory will be used can no longer be configured by the kexec'd
kernel. In case of virtio-mem it will be harmful, because that memory
might contain inaccessible pieces that require coordination with hypervisor
first.
In both cases, we want to let the driver in the kexec'd kernel handle
detecting and adding the memory, like during an ordinary reboot.
Introduce add_memory_driver_managed(). More on the samentics are in patch
#1.
In the future, we might want to make this behavior configurable for
dax/kmem- either by configuring it in the kernel (which would then also
allow to configure kexec_file_load()) or in kexec-tools by also adding
"System RAM (kmem)" memory from /proc/iomem to the fixed-up initial
firmware memmap.
More on the motivation can be found in [1] and [2].
v2 -> v3:
- Don't use flags for add_memory() and friends, provide
add_memory_driver_managed() instead.
- Flag memory resources via IORESOURCE_MEM_DRIVER_MANAGED and handle them
in kexec.
- Name memory resources "System RAM ($DRIVER)", visible via /proc/iomem
- Added more details to the patch descriptions, especially regarding the
history of /sys/firmware/memmap
- Add a comment to the device-dax change. Dropped Dave's Ack as the
v1 -> v2:
- Don't change the resource name
- Rename the flag to MHP_NO_FIRMWARE_MEMMAP to reflect what it is doing
- Rephrase subjects/descriptions
- Use the flag for dax/kmem
[1] https://lkml.kernel.org/r/20200429160803.109056-1-david@redhat.com
[2] https://lkml.kernel.org/r/20200430102908.10107-1-david@redhat.com
David Hildenbrand (3):
mm/memory_hotplug: Introduce add_memory_device_managed()
kexec_file: Don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGED
device-dax: Add memory via add_memory_driver_managed()
drivers/dax/kmem.c | 8 ++++-
include/linux/ioport.h | 1 +
include/linux/memory_hotplug.h | 2 ++
kernel/kexec_file.c | 5 +++
mm/memory_hotplug.c | 62 +++++++++++++++++++++++++++++++---
5 files changed, 73 insertions(+), 5 deletions(-)
--
2.25.3
2 years
[PATCH] nvdimm: Avoid race between probe and reading device attributes
by Richard Palethorpe
It is possible to cause a division error and use-after-free by querying the
nmem device before the driver data is fully initialised in nvdimm_probe. E.g
by doing
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
On 5.7-rc3 this causes:
[ 12.711578] divide error: 0000 [#1] SMP KASAN PTI
[ 12.712321] CPU: 0 PID: 231 Comm: cat Not tainted 5.7.0-rc3 #48
[ 12.713188] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 12.714857] RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[ 12.715772] Code: ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 05 80 fa 03 7e 52 8b 73 08 31 d2 89 c1 48 83 c4 08 5b 5d <f7> f6 31 d2 41 5c 83 c0 07 c1 e8 03 48 8d 84 00 8e 02 00 00 25 00
[ 12.718311] RSP: 0018:ffffc9000046fd08 EFLAGS: 00010282
[ 12.719030] RAX: 0000000000000000 RBX: ffffffffc0073aa0 RCX: 0000000000000000
[ 12.720005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888060931808
[ 12.720970] RBP: ffff88806609d018 R08: 0000000000000001 R09: ffffed100cc0a2b1
[ 12.721889] R10: ffff888066051587 R11: ffffed100cc0a2b0 R12: ffff888060931800
[ 12.722744] R13: ffff888064362000 R14: ffff88806609d018 R15: ffffffff8b1a2520
[ 12.723602] FS: 00007fd16f3d5580(0000) GS:ffff88806b400000(0000) knlGS:0000000000000000
[ 12.724600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.725308] CR2: 00007fd16f1ec000 CR3: 0000000064322006 CR4: 0000000000160ef0
[ 12.726268] Call Trace:
[ 12.726633] available_slots_show+0x4e/0x120 [libnvdimm]
[ 12.727380] dev_attr_show+0x42/0x80
[ 12.727891] ? memset+0x20/0x40
[ 12.728341] sysfs_kf_seq_show+0x218/0x410
[ 12.728923] seq_read+0x389/0xe10
[ 12.729415] vfs_read+0x101/0x2d0
[ 12.729891] ksys_read+0xf9/0x1d0
[ 12.730361] ? kernel_write+0x120/0x120
[ 12.730915] do_syscall_64+0x95/0x4a0
[ 12.731435] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 12.732163] RIP: 0033:0x7fd16f2fe4be
[ 12.732685] Code: c0 e9 c6 fe ff ff 50 48 8d 3d 2e 12 0a 00 e8 69 e9 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[ 12.735207] RSP: 002b:00007ffd3177b838 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 12.736261] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fd16f2fe4be
[ 12.737233] RDX: 0000000000020000 RSI: 00007fd16f1ed000 RDI: 0000000000000003
[ 12.738203] RBP: 00007fd16f1ed000 R08: 00007fd16f1ec010 R09: 0000000000000000
[ 12.739172] R10: 00007fd16f3f4f70 R11: 0000000000000246 R12: 00007ffd3177ce23
[ 12.740144] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[ 12.741139] Modules linked in: nfit libnvdimm
[ 12.741783] ---[ end trace 99532e4b82410044 ]---
[ 12.742452] RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[ 12.743167] Code: ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 05 80 fa 03 7e 52 8b 73 08 31 d2 89 c1 48 83 c4 08 5b 5d <f7> f6 31 d2 41 5c 83 c0 07 c1 e8 03 48 8d 84 00 8e 02 00 00 25 00
[ 12.745709] RSP: 0018:ffffc9000046fd08 EFLAGS: 00010282
[ 12.746340] RAX: 0000000000000000 RBX: ffffffffc0073aa0 RCX: 0000000000000000
[ 12.747209] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888060931808
[ 12.748081] RBP: ffff88806609d018 R08: 0000000000000001 R09: ffffed100cc0a2b1
[ 12.748977] R10: ffff888066051587 R11: ffffed100cc0a2b0 R12: ffff888060931800
[ 12.749849] R13: ffff888064362000 R14: ffff88806609d018 R15: ffffffff8b1a2520
[ 12.750729] FS: 00007fd16f3d5580(0000) GS:ffff88806b400000(0000) knlGS:0000000000000000
[ 12.751708] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.752441] CR2: 00007fd16f1ec000 CR3: 0000000064322006 CR4: 0000000000160ef0
[ 12.821357] ==================================================================
[ 12.822284] BUG: KASAN: use-after-free in __mutex_lock+0x111c/0x11a0
[ 12.823084] Read of size 4 at addr ffff888065c26238 by task reproducer/218
[ 12.823968]
[ 12.824183] CPU: 2 PID: 218 Comm: reproducer Tainted: G D 5.7.0-rc3 #48
[ 12.825167] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 12.826595] Call Trace:
[ 12.826926] dump_stack+0x97/0xe0
[ 12.827362] print_address_description.constprop.0+0x1b/0x210
[ 12.828111] ? __mutex_lock+0x111c/0x11a0
[ 12.828645] __kasan_report.cold+0x37/0x92
[ 12.829179] ? __mutex_lock+0x111c/0x11a0
[ 12.829706] kasan_report+0x38/0x50
[ 12.830158] __mutex_lock+0x111c/0x11a0
[ 12.830666] ? ftrace_graph_stop+0x10/0x10
[ 12.831193] ? is_nvdimm_bus+0x40/0x40 [libnvdimm]
[ 12.831820] ? mutex_trylock+0x2b0/0x2b0
[ 12.832333] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.832975] ? mutex_trylock+0x2b0/0x2b0
[ 12.833500] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.834122] ? prepare_ftrace_return+0xa1/0xf0
[ 12.834724] ? ftrace_graph_caller+0x6b/0xa0
[ 12.835269] ? acpi_label_write+0x390/0x390 [nfit]
[ 12.835909] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.836558] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.837179] nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.837802] nvdimm_bus_probe+0x110/0x6b0 [libnvdimm]
[ 12.838470] really_probe+0x212/0x9a0
[ 12.838954] driver_probe_device+0x1cd/0x300
[ 12.839511] ? driver_probe_device+0x5/0x300
[ 12.840063] device_driver_attach+0xe7/0x120
[ 12.840623] bind_store+0x18d/0x230
[ 12.841075] kernfs_fop_write+0x200/0x420
[ 12.841606] vfs_write+0x154/0x450
[ 12.842047] ksys_write+0xf9/0x1d0
[ 12.842497] ? __ia32_sys_read+0xb0/0xb0
[ 12.843010] do_syscall_64+0x95/0x4a0
[ 12.843495] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 12.844140] RIP: 0033:0x7f5b235d3563
[ 12.844607] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[ 12.846877] RSP: 002b:00007fff1c3bc578 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 12.847822] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f5b235d3563
[ 12.848717] RDX: 0000000000000006 RSI: 000055f9576710d0 RDI: 0000000000000001
[ 12.849594] RBP: 000055f9576710d0 R08: 000000000000000a R09: 0000000000000000
[ 12.850470] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
[ 12.851333] R13: 00007f5b236a3500 R14: 0000000000000006 R15: 00007f5b236a3700
[ 12.852247]
[ 12.852466] Allocated by task 225:
[ 12.852893] save_stack+0x1b/0x40
[ 12.853310] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 12.853918] kmem_cache_alloc_node+0xef/0x270
[ 12.854475] copy_process+0x485/0x6130
[ 12.854945] _do_fork+0xf1/0xb40
[ 12.855353] __do_sys_clone+0xc3/0x100
[ 12.855843] do_syscall_64+0x95/0x4a0
[ 12.856302] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 12.856939]
[ 12.857140] Freed by task 0:
[ 12.857522] save_stack+0x1b/0x40
[ 12.857940] __kasan_slab_free+0x12c/0x170
[ 12.858464] kmem_cache_free+0xb0/0x330
[ 12.858945] rcu_core+0x55f/0x19f0
[ 12.859385] __do_softirq+0x228/0x944
[ 12.859869]
[ 12.860075] The buggy address belongs to the object at ffff888065c26200
[ 12.860075] which belongs to the cache task_struct of size 6016
[ 12.861638] The buggy address is located 56 bytes inside of
[ 12.861638] 6016-byte region [ffff888065c26200, ffff888065c27980)
[ 12.863084] The buggy address belongs to the page:
[ 12.863702] page:ffffea0001970800 refcount:1 mapcount:0 mapping:0000000021ee3712 index:0x0 head:ffffea0001970800 order:3 compound_mapcount:0 compound_pincount:0
[ 12.865478] flags: 0x80000000010200(slab|head)
[ 12.866039] raw: 0080000000010200 0000000000000000 0000000100000001 ffff888066c0f980
[ 12.867010] raw: 0000000000000000 0000000080050005 00000001ffffffff 0000000000000000
[ 12.867986] page dumped because: kasan: bad access detected
[ 12.868696]
[ 12.868900] Memory state around the buggy address:
[ 12.869514] ffff888065c26100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.870414] ffff888065c26180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.871318] >ffff888065c26200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 12.872238] ^
[ 12.872870] ffff888065c26280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 12.873754] ffff888065c26300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 12.874640]
==================================================================
This can be prevented by setting the driver data after initialisation is
complete. As a precaution the bus lock is also taken when setting the driver
data.
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: linux-nvdimm(a)lists.01.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Richard Palethorpe <rpalethorpe(a)suse.com>
---
drivers/nvdimm/dimm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 7d4ddc4d9322..c757e8a5094d 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -43,7 +43,6 @@ static int nvdimm_probe(struct device *dev)
if (!ndd)
return -ENOMEM;
- dev_set_drvdata(dev, ndd);
ndd->dpa.name = dev_name(dev);
ndd->ns_current = -1;
ndd->ns_next = -1;
@@ -106,6 +105,10 @@ static int nvdimm_probe(struct device *dev)
if (rc)
goto err;
+ nvdimm_bus_lock(dev);
+ dev_set_drvdata(dev, ndd);
+ nvdimm_bus_unlock(dev);
+
return 0;
err:
--
2.26.1
2 years
[PATCH v6 0/4] powerpc/papr_scm: Add support for reporting nvdimm health
by Vaibhav Jain
The PAPR standard[1][3] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[2]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch nvdimm health stats using hcalls described
in Ref[2]. This health and performance stats are then exposed to
userspace via syfs and PAPR-nvDimm-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[4]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based nvdimms that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR nvDimm-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_scm_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patchset starts with implementing support for fetching nvdimm health
information from PHYP and partially exposing it to user-space via a nvdimm
sysfs flag.
Second & Third patches deal with implementing support for servicing PDSM
commands in papr_scm module.
Finally the Fourth patch implements support for servicing PDSM
'PAPR_SCM_PDSM_HEALTH' that returns the nvdimm health information to
libndctl.
Changelog:
==========
v5..v6:
* Incorporate review comments from Mpe and Dan Williams.
* Changed the usage of term DSM to PDSM as former conflicted with
usage in ACPI context.
* UAPI updates to remove usage of bool and marking the structs
defined as 'packed'.
* Simplified the health-bitmap handling in papr_scm to use u64
instead of __be64 integers.
* Caching of the health information so reading the dimm-flag file
doesn't result in costly hcalls everytime.
* Changed dimm-flag 'save_fail' to 'flush_fail'
* Moved the dimm flag file from 'papr_flags' to 'papr/flags'.
* Added a patch to document H_SCM_HEALTH hcall return values.
* Added sysfs ABI documentation for newly introduce dimm-flag
sysfs file 'papr/flags'
v4..v5:
* Fixed a bug in new implementation of papr_scm_ndctl() that was triggering
a false error condition.
v3..v4:
* Restructured papr_scm_ndctl() to dispatch ND_CMD_CALL commands to a new
function named papr_scm_service_dsm() to serivice DSM requests. [Aneesh]
v2..v3:
* Updated the papr_scm_dsm.h header to be more confimant general kernel
guidelines for UAPI headers. [Aneesh]
* Changed the definition of macro PAPR_SCM_DIMM_UNARMED_MASK to not
include case when the nvdimm is unarmed because its a vPMEM
nvdimm. [Aneesh]
v1..v2:
* Restructured the patch-set based on review comments on V1 patch-set to
simplify the patch review. Multiple small patches have been combined into
single patches to reduce cross referencing that was needed in earlier
patch-set. Hence most of the patches in this patch-set as now new. [Aneesh]
* Removed the initial work done for fetch nvdimm performance statistics.
These changes will be re-proposed in a separate patch-set. [Aneesh]
* Simplified handling of versioning of 'struct
nd_papr_scm_dimm_health_stat_v1' as only one version of the structure is
currently in existence.
References:
[1] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[3] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[4] https://patchwork.kernel.org/project/linux-nvdimm/list/?series=244625
Vaibhav Jain (4):
powerpc: Document details on H_SCM_HEALTH hcall
powerpc/papr_scm: Fetch nvdimm health information from PHYP
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_SCM_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-scm | 27 ++
Documentation/powerpc/papr_hcalls.rst | 43 ++-
arch/powerpc/include/asm/papr_scm.h | 49 +++
arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 192 +++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 317 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
6 files changed, 617 insertions(+), 12 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
create mode 100644 arch/powerpc/include/asm/papr_scm.h
create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
--
2.25.3
2 years
[ndctl PATCH v2 0/6] Add support for reporting papr-scm nvdimm health
by Vaibhav Jain
This patch-set proposes changes to libndctl to add support for reporting
health for nvdimms that support the PAPR standard[1]. The standard defines
machenism (HCALL) through which a guest kernel can query and fetch health
and performance stats of an nvdimm attached to the hypervisor[2]. Until
now 'ndctl' was unable to report these stats for papr_scm dimms on PPC64
guests due to absence of ACPI/NFIT, a limitation which this patch-set tries
to address.
The patch-set introduces support for the new PAPR-SCM PDSM family
defined at [3] via a new dimm-ops named
'papr_scm_dimm_ops'. Infrastructure to probe and distinguish papr-scm
dimms from other dimm families that may support ACPI/NFIT is
implemented by updating the 'struct ndctl_dimm' initialization
routines to bifurcate based on the nvdimm type. We also introduce two
new dimm-ops member for handling initialization of dimm specific data
for specific DSM families.
These changes coupled with proposed kernel changes located at Ref[4] should
provide a way for the user to retrieve NVDIMM health status using ndtcl for
papr_scm guests. Below is a sample output using proposed kernel + ndctl
changes:
# ndctl list -DH
[
{
"dev":"nmem0",
"flag_smart_event":true,
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Structure of the patchset
=========================
We start with a re-factoring patch that splits the 'add_dimm()' function
into two functions one that take care of allocating and initializing
'struct ndctl_dimm' and another that takes care of initializing nfit
specific dimm attributes.
Patch-2 introduces probe function of papr_scm nvdimms and assigning
'papr_scm_dimm_ops' defined in 'papr_scm.c' to 'dimm->ops' if
needed. The patch also code to parse the dimm flags specific to
papr-scm nvdimms
Patch-3 introduces new dimm ops 'dimm_init()' & 'dimm_uninit()' to handle
DSM family specific initialization of 'struct ndctl_dimm'.
Patches-4,5 implements scaffolding to add support for PAPR_SCM PDSM
requests and pull in their definitions from the kernel.
Finally Patch-6 add support for issuing and handling the result of
'struct ndctl_cmd' to request dimm health stats from papr_scm kernel module
and returning appropriate health status to libndctl for reporting.
Changelog
=========
v1..v2:
* Reordered and squashed couple of patches to reduce the patchset size.
* Addressed few offline review comments from Santosh Sivaraj.
* Updated the uapi header file for papr_scm_pdsm.h.
* Updated the struct names based on the new uapi header.
References
==========
[1] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2] "Hypercall Op-codes (hcalls)"
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_...
[3] "[PATCH v6 3/4] ndctl/papr_scm,uapi: Add support for PAPR nvdimm
specific methods"
https://lore.kernel.org/linux-nvdimm/20200420070711.223545-4-vaibhav@linu...
[4] "powerpc/papr_scm: Add support for reporting nvdimm health"
https://lore.kernel.org/linux-nvdimm/20200420070711.223545-1-vaibhav@linu...
Vaibhav Jain (6):
libndctl: Refactor out add_dimm() to handle NFIT specific init
libncdtl: Add initial support for NVDIMM_FAMILY_PAPR_SCM dimm family
libndctl: Introduce new dimm-ops dimm_init() & dimm_uninit()
libndctl,papr_scm: Add definitions for PAPR nvdimm specific methods
libndctl,papr_scm: Add scaffolding to issue and handle PDSM requests
libndctl,papr_scm: Implement support for PAPR_SCM_PDSM_HEALTH
ndctl/lib/Makefile.am | 1 +
ndctl/lib/libndctl.c | 260 ++++++++++++++++++++++++---------
ndctl/lib/papr_scm.c | 293 ++++++++++++++++++++++++++++++++++++++
ndctl/lib/papr_scm_pdsm.h | 192 +++++++++++++++++++++++++
ndctl/lib/private.h | 7 +
ndctl/libndctl.h | 1 +
ndctl/ndctl.h | 1 +
7 files changed, 685 insertions(+), 70 deletions(-)
create mode 100644 ndctl/lib/papr_scm.c
create mode 100644 ndctl/lib/papr_scm_pdsm.h
--
2.25.3
2 years
[PATCH v2 0/3] mm/memory_hotplug: Allow to not create firmware memmap entries
by David Hildenbrand
This is the follow up of [1]:
[PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with
kexec-tools
I realized that this is not only helpful for virtio-mem, but also for
dax/kmem - it's a fix for that use case (see patch #3) of persistent
memory.
Also, while testing, I discovered that kexec-tools will *not* add dax/kmem
memory (anything not directly under the root when parsing /proc/iomem) to
the elfcorehdr, so this memory will never get included in a dump. This
probably has to be fixed in kexec-tools - virtio-mem will require this as
well.
v1 -> v2:
- Don't change the resource name
- Rename the flag to MHP_NO_FIRMWARE_MEMMAP to reflect what it is doing
- Rephrase subjects/descriptions
- Use the flag for dax/kmem
I'll have to rebase virtio-mem on these changes, there will be a resend.
[1] https://lkml.kernel.org/r/20200429160803.109056-1-david@redhat.com
David Hildenbrand (3):
mm/memory_hotplug: Prepare passing flags to add_memory() and friends
mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
device-dax: Add system ram (add_memory()) with MHP_NO_FIRMWARE_MEMMAP
arch/powerpc/platforms/powernv/memtrace.c | 2 +-
arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +-
drivers/acpi/acpi_memhotplug.c | 2 +-
drivers/base/memory.c | 2 +-
drivers/dax/kmem.c | 3 ++-
drivers/hv/hv_balloon.c | 2 +-
drivers/s390/char/sclp_cmd.c | 2 +-
drivers/xen/balloon.c | 2 +-
include/linux/memory_hotplug.h | 15 ++++++++++++---
mm/memory_hotplug.c | 14 ++++++++------
10 files changed, 29 insertions(+), 17 deletions(-)
--
2.25.3
2 years