[ndctl PATCH 0/2] misc fixes and cleanup
by Vishal Verma
Patch 1 is a README update in the Unit tests section to include
CONFIG_ENCRYPTED_KEYS in the list of config items required for unit tests.
Patch 2 fixes a potential resource leak found during static analysis.
Vishal Verma (2):
ndctl/README: Add CONFIG_ENCRYPTED_KEYS to the config items list
ndctl/namespace: fix a resource leak in file_write_infoblock()
README.md | 1 +
ndctl/namespace.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
--
2.26.2
[ndctl PATCH] papr: Check for command type in papr_xlat_firmware_status()
by Vaibhav Jain
We recently discovered intermittent failures while reading the label-area
of PAPR NVDIMMs; in such a case, the 'read-labels' command would
generate empty output like the following:
$ sudo ndctl read-labels -j nmem0
[
]
read 0 nmem
Upon investigation we found that this was caused by a spurious error
code returned from ndctl_cmd_submit_xlat() when it's called from
ndctl_dimm_read_label_extent() while trying to read the label-area
contents of the NVDIMM.
Digging further, it was revealed that ndctl_cmd_submit_xlat() would
always call papr_xlat_firmware_status() via the pointer
'papr_dimm_ops->xlat_firmware_status' to retrieve the translated firmware
status for all ndctl_cmds, even those that aren't really PAPR PDSM
commands.
In this case, ndctl_cmd->type == ND_CMD_GET_CONFIG_DATA and the command
payload was of type 'struct nd_cmd_get_config_data_hdr', but
papr_xlat_firmware_status() incorrectly assumed it to be of type
'struct nd_pkg_pdsm' and wrongly dereferenced it, returning an invalid
value.
A proper fix for this would probably need a new ndctl_cmd
callback like 'ndctl_cmd.get_xlat_firmware_status', similar to the one
introduced in [1]. However, such a change could be disruptive, hence
the patch introduces a small workaround in papr_xlat_firmware_status()
that checks whether the 'struct ndctl_cmd *' provided is of the correct
type, ND_CMD_CALL, and if not, ignores it and returns '0'.
References:
[1]: commit fa754dd8acdb ("ndctl/dimm: Rework dimm command status
reporting")
Fixes: 151d2876c49e ("papr: Add scaffolding to issue and handle PDSM requests")
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav(a)linux.ibm.com>
---
ndctl/lib/papr.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/ndctl/lib/papr.c b/ndctl/lib/papr.c
index d9ce253369b3..63561f8f9797 100644
--- a/ndctl/lib/papr.c
+++ b/ndctl/lib/papr.c
@@ -56,9 +56,7 @@ static u32 papr_get_firmware_status(struct ndctl_cmd *cmd)
static int papr_xlat_firmware_status(struct ndctl_cmd *cmd)
{
- const struct nd_pkg_pdsm *pcmd = to_pdsm(cmd);
-
- return pcmd->cmd_status;
+ return (cmd->type == ND_CMD_CALL) ? to_pdsm(cmd)->cmd_status : 0;
}
/* Verify if the given command is supported and valid */
--
2.26.2
[ndctl PATCH v2 0/4] Firmware Activation and Test Updates
by Dan Williams
Changes since v1 [1]:
- Update the firmware-activation patches per v3 of the kernel support
series. Specifically, handle the rename of
{ndbusX,nmemX}/firmware_activate to {ndbusX,nmemX}/firmware/activate,
add support for ndbusX/firmware/capability, and account for the ability to
specify "quiesce" or "live" to ndbusX/firmware/activate to select the
activation method.
- This series replaces patches 8, 9, 10, and 12 from the v1 posting.
[1]: http://lore.kernel.org/r/159408961822.2386154.888266173771881937.stgit@dw...
---
Some persistent memory devices run firmware locally on the device /
"DIMM" to perform tasks like media management, capacity provisioning,
and health monitoring. The process of updating that firmware typically
involves a reboot because it has implications for in-flight memory
transactions. However, reboots are disruptive, and at least the Intel
persistent memory platform implementation, described by the Intel ACPI
DSM specification [2], has added support for activating firmware at
runtime.
As mentioned in the kernel patches adding support for firmware-activate
[3], ndctl is extended with the following functionality:
1/ The existing update-firmware command will, by default, 'arm' devices
where the firmware image is staged.
ndctl update-firmware all -f firmware_image.bin
2/ The existing ability to enumerate firmware-update capabilities now
includes firmware activate capabilities at the 'bus' and 'dimm/device'
level:
ndctl list -BDF -b nfit_test.0
[
  {
    "provider":"nfit_test.0",
    "dev":"ndbus2",
    "scrub_state":"idle",
    "firmware":{
      "activate_method":"suspend",
      "activate_state":"idle"
    },
    "dimms":[
      {
        "dev":"nmem1",
        "id":"cdab-0a-07e0-ffffffff",
        "handle":0,
        "phys_id":0,
        "security":"disabled",
        "firmware":{
          "current_version":0,
          "can_update":true
        }
      },
      ...
3/ The new activate-firmware command triggers firmware activation per
the platform-enumerated context, "suspend" vs "live", or can be forced
to "live" if there is explicit knowledge that allowing applications
and devices to race the quiesce timeout will have no adverse effects.
ndctl activate-firmware nfit_test.0 [--force]
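As a rough illustration of the sysfs interface underneath these commands, the sketch below writes an activation method to a bus-level firmware/activate attribute. The /sys/bus/nd/devices prefix, the helper names, and the assumption that the attribute accepts exactly the plain strings "live" or "quiesce" are assumptions of this sketch; the ndctl activate-firmware command above is the intended user interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed sysfs location of libnvdimm bus devices (not confirmed by the series). */
#define ND_BUS_SYSFS_ROOT "/sys/bus/nd/devices"

/* Accept only the two methods the cover letter says the attribute takes. */
int fw_activate_method_valid(const char *method)
{
	return strcmp(method, "live") == 0 || strcmp(method, "quiesce") == 0;
}

/* Build "<root>/<bus>/firmware/activate"; 0 on success, -1 if it does not fit. */
int fw_activate_path(char *buf, size_t len, const char *bus)
{
	int n = snprintf(buf, len, "%s/%s/firmware/activate", ND_BUS_SYSFS_ROOT, bus);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write the method to the bus attribute.
 * Returns 0 or a negative errno. Needs root and a platform that
 * actually supports firmware activation.
 */
int fw_activate(const char *bus, const char *method)
{
	char path[256];
	FILE *f;
	int rc = 0;

	assert(bus != NULL && method != NULL);
	if (!fw_activate_method_valid(method))
		return -EINVAL;
	if (fw_activate_path(path, sizeof(path), bus) != 0)
		return -ENAMETOOLONG;
	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fputs(method, f) == EOF)
		rc = -EIO;
	/* stdio buffers the write, so a kernel-side error typically surfaces here */
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;
	return rc;
}
```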
[2]: https://pmem.io/documents/IntelOptanePMem_DSM_Interface-V2.0.pdf
[3]: http://lore.kernel.org/r/159528284411.993790.11733759435137949717.stgit@d...
---
Dan Williams (4):
ndctl/list: Add firmware activation enumeration
ndctl/dimm: Auto-arm firmware activation
ndctl/bus: Add 'activate-firmware' command
ndctl/test: Test firmware-activation interface
Documentation/ndctl/Makefile.am | 3
Documentation/ndctl/ndctl-activate-firmware.txt | 146 +++++++++++++
Documentation/ndctl/ndctl-list.txt | 39 +++
Documentation/ndctl/ndctl-update-firmware.txt | 16 +
ndctl/action.h | 1
ndctl/builtin.h | 1
ndctl/bus.c | 158 +++++++++++++-
ndctl/dimm.c | 125 ++++++++++-
ndctl/lib/libndctl.c | 257 +++++++++++++++++++++++
ndctl/lib/libndctl.sym | 14 +
ndctl/lib/private.h | 4
ndctl/libndctl.h | 35 +++
ndctl/list.c | 3
ndctl/ndctl.c | 1
test/firmware-update.sh | 47 ++++
util/json.c | 117 +++++++++-
util/json.h | 3
17 files changed, 920 insertions(+), 50 deletions(-)
create mode 100644 Documentation/ndctl/ndctl-activate-firmware.txt
[PATCH v2] ACPI: Drop rcu usage for MMIO mappings
by Dan Williams
Recently, a performance problem was reported for a process invoking a
non-trivial ASL program. The method call in this case ends up
repeatedly triggering a call path like:
acpi_ex_store
acpi_ex_store_object_to_node
acpi_ex_write_data_to_field
acpi_ex_insert_into_field
acpi_ex_write_with_update_rule
acpi_ex_field_datum_io
acpi_ex_access_region
acpi_ev_address_space_dispatch
acpi_ex_system_memory_space_handler
acpi_os_map_cleanup.part.14
_synchronize_rcu_expedited.constprop.89
schedule
The end result of frequent synchronize_rcu_expedited() invocation is
tiny sub-millisecond spurts of execution where the scheduler freely
migrates this apparently sleepy task. The overhead of frequent scheduler
invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a
firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping
lock in the acpi_os_{read,write}_memory() path which ostensibly could be
called from an APEI NMI error interrupt? Neither rcu_read_lock() nor
ioremap() are interrupt safe, so add a WARN_ONCE() to validate that rcu
was not serving as a mechanism to avoid direct calls to ioremap(). Even
the original implementation had a spin_lock_irqsave(), but that is not
NMI safe.
APEI itself already has some concept of avoiding ioremap() from
interrupt context (see erst_exec_move_data()). If the new warning
triggers, it means that APEI either needs more instrumentation like that
to fail pre-emptively, or more infrastructure to arrange for pre-mapping
the resources it needs in NMI context.
Cc: <stable(a)vger.kernel.org>
Fixes: 620242ae8c3d ("ACPI: Maintain a list of ACPI memory mapped I/O remappings")
Cc: Len Brown <lenb(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Erik Kaneda <erik.kaneda(a)intel.com>
Cc: Myron Stowe <myron.stowe(a)redhat.com>
Cc: "Rafael J. Wysocki" <rjw(a)rjwysocki.net>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
Changes since v1 [1]:
- Actually cc: the most important list for ACPI changes (Rafael)
- Cleanup unnecessary variable initialization (Andy)
Link: https://lore.kernel.org/linux-nvdimm/158880834905.2183490.156163294694202...
drivers/acpi/osl.c | 117 +++++++++++++++++++++++++---------------------------
1 file changed, 57 insertions(+), 60 deletions(-)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 762c5d50b8fe..a44b75aac5d0 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -214,13 +214,13 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
return pa;
}
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */
static struct acpi_ioremap *
acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
+ lockdep_assert_held(&acpi_ioremap_lock);
+ list_for_each_entry(map, &acpi_ioremaps, list)
if (map->phys <= phys &&
phys + size <= map->phys + map->size)
return map;
@@ -228,7 +228,6 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
return NULL;
}
-/* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */
static void __iomem *
acpi_map_vaddr_lookup(acpi_physical_address phys, unsigned int size)
{
@@ -263,7 +262,8 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
{
struct acpi_ioremap *map;
- list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
+ lockdep_assert_held(&acpi_ioremap_lock);
+ list_for_each_entry(map, &acpi_ioremaps, list)
if (map->virt <= virt &&
virt + size <= map->virt + map->size)
return map;
@@ -360,7 +360,7 @@ void __iomem __ref
map->size = pg_sz;
map->refcount = 1;
- list_add_tail_rcu(&map->list, &acpi_ioremaps);
+ list_add_tail(&map->list, &acpi_ioremaps);
out:
mutex_unlock(&acpi_ioremap_lock);
@@ -374,20 +374,13 @@ void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
}
EXPORT_SYMBOL_GPL(acpi_os_map_memory);
-/* Must be called with mutex_lock(&acpi_ioremap_lock) */
-static unsigned long acpi_os_drop_map_ref(struct acpi_ioremap *map)
-{
- unsigned long refcount = --map->refcount;
-
- if (!refcount)
- list_del_rcu(&map->list);
- return refcount;
-}
-
-static void acpi_os_map_cleanup(struct acpi_ioremap *map)
+static void acpi_os_drop_map_ref(struct acpi_ioremap *map)
{
- synchronize_rcu_expedited();
+ lockdep_assert_held(&acpi_ioremap_lock);
+ if (--map->refcount > 0)
+ return;
acpi_unmap(map->phys, map->virt);
+ list_del(&map->list);
kfree(map);
}
@@ -408,7 +401,6 @@ static void acpi_os_map_cleanup(struct acpi_ioremap *map)
void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size)
{
struct acpi_ioremap *map;
- unsigned long refcount;
if (!acpi_permanent_mmap) {
__acpi_unmap_table(virt, size);
@@ -422,11 +414,8 @@ void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size)
WARN(true, PREFIX "%s: bad address %p\n", __func__, virt);
return;
}
- refcount = acpi_os_drop_map_ref(map);
+ acpi_os_drop_map_ref(map);
mutex_unlock(&acpi_ioremap_lock);
-
- if (!refcount)
- acpi_os_map_cleanup(map);
}
EXPORT_SYMBOL_GPL(acpi_os_unmap_iomem);
@@ -461,7 +450,6 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas)
{
u64 addr;
struct acpi_ioremap *map;
- unsigned long refcount;
if (gas->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY)
return;
@@ -477,11 +465,8 @@ void acpi_os_unmap_generic_address(struct acpi_generic_address *gas)
mutex_unlock(&acpi_ioremap_lock);
return;
}
- refcount = acpi_os_drop_map_ref(map);
+ acpi_os_drop_map_ref(map);
mutex_unlock(&acpi_ioremap_lock);
-
- if (!refcount)
- acpi_os_map_cleanup(map);
}
EXPORT_SYMBOL(acpi_os_unmap_generic_address);
@@ -700,55 +685,71 @@ int acpi_os_read_iomem(void __iomem *virt_addr, u64 *value, u32 width)
return 0;
}
+static void __iomem *acpi_os_rw_map(acpi_physical_address phys_addr,
+ unsigned int size, bool *did_fallback)
+{
+ void __iomem *virt_addr;
+
+ if (WARN_ONCE(in_interrupt(), "ioremap in interrupt context\n"))
+ return NULL;
+
+ /* Try to use a cached mapping and fallback otherwise */
+ *did_fallback = false;
+ mutex_lock(&acpi_ioremap_lock);
+ virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
+ if (virt_addr)
+ return virt_addr;
+ mutex_unlock(&acpi_ioremap_lock);
+
+ virt_addr = acpi_os_ioremap(phys_addr, size);
+ *did_fallback = true;
+
+ return virt_addr;
+}
+
+static void acpi_os_rw_unmap(void __iomem *virt_addr, bool did_fallback)
+{
+ if (did_fallback) {
+ /* in the fallback case no lock is held */
+ iounmap(virt_addr);
+ return;
+ }
+
+ mutex_unlock(&acpi_ioremap_lock);
+}
+
acpi_status
acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width)
{
- void __iomem *virt_addr;
unsigned int size = width / 8;
- bool unmap = false;
+ bool did_fallback = false;
+ void __iomem *virt_addr;
u64 dummy;
int error;
- rcu_read_lock();
- virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
- if (!virt_addr) {
- rcu_read_unlock();
- virt_addr = acpi_os_ioremap(phys_addr, size);
- if (!virt_addr)
- return AE_BAD_ADDRESS;
- unmap = true;
- }
-
+ virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback);
+ if (!virt_addr)
+ return AE_BAD_ADDRESS;
if (!value)
value = &dummy;
error = acpi_os_read_iomem(virt_addr, value, width);
BUG_ON(error);
- if (unmap)
- iounmap(virt_addr);
- else
- rcu_read_unlock();
-
+ acpi_os_rw_unmap(virt_addr, did_fallback);
return AE_OK;
}
acpi_status
acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width)
{
- void __iomem *virt_addr;
unsigned int size = width / 8;
- bool unmap = false;
+ bool did_fallback = false;
+ void __iomem *virt_addr;
- rcu_read_lock();
- virt_addr = acpi_map_vaddr_lookup(phys_addr, size);
- if (!virt_addr) {
- rcu_read_unlock();
- virt_addr = acpi_os_ioremap(phys_addr, size);
- if (!virt_addr)
- return AE_BAD_ADDRESS;
- unmap = true;
- }
+ virt_addr = acpi_os_rw_map(phys_addr, size, &did_fallback);
+ if (!virt_addr)
+ return AE_BAD_ADDRESS;
switch (width) {
case 8:
@@ -767,11 +768,7 @@ acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width)
BUG();
}
- if (unmap)
- iounmap(virt_addr);
- else
- rcu_read_unlock();
-
+ acpi_os_rw_unmap(virt_addr, did_fallback);
return AE_OK;
}
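The locking scheme the patch moves to can be modeled in user space. In this minimal sketch, a pthread mutex stands in for acpi_ioremap_lock and malloc()/free() stand in for acpi_os_ioremap()/iounmap(); all names are hypothetical. A cache hit returns with the lock still held and the matching unmap releases it, while a miss drops the lock and uses a private mapping. Because lookups and the final reference drop run under the same mutex, an entry can be unlinked and freed immediately, with no RCU grace period.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct acpi_ioremap: a cached, refcounted mapping. */
struct cached_map {
	struct cached_map *next;
	unsigned long phys;
	unsigned long size;
	unsigned char *virt;
	unsigned long refcount;
};

/* Stand-in for acpi_ioremap_lock; guards map_list and every refcount. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cached_map *map_list;

/* Caller must hold map_lock (the patch asserts this with lockdep_assert_held()). */
static unsigned char *map_lookup(unsigned long phys, unsigned long size)
{
	for (struct cached_map *m = map_list; m; m = m->next)
		if (m->phys <= phys && phys + size <= m->phys + m->size)
			return m->virt + (phys - m->phys);
	return NULL;
}

/* Add a long-lived mapping with refcount 1. */
struct cached_map *cache_add(unsigned long phys, unsigned long size)
{
	struct cached_map *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->virt = calloc(1, size);
	if (!m->virt) {
		free(m);
		return NULL;
	}
	m->phys = phys;
	m->size = size;
	m->refcount = 1;
	pthread_mutex_lock(&map_lock);
	m->next = map_list;
	map_list = m;
	pthread_mutex_unlock(&map_lock);
	return m;
}

/* Drop a reference. On the last put, unlink and free while still holding
 * the lock: no reader can be mid-lookup, so no grace period is needed. */
void cache_put(struct cached_map *m)
{
	pthread_mutex_lock(&map_lock);
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		struct cached_map **pp = &map_list;

		while (*pp != m)
			pp = &(*pp)->next;
		*pp = m->next;
		free(m->virt);
		free(m);
	}
	pthread_mutex_unlock(&map_lock);
}

/* Cache hit: return with map_lock HELD. Miss: drop the lock and fall back
 * to a private temporary buffer (malloc() standing in for ioremap()). */
unsigned char *rw_map(unsigned long phys, unsigned long size, bool *did_fallback)
{
	unsigned char *virt;

	*did_fallback = false;
	pthread_mutex_lock(&map_lock);
	virt = map_lookup(phys, size);
	if (virt)
		return virt;
	pthread_mutex_unlock(&map_lock);

	*did_fallback = true;
	return malloc(size);
}

/* Undo rw_map(): free the fallback buffer, or release the lock on a hit. */
void rw_unmap(unsigned char *virt, bool did_fallback)
{
	if (did_fallback) {
		free(virt);
		return;
	}
	pthread_mutex_unlock(&map_lock);
}
```

The trade-off is that callers may now sleep on the mutex, which is why the patched acpi_os_rw_map() warns and bails out when called in interrupt context.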
[RESEND] [PATCH v2] dax: print error message by pr_info() in __generic_fsdax_supported()
by Coly Li
In struct dax_operations, the callback routine dax_supported() returns
a bool result. When it returns false, the caller has no idea whether
the device does not support dax at all, or whether it is just some
misconfiguration issue.
For example, format an Ext4 file system on a pmem device on top of
an NVDIMM namespace:
# mkfs.ext4 /dev/pmem0
If the file system block size does not match the kernel page size (which
is possible on non-x86 platforms), mounting this Ext4 file system will fail:
# mount -o dax /dev/pmem0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0,
missing codepage or helper program, or other error.
The dmesg output then contains only the following information:
[ 307.853148] EXT4-fs (pmem0): DAX unsupported by block device.
This message is quite confusing, because the pmem0 device definitely
supports dax operation, and the superblock is consistent with how
mkfs.ext4 created it.
In fact, the failure comes from the following code in
__generic_fsdax_supported():
	if (blocksize != PAGE_SIZE) {
		pr_debug("%s: error: unsupported blocksize for dax\n",
				bdevname(bdev, buf));
		return false;
	}
This happens because the Ext4 block size is 4KB while the kernel page
size is 8KB or 16KB.
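The failing condition is simple enough to check before mounting. The sketch below is a hypothetical user-space pre-check (not part of this patch or of any existing tool) that mirrors the blocksize != PAGE_SIZE test quoted above, using sysconf(_SC_PAGESIZE) as the page size; it does not cover the kernel's other checks, such as partition alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Mirrors the first test in __generic_fsdax_supported():
 * DAX is refused unless the fs block size equals the page size. */
bool fsdax_blocksize_ok(long blocksize, long page_size)
{
	assert(page_size > 0);
	return blocksize == page_size;
}

/* Hypothetical pre-mount check against the running system's page size;
 * prints a message modeled on the kernel's and returns false on mismatch. */
bool fsdax_precheck(const char *dev, long blocksize)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return false;
	if (!fsdax_blocksize_ok(blocksize, page_size)) {
		fprintf(stderr, "%s: error: unsupported blocksize %ld for dax (page size %ld)\n",
			dev, blocksize, page_size);
		return false;
	}
	return true;
}
```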
It is not simple right now to make dax_supported() in struct
dax_operations, or __generic_fsdax_supported(), return the exact
failure type. So the simplest fix is to use pr_info() to print all the
error messages inside __generic_fsdax_supported(). Then users can at
least find an informative clue in the kernel messages.
Messages printed by pr_debug() are easily missed by users, since they
are only emitted when DEBUG is defined or dynamic debug is enabled for
the call site. This patch prints the error messages in
__generic_fsdax_supported() with pr_info() instead, so when the mount
fails, the following lines can be found in the dmesg output:
[ 2705.500885] pmem0: error: unsupported blocksize for dax
[ 2705.500888] EXT4-fs (pmem0): DAX unsupported by block device.
Now users have a hint that the mount failure comes from the pmem
driver rejecting an unsupported block size.
Reported-by: Michal Suchanek <msuchanek(a)suse.com>
Suggested-by: Jan Kara <jack(a)suse.com>
Signed-off-by: Coly Li <colyli(a)suse.de>
Reviewed-by: Jan Kara <jack(a)suse.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Anthony Iliopoulos <ailiopoulos(a)suse.com>
---
Changelog:
v2: Add reviewed-by from Jan Kara
v1: initial version.
drivers/dax/super.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 8e32345be0f7..de0d02ec0347 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -80,14 +80,14 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev,
int err, id;
if (blocksize != PAGE_SIZE) {
- pr_debug("%s: error: unsupported blocksize for dax\n",
+ pr_info("%s: error: unsupported blocksize for dax\n",
bdevname(bdev, buf));
return false;
}
err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
if (err) {
- pr_debug("%s: error: unaligned partition for dax\n",
+ pr_info("%s: error: unaligned partition for dax\n",
bdevname(bdev, buf));
return false;
}
@@ -95,7 +95,7 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev,
last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
if (err) {
- pr_debug("%s: error: unaligned partition for dax\n",
+ pr_info("%s: error: unaligned partition for dax\n",
bdevname(bdev, buf));
return false;
}
@@ -106,7 +106,7 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev,
dax_read_unlock(id);
if (len < 1 || len2 < 1) {
- pr_debug("%s: error: dax access failed (%ld)\n",
+ pr_info("%s: error: dax access failed (%ld)\n",
bdevname(bdev, buf), len < 1 ? len : len2);
return false;
}
@@ -139,7 +139,7 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev,
}
if (!dax_enabled) {
- pr_debug("%s: error: dax support not enabled\n",
+ pr_info("%s: error: dax support not enabled\n",
bdevname(bdev, buf));
return false;
}
--
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm(a)lists.01.org
To unsubscribe send an email to linux-nvdimm-leave(a)lists.01.org
[PATCH v7 0/7] Support new pmem flush and sync instructions for POWER
by Aneesh Kumar K.V
This patch series enables the usage of new pmem flush and sync instructions on POWER
architecture. POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage. Additionally,
POWER10 introduces phwsync and plwsync, which can be used to establish ordering of these
writes to persistent storage.
This series exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new instructions
are added as variants of the old ones that older hardware does not differentiate.
On POWER10, pmem devices will be represented by a different device tree compat
string. This ensures that older kernels won't initialize pmem devices on POWER10.
With this:
1) vPMEM continues to work since it is a volatile region and doesn't
need any flush instructions.
2) pmdk and other user applications get updated to use the new instructions,
and updated packages are made available to all distributions.
3) On newer hardware, the device will appear with a new compat string.
Hence older distributions won't initialize pmem on newer hardware.
Changes from v6:
* rename flush barrier to pmem_wmb(). Update documentation.
* Drop the WARN_ON in flush routines.
* Drop papr_scm ndr_region flush callback.
Changes from v5:
* Drop CONFIG_ARCH_MAP_SYNC_DISABLE and related changes
Changes from V4:
* Add namespace-specific synchronous fault control.
Changes from V3:
* Add new compat string to be used for the device.
* Use arch_pmem_flush_barrier() in dm-writecache.
Aneesh Kumar K.V (7):
powerpc/pmem: Restrict papr_scm to P8 and above.
powerpc/pmem: Add new instructions for persistent storage and sync
powerpc/pmem: Add flush routines using new pmem store and sync
instruction
libnvdimm/nvdimm/flush: Allow architecture to override the flush
barrier
powerpc/pmem: Update ppc64 to use the new barrier instruction.
powerpc/pmem: Avoid the barrier in flush routines
powerpc/pmem: Initialize pmem device on newer hardware
Documentation/memory-barriers.txt | 14 ++++++++
arch/powerpc/include/asm/barrier.h | 13 +++++++
arch/powerpc/include/asm/cacheflush.h | 1 +
arch/powerpc/include/asm/ppc-opcode.h | 12 +++++++
arch/powerpc/lib/pmem.c | 44 ++++++++++++++++++++---
arch/powerpc/platforms/pseries/papr_scm.c | 1 +
arch/powerpc/platforms/pseries/pmem.c | 6 ++++
drivers/md/dm-writecache.c | 2 +-
drivers/nvdimm/of_pmem.c | 1 +
drivers/nvdimm/region_devs.c | 8 ++---
include/asm-generic/barrier.h | 10 ++++++
11 files changed, 103 insertions(+), 9 deletions(-)
--
2.26.2
[RFC PATCH 00/15] PKS: Add Protection Keys Supervisor (PKS) support
by ira.weiny@intel.com
From: Ira Weiny <ira.weiny(a)intel.com>
This RFC series has been reviewed by Dave Hansen.
This patch set introduces a new page protection mechanism for supervisor pages,
Protection Key Supervisor (PKS), and an initial user of it, persistent memory
(PMEM).
PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to those pages beyond the normal paging protections. It works in
a similar fashion to user space pkeys. Like user page pkeys (PKU), supervisor
pkeys are checked in addition to normal paging protections, and access or writes
can be disabled via an MSR update, without TLB flushes, when permissions change.
A page mapping is assigned to a domain by setting a pkey in the page table
entry.
Unlike user pkeys, no new instructions are added; rather, WRMSR/RDMSR are used to
update the PKRS register.
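A PKRS update is a read-modify-write of per-key permission bits. This minimal sketch assumes PKRS uses the same layout as the user PKRU register (for key k, bit 2k is Access-Disable and bit 2k+1 is Write-Disable, 16 keys in 32 bits); the helper and macro names are hypothetical, and the actual RDMSR/WRMSR is left out since it needs ring 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-key layout, matching PKRU: AD at bit 2k, WD at bit 2k+1. */
#define DEMO_PKR_AD_BIT		0x1u
#define DEMO_PKR_WD_BIT		0x2u
#define DEMO_PKR_BITS_PER_KEY	2
#define DEMO_PKR_NR_KEYS	16

/*
 * Return a new register value with key 'pkey' set to 'flags' (AD and/or WD),
 * leaving every other key untouched. Kernel code would then write the result
 * back with WRMSR; no TLB flush is involved.
 */
uint32_t demo_pkr_update(uint32_t reg, int pkey, uint32_t flags)
{
	unsigned int shift;

	assert(pkey >= 0 && pkey < DEMO_PKR_NR_KEYS);
	assert((flags & ~(DEMO_PKR_AD_BIT | DEMO_PKR_WD_BIT)) == 0);
	shift = (unsigned int)pkey * DEMO_PKR_BITS_PER_KEY;
	reg &= ~((DEMO_PKR_AD_BIT | DEMO_PKR_WD_BIT) << shift);	/* clear old bits */
	return reg | (flags << shift);				/* apply new bits */
}
```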
XSAVE is not supported for the PKRS MSR. To reduce software complexity, the
implementation saves/restores the MSR across context switches but not during
irqs. This is a compromise that results in a hardening against unwanted access
without absolute restriction.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
Other keys (1-15) are handed out by an allocator, which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail, either because of key exhaustion or because PKS is not supported on the
arch and/or CPU instance.
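The allocation rules described above (key 0 reserved, keys 1-15 allocatable, failure must be handled) can be sketched as a small bitmap allocator. The names are hypothetical, this is not the series' actual allocator, and it omits the locking a real kernel allocator would need.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PKS_NR_KEYS 16

/* Bit k set means key k is in use. Key 0 is reserved as the default,
 * full-access key, so it starts out marked and is never handed out. */
static uint16_t demo_pks_key_map = 0x1;

/* Return a free key in 1..15, or -1 when all keys are taken.
 * Not thread-safe: a real allocator would serialize access. */
int demo_pks_key_alloc(void)
{
	for (int k = 1; k < DEMO_PKS_NR_KEYS; k++) {
		if (!(demo_pks_key_map & (1u << k))) {
			demo_pks_key_map |= (uint16_t)(1u << k);
			return k;
		}
	}
	return -1;
}

/* Release a previously allocated key; key 0 can never be freed. */
void demo_pks_key_free(int k)
{
	assert(k > 0 && k < DEMO_PKS_NR_KEYS);
	demo_pks_key_map &= (uint16_t)~(1u << k);
}
```

Callers are expected to treat -1 as a normal outcome and continue without the extra protection.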
Protecting against stray writes is particularly important for PMEM because,
unlike writes to anonymous memory, writes to PMEM persist across a reboot.
Thus data corruption could result in permanent loss of data.
The following attributes of PKS make it perfect as a mechanism to protect PMEM
from stray access within the kernel:
1) Fast switching of permissions
2) Prevents access without page table manipulations
3) Works on a per thread basis
4) No TLB flushes required
The second half of this series thus uses the PKS mechanism to protect PMEM from
stray access.
Implementation details
----------------------
Modifications of task struct in patches:
(x86/pks: Preserve the PKRS MSR on context switch)
(memremap: Add zone device access protection)
Because pkey access is per-thread, two modifications are made to the task struct.
The first is a saved copy of the MSR for use during context switches. The second
is a reference count of access to the device domain, used to handle kmap nesting
correctly.
Maintain PKS setting in a re-entrant manner in patch:
(memremap: Add zone device access protection)
Using local_irq_save() seems to be the safest and fastest way to keep kmap
re-entrant, but there may be a better way. spin_lock_irq() and atomic
counters were considered. However, atomic counters do not properly protect the
pkey update, and spin_lock_irq() is unnecessary because the pkey protections are
thread-local. Suggestions are welcome.
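The nesting rule behind the per-task reference count can be modeled with thread-local state: only the outermost enable clears the Access-Disable bit, and only the matching outermost disable sets it again. This is a hypothetical user-space sketch; it assumes the PKRU-style bit layout, uses key 1 as the PMEM key purely for illustration, and ignores the interrupt re-entrancy that the series handles with local_irq_save().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PMEM_KEY	1
#define DEMO_AD(k)	(0x1u << ((k) * 2))	/* assumed Access-Disable bit for key k */

/* Per-thread stand-ins for the two task_struct additions: a saved PKRS-like
 * value (PMEM key starts access-disabled) and a device-access nesting count. */
static _Thread_local uint32_t demo_pkrs = DEMO_AD(DEMO_PMEM_KEY);
static _Thread_local unsigned int demo_dev_access_ref;

/* kmap()-style entry: only the outermost call opens access. */
void demo_dev_access_enable(void)
{
	if (demo_dev_access_ref++ == 0)
		demo_pkrs &= ~DEMO_AD(DEMO_PMEM_KEY);
}

/* kunmap()-style exit: only the outermost call closes access again,
 * so a nested unmap cannot revoke access from an enclosing caller. */
void demo_dev_access_disable(void)
{
	assert(demo_dev_access_ref > 0);
	if (--demo_dev_access_ref == 0)
		demo_pkrs |= DEMO_AD(DEMO_PMEM_KEY);
}

bool demo_pmem_accessible(void)
{
	return !(demo_pkrs & DEMO_AD(DEMO_PMEM_KEY));
}
```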
The use of kmap in patch:
(kmap: Add stray write protection for device pages)
To keep general access to PMEM pages general, we piggyback on the kmap()
interface, as there are many places in the kernel that do not have, nor should be
required to have, a priori knowledge that a page is PMEM. The modifications to
the kmap code are careful to quickly determine which pages don't require special
handling, to reduce overhead for non-PMEM pages.
Breakdown of patches
--------------------
Implement PKS within x86 arch:
x86/pkeys: Create pkeys_internal.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Preserve the PKRS MSR on context switch
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code
pre-req bug fixes for dax:
fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses
Add stray write protection to PMEM:
memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection
Fenghua Yu (4):
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys
Ira Weiny (11):
x86/pkeys: Create pkeys_internal.h
x86/pks: Preserve the PKRS MSR on context switch
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code
fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses
memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection
Documentation/core-api/protection-keys.rst | 81 +++-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 13 +-
arch/x86/include/asm/pgtable_types.h | 4 +
arch/x86/include/asm/pkeys.h | 43 ++
arch/x86/include/asm/pkeys_internal.h | 35 ++
arch/x86/include/asm/processor.h | 13 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 17 +
arch/x86/kernel/fpu/xstate.c | 17 +-
arch/x86/kernel/process.c | 35 ++
arch/x86/mm/fault.c | 16 +-
arch/x86/mm/pkeys.c | 174 +++++++-
drivers/dax/device.c | 2 +
drivers/dax/super.c | 5 +-
drivers/nvdimm/pmem.c | 6 +
fs/dax.c | 13 +-
include/linux/highmem.h | 32 +-
include/linux/memremap.h | 1 +
include/linux/mm.h | 33 ++
include/linux/pkeys.h | 18 +
include/linux/sched.h | 3 +
init/init_task.c | 3 +
kernel/fork.c | 3 +
lib/Kconfig.debug | 12 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 452 ++++++++++++++++++++
mm/Kconfig | 15 +
mm/memremap.c | 111 +++++
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 65 +++
34 files changed, 1175 insertions(+), 61 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_internal.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.25.1
[PATCH -next] libnvdimm/security: Make __nvdimm_security_overwrite_query() static
by Wei Yongjun
The sparse tool complains as follows:
drivers/nvdimm/security.c:416:6: warning:
symbol '__nvdimm_security_overwrite_query' was not declared. Should it be static?
__nvdimm_security_overwrite_query() is not used outside of this
file, so mark it static.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1(a)huawei.com>
---
drivers/nvdimm/security.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
index 89b85970912d..11fb5ada70ad 100644
--- a/drivers/nvdimm/security.c
+++ b/drivers/nvdimm/security.c
@@ -413,7 +413,7 @@ static int security_overwrite(struct nvdimm *nvdimm, unsigned int keyid)
return rc;
}
-void __nvdimm_security_overwrite_query(struct nvdimm *nvdimm)
+static void __nvdimm_security_overwrite_query(struct nvdimm *nvdimm)
{
struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(&nvdimm->dev);
int rc;
[PATCH] Documentation: use includes in more ndctl command pages.
by Michal Suchanek
While backporting commit 498ee3d100c3 ("Documentation: clarify bus/dimm/region filtering")
I noticed that not all instances of --bus, --dimm, and --region use the
include, and hence they do not get the clarification.
Fixes: 498ee3d100c3 ("Documentation: clarify bus/dimm/region filtering")
Signed-off-by: Michal Suchanek <msuchanek(a)suse.de>
---
Documentation/ndctl/labels-options.txt | 7 ++-----
Documentation/ndctl/ndctl-inject-smart.txt | 4 +---
Documentation/ndctl/ndctl-monitor.txt | 11 +++--------
3 files changed, 6 insertions(+), 16 deletions(-)
diff --git a/Documentation/ndctl/labels-options.txt b/Documentation/ndctl/labels-options.txt
index 4aee37969fd5..c7649cfd2aab 100644
--- a/Documentation/ndctl/labels-options.txt
+++ b/Documentation/ndctl/labels-options.txt
@@ -1,9 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
<memory device(s)>::
- One or more 'nmemX' device names. The keyword 'all' can be specified to
- operate on every dimm in the system, optionally filtered by bus id (see
- --bus= option).
+include::xable-dimm-options.txt[]
-s::
--size=::
@@ -16,8 +14,7 @@
-b::
--bus=::
- Limit operation to memory devices (dimms) that are on the given bus.
- Where 'bus' can be a provider name or a bus id number.
+include::xable-bus-options.txt[]
-v::
Turn on verbose debug messages in the library (if ndctl was built with
diff --git a/Documentation/ndctl/ndctl-inject-smart.txt b/Documentation/ndctl/ndctl-inject-smart.txt
index d28be46cae1c..9fd63bae2729 100644
--- a/Documentation/ndctl/ndctl-inject-smart.txt
+++ b/Documentation/ndctl/ndctl-inject-smart.txt
@@ -38,9 +38,7 @@ OPTIONS
-------
-b::
--bus=::
- Enforce that the operation only be carried on devices that are
- attached to the given bus. Where 'bus' can be a provider name or a bus
- id number.
+include::xable-bus-options.txt[]
-m::
--media-temperature=::
diff --git a/Documentation/ndctl/ndctl-monitor.txt b/Documentation/ndctl/ndctl-monitor.txt
index 2239f047266d..c0273d378b59 100644
--- a/Documentation/ndctl/ndctl-monitor.txt
+++ b/Documentation/ndctl/ndctl-monitor.txt
@@ -49,20 +49,15 @@ OPTIONS
-------
-b::
--bus=::
- Enforce that the operation only be carried on devices that are
- attached to the given bus. Where 'bus' can be a provider name
- or a bus id number.
+include::xable-bus-options.txt[]
-d::
--dimm=::
- A 'nmemX' device name, or dimm id number. Select the devices to
- monitor reference the given dimm.
+include::xable-dimm-options.txt[]
-r::
--region=::
- A 'regionX' device name, or a region id number. The keyword 'all'
- can be specified to carry out the operation on every region in
- the system, optionally filtered by bus id (see --bus= option).
+include::xable-region-options.txt[]
-n::
--namespace=::
--
2.26.2
[PATCH 00/17] Documentation/driver-api: eliminate duplicated words
by Randy Dunlap
Remove occurrences of duplicated words in Documentation/driver-api/.
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: linux-doc(a)vger.kernel.org
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: William Breathitt Gray <vilhelm.gray(a)gmail.com>
Cc: linux-iio(a)vger.kernel.org
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: linux-media(a)vger.kernel.org
Cc: Jon Mason <jdmason(a)kudzu.us>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Allen Hubbe <allenbh(a)gmail.com>
Cc: linux-ntb(a)googlegroups.com
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: linux-nvdimm(a)lists.01.org
Cc: linux-usb(a)vger.kernel.org
Cc: Eli Billauer <eli.billauer(a)gmail.com>
Documentation/driver-api/dmaengine/provider.rst | 2 +-
Documentation/driver-api/driver-model/platform.rst | 2 +-
Documentation/driver-api/firmware/built-in-fw.rst | 2 +-
Documentation/driver-api/firmware/direct-fs-lookup.rst | 2 +-
Documentation/driver-api/firmware/firmware_cache.rst | 2 +-
Documentation/driver-api/firmware/request_firmware.rst | 2 +-
Documentation/driver-api/generic-counter.rst | 2 +-
Documentation/driver-api/iio/buffers.rst | 2 +-
Documentation/driver-api/media/cec-core.rst | 2 +-
Documentation/driver-api/media/dtv-frontend.rst | 6 +++---
Documentation/driver-api/media/v4l2-controls.rst | 4 ++--
Documentation/driver-api/media/v4l2-dev.rst | 2 +-
Documentation/driver-api/ntb.rst | 2 +-
Documentation/driver-api/nvdimm/nvdimm.rst | 2 +-
Documentation/driver-api/uio-howto.rst | 2 +-
Documentation/driver-api/usb/URB.rst | 2 +-
Documentation/driver-api/xillybus.rst | 2 +-
17 files changed, 20 insertions(+), 20 deletions(-)