[PATCH 0/2] fix sync to flush processor cache for ext4 DAX files
by Toshi Kani
This patchset fixes an issue where a sync syscall on an existing DAX file
does not flush the processor cache.
Patch 1/2 adds a check to skip the journal inode. It's a bit awkward,
but I could not find a better way to get the journal inode.
Patch 2/2 fixes the issue by moving up ext4_set_inode_flags() before
ext4_set_aops() in ext4_iget(). This assumes updated i_flags is harmless
in the error cases after the moved-up ext4_set_inode_flags(). Please
review.
---
Toshi Kani (2):
1/2 ext4, dax: update dax check to skip journal inode
2/2 ext4, dax: set ext4_dax_aops for dax files
---
fs/ext4/ext4_jbd2.h | 8 ++++++++
fs/ext4/inode.c | 5 ++++-
2 files changed, 12 insertions(+), 1 deletion(-)
2 years, 4 months
[PATCH v4 0/2] ext4: fix DAX dma vs truncate/hole-punch
by Ross Zwisler
Changes since v3:
* Added an ext4_break_layouts() call to ext4_insert_range() to ensure
that the {ext4,xfs}_break_layouts() calls have the same meaning.
(Dave, Darrick and Jan)
---
This series from Dan:
https://lists.01.org/pipermail/linux-nvdimm/2018-March/014913.html
added synchronization between DAX dma and truncate/hole-punch in XFS.
This short series adds analogous support to ext4.
I've added calls to ext4_break_layouts() everywhere that ext4 removes
blocks from an inode's map.
The timings in XFS are such that it's difficult to hit this race. Dan
was able to show the race by manually introducing delays in the direct
I/O path.
For ext4, though, it's trivial to hit this race, and a hit will trigger
this WARN_ON_ONCE() in dax_disassociate_entry():
WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
I've made an xfstest which tests all the paths where we now call
ext4_break_layouts(). Each of the four paths easily hits this race many
times in my test setup with the xfstest. You can find that test here:
https://lists.01.org/pipermail/linux-nvdimm/2018-June/016435.html
With these patches applied, I've still seen occasional hits of the above
WARN_ON_ONCE(), which tells me that we still have some work to do. I'll
continue looking at these more rare hits.
Ross Zwisler (2):
dax: dax_layout_busy_page() warn on !exceptional
ext4: handle layout changes to pinned DAX mappings
fs/dax.c | 10 +++++++++-
fs/ext4/ext4.h | 1 +
fs/ext4/extents.c | 17 +++++++++++++++++
fs/ext4/inode.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/truncate.h | 4 ++++
5 files changed, 77 insertions(+), 1 deletion(-)
--
2.14.4
2 years, 4 months
open sets ext4_da_aops for DAX existing files
by Kani, Toshi
I noticed that both ext4_da_aops and ext4_dax_aops are used on DAX
mounted ext4 files. Looking at open() path:
New file
--------
lookup_open
  ext4_create
    __ext4_new_inode
      ext4_set_inode_flags   // Set S_DAX flag
    ext4_set_aops            // Set aops to ext4_dax_aops
Existing file
-------------
lookup_open
  ext4_lookup
    ext4_iget
      ext4_set_aops          // Set aops to ext4_da_aops
      ext4_set_inode_flags   // Set S_DAX flag
So, we set ext4_da_aops for existing files since S_DAX flag is set after
ext4_set_aops().
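To make the ordering problem concrete, here is a minimal userspace sketch. It is not ext4 code: the toy_* types and functions are hypothetical stand-ins, and the real ext4_set_aops() also takes the journalling mode and mount options into account. The only point is that the ops table is chosen from whatever the DAX flag is at the moment of the call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are the real ext4 types. */
enum toy_aops { TOY_AOPS_UNSET, TOY_DA_AOPS, TOY_DAX_AOPS };

struct toy_inode {
	bool s_dax;          /* models the S_DAX bit in i_flags */
	enum toy_aops aops;  /* models inode->i_mapping->a_ops */
};

/* Models ext4_set_inode_flags(): sets S_DAX on a DAX mount. */
static void toy_set_inode_flags(struct toy_inode *inode, bool dax_mount)
{
	inode->s_dax = dax_mount;
}

/* Models ext4_set_aops(): picks the table from the flag as it is now. */
static void toy_set_aops(struct toy_inode *inode)
{
	inode->aops = inode->s_dax ? TOY_DAX_AOPS : TOY_DA_AOPS;
}

/* Existing-file path as described above: aops chosen before S_DAX is set. */
static enum toy_aops toy_iget_current(bool dax_mount)
{
	struct toy_inode inode = { .s_dax = false, .aops = TOY_AOPS_UNSET };

	toy_set_aops(&inode);
	toy_set_inode_flags(&inode, dax_mount);
	return inode.aops;
}

/* Same path with the two calls swapped. */
static enum toy_aops toy_iget_reordered(bool dax_mount)
{
	struct toy_inode inode = { .s_dax = false, .aops = TOY_AOPS_UNSET };

	toy_set_inode_flags(&inode, dax_mount);
	toy_set_aops(&inode);
	return inode.aops;
}
```

In this model, toy_iget_current(true) ends up with the delayed-allocation table even though the flag is set afterwards, while toy_iget_reordered(true) picks the DAX table.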
Thanks,
-Toshi
2 years, 4 months
[PATCH v2 1/2] ext4: Close race between direct IO and ext4_break_layouts()
by Dave Jiang
From: Ross Zwisler <zwisler(a)kernel.org>
If the refcount of a page is lowered between the time that it is returned
by dax_busy_page() and when the refcount is again checked in
ext4_break_layouts() => ___wait_var_event(), the waiting function
ext4_wait_dax_page() will never be called. This means that
ext4_break_layouts() will still have 'retry' set to false, so we'll stop
looping and never check the refcount of other pages in this inode.
Instead, always continue looping as long as dax_layout_busy_page() gives us
a page which it found with an elevated refcount.
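The race can be illustrated with a small single-threaded userspace model. This is only a sketch under stated assumptions: the toy_* names are hypothetical, locking and signal handling are left out, and the "race" is simulated by dropping the pin right after the lookup returns. As in ___wait_var_event(), the wait callback runs only if the condition is still false on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

struct toy_mapping {
	int refcount[TOY_NPAGES];  /* 1 == idle, >1 == pinned (e.g. for DMA) */
	bool race_on_first_lookup; /* drop the pin right after the first lookup */
};

/* Models dax_layout_busy_page() plus the racing unpin. */
static int toy_lookup_busy(struct toy_mapping *m)
{
	for (int i = 0; i < TOY_NPAGES; i++) {
		if (m->refcount[i] > 1) {
			if (m->race_on_first_lookup) {
				m->refcount[i] = 1;
				m->race_on_first_lookup = false;
			}
			return i;
		}
	}
	return -1;
}

/* Models the wait: the callback body runs only while the page is pinned. */
static void toy_wait_idle(struct toy_mapping *m, int page, bool *did_wait)
{
	while (m->refcount[page] != 1) {
		if (did_wait != NULL)
			*did_wait = true;
		m->refcount[page] = 1; /* pretend the pin holder finished */
	}
}

static int toy_count_busy(const struct toy_mapping *m)
{
	int busy = 0;

	for (int i = 0; i < TOY_NPAGES; i++)
		busy += m->refcount[i] > 1;
	return busy;
}

/* Old loop: stops as soon as one wait turns out to be a no-op.
 * Returns how many pages are still pinned on return. */
static int toy_break_layouts_old(struct toy_mapping *m)
{
	bool retry;

	do {
		retry = false;
		int page = toy_lookup_busy(m);

		if (page < 0)
			return 0;
		toy_wait_idle(m, page, &retry);
	} while (retry);
	return toy_count_busy(m);
}

/* New loop: keeps going until the lookup finds no busy page. */
static int toy_break_layouts_new(struct toy_mapping *m)
{
	for (;;) {
		int page = toy_lookup_busy(m);

		if (page < 0)
			return 0;
		toy_wait_idle(m, page, NULL);
	}
}
```

With two pinned pages and the race on the first one, the old loop returns while the second page is still pinned; the new loop does not.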
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
---
v2:
- remove verbiage in comment header (Jan)
fs/ext4/inode.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8f6ad7667974..d2663a1e3ec2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4191,9 +4191,8 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 	return 0;
 }
 
-static void ext4_wait_dax_page(struct ext4_inode_info *ei, bool *did_unlock)
+static void ext4_wait_dax_page(struct ext4_inode_info *ei)
 {
-	*did_unlock = true;
 	up_write(&ei->i_mmap_sem);
 	schedule();
 	down_write(&ei->i_mmap_sem);
@@ -4203,14 +4202,12 @@ int ext4_break_layouts(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct page *page;
-	bool retry;
 	int error;
 
 	if (WARN_ON_ONCE(!rwsem_is_locked(&ei->i_mmap_sem)))
 		return -EINVAL;
 
 	do {
-		retry = false;
 		page = dax_layout_busy_page(inode->i_mapping);
 		if (!page)
 			return 0;
@@ -4218,8 +4215,8 @@ int ext4_break_layouts(struct inode *inode)
 		error = ___wait_var_event(&page->_refcount,
 				atomic_read(&page->_refcount) == 1,
 				TASK_INTERRUPTIBLE, 0, 0,
-				ext4_wait_dax_page(ei, &retry));
-	} while (error == 0 && retry);
+				ext4_wait_dax_page(ei));
+	} while (error == 0);
 
 	return error;
 }
2 years, 4 months
[PATCH 0/3] libnvdimm: reset seeds for next namespace creation
by Ocean He
From: Ocean He <hehy1(a)lenovo.com>
When two pmem namespaces that are each smaller than the section size are
created back to back, the second creation fails and the kernel emits a
call trace that comes from commit 15d36fecd0bdc7510b70 ("mm: disallow
mappings that conflict for devm_memremap_pages()").
------------[ cut here ]------------
nd_pmem pfn1.1: Conflicting mapping in same section
WARNING: CPU: 84 PID: 51974 at kernel/memremap.c:194 devm_memremap_pages+0x4a0/0x4e0
CPU: 84 PID: 51974 Comm: ndctl Kdump: loaded Tainted: G W E 4.19.0-rc2-23-default+ #27
RIP: 0010:devm_memremap_pages+0x4a0/0x4e0
Call Trace:
pmem_attach_disk+0x3ab/0x581 [nd_pmem]
nvdimm_bus_probe+0x69/0x150 [libnvdimm]
really_probe+0x262/0x3d0
driver_probe_device+0x60/0x120
bind_store+0x102/0x190
kernfs_fop_write+0x105/0x180
__vfs_write+0x36/0x1a0
? common_file_perm+0x47/0x130
? security_file_permission+0x2c/0xb0
vfs_write+0xad/0x1a0
ksys_write+0x52/0xc0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Here is an example (section size is 128MB) based on kernel 4.19-rc2.
# ndctl create-namespace -r region1 -s 100m -t pmem -m fsdax
{
"dev":"namespace1.0",
"mode":"fsdax",
"map":"dev",
"size":"96.00 MiB (100.66 MB)",
"uuid":"ef9a0556-a610-40b5-8c71-43991765a2cc",
"raw_uuid":"177b22e2-b7e8-482f-a063-2b8de876d979",
"sector_size":512,
"blockdev":"pmem1",
"numa_node":1
}
# ndctl create-namespace -r region1 -s 100m -t pmem -m fsdax
libndctl: ndctl_pfn_enable: pfn1.1: failed to enable
Error: namespace1.1: failed to enable
failed to create namespace: No such device or address
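The arithmetic behind the conflict can be sketched in a few lines of userspace C. The assumptions are mine, not from the patch set: the two namespaces are taken to be laid out back to back from a section-aligned base address, and namespace metadata and any extra alignment applied by the kernel are ignored. The toy_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MIB          (1024ULL * 1024ULL)
#define TOY_SECTION_SIZE (128ULL * TOY_MIB) /* section size from the example */

/* Index of the memory section that contains addr. */
static uint64_t toy_section_of(uint64_t addr)
{
	return addr / TOY_SECTION_SIZE;
}

/* True if the two byte ranges touch at least one common section. */
static bool toy_share_section(uint64_t start_a, uint64_t len_a,
			      uint64_t start_b, uint64_t len_b)
{
	assert(len_a > 0 && len_b > 0);

	uint64_t a_first = toy_section_of(start_a);
	uint64_t a_last = toy_section_of(start_a + len_a - 1);
	uint64_t b_first = toy_section_of(start_b);
	uint64_t b_last = toy_section_of(start_b + len_b - 1);

	return a_first <= b_last && b_first <= a_last;
}
```

With a hypothetical base of 4 GiB, two adjacent 100 MiB ranges both fall into the same 128 MiB section, while two adjacent 128 MiB ranges each occupy a section of their own.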
When the second creation fails as shown above, one would expect to be able
to destroy namespace1.0 and create a new namespace whose size is aligned
with the section size. However, both the namespace seed and the pfn seed
have been consumed, so the new namespace creation still fails.
# ndctl destroy-namespace namespace1.0 -f
destroyed 1 namespace
# ndctl create-namespace -r region1 -s 128m -t pmem -m fsdax
failed to create namespace: Device or resource busy
To ensure that pfn_seed/dax_seed and namespace_seed are always ready for
the next namespace creation, this patch set enables seed detach and reset.
Back to the example: with this patch set applied, the new namespace
creation no longer fails.
# ndctl destroy-namespace namespace1.0 -f
destroyed 1 namespace
# ndctl create-namespace -r region1 -s 128m -t pmem -m fsdax
{
"dev":"namespace1.0",
"mode":"fsdax",
"map":"dev",
"size":"124.00 MiB (130.02 MB)",
"uuid":"0d0e7506-d108-4a88-824a-edef26fd0399",
"raw_uuid":"efeb9647-12f5-44cd-8a52-2f3a0d14589a",
"sector_size":512,
"blockdev":"pmem1",
"numa_node":1
}
# ndctl create-namespace -r region1 -s 128m -t pmem -m fsdax
{
"dev":"namespace1.1",
"mode":"fsdax",
"map":"dev",
"size":130023424,
"uuid":"689828dc-8779-434d-8e93-0406d4e1e536",
"raw_uuid":"d86e1025-c224-48b6-b2a7-6ccef152d5fd",
"sector_size":512,
"blockdev":"pmem1.1",
"numa_node":1
}
The devdax mode (-m devdax) has the same issue, and this patch set covers
it as well.
Ocean He (3):
libnvdimm, claim: remove static attribute of nd_detach_and_reset
libnvdimm, namespace_devs: add function nd_region_reset_ns_seed for
namespace seed reset
libnvdimm, region_devs: reset related seeds when fail to create
namespace
drivers/nvdimm/claim.c | 2 +-
drivers/nvdimm/namespace_devs.c | 32 ++++++++++++++++++++++++++++++++
drivers/nvdimm/nd-core.h | 2 ++
drivers/nvdimm/region_devs.c | 34 ++++++++++++++++++++++++++++++++++
4 files changed, 69 insertions(+), 1 deletion(-)
--
1.8.3.1
2 years, 4 months
[PATCH v5 00/13] Copy Offload in NVMe Fabrics with P2P PCI Memory
by Logan Gunthorpe
Hi Everyone,
Now that the patchset which creates a command line option to disable
ACS redirection has landed, it's time to revisit the P2P patchset for
copy offload in NVMe fabrics.
I present version 5, which no longer does any magic with the ACS bits and
instead will reject P2P transactions between devices that would be affected
by them. A few other cleanups were done which are described in the
changelog below.
This version is based on v4.19-rc1 and a git repo is here:
https://github.com/sbates130272/linux-p2pmem pci-p2p-v5
Thanks,
Logan
--
Changes in v5:
* Rebased on v4.19-rc1
* Drop changing ACS settings in this patchset. Now, the code
will only allow P2P transactions between devices whose
downstream ports do not restrict P2P TLPs.
* Drop the REQ_PCI_P2PDMA block flag and instead use
is_pci_p2pdma_page() to tell if a request is P2P or not. In that
case we check for queue support and enforce using REQ_NOMERGE.
Per feedback from Christoph.
* Drop the pci_p2pdma_unmap_sg() function as it was empty and only
there for symmetry and compatibility with dma_unmap_sg. Per feedback
from Christoph.
* Split off the logic to handle enabling P2P in NVMe fabrics' configfs
into specific helpers in the p2pdma code. Per feedback from Christoph.
* A number of other minor cleanups and fixes as pointed out by
Christoph and others.
Changes in v4:
* Change the original upstream_bridges_match() function to
upstream_bridge_distance() which calculates the distance between two
devices as long as they are behind the same root port. This should
address Bjorn's concerns that the code was too focused on
being behind a single switch.
* The disable ACS function now disables ACS for all bridge ports instead
of switch ports (ie. those that had two upstream_bridge ports).
* Change the pci_p2pmem_alloc_sgl() and pci_p2pmem_free_sgl()
API to be more like sgl_alloc() in that the alloc function returns
the allocated scatterlist and nents is not required by the free
function.
* Moved the new documentation into the driver-api tree as requested
by Jonathan
* Add SGL alloc and free helpers in the nvmet code so that the
individual drivers can share the code that allocates P2P memory.
As requested by Christoph.
* Cleanup the nvmet_p2pmem_store() function as Christoph
thought my first attempt was ugly.
* Numerous commit message and comment fix-ups
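As an illustration of the distance idea mentioned in the v4 changes, the following userspace sketch counts hops between two devices in a parent-pointer tree. It is not the code from the series: struct toy_dev and toy_upstream_distance() are hypothetical, and the -1 return for "no common upstream device" is this sketch's own convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device node; upstream is NULL at the top of a hierarchy. */
struct toy_dev {
	const struct toy_dev *upstream;
};

/*
 * Number of hops from a up to the nearest common upstream device plus
 * the hops from b up to that same device, or -1 if the two devices do
 * not share any upstream device.
 */
static int toy_upstream_distance(const struct toy_dev *a,
				 const struct toy_dev *b)
{
	int dist_a = 0;

	for (const struct toy_dev *x = a; x != NULL; x = x->upstream, dist_a++) {
		int dist_b = 0;

		for (const struct toy_dev *y = b; y != NULL; y = y->upstream, dist_b++) {
			if (x == y)
				return dist_a + dist_b;
		}
	}
	return -1;
}
```

Two devices behind the same switch are two hops apart in this model, devices behind different switches under one root port are four hops apart, and devices under different root ports have no distance at all.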
Changes in v3:
* Many more fixes and minor cleanups that were spotted by Bjorn
* Additional explanation of the ACS change in both the commit message
and Kconfig doc. Also, the code that disables the ACS bits is surrounded
explicitly by an #ifdef
* Removed the flag we added to rdma_rw_ctx() in favour of using
is_pci_p2pdma_page(), as suggested by Sagi.
* Adjust pci_p2pmem_find() so that it prefers P2P providers that
are closest to (or the same as) the clients using them. In cases
of ties, the provider is randomly chosen.
* Modify the NVMe Target code so that the PCI device name of the provider
may be explicitly specified, bypassing the logic in pci_p2pmem_find().
(Note: it's still enforced that the provider must be behind the
same switch as the clients).
* As requested by Bjorn, added documentation for driver writers.
Changes in v2:
* Renamed everything to 'p2pdma' per the suggestion from Bjorn as well
as a bunch of cleanup and spelling fixes he pointed out in the last
series.
* To address Alex's ACS concerns, we change to a simpler method of
just disabling ACS behind switches for any kernel that has
CONFIG_PCI_P2PDMA.
* We also reject using devices that employ 'dma_virt_ops' which should
fairly simply handle Jason's concerns that this work might break with
the HFI, QIB and rxe drivers that use the virtual ops to implement
their own special DMA operations.
--
This is a continuation of our work to enable using Peer-to-Peer PCI
memory in the kernel with initial support for the NVMe fabrics target
subsystem. Many thanks go to Christoph Hellwig who provided valuable
feedback to get these patches to where they are today.
The concept here is to use memory that's exposed on a PCI BAR as
data buffers in the NVMe target code such that data can be transferred
from an RDMA NIC to the special memory and then directly to an NVMe
device avoiding system memory entirely. The upside of this is better
QoS for applications running on the CPU utilizing memory and lower
PCI bandwidth required to the CPU (such that systems could be designed
with fewer lanes connected to the CPU).
Due to these trade-offs we've designed the system to only enable using
the PCI memory in cases where the NIC, NVMe devices and memory are all
behind the same PCI switch hierarchy. This will mean many setups that
could likely work well will not be supported so that we can be more
confident it will work and not place any responsibility on the user to
understand their topology. (We chose to go this route based on feedback
we received at the last LSF). Future work may enable these transfers
using a white list of known good root complexes. However, at this time,
there is no reliable way to ensure that Peer-to-Peer transactions are
permitted between PCI Root Ports.
In order to enable this functionality, we introduce a few new PCI
functions such that a driver can register P2P memory with the system.
Struct pages are created for this memory using devm_memremap_pages()
and the PCI bus offset is stored in the corresponding pagemap structure.
When the PCI P2PDMA config option is selected the ACS bits in every
bridge port in the system are turned off to allow traffic to
pass freely behind the root port. At this time, the bit must be disabled
at boot so the IOMMU subsystem can correctly create the groups, though
this could be addressed in the future. There is no way to dynamically
disable the bit and alter the groups.
Another set of functions allows a client driver to create a list of
client devices that will be used in a given P2P transaction and then
use that list to find any P2P memory that is supported by all the
client devices.
In the block layer, we also introduce a P2P request flag to indicate a
given request targets P2P memory as well as a flag for a request queue
to indicate a given queue supports targeting P2P memory. P2P requests
will only be accepted by queues that support it. Also, P2P requests
are marked to not be merged, since a non-homogeneous request would
complicate the DMA mapping requirements.
In the PCI NVMe driver, we modify the existing CMB support to utilize
the new PCI P2P memory infrastructure and also add support for P2P
memory in its request queue. When a P2P request is received it uses the
pci_p2pmem_map_sg() function which applies the necessary transformation
to get the correct pci_bus_addr_t for the DMA transactions.
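The kind of transformation involved can be sketched as plain address arithmetic. This is an illustrative userspace model only: struct toy_bar and both helpers are hypothetical, and the sign convention for the offset (CPU address minus bus address) is an assumption of this sketch rather than something taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one BAR: where the CPU sees it and where
 * devices on the PCI bus see it. Not a kernel structure. */
struct toy_bar {
	uint64_t cpu_start; /* CPU physical address of the BAR */
	uint64_t bus_start; /* PCI bus address of the same BAR */
};

/* Offset recorded once when the memory is registered. Unsigned
 * wraparound is intentional so either address may be the larger one. */
static uint64_t toy_bus_offset(const struct toy_bar *bar)
{
	return bar->cpu_start - bar->bus_start;
}

/* Translate a CPU physical address inside the BAR to a bus address. */
static uint64_t toy_phys_to_bus(uint64_t phys, uint64_t bus_offset)
{
	return phys - bus_offset;
}
```

When the CPU and bus views of the BAR coincide the offset is zero and the translation is the identity; otherwise every address inside the BAR is shifted by the same amount.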
In the RDMA core, we also adjust rdma_rw_ctx_init() and
rdma_rw_ctx_destroy() to take a flags argument which indicates whether
to use the PCI P2P mapping functions or not. To avoid odd RDMA devices
that don't use the proper DMA infrastructure this code rejects using
any device that employs the virt_dma_ops implementation.
Finally, in the NVMe fabrics target port we introduce a new
configuration boolean: 'allow_p2pmem'. When set, the port will attempt
to find P2P memory supported by the RDMA NIC and all namespaces. If
supported memory is found, it will be used in all IO transfers. And if
a port is using P2P memory, adding new namespaces that are not supported
by that memory will fail.
These patches have been tested on a number of Intel-based systems and
for a variety of RDMA NICs (Mellanox, Broadcom, Chelsio) and NVMe
SSDs (Intel, Seagate, Samsung) and p2pdma devices (Eideticom,
Microsemi, Chelsio and Everspin) using switches from both Microsemi
and Broadcom.
Logan Gunthorpe (13):
PCI/P2PDMA: Support peer-to-peer memory
PCI/P2PDMA: Add sysfs group to display p2pmem stats
PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
docs-rst: Add a new directory for PCI documentation
PCI/P2PDMA: Add P2P DMA driver writer's documentation
block: Add PCI P2P flag for request queue and check support for
requests
IB/core: Ensure we map P2P memory correctly in
rdma_rw_ctx_[init|destroy]()
nvme-pci: Use PCI p2pmem subsystem to manage the CMB
nvme-pci: Add support for P2P memory in requests
nvme-pci: Add a quirk for a pseudo CMB
nvmet: Introduce helper functions to allocate and free request SGLs
nvmet: Optionally use PCI P2P memory
Documentation/ABI/testing/sysfs-bus-pci | 25 +
Documentation/driver-api/index.rst | 2 +-
Documentation/driver-api/pci/index.rst | 21 +
Documentation/driver-api/pci/p2pdma.rst | 170 ++++++
Documentation/driver-api/{ => pci}/pci.rst | 0
block/blk-core.c | 14 +
drivers/infiniband/core/rw.c | 11 +-
drivers/nvme/host/core.c | 4 +
drivers/nvme/host/nvme.h | 8 +
drivers/nvme/host/pci.c | 121 ++--
drivers/nvme/target/configfs.c | 36 ++
drivers/nvme/target/core.c | 149 +++++
drivers/nvme/target/nvmet.h | 15 +
drivers/nvme/target/rdma.c | 22 +-
drivers/pci/Kconfig | 17 +
drivers/pci/Makefile | 1 +
drivers/pci/p2pdma.c | 941 +++++++++++++++++++++++++++++
include/linux/blkdev.h | 3 +
include/linux/memremap.h | 6 +
include/linux/mm.h | 18 +
include/linux/pci-p2pdma.h | 124 ++++
include/linux/pci.h | 4 +
22 files changed, 1658 insertions(+), 54 deletions(-)
create mode 100644 Documentation/driver-api/pci/index.rst
create mode 100644 Documentation/driver-api/pci/p2pdma.rst
rename Documentation/driver-api/{ => pci}/pci.rst (100%)
create mode 100644 drivers/pci/p2pdma.c
create mode 100644 include/linux/pci-p2pdma.h
--
2.11.0
2 years, 4 months
[PATCH] device-dax: avoid hang on error before devm_memremap_pages()
by Stefan Hajnoczi
dax_pmem_percpu_exit() waits for dax_pmem_percpu_release() to invoke the
dax_pmem->cmp completion. Unfortunately this approach to cleaning up
the percpu_ref only works after devm_memremap_pages() was successful.
If devm_add_action_or_reset() or devm_memremap_pages() fails,
dax_pmem_percpu_release() is not invoked. Therefore
dax_pmem_percpu_exit() hangs waiting for the completion:
	rc = devm_add_action_or_reset(dev, dax_pmem_percpu_exit,
				      &dax_pmem->ref);
	if (rc)
		return rc;
	dax_pmem->pgmap.ref = &dax_pmem->ref;
	addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
Avoid the hang by calling percpu_ref_exit() in the error paths instead
of going through dax_pmem_percpu_exit().
Signed-off-by: Stefan Hajnoczi <stefanha(a)redhat.com>
---
Found by code inspection. Compile-tested only.
---
drivers/dax/pmem.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index fd49b24fd6af..99e2aace8078 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -105,15 +105,19 @@ static int dax_pmem_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	rc = devm_add_action_or_reset(dev, dax_pmem_percpu_exit,
-							&dax_pmem->ref);
-	if (rc)
+	rc = devm_add_action(dev, dax_pmem_percpu_exit, &dax_pmem->ref);
+	if (rc) {
+		percpu_ref_exit(&dax_pmem->ref);
 		return rc;
+	}
 
 	dax_pmem->pgmap.ref = &dax_pmem->ref;
 	addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
-	if (IS_ERR(addr))
+	if (IS_ERR(addr)) {
+		devm_remove_action(dev, dax_pmem_percpu_exit, &dax_pmem->ref);
+		percpu_ref_exit(&dax_pmem->ref);
 		return PTR_ERR(addr);
+	}
 
 	rc = devm_add_action_or_reset(dev, dax_pmem_percpu_kill,
 							&dax_pmem->ref);
--
2.17.1
2 years, 4 months
[PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE
by Dan Williams
Changes since v1 [1]:
* Teach memmap_sync() to take over a sub-set of memmap initialization in
the foreground. This foreground work still needs to await the
completion of vmemmap_populate_hugepages(), but it will otherwise
steal 1/1024th of the 'struct page' init work for the given range.
(Jan)
* Add kernel-doc for all the new 'async' structures.
* Split foreach_order_pgoff() to its own patch.
* Add Pavel and Daniel to the cc as they have been active in the memory
hotplug code.
* Fix a typo that prevented CONFIG_DAX_DRIVER_DEBUG=y from performing
early pfn retrieval at dax-filesystem mount time.
* Improve some of the changelogs
[1]: https://lwn.net/Articles/759117/
---
In order to keep pfn_to_page() a simple offset calculation the 'struct
page' memmap needs to be mapped and initialized in advance of any usage
of a page. This poses a problem for large memory systems as it delays
full availability of memory resources for 10s to 100s of seconds.
For typical 'System RAM' the problem is mitigated by the fact that large
memory allocations tend to happen after the kernel has fully initialized
and userspace services / applications are launched. A small amount, 2GB
of memory, is initialized up front. The remainder is initialized in the
background and freed to the page allocator over time.
Unfortunately, that scheme is not directly reusable for persistent
memory and dax because userspace has visibility to the entire resource
pool and can choose to access any offset directly at its choosing. In
other words there is no allocator indirection where the kernel can
satisfy requests with arbitrary pages as they become initialized.
That said, we can approximate the optimization by performing the
initialization in the background, allow the kernel to fully boot the
platform, start up pmem block devices, mount filesystems in dax mode,
and only incur delay at the first userspace dax fault. When that initial
fault occurs that process is delegated a portion of the memmap to
initialize in the foreground so that it need not wait for initialization
of resources that it does not immediately need.
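A single-threaded toy model may help picture the split between background and foreground work. Everything here is an assumption for illustration: the toy_* names, the fixed 1024-way slicing (borrowed from the 1/1024th figure in the changelog), and the absence of any real threading or locking. It does not reflect how the series actually implements memmap_sync().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_SLICES 1024u

/* Hypothetical bookkeeping for one device range of nr_pages pages. */
struct toy_memmap {
	uint64_t nr_pages;
	bool slice_done[TOY_NR_SLICES];
};

static uint64_t toy_pages_per_slice(const struct toy_memmap *m)
{
	return (m->nr_pages + TOY_NR_SLICES - 1) / TOY_NR_SLICES;
}

/*
 * Foreground path: the first fault on page offset pgoff initialises only
 * the slice that contains it. Returns how many pages the faulting task
 * had to initialise itself (0 if that slice was already done).
 */
static uint64_t toy_fault(struct toy_memmap *m, uint64_t pgoff)
{
	assert(pgoff < m->nr_pages);

	uint64_t per_slice = toy_pages_per_slice(m);
	uint64_t slice = pgoff / per_slice;

	if (m->slice_done[slice])
		return 0;
	m->slice_done[slice] = true;

	uint64_t start = slice * per_slice;
	uint64_t end = start + per_slice;

	if (end > m->nr_pages)
		end = m->nr_pages;
	return end - start;
}

/* Background path: initialise the next pending slice, if any is left. */
static bool toy_background_step(struct toy_memmap *m)
{
	uint64_t per_slice = toy_pages_per_slice(m);

	for (uint64_t s = 0; s < TOY_NR_SLICES; s++) {
		if (s * per_slice >= m->nr_pages)
			break;
		if (!m->slice_done[s]) {
			m->slice_done[s] = true;
			return true;
		}
	}
	return false;
}
```

In this model a faulting task pays for at most one slice, and pays nothing if the background pass has already covered the slice it touches.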
With this change an 8 socket system was observed to initialize pmem
namespaces in ~4 seconds whereas it was previously taking ~4 minutes.
These patches apply on top of the HMM + devm_memremap_pages() reworks:
https://marc.info/?l=linux-mm&m=153128668008585&w=2
---
Dan Williams (10):
mm: Plumb dev_pagemap instead of vmem_altmap to memmap_init_zone()
mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages()
mm: Teach memmap_init_zone() to initialize ZONE_DEVICE pages
mm: Multithread ZONE_DEVICE initialization
mm, memremap: Up-level foreach_order_pgoff()
mm: Allow an external agent to coordinate memmap initialization
filesystem-dax: Make mount time pfn validation a debug check
libnvdimm, pmem: Initialize the memmap in the background
device-dax: Initialize the memmap in the background
libnvdimm, namespace: Publish page structure init state / control
Huaisheng Ye (4):
libnvdimm, pmem: Allow a NULL-pfn to ->direct_access()
tools/testing/nvdimm: Allow a NULL-pfn to ->direct_access()
s390, dcssblk: Allow a NULL-pfn to ->direct_access()
filesystem-dax: Do not request a pfn when not required
arch/ia64/mm/init.c | 5 +
arch/powerpc/mm/mem.c | 5 +
arch/s390/mm/init.c | 8 +
arch/sh/mm/init.c | 5 +
arch/x86/mm/init_32.c | 8 +
arch/x86/mm/init_64.c | 27 ++--
drivers/dax/Kconfig | 10 +
drivers/dax/dax-private.h | 2
drivers/dax/device-dax.h | 2
drivers/dax/device.c | 16 ++
drivers/dax/pmem.c | 5 +
drivers/dax/super.c | 64 ++++++---
drivers/nvdimm/nd.h | 2
drivers/nvdimm/pfn_devs.c | 50 +++++--
drivers/nvdimm/pmem.c | 17 ++
drivers/nvdimm/pmem.h | 1
drivers/s390/block/dcssblk.c | 5 -
fs/dax.c | 10 -
include/linux/memmap_async.h | 110 ++++++++++++++++
include/linux/memory_hotplug.h | 18 ++-
include/linux/memremap.h | 31 ++++
include/linux/mm.h | 8 +
kernel/memremap.c | 85 ++++++------
mm/memory_hotplug.c | 73 ++++++++---
mm/page_alloc.c | 271 +++++++++++++++++++++++++++++++++++----
mm/sparse-vmemmap.c | 56 ++++++--
tools/testing/nvdimm/pmem-dax.c | 11 +-
27 files changed, 717 insertions(+), 188 deletions(-)
create mode 100644 include/linux/memmap_async.h
2 years, 4 months