[PATCH v1] libnvdimm, dax: Fix a missing check in nd_dax_probe()
by wangyingjie55@126.com
From: Yingjie Wang <wangyingjie55(a)126.com>
In nd_dax_probe(), 'nd_dax' is allocated by nd_dax_alloc().
nd_dax_alloc() may fail and return NULL, so check its return value
to avoid a NULL pointer dereference later in the code, dropping the
nvdimm bus lock before returning on the error path.
Fixes: c5ed9268643c ("libnvdimm, dax: autodetect support")
Signed-off-by: Yingjie Wang <wangyingjie55(a)126.com>
---
drivers/nvdimm/dax_devs.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 99965077bac4..b1426ac03f01 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -106,6 +106,10 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
 
 	nvdimm_bus_lock(&ndns->dev);
 	nd_dax = nd_dax_alloc(nd_region);
+	if (!nd_dax) {
+		nvdimm_bus_unlock(&ndns->dev);
+		return -ENOMEM;
+	}
 	nd_pfn = &nd_dax->nd_pfn;
 	dax_dev = nd_pfn_devinit(nd_pfn, ndns);
 	nvdimm_bus_unlock(&ndns->dev);
--
2.7.4
[PATCH v3 00/11] fsdax: introduce fs query to support reflink
by Shiyang Ruan
This patchset is aimed to support shared pages tracking for fsdax.
Changes from V2:
- Split the 8th patch into several related patches to make review easier
- Other small fixes
Changes from V1:
- Add the old memory-failure handler back for rolling back
- Add a callback in MD's ->rmap() to support multiple mappings of a dm device
- Add judgement for CONFIG_SYSFS
- Add pfn_valid() judgement in hwpoison_filter()
- Rebased to v5.11-rc5
This patchset moves owner tracking from dax_associate_entry() to the
pmem device driver, by introducing an interface ->memory_failure() in
struct dev_pagemap_ops. This interface is called by memory_failure()
in mm and implemented by the pmem device, which then calls its
->corrupted_range() to find the filesystem in which the corrupted data
is located, and calls the filesystem handler to track the files or
metadata associated with this page. Finally we are able to try to fix
the corrupted data in the filesystem and do other necessary processing,
such as killing the processes that are using the affected files.
The call trace is like this:
memory_failure()
 pgmap->ops->memory_failure()       => pmem_pgmap_memory_failure()
  gendisk->fops->corrupted_range()  => - pmem_corrupted_range()
                                       - md_blk_corrupted_range()
   sb->s_ops->corrupted_range()     => xfs_fs_corrupted_range()
    xfs_rmap_query_range()
     xfs_corrupt_helper()
      * corrupted on metadata:
          try to recover data, call xfs_force_shutdown()
      * corrupted on file data:
          try to recover data, call mf_dax_mapping_kill_procs()
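To make the first hop of that chain concrete, here is a minimal sketch
of how the pmem driver could wire up the new callback. The
->memory_failure() signature matches patch 1; the body of
pmem_pgmap_memory_failure() and the exact pmem_corrupted_range()
arguments are my illustration, not the series' code:

static int pmem_pgmap_memory_failure(struct dev_pagemap *pgmap,
		unsigned long pfn, int flags)
{
	struct pmem_device *pmem =
		container_of(pgmap, struct pmem_device, pgmap);
	/* translate the failing pfn into an offset on the pmem device */
	loff_t offset = PFN_PHYS(pfn) - pmem->phys_addr - pmem->data_offset;

	/* hand the bad range up the stack shown above */
	return pmem_corrupted_range(pmem, offset, PAGE_SIZE, flags);
}

static const struct dev_pagemap_ops pmem_pagemap_ops = {
	.memory_failure = pmem_pgmap_memory_failure,
};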
The fsdax & reflink support for XFS is not contained in this patchset.
(Rebased on v5.11-rc5)
==
Shiyang Ruan (11):
pagemap: Introduce ->memory_failure()
blk: Introduce ->corrupted_range() for block device
fs: Introduce ->corrupted_range() for superblock
block_dev: Introduce bd_corrupted_range()
mm, fsdax: Refactor memory-failure handler for dax mapping
mm, pmem: Implement ->memory_failure() in pmem driver
pmem: Implement ->corrupted_range() for pmem driver
dm: Introduce ->rmap() to find bdev offset
md: Implement ->corrupted_range()
xfs: Implement ->corrupted_range() for XFS
fs/dax: Remove useless functions
block/genhd.c | 6 ++
drivers/md/dm-linear.c | 20 ++++
drivers/md/dm.c | 61 +++++++++++
drivers/nvdimm/pmem.c | 45 ++++++++
fs/block_dev.c | 47 ++++++++-
fs/dax.c | 63 ++++-------
fs/xfs/xfs_fsops.c | 5 +
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 112 ++++++++++++++++++++
include/linux/blkdev.h | 2 +
include/linux/dax.h | 1 +
include/linux/device-mapper.h | 5 +
include/linux/fs.h | 2 +
include/linux/genhd.h | 3 +
include/linux/memremap.h | 8 ++
include/linux/mm.h | 9 ++
mm/memory-failure.c | 190 +++++++++++++++++++++++-----------
17 files changed, 475 insertions(+), 105 deletions(-)
--
2.30.0
[PATCH v2 00/10] fsdax,xfs: Add reflink&dedupe support for fsdax
by Shiyang Ruan
This patchset attempts to add CoW support for fsdax, taking XFS,
which has both the reflink and fsdax features, as an example.
Changes from V1:
- Factor some helper functions to simplify dax fault code
- Introduce iomap_apply2() for dax_dedupe_file_range_compare()
- Fix mistakes and other problems
- Rebased on v5.11
One of the key mechanisms that fsdax needs is CoW: copy the data from
the srcmap before actually writing data to the destination iomap, and
copy only the ranges in which the data won't be overwritten.
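As a rough illustration of that copy-around step (my sketch with
illustrative names; the series' real helper is dax_iomap_cow_copy()):
when a write lands mid-block in a newly allocated destination extent,
the unmodified head and tail must come from the source extent.

/*
 * daddr/saddr: kernel mappings of the destination/source extents;
 * pos/len: the write, relative to a block of size blksz.
 * Sketch only -- alignment and error handling omitted.
 */
static void dax_cow_copy_around(void *daddr, const void *saddr,
		loff_t pos, size_t len, size_t blksz)
{
	size_t head = pos & (blksz - 1);	/* untouched bytes before the write */
	size_t tail = blksz - head - len;	/* untouched bytes after the write */

	if (head)
		memcpy(daddr, saddr, head);
	if (tail)
		memcpy(daddr + head + len, saddr + head + len, tail);
}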
Another mechanism is range comparison. In the page cache case,
readpage() is used to load on-disk data into the page cache so that it
can be compared. In the fsdax case, readpage() does not work, so we
need another way to compare data, one with direct access support.
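A sketch of what such a comparison could look like, using
dax_direct_access() to get kernel addresses for both ranges (assuming,
for brevity, that each range fits in a single contiguous mapping):

/* Illustrative only: memcmp() two fsdax ranges via direct access. */
static int dax_range_compare(struct dax_device *dax_dev, pgoff_t pgoff1,
		pgoff_t pgoff2, size_t len, bool *is_same)
{
	void *kaddr1, *kaddr2;
	long ret;

	ret = dax_direct_access(dax_dev, pgoff1, PHYS_PFN(len), &kaddr1, NULL);
	if (ret < 0)
		return ret;
	ret = dax_direct_access(dax_dev, pgoff2, PHYS_PFN(len), &kaddr2, NULL);
	if (ret < 0)
		return ret;

	*is_same = !memcmp(kaddr1, kaddr2, len);
	return 0;
}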
With these two mechanisms implemented in fsdax, we are able to make
reflink and fsdax work together in XFS.
Some of the patches are picked up from Goldwyn's patchset. I made some
changes to adapt to this patchset.
(Rebased on v5.11)
==
Shiyang Ruan (10):
fsdax: Factor helpers to simplify dax fault code
fsdax: Factor helper: dax_fault_actor()
fsdax: Output address in dax_iomap_pfn() and rename it
fsdax: Introduce dax_iomap_cow_copy()
fsdax: Replace mmap entry in case of CoW
fsdax: Add dax_iomap_cow_copy() for dax_iomap_zero
iomap: Introduce iomap_apply2() for operations on two files
fsdax: Dedup file range to use a compare function
fs/xfs: Handle CoW for fsdax write() path
fs/xfs: Add dedupe support for fsdax
fs/dax.c | 532 +++++++++++++++++++++++++++--------------
fs/iomap/apply.c | 51 ++++
fs/iomap/buffered-io.c | 2 +-
fs/remap_range.c | 45 +++-
fs/xfs/xfs_bmap_util.c | 3 +-
fs/xfs/xfs_file.c | 29 ++-
fs/xfs/xfs_inode.c | 8 +-
fs/xfs/xfs_inode.h | 1 +
fs/xfs/xfs_iomap.c | 30 ++-
fs/xfs/xfs_iomap.h | 1 +
fs/xfs/xfs_iops.c | 11 +-
fs/xfs/xfs_reflink.c | 16 +-
include/linux/dax.h | 7 +-
include/linux/fs.h | 15 +-
include/linux/iomap.h | 7 +-
15 files changed, 550 insertions(+), 208 deletions(-)
--
2.30.1
[PATCH v2 0/4] Remove nrexceptional tracking
by Matthew Wilcox (Oracle)
We actually use nrexceptional for very little these days. It's a minor
pain to keep in sync with nrpages, but the pain becomes much bigger
with the THP patches because we don't know how many indices a shadow
entry occupies. It's easier to just remove it than keep it accurate.
Also, we save 8 bytes per inode which is nothing to sneeze at; on my
laptop, it would improve shmem_inode_cache from 22 to 23 objects per
16kB, and inode_cache from 26 to 27 objects. Combined, that saves
a megabyte of memory from a combined usage of 25MB for both caches.
Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
any memory for ext4.
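For reference, with the counter gone the emptiness question can be put
directly to the xarray; a minimal sketch of the mapping_empty() helper
the first patch introduces (placement in pagemap.h assumed):

static inline bool mapping_empty(struct address_space *mapping)
{
	/* the xarray tracks pages, shadow and DAX entries uniformly */
	return xa_empty(&mapping->i_pages);
}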
Matthew Wilcox (Oracle) (4):
mm: Introduce and use mapping_empty
mm: Stop accounting shadow entries
dax: Account DAX entries as nrpages
mm: Remove nrexceptional from inode
fs/block_dev.c | 2 +-
fs/dax.c | 8 ++++----
fs/gfs2/glock.c | 3 +--
fs/inode.c | 2 +-
include/linux/fs.h | 2 --
include/linux/pagemap.h | 5 +++++
mm/filemap.c | 16 ----------------
mm/swap_state.c | 4 ----
mm/truncate.c | 19 +++----------------
mm/workingset.c | 1 -
10 files changed, 15 insertions(+), 47 deletions(-)
--
2.28.0
Re: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
by Dan Williams
On Mon, Mar 8, 2021 at 3:34 AM ruansy.fnst(a)fujitsu.com
<ruansy.fnst(a)fujitsu.com> wrote:
> > > > > 1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > > > > index 79c49e7f5c30..0bcf2b1e20bd 100644
> > > > > --- a/include/linux/memremap.h
> > > > > +++ b/include/linux/memremap.h
> > > > > @@ -87,6 +87,14 @@ struct dev_pagemap_ops {
> > > > > * the page back to a CPU accessible page.
> > > > > */
> > > > > vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
> > > > > +
> > > > > + /*
> > > > > + * Handle a memory failure that happens on one page. Notify the processes
> > > > > + * that are using this page, and try to recover the data on this page
> > > > > + * if necessary.
> > > > > + */
> > > > > + int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
> > > > > + int flags);
> > > > > };
> > > >
> > > > After the conversation with Dave I don't see the point of this. If
> > > > there is a memory_failure() on a page, why not just call
> > > > memory_failure()? That already knows how to find the inode and the
> > > > filesystem can be notified from there.
> > >
> > > We want memory_failure() to support reflinked files. In this case, we
> > > are not able to track multiple files from a page (this broken page)
> > > because page->mapping, page->index can only track one file. Thus, I
> > > introduce this ->memory_failure(), implemented in the pmem driver, to
> > > call ->corrupted_range() level by level up the stack, and finally find
> > > out the files that are using (mmapping) this page.
> > >
> >
> > I know the motivation, but this implementation seems backwards. It's
> > already the case that memory_failure() looks up the address_space
> > associated with a mapping. From there I would expect a new 'struct
> > address_space_operations' op to let the fs handle the case when there
> > are multiple address_spaces associated with a given file.
> >
>
> Let me think about it. In this way, we
> 1. associate the file mapping with the dax page in the dax page fault path;
I think this needs to be a new type of association that proxies the
representation of the reflink across all involved address_spaces.
> 2. iterate over the reflinked files to send the `kill processes` signal
> via the new address_space_operation;
> 3. re-associate with another reflinked file mapping when unmapping
> (rmap query in the filesystem to get the other file).
Perhaps the proxy object is reference counted per reflink. It seems
error prone to keep changing the association of the pfn while the
reflink is intact.
> It did not handle dax pages that are not in use, because their
> ->mapping is not associated with any file. I didn't think it through
> until reading your conversation. Here is my understanding: this case
> should be handled by the badblock mechanism in the pmem driver, which
> will call ->corrupted_range() to tell the filesystem to repair the
> data if possible.
There are 2 types of notifications. There are badblocks discovered by
the driver (see notify_pmem()) and there are memory_failure() events
signalled by the CPU machine-check handler, or the platform BIOS. In
the case of badblocks, that information needs to be considered by the
fs block allocator to avoid / try-to-repair badblocks on allocation,
and to allow listing damaged files that need repair. The
memory_failure() notification needs immediate handling to tear down
mappings to that pfn and signal processes that have consumed it with
SIGBUS-action-required. Processes that have the poison mapped, but
have not consumed it, receive SIGBUS-action-optional.
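For background, the two SIGBUS flavors map onto the existing mceerr
siginfo codes; illustrating with the signal helpers already in the tree
(addr, lsb and tsk stand in for the fault address, mapping granularity
and target task):

/* task consumed the poison: action required, delivered synchronously */
force_sig_mceerr(BUS_MCEERR_AR, (void __user *)addr, lsb);

/* task merely has the poison mapped: action optional */
send_sig_mceerr(BUS_MCEERR_AO, (void __user *)addr, lsb, tsk);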
> So, we split it into two parts, and the dax device and block device
> won't be mixed up again. Is my understanding right?
Right, it's only the filesystem that knows that the block_device and
the dax_device alias data at the same logical offset. The requirements
for sector error handling and page error handling are separate like
block_device_operations and dax_operations.
> But the solution above solves hwpoison on one or a couple of pages,
> which happens rarely (I think). Does the 'pmem remove' operation cause
> hwpoison too? Calling memory_failure() so many times? I haven't
> understood this yet.
I'm working on a patch here to call memory_failure() on a wide range
for the surprise remove of a dax_device while a filesystem might be
mounted. It won't be efficient, but there is no other way to notify
the kernel that it needs to immediately stop referencing a page.
[RFC 0/2] virtio-pmem: Asynchronous flush
by Pankaj Gupta
Jeff reported a preflush ordering issue with the existing virtio-pmem
preflush implementation. Dan suggested[1] implementing asynchronous
flush for virtio-pmem using a work queue, as done in md/RAID. This
patch series intends to solve the preflush ordering issue and also
makes the flush asynchronous from the submitting thread's POV.
I am submitting this patch series for feedback; it is a work in
progress. I have done basic testing and am currently doing more.
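The core of the idea (my sketch, not the series' exact code) is to move
the host flush off the submitting thread: the preflush bio is captured
in a request, a work item is queued, and the worker performs the virtio
flush and completes the bio:

struct virtio_pmem_request {
	struct work_struct work;
	struct bio *bio;
	struct nd_region *nd_region;
};

static void virtio_pmem_flush_work(struct work_struct *w)
{
	struct virtio_pmem_request *req =
		container_of(w, struct virtio_pmem_request, work);

	/* the blocking host flush happens here, not in the submitter */
	req->bio->bi_status = virtio_pmem_flush(req->nd_region) ?
			BLK_STS_IOERR : BLK_STS_OK;
	bio_endio(req->bio);
	kfree(req);
}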
Pankaj Gupta (2):
pmem: make nvdimm_flush asynchronous
virtio_pmem: Async virtio-pmem flush
drivers/nvdimm/nd_virtio.c | 66 ++++++++++++++++++++++++++----------
drivers/nvdimm/pmem.c | 15 ++++----
drivers/nvdimm/region_devs.c | 3 +-
drivers/nvdimm/virtio_pmem.c | 9 +++++
drivers/nvdimm/virtio_pmem.h | 12 +++++++
5 files changed, 78 insertions(+), 27 deletions(-)
[1] https://marc.info/?l=linux-kernel&m=157446316409937&w=2
--
2.20.1
[PATCH RFC 0/9] mm, sparse-vmemmap: Introduce compound pagemaps
by Joao Martins
Hey,
This small series attempts to minimize 'struct page' overhead by
pursuing a similar approach as Muchun Song's series "Free some vmemmap
pages of hugetlb page"[0], but applied to devmap/ZONE_DEVICE.
[0] https://lore.kernel.org/linux-mm/20201130151838.11208-1-songmuchun@byteda...
The link above describes it quite nicely, but the idea is to reuse tail
page vmemmap areas, particularly the area which only describes tail pages.
So a vmemmap page describes 64 struct pages, and the first page for a given
ZONE_DEVICE vmemmap would contain the head page and 63 tail pages. The second
vmemmap page would contain only tail pages, and that's what gets reused across
the rest of the subsection/section. The bigger the page size, the bigger the
savings (2M hpage -> save 6 vmemmap pages; 1G hpage -> save 4094 vmemmap pages).
In terms of savings, per 1TB of memory, the struct page cost would go down
with a compound pagemap:
* with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of total memory)
* with 1G pages we lose 8MB instead of 16G (0.0007% instead of 1.5% of total memory)
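As a worked example of where those numbers come from (my arithmetic,
not quoted from the series): a 4K vmemmap page holds 4096 / 64 = 64
struct pages. A 2M compound page spans 512 base pages, i.e. 8 vmemmap
pages; keeping the first one (head plus 63 tails) and a single reused
all-tails page saves 6 of the 8. A 1G page needs 4096 vmemmap pages,
of which all but the first two can be deduplicated, saving 4094.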
Along the way I've extended it past 'struct page' overhead, *trying* to
address a few performance issues we knew about for pmem, specifically
in the {pin,get}_user_pages* function family with device-dax vmas,
which is really slow even for the fast variants. THP is great with the
-fast variants, but everything except hugetlbfs performs rather poorly
with non-fast gup.
So to summarize what the series does:
Patches 1-5: Much like Muchun's series, we reuse tail page vmemmap
areas across a given page size (namely @align, as it is referred to in
the remaining memremap/dax code), enabling memremap to initialize the
ZONE_DEVICE pages as compound pages of a given @align order. The main
difference, though, is that contrary to the hugetlbfs series there's no
preexisting vmemmap for the area, because we are onlining it; IOW there
is no freeing of pages of an already-initialized vmemmap as in the
hugetlbfs case, which simplifies the logic (besides not being
arch-specific). After these, there's a quite visible improvement in
pmem memmap bootstrap, given that we initialize fewer struct pages
depending on the page size.
NVDIMM namespace bootstrap improves from ~750ms to ~190ms/<=1ms on
emulated NVDIMMs with 2M and 1G pages respectively. A proportionally
similar improvement is observed when running on actual NVDIMMs.
Patches 6-8: Optimize grabbing/releasing page refcounts given that we
are working with compound pages, i.e. we do 1 increment/decrement on
the head page for a given set of N subpages, as opposed to N individual
writes. {get,pin}_user_pages_fast() for zone_device with a compound
pagemap consequently improves considerably, and unpin_user_pages()
improves as well when passed a set of consecutive pages:
                                          before          after
 (get_user_pages_fast 1G;2M page size)    ~75k us    ->   ~3.2k us ; ~5.2k us
 (pin_user_pages_fast 1G;2M page size)    ~125k us   ->   ~3.4k us ; ~5.5k us
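A hedged sketch of the head-page trick behind those numbers (helper
name and placement are illustrative, not the series' exact code): take
a single reference on the compound head for the whole span, then record
the subpages:

static int gup_fast_record_compound(struct page *head, unsigned long addr,
		unsigned long end, struct page **pages)
{
	struct page *page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
	int nr = 0;

	/* one atomic on the head instead of one per subpage */
	page_ref_add(head, (end - addr) >> PAGE_SHIFT);

	for (; addr != end; addr += PAGE_SIZE)
		pages[nr++] = page++;

	return nr;
}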
The RDMA patch (patch 8/9) is to demonstrate the improvement for an existing
user. For unpin_user_pages() we have an additional test to demonstrate the
improvement. The test performs MR reg/unreg continuously and measuring its
rate for a given period. So essentially ib_mem_get and ib_mem_release being
stress tested which at the end of day means: pin_user_pages_longterm() and
unpin_user_pages() for a scatterlist:
Before:
159 rounds in 5.027 sec: 31617.923 usec / round (device-dax)
466 rounds in 5.009 sec: 10748.456 usec / round (hugetlbfs)
After:
305 rounds in 5.010 sec: 16426.047 usec / round (device-dax)
1073 rounds in 5.004 sec: 4663.622 usec / round (hugetlbfs)
Patch 9: Improves {pin,get}_user_pages() and its longterm counterpart.
It is very experimental: I imported most of follow_hugetlb_page(),
except that we do the same trick as gup-fast. While writing the patch I
came to feel this batching should live in follow_page_mask(), with that
function changed to return a set of pages (or something else) when
walking over PMDs/PUDs for THP / devmap pages. This patch then brings
the earlier MR reg/unreg test (above) to parity between device-dax and
hugetlbfs.
Some of the patches are a little fresh/WIP (specially patch 3 and 9) and we are
still running tests. Hence the RFC, asking for comments and general direction
of the work before continuing.
Patches apply on top of linux-next tag next-20201208 (commit a9e26cb5f261).
Comments and suggestions very much appreciated!
Thanks,
Joao
Joao Martins (9):
memremap: add ZONE_DEVICE support for compound pages
sparse-vmemmap: Consolidate arguments in vmemmap section populate
sparse-vmemmap: Reuse vmemmap areas for a given page size
mm/page_alloc: Reuse tail struct pages for compound pagemaps
device-dax: Compound pagemap support
mm/gup: Grab head page refcount once for group of subpages
mm/gup: Decrement head page once for group of subpages
RDMA/umem: batch page unpin in __ib_mem_release()
mm: Add follow_devmap_page() for devdax vmas
drivers/dax/device.c | 54 ++++++---
drivers/infiniband/core/umem.c | 25 +++-
include/linux/huge_mm.h | 4 +
include/linux/memory_hotplug.h | 16 ++-
include/linux/memremap.h | 2 +
include/linux/mm.h | 6 +-
mm/gup.c | 130 ++++++++++++++++-----
mm/huge_memory.c | 202 +++++++++++++++++++++++++++++++++
mm/memory_hotplug.c | 13 ++-
mm/memremap.c | 13 ++-
mm/page_alloc.c | 28 ++++-
mm/sparse-vmemmap.c | 97 +++++++++++++---
mm/sparse.c | 16 +--
13 files changed, 531 insertions(+), 75 deletions(-)
--
2.17.1
Can you please help with the failure to create namespace via Ndctl
issue?
by Xu, Chunye
Hi Vishal Verma,
Now there is an urgent issue from Alibaba - an AEP namespace disable &
destroy issue: https://hsdes.intel.com/appstore/article/#/22012576039.
Attached is the log of the ndctl commands; you can find several error
messages in it. It fails to recreate a namespace after destroying the
namespace whose mode was devdax.
root@cloud-dev-benz:~# ndctl create-namespace -r region0 --mode=devdax
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
Can you please check this with priority, as it blocks Alibaba's deployment?
Many thanks,
Chunye
[PATCH v2] libnvdimm: Notify disk drivers to revalidate region
read-only
by Dan Williams
Previous kernels allowed the BLKROSET ioctl to override the disk's read-only
status. With that situation fixed the pmem driver needs to rely on
notification events to reevaluate the disk read-only status after the
host region has been marked read-write.
Recall that when libnvdimm determines that the persistent memory has
lost persistence (for example lack of energy to flush from DRAM to FLASH
on an NVDIMM-N device) it marks the region read-only, but that state can
be overridden by the user via:
echo 0 > /sys/bus/nd/devices/regionX/read_only
...to date there is no notification that the region has restored
persistence, so the user override is the only recovery.
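With this change, toggling the region attribute propagates to the child
disks in either direction; a hypothetical session (device names
illustrative):

  # echo 0 > /sys/bus/nd/devices/region0/read_only
  # cat /sys/block/pmem0/ro
  0
  # echo 1 > /sys/bus/nd/devices/region0/read_only
  # cat /sys/block/pmem0/ro
  1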
Fixes: 52f019d43c22 ("block: add a hard-readonly flag to struct gendisk")
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Martin K. Petersen <martin.petersen(a)oracle.com>
Cc: Hannes Reinecke <hare(a)suse.de>
Cc: Jens Axboe <axboe(a)kernel.dk>
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Vishal Verma <vishal.l.verma(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
Changes since v1 [1]:
- Move from the sinking ship of revalidate_disk() to the local hotness
of nd_pmem_notify() (hch).
[1]: http://lore.kernel.org/r/161527286194.446794.5215036039655765042.stgit@dw...
drivers/nvdimm/bus.c | 14 ++++++--------
drivers/nvdimm/pmem.c | 37 +++++++++++++++++++++++++++++++++----
drivers/nvdimm/region_devs.c | 7 +++++++
include/linux/nd.h | 1 +
4 files changed, 47 insertions(+), 12 deletions(-)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 48f0985ca8a0..3a777d0073b7 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -631,16 +631,14 @@ void nvdimm_check_and_set_ro(struct gendisk *disk)
struct nd_region *nd_region = to_nd_region(dev->parent);
int disk_ro = get_disk_ro(disk);
- /*
- * Upgrade to read-only if the region is read-only preserve as
- * read-only if the disk is already read-only.
- */
- if (disk_ro || nd_region->ro == disk_ro)
+ /* catch the disk up with the region ro state */
+ if (disk_ro == nd_region->ro)
return;
- dev_info(dev, "%s read-only, marking %s read-only\n",
- dev_name(&nd_region->dev), disk->disk_name);
- set_disk_ro(disk, 1);
+ dev_info(dev, "%s read-%s, marking %s read-%s\n",
+ dev_name(&nd_region->dev), nd_region->ro ? "only" : "write",
+ disk->disk_name, nd_region->ro ? "only" : "write");
+ set_disk_ro(disk, nd_region->ro);
}
EXPORT_SYMBOL(nvdimm_check_and_set_ro);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index b8a85bfb2e95..7daac795db39 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -26,6 +26,7 @@
#include <linux/mm.h>
#include <asm/cacheflush.h>
#include "pmem.h"
+#include "btt.h"
#include "pfn.h"
#include "nd.h"
@@ -585,7 +586,7 @@ static void nd_pmem_shutdown(struct device *dev)
nvdimm_flush(to_nd_region(dev->parent), NULL);
}
-static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
+static void pmem_revalidate_poison(struct device *dev)
{
struct nd_region *nd_region;
resource_size_t offset = 0, end_trunc = 0;
@@ -595,9 +596,6 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
struct range range;
struct kernfs_node *bb_state;
- if (event != NVDIMM_REVALIDATE_POISON)
- return;
-
if (is_nd_btt(dev)) {
struct nd_btt *nd_btt = to_nd_btt(dev);
@@ -635,6 +633,37 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
sysfs_notify_dirent(bb_state);
}
+static void pmem_revalidate_region(struct device *dev)
+{
+ struct pmem_device *pmem;
+
+ if (is_nd_btt(dev)) {
+ struct nd_btt *nd_btt = to_nd_btt(dev);
+ struct btt *btt = nd_btt->btt;
+
+ nvdimm_check_and_set_ro(btt->btt_disk);
+ return;
+ }
+
+ pmem = dev_get_drvdata(dev);
+ nvdimm_check_and_set_ro(pmem->disk);
+}
+
+static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
+{
+ switch (event) {
+ case NVDIMM_REVALIDATE_POISON:
+ pmem_revalidate_poison(dev);
+ break;
+ case NVDIMM_REVALIDATE_REGION:
+ pmem_revalidate_region(dev);
+ break;
+ default:
+ dev_WARN_ONCE(dev, 1, "notify: unknown event: %d\n", event);
+ break;
+ }
+}
+
MODULE_ALIAS("pmem");
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_IO);
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_PMEM);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ef23119db574..51870eb51da6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -518,6 +518,12 @@ static ssize_t read_only_show(struct device *dev,
return sprintf(buf, "%d\n", nd_region->ro);
}
+static int revalidate_read_only(struct device *dev, void *data)
+{
+ nd_device_notify(dev, NVDIMM_REVALIDATE_REGION);
+ return 0;
+}
+
static ssize_t read_only_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
@@ -529,6 +535,7 @@ static ssize_t read_only_store(struct device *dev,
return rc;
nd_region->ro = ro;
+ device_for_each_child(dev, NULL, revalidate_read_only);
return len;
}
static DEVICE_ATTR_RW(read_only);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index cec526c8043d..ee9ad76afbba 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -11,6 +11,7 @@
enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
+ NVDIMM_REVALIDATE_REGION,
};
enum nvdimm_claim_class {