[Linux-nvdimm] another pmem variant V3
by Christoph Hellwig
Here is another version of the same trivial pmem driver, because two
obviously aren't enough. The first patch is the same pmem driver
that Ross posted a short time ago, just modified to use platform_devices
to find the persistent memory region instead of hardcoding it in the
Kconfig. This keeps pmem.c separate from any discovery mechanism,
but still allows auto-discovery.
The other two patches are a heavily rewritten version of the code that
Intel gave to various storage vendors to discover the type 12 (and earlier
type 6) nvdimms, which I massaged into a form that is hopefully suitable
for mainline.
Note that pmem.c really is the minimal version as I think we need something
included ASAP. We'll eventually need to be able to do other I/O from and
to it, and as most people know everyone has their own preferred method to
do it, which I'd like to discuss once we have the basic driver in.
This has been tested on a real NVDIMM on a system with a type 12
capable BIOS.
Changes since V2:
- dropped support for the memmap= kernel command line override
- dropped the not needed memblock_reserve call
- merged various cleanups from Boaz in pmem.c
Changes since V1:
- s/E820_PROTECTED_KERN/E820_PMEM/g
- map the persistent memory as uncached
- better kernel parameter description
- various typo fixes
- MODULE_LICENSE fix
7 years, 4 months
[Linux-nvdimm] [PATCH 0/3 v5] dax: some dax fixes and cleanups
by Boaz Harrosh
Hi Andrew
I finally had the time to beat up these fixes based on linux-next/akpm
and it looks OK.
I'm sending the two fix patches with @stable + a patch-1 for the 4.0
Kernel. The 4.1-rc Kernel will need a different patch.
It is your call if you want these in stable. It is a breakage in the dax
code that went into 4.0. But I guess it will not have that many users right
at the get go. So feel free to remove the CC:@stable. (Also the old XIP that
this DAX changed had all the same problems)
[v5]
* A new patch-1 Based on linux-next/akpm branch because mm/memory.c
completely changed there.
Also a 4.0 version of the same patch-1 if needed for stable@
List of patches:
[PATCH 1/3] mm(v4.1): New pfn_mkwrite same as page_mkwrite for VM_PFNMAP
[PATCH 2/3] dax: use pfn_mkwrite to update c/mtime + freeze
[PATCH 3/3] dax: Unify ext2/4_{dax,}_file_operations
All these patches are based on linux-next/akpm. I'm not sure how
it will interact with ext4-next though.
[PATCH 1/3 @stable] mm(v4.0): New pfn_mkwrite same as page_mkwrite for VM_PFNMAP
This patch is for 4.0 based tree if we decide to send
[PATCH 2/3] to stable.
[v4] dax: some dax fixes and cleanups
* First patch fixed according to Andrew's comments. Thanks Andrew.
1st and 2nd patch can go into current Kernel as they fix something
that was merged this release.
* Added a new patch to fix up splice in the dax case, and cleanup.
This one can wait for 4.1 (so can the first two, not that anyone uses dax
in production.)
* DAX freeze is not fixed yet, as we have more problems than I originally
hoped for, as pointed out by Dave.
(Just as a reference I'm sending a NO-GOOD additional patch to show what
is not good enough to do. Was the RFC of [v3])
* Not re-posting the xfstest. Dave, please pick this up (it already found bugs
in non-DAX FSs)
[v3] dax: Fix mmap-write not updating c/mtime
* I'm re-posting the two DAX patches that fix the mmap-write after read
problem with DAX. (No changes since [v2])
* I'm also posting a 3rd RFC patch to address what Jan said about fs_freeze
and making mapping read-only.
Jan, please review and see if this is what you meant.
[v2]
Jan Kara has pointed out that if we add the
sb_start/end_pagefault pair in the new pfn_mkwrite we
are then fixing another bug where: A user could start
writing to the page while filesystem is frozen.
[v1]
The main problem is that current mm/memory.c will not call us with page_mkwrite
if we do not have an actual page mapping, which is what DAX uses.
The solution presented here introduces a new pfn_mkwrite to solve this problem.
Please see patch-2 for details.
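To make the dispatch problem concrete, here is a small user-space toy model
of a write-protect fault on a shared mapping. It is only a sketch under
stated assumptions: every toy_* name is hypothetical, none of this is the
real mm/memory.c code, and the "mtime update" is just a counter. It shows
that a mapping without a struct page never reaches the filesystem unless a
pfn_mkwrite hook exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's structures. */
struct toy_inode {
	unsigned long mtime_updates;	/* counts simulated c/mtime updates */
};

struct toy_vm_ops {
	int (*page_mkwrite)(struct toy_inode *inode);
	int (*pfn_mkwrite)(struct toy_inode *inode);
};

/* Stands in for what a filesystem would do when told about the write. */
static int toy_update_time(struct toy_inode *inode)
{
	inode->mtime_updates++;
	return 0;
}

/*
 * Simulated write-protect fault. With a struct page we notify through
 * page_mkwrite; without one (the DAX case) only pfn_mkwrite can be used.
 */
int toy_wp_fault(const struct toy_vm_ops *ops, struct toy_inode *inode,
		 bool has_struct_page)
{
	if (has_struct_page)
		return ops->page_mkwrite ? ops->page_mkwrite(inode) : 0;
	if (ops->pfn_mkwrite)
		return ops->pfn_mkwrite(inode);
	return 0;	/* no hook: the filesystem never hears about the write */
}

/* Returns how many time updates one DAX-style write fault produced. */
unsigned long toy_dax_write_fault(bool with_pfn_mkwrite)
{
	struct toy_inode inode = { 0 };
	struct toy_vm_ops ops = {
		.page_mkwrite = toy_update_time,
		.pfn_mkwrite = with_pfn_mkwrite ? toy_update_time : NULL,
	};

	toy_wp_fault(&ops, &inode, false);
	return inode.mtime_updates;
}

/* Small self-check showing the intended use of the model. */
void toy_dax_demo(void)
{
	assert(toy_dax_write_fault(false) == 0);	/* update is lost */
	assert(toy_dax_write_fault(true) == 1);		/* update happens */
}
```

Calling toy_dax_demo() exercises both cases: without the hook the write
fault leaves the counter untouched, with it the counter is bumped once.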
I've been running with this patch for 4 months, on both HW and VMs, with no
apparent danger, but see patch-1: I played it safe.
I am also posting an xfstest 080 that demonstrates this problem. I believe
that some git operations (can't remember which) also suffer from this problem.
Actually Eryu Guan found that this test fails on some other FS as well.
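For readers who want to poke at the mmap write-after-read behaviour from
user space, a minimal probe could look roughly like the following. This is
not xfstest 080 itself, only a sketch under assumptions: the function name
and the scratch path passed in are hypothetical, and the one second pause
exists purely so that a timestamp change is visible at coarse granularity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define PROBE_LEN 4096

/*
 * Hypothetical probe, not the real xfstest.
 * Returns 1 if mtime advanced after the mmap write, 0 if it did not,
 * and -1 on any setup error. The scratch file at 'path' is removed.
 */
int mmap_write_after_read_updates_mtime(const char *path)
{
	struct timespec pause = { .tv_sec = 1, .tv_nsec = 0 };
	struct stat before, after;
	volatile char *map;
	void *addr;
	char byte;
	int fd, ret = -1;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, PROBE_LEN) < 0)
		goto out_close;

	addr = mmap(NULL, PROBE_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		goto out_close;
	map = addr;

	byte = map[0];			/* read fault comes first */
	if (fstat(fd, &before) < 0)
		goto out_unmap;
	nanosleep(&pause, NULL);	/* let the clock move on */
	map[0] = (char)(byte + 1);	/* then the write fault under test */
	if (msync(addr, PROBE_LEN, MS_SYNC) < 0)
		goto out_unmap;
	if (fstat(fd, &after) < 0)
		goto out_unmap;

	/* Compare seconds first, then nanoseconds. */
	ret = after.st_mtim.tv_sec > before.st_mtim.tv_sec ||
	      (after.st_mtim.tv_sec == before.st_mtim.tv_sec &&
	       after.st_mtim.tv_nsec > before.st_mtim.tv_nsec);

out_unmap:
	munmap(addr, PROBE_LEN);
out_close:
	close(fd);
	unlink(path);
	return ret;
}
```

Point it at a scratch file on the filesystem under test. A return value of
0 would suggest the write was never reported to the filesystem; what a given
filesystem and kernel actually return has to be checked by running it.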
Matthew, hi
I would love to have your ACK on these patches.
Thanks
Boaz
7 years, 4 months
Re: [Linux-nvdimm] [PATCH 1/3] pmem: Initial version of persistent memory driver
by Dr. Greg Wettstein
On Mar 26, 9:32am, Christoph Hellwig wrote:
} Subject: [PATCH 1/3] pmem: Initial version of persistent memory driver
Hi, I hope the week has been going well for everyone.
> From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
>
> PMEM is a new driver that presents a reserved range of memory as a
> block device. This is useful for developing with NV-DIMMs, and
> can be used with volatile memory as a development platform.
We are interested in NV-DIMMs for a variety of reasons so the
discussion on this has been interesting, particularly the 'correct'
method of abstracting access.
We needed a block device representation of memory for a number of
projects we are working on and put the following together:
ftp://ftp.enjellic.com/pub/hpd/hpd_driver-1.1beta.tar.gz
Which has patches for 3.10 and 3.14.
We built HPD on top of the hugepage kernel infrastructure. In our
opinion, for whatever that is worth, there were a number of advantages
to building this on a page based abstraction. Not the least of which
was that NUMA awareness just naturally fell out of that model.
While the above patches don't have support for 1GB pages in them, that
was also a straightforward exercise.
I don't even pretend to understand all the complexities and mechanics
of the E820/EFI memory mapping issues involved or the various issues
with persistency triggers and such but mapping these through something
like the hugepage infrastructure 'feels' like it would have a number
of longterm advantages with respect to isolating implementations from
the block layer interface.
Have a good remainder of the week.
Greg
}-- End of excerpt from Christoph Hellwig
As always,
Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC.
4206 N. 19th Ave. Specializing in information infra-structure
Fargo, ND 58102 development.
PH: 701-281-1686
FAX: 701-281-3949 EMAIL: greg(a)enjellic.com
------------------------------------------------------------------------------
"This patch causes a CONFIG_PREEMPT=y, CONFIG_PREEMPT_BKL=y,
CONFIG_DEBUG_PREEMPT=y kernel on a ppc64 G5 to hang immediately after
displaying the penguins, but apparently not before having set the
hardware clock backwards 101 years."
"After having carefully reviewed the above description and having
decided that these effects were not a part of the patch's design
intent I have temporarily set it aside, thanks."
-- Andrew Morton
linux-kernel
7 years, 4 months
[Linux-nvdimm] [PATCH v2] x86: Revert E820_PRAM change in e820_end_pfn()
by Toshi Kani
'Commit ec776ef6bbe17 ("x86/mm: Add support for the non-standard
protected e820 type")' added E820_PRAM ranges, which do not have
struct-page. Therefore, there is no need to update max_pfn
to cover the E820_PRAM ranges. Revert the change made to account
E820_PRAM as RAM in e820.c in the commit.
Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>
Signed-off-by: Toshi Kani <toshi.kani(a)hp.com>
Tested-by: Christoph Hellwig <hch(a)lst.de>
---
The patch is based on the tip branch.
---
arch/x86/kernel/e820.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index e2ce85d..e09a346 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -752,7 +752,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
/*
* Find the highest page frame number we have available
*/
-static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
+static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
{
int i;
unsigned long last_pfn = 0;
@@ -763,11 +763,7 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
unsigned long start_pfn;
unsigned long end_pfn;
- /*
- * Persistent memory is accounted as ram for purposes of
- * establishing max_pfn and mem_map.
- */
- if (ei->type != E820_RAM && ei->type != E820_PRAM)
+ if (ei->type != type)
continue;
start_pfn = ei->addr >> PAGE_SHIFT;
@@ -792,12 +788,12 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
}
unsigned long __init e820_end_of_ram_pfn(void)
{
- return e820_end_pfn(MAX_ARCH_PFN);
+ return e820_end_pfn(MAX_ARCH_PFN, E820_RAM);
}
unsigned long __init e820_end_of_low_ram_pfn(void)
{
- return e820_end_pfn(1UL << (32-PAGE_SHIFT));
+ return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
}
static void early_panic(char *msg)
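
The effect of filtering by type can be tried out in user space with a toy
model of the e820 walk. This is only a sketch under assumptions: all toy_*
names and the two-entry map are made up for illustration, and the limit
handling in the loop is my reading of what such a walk does, not a copy of
arch/x86/kernel/e820.c. It shows that once only RAM entries are counted, a
persistent memory range above RAM no longer raises the highest pfn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Toy type values for illustration only. */
enum toy_e820_type { TOY_E820_RAM = 1, TOY_E820_PRAM = 12 };

struct toy_e820_entry {
	uint64_t addr;
	uint64_t size;
	enum toy_e820_type type;
};

/* Highest page frame number covered by entries of exactly 'type'. */
unsigned long toy_e820_end_pfn(const struct toy_e820_entry *map, size_t n,
			       unsigned long limit_pfn,
			       enum toy_e820_type type)
{
	unsigned long last_pfn = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long start_pfn, end_pfn;

		if (map[i].type != type)
			continue;

		start_pfn = map[i].addr >> TOY_PAGE_SHIFT;
		end_pfn = (map[i].addr + map[i].size) >> TOY_PAGE_SHIFT;

		if (start_pfn >= limit_pfn)
			continue;
		if (end_pfn > limit_pfn) {
			last_pfn = limit_pfn;	/* clamp to the limit */
			break;
		}
		if (end_pfn > last_pfn)
			last_pfn = end_pfn;
	}
	return last_pfn;
}

/* Made-up map: 2 GiB of RAM at 0, 1 GiB of persistent memory at 4 GiB. */
static const struct toy_e820_entry toy_map[] = {
	{ 0x000000000ULL, 0x80000000ULL, TOY_E820_RAM },
	{ 0x100000000ULL, 0x40000000ULL, TOY_E820_PRAM },
};

/* Self-check: the RAM-only walk stops at the end of RAM. */
void toy_e820_demo(void)
{
	unsigned long limit = 1UL << 30;

	assert(toy_e820_end_pfn(toy_map, 2, limit, TOY_E820_RAM) == 0x80000UL);
	assert(toy_e820_end_pfn(toy_map, 2, limit, TOY_E820_PRAM) == 0x140000UL);
}
```

In this model the RAM-only walk ends at pfn 0x80000, while a walk over the
persistent range would end at 0x140000, which is the extra coverage the
revert avoids.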
7 years, 4 months
[Linux-nvdimm] [PATCH] x86: Revert E820_PRAM change in e820_end_pfn()
by Toshi Kani
'Commit ec776ef6bbe17 ("x86/mm: Add support for the non-standard
protected e820 type")' added E820_PRAM ranges, which do not have
struct-page. Therefore, there is no need to update max_pfn
to cover the E820_PRAM ranges. Revert the change made to account
E820_PRAM as RAM in e820_end_pfn() in the commit.
Signed-off-by: Yinghai Lu <yinghai(a)kernel.org>
Signed-off-by: Toshi Kani <toshi.kani(a)hp.com>
Tested-by: Christoph Hellwig <hch(a)lst.de>
---
The patch is based on the tip branch.
---
arch/x86/kernel/e820.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index e2ce85d..4dfe4bd 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -763,11 +763,7 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
unsigned long start_pfn;
unsigned long end_pfn;
- /*
- * Persistent memory is accounted as ram for purposes of
- * establishing max_pfn and mem_map.
- */
- if (ei->type != E820_RAM && ei->type != E820_PRAM)
+ if (ei->type != E820_RAM)
continue;
start_pfn = ei->addr >> PAGE_SHIFT;
7 years, 4 months
[Linux-nvdimm] another pmem variant V2
by Christoph Hellwig
Here is another version of the same trivial pmem driver, because two
obviously aren't enough. The first patch is the same pmem driver
that Ross posted a short time ago, just modified to use platform_devices
to find the persistent memory region instead of hardcoding it in the
Kconfig. This keeps pmem.c separate from any discovery mechanism,
but still allows auto-discovery.
The other two patches are a heavily rewritten version of the code that
Intel gave to various storage vendors to discover the type 12 (and earlier
type 6) nvdimms, which I massaged into a form that is hopefully suitable
for mainline.
Note that pmem.c really is the minimal version as I think we need something
included ASAP. We'll eventually need to be able to do other I/O from and
to it, and as most people know everyone has their own preferred method to
do it, which I'd like to discuss once we have the basic driver in.
This has been tested both with a real NVDIMM on a system with a type 12
capable bios, as well as with "fake persistent" memory using the memmap=
option.
Changes since V1:
- s/E820_PROTECTED_KERN/E820_PMEM/g
- map the persistent memory as uncached
- better kernel parameter description
- various typo fixes
- MODULE_LICENSE fix
7 years, 4 months