[Linux-nvdimm] [PATCH 0/3 v2] e820: Fix handling of NvDIMM chips
by Boaz Harrosh
Hi
[v2]
* Added warning at bring up about unknown type
* Added an extra patch to warn-print in request_resource
* changed name from NvDIMM-12 => unknown-12
I wish we would reconsider this. So we need to suffer until some unknown
future when ACPI decides to reuse type-12. When this happens we can fix
it then, NO?
* Now based on 4.0-rc1
If someone still has objections to [PATCH 3A/3] which is the most simple
thing to do, then We will be forced to do [PATCH 3B/3] ugliness. Ingo
your call.
[v1]
There is a deficiency in current e820.c handling where unknown new memory-chip
types come up as a BUSY resource when some other driver (like pmem) tries to
call request_mem_region_exclusive() on that resource. Even though, actually
there is nothing using it.
>From inspecting the code and the history of e820.c it looks like a BUG.
In any way this is a problem for the new type-12 NvDIMM memory chips that
are circulating around. (It is estimated that there are already 100ds of
thousands NvDIMM chips in active use)
The patches below first fixes the above problem for any future type
memory, so external drivers can access these mem chips.
I then also add the NvDIMM type-12 memory constant so it comes up
nice in dprints and at /proc/iomem
Just as before all these chips are very much usable with the pmem
driver. This lets us remove the hack for type-12 NvDIMMs that ignores
the return code from request_mem_region_exclusive() in pmem.c.
For all the pmem people. I maintain a tree with these patches
and latest pmem code here:
git://git.open-osd.org/pmem.git
[web-view:http://git.open-osd.org/gitweb.cgi?p=pmem.git;a=summary]
List of patches:
[PATCH 1/3] e820: Don't let unknown DIMM type come out BUSY
The main fix
[PATCH 2/3] resource: Add new flag IORESOURCE_WARN (64bit)
Warn in request_resource
[PATCH 3A/3] e820: Add the unknown-12 Memory type (DDR3-NvDIMM)
Please accept this simple patch
[PATCH 3B/3] e820: dynamic unknown-xxx names (for DDR3-NvDIMM)
Else we need this crap
Thanks
Boaz
6 years
Re: [Linux-nvdimm] [PATCH 0/2] e820: Fix handling of NvDIMM chips
by Boaz Harrosh
On 02/19/2015 02:47 AM, Christoph Hellwig wrote:
> On Wed, Feb 18, 2015 at 10:15:32AM -0800, Dan Williams wrote:
>> In fact it was originally "type-6" until ACPI 5 claimed that number
>> for official use, so these platforms, with early proof-of-concept
>> nvdimm support, have already gone through one transition to a new
>> number. They need to do the same once an official number for nvdimm
>> support is published.
>>
>> Put another way, these early platforms are already using out-of-tree
>> patches for nvdimm enabling. They can continue to do so, or switch to
>> standard methods when the standard is published.
>
> Not supporting hardware that is widely avaiable (I have some, too)
> is not very user friendly.
>
> I'll submit a patch allowing a nvdimm_type= kernel option that allows
> to detect them, but will do nothing by default. The code needed is very
> small and it would be very useful for all kinds of projects.
>
I do not see why you need the nvdimm_type= kernel option at all.
I have here a script that auto detects any NvDIMM. It works with all
the chips that I have access to. And Also it has support for if you have
memmap=sss\$aaa.
For all these detected regions it will load a pmem device.
It is easy to filter for any type of memory you want. What
will the (annoying) kernel option give you?
OK I might be jumping the guns, send the patch and I'll look
at it.
Thanks
Boaz
6 years
[Linux-nvdimm] Persistence test fails
by Roger C. Pao
$ sudo hexedit /dev/pmem0
Cut and paste 5361792048656c6c6f20746f204d79204c6974746c6520467269656e64
into hexedit:
00000000 53 61 79 20 48 65 6C 6C 6F 20 74 6F 20 4D 79 20 Say Hello to
My
00000010 4C 69 74 74 6C 65 20 46 72 69 65 6E 64 00 00 00 Little
Friend...
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
--- pmem0
--0x0/0x100000000---------------------------------------------
Ctrl-X and Y to save.
$ sudo hexedit /dev/pmem0
will display what we pasted.
Ctrl-X to exit.
At this point, you can do a graceful Restart, pull the power cables from
the back of the computer, whatever type of power cycle test you want.
Once the computer is back on and you login and get back to a terminal,
check the contents of /dev/pmem0 again.
In my case, it is zeros at 0x000 but has stuff at 0x400. I expected it to
be what I pasted in above.
00000400 00 00 04 00 00 00 10 00 CC CC 00 00 69 B3 0F 00
............i...
00000410 F4 FF 03 00 00 00 00 00 02 00 00 00 02 00 00 00
................
00000420 00 80 00 00 00 80 00 00 00 20 00 00 00 00 00 00 .........
......
00000430 65 7A B5 54 01 00 18 00 53 EF 00 00 01 00 00 00
ez.T....S.......
00000440 64 7A B5 54 00 4E ED 00 00 00 00 00 01 00 00 00
dz.T.N..........
00000450 00 00 00 00 0B 00 00 00 00 01 00 00 38 00 00 00
............8...
00000460 02 00 00 00 03 00 00 00 D6 C5 30 FC 9E 54 42 27
..........0..TB'
00000470 B5 4D 44 4F 15 A5 5C CB 00 00 00 00 00 00 00 00
.MDO..\.........
00000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000490 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000004A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000004B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000004C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF 00
................
000004D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
000004E0 00 00 00 00 00 00 00 00 00 00 00 00 43 63 EE 31
............Cc.1
000004F0 CA F3 4D DD AF DB 00 5E A9 D3 70 6D 01 00 00 00
..M....^..pm....
00000500 00 00 00 00 00 00 00 00 64 7A B5 54 00 00 00 00
........dz.T....
00000510 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000520 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000540 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................
00000550 00 00 00 00 00 00 00 00 00 00 00 00 1C 00 1C 00
................
--- pmem0
--0x400/0x100000000-------------------------------------------
This occurs with both Ross's prd and Boaz's pmem trees both at HEAD.
6 years
Re: [Linux-nvdimm] [PATCH 0/2] e820: Fix handling of NvDIMM chips
by Dan Williams
On Wed, Feb 18, 2015 at 4:47 PM, Christoph Hellwig <hch(a)infradead.org> wrote:
> On Wed, Feb 18, 2015 at 10:15:32AM -0800, Dan Williams wrote:
>> In fact it was originally "type-6" until ACPI 5 claimed that number
>> for official use, so these platforms, with early proof-of-concept
>> nvdimm support, have already gone through one transition to a new
>> number. They need to do the same once an official number for nvdimm
>> support is published.
>>
>> Put another way, these early platforms are already using out-of-tree
>> patches for nvdimm enabling. They can continue to do so, or switch to
>> standard methods when the standard is published.
>
> Not supporting hardware that is widely avaiable (I have some, too)
> is not very user friendly.
Yes, as I agreed with Ingo, allowing a driver to assume control of an
unknown memory type with a warning or a kernel taint seems fine.
6 years
Re: [Linux-nvdimm] [PATCH 0/2] e820: Fix handling of NvDIMM chips
by Dan Williams
On Wed, Feb 18, 2015 at 11:27 AM, Ingo Molnar <mingo(a)kernel.org> wrote:
>
> * Dan Williams <dan.j.williams(a)intel.com> wrote:
>
>> On Wed, Feb 18, 2015 at 10:53 AM, Ingo Molnar <mingo(a)kernel.org> wrote:
>> >
>> > * Dan Williams <dan.j.williams(a)intel.com> wrote:
>> >
>> >> On Wed, Feb 18, 2015 at 10:30 AM, Ingo Molnar <mingo(a)kernel.org> wrote:
>> >> >
>> >> > * Dan Williams <dan.j.williams(a)intel.com> wrote:
>> >> >
>> >> >> On Tue, Feb 17, 2015 at 12:42 AM, Boaz Harrosh <boaz(a)plexistor.com> wrote:
>> >> >> > On 02/17/2015 12:03 AM, Matthew Wilcox wrote:
>> >> >> >> On Mon, Feb 16, 2015 at 01:07:07PM +0200, Boaz Harrosh wrote:
>> >> >> >>> In any way this is a problem for the new type-12 NvDIMM memory chips that
>> >> >> >>> are circulating around. (It is estimated that there are already 100ds of
>> >> >> >>> thousands NvDIMM chips in active use)
>> >> >> >>
>> >> >> >> Hang on. NV-DIMM chips don't know anyhing about E820
>> >> >> >> tables. They don't have anything in them that says "I
>> >> >> >> am type 12!". How they are reported is up to the
>> >> >> >> BIOS. Just because your BIOS vendor has chosen to
>> >> >> >> report tham as type 12 doesn't mean that any other
>> >> >> >> BIOS vedor is going to have done the same thing.
>> >> >> >>
>> >> >> >> Fortunately, the BIOS people have all got together and
>> >> >> >> decided what they're going to do, and it's not type
>> >> >> >> 12. Unfortunately, I think I'm bound by various
>> >> >> >> agreements to not say what they are going to do until
>> >> >> >> they do. But putting this temporary workaround in the
>> >> >> >> kernel to accomodate one BIOS vendor's unreleased
>> >> >> >> experimental code seems like entirely the wrong idea.
>> >> >> >>
>> >> >> >
>> >> >> > I had a feeling I'm entering an holy war ;-).
>> >> >> >
>> >> >> > I hope you are OK with my first patch. That an unknown
>> >> >> > type need not be reported busy, and behave same as
>> >> >> > "reserved"?
>> >> >>
>> >> >> No, it seems the safe thing to do is prevent the
>> >> >> kernel from accessing any memory that it does not know
>> >> >> the side-effects of accessing.
>> >> >
>> >> > Well, except when the kernel does know how to access
>> >> > it: when an nvdimm driver knows about its own memory
>> >> > region and knows how to handle it, right?
>> >>
>> >> Yes, except that "type-12" is something picked out of the
>> >> air that may be invalidated by a future spec change.
>> >>
>> >> If firmware wants any driver to handle a memory range it
>> >> can already use E820_RESERVED. The only reason a
>> >> new-type was picked in these early implementations was
>> >> for experiments around reserving nvdimm memory for driver
>> >> use, but also extending it to be covered with struct page
>> >> mappings. Outside of that there is no real driving
>> >> reason for the new type.
>> >
>> > But ... if a user is blessed/haunted with such firmware,
>> > why not let new types fall back to 'reserved', which is a
>> > reasonable default that still allows sufficiently aware
>> > Linux drivers to work, right?
>>
>> True.
>>
>> >
>> >> > So is there any practical reason to mark the memory
>> >> > resource as busy in that case, instead of just adding
>> >> > it to the reserved list by default and allowing
>> >> > properly informed drivers to (exclusively) request it?
>> >>
>> >> I'm not sure we want firmware to repeat this confusion
>> >> going forward. Why support new memory types unless
>> >> defined by ACPI or otherwise sufficiently described by
>> >> E820_RESERVED?
>> >
>> > Because it would make the kernel more functional? We should
>> > always err on the side of allowing more functionality and
>> > not erect roadblocks.
>> >
>>
>> I'm not convinced Linux is better off enabling one-off
>> BIOS implementations to pick non-standard numbers. Would
>> it not be safer to at least confirm with the user via a
>> configuration option, "do you want drivers to access
>> unknown memory types"?
>
> Well, we could emit a warning (or taint the kernel), to
> keep the user informed that there's a version mismatch
> between kernel and firmware - but otherwise still allow
> informed drivers to register that region?
Sounds good to me.
6 years
Re: [Linux-nvdimm] [PATCH 0/2] e820: Fix handling of NvDIMM chips
by Dan Williams
On Wed, Feb 18, 2015 at 10:53 AM, Ingo Molnar <mingo(a)kernel.org> wrote:
>
> * Dan Williams <dan.j.williams(a)intel.com> wrote:
>
>> On Wed, Feb 18, 2015 at 10:30 AM, Ingo Molnar <mingo(a)kernel.org> wrote:
>> >
>> > * Dan Williams <dan.j.williams(a)intel.com> wrote:
>> >
>> >> On Tue, Feb 17, 2015 at 12:42 AM, Boaz Harrosh <boaz(a)plexistor.com> wrote:
>> >> > On 02/17/2015 12:03 AM, Matthew Wilcox wrote:
>> >> >> On Mon, Feb 16, 2015 at 01:07:07PM +0200, Boaz Harrosh wrote:
>> >> >>> In any way this is a problem for the new type-12 NvDIMM memory chips that
>> >> >>> are circulating around. (It is estimated that there are already 100ds of
>> >> >>> thousands NvDIMM chips in active use)
>> >> >>
>> >> >> Hang on. NV-DIMM chips don't know anyhing about E820
>> >> >> tables. They don't have anything in them that says "I
>> >> >> am type 12!". How they are reported is up to the
>> >> >> BIOS. Just because your BIOS vendor has chosen to
>> >> >> report tham as type 12 doesn't mean that any other
>> >> >> BIOS vedor is going to have done the same thing.
>> >> >>
>> >> >> Fortunately, the BIOS people have all got together and
>> >> >> decided what they're going to do, and it's not type
>> >> >> 12. Unfortunately, I think I'm bound by various
>> >> >> agreements to not say what they are going to do until
>> >> >> they do. But putting this temporary workaround in the
>> >> >> kernel to accomodate one BIOS vendor's unreleased
>> >> >> experimental code seems like entirely the wrong idea.
>> >> >>
>> >> >
>> >> > I had a feeling I'm entering an holy war ;-).
>> >> >
>> >> > I hope you are OK with my first patch. That an unknown
>> >> > type need not be reported busy, and behave same as
>> >> > "reserved"?
>> >>
>> >> No, it seems the safe thing to do is prevent the
>> >> kernel from accessing any memory that it does not know
>> >> the side-effects of accessing.
>> >
>> > Well, except when the kernel does know how to access
>> > it: when an nvdimm driver knows about its own memory
>> > region and knows how to handle it, right?
>>
>> Yes, except that "type-12" is something picked out of the
>> air that may be invalidated by a future spec change.
>>
>> If firmware wants any driver to handle a memory range it
>> can already use E820_RESERVED. The only reason a
>> new-type was picked in these early implementations was
>> for experiments around reserving nvdimm memory for driver
>> use, but also extending it to be covered with struct page
>> mappings. Outside of that there is no real driving
>> reason for the new type.
>
> But ... if a user is blessed/haunted with such firmware,
> why not let new types fall back to 'reserved', which is a
> reasonable default that still allows sufficiently aware
> Linux drivers to work, right?
True.
>
>> > So is there any practical reason to mark the memory
>> > resource as busy in that case, instead of just adding
>> > it to the reserved list by default and allowing
>> > properly informed drivers to (exclusively) request it?
>>
>> I'm not sure we want firmware to repeat this confusion
>> going forward. Why support new memory types unless
>> defined by ACPI or otherwise sufficiently described by
>> E820_RESERVED?
>
> Because it would make the kernel more functional? We should
> always err on the side of allowing more functionality and
> not erect roadblocks.
>
I'm not convinced Linux is better off enabling one-off BIOS
implementations to pick non-standard numbers. Would it not be safer
to at least confirm with the user via a configuration option, "do you
want drivers to access unknown memory types"?
6 years
Re: [Linux-nvdimm] [PATCH 0/2] e820: Fix handling of NvDIMM chips
by Dan Williams
On Wed, Feb 18, 2015 at 10:30 AM, Ingo Molnar <mingo(a)kernel.org> wrote:
>
> * Dan Williams <dan.j.williams(a)intel.com> wrote:
>
>> On Tue, Feb 17, 2015 at 12:42 AM, Boaz Harrosh <boaz(a)plexistor.com> wrote:
>> > On 02/17/2015 12:03 AM, Matthew Wilcox wrote:
>> >> On Mon, Feb 16, 2015 at 01:07:07PM +0200, Boaz Harrosh wrote:
>> >>> In any way this is a problem for the new type-12 NvDIMM memory chips that
>> >>> are circulating around. (It is estimated that there are already 100ds of
>> >>> thousands NvDIMM chips in active use)
>> >>
>> >> Hang on. NV-DIMM chips don't know anyhing about E820
>> >> tables. They don't have anything in them that says "I
>> >> am type 12!". How they are reported is up to the
>> >> BIOS. Just because your BIOS vendor has chosen to
>> >> report tham as type 12 doesn't mean that any other
>> >> BIOS vedor is going to have done the same thing.
>> >>
>> >> Fortunately, the BIOS people have all got together and
>> >> decided what they're going to do, and it's not type
>> >> 12. Unfortunately, I think I'm bound by various
>> >> agreements to not say what they are going to do until
>> >> they do. But putting this temporary workaround in the
>> >> kernel to accomodate one BIOS vendor's unreleased
>> >> experimental code seems like entirely the wrong idea.
>> >>
>> >
>> > I had a feeling I'm entering an holy war ;-).
>> >
>> > I hope you are OK with my first patch. That an unknown
>> > type need not be reported busy, and behave same as
>> > "reserved"?
>>
>> No, it seems the safe thing to do is prevent the kernel
>> from accessing any memory that it does not know the
>> side-effects of accessing.
>
> Well, except when the kernel does know how to access it:
> when an nvdimm driver knows about its own memory region and
> knows how to handle it, right?
Yes, except that "type-12" is something picked out of the air that may
be invalidated by a future spec change.
If firmware wants any driver to handle a memory range it can already
use E820_RESERVED. The only reason a new-type was picked in these
early implementations was for experiments around reserving nvdimm
memory for driver use, but also extending it to be covered with struct
page mappings. Outside of that there is no real driving reason for
the new type.
> So is there any practical reason to mark the memory
> resource as busy in that case, instead of just adding it to
> the reserved list by default and allowing properly informed
> drivers to (exclusively) request it?
I'm not sure we want firmware to repeat this confusion going forward.
Why support new memory types unless defined by ACPI or otherwise
sufficiently described by E820_RESERVED?
6 years
[Linux-nvdimm] [PATCH 0/2] e820: Fix handling of NvDIMM chips
by Boaz Harrosh
Hi
There is a deficiency in current e820.c handling where unknown new memory-chip
types come up as a BUSY resource when some other driver (like pmem) tries to
call request_mem_region_exclusive() on that resource. Even though, actually
there is nothing using it.
>From inspecting the code and the history of e820.c it looks like a BUG.
In any way this is a problem for the new type-12 NvDIMM memory chips that
are circulating around. (It is estimated that there are already 100ds of
thousands NvDIMM chips in active use)
The patches below first fixes the above problem for any future type
memory, so external drivers can access these mem chips.
I then also add the NvDIMM type-12 memory constant so it comes up
nice in dprints and at /proc/iomem
Just as before all these chips are very much usable with the pmem
driver. This lets us remove the hack for type-12 NvDIMMs that ignores
the return code from request_mem_region_exclusive() in pmem.c.
There is a 3rd patch just for reference to pmem.c which enables
pmem to work also on Old Kernels which do not include these 2
patches.
For all the pmem people. I maintain a tree with these patches
and latest pmem code (Also including DAX) here:
git://git.open-osd.org/pmem.git
[web-view:http://git.open-osd.org/gitweb.cgi?p=pmem.git;a=summary]
List of patches:
[PATCH 1/2] e820: Don't let unknown DIMM type come out BUSY
[RFC 2/2] e820: Add the NvDIMM Memory type (type-12)
These are for submission
[PATCH 3/3] pmem: Allow request_mem to fail (CONFIG_BLK_DEV_PMEM_IGNORE_REQUEST_MEM_RET)
Just for reference
Thanks
Boaz
6 years
Re: [Linux-nvdimm] modprobe: ERROR: could not insert 'pmem': Invalid argument
by Boaz Harrosh
On 02/11/2015 01:33 AM, Roger Pao wrote:
> adrbd correctly found all NVDIMMs and presented a block device for them. It was simply plug and play.
>
> Requiring the user to look at dmesg output for the location of the NVDIMMs, creating a map= for pmem or the equivalent for prd is a bit complex for an end user.
>
> It's worse if they have to hack the kernel's e820 handling of NVDIMMs. I assume pmem/prd will have an easier to use solution in the future.
>
> On Tue, Feb 10, 2015 at 3:05 PM, Dan Williams <dan.j.williams(a)intel.com <mailto:dan.j.williams@intel.com>> wrote:
>
> On Tue, Feb 10, 2015 at 2:56 PM, Roger C. Pao (Enmotus)
> <rcpao.enmotus(a)gmail.com <mailto:rcpao.enmotus@gmail.com>> wrote:
> > Thanks Dan.
> >
> > Unfortunately, this tells me that NVDIMM support in Linux still has a way to
> > go before it can be used by our customers.
> >
> > At least these two implementations (pmem and prd) are publicly available
> > open source code unlike adrbd.
> >
> > I will continue using the prd driver for now as I have it working in a md
> > RAID0 with /dev/pmem0 and /dev/sdb (a disk). The md RAID0 is only to verify
> > the block device interface works for what we need it to do. We will
> > continue developing our products with the assumption that /dev/pmem0 will
> > behave similar to any other /dev/sdx in the system.
> >
> > Regardless of the implementation you finally decide to use, I recommend
> > adding a driver parameter to disable the call to
> > request_mem_region_exclusive()
>
> That's dangerous. Especially when we have kernel command line and
> kconfig options for defining ranges, we do not want a driver
> scribbling on random locations that it may not own.
>
>
I'm working on a patch that will ignore the failure of request_mem_region_exclusive()
and continue, in this case the driver tests the first and last word of the region to
see if it is actually accessible.
In the same tree I will put a patch that fixes e820 handling of type-12 memory, the
guys in the lab ran with this patch for a long time and did not tell me. Note that
the tree I have at pmem already has patches to mm. Though it could be compiled out of
tree for an STD Kernel if you set BLK_DEV_PMEM_USE_PAGES=n.
It will take me a couple of days to actually test everything though (mainly because
I moved to a 3.20-rc base tree).
Thanks
Boaz
6 years