On Fri, Sep 18, 2015 at 01:35:44PM -0700, Dan Williams wrote:
On Fri, Sep 18, 2015 at 1:12 PM, Konrad Rzeszutek Wilk
> On Fri, Sep 18, 2015 at 03:59:16PM -0400, Konrad Rzeszutek Wilk wrote:
> Sorry, hit send too fast. The description should have had:
> "A bunch of changes that will help in understanding it.
> However there...
>> There were two acronyms: DCR and ND in the document
>> that need an definition.
> .. and I have no idea what they mean.
DCR refers to the "NVDIMM Control Region Structure" defined in ACPI 6
Section 22.214.171.124. It defines a vendor-id, device-id, and interface
format for a given DIMM.
ND is shorthand for "libnvdimm subsystem" throughout the
implementation. At various times it has meant "Non-volatile Devices",
Thanks! Updated the patch with:
From cd1b616f52fb83a3285935e002c5de295788e2aa Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Date: Fri, 18 Sep 2015 15:56:54 -0400
Subject: [PATCH] libnvdimm: Minor corrections.
A bunch of changes that I hope will help in understanding it
better for first-time readers.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Documentation/nvdimm/nvdimm.txt | 49 +++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/Documentation/nvdimm/nvdimm.txt b/Documentation/nvdimm/nvdimm.txt
index 197a0b6..e894de6 100644
@@ -62,6 +62,12 @@ DAX: File system extensions to bypass the page cache and block layer
mmap persistent memory, from a PMEM block device, directly into a
process address space.
+DSM: Device Specific Method: ACPI method to to control specific
+device - in this case the firmware.
+DCR: NVDIMM Control Region Structure defined in ACPI 6 Section 126.96.36.199.
+It defines a vendor-id, device-id, and interface format for a given DIMM.
BTT: Block Translation Table: Persistent memory is byte addressable.
Existing software may have an expectation that the power-fail-atomicity
of writes is at least one sector, 512 bytes. The BTT is an indirection
@@ -133,16 +139,16 @@ device driver:
registered, can be immediately attached to nd_pmem.
2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
- defined apertures. A set of apertures will all access just one DIMM.
- Multiple windows allow multiple concurrent accesses, much like
+ defined apertures. A set of apertures will access just one DIMM.
+ Multiple windows (apertures) allow multiple concurrent accesses, much like
tagged-command-queuing, and would likely be used by different threads or
The NFIT specification defines a standard format for a BLK-aperture, but
the spec also allows for vendor specific layouts, and non-NFIT BLK
- implementations may other designs for BLK I/O. For this reason "nd_blk"
- calls back into platform-specific code to perform the I/O. One such
- implementation is defined in the "Driver Writer's Guide" and "DSM
+ implementations may have other designs for BLK I/O. For this reason
+ "nd_blk" calls back into platform-specific code to perform the I/O.
+ One such implementation is defined in the "Driver Writer's Guide" and
@@ -152,7 +158,7 @@ Why BLK?
While PMEM provides direct byte-addressable CPU-load/store access to
NVDIMM storage, it does not provide the best system RAS (recovery,
availability, and serviceability) model. An access to a corrupted
-system-physical-address address causes a cpu exception while an access
+system-physical-address address causes a CPU exception while an access
to a corrupted address through an BLK-aperture causes that block window
to raise an error status in a register. The latter is more aligned with
the standard error model that host-bus-adapter attached disks present.
@@ -162,7 +168,7 @@ data could be interleaved in an opaque hardware specific manner
PMEM vs BLK
-BLK-apertures solve this RAS problem, but their presence is also the
+BLK-apertures solve these RAS problems, but their presence is also the
major contributing factor to the complexity of the ND subsystem. They
complicate the implementation because PMEM and BLK alias in DPA space.
Any given DIMM's DPA-range may contribute to one or more
@@ -220,8 +226,8 @@ socket. Each unique interface (BLK or PMEM) to DPA space is
by a region device with a dynamically assigned id (REGION0 - REGION5).
1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
- single PMEM namespace is created in the REGION0-SPA-range that spans
- DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
+ single PMEM namespace is created in the REGION0-SPA-range that spans most
+ of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
interleaved system-physical-address range is reclaimed as BLK-aperture
accessed space starting at DPA-offset (a) into each DIMM. In that
reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
@@ -230,13 +236,13 @@ by a region device with a dynamically assigned id (REGION0 -
2. In the last portion of DIMM0 and DIMM1 we have an interleaved
system-physical-address range, REGION1, that spans those two DIMMs as
- well as DIMM2 and DIMM3. Some of REGION1 allocated to a PMEM namespace
- named "pm1.0" the rest is reclaimed in 4 BLK-aperture namespaces (for
+ well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace
+ named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
each DIMM in the interleave set), "blk2.1", "blk3.1",
3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
- interleaved system-physical-address range (i.e. the DPA address below
+ interleaved system-physical-address range (i.e. the DPA address past
offset (b) are also included in the "blk4.0" and "blk5.0"
Note, that this example shows that BLK-aperture namespaces don't need to
be contiguous in DPA-space.
@@ -252,15 +258,15 @@ LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
What follows is a description of the LIBNVDIMM sysfs layout and a
corresponding object hierarchy diagram as viewed through the LIBNDCTL
-api. The example sysfs paths and diagrams are relative to the Example
+API. The example sysfs paths and diagrams are relative to the Example
NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
-Every api call in the LIBNDCTL library requires a context that holds the
+Every API call in the LIBNDCTL library requires a context that holds the
logging parameters and other library instance state. The library is
based on the libabc template:
LIBNDCTL: instantiate a new library context example
@@ -409,7 +415,7 @@ Bit 31:28 Reserved
-A generic REGION device is registered for each PMEM range orBLK-aperture
+A generic REGION device is registered for each PMEM range or BLK-aperture
set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
sets on the "nfit_test.0" bus. The primary role of regions are to be a
container of "mappings". A mapping is a tuple of <DIMM,
@@ -509,7 +515,7 @@ At first glance it seems since NFIT defines just PMEM and BLK
types that we should simply name REGION devices with something derived
from those type names. However, the ND subsystem explicitly keeps the
REGION name generic and expects userspace to always consider the
-region-attributes for 4 reasons:
+region-attributes for four reasons:
1. There are already more than two REGION and "namespace" types. For
PMEM there are two subtypes. As mentioned previously we have PMEM where
@@ -698,8 +704,8 @@ static int configure_namespace(struct ndctl_region *region,
Why the Term "namespace"?
- 1. Why not "volume" for instance? "volume" ran the risk of
- as a volume manager like device-mapper.
+ 1. Why not "volume" for instance? "volume" ran the risk of
+ ND (libnvdimm subsystem) to a volume manager like device-mapper.
2. The term originated to describe the sub-devices that can be created
within a NVME controller (see the nvme specification:
@@ -774,13 +780,14 @@ block" needs to be destroyed. Note, that to destroy a BTT the
needs to be written in raw mode. By default, the kernel will autodetect
the presence of a BTT and disable raw mode. This autodetect behavior
can be suppressed by enabling raw mode for the namespace via the
Summary LIBNDCTL Diagram
-For the given example above, here is the view of the objects as seen by the LIBNDCTL
+For the given example above, here is the view of the objects as seen by the
|CTX| +---------+ +--------------+ +---------------+
+-+-+ +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0"