[Linux-nvdimm] [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
by Dan Williams
Changes since v1 [1]:
1/ added include/asm-generic/pfn.h for the __pfn_t definition and helpers.
2/ added kmap_atomic_pfn_t()
3/ rebased on v4.1-rc2
[1]: http://marc.info/?l=linux-kernel&m=142653770511970&w=2
---
A lead-in note: this looks scarier than it is. Most of the code thrash
is automated via Coccinelle. Also, the subtle differences between an
'unsigned long pfn' and a '__pfn_t' are mitigated by type-safety and a
Kconfig option (CONFIG_PMEM_IO, disabled by default) that globally controls
whether a pfn and a __pfn_t are equivalent.
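A minimal userspace sketch of the type-safety idea, assuming __pfn_t is a single-member struct wrapped around a raw pfn. The real definition in include/asm-generic/pfn.h may differ, and the CONFIG_PMEM_IO switch is not modeled. All sketch_* names and SKETCH_PAGE_SHIFT (4K pages) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch; assumes 4K pages. Not the series' pfn.h. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Wrapping the raw pfn in a struct makes it a distinct type: passing a
 * plain 'unsigned long' where a sketch_pfn_t is expected (or vice versa)
 * is a compile error rather than a silent mix-up.
 */
typedef struct {
	unsigned long val;
} sketch_pfn_t;

/* Explicit conversion helpers are the only way in or out of the wrapper. */
static inline sketch_pfn_t sketch_pfn_to_pfn_t(unsigned long pfn)
{
	sketch_pfn_t p = { .val = pfn };

	return p;
}

static inline unsigned long sketch_pfn_t_to_pfn(sketch_pfn_t p)
{
	return p.val;
}

/* Physical address of the start of the frame. */
static inline uint64_t sketch_pfn_t_to_phys(sketch_pfn_t p)
{
	return (uint64_t)p.val << SKETCH_PAGE_SHIFT;
}
```

For example, sketch_pfn_t_to_phys(sketch_pfn_to_pfn_t(0x12345)) yields 0x12345000, while a bare `unsigned long x = p;` is rejected by the compiler.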
The motivation for this change is persistent memory and the desire to
use it not only via the pmem driver, but also as a memory target for I/O
(DAX, O_DIRECT, DMA, RDMA, etc.) in other parts of the kernel. Aside
from the pmem driver and DAX, persistent memory cannot be used in
these I/O scenarios because it lacks a backing struct page, i.e.
persistent memory is not part of the memmap. This patchset takes the
position that the solution is to teach I/O paths that want to operate on
persistent memory to do so by referencing a __pfn_t. The alternatives
are discussed in the changelog for "[PATCH v2 01/10] arch: introduce
__pfn_t for persistent memory i/o", copied here:
Alternatives:
1/ Provide struct page coverage for persistent memory in
DRAM. The expectation is that persistent memory capacities make
this untenable in the long term.
2/ Provide struct page coverage for persistent memory with
persistent memory. While persistent memory may have near-DRAM
performance characteristics, it may not have the same
write-endurance as DRAM. Given the update frequency of struct
page objects, persistent memory may not be a suitable place to store them.
3/ Dynamically allocate struct page. This appears to be on
the order of the complexity of converting code paths to use
__pfn_t references instead of struct page, and the amount of
setup required to establish a valid struct page reference is
mostly wasted when the only usage in the block stack is to
perform a page_to_pfn() conversion for dma-mapping. Instances
of kmap() / kmap_atomic() usage appear to be the only occasions
in the block stack where struct page is non-trivially used. A
new kmap_atomic_pfn_t() is proposed to handle those cases.
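To illustrate why a pfn is sufficient for the dma-mapping case, here is a hedged sketch of a pfn-based bio_vec. It assumes 4K pages and an identity (no-IOMMU) mapping in which the bus address equals the physical address. sketch_bio_vec and its helpers are hypothetical; only the bv_pfn field name echoes the "convert .bv_page to .bv_pfn" patch title, and the series' actual layout may differ.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4K pages */

/* Hypothetical pfn-based bio_vec: no struct page reference at all. */
struct sketch_bio_vec {
	unsigned long bv_pfn;   /* page frame number of the first page */
	unsigned int bv_len;    /* bytes in this segment */
	unsigned int bv_offset; /* byte offset into the first page */
};

/*
 * Assuming an identity (no-IOMMU) mapping, the DMA address is simply the
 * physical address: the pfn shifted by PAGE_SHIFT plus the in-page offset.
 */
static inline uint64_t sketch_bvec_dma_addr(const struct sketch_bio_vec *bv)
{
	return ((uint64_t)bv->bv_pfn << SKETCH_PAGE_SHIFT) + bv->bv_offset;
}

/* Two segments can share one DMA descriptor if they are physically adjacent. */
static inline int sketch_bvec_phys_mergeable(const struct sketch_bio_vec *a,
					     const struct sketch_bio_vec *b)
{
	return sketch_bvec_dma_addr(a) + a->bv_len == sketch_bvec_dma_addr(b);
}
```

Neither helper needs anything beyond the pfn, offset and length, which is the point made above: the struct page was only ever consulted to recover the pfn.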
---
Dan Williams (9):
arch: introduce __pfn_t for persistent memory i/o
block: add helpers for accessing a bio_vec page
block: convert .bv_page to .bv_pfn bio_vec
dma-mapping: allow archs to optionally specify a ->map_pfn() operation
scatterlist: use sg_phys()
x86: support dma_map_pfn()
x86: support kmap_atomic_pfn_t() for persistent memory
dax: convert to __pfn_t
block: base support for pfn i/o
Matthew Wilcox (1):
scatterlist: support "page-less" (__pfn_t only) entries
arch/Kconfig | 6 ++
arch/arm/mm/dma-mapping.c | 2 -
arch/microblaze/kernel/dma.c | 2 -
arch/powerpc/sysdev/axonram.c | 6 +-
arch/x86/Kconfig | 7 ++
arch/x86/kernel/Makefile | 1
arch/x86/kernel/amd_gart_64.c | 22 +++++-
arch/x86/kernel/kmap.c | 95 ++++++++++++++++++++++++++
arch/x86/kernel/pci-nommu.c | 22 +++++-
arch/x86/kernel/pci-swiotlb.c | 4 +
arch/x86/pci/sta2x11-fixup.c | 4 +
arch/x86/xen/pci-swiotlb-xen.c | 4 +
block/bio-integrity.c | 8 +-
block/bio.c | 82 ++++++++++++++++------
block/blk-core.c | 13 +++-
block/blk-integrity.c | 7 +-
block/blk-lib.c | 2 -
block/blk-merge.c | 15 ++--
block/bounce.c | 26 ++++---
drivers/block/aoe/aoecmd.c | 8 +-
drivers/block/brd.c | 6 +-
drivers/block/drbd/drbd_bitmap.c | 5 +
drivers/block/drbd/drbd_main.c | 6 +-
drivers/block/drbd/drbd_receiver.c | 4 +
drivers/block/drbd/drbd_worker.c | 3 +
drivers/block/floppy.c | 6 +-
drivers/block/loop.c | 13 ++--
drivers/block/nbd.c | 8 +-
drivers/block/nvme-core.c | 2 -
drivers/block/pktcdvd.c | 11 ++-
drivers/block/pmem.c | 16 +++-
drivers/block/ps3disk.c | 2 -
drivers/block/ps3vram.c | 2 -
drivers/block/rbd.c | 2 -
drivers/block/rsxx/dma.c | 2 -
drivers/block/umem.c | 2 -
drivers/block/zram/zram_drv.c | 10 +--
drivers/dma/ste_dma40.c | 5 -
drivers/iommu/amd_iommu.c | 21 ++++--
drivers/iommu/intel-iommu.c | 26 +++++--
drivers/iommu/iommu.c | 2 -
drivers/md/bcache/btree.c | 4 +
drivers/md/bcache/debug.c | 6 +-
drivers/md/bcache/movinggc.c | 2 -
drivers/md/bcache/request.c | 6 +-
drivers/md/bcache/super.c | 10 +--
drivers/md/bcache/util.c | 5 +
drivers/md/bcache/writeback.c | 2 -
drivers/md/dm-crypt.c | 12 ++-
drivers/md/dm-io.c | 2 -
drivers/md/dm-log-writes.c | 14 ++--
drivers/md/dm-verity.c | 2 -
drivers/md/raid1.c | 50 +++++++-------
drivers/md/raid10.c | 38 +++++-----
drivers/md/raid5.c | 6 +-
drivers/mmc/card/queue.c | 4 +
drivers/s390/block/dasd_diag.c | 2 -
drivers/s390/block/dasd_eckd.c | 14 ++--
drivers/s390/block/dasd_fba.c | 6 +-
drivers/s390/block/dcssblk.c | 8 +-
drivers/s390/block/scm_blk.c | 2 -
drivers/s390/block/scm_blk_cluster.c | 2 -
drivers/s390/block/xpram.c | 2 -
drivers/scsi/mpt2sas/mpt2sas_transport.c | 6 +-
drivers/scsi/mpt3sas/mpt3sas_transport.c | 6 +-
drivers/scsi/sd_dif.c | 4 +
drivers/staging/android/ion/ion_chunk_heap.c | 4 +
drivers/staging/lustre/lustre/llite/lloop.c | 2 -
drivers/target/target_core_file.c | 4 +
drivers/xen/biomerge.c | 4 +
drivers/xen/swiotlb-xen.c | 29 +++++---
fs/9p/vfs_addr.c | 2 -
fs/block_dev.c | 2 -
fs/btrfs/check-integrity.c | 6 +-
fs/btrfs/compression.c | 12 ++-
fs/btrfs/disk-io.c | 5 +
fs/btrfs/extent_io.c | 8 +-
fs/btrfs/file-item.c | 8 +-
fs/btrfs/inode.c | 19 +++--
fs/btrfs/raid56.c | 4 +
fs/btrfs/volumes.c | 2 -
fs/buffer.c | 4 +
fs/dax.c | 9 +-
fs/direct-io.c | 2 -
fs/exofs/ore.c | 4 +
fs/exofs/ore_raid.c | 2 -
fs/ext4/page-io.c | 2 -
fs/ext4/readpage.c | 4 +
fs/f2fs/data.c | 4 +
fs/f2fs/segment.c | 2 -
fs/gfs2/lops.c | 4 +
fs/jfs/jfs_logmgr.c | 4 +
fs/logfs/dev_bdev.c | 10 +--
fs/mpage.c | 2 -
fs/splice.c | 2 -
include/asm-generic/dma-mapping-common.h | 30 ++++++++
include/asm-generic/memory_model.h | 1
include/asm-generic/pfn.h | 67 ++++++++++++++++++
include/asm-generic/scatterlist.h | 10 +++
include/crypto/scatterwalk.h | 10 +++
include/linux/bio.h | 24 ++++---
include/linux/blk_types.h | 20 +++++
include/linux/blkdev.h | 6 +-
include/linux/dma-debug.h | 23 +++++-
include/linux/dma-mapping.h | 8 ++
include/linux/highmem.h | 23 ++++++
include/linux/mm.h | 1
include/linux/scatterlist.h | 91 ++++++++++++++++++++++---
include/linux/swiotlb.h | 4 +
init/Kconfig | 13 ++++
kernel/power/block_io.c | 2 -
lib/dma-debug.c | 10 ++-
lib/iov_iter.c | 22 +++---
lib/swiotlb.c | 20 ++++-
mm/page_io.c | 10 +--
net/ceph/messenger.c | 2 -
116 files changed, 896 insertions(+), 372 deletions(-)
create mode 100644 arch/x86/kernel/kmap.c
create mode 100644 include/asm-generic/pfn.h
[GIT PULL v4 00/21] libnd: non-volatile memory device support
by Dan Williams
Jens, please pull from...
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm tags/libnd-for-jens
...to receive the libnd sub-system for the next merge window. This has
been through 3 rounds of review. Incremental diffstats and links to
previous postings:
v1: 39 files changed, 13102 insertions(+), 36 deletions(-)
https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
v2: 30 files changed, 3166 insertions(+), 3935 deletions(-)
https://lists.01.org/pipermail/linux-nvdimm/2015-April/000574.html
v3: 33 files changed, 2202 insertions(+), 1233 deletions(-)
https://lists.01.org/pipermail/linux-nvdimm/2015-May/000804.html
v4: Full diffstat since v3
Documentation/blockdev/libnd.txt | 2 +-
arch/x86/Kconfig | 4 ++
arch/x86/kernel/pmem.c | 92 +++++++++++++++++++++++------------
drivers/acpi/nfit.c | 20 ++++----
drivers/acpi/nfit.h | 4 +-
drivers/block/Kconfig | 8 ---
drivers/block/Makefile | 1 -
drivers/block/e820_pmem.c | 100 --------------------------------------
drivers/block/nd/Kconfig | 10 ++++
drivers/block/nd/btt.h | 2 +-
drivers/block/nd/namespace_devs.c | 5 +-
drivers/block/nd/pmem.c | 2 +-
drivers/block/nd/test/nfit.c | 10 ++--
include/acpi/acuuid.h | 16 +++---
14 files changed, 105 insertions(+), 171 deletions(-)
delete mode 100644 drivers/block/e820_pmem.c
1/ Kill drivers/block/e820_pmem.c; we can just register pmem
regions directly from arch/x86/kernel/pmem.c without need for an
intermediary driver (Christoph).
2/ Update to latest NFIT UUID definitions (Toshi). This
merges cleanly with, and is identical to, the include/acpi/
NFIT enabling in Rafael's linux-pm.git/bleeding-edge branch.
3/ Fix up some miscellaneous checkpatch issues (Robert).
This branch has passed a full run through Fengguang's 0-day-kbuild-robot
with no outstanding reports, and it passes* our unit tests defined in
the ndctl repo (https://github.com/pmem/ndctl). As you can see, the
magnitude of the review feedback has dropped off precipitously, so I feel
confident in recommending this branch as a merge candidate. Some
general notes and credits appear in the tag-message below.
Thanks Jens!
* We have a handful of minor features pending behind this release that
are exercised in the latest unit tests. However, these patches have
been held back to save the libnd review effort from chasing a moving
target.
===
The following changes since commit 4c1eaa2344fb26bb5e936fb4d8ee307343ea0089:
drivers/block/pmem: Fix 32-bit build warning in pmem_alloc() (2015-04-01 17:03:57 +0200)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm tags/libnd-for-jens
for you to fetch changes up to dbcc765a7830454abb78e5352147324605455116:
libnd: Non-Volatile Devices (2015-05-27 02:48:52 -0400)
----------------------------------------------------------------
Initial LIBND submission
The LIBND sub-system provides generic support for non-volatile memory
devices. It extends the kernel's existing X86_PMEM_LEGACY support to also
enable devices that conform to the NVDIMM Firmware Interface Table (NFIT)
specification published with ACPI 6 (http://www.uefi.org/specifications).
NFIT describes devices that may support both a BLK (mmio aperture I/O)
mode of operation and a PMEM (direct cpu load/store to a persistent
memory range) mode. In addition to the generic LIBND bus driver
implementation and the I/O drivers (BLK and PMEM), a driver for layering
atomic sector update semantics on top of byte-addressable memory, BTT, is
also included.
See Documentation/blockdev/libnd.txt and Documentation/blockdev/btt.txt
for more details.
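A much-simplified sketch of the indirection idea behind BTT's atomic sector updates, assuming a single writer and a single spare block. It deliberately omits the on-media flog, per-lane free blocks, locking, and the cache flushes a real implementation needs for persistence; Documentation/blockdev/btt.txt describes the actual design. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTOR 512
#define SKETCH_NLBA 4                 /* external (premap) LBAs */
#define SKETCH_NBLK (SKETCH_NLBA + 1) /* plus one spare internal block */

struct sketch_btt {
	uint8_t data[SKETCH_NBLK][SKETCH_SECTOR]; /* internal (postmap) blocks */
	uint32_t map[SKETCH_NLBA];                /* premap LBA -> postmap block */
	uint32_t free_blk;                        /* block not referenced by map */
};

static void sketch_btt_init(struct sketch_btt *b)
{
	memset(b, 0, sizeof(*b));
	for (uint32_t i = 0; i < SKETCH_NLBA; i++)
		b->map[i] = i;
	b->free_blk = SKETCH_NLBA;
}

static void sketch_btt_read(const struct sketch_btt *b, uint32_t lba, void *buf)
{
	memcpy(buf, b->data[b->map[lba]], SKETCH_SECTOR);
}

/*
 * A write never touches the block currently mapped for @lba. New data goes
 * to the free block first; only then is the map entry switched (intended
 * to be a single aligned 32-bit store). If the data copy is torn, readers
 * still see the old, intact sector. The old block becomes the free block.
 */
static void sketch_btt_write(struct sketch_btt *b, uint32_t lba, const void *buf)
{
	uint32_t new_blk = b->free_blk;
	uint32_t old_blk = b->map[lba];

	memcpy(b->data[new_blk], buf, SKETCH_SECTOR); /* 1: write new data */
	b->map[lba] = new_blk;                        /* 2: publish via map */
	b->free_blk = old_blk;                        /* 3: recycle old block */
}
```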
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore.
----------------------------------------------------------------
Dan Williams (18):
e820, efi: add ACPI 6.0 persistent memory types
libnd, nfit: initial libnd infrastructure and NFIT support
libnd: control character device and libnd bus sysfs attributes
libnd, nfit: dimm/memory-devices
libnd: control (ioctl) messages for libnd bus and dimm devices
libnd, nd_dimm: dimm driver and base libnd device-driver infrastructure
libnd, nfit: regions (block-data-window, persistent memory, volatile memory)
libnd: support for legacy (non-aliasing) nvdimms
libnd, nd_pmem: add libnd support to the pmem driver
libnd, nfit: add interleave-set state-tracking infrastructure
libnd: namespace indices: read and validate
libnd: pmem label sets and namespace instantiation.
libnd: blk labels and namespace instantiation
libnd: write pmem label set
libnd: write blk label set
libnd: infrastructure for btt devices
nfit-test: manufactured NFITs for interface development
libnd: Non-Volatile Devices
Ross Zwisler (2):
pmem: Dynamically allocate partition numbers
libnd, nfit, nd_blk: driver for BLK-mode access persistent memory
Vishal Verma (1):
nd_btt: atomic sector updates
Documentation/blockdev/btt.txt | 273 ++++++
Documentation/blockdev/libnd.txt | 804 ++++++++++++++++++
MAINTAINERS | 39 +-
arch/arm64/kernel/efi.c | 1 +
arch/ia64/kernel/efi.c | 4 +
arch/x86/Kconfig | 4 +
arch/x86/boot/compressed/eboot.c | 4 +
arch/x86/include/uapi/asm/e820.h | 1 +
arch/x86/kernel/e820.c | 28 +-
arch/x86/kernel/pmem.c | 92 +-
arch/x86/platform/efi/efi.c | 3 +
drivers/acpi/Kconfig | 27 +
drivers/acpi/Makefile | 1 +
drivers/acpi/nfit.c | 1474 ++++++++++++++++++++++++++++++++
drivers/acpi/nfit.h | 160 ++++
drivers/block/Kconfig | 13 +-
drivers/block/Makefile | 2 +-
drivers/block/nd/Kconfig | 101 +++
drivers/block/nd/Makefile | 29 +
drivers/block/nd/blk.c | 252 ++++++
drivers/block/nd/btt.c | 1438 +++++++++++++++++++++++++++++++
drivers/block/nd/btt.h | 186 ++++
drivers/block/nd/btt_devs.c | 443 ++++++++++
drivers/block/nd/bus.c | 770 +++++++++++++++++
drivers/block/nd/core.c | 472 ++++++++++
drivers/block/nd/dimm.c | 115 +++
drivers/block/nd/dimm_devs.c | 516 +++++++++++
drivers/block/nd/label.c | 922 ++++++++++++++++++++
drivers/block/nd/label.h | 143 ++++
drivers/block/nd/namespace_devs.c | 1702 +++++++++++++++++++++++++++++++++++++
drivers/block/nd/nd-private.h | 111 +++
drivers/block/nd/nd.h | 257 ++++++
drivers/block/{ => nd}/pmem.c | 107 ++-
drivers/block/nd/region.c | 189 ++++
drivers/block/nd/region_devs.c | 667 +++++++++++++++
drivers/block/nd/test/Makefile | 5 +
drivers/block/nd/test/iomap.c | 151 ++++
drivers/block/nd/test/nfit.c | 1171 +++++++++++++++++++++++++
drivers/block/nd/test/nfit_test.h | 28 +
include/acpi/actbl1.h | 154 ++++
include/acpi/acuuid.h | 89 ++
include/linux/efi.h | 3 +-
include/linux/libnd.h | 129 +++
include/linux/nd.h | 98 +++
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/ndctl.h | 199 +++++
46 files changed, 13289 insertions(+), 89 deletions(-)
create mode 100644 Documentation/blockdev/btt.txt
create mode 100644 Documentation/blockdev/libnd.txt
create mode 100644 drivers/acpi/nfit.c
create mode 100644 drivers/acpi/nfit.h
create mode 100644 drivers/block/nd/Kconfig
create mode 100644 drivers/block/nd/Makefile
create mode 100644 drivers/block/nd/blk.c
create mode 100644 drivers/block/nd/btt.c
create mode 100644 drivers/block/nd/btt.h
create mode 100644 drivers/block/nd/btt_devs.c
create mode 100644 drivers/block/nd/bus.c
create mode 100644 drivers/block/nd/core.c
create mode 100644 drivers/block/nd/dimm.c
create mode 100644 drivers/block/nd/dimm_devs.c
create mode 100644 drivers/block/nd/label.c
create mode 100644 drivers/block/nd/label.h
create mode 100644 drivers/block/nd/namespace_devs.c
create mode 100644 drivers/block/nd/nd-private.h
create mode 100644 drivers/block/nd/nd.h
rename drivers/block/{ => nd}/pmem.c (70%)
create mode 100644 drivers/block/nd/region.c
create mode 100644 drivers/block/nd/region_devs.c
create mode 100644 drivers/block/nd/test/Makefile
create mode 100644 drivers/block/nd/test/iomap.c
create mode 100644 drivers/block/nd/test/nfit.c
create mode 100644 drivers/block/nd/test/nfit_test.h
create mode 100644 include/acpi/acuuid.h
create mode 100644 include/linux/libnd.h
create mode 100644 include/linux/nd.h
create mode 100644 include/uapi/linux/ndctl.h
[PATCH v2 0/4] pmem api, generic ioremap_cache, and memremap
by Dan Williams
The pmem api is responsible for shepherding data out to persistent
media. The pmem driver uses this api, when available, to assert that
data is durable by the time bio_endio() is invoked. When an
architecture or cpu cannot make persistence guarantees, the driver warns
and falls back to a "best effort" implementation.
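One way to structure "durable if possible, otherwise warn and do best effort" is an ops table chosen once at probe time. This is a hypothetical sketch: the durable hooks are placeholders for an arch pmem api such as the persistent_copy()/persistent_sync() pair named in the changelog, and here they simply call memcpy(). The sketch_* names are not the driver's actual pmem_ops.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ops table; not the driver's actual pmem_ops. */
struct sketch_pmem_ops {
	void (*copy)(void *dst, const void *src, size_t n);
	void (*sync)(void);
	bool durable;
};

static void sketch_durable_copy(void *dst, const void *src, size_t n)
{
	/* Placeholder: an arch hook would use non-temporal stores or flushes. */
	memcpy(dst, src, n);
}

static void sketch_durable_sync(void)
{
	/* Placeholder: an arch hook would issue the fence needed for durability. */
}

static void sketch_best_effort_copy(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

static void sketch_no_sync(void)
{
}

static const struct sketch_pmem_ops sketch_durable_ops = {
	.copy = sketch_durable_copy, .sync = sketch_durable_sync, .durable = true,
};

static const struct sketch_pmem_ops sketch_best_effort_ops = {
	.copy = sketch_best_effort_copy, .sync = sketch_no_sync, .durable = false,
};

/* Pick the ops once; warn when durability cannot be asserted. */
static const struct sketch_pmem_ops *sketch_pmem_select_ops(bool arch_has_pmem_api)
{
	if (arch_has_pmem_api)
		return &sketch_durable_ops;
	fprintf(stderr, "pmem: unable to guarantee persistence of writes\n");
	return &sketch_best_effort_ops;
}

/* Write path: copy, then sync, before the bio would be completed. */
static void sketch_pmem_write(const struct sketch_pmem_ops *ops, void *dst,
			      const void *src, size_t n)
{
	ops->copy(dst, src, n);
	ops->sync();
}
```

Selecting the table once keeps the per-I/O path free of capability checks; the warning is emitted only at setup.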
Changes since v1 [1]:
1/ Rebase on tip/master + Toshi's ioremap_wt() patches and enable
ioremap_cache() to be used generically in drivers. Fix
devm_ioremap_resource() in the process.
2/ Rather than add yet another instance of "force cast away __iomem for
non-io-memory" take the opportunity to introduce memremap() for this use
case and fix up the current users that botch their handling of the
__iomem annotation.
3/ Mandate that consumers of the pmem api handle the case when archs, or
cpus within an arch, are not able to make durability guarantees for
writes to persistent memory. See pmem_ops in drivers/block/pmem.c.
4/ Drop the persistent_flush() api, as there are no users until the BLK
driver is introduced, and even then it is not a "flush to persistence";
it is an invalidation of a previous mmio aperture setting
(io_flush_cache_range()).
5/ Add persistent_remap() to the pmem api so the arch can pick the
memory type that matches the assumptions of persistent_copy() and
persistent_sync().
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000929.html
This boots and processes pmem writes on x86; cross-compile 0day results
are still pending.
---
Dan Williams (3):
arch/*/asm/io.h: add ioremap_cache() to all architectures
devm: fix ioremap_cache() usage
arch: introduce memremap()
Ross Zwisler (1):
arch, x86: cache management apis for persistent memory
arch/arc/include/asm/io.h | 1
arch/arm/include/asm/io.h | 2 +
arch/arm64/include/asm/io.h | 2 +
arch/arm64/kernel/efi.c | 4 +
arch/arm64/kernel/smp_spin_table.c | 10 ++--
arch/avr32/include/asm/io.h | 1
arch/frv/include/asm/io.h | 6 ++
arch/m32r/include/asm/io.h | 1
arch/m68k/include/asm/io_mm.h | 7 +++
arch/m68k/include/asm/io_no.h | 5 ++
arch/metag/include/asm/io.h | 5 ++
arch/microblaze/include/asm/io.h | 1
arch/mn10300/include/asm/io.h | 1
arch/nios2/include/asm/io.h | 1
arch/s390/include/asm/io.h | 1
arch/sparc/include/asm/io_32.h | 1
arch/sparc/include/asm/io_64.h | 1
arch/tile/include/asm/io.h | 1
arch/x86/Kconfig | 1
arch/x86/include/asm/cacheflush.h | 24 +++++++++
arch/x86/include/asm/io.h | 7 +++
arch/x86/kernel/crash_dump_64.c | 6 +-
arch/x86/kernel/kdebugfs.c | 8 +--
arch/x86/kernel/ksysfs.c | 28 +++++-----
arch/x86/mm/ioremap.c | 10 +---
arch/xtensa/include/asm/io.h | 3 +
drivers/acpi/apei/einj.c | 8 +--
drivers/acpi/apei/erst.c | 14 +++--
drivers/block/pmem.c | 62 +++++++++++++++++++++--
drivers/firmware/google/memconsole.c | 4 +
include/asm-generic/io.h | 8 +++
include/asm-generic/iomap.h | 4 +
include/linux/device.h | 5 ++
include/linux/io.h | 38 ++++++++++++++
include/linux/pmem.h | 93 ++++++++++++++++++++++++++++++++++
lib/Kconfig | 3 +
lib/devres.c | 48 ++++++++----------
37 files changed, 347 insertions(+), 78 deletions(-)
create mode 100644 include/linux/pmem.h
[PATCH v11 0/12] Support Write-Through mapping on x86
by Toshi Kani
This patchset adds support for Write-Through (WT) mapping on x86.
The study below shows that using WT mapping may be useful for
non-volatile memory.
http://www.hpl.hp.com/techreports/2012/HPL-2012-236.pdf
The patchset consists of the following changes.
- Patches 1/12 to 2/12 refactor !pat_enable paths
- Patches 3/12 to 8/12 add ioremap_wt()
- Patch 9/12 adds pgprot_writethrough()
- Patches 10/12 to 11/12 add set_memory_wt()
- Patch 12/12 changes the pmem driver to call ioremap_wt()
All new/modified interfaces have been tested.
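For reference, a sketch of the PAT MSR layout this series works with: eight one-byte slots (PA0-PA7), each holding a memory type, selected by a 4K PTE's PAT, PCD and PWT bits. The memory-type encodings are the architectural x86 ones; the slot assignment below (WB, WC, UC-, UC in slots 0-3, mirrored in slots 4-6, WT in slot 7) is an assumption based on the "Set WT to PA7 slot of PAT MSR" patch title rather than copied from the patches. All names are hypothetical.

```c
#include <stdint.h>

/* Architectural x86 memory-type encodings used in PAT MSR slots. */
enum sketch_pat_type {
	SKETCH_UC = 0x00,
	SKETCH_WC = 0x01,
	SKETCH_WT = 0x04,
	SKETCH_WP = 0x05,
	SKETCH_WB = 0x06,
	SKETCH_UC_MINUS = 0x07,
};

/* Each of the eight slots (PA0..PA7) occupies one byte of the MSR. */
#define SKETCH_PAT(slot, type) ((uint64_t)(type) << ((slot) * 8))

/* Assumed layout: existing lower four slots mirrored, slot 7 set to WT. */
static inline uint64_t sketch_pat_msr_value(void)
{
	return SKETCH_PAT(0, SKETCH_WB) | SKETCH_PAT(1, SKETCH_WC) |
	       SKETCH_PAT(2, SKETCH_UC_MINUS) | SKETCH_PAT(3, SKETCH_UC) |
	       SKETCH_PAT(4, SKETCH_WB) | SKETCH_PAT(5, SKETCH_WC) |
	       SKETCH_PAT(6, SKETCH_UC_MINUS) | SKETCH_PAT(7, SKETCH_WT);
}

/* A 4K PTE selects a slot with its PAT:PCD:PWT bits (PAT is the high bit). */
static inline unsigned int sketch_pat_slot(int pat, int pcd, int pwt)
{
	return (pat ? 4u : 0u) | (pcd ? 2u : 0u) | (pwt ? 1u : 0u);
}

/* Decode the memory type held in a given slot of an MSR value. */
static inline unsigned int sketch_pat_slot_type(uint64_t msr, unsigned int slot)
{
	return (unsigned int)((msr >> (slot * 8)) & 0xff);
}
```

Under this assumed layout the MSR value is 0x0407010600070106, and a PTE with PAT, PCD and PWT all set selects slot 7, i.e. WT.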
---
v11:
- Reordered the refactor changes from patches 10-11 to 1-2.
(Borislav Petkov)
- Changed BUG() to panic(). (Borislav Petkov)
- Rebased to tip/master and resolved conflicts.
v10:
- Removed ioremap_writethrough(). (Thomas Gleixner)
- Clarified and cleaned up multiple comments and functions.
(Thomas Gleixner)
- Changed ioremap_change_attr() to accept the WT type.
v9:
- Changed the set_xxx_wt() interfaces to be exported GPL-only.
(Ingo Molnar)
- Changed is_new_memtype_allowed() to handle WT cases.
- Changed arch-specific io.h to define ioremap_wt().
- Changed the pmem driver to use ioremap_wt().
- Rebased to 4.1-rc3 and resolved minor conflicts.
v8:
- Rebased to 4.0-rc1 and resolved conflicts with 9d34cfdf4 in
patch 5/7.
v7:
- Rebased to 3.19-rc3 as Juergen's patchset for the PAT management
has been accepted.
v6:
- Dropped the patch moving [set|get]_page_memtype() to pat.c
since the tip branch already has this change.
- Fixed an issue when CONFIG_X86_PAT is not defined.
v5:
- Clarified the comment explaining why slot 7 is used. (Andy Lutomirski,
Thomas Gleixner)
- Moved [set|get]_page_memtype() to pat.c. (Thomas Gleixner)
- Removed BUG() from set_page_memtype(). (Thomas Gleixner)
v4:
- Added set_memory_wt() by adding WT support for regular memory.
v3:
- Dropped the set_memory_wt() patch. (Andy Lutomirski)
- Refactored the !pat_enabled handling. (H. Peter Anvin,
Andy Lutomirski)
- Added the picture of PTE encoding. (Konrad Rzeszutek Wilk)
v2:
- Changed WT to use slot 7 of the PAT MSR. (H. Peter Anvin,
Andy Lutomirski)
- Added conservative checks to exclude all Pentium 2, 3,
M, and 4 families. (Ingo Molnar, Henrique de Moraes Holschuh,
Andy Lutomirski)
- Updated documentation to cover WT interfaces and usages.
(Andy Lutomirski, Yigal Korman)
---
Toshi Kani (12):
1/12 x86, mm, pat: Cleanup init flags in pat_init()
2/12 x86, mm, pat: Refactor !pat_enable handling
3/12 x86, mm, pat: Set WT to PA7 slot of PAT MSR
4/12 x86, mm, pat: Change reserve_memtype() for WT
5/12 x86, asm: Change is_new_memtype_allowed() for WT
6/12 x86, mm, asm-gen: Add ioremap_wt() for WT
7/12 arch/*/asm/io.h: Add ioremap_wt() to all architectures
8/12 video/fbdev, asm/io.h: Remove ioremap_writethrough()
9/12 x86, mm, pat: Add pgprot_writethrough() for WT
10/12 x86, mm, asm: Add WT support to set_page_memtype()
11/12 x86, mm: Add set_memory_wt() for WT
12/12 drivers/block/pmem: Map NVDIMM with ioremap_wt()
---
Documentation/x86/pat.txt | 13 +-
arch/arc/include/asm/io.h | 1 +
arch/arm/include/asm/io.h | 1 +
arch/arm64/include/asm/io.h | 1 +
arch/avr32/include/asm/io.h | 1 +
arch/frv/include/asm/io.h | 4 +-
arch/m32r/include/asm/io.h | 1 +
arch/m68k/include/asm/io_mm.h | 4 +-
arch/m68k/include/asm/io_no.h | 4 +-
arch/metag/include/asm/io.h | 3 +
arch/microblaze/include/asm/io.h | 2 +-
arch/mn10300/include/asm/io.h | 1 +
arch/nios2/include/asm/io.h | 1 +
arch/s390/include/asm/io.h | 1 +
arch/sparc/include/asm/io_32.h | 1 +
arch/sparc/include/asm/io_64.h | 1 +
arch/tile/include/asm/io.h | 2 +-
arch/x86/include/asm/cacheflush.h | 6 +-
arch/x86/include/asm/io.h | 2 +
arch/x86/include/asm/pgtable.h | 8 +-
arch/x86/include/asm/pgtable_types.h | 3 +
arch/x86/mm/init.c | 6 +-
arch/x86/mm/iomap_32.c | 12 +-
arch/x86/mm/ioremap.c | 29 ++++-
arch/x86/mm/pageattr.c | 65 +++++++---
arch/x86/mm/pat.c | 229 +++++++++++++++++++++++------------
arch/xtensa/include/asm/io.h | 1 +
drivers/block/pmem.c | 4 +-
drivers/video/fbdev/amifb.c | 4 +-
drivers/video/fbdev/atafb.c | 3 +-
drivers/video/fbdev/hpfb.c | 4 +-
include/asm-generic/io.h | 9 ++
include/asm-generic/iomap.h | 4 +
include/asm-generic/pgtable.h | 4 +
34 files changed, 310 insertions(+), 125 deletions(-)
[PATCH v10 0/12] Support Write-Through mapping on x86
by Toshi Kani
This patchset adds support for Write-Through (WT) mapping on x86.
The study below shows that using WT mapping may be useful for
non-volatile memory.
http://www.hpl.hp.com/techreports/2012/HPL-2012-236.pdf
The patchset consists of the following changes.
- Patches 1/12 to 6/12 add ioremap_wt()
- Patch 7/12 adds pgprot_writethrough()
- Patches 8/12 to 9/12 add set_memory_wt()
- Patches 10/12 to 11/12 refactor !pat_enable paths
- Patch 12/12 changes the pmem driver to call ioremap_wt()
All new/modified interfaces have been tested.
---
v10:
- Removed ioremap_writethrough(). (Thomas Gleixner)
- Clarified and cleaned up multiple comments and functions.
(Thomas Gleixner)
- Changed ioremap_change_attr() to accept the WT type.
v9:
- Changed the set_xxx_wt() interfaces to be exported GPL-only.
(Ingo Molnar)
- Changed is_new_memtype_allowed() to handle WT cases.
- Changed arch-specific io.h to define ioremap_wt().
- Changed the pmem driver to use ioremap_wt().
- Rebased to 4.1-rc3 and resolved minor conflicts.
v8:
- Rebased to 4.0-rc1 and resolved conflicts with 9d34cfdf4 in
patch 5/7.
v7:
- Rebased to 3.19-rc3 as Juergen's patchset for the PAT management
has been accepted.
v6:
- Dropped the patch moving [set|get]_page_memtype() to pat.c
since the tip branch already has this change.
- Fixed an issue when CONFIG_X86_PAT is not defined.
v5:
- Clarified the comment explaining why slot 7 is used. (Andy Lutomirski,
Thomas Gleixner)
- Moved [set|get]_page_memtype() to pat.c. (Thomas Gleixner)
- Removed BUG() from set_page_memtype(). (Thomas Gleixner)
v4:
- Added set_memory_wt() by adding WT support for regular memory.
v3:
- Dropped the set_memory_wt() patch. (Andy Lutomirski)
- Refactored the !pat_enabled handling. (H. Peter Anvin,
Andy Lutomirski)
- Added the picture of PTE encoding. (Konrad Rzeszutek Wilk)
v2:
- Changed WT to use slot 7 of the PAT MSR. (H. Peter Anvin,
Andy Lutomirski)
- Added conservative checks to exclude all Pentium 2, 3,
M, and 4 families. (Ingo Molnar, Henrique de Moraes Holschuh,
Andy Lutomirski)
- Updated documentation to cover WT interfaces and usages.
(Andy Lutomirski, Yigal Korman)
---
Toshi Kani (12):
1/12 x86, mm, pat: Set WT to PA7 slot of PAT MSR
2/12 x86, mm, pat: Change reserve_memtype() for WT
3/12 x86, asm: Change is_new_memtype_allowed() for WT
4/12 x86, mm, asm-gen: Add ioremap_wt() for WT
5/12 arch/*/asm/io.h: Add ioremap_wt() to all architectures
6/12 video/fbdev, asm/io.h: Remove ioremap_writethrough()
7/12 x86, mm, pat: Add pgprot_writethrough() for WT
8/12 x86, mm, asm: Add WT support to set_page_memtype()
9/12 x86, mm: Add set_memory_wt() for WT
10/12 x86, mm, pat: Cleanup init flags in pat_init()
11/12 x86, mm, pat: Refactor !pat_enable handling
12/12 drivers/block/pmem: Map NVDIMM with ioremap_wt()
---
Documentation/x86/pat.txt | 13 +-
arch/arc/include/asm/io.h | 1 +
arch/arm/include/asm/io.h | 1 +
arch/arm64/include/asm/io.h | 1 +
arch/avr32/include/asm/io.h | 1 +
arch/frv/include/asm/io.h | 4 +-
arch/m32r/include/asm/io.h | 1 +
arch/m68k/include/asm/io_mm.h | 4 +-
arch/m68k/include/asm/io_no.h | 4 +-
arch/metag/include/asm/io.h | 3 +
arch/microblaze/include/asm/io.h | 2 +-
arch/mn10300/include/asm/io.h | 1 +
arch/nios2/include/asm/io.h | 1 +
arch/s390/include/asm/io.h | 1 +
arch/sparc/include/asm/io_32.h | 1 +
arch/sparc/include/asm/io_64.h | 1 +
arch/tile/include/asm/io.h | 2 +-
arch/x86/include/asm/cacheflush.h | 6 +-
arch/x86/include/asm/io.h | 2 +
arch/x86/include/asm/pgtable.h | 8 +-
arch/x86/include/asm/pgtable_types.h | 3 +
arch/x86/mm/init.c | 6 +-
arch/x86/mm/iomap_32.c | 12 +-
arch/x86/mm/ioremap.c | 29 ++++-
arch/x86/mm/pageattr.c | 65 +++++++---
arch/x86/mm/pat.c | 232 +++++++++++++++++++++++------------
arch/xtensa/include/asm/io.h | 1 +
drivers/block/pmem.c | 4 +-
drivers/video/fbdev/amifb.c | 4 +-
drivers/video/fbdev/atafb.c | 3 +-
drivers/video/fbdev/hpfb.c | 4 +-
include/asm-generic/io.h | 9 ++
include/asm-generic/iomap.h | 4 +
include/asm-generic/pgtable.h | 4 +
34 files changed, 311 insertions(+), 127 deletions(-)
[libnd-for-next PATCH] libnd: miscellaneous sparse fixes
by Dan Williams
It seems 0day is slowly leaking out new sparse reports for libnd.
Indeed, running sparse locally reveals a small trove. Most are
straightforward, but a few remain open:
"drivers/block/nd/region.c:74:9: warning: context imbalance in
'nd_region_acquire_lane' - wrong count at exit
drivers/block/nd/region.c:88:36: warning: context imbalance in
'nd_region_release_lane' - unexpected unlock"
Not sure how to tell sparse that nd_region_acquire_lane() may nest: it
acquires the lock only at the top level of nesting, and then only if we
have more cpus than we have lanes.
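The locking pattern described above can be sketched in userspace as follows. This is hypothetical: the sketch_* names are invented, a C11 atomic_flag stands in for a kernel spinlock, and a caller-supplied cpu number stands in for get_cpu() (so it assumes only one thread ever uses a given cpu index). The per-cpu count makes nested calls on the same cpu skip the lock, and the lock is used at all only when lanes are shared between cpus; this data-dependent acquire/release is exactly what sparse's context tracking cannot follow.

```c
#include <stdatomic.h>

#define SKETCH_MAX_CPUS 8

struct sketch_lane {
	atomic_flag lock;   /* taken only when lanes are shared between cpus */
	unsigned int count; /* nesting depth for the cpu owning this entry */
};

struct sketch_region {
	unsigned int num_lanes;
	unsigned int nr_cpus; /* assumed <= SKETCH_MAX_CPUS */
	struct sketch_lane lane[SKETCH_MAX_CPUS];
};

static void sketch_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin */
}

static void sketch_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void sketch_region_init(struct sketch_region *r, unsigned int num_lanes,
			       unsigned int nr_cpus)
{
	r->num_lanes = num_lanes;
	r->nr_cpus = nr_cpus;
	for (unsigned int i = 0; i < SKETCH_MAX_CPUS; i++) {
		atomic_flag_clear(&r->lane[i].lock);
		r->lane[i].count = 0;
	}
}

/* Lock only on the outermost acquisition, and only if lanes are shared. */
static unsigned int sketch_acquire_lane(struct sketch_region *r, unsigned int cpu)
{
	unsigned int lane;

	if (r->num_lanes < r->nr_cpus) {
		lane = cpu % r->num_lanes;
		if (r->lane[cpu].count++ == 0)
			sketch_spin_lock(&r->lane[lane].lock);
	} else {
		lane = cpu; /* one lane per cpu: no lock needed */
	}
	return lane;
}

/* Unlock only when the outermost acquisition is released. */
static void sketch_release_lane(struct sketch_region *r, unsigned int cpu,
				unsigned int lane)
{
	if (r->num_lanes < r->nr_cpus) {
		if (--r->lane[cpu].count == 0)
			sketch_spin_unlock(&r->lane[lane].lock);
	}
}
```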
"drivers/block/nd/label.c:105:26: warning: Initializer entry defined twice
drivers/block/nd/label.c:105:33: also defined here
drivers/block/nd/pmem.c:166:25: warning: incorrect type in assignment
(different address spaces)
drivers/block/nd/pmem.c:166:25: expected void *virt_addr
drivers/block/nd/pmem.c:166:25: got void [noderef] <asn:2>*
drivers/block/nd/pmem.c:198:21: warning: incorrect type in argument 1
(different address spaces)
drivers/block/nd/pmem.c:198:21: expected void volatile [noderef] <asn:2>*addr
drivers/block/nd/pmem.c:198:21: got void *virt_addr
drivers/block/nd/pmem.c:212:21: warning: incorrect type in argument 1
(different address spaces)
drivers/block/nd/pmem.c:212:21: expected void volatile [noderef] <asn:2>*addr
drivers/block/nd/pmem.c:212:21: got void *virt_addr"
These result from passing addresses returned by ioremap() to memcpy(),
where we know the mapping has no io side effects. The plan is to
introduce memremap() for these cases, as other users of ioremap() in the
kernel have this same problem.
Reported-by: kbuild test robot <fengguang.wu(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/block/nd/btt.c | 2 ++
drivers/block/nd/bus.c | 6 +++--
drivers/block/nd/label.c | 48 ++++++++++++++++++++++-------------------
drivers/block/nd/label.h | 4 ++-
drivers/block/nd/nd-private.h | 5 ----
drivers/block/nd/nd.h | 4 +++
drivers/block/nd/region.c | 2 +-
7 files changed, 38 insertions(+), 33 deletions(-)
diff --git a/drivers/block/nd/btt.c b/drivers/block/nd/btt.c
index a4287b6f4224..932177294f75 100644
--- a/drivers/block/nd/btt.c
+++ b/drivers/block/nd/btt.c
@@ -863,6 +863,7 @@ static int lba_to_arena(struct btt *btt, sector_t sector, __u32 *premap,
* readability, since they index into an array of locks
*/
static void lock_map(struct arena_info *arena, u32 premap)
+ __acquires(&arena->map_locks[idx].lock)
{
u32 idx = (premap * MAP_ENT_SIZE / L1_CACHE_BYTES) % arena->nfree;
@@ -870,6 +871,7 @@ static void lock_map(struct arena_info *arena, u32 premap)
}
static void unlock_map(struct arena_info *arena, u32 premap)
+ __releases(&arena->map_locks[idx].lock)
{
u32 idx = (premap * MAP_ENT_SIZE / L1_CACHE_BYTES) % arena->nfree;
diff --git a/drivers/block/nd/bus.c b/drivers/block/nd/bus.c
index dc69ccfae53a..8d13051714d3 100644
--- a/drivers/block/nd/bus.c
+++ b/drivers/block/nd/bus.c
@@ -337,7 +337,7 @@ static ssize_t devtype_show(struct device *dev, struct device_attribute *attr,
{
return sprintf(buf, "%s\n", dev->type->name);
}
-DEVICE_ATTR_RO(devtype);
+static DEVICE_ATTR_RO(devtype);
static struct attribute *nd_device_attributes[] = {
&dev_attr_modalias.attr,
@@ -374,7 +374,7 @@ void nd_bus_destroy_ndctl(struct nd_bus *nd_bus)
device_destroy(nd_class, MKDEV(nd_bus_major, nd_bus->id));
}
-static const struct nd_cmd_desc const __nd_cmd_dimm_descs[] = {
+static const struct nd_cmd_desc __nd_cmd_dimm_descs[] = {
[ND_CMD_IMPLEMENTED] = { },
[ND_CMD_SMART] = {
.out_num = 2,
@@ -420,7 +420,7 @@ const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd)
}
EXPORT_SYMBOL_GPL(nd_cmd_dimm_desc);
-static const struct nd_cmd_desc const __nd_cmd_bus_descs[] = {
+static const struct nd_cmd_desc __nd_cmd_bus_descs[] = {
[ND_CMD_IMPLEMENTED] = { },
[ND_CMD_ARS_CAP] = {
.in_num = 2,
diff --git a/drivers/block/nd/label.c b/drivers/block/nd/label.c
index 5052db591bec..e0f495e90728 100644
--- a/drivers/block/nd/label.c
+++ b/drivers/block/nd/label.c
@@ -21,6 +21,10 @@
#include <asm-generic/io-64-nonatomic-lo-hi.h>
+#ifndef __io_virt
+#define __io_virt(x) ((void __force *) (x))
+#endif
+
static u32 best_seq(u32 a, u32 b)
{
a &= NSINDEX_SEQ_MASK;
@@ -114,7 +118,7 @@ int nd_label_validate(struct nd_dimm_drvdata *ndd)
}
sum_save = readq(&nsindex[i]->checksum);
writeq(0, &nsindex[i]->checksum);
- sum = nd_fletcher64((void * __force) nsindex[i],
+ sum = nd_fletcher64(__io_virt(nsindex[i]),
sizeof_namespace_index(ndd), 1);
writeq(sum_save, &nsindex[i]->checksum);
if (sum != sum_save) {
@@ -190,21 +194,17 @@ void nd_label_copy(struct nd_dimm_drvdata *ndd,
struct nd_namespace_index __iomem *dst,
struct nd_namespace_index __iomem *src)
{
- void *s, *d;
-
if (dst && src)
/* pass */;
else
return;
- d = (void * __force) dst;
- s = (void * __force) src;
- memcpy(d, s, sizeof_namespace_index(ndd));
+ memcpy(__io_virt(dst), __io_virt(src), sizeof_namespace_index(ndd));
}
static struct nd_namespace_label __iomem *nd_label_base(struct nd_dimm_drvdata *ndd)
{
- void *base = to_namespace_index(ndd, 0);
+ void __iomem *base = to_namespace_index(ndd, 0);
return base + 2 * sizeof_namespace_index(ndd);
}
@@ -224,20 +224,23 @@ static int to_slot(struct nd_dimm_drvdata *ndd,
* preamble_index - common variable initialization for nd_label_* routines
* @nd_dimm: dimm container for the relevant label set
* @idx: namespace_index index
- * @nsindex: on return set to the currently active namespace index
+ * @nsindex_out: on return set to the currently active namespace index
* @free: on return set to the free label bitmap in the index
* @nslot: on return set to the number of slots in the label space
*/
static bool preamble_index(struct nd_dimm_drvdata *ndd, int idx,
- struct nd_namespace_index **nsindex,
+ struct nd_namespace_index __iomem **nsindex_out,
unsigned long **free, u32 *nslot)
{
- *nsindex = to_namespace_index(ndd, idx);
- if (*nsindex == NULL)
+ struct nd_namespace_index __iomem *nsindex;
+
+ nsindex = to_namespace_index(ndd, idx);
+ if (nsindex == NULL)
return false;
- *free = (unsigned long __force *) (*nsindex)->free;
- *nslot = readl(&(*nsindex)->nslot);
+ *free = __io_virt(nsindex->free);
+ *nslot = readl(&nsindex->nslot);
+ *nsindex_out = nsindex;
return true;
}
@@ -252,7 +255,7 @@ char *nd_label_gen_id(struct nd_label_id *label_id, u8 *uuid, u32 flags)
}
static bool preamble_current(struct nd_dimm_drvdata *ndd,
- struct nd_namespace_index **nsindex,
+ struct nd_namespace_index __iomem **nsindex,
unsigned long **free, u32 *nslot)
{
return preamble_index(ndd, ndd->ns_current, nsindex,
@@ -260,7 +263,7 @@ static bool preamble_current(struct nd_dimm_drvdata *ndd,
}
static bool preamble_next(struct nd_dimm_drvdata *ndd,
- struct nd_namespace_index **nsindex,
+ struct nd_namespace_index __iomem **nsindex,
unsigned long **free, u32 *nslot)
{
return preamble_index(ndd, ndd->ns_next, nsindex,
@@ -420,12 +423,13 @@ u32 nd_label_nfree(struct nd_dimm_drvdata *ndd)
static int nd_label_write_index(struct nd_dimm_drvdata *ndd, int index, u32 seq,
unsigned long flags)
{
- struct nd_namespace_index *nsindex = to_namespace_index(ndd, index);
+ struct nd_namespace_index __iomem *nsindex;
unsigned long offset;
u64 checksum;
u32 nslot;
int rc;
+ nsindex = to_namespace_index(ndd, index);
if (flags & ND_NSINDEX_INIT)
nslot = nd_dimm_num_label_slots(ndd);
else
@@ -450,7 +454,7 @@ static int nd_label_write_index(struct nd_dimm_drvdata *ndd, int index, u32 seq,
writew(1, &nsindex->minor);
writeq(0, &nsindex->checksum);
if (flags & ND_NSINDEX_INIT) {
- unsigned long *free = (unsigned long __force *) nsindex->free;
+ unsigned long *free = __io_virt(nsindex->free);
u32 nfree = ALIGN(nslot, BITS_PER_LONG);
int last_bits, i;
@@ -458,11 +462,11 @@ static int nd_label_write_index(struct nd_dimm_drvdata *ndd, int index, u32 seq,
for (i = 0, last_bits = nfree - nslot; i < last_bits; i++)
clear_bit_le(nslot + i, free);
}
- checksum = nd_fletcher64((void * __force) nsindex,
+ checksum = nd_fletcher64(__io_virt(nsindex),
sizeof_namespace_index(ndd), 1);
writeq(checksum, &nsindex->checksum);
rc = nd_dimm_set_config_data(ndd, readq(&nsindex->myoff),
- nsindex, sizeof_namespace_index(ndd));
+ __io_virt(nsindex), sizeof_namespace_index(ndd));
if (rc < 0)
return rc;
@@ -526,7 +530,7 @@ static int __pmem_label_update(struct nd_region *nd_region,
/* update label */
offset = nd_label_offset(ndd, nd_label);
- rc = nd_dimm_set_config_data(ndd, offset, nd_label,
+ rc = nd_dimm_set_config_data(ndd, offset, __io_virt(nd_label),
sizeof(struct nd_namespace_label));
if (rc < 0)
return rc;
@@ -552,7 +556,7 @@ static int __pmem_label_update(struct nd_region *nd_region,
static void del_label(struct nd_mapping *nd_mapping, int l)
{
- struct nd_namespace_label __iomem *next_label, __iomem *nd_label;
+ struct nd_namespace_label __iomem *next_label, *nd_label;
struct nd_dimm_drvdata *ndd = to_ndd(nd_mapping);
unsigned int slot;
int j;
@@ -709,7 +713,7 @@ static int __blk_label_update(struct nd_region *nd_region,
/* update label */
offset = nd_label_offset(ndd, nd_label);
- rc = nd_dimm_set_config_data(ndd, offset, nd_label,
+ rc = nd_dimm_set_config_data(ndd, offset, __io_virt(nd_label),
sizeof(struct nd_namespace_label));
if (rc < 0)
goto abort;
diff --git a/drivers/block/nd/label.h b/drivers/block/nd/label.h
index a26cebc9f389..71fac593e50f 100644
--- a/drivers/block/nd/label.h
+++ b/drivers/block/nd/label.h
@@ -124,8 +124,8 @@ static inline int nd_label_next_nsindex(int index)
struct nd_dimm_drvdata;
int nd_label_validate(struct nd_dimm_drvdata *ndd);
void nd_label_copy(struct nd_dimm_drvdata *ndd,
- struct nd_namespace_index *dst,
- struct nd_namespace_index *src);
+ struct nd_namespace_index __iomem *dst,
+ struct nd_namespace_index __iomem *src);
size_t sizeof_namespace_index(struct nd_dimm_drvdata *ndd);
int nd_label_active_count(struct nd_dimm_drvdata *ndd);
struct nd_namespace_label __iomem *nd_label_active(
diff --git a/drivers/block/nd/nd-private.h b/drivers/block/nd/nd-private.h
index b0571e334af9..e0eb5799ef3f 100644
--- a/drivers/block/nd/nd-private.h
+++ b/drivers/block/nd/nd-private.h
@@ -73,11 +73,6 @@ static inline void nd_btt_notify_ndio(struct nd_bus *nd_bus, struct nd_io *ndio)
struct nd_bus *walk_to_nd_bus(struct device *nd_dev);
int __init nd_bus_init(void);
void nd_bus_exit(void);
-int __init nd_dimm_init(void);
-int __init nd_region_init(void);
-void __init nd_region_init_locks(void);
-void nd_dimm_exit(void);
-int nd_region_exit(void);
void nd_region_probe_start(struct nd_bus *nd_bus, struct device *dev);
void nd_region_probe_end(struct nd_bus *nd_bus, struct device *dev, int rc);
struct nd_region;
diff --git a/drivers/block/nd/nd.h b/drivers/block/nd/nd.h
index b830801c9892..e826fa3dfeac 100644
--- a/drivers/block/nd/nd.h
+++ b/drivers/block/nd/nd.h
@@ -231,6 +231,10 @@ void nd_init_ndio(struct nd_io *ndio, nd_rw_bytes_fn rw_bytes,
void ndio_del_claim(struct nd_io_claim *ndio_claim);
struct nd_io_claim *ndio_add_claim(struct nd_io *ndio, struct device *holder,
ndio_notify_remove_fn notify_remove);
+int __init nd_dimm_init(void);
+int __init nd_region_init(void);
+void nd_dimm_exit(void);
+void nd_region_exit(void);
struct nd_dimm;
struct nd_dimm_drvdata *to_ndd(struct nd_mapping *nd_mapping);
int nd_dimm_init_nsarea(struct nd_dimm_drvdata *ndd);
diff --git a/drivers/block/nd/region.c b/drivers/block/nd/region.c
index 75ae27279f0e..5af7701ad6ea 100644
--- a/drivers/block/nd/region.c
+++ b/drivers/block/nd/region.c
@@ -180,7 +180,7 @@ int __init nd_region_init(void)
return nd_driver_register(&nd_region_driver);
}
-void __exit nd_region_exit(void)
+void nd_region_exit(void)
{
driver_unregister(&nd_region_driver.drv);
}
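The label.c hunks above replace open-coded (void * __force) casts with an __io_virt() helper. An architecture may define that helper first; otherwise the #ifndef fallback turns it into a plain forced cast. Below is a minimal user-space sketch of that pattern. It assumes kernel-style annotation macros that only mean something under sparse (__CHECKER__) and expand to nothing in a normal compile. struct ns_index and copy_index() are hypothetical stand-ins for struct nd_namespace_index and nd_label_copy(), not the driver's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sparse-only annotations (assumed, modelled on the kernel's compiler
 * headers): under sparse they mark a separate address space, in a normal
 * compile they expand to nothing.
 */
#ifdef __CHECKER__
#define __iomem __attribute__((noderef, address_space(2)))
#define __force __attribute__((force))
#else
#define __iomem
#define __force
#endif

/* Same fallback as the label.c hunk: used only if nothing defined it earlier. */
#ifndef __io_virt
#define __io_virt(x) ((void __force *) (x))
#endif

/* Hypothetical stand-in for an __iomem namespace index. */
struct ns_index {
	uint64_t checksum;
	uint8_t payload[24];
};

/* Helpers that want a plain pointer (e.g. memcpy) go through __io_virt(). */
static void copy_index(struct ns_index __iomem *dst,
		       struct ns_index __iomem *src)
{
	assert(dst && src);
	memcpy(__io_virt(dst), __io_virt(src), sizeof(*dst));
}
```

This way sparse flags direct dereferences of the __iomem pointers, and the single __io_virt() helper is the one documented place where the annotation is deliberately dropped.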
[PATCH 0/6] I/O path improvements for ND_BLK and PMEM
by Ross Zwisler
This series adds a new PMEM API consisting of three functions:
persistent_copy(), persistent_flush() and persistent_sync().
These three functions are then used in the I/O paths for both the ND_BLK driver
and the PMEM driver to ensure that writes actually make it to the DIMM and
become durable before the I/O operation completes.
The first two patches in the series are just cleanup and correctness patches.
Patch three provides a reasonable architecture-neutral default implementation
of these three APIs for architectures that do not implement the PMEM API.
These defaults let all architectures mostly work: persistent_copy() is aliased
to memcpy(), and persistent_flush() and persistent_sync() are no-ops. In this
series the defaults are provided at the include/linux/pmem.h level.
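A minimal user-space sketch of the fallback behavior described above. The signatures and the ARCH_HAS_PMEM_API guard are assumptions for illustration; the series' own prototypes in include/linux/pmem.h may differ. The point it shows is that, on an architecture without the PMEM API, the write path still "works" but nothing actually forces the data out of the CPU caches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical architecture-neutral defaults; the guard name and the
 * signatures are assumptions, not the series' actual API.
 */
#ifndef ARCH_HAS_PMEM_API
static inline void persistent_copy(void *dst, const void *src, size_t n)
{
	/* No cache-bypassing copy available: fall back to an ordinary copy. */
	memcpy(dst, src, n);
}

static inline void persistent_flush(void *addr, size_t n)
{
	/* No-op: nothing is written back from the CPU caches. */
	(void)addr;
	(void)n;
}

static inline void persistent_sync(void)
{
	/* No-op: no fence or flush is issued, so durability is not guaranteed. */
}
#endif

/* Example write path: copy, flush the range, then wait for durability. */
static void pmem_write(void *pmem_dst, const void *buf, size_t n)
{
	assert(pmem_dst != NULL && buf != NULL);
	persistent_copy(pmem_dst, buf, n);
	persistent_flush(pmem_dst, n);
	persistent_sync();
}
```

Callers always issue the same copy/flush/sync sequence. Only the per-architecture implementations change, which is what makes a common default in one header attractive.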
It's possible that future consumers of the PMEM API (DAX, and possibly
others) would prefer a different default behavior on architectures that
don't support the PMEM API. If so, we could move the choice of what to do on
those architectures down into consumer-specific header files, for example
nd.h for libnd. If DAX and other consumers are fine with our defaults, it's
nicer to keep them common and in one global place. Please let us know how
other future consumers of the PMEM API feel about this.
Patch 4 converts the PMEM and ND_BLK I/O paths to the new API, and patches 5
and 6 add ND_BLK support for flush hints and NVDIMM flags.
This series applies cleanly to Dan's "ndctl-for-next" tree:
https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/log/?h=libnd...
One last note - I'm going to be unavailable soon, so patch feedback will most
likely be handled by Dan Williams. Thanks, Dan. :)
Ross Zwisler (6):
pmem: add force casts to avoid __iomem annotation
nfit: Fix up address spaces, sparse warnings
x86, pmem: add PMEM API for persistent memory
pmem, nd_blk: update I/O paths to use PMEM API
nd_blk: add support for flush hints
nd_blk: add support for NVDIMM flags
MAINTAINERS | 1 +
arch/x86/Kconfig | 3 ++
arch/x86/include/asm/cacheflush.h | 23 ++++++++++
drivers/acpi/nfit.c | 89 ++++++++++++++++++++++++++++++++++-----
drivers/acpi/nfit.h | 28 +++++++++++-
drivers/block/nd/pmem.c | 22 +++++++---
include/linux/pmem.h | 79 ++++++++++++++++++++++++++++++++++
include/uapi/linux/ndctl.h | 5 +++
8 files changed, 232 insertions(+), 18 deletions(-)
create mode 100644 include/linux/pmem.h
--
1.9.3
[Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
by Dan Williams
Changes since v1 [1]: Incorporates feedback received prior to April 24.
1/ Ingo said [2]:
"So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??"
Indeed, we consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deeply into the implementation. That is fixed
now with all NFIT specifics factored out into acpi.c. The NFIT is no
longer required reading to review libnd. Only three concepts are
needed:
i/ PMEM - contiguous memory range where cpu stores are
persistent once they are flushed through the memory
controller.
ii/ BLK - mmio apertures (sliding windows) that can be
programmed to access an aperture's-worth of persistent
media at a time.
iii/ DPA - "dimm-physical-address", address space local to a
dimm. A dimm may provide both PMEM-mode and BLK-mode
access to a range of DPA. libnd manages allocation of DPA
to either PMEM or BLK-namespaces to resolve this aliasing.
The v1..v2 diffstat below shows the migration of nfit-specifics to
acpi.c and the new state of libnd being nfit-free. "nd" now only
refers to "non-volatile devices". Note, reworked documentation will
return once the review has settled.
Documentation/blockdev/nd.txt | 867 ---------------------
MAINTAINERS | 34 +-
arch/ia64/kernel/efi.c | 5 +-
arch/x86/kernel/e820.c | 11 +-
arch/x86/kernel/pmem.c | 2 +-
drivers/block/Makefile | 2 +-
drivers/block/nd/Kconfig | 135 ++--
drivers/block/nd/Makefile | 32 +-
drivers/block/nd/acpi.c | 1506 +++++++++++++++++++++++++++++++------
drivers/block/nd/acpi_nfit.h | 321 ++++++++
drivers/block/nd/blk.c | 27 +-
drivers/block/nd/btt.c | 6 +-
drivers/block/nd/btt_devs.c | 8 +-
drivers/block/nd/bus.c | 337 +++++----
drivers/block/nd/core.c | 574 +-------------
drivers/block/nd/dimm.c | 11 -
drivers/block/nd/dimm_devs.c | 292 ++-----
drivers/block/nd/e820.c | 100 +++
drivers/block/nd/libnd.h | 122 +++
drivers/block/nd/namespace_devs.c | 10 +-
drivers/block/nd/nd-private.h | 107 +--
drivers/block/nd/nd.h | 91 +--
drivers/block/nd/nfit.h | 238 ------
drivers/block/nd/pmem.c | 56 +-
drivers/block/nd/region.c | 78 +-
drivers/block/nd/region_devs.c | 783 +++----------------
drivers/block/nd/test/iomap.c | 86 +--
drivers/block/nd/test/nfit.c | 1115 +++++++++++++++------------
drivers/block/nd/test/nfit_test.h | 15 +-
include/uapi/linux/ndctl.h | 130 ++--
30 files changed, 3166 insertions(+), 3935 deletions(-)
delete mode 100644 Documentation/blockdev/nd.txt
create mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 drivers/block/nd/e820.c
create mode 100644 drivers/block/nd/libnd.h
delete mode 100644 drivers/block/nd/nfit.h
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
2/ Christoph asked for the pmem ida conversion to be moved to its own
patch (done), and for us to consider leaving the current pmem.c in
drivers/block/. Instead, I converted the e820-type-12 enabling to be
the first non-ACPI-NFIT based consumer of libnd. The new nd_e820 driver
simply registers e820-type-12 ranges as libnd PMEM regions. Among other
things, this conversion enables BTT for these ranges. The alternative
is to move drivers/block/nd/nd.h internals out to include/linux/,
which I think is worse.
3/ Toshi reported that the NFIT parsing fails to handle the case of a
PMEM range with a single-dimm (non-aliasing) interleave description.
Support for this case was added and is tested by default by the
nfit_test.1 configuration.
4/ Toshi reported that we should not be treating a missing _STA property
as a "dimm disabled by firmware" case. (fixed).
5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
arch code. It is gone for now and we'll revisit when adding cached
mappings back to the PMEM driver.
6/ Toshi mentioned that the presence of two different nd_bus_probe()
functions was confusing. (cleaned up).
7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).
8/ Linda asked for nfit_test to honor dynamic cma reservations via the
cma= command line (done). The cma requirements have also been
reduced to 128M as only the simulated DAX regions need CMA. The rest
can use vmalloc().
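To make the BLK concept from item 1/ (ii) concrete, here is a minimal simulation of a sliding aperture. It assumes a tiny in-memory "media" array addressed by DPA and a window whose DPA base is set by a programming step. MEDIA_SIZE, APERTURE_SIZE, struct aperture and blk_read() are hypothetical. Real hardware programs the window through an MMIO control register and exposes a separate aperture mapping, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes chosen for illustration only. */
#define MEDIA_SIZE    4096u
#define APERTURE_SIZE 256u

/* Simulated DIMM media, addressed by DPA (dimm-physical-address). */
static uint8_t media[MEDIA_SIZE];

/* A BLK aperture: a small window whose DPA base is programmable. */
struct aperture {
	uint64_t base_dpa;	/* current window start */
};

/* Program the window (stands in for an MMIO control-register write). */
static void aperture_program(struct aperture *ap, uint64_t dpa)
{
	ap->base_dpa = dpa - (dpa % APERTURE_SIZE);
}

/* The bytes currently visible through the window. */
static uint8_t *aperture_window(struct aperture *ap)
{
	return &media[ap->base_dpa];
}

/* Read len bytes at dpa, sliding the window one aperture's worth at a time. */
static void blk_read(struct aperture *ap, uint64_t dpa, void *buf, size_t len)
{
	uint8_t *out = buf;

	assert(dpa + len <= MEDIA_SIZE);
	while (len) {
		size_t off, chunk;

		aperture_program(ap, dpa);
		off = (size_t)(dpa - ap->base_dpa);
		chunk = APERTURE_SIZE - off;
		if (chunk > len)
			chunk = len;
		memcpy(out, aperture_window(ap) + off, chunk);
		out += chunk;
		dpa += chunk;
		len -= chunk;
	}
}
```

For example, a 300-byte read at DPA 200 is served in two steps: 56 bytes through the window at DPA 0, then 244 bytes after re-programming the window to DPA 256.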
---
Available here:
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm nd-v2
---
Dan Williams (18):
e820, efi: add ACPI 6.0 persistent memory types
libnd, nd_acpi: initial libnd infrastructure and NFIT support
nd_acpi, nfit-test: manufactured NFITs for interface development
libnd: ndctl class device, and nd bus attributes
libnd, nd_acpi: dimm/memory-devices
libnd: ndctl.h, the nd ioctl abi
libnd, nd_dimm: dimm driver and base libnd device-driver infrastructure
libnd, nd_acpi: regions (block-data-window, persistent memory, volatile memory)
libnd: support for legacy (non-aliasing) nvdimms
pmem: use ida
libnd, nd_pmem: add libnd support to the pmem driver
libnd, nd_acpi: add interleave-set state-tracking infrastructure
libnd: namespace indices: read and validate
libnd: pmem label sets and namespace instantiation.
libnd: blk labels and namespace instantiation
libnd: write pmem label set
libnd: write blk label set
libnd: infrastructure for btt devices
Ross Zwisler (1):
libnd, nd_acpi, nd_blk: driver for BLK-mode access persistent memory
Vishal Verma (1):
nd_btt: atomic sector updates
Documentation/blockdev/btt.txt | 273 ++++++
arch/arm64/kernel/efi.c | 1
arch/ia64/kernel/efi.c | 4
arch/x86/boot/compressed/eboot.c | 4
arch/x86/include/uapi/asm/e820.h | 1
arch/x86/kernel/e820.c | 26 +
arch/x86/kernel/pmem.c | 2
arch/x86/platform/efi/efi.c | 3
drivers/block/Kconfig | 13
drivers/block/Makefile | 2
drivers/block/nd/Kconfig | 129 +++
drivers/block/nd/Makefile | 41 +
drivers/block/nd/acpi.c | 1505 +++++++++++++++++++++++++++++++++
drivers/block/nd/acpi_nfit.h | 321 +++++++
drivers/block/nd/blk.c | 264 ++++++
drivers/block/nd/btt.c | 1423 +++++++++++++++++++++++++++++++
drivers/block/nd/btt.h | 185 ++++
drivers/block/nd/btt_devs.c | 443 ++++++++++
drivers/block/nd/bus.c | 770 +++++++++++++++++
drivers/block/nd/core.c | 471 ++++++++++
drivers/block/nd/dimm.c | 115 +++
drivers/block/nd/dimm_devs.c | 507 +++++++++++
drivers/block/nd/e820.c | 100 ++
drivers/block/nd/label.c | 925 ++++++++++++++++++++
drivers/block/nd/label.h | 143 +++
drivers/block/nd/libnd.h | 122 +++
drivers/block/nd/namespace_devs.c | 1701 +++++++++++++++++++++++++++++++++++++
drivers/block/nd/nd-private.h | 114 ++
drivers/block/nd/nd.h | 261 ++++++
drivers/block/nd/pmem.c | 114 ++
drivers/block/nd/region.c | 159 +++
drivers/block/nd/region_devs.c | 637 ++++++++++++++
drivers/block/nd/test/Makefile | 5
drivers/block/nd/test/iomap.c | 151 +++
drivers/block/nd/test/nfit.c | 1131 +++++++++++++++++++++++++
drivers/block/nd/test/nfit_test.h | 26 +
include/linux/efi.h | 3
include/linux/nd.h | 98 ++
include/uapi/linux/Kbuild | 1
include/uapi/linux/ndctl.h | 199 ++++
40 files changed, 12345 insertions(+), 48 deletions(-)
create mode 100644 Documentation/blockdev/btt.txt
create mode 100644 drivers/block/nd/Kconfig
create mode 100644 drivers/block/nd/Makefile
create mode 100644 drivers/block/nd/acpi.c
create mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 drivers/block/nd/blk.c
create mode 100644 drivers/block/nd/btt.c
create mode 100644 drivers/block/nd/btt.h
create mode 100644 drivers/block/nd/btt_devs.c
create mode 100644 drivers/block/nd/bus.c
create mode 100644 drivers/block/nd/core.c
create mode 100644 drivers/block/nd/dimm.c
create mode 100644 drivers/block/nd/dimm_devs.c
create mode 100644 drivers/block/nd/e820.c
create mode 100644 drivers/block/nd/label.c
create mode 100644 drivers/block/nd/label.h
create mode 100644 drivers/block/nd/libnd.h
create mode 100644 drivers/block/nd/namespace_devs.c
create mode 100644 drivers/block/nd/nd-private.h
create mode 100644 drivers/block/nd/nd.h
rename drivers/block/{pmem.c => nd/pmem.c} (68%)
create mode 100644 drivers/block/nd/region.c
create mode 100644 drivers/block/nd/region_devs.c
create mode 100644 drivers/block/nd/test/Makefile
create mode 100644 drivers/block/nd/test/iomap.c
create mode 100644 drivers/block/nd/test/nfit.c
create mode 100644 drivers/block/nd/test/nfit_test.h
create mode 100644 include/linux/nd.h
create mode 100644 include/uapi/linux/ndctl.h
[Linux-nvdimm] [GIT PULL] PMEM driver for v4.1
by Ingo Molnar
Linus,
Please pull the latest x86-pmem-for-linus git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-pmem-for-linus
# HEAD: 4c1eaa2344fb26bb5e936fb4d8ee307343ea0089 drivers/block/pmem: Fix 32-bit build warning in pmem_alloc()
This is the initial support for the pmem block device driver:
persistent non-volatile memory space mapped into the system's physical
memory space as large physical memory regions.
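Because the region is a flat, byte-addressable mapping, servicing a block request comes down to offset arithmetic plus a copy. pmem_do_bvec() in the patch below computes the offset as sector << 9. The following user-space sketch shows only that idea. struct pmem_region and pmem_region_rw() are hypothetical simplifications of struct pmem_device and the driver's I/O path, with no bios, pages or cache maintenance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the driver's "sector << 9" */

/* Hypothetical stand-in for struct pmem_device: one contiguous region. */
struct pmem_region {
	uint8_t *virt_addr;
	size_t size;
};

/* Copy len bytes between buf and the region at the byte offset of sector. */
static int pmem_region_rw(struct pmem_region *r, uint64_t sector,
			  void *buf, size_t len, int write)
{
	size_t off = (size_t)(sector << SECTOR_SHIFT);

	assert(r != NULL && buf != NULL);
	/* Reject I/O past the end of the region (the driver fails it with -EIO). */
	if (off > r->size || len > r->size - off)
		return -1;
	if (write)
		memcpy(r->virt_addr + off, buf, len);
	else
		memcpy(buf, r->virt_addr + off, len);
	return 0;
}
```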
The driver is based on Intel code, written by Ross Zwisler, with fixes
by Boaz Harrosh, integrated with x86 e820 memory resource management
and tidied up by Christoph Hellwig.
Note that there were two other, separate pmem driver submissions to
lkml, but apparently all parties (Ross Zwisler, Boaz Harrosh) are
reasonably happy with this initial version.
This version provides minimal support that lets persistent memory
devices out in the wild work as block devices. They are identified
through a magic (non-standard) e820 type (12) and auto-discovered if
CONFIG_X86_PMEM_LEGACY=y, or added explicitly by editing the memory
map via the "memmap=..." boot option with the new, special '!'
modifier character.
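The '!' form is memmap=nn[KMG]!ss[KMG]: a size, then '!', then a start address, each with an optional K/M/G suffix. In the kernel this is handled by parse_memmap_one() using memparse(). The user-space sketch below imitates that parsing. parse_size() is a hypothetical stand-in for memparse(), not the kernel function, and parse_memmap_pram() only recognizes the '!' case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Stand-in for the kernel's memparse(): a number in any base strtoull()
 * accepts with base 0, optionally followed by K, M or G.
 */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse the value of memmap=nn[KMG]!ss[KMG] into a size and a start address. */
static bool parse_memmap_pram(const char *arg, uint64_t *size, uint64_t *start)
{
	char *p;

	assert(arg && size && start);
	*size = parse_size(arg, &p);
	if (*p != '!')
		return false;	/* '$', '#', '@' etc. select other handling */
	*start = parse_size(p + 1, &p);
	return *p == '\0';
}
```

For example, memmap=4G!12G describes a 4 GiB protected range starting at the 12 GiB physical address. Per the e820.c hunk below, such a range is then added as E820_PRAM (type 12).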
Limitations: this is a regular block device, and since the pmem areas
are not backed by struct page, they are invisible to the rest of the
system (other than the block IO device), so direct IO to/from pmem
areas, direct mmap() and XIP are not possible yet. The page cache will
also shadow and double-buffer pmem contents, etc.
Initial support is for x86.
Thanks,
Ingo
------------------>
Christoph Hellwig (1):
x86/mm: Add support for the non-standard protected e820 type
Ingo Molnar (1):
drivers/block/pmem: Fix 32-bit build warning in pmem_alloc()
Ross Zwisler (1):
drivers/block/pmem: Add a driver for persistent memory
Documentation/kernel-parameters.txt | 6 +
MAINTAINERS | 6 +
arch/x86/Kconfig | 10 ++
arch/x86/include/uapi/asm/e820.h | 10 ++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/e820.c | 26 +++-
arch/x86/kernel/pmem.c | 53 ++++++++
drivers/block/Kconfig | 11 ++
drivers/block/Makefile | 1 +
drivers/block/pmem.c | 262 ++++++++++++++++++++++++++++++++++++
10 files changed, 380 insertions(+), 6 deletions(-)
create mode 100644 arch/x86/kernel/pmem.c
create mode 100644 drivers/block/pmem.c
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a62a7b4..c87122dd790f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1965,6 +1965,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
or
memmap=0x10000$0x18690000
+ memmap=nn[KMG]!ss[KMG]
+ [KNL,X86] Mark specific memory as protected.
+ Region of memory to be used, from ss to ss+nn.
+ The memory region may be marked as e820 type 12 (0xc)
+ and is NVDIMM or ADR memory.
+
memory_corruption_check=0/1 [X86]
Some BIOSes seem to corrupt the first 64k of
memory when doing things like suspend/resume.
diff --git a/MAINTAINERS b/MAINTAINERS
index 1de6afa8ee51..4517613dc638 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8071,6 +8071,12 @@ S: Maintained
F: Documentation/blockdev/ramdisk.txt
F: drivers/block/brd.c
+PERSISTENT MEMORY DRIVER
+M: Ross Zwisler <ross.zwisler(a)linux.intel.com>
+L: linux-nvdimm(a)lists.01.org
+S: Supported
+F: drivers/block/pmem.c
+
RANDOM NUMBER DRIVER
M: "Theodore Ts'o" <tytso(a)mit.edu>
S: Maintained
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca55187..9e3bcd6f4a48 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1430,6 +1430,16 @@ config ILLEGAL_POINTER_VALUE
source "mm/Kconfig"
+config X86_PMEM_LEGACY
+ bool "Support non-standard NVDIMMs and ADR protected memory"
+ help
+ Treat memory marked using the non-standard e820 type of 12 as used
+ by the Intel Sandy Bridge-EP reference BIOS as protected memory.
+ The kernel will offer these regions to the 'pmem' driver so
+ they can be used for persistent storage.
+
+ Say Y if unsure.
+
config HIGHPTE
bool "Allocate 3rd-level pagetables from highmem"
depends on HIGHMEM
diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index d993e33f5236..960a8a9dc4ab 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -33,6 +33,16 @@
#define E820_NVS 4
#define E820_UNUSABLE 5
+/*
+ * This is a non-standardized way to represent ADR or NVDIMM regions that
+ * persist over a reboot. The kernel will ignore their special capabilities
+ * unless the CONFIG_X86_PMEM_LEGACY=y option is set.
+ *
+ * ( Note that older platforms also used 6 for the same type of memory,
+ * but newer versions switched to 12 as 6 was assigned differently. Some
+ * time they will learn... )
+ */
+#define E820_PRAM 12
/*
* reserved RAM used by kernel itself
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cdb1b70ddad0..971f18cd9ca0 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -94,6 +94,7 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
+obj-$(CONFIG_X86_PMEM_LEGACY) += pmem.o
obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 46201deee923..11cc7d54ec3f 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -149,6 +149,9 @@ static void __init e820_print_type(u32 type)
case E820_UNUSABLE:
printk(KERN_CONT "unusable");
break;
+ case E820_PRAM:
+ printk(KERN_CONT "persistent (type %u)", type);
+ break;
default:
printk(KERN_CONT "type %u", type);
break;
@@ -343,7 +346,7 @@ int __init sanitize_e820_map(struct e820entry *biosmap, int max_nr_map,
* continue building up new bios map based on this
* information
*/
- if (current_type != last_type) {
+ if (current_type != last_type || current_type == E820_PRAM) {
if (last_type != 0) {
new_bios[new_bios_entry].size =
change_point[chgidx]->addr - last_addr;
@@ -688,6 +691,7 @@ void __init e820_mark_nosave_regions(unsigned long limit_pfn)
register_nosave_region(pfn, PFN_UP(ei->addr));
pfn = PFN_DOWN(ei->addr + ei->size);
+
if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
register_nosave_region(PFN_UP(ei->addr), pfn);
@@ -748,7 +752,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
/*
* Find the highest page frame number we have available
*/
-static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
+static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
{
int i;
unsigned long last_pfn = 0;
@@ -759,7 +763,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
unsigned long start_pfn;
unsigned long end_pfn;
- if (ei->type != type)
+ /*
+ * Persistent memory is accounted as ram for purposes of
+ * establishing max_pfn and mem_map.
+ */
+ if (ei->type != E820_RAM && ei->type != E820_PRAM)
continue;
start_pfn = ei->addr >> PAGE_SHIFT;
@@ -784,12 +792,12 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
}
unsigned long __init e820_end_of_ram_pfn(void)
{
- return e820_end_pfn(MAX_ARCH_PFN, E820_RAM);
+ return e820_end_pfn(MAX_ARCH_PFN);
}
unsigned long __init e820_end_of_low_ram_pfn(void)
{
- return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
+ return e820_end_pfn(1UL << (32-PAGE_SHIFT));
}
static void early_panic(char *msg)
@@ -866,6 +874,9 @@ static int __init parse_memmap_one(char *p)
} else if (*p == '$') {
start_at = memparse(p+1, &p);
e820_add_region(start_at, mem_size, E820_RESERVED);
+ } else if (*p == '!') {
+ start_at = memparse(p+1, &p);
+ e820_add_region(start_at, mem_size, E820_PRAM);
} else
e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);
@@ -907,6 +918,7 @@ static inline const char *e820_type_to_string(int e820_type)
case E820_ACPI: return "ACPI Tables";
case E820_NVS: return "ACPI Non-volatile Storage";
case E820_UNUSABLE: return "Unusable memory";
+ case E820_PRAM: return "Persistent RAM";
default: return "reserved";
}
}
@@ -940,7 +952,9 @@ void __init e820_reserve_resources(void)
* pci device BAR resource and insert them later in
* pcibios_resource_survey()
*/
- if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+ if (((e820.map[i].type != E820_RESERVED) &&
+ (e820.map[i].type != E820_PRAM)) ||
+ res->start < (1ULL<<20)) {
res->flags |= IORESOURCE_BUSY;
insert_resource(&iomem_resource, res);
}
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
new file mode 100644
index 000000000000..3420c874ddc5
--- /dev/null
+++ b/arch/x86/kernel/pmem.c
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2015, Christoph Hellwig.
+ */
+#include <linux/memblock.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <asm/e820.h>
+#include <asm/page_types.h>
+#include <asm/setup.h>
+
+static __init void register_pmem_device(struct resource *res)
+{
+ struct platform_device *pdev;
+ int error;
+
+ pdev = platform_device_alloc("pmem", PLATFORM_DEVID_AUTO);
+ if (!pdev)
+ return;
+
+ error = platform_device_add_resources(pdev, res, 1);
+ if (error)
+ goto out_put_pdev;
+
+ error = platform_device_add(pdev);
+ if (error)
+ goto out_put_pdev;
+ return;
+
+out_put_pdev:
+ dev_warn(&pdev->dev, "failed to add 'pmem' (persistent memory) device!\n");
+ platform_device_put(pdev);
+}
+
+static __init int register_pmem_devices(void)
+{
+ int i;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+
+ if (ei->type == E820_PRAM) {
+ struct resource res = {
+ .flags = IORESOURCE_MEM,
+ .start = ei->addr,
+ .end = ei->addr + ei->size - 1,
+ };
+ register_pmem_device(&res);
+ }
+ }
+
+ return 0;
+}
+device_initcall(register_pmem_devices);
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 1b8094d4d7af..eb1fed5bd516 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -404,6 +404,17 @@ config BLK_DEV_RAM_DAX
and will prevent RAM block device backing store memory from being
allocated from highmem (only a problem for highmem systems).
+config BLK_DEV_PMEM
+ tristate "Persistent memory block device support"
+ help
+ Saying Y here will allow you to use a contiguous range of reserved
+ memory as one or more persistent block devices.
+
+ To compile this driver as a module, choose M here: the module will be
+ called 'pmem'.
+
+ If unsure, say N.
+
config CDROM_PKTCDVD
tristate "Packet writing on CD/DVD media"
depends on !UML
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 02b688d1438d..9cc6c18a1c7e 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_PS3_VRAM) += ps3vram.o
obj-$(CONFIG_ATARI_FLOPPY) += ataflop.o
obj-$(CONFIG_AMIGA_Z2RAM) += z2ram.o
obj-$(CONFIG_BLK_DEV_RAM) += brd.o
+obj-$(CONFIG_BLK_DEV_PMEM) += pmem.o
obj-$(CONFIG_BLK_DEV_LOOP) += loop.o
obj-$(CONFIG_BLK_CPQ_DA) += cpqarray.o
obj-$(CONFIG_BLK_CPQ_CISS_DA) += cciss.o
diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
new file mode 100644
index 000000000000..eabf4a8d0085
--- /dev/null
+++ b/drivers/block/pmem.c
@@ -0,0 +1,262 @@
+/*
+ * Persistent Memory Driver
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Copyright (c) 2015, Christoph Hellwig <hch(a)lst.de>.
+ * Copyright (c) 2015, Boaz Harrosh <boaz(a)plexistor.com>.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/cacheflush.h>
+#include <linux/blkdev.h>
+#include <linux/hdreg.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/slab.h>
+
+#define PMEM_MINORS 16
+
+struct pmem_device {
+ struct request_queue *pmem_queue;
+ struct gendisk *pmem_disk;
+
+ /* One contiguous memory region per device */
+ phys_addr_t phys_addr;
+ void *virt_addr;
+ size_t size;
+};
+
+static int pmem_major;
+static atomic_t pmem_index;
+
+static void pmem_do_bvec(struct pmem_device *pmem, struct page *page,
+ unsigned int len, unsigned int off, int rw,
+ sector_t sector)
+{
+ void *mem = kmap_atomic(page);
+ size_t pmem_off = sector << 9;
+
+ if (rw == READ) {
+ memcpy(mem + off, pmem->virt_addr + pmem_off, len);
+ flush_dcache_page(page);
+ } else {
+ flush_dcache_page(page);
+ memcpy(pmem->virt_addr + pmem_off, mem + off, len);
+ }
+
+ kunmap_atomic(mem);
+}
+
+static void pmem_make_request(struct request_queue *q, struct bio *bio)
+{
+ struct block_device *bdev = bio->bi_bdev;
+ struct pmem_device *pmem = bdev->bd_disk->private_data;
+ int rw;
+ struct bio_vec bvec;
+ sector_t sector;
+ struct bvec_iter iter;
+ int err = 0;
+
+ if (bio_end_sector(bio) > get_capacity(bdev->bd_disk)) {
+ err = -EIO;
+ goto out;
+ }
+
+ BUG_ON(bio->bi_rw & REQ_DISCARD);
+
+ rw = bio_data_dir(bio);
+ sector = bio->bi_iter.bi_sector;
+ bio_for_each_segment(bvec, bio, iter) {
+ pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len, bvec.bv_offset,
+ rw, sector);
+ sector += bvec.bv_len >> 9;
+ }
+
+out:
+ bio_endio(bio, err);
+}
+
+static int pmem_rw_page(struct block_device *bdev, sector_t sector,
+ struct page *page, int rw)
+{
+ struct pmem_device *pmem = bdev->bd_disk->private_data;
+
+ pmem_do_bvec(pmem, page, PAGE_CACHE_SIZE, 0, rw, sector);
+ page_endio(page, rw & WRITE, 0);
+
+ return 0;
+}
+
+static long pmem_direct_access(struct block_device *bdev, sector_t sector,
+ void **kaddr, unsigned long *pfn, long size)
+{
+ struct pmem_device *pmem = bdev->bd_disk->private_data;
+ size_t offset = sector << 9;
+
+ if (!pmem)
+ return -ENODEV;
+
+ *kaddr = pmem->virt_addr + offset;
+ *pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
+
+ return pmem->size - offset;
+}
+
+static const struct block_device_operations pmem_fops = {
+ .owner = THIS_MODULE,
+ .rw_page = pmem_rw_page,
+ .direct_access = pmem_direct_access,
+};
+
+static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
+{
+ struct pmem_device *pmem;
+ struct gendisk *disk;
+ int idx, err;
+
+ err = -ENOMEM;
+ pmem = kzalloc(sizeof(*pmem), GFP_KERNEL);
+ if (!pmem)
+ goto out;
+
+ pmem->phys_addr = res->start;
+ pmem->size = resource_size(res);
+
+ err = -EINVAL;
+ if (!request_mem_region(pmem->phys_addr, pmem->size, "pmem")) {
+ dev_warn(dev, "could not reserve region [0x%pa:0x%zx]\n", &pmem->phys_addr, pmem->size);
+ goto out_free_dev;
+ }
+
+ /*
+ * Map the memory as non-cacheable, as we can't write back the contents
+ * of the CPU caches in case of a crash.
+ */
+ err = -ENOMEM;
+ pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
+ if (!pmem->virt_addr)
+ goto out_release_region;
+
+ pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);
+ if (!pmem->pmem_queue)
+ goto out_unmap;
+
+ blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
+ blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
+ blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
+
+ disk = alloc_disk(PMEM_MINORS);
+ if (!disk)
+ goto out_free_queue;
+
+ idx = atomic_inc_return(&pmem_index) - 1;
+
+ disk->major = pmem_major;
+ disk->first_minor = PMEM_MINORS * idx;
+ disk->fops = &pmem_fops;
+ disk->private_data = pmem;
+ disk->queue = pmem->pmem_queue;
+ disk->flags = GENHD_FL_EXT_DEVT;
+ sprintf(disk->disk_name, "pmem%d", idx);
+ disk->driverfs_dev = dev;
+ set_capacity(disk, pmem->size >> 9);
+ pmem->pmem_disk = disk;
+
+ add_disk(disk);
+
+ return pmem;
+
+out_free_queue:
+ blk_cleanup_queue(pmem->pmem_queue);
+out_unmap:
+ iounmap(pmem->virt_addr);
+out_release_region:
+ release_mem_region(pmem->phys_addr, pmem->size);
+out_free_dev:
+ kfree(pmem);
+out:
+ return ERR_PTR(err);
+}
+
+static void pmem_free(struct pmem_device *pmem)
+{
+ del_gendisk(pmem->pmem_disk);
+ put_disk(pmem->pmem_disk);
+ blk_cleanup_queue(pmem->pmem_queue);
+ iounmap(pmem->virt_addr);
+ release_mem_region(pmem->phys_addr, pmem->size);
+ kfree(pmem);
+}
+
+static int pmem_probe(struct platform_device *pdev)
+{
+ struct pmem_device *pmem;
+ struct resource *res;
+
+ if (WARN_ON(pdev->num_resources > 1))
+ return -ENXIO;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res)
+ return -ENXIO;
+
+ pmem = pmem_alloc(&pdev->dev, res);
+ if (IS_ERR(pmem))
+ return PTR_ERR(pmem);
+
+ platform_set_drvdata(pdev, pmem);
+
+ return 0;
+}
+
+static int pmem_remove(struct platform_device *pdev)
+{
+ struct pmem_device *pmem = platform_get_drvdata(pdev);
+
+ pmem_free(pmem);
+ return 0;
+}
+
+static struct platform_driver pmem_driver = {
+ .probe = pmem_probe,
+ .remove = pmem_remove,
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = "pmem",
+ },
+};
+
+static int __init pmem_init(void)
+{
+ int error;
+
+ pmem_major = register_blkdev(0, "pmem");
+ if (pmem_major < 0)
+ return pmem_major;
+
+ error = platform_driver_register(&pmem_driver);
+ if (error)
+ unregister_blkdev(pmem_major, "pmem");
+ return error;
+}
+module_init(pmem_init);
+
+static void pmem_exit(void)
+{
+ platform_driver_unregister(&pmem_driver);
+ unregister_blkdev(pmem_major, "pmem");
+}
+module_exit(pmem_exit);
+
+MODULE_AUTHOR("Ross Zwisler <ross.zwisler(a)linux.intel.com>");
+MODULE_LICENSE("GPL v2");
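The driver's addressing reduces to a few shifts: pmem_make_request() rejects a bio whose bio_end_sector() lies beyond get_capacity() and then advances the sector by bv_len >> 9 per segment, while pmem_direct_access() turns a sector into a byte offset (sector << 9) and a page frame number ((phys_addr + offset) >> PAGE_SHIFT). The user-space sketch below models only that arithmetic. The names (struct pmem_model, model_*) are hypothetical, and it assumes 512-byte sectors and 4 KiB pages (a PAGE_SHIFT of 12).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9   /* 512-byte sectors, as in the driver's ">> 9" */
#define MODEL_PAGE_SHIFT 12    /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the struct pmem_device fields used here. */
struct pmem_model {
	uint64_t phys_addr;   /* start of the persistent memory range */
	uint64_t size;        /* length of the range in bytes */
};

/* Capacity in sectors, as passed to set_capacity() (size >> 9). */
static uint64_t model_capacity(const struct pmem_model *p)
{
	return p->size >> MODEL_SECTOR_SHIFT;
}

/*
 * Mirror of the bounds check in pmem_make_request(): the end sector is
 * start + (bytes >> 9) and must not exceed the capacity.
 */
static bool model_request_fits(const struct pmem_model *p, uint64_t sector,
			       uint64_t bytes)
{
	uint64_t end = sector + (bytes >> MODEL_SECTOR_SHIFT);

	return end <= model_capacity(p);
}

/* Advance 'sector' over each segment, like the bio_for_each_segment() loop. */
static uint64_t model_walk(uint64_t sector, const uint32_t *seg_len, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		sector += seg_len[i] >> MODEL_SECTOR_SHIFT;
	return sector;   /* first sector after the last segment */
}

/* Mirror of pmem_direct_access(): compute the pfn and the bytes remaining. */
static long model_direct_access(const struct pmem_model *p, uint64_t sector,
				uint64_t *pfn)
{
	uint64_t offset = sector << MODEL_SECTOR_SHIFT;

	assert(offset <= p->size);   /* the driver itself does not check this */
	*pfn = (p->phys_addr + offset) >> MODEL_PAGE_SHIFT;
	return (long)(p->size - offset);
}
```

For a 1 GiB range the capacity is 2,097,152 sectors, so an 8-sector (4 KiB) request starting at sector 2,097,144 is the last one that fits.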
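pmem_alloc() acquires its resources in a fixed order (the pmem_device allocation, the memory region, the mapping, the queue, the disk) and, on failure, jumps to the label that releases only what has already been acquired, in reverse order; pmem_free() performs the same teardown unconditionally. The sketch below reproduces that goto-unwind idiom in user space. The resources, model_acquire(), model_release() and the fail_at failure-injection knob are all hypothetical; this illustrates the pattern and is not driver code.

```c
#include <assert.h>

/* Hypothetical resources, acquired in the same order as in pmem_alloc(). */
enum model_res { RES_DEV, RES_REGION, RES_MAP, RES_QUEUE, RES_DISK, RES_COUNT };

static int held[RES_COUNT];   /* 1 while a resource is held */
static int fail_at = -1;      /* index of the step that should fail, or -1 */

static int model_acquire(enum model_res r)
{
	if ((int)r == fail_at)
		return -1;
	held[r] = 1;
	return 0;
}

static void model_release(enum model_res r)
{
	assert(r < RES_COUNT && held[r]);   /* only release what was acquired */
	held[r] = 0;
}

/* Returns 0 on success, or -1 with nothing left held. */
static int model_alloc(void)
{
	if (model_acquire(RES_DEV))
		goto out;
	if (model_acquire(RES_REGION))
		goto out_free_dev;
	if (model_acquire(RES_MAP))
		goto out_release_region;
	if (model_acquire(RES_QUEUE))
		goto out_unmap;
	if (model_acquire(RES_DISK))
		goto out_free_queue;
	return 0;

	/* Each label falls through to the next, unwinding in reverse order. */
out_free_queue:
	model_release(RES_QUEUE);
out_unmap:
	model_release(RES_MAP);
out_release_region:
	model_release(RES_REGION);
out_free_dev:
	model_release(RES_DEV);
out:
	return -1;
}

/* Count resources still held, to check that a failed alloc leaked nothing. */
static int model_held_count(void)
{
	int n = 0;

	for (int i = 0; i < RES_COUNT; i++)
		n += held[i];
	return n;
}
```

Because the labels fall through, adding a resource means adding one acquire step and one label, and the unwind order stays the mirror image of the acquire order.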
[PATCH v3 00/21] libnd: non-volatile memory device support
by Dan Williams
Changes since v2 [1]:
1/ Rebase on the ACPICA enabling for the NFIT data structures. The
ACPICA project owns the definition of ACPI data structures in
include/acpi/. This release incorporates the NFIT and UUID definitions
from ACPICA release R05_15_15 [2]. (Rafael, Bob)
2/ Move the ACPI NFIT driver to drivers/acpi/ (Rafael)
3/ Include documentation of the overall subsystem (Rafael)
4/ Arrange for stable block device names in the case where the platform
configuration has not changed (Toshi and Robert)
5/ Move test infrastructure to the end of the series (Jeff)
6/ Fix up the Kconfig text for CONFIG_ND_BLK to be more descriptive
(Andy)
7/ Report and continue upon detecting unknown NFIT tables rather than
failing (Jeff)
8/ Rename the namespace 'type' attribute to 'nstype' so that lsblk does
not mistake libnd block devices for SCSI disks. (Robert and Christoph)
9/ Convert nd_region_{acquire|release}_lane() to use percpu variable
infrastructure (Ross)
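Item 7 changes how the NFIT parser treats a subtable type it does not recognize: it now reports the table and continues instead of failing. A minimal sketch of that policy follows, assuming the ACPI 6.0 layout in which every NFIT subtable begins with a little-endian 16-bit type and 16-bit length. The walker, its counters, and the type ceiling of 6 (flush hint address) are illustrative assumptions, not the code in drivers/acpi/nfit.c. A zero or overrunning length is still treated as fatal, since the walk cannot locate the next subtable past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed highest subtable type known to this parser (ACPI 6.0: 0..6). */
#define MODEL_NFIT_TYPE_MAX 6

struct nfit_walk_result {
	unsigned known;     /* subtables with a recognized type */
	unsigned unknown;   /* subtables reported and skipped */
};

/* ACPI tables are little-endian; read a u16 without alignment assumptions. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk 'len' bytes of NFIT subtables. Unknown types are reported and
 * skipped; a malformed header or length aborts the walk with -1.
 */
static int nfit_walk(const uint8_t *buf, size_t len,
		     struct nfit_walk_result *res)
{
	size_t off = 0;

	assert(res != NULL && (buf != NULL || len == 0));
	res->known = res->unknown = 0;
	while (off < len) {
		if (len - off < 4)
			return -1;   /* truncated subtable header */

		uint16_t type = get_le16(buf + off);
		uint16_t sublen = get_le16(buf + off + 2);

		if (sublen < 4 || sublen > len - off)
			return -1;   /* cannot find the next subtable */
		if (type > MODEL_NFIT_TYPE_MAX) {
			fprintf(stderr, "nfit: skipping unknown table type %u\n",
				(unsigned)type);
			res->unknown++;
		} else {
			res->known++;
		}
		off += sublen;
	}
	return 0;
}
```

Skipping by the declared length is what makes "report and continue" safe: the parser never needs to understand a subtable's body to find the one after it.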
Thanks for all of the review!
Note that incremental changes to address caching, persistent flushing,
queue flags, and expanded sector size support are deferred until this
base support is cleared to merge.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000574.html
[2]: https://github.com/acpica/acpica/tree/R05_15_15
Here is the diffstat relative to v2:
Documentation/blockdev/libnd.txt | 804 ++++++++++++++++++++++++++++
MAINTAINERS | 39 +-
arch/ia64/kernel/efi.c | 2 +-
arch/x86/kernel/e820.c | 4 +-
drivers/acpi/Kconfig | 27 +
drivers/acpi/Makefile | 1 +
drivers/{block/nd/acpi.c => acpi/nfit.c} | 581 ++++++++++----------
drivers/acpi/nfit.h | 160 ++++++
drivers/block/Kconfig | 8 +
drivers/block/Makefile | 1 +
drivers/block/{nd/e820.c => e820_pmem.c} | 32 +-
drivers/block/nd/Kconfig | 72 +--
drivers/block/nd/Makefile | 12 -
drivers/block/nd/acpi_nfit.h | 321 -----------
drivers/block/nd/blk.c | 20 +-
drivers/block/nd/btt.c | 59 +-
drivers/block/nd/btt.h | 7 +-
drivers/block/nd/btt_devs.c | 2 +-
drivers/block/nd/bus.c | 2 +-
drivers/block/nd/core.c | 9 +-
drivers/block/nd/dimm_devs.c | 9 +
drivers/block/nd/label.c | 23 +-
drivers/block/nd/namespace_devs.c | 14 +-
drivers/block/nd/nd-private.h | 5 +-
drivers/block/nd/nd.h | 18 +-
drivers/block/nd/pmem.c | 25 +-
drivers/block/nd/region.c | 64 ++-
drivers/block/nd/region_devs.c | 54 +-
drivers/block/nd/test/nfit.c | 794 ++++++++++++++-------------
drivers/block/nd/test/nfit_test.h | 2 +
include/acpi/actbl1.h | 154 ++++++
include/acpi/acuuid.h | 89 +++
{drivers/block/nd => include/linux}/libnd.h | 21 +-
33 files changed, 2202 insertions(+), 1233 deletions(-)
create mode 100644 Documentation/blockdev/libnd.txt
rename drivers/{block/nd/acpi.c => acpi/nfit.c} (69%)
create mode 100644 drivers/acpi/nfit.h
rename drivers/block/{nd/e820.c => e820_pmem.c} (69%)
delete mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 include/acpi/acuuid.h
rename {drivers/block/nd => include/linux}/libnd.h (81%)
The libndctl changes for these updates are available in ndctl.git:
https://github.com/pmem/ndctl
For this set to move forward it needs acks from the ACPI and block layer
developers. I am assuming this will ultimately go upstream via the
block tree. A branch in nvdimm.git will be prepared at the end of the
week to give the pending acks time to land. Additional feedback is
welcome; hopefully it can be addressed incrementally from this
baseline, i.e. the aim is inclusion in -next with no more rebases
before the 4.2 merge window opens.
---
Dan Williams (18):
e820, efi: add ACPI 6.0 persistent memory types
libnd, nfit: initial libnd infrastructure and NFIT support
libnd: control character device and libnd bus sysfs attributes
libnd, nfit: dimm/memory-devices
libnd: control (ioctl) messages for libnd bus and dimm devices
libnd, nd_dimm: dimm driver and base libnd device-driver infrastructure
libnd, nfit: regions (block-data-window, persistent memory, volatile memory)
libnd: support for legacy (non-aliasing) nvdimms
libnd, nd_pmem: add libnd support to the pmem driver
libnd, nfit: add interleave-set state-tracking infrastructure
libnd: namespace indices: read and validate
libnd: pmem label sets and namespace instantiation.
libnd: blk labels and namespace instantiation
libnd: write pmem label set
libnd: write blk label set
libnd: infrastructure for btt devices
nfit-test: manufactured NFITs for interface development
libnd: Non-Volatile Devices
Ross Zwisler (2):
pmem: Dynamically allocate partition numbers
libnd, nfit, nd_blk: driver for BLK-mode access persistent memory
Vishal Verma (1):
nd_btt: atomic sector updates
Documentation/blockdev/btt.txt | 273 ++++++
Documentation/blockdev/libnd.txt | 804 +++++++++++++++++
MAINTAINERS | 39 +
arch/arm64/kernel/efi.c | 1
arch/ia64/kernel/efi.c | 4
arch/x86/boot/compressed/eboot.c | 4
arch/x86/include/uapi/asm/e820.h | 1
arch/x86/kernel/e820.c | 28 +
arch/x86/kernel/pmem.c | 2
arch/x86/platform/efi/efi.c | 3
drivers/acpi/Kconfig | 27 +
drivers/acpi/Makefile | 1
drivers/acpi/nfit.c | 1474 ++++++++++++++++++++++++++++++++
drivers/acpi/nfit.h | 160 +++
drivers/block/Kconfig | 21
drivers/block/Makefile | 3
drivers/block/e820_pmem.c | 100 ++
drivers/block/nd/Kconfig | 91 ++
drivers/block/nd/Makefile | 29 +
drivers/block/nd/blk.c | 252 +++++
drivers/block/nd/btt.c | 1438 +++++++++++++++++++++++++++++++
drivers/block/nd/btt.h | 186 ++++
drivers/block/nd/btt_devs.c | 443 ++++++++++
drivers/block/nd/bus.c | 770 +++++++++++++++++
drivers/block/nd/core.c | 472 ++++++++++
drivers/block/nd/dimm.c | 115 +++
drivers/block/nd/dimm_devs.c | 516 +++++++++++
drivers/block/nd/label.c | 922 ++++++++++++++++++++
drivers/block/nd/label.h | 143 +++
drivers/block/nd/namespace_devs.c | 1701 +++++++++++++++++++++++++++++++++++++
drivers/block/nd/nd-private.h | 111 ++
drivers/block/nd/nd.h | 257 ++++++
drivers/block/nd/pmem.c | 107 ++
drivers/block/nd/region.c | 189 ++++
drivers/block/nd/region_devs.c | 667 +++++++++++++++
drivers/block/nd/test/Makefile | 5
drivers/block/nd/test/iomap.c | 151 +++
drivers/block/nd/test/nfit.c | 1171 +++++++++++++++++++++++++
drivers/block/nd/test/nfit_test.h | 28 +
include/acpi/actbl1.h | 154 +++
include/acpi/acuuid.h | 89 ++
include/linux/efi.h | 3
include/linux/libnd.h | 129 +++
include/linux/nd.h | 98 ++
include/uapi/linux/Kbuild | 1
include/uapi/linux/ndctl.h | 199 ++++
46 files changed, 13324 insertions(+), 58 deletions(-)
create mode 100644 Documentation/blockdev/btt.txt
create mode 100644 Documentation/blockdev/libnd.txt
create mode 100644 drivers/acpi/nfit.c
create mode 100644 drivers/acpi/nfit.h
create mode 100644 drivers/block/e820_pmem.c
create mode 100644 drivers/block/nd/Kconfig
create mode 100644 drivers/block/nd/Makefile
create mode 100644 drivers/block/nd/blk.c
create mode 100644 drivers/block/nd/btt.c
create mode 100644 drivers/block/nd/btt.h
create mode 100644 drivers/block/nd/btt_devs.c
create mode 100644 drivers/block/nd/bus.c
create mode 100644 drivers/block/nd/core.c
create mode 100644 drivers/block/nd/dimm.c
create mode 100644 drivers/block/nd/dimm_devs.c
create mode 100644 drivers/block/nd/label.c
create mode 100644 drivers/block/nd/label.h
create mode 100644 drivers/block/nd/namespace_devs.c
create mode 100644 drivers/block/nd/nd-private.h
create mode 100644 drivers/block/nd/nd.h
rename drivers/block/{pmem.c => nd/pmem.c} (70%)
create mode 100644 drivers/block/nd/region.c
create mode 100644 drivers/block/nd/region_devs.c
create mode 100644 drivers/block/nd/test/Makefile
create mode 100644 drivers/block/nd/test/iomap.c
create mode 100644 drivers/block/nd/test/nfit.c
create mode 100644 drivers/block/nd/test/nfit_test.h
create mode 100644 include/acpi/acuuid.h
create mode 100644 include/linux/libnd.h
create mode 100644 include/linux/nd.h
create mode 100644 include/uapi/linux/ndctl.h