[PATCH 0/3] fs, bdev: handle end of life
by Dan Williams
As mentioned in [PATCH 1/3] "block, fs: reliably communicate bdev
end-of-life", historically we have relied on filesystem-specific
heuristics to guess when a block device is gone. Sometimes this works,
but in other cases the system can hang waiting for the fs to trigger its
shutdown protocol.
Now with DAX we need new actions, like unmapping all inodes, to be taken
upon a shutdown event. Those actions need to be taken whether the
shutdown event comes from the block device being torn down or from some
other filesystem-specific event.
For now, the approach taken in the following patches only affects xfs
and block drivers that are converted to use del_gendisk_queue(). We can
add more filesystems and driver support over time.
Note that 'bdi_gone' was chosen over 'shutdown' so as not to be confused
with generic_shutdown_super().
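Roughly, the intent is that device teardown drives the notification,
rather than the filesystem inferring end-of-life from failed I/O after
the fact. A minimal sketch of that flow, assuming a hypothetical
->bdi_gone() hook on super_operations invoked from the
del_gendisk_queue() path (the hook name comes from the note above; the
helper below and its use of get_active_super() are illustrative, not the
patch contents):

#include <linux/fs.h>
#include <linux/genhd.h>

/* Sketch only: tell a mounted filesystem that its bdev is going away. */
static void notify_bdi_gone(struct gendisk *disk)
{
	struct block_device *bdev = bdget_disk(disk, 0);
	struct super_block *sb;

	if (!bdev)
		return;

	sb = get_active_super(bdev);	/* pins the sb if a filesystem is mounted */
	if (sb) {
		if (sb->s_op->bdi_gone)	/* hypothetical hook named after 'bdi_gone' above */
			sb->s_op->bdi_gone(sb);	/* e.g. xfs forces a shutdown, DAX unmaps inodes */
		deactivate_super(sb);
	}
	bdput(bdev);
}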
---
Dan Williams (3):
block, fs: reliably communicate bdev end-of-life
xfs: handle shutdown notifications
writeback: fix false positive WARN in __mark_inode_dirty
block/genhd.c | 87 +++++++++++++++++++++++++++++++++++-------
drivers/block/brd.c | 3 -
drivers/nvdimm/pmem.c | 3 -
drivers/s390/block/dcssblk.c | 6 +--
fs/block_dev.c | 79 +++++++++++++++++++++++++++++++++-----
fs/xfs/xfs_super.c | 9 ++++
include/linux/fs.h | 4 ++
include/linux/genhd.h | 1
mm/backing-dev.c | 7 +++
9 files changed, 166 insertions(+), 33 deletions(-)
[PATCHV4 0/3] Machine check recovery when kernel accesses poison
by Tony Luck
Ingo: I think I have fixed up everything to make all the people who
commented happy. Do you have any further suggestions, or is this ready
to go into the tip tree?
This series is initially targeted at the folks doing filesystems
on top of NVDIMMs. They really want to be able to return -EIO
when there is a h/w error (just like spinning rust and SSDs do).
I plan to use the same infrastructure in parts 1&2 to write a
machine-check-aware "copy_from_user()" that will SIGBUS the
calling application when a syscall touches poison in user space
(just like we do when the application touches the poison itself).
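For illustration, a sketch of how a pmem/DAX read path might consume
such a helper. The prototype and the "bytes not copied" return
convention here are assumptions made for this sketch; as the V3->V4
notes below say, the real __mcsafe_copy() return value does not follow
memcpy() semantics:

#include <linux/errno.h>
#include <linux/types.h>

/* Assumed prototype for illustration; the real declaration lives in
 * asm/string_64.h and encodes its result differently. */
unsigned long __mcsafe_copy(void *dst, const void *src, size_t len);

/* Return -EIO if the copy consumed poison, just as a disk driver would
 * report a media error. */
static int pmem_copy_from_media(void *dst, void *pmem_addr, size_t len)
{
	unsigned long remain = __mcsafe_copy(dst, pmem_addr, len);

	return remain ? -EIO : 0;
}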
Changes V3->V4:
Andy: Simplify fixup_mcexception() by dropping used-once local variable
Andy: "Reviewed-by" tag added to part1
Boris: Moved new functions to memcpy_64.S and declaration to asm/string_64.h
Boris: Changed name s/mcsafe_memcpy/__mcsafe_copy/ to make it clear that this
is an internal function and that its return value doesn't follow memcpy() semantics.
Boris: "Reviewed-by" tag added to parts 1&2
Changes V2->V3:
Andy: Don't hack "regs->ax = BIT(63) | addr;" in the machine check
handler. Now have better fixup code that computes the number
of remaining bytes (just like page-fault fixup).
Andy: #define for BIT(63). Done, plus couple of extra macros using it.
Boris: Don't clutter up generic code (like mm/extable.c) with this.
I moved everything under arch/x86 (the asm-generic change is
a more generic #define).
Boris: Dependencies for CONFIG_MCE_KERNEL_RECOVERY are too generic.
I made it a real menu item with default "n". Dan Williams
will use "select MCE_KERNEL_RECOVERY" from his persistent
filesystem code.
Boris: Simplify conditionals in mce.c by moving tolerant/kill_it
checks earlier, with a skip to end if they aren't set.
Boris: Miscellaneous grammar/punctuation. Fixed.
Boris: Don't leak spurious __start_mcextable symbols into kernels
that didn't configure MCE_KERNEL_RECOVERY. Done.
Tony: New code doesn't belong in user_copy_64.S/uaccess*.h. Moved
to new .S/.h files
Elliott: Caching behavior non-optimal. Could use movntdqa or vmovntdqa
on source addresses. I didn't fix this yet. Think
of the current mcsafe_memcpy() as the first of several functions.
This one is useful for small copies (meta-data) where the overhead
of saving SSE/AVX state isn't justified.
Changes V1->V2:
0-day: Reported build errors and warnings on 32-bit systems. Fixed
0-day: Reported bloat to tinyconfig. Fixed
Boris: Suggestions to use extra macros to reduce code duplication in _ASM_*EXTABLE. Done
Boris: Re-write "tolerant==3" check to reduce indentation level. See below.
Andy: Check IP is valid before searching kernel exception tables. Done.
Andy: Explain use of BIT(63) on return value from mcsafe_memcpy(). Done (added decode macros).
Andy: Untangle mess of code in tail of do_machine_check() to make it
clear what is going on (e.g. that we only enter the ist_begin_non_atomic()
if we were called from user code, not from kernel!). Done.
Tony Luck (3):
x86, ras: Add new infrastructure for machine check fixup tables
x86, ras: Extend machine check recovery code to annotated ring0 areas
x86, ras: Add __mcsafe_copy() function to recover from machine checks
arch/x86/Kconfig | 10 +++
arch/x86/include/asm/asm.h | 10 ++-
arch/x86/include/asm/mce.h | 14 ++++
arch/x86/include/asm/string_64.h | 8 ++
arch/x86/kernel/cpu/mcheck/mce-severity.c | 21 ++++-
arch/x86/kernel/cpu/mcheck/mce.c | 86 +++++++++++--------
arch/x86/kernel/vmlinux.lds.S | 6 +-
arch/x86/kernel/x8664_ksyms_64.c | 4 +
arch/x86/lib/memcpy_64.S | 133 ++++++++++++++++++++++++++++++
arch/x86/mm/extable.c | 16 ++++
include/asm-generic/vmlinux.lds.h | 12 +--
11 files changed, 276 insertions(+), 44 deletions(-)
--
2.1.4
re: 40 PR8-9 contextual and dofollow
by improve.alexa.ranks@mg-dot.cn
Manual Backlinks with log in details. Have full control on your
backlinks.
- 40 PR 8-9 Backlinks from authority sites
- Permanent One-way Links
- Anchor Text + Ping
- All domains unique
- Completed in 5-7 business days
- Detailed Submission Report
http://www.mg-dot.cn/detail.php?id=122
Unsubscribe option is available on the footer of our website
[-mm PATCH v4 00/18] get_user_pages() for dax pte and pmd mappings
by Dan Williams
Changes since v3 [1]:
1/ Minimize the impact of the modifications to get_page() by moving
zone_device manipulations out of line and marking them unlikely(). In
v3 a simple function like:
get_page(page);
do_something_with_page(page);
put_page(page);
...had a text size of 672 bytes. That is now down to 289 bytes,
compared to the pre-patch baseline size of 267 bytes. Disassembly shows
that, aside from a conditional branch on the page zone number (data which
should already be dcache hot), there is no icache impact in the typical
path; see the sketch after this list. (Andrew, Dave Hansen)
2/ Minimize the impact to mm.h by moving ~200 lines of definitions to
pfn_t.h and memremap.h. (Andrew)
3/ Move struct vmem_altmap helper routines to the only C file that
consumes them. (Andrew)
4/ Clean up definitions of pfn_pte, pfn_pmd, pte_devmap, and pmd_devmap
to have proper dependencies on CONFIG_MMU and
CONFIG_TRANSPARENT_HUGEPAGE to avoid the need to touch arch headers
outside of x86.
5/ Skip registering 'memory block' sysfs devices for zone_device ranges
since they are not normal memory and are not eligible to be 'onlined'.
6/ Improve the diagnostic debug messages in fs/dax.c to include
buffer_head details. (Willy)
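Here is the sketch referenced in 1/: the common path pays only an
unlikely() branch, with the ZONE_DEVICE bookkeeping moved out of line.
The helper names are illustrative, not quoted from the series, and
ZONE_DEVICE is of course config-gated:

#include <linux/mm.h>

void get_zone_device_page_sketch(struct page *page);	/* out of line, in a .c file */

static inline void get_page_sketch(struct page *page)
{
	/*
	 * The zone number is read from page->flags, which the caller is
	 * about to touch anyway, so the check is effectively free in the
	 * common (non-ZONE_DEVICE) case.
	 */
	if (unlikely(page_zonenum(page) == ZONE_DEVICE))
		get_zone_device_page_sketch(page);	/* pins the hosting device's pagemap */

	atomic_inc(&page->_count);	/* the page refcount field of this era */
}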
These replace the following 18 patches:
kvm-rename-pfn_t-to-kvm_pfn_t.patch..dax-re-enable-dax-pmd-mappings.patch
...in the current -mm series; the other 7 patches from v3 are
unmodified. They have received a build success notification from the
kbuild robot over 108 configs.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-December/003370.html
---
Original summary:
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to keep the device hosting the page range pinned and
active. In turn, get_page() and put_page() are modified to take
references against the device driver's established page mapping.
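Conceptually, the get_user_pages() pte walk grows a branch along these
lines. This is a sketch pieced together from the description above and
the patch titles ({get|put}_dev_pagemap(), pte_devmap()); the real gup
code differs in detail and error handling:

#include <linux/mm.h>
#include <linux/memremap.h>

/* Sketch: pin one pte-mapped page, keeping its hosting device alive. */
static int gup_one_pte_sketch(pte_t pte, struct page **pages, int *nr)
{
	struct dev_pagemap *pgmap = NULL;
	struct page *page = pte_page(pte);

	if (pte_devmap(pte)) {
		/* Take a transient reference on the driver's page mapping. */
		pgmap = get_dev_pagemap(pte_pfn(pte), NULL);
		if (!pgmap)
			return 0;	/* device is being torn down, bail to the slow path */
	}

	get_page(page);	/* for ZONE_DEVICE pages this also references the pagemap */
	pages[(*nr)++] = page;

	if (pgmap)
		put_dev_pagemap(pgmap);	/* get_page() now holds the lasting reference */

	return 1;
}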
Finally, this need for "struct page" entries for persistent memory
requires memory capacity to store the memmap array. Given that the
memmap array for a large pool of persistent memory may exhaust available
DRAM, introduce a mechanism to allocate the memmap from persistent
memory itself. The new "struct vmem_altmap *" parameter to
devm_memremap_pages() enables arch_add_memory() to use reserved pmem
capacity rather than the page allocator.
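On the allocation side, a pmem driver ends up doing something shaped
like the sketch below. The devm_memremap_pages() parameter list and the
vmem_altmap fields shown are inferred from this description, so treat
them as assumptions:

#include <linux/device.h>
#include <linux/ioport.h>
#include <linux/memremap.h>
#include <linux/percpu-refcount.h>
#include <linux/mm.h>

/* Sketch: place the memmap (struct page array) for a pmem range in a
 * reserved slice of the pmem itself rather than in DRAM. */
static void *pmem_map_pages_sketch(struct device *dev, struct resource *res,
				   struct percpu_ref *ref)
{
	struct vmem_altmap altmap = {
		.base_pfn = res->start >> PAGE_SHIFT,
		.reserve = 2,	/* illustrative: pages held back for driver metadata */
	};

	/* arch_add_memory() draws vmemmap pages from 'altmap', i.e. from
	 * the reserved pmem capacity, instead of the page allocator. */
	return devm_memremap_pages(dev, res, ref, &altmap);
}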
---
Dan Williams (18):
kvm: rename pfn_t to kvm_pfn_t
mm, dax, pmem: introduce pfn_t
mm: skip memory block registration for ZONE_DEVICE
mm: introduce find_dev_pagemap()
x86, mm: introduce vmem_altmap to augment vmemmap_populate()
libnvdimm, pfn, pmem: allocate memmap array in persistent memory
avr32: convert to asm-generic/memory_model.h
hugetlb: fix compile error on tile
frv: fix compiler warning from definition of __pmd()
x86, mm: introduce _PAGE_DEVMAP
mm, dax, gpu: convert vm_insert_mixed to pfn_t
mm, dax: convert vmf_insert_pfn_pmd() to pfn_t
libnvdimm, pmem: move request_queue allocation earlier in probe
mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup
mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd
mm, x86: get_user_pages() for dax mappings
dax: provide diagnostics for pmd mapping failures
dax: re-enable dax pmd mappings
arch/arm/include/asm/kvm_mmu.h | 5 -
arch/arm/kvm/mmu.c | 10 +
arch/arm64/include/asm/kvm_mmu.h | 3
arch/avr32/include/asm/page.h | 8 +
arch/frv/include/asm/page.h | 2
arch/ia64/include/asm/page.h | 1
arch/mips/include/asm/kvm_host.h | 6 -
arch/mips/kvm/emulate.c | 2
arch/mips/kvm/tlb.c | 14 +-
arch/powerpc/include/asm/kvm_book3s.h | 4 -
arch/powerpc/include/asm/kvm_ppc.h | 2
arch/powerpc/kvm/book3s.c | 6 -
arch/powerpc/kvm/book3s_32_mmu_host.c | 2
arch/powerpc/kvm/book3s_64_mmu_host.c | 2
arch/powerpc/kvm/e500.h | 2
arch/powerpc/kvm/e500_mmu_host.c | 8 +
arch/powerpc/kvm/trace_pr.h | 2
arch/powerpc/sysdev/axonram.c | 9 +
arch/x86/include/asm/pgtable.h | 26 +++-
arch/x86/include/asm/pgtable_types.h | 7 +
arch/x86/kvm/iommu.c | 11 +-
arch/x86/kvm/mmu.c | 37 +++--
arch/x86/kvm/mmu_audit.c | 2
arch/x86/kvm/paging_tmpl.h | 6 -
arch/x86/kvm/vmx.c | 2
arch/x86/kvm/x86.c | 2
arch/x86/mm/gup.c | 57 +++++++-
arch/x86/mm/init_64.c | 33 ++++-
arch/x86/mm/pat.c | 5 -
drivers/base/memory.c | 13 ++
drivers/block/brd.c | 7 +
drivers/gpu/drm/exynos/exynos_drm_gem.c | 4 -
drivers/gpu/drm/gma500/framebuffer.c | 4 -
drivers/gpu/drm/msm/msm_gem.c | 4 -
drivers/gpu/drm/omapdrm/omap_gem.c | 7 +
drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 -
drivers/nvdimm/pfn_devs.c | 3
drivers/nvdimm/pmem.c | 73 +++++++---
drivers/s390/block/dcssblk.c | 11 +-
fs/Kconfig | 3
fs/dax.c | 76 ++++++++--
include/asm-generic/pgtable.h | 6 +
include/linux/blkdev.h | 5 -
include/linux/huge_mm.h | 15 ++
include/linux/hugetlb.h | 1
include/linux/io.h | 15 --
include/linux/kvm_host.h | 37 +++--
include/linux/kvm_types.h | 2
include/linux/list.h | 12 ++
include/linux/memory_hotplug.h | 3
include/linux/memremap.h | 114 ++++++++++++++++
include/linux/mm.h | 72 ++++++++--
include/linux/mm_types.h | 5 +
include/linux/pfn.h | 9 +
include/linux/pfn_t.h | 102 ++++++++++++++
kernel/memremap.c | 227 ++++++++++++++++++++++++++++++-
lib/list_debug.c | 9 +
mm/gup.c | 19 ++-
mm/huge_memory.c | 119 ++++++++++++----
mm/memory.c | 26 ++--
mm/memory_hotplug.c | 67 +++++++--
mm/mprotect.c | 5 -
mm/page_alloc.c | 11 +-
mm/pgtable-generic.c | 2
mm/sparse-vmemmap.c | 76 ++++++++++
mm/sparse.c | 8 +
mm/swap.c | 3
virt/kvm/kvm_main.c | 47 +++---
68 files changed, 1204 insertions(+), 298 deletions(-)
create mode 100644 include/linux/memremap.h
create mode 100644 include/linux/pfn_t.h
You have new fax, document 000759174
by Interfax
You have received a new fax.
Please, download fax document attached to this email.
Author: Lewis Knowles
File name: scan000759174.doc
File size: 204 Kb
Processed in: 36 seconds
Scan quality: 500 DPI
Pages: 12
Date: Mon, 28 Dec 2015 11:30:10 +0300
Thank you for using Interfax!
RE: Doubts of the NVDIMM implement
by Wu, Bob
Thanks, Williams.
And I still want to get more information about the 3D-XPoint technology:
Does an Intel NVDIMM based on 3D-XPoint have firmware inside to handle the
wear-leveling issue, or does user-space software need to take care of it?
Can anybody share some information and documents with me? Thanks!
Thanks,
Bob
-----Original Message-----
From: Dan Williams [mailto:dan.j.williams@intel.com]
Sent: December 25, 2015 3:54
To: Wu, Bob
Subject: Re: Doubts of the NVDIMM implement
The Linux implementation is generic and solely based on the public NFIT definition in ACPI 6.0. It supports NVDIMMs from any vendor that implements the NFIT table. Wear-leveling is out of scope as far as the ACPI specification is concerned.
http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
On Wed, Dec 23, 2015 at 9:09 PM, Wu, Bob <Bob.Wu(a)emc.com> wrote:
> Hi Williams
>
> Sorry to interrupt you, but can I ask you some questions about NVDIMM?
>
> I'm a software engineer at EMC, and my team has a plan to replace NVRAM
> with NVDIMM next year, so now I'm digging into the NVDIMM stuff.
>
> My first question is: does NVDIMM need a wear-leveling guarantee in software?
>
> Although the durability is much better than NAND (about 1000 times better),
> if software keeps writing to the same address 300000 times, the NVDIMM
> will also develop bad "blocks".
>
> And if wear-leveling is needed, which layer should implement it? The
> product firmware, the driver, or the host software?
>
> Thanks,
> Bob