[PATCH v2 0/5] ext4: DAX data corruption fixes
by Ross Zwisler
This series prevents a pair of data corruptions with ext4 + DAX. The first
such corruption happens when combining the inline data feature with DAX,
and the second happens when combining data journaling with DAX.
Both can be reliably reproduced with the fstests that I have posted here:
https://patchwork.kernel.org/patch/9948377/
https://patchwork.kernel.org/patch/9948381/
My opinion is that the first three patches in this series should be applied
to the v4.14 RC series and backported to stable. The last two patches in
this series are just cleanup and can probably wait until v4.15.
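Taken together, the patch titles in this series suggest one rule: an inode should not use DAX when it has inline data, journals its data, or is encrypted. The sketch below is a userspace model of such a check, not the ext4 code from the series. The struct, its field names, and the function name model_should_use_dax() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-inode state; not the ext4 data structures. */
struct model_inode {
	bool mounted_with_dax; /* filesystem was mounted with the dax option */
	bool has_inline_data;  /* file data lives inside the inode itself */
	bool journals_data;    /* file data goes through the journal */
	bool is_encrypted;     /* file contents are encrypted */
};

/* Returns true only when DAX is requested and no conflicting feature is in use. */
bool model_should_use_dax(const struct model_inode *inode)
{
	assert(inode != NULL);

	if (!inode->mounted_with_dax)
		return false;
	if (inode->has_inline_data)
		return false;
	if (inode->journals_data)
		return false;
	if (inode->is_encrypted)
		return false;
	return true;
}
```

Keeping every exclusion in one predicate means each call site that decides on DAX asks the same question, which is presumably the point of the cleanup patch that adds ext4_should_use_dax().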
Ross Zwisler (5):
ext4: prevent data corruption with inline data + DAX
ext4: prevent data corruption with journaling + DAX
ext4: add sanity check for encryption + DAX
ext4: add ext4_should_use_dax()
ext4: remove duplicate extended attributes defs
fs/ext4/ext4.h | 37 -------------------------------------
fs/ext4/inline.c | 10 ----------
fs/ext4/inode.c | 24 ++++++++++++++++--------
fs/ext4/ioctl.c | 16 +++++++++++++---
fs/ext4/super.c | 8 ++++++++
5 files changed, 37 insertions(+), 58 deletions(-)
--
2.9.5
[RFC] KVM "fake DAX" device flushing
by Pankaj Gupta
We are sharing a prototype of the 'fake DAX' flushing
interface for initial feedback. It is still a work in progress
and not yet ready for merging.
The prototype currently implements only basic functionality, without
advanced features, and has two major parts:
- Qemu virtio-pmem device
It exposes a persistent memory range to the KVM guest. On the host side this
range is file-backed memory that works as a persistent memory device. In
addition, it provides a virtio flushing interface that lets the KVM guest
request a Qemu-side sync for the guest DAX persistent memory range.
- Guest virtio-pmem driver
Reads the persistent memory range from the paravirt device and reserves it in
the system memory map. It also allocates a block device corresponding to the
pmem range, which is accessed by DAX-capable file systems (file system support
is still pending).
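On the host, a "Qemu side sync" of file-backed memory comes down to flushing a shared file mapping to its backing file. The sketch below shows that idea with plain POSIX calls (mmap, msync). It is not the prototype's code: the helper names flush_backing_range() and demo_flush_roundtrip() are assumptions made for illustration, and the scratch file stands in for the host backing file.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Flush [offset, offset + len) of a shared file mapping to its backing file.
 * msync() needs a page-aligned start address, so the range is widened
 * downwards to the enclosing page boundary. `base` must be the address
 * returned by mmap(). Returns 0 on success, -1 on error (errno from msync).
 */
int flush_backing_range(void *base, size_t offset, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(base != NULL && page > 0);
	size_t slack = offset % (size_t)page;
	return msync((char *)base + (offset - slack), len + slack, MS_SYNC);
}

/* Map a scratch file, store through the mapping, flush, read back via the fd. */
int demo_flush_roundtrip(void)
{
	const char msg[] = "persist";
	char buf[sizeof(msg)] = {0};
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 2 * (size_t)page;
	size_t off = (size_t)page + 100; /* deliberately not page aligned */
	int rc = -1;

	FILE *scratch = tmpfile();
	if (scratch == NULL)
		return -1;
	int fd = fileno(scratch);
	if (ftruncate(fd, (off_t)len) != 0)
		goto out_close;
	char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	memcpy(map + off, msg, sizeof(msg));
	if (flush_backing_range(map, off, sizeof(msg)) == 0 &&
	    pread(fd, buf, sizeof(msg), (off_t)off) == (ssize_t)sizeof(msg) &&
	    memcmp(buf, msg, sizeof(msg)) == 0)
		rc = 0;

	munmap(map, len);
out_close:
	fclose(scratch);
	return rc;
}
```

In the design described above, the guest cannot issue this flush itself because the mapping belongs to the host process, which is why the request has to travel over the virtio interface.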
We shared the project idea for the 'fake DAX' flushing interface here [1].
Based on the suggestions here [2], we implemented the guest 'virtio-pmem'
driver and the Qemu paravirt device.
[1] https://www.spinics.net/lists/kvm/msg149761.html
[2] https://www.spinics.net/lists/kvm/msg153095.html
Work yet to be done:
- Separate out the common code used by ACPI pmem interface and
reuse it.
- Get memmap allocation working in the pmem device. There is some parallel work
going on upstream related to 'memory_hotplug restructuring' [3], and we are also
hitting a memory section alignment issue [4].
[3] https://lwn.net/Articles/712099/
[4] https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg02978.html
- Provide DAX-capable file system (ext4 & XFS) support.
- Qemu device flush functionality.
- Qemu live migration work when host page cache is used.
- Multiple virtio-pmem disks support.
Prototype implementation for feedback:
Kernel: https://github.com/pagupta/linux/commit/d15cf90074eae91aeed7a228da3faf319...
Qemu : https://github.com/pagupta/qemu/commit/9c428db1e1076970e097e2b0ef8afe5250...
Please provide feedback. Also, I will be attending KVM Forum in Prague (25-27 Oct).
If you are attending KVM Forum or the Linux conference, I would love to discuss ideas
and future work.
Thank you,
Pankaj Gupta
nfit test deadlock
by Ross Zwisler
Hey Dan,
I was getting the ndctl unit tests working again in my setup today, and on the
first run of ndctl's "make check" I hit a deadlock. This seems to be very easy
to reproduce: all you have to do is specify a number of jobs to make that is
larger than 1 (which I was accidentally doing via an alias),
e.g. "make -j32 check"
This seems to reproduce 100% of the time.
I'll append the output of "echo w > /proc/sysrq-trigger" to the end of this
mail.
I was using v4.13 and ndctl 58.2.
- Ross
---
[ 132.668043] sysrq: SysRq : Show Blocked State
[ 132.668968] task PC stack pid father
[ 132.670774] lt-libndctl D 0 5991 5983 0x00000004
[ 132.672102] Call Trace:
[ 132.672744] __schedule+0x411/0xb10
[ 132.673266] ? trace_hardirqs_on+0xd/0x10
[ 132.674058] schedule+0x40/0x90
[ 132.674545] __kernfs_remove+0x1f9/0x310
[ 132.675298] ? remove_wait_queue+0x70/0x70
[ 132.676046] kernfs_remove_by_name_ns+0x45/0x90
[ 132.676848] remove_files.isra.1+0x35/0x70
[ 132.677451] sysfs_remove_group+0x44/0x90
[ 132.678259] sysfs_remove_groups+0x2e/0x50
[ 132.679047] device_remove_attrs+0x4d/0x80
[ 132.679438] device_del+0x1ec/0x330
[ 132.679888] device_unregister+0x1a/0x60
[ 132.680266] nvdimm_bus_unregister+0x17/0x20 [libnvdimm]
[ 132.680876] acpi_nfit_unregister+0x15/0x20 [nfit]
[ 132.681329] devm_action_release+0xf/0x20
[ 132.681835] release_nodes+0x16d/0x2b0
[ 132.682196] devres_release_all+0x3c/0x50
[ 132.682573] device_release_driver_internal+0x175/0x220
[ 132.683231] device_release_driver+0x12/0x20
[ 132.683715] bus_remove_device+0x100/0x180
[ 132.684102] device_del+0x1f4/0x330
[ 132.684428] platform_device_del+0x28/0x90
[ 132.684967] platform_device_unregister+0x12/0x30
[ 132.685412] nfit_test_exit+0x17/0x92f [nfit_test]
[ 132.685980] SyS_delete_module+0x1d8/0x230
[ 132.686369] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.686915] RIP: 0033:0x7f841012b317
[ 132.687255] RSP: 002b:00007fffe5ce0898 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 132.688070] RAX: ffffffffffffffda RBX: 00007f84103e4500 RCX: 00007f841012b317
[ 132.688850] RDX: 00007f84103e5730 RSI: 0000000000000800 RDI: 000000000258ac98
[ 132.689501] RBP: 00007fffe5ce05b0 R08: 00007f8410e19c80 R09: 0000000000000017
[ 132.690257] R10: 000000000000006d R11: 0000000000000206 R12: 0000000000000038
[ 132.690988] R13: 0000000000000001 R14: 0000000000000000 R15: 00000000fbad2887
[ 132.691735] lt-dsm-fail D 0 5995 5986 0x00000004
[ 132.692246] Call Trace:
[ 132.692481] __schedule+0x411/0xb10
[ 132.692972] schedule+0x40/0x90
[ 132.693288] schedule_preempt_disabled+0x18/0x30
[ 132.694083] __mutex_lock+0x487/0xa20
[ 132.694720] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.695452] mutex_lock_nested+0x1b/0x20
[ 132.696245] ? mutex_lock_nested+0x1b/0x20
[ 132.696947] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.697750] ? kernfs_seq_start+0x2f/0x90
[ 132.698302] ? __mutex_lock+0x228/0xa20
[ 132.699077] ? lock_acquire+0xea/0x1f0
[ 132.699698] ? kernfs_seq_start+0x37/0x90
[ 132.700083] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.700529] dev_attr_show+0x20/0x50
[ 132.701022] ? sysfs_file_ops+0x46/0x60
[ 132.701392] sysfs_kf_seq_show+0xb2/0x110
[ 132.701910] kernfs_seq_show+0x27/0x30
[ 132.702271] seq_read+0x103/0x3d0
[ 132.702709] kernfs_fop_read+0x11e/0x190
[ 132.703082] __vfs_read+0x37/0x160
[ 132.703399] ? security_file_permission+0x9e/0xc0
[ 132.704000] vfs_read+0xab/0x150
[ 132.704312] SyS_read+0x58/0xc0
[ 132.704737] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.705295] RIP: 0033:0x7fc0be0d4a80
[ 132.705964] RSP: 002b:00007fff3b5cfd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.707094] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc0be0d4a80
[ 132.708154] RDX: 0000000000000400 RSI: 00007fff3b5cfd80 RDI: 0000000000000004
[ 132.709206] RBP: 00007fff3b5d02a0 R08: 0000000001a3ec00 R09: 0000000000000035
[ 132.709968] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000401620
[ 132.710707] R13: 00007fff3b5d0cd0 R14: 0000000000000000 R15: 0000000000000000
[ 132.711369] lt-parent-uuid D 0 5998 5989 0x00000004
[ 132.711984] Call Trace:
[ 132.712229] __schedule+0x411/0xb10
[ 132.712565] schedule+0x40/0x90
[ 132.713004] schedule_preempt_disabled+0x18/0x30
[ 132.713443] __mutex_lock+0x487/0xa20
[ 132.713891] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.714378] mutex_lock_nested+0x1b/0x20
[ 132.714853] ? mutex_lock_nested+0x1b/0x20
[ 132.715239] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.715818] ? kernfs_seq_start+0x2f/0x90
[ 132.716205] ? __mutex_lock+0x228/0xa20
[ 132.716674] ? lock_acquire+0xea/0x1f0
[ 132.717035] ? kernfs_seq_start+0x37/0x90
[ 132.717412] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.718006] dev_attr_show+0x20/0x50
[ 132.718344] ? sysfs_file_ops+0x46/0x60
[ 132.718818] sysfs_kf_seq_show+0xb2/0x110
[ 132.719204] kernfs_seq_show+0x27/0x30
[ 132.719557] seq_read+0x103/0x3d0
[ 132.720011] kernfs_fop_read+0x11e/0x190
[ 132.720386] __vfs_read+0x37/0x160
[ 132.720826] ? security_file_permission+0x9e/0xc0
[ 132.721267] vfs_read+0xab/0x150
[ 132.721571] SyS_read+0x58/0xc0
[ 132.722072] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.722511] RIP: 0033:0x7f5906882a80
[ 132.722967] RSP: 002b:00007ffc205e7108 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.723749] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5906882a80
[ 132.724410] RDX: 0000000000000400 RSI: 00007ffc205e7180 RDI: 0000000000000004
[ 132.725174] RBP: 00007ffc205e7160 R08: 0000000000808350 R09: 00007f5906f7b88e
[ 132.725909] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000401ac0
[ 132.726899] R13: 00007ffc205e78b0 R14: 0000000000000000 R15: 0000000000000000
[ 132.727997] lt-multi-pmem D 0 6042 6009 0x00000004
[ 132.728930] Call Trace:
[ 132.729307] __schedule+0x411/0xb10
[ 132.729760] schedule+0x40/0x90
[ 132.730067] schedule_preempt_disabled+0x18/0x30
[ 132.730500] __mutex_lock+0x487/0xa20
[ 132.730990] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.731476] mutex_lock_nested+0x1b/0x20
[ 132.731946] ? mutex_lock_nested+0x1b/0x20
[ 132.732331] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.732892] ? kernfs_seq_start+0x2f/0x90
[ 132.733266] ? __mutex_lock+0x228/0xa20
[ 132.733730] ? lock_acquire+0xea/0x1f0
[ 132.734084] ? kernfs_seq_start+0x37/0x90
[ 132.734455] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.735042] dev_attr_show+0x20/0x50
[ 132.735379] ? sysfs_file_ops+0x46/0x60
[ 132.735853] sysfs_kf_seq_show+0xb2/0x110
[ 132.736233] kernfs_seq_show+0x27/0x30
[ 132.736685] seq_read+0x103/0x3d0
[ 132.737009] kernfs_fop_read+0x11e/0x190
[ 132.737375] __vfs_read+0x37/0x160
[ 132.737848] ? security_file_permission+0x9e/0xc0
[ 132.738317] vfs_read+0xab/0x150
[ 132.738935] SyS_read+0x58/0xc0
[ 132.739375] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.740161] RIP: 0033:0x7f8d9d7f9a80
[ 132.740770] RSP: 002b:00007ffedd96f848 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.741829] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8d9d7f9a80
[ 132.742479] RDX: 0000000000000400 RSI: 00007ffedd96f8c0 RDI: 0000000000000004
[ 132.743252] RBP: 00007ffedd96fde0 R08: 0000000000c24870 R09: 0000000000000035
[ 132.743985] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000404b70
[ 132.744719] R13: 00007ffedd970150 R14: 0000000000000000 R15: 0000000000000000
[ 132.745387] lt-pmem-ns D 0 6108 6082 0x00000004
[ 132.746010] Call Trace:
[ 132.746253] __schedule+0x411/0xb10
[ 132.746685] schedule+0x40/0x90
[ 132.746987] schedule_preempt_disabled+0x18/0x30
[ 132.747412] __mutex_lock+0x487/0xa20
[ 132.747902] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.748389] mutex_lock_nested+0x1b/0x20
[ 132.748869] ? mutex_lock_nested+0x1b/0x20
[ 132.749255] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.749824] ? kernfs_seq_start+0x2f/0x90
[ 132.750202] ? __mutex_lock+0x228/0xa20
[ 132.750562] ? lock_acquire+0xea/0x1f0
[ 132.751051] ? kernfs_seq_start+0x37/0x90
[ 132.751433] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.751978] dev_attr_show+0x20/0x50
[ 132.752314] ? sysfs_file_ops+0x46/0x60
[ 132.752785] sysfs_kf_seq_show+0xb2/0x110
[ 132.753164] kernfs_seq_show+0x27/0x30
[ 132.753517] seq_read+0x103/0x3d0
[ 132.753974] kernfs_fop_read+0x11e/0x190
[ 132.754348] __vfs_read+0x37/0x160
[ 132.754781] ? security_file_permission+0x9e/0xc0
[ 132.755272] vfs_read+0xab/0x150
[ 132.755735] SyS_read+0x58/0xc0
[ 132.756184] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.756981] RIP: 0033:0x7fda3d852a80
[ 132.757471] RSP: 002b:00007ffefc56b388 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.758724] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fda3d852a80
[ 132.759657] RDX: 0000000000000400 RSI: 00007ffefc56b400 RDI: 0000000000000004
[ 132.760318] RBP: 00007ffefc56b840 R08: 00000000024c9d90 R09: 0000000000000000
[ 132.761086] R10: 0000000000000055 R11: 0000000000000246 R12: 0000000000401970
[ 132.761817] R13: 00007ffefc56ba90 R14: 0000000000000000 R15: 0000000000000000
[ 132.762475] lt-blk-ns D 0 6235 6203 0x00000004
[ 132.763085] Call Trace:
[ 132.763327] __schedule+0x411/0xb10
[ 132.763769] schedule+0x40/0x90
[ 132.764070] schedule_preempt_disabled+0x18/0x30
[ 132.764495] __mutex_lock+0x487/0xa20
[ 132.764979] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.765468] mutex_lock_nested+0x1b/0x20
[ 132.765936] ? mutex_lock_nested+0x1b/0x20
[ 132.766325] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.766908] ? kernfs_seq_start+0x2f/0x90
[ 132.767284] ? __mutex_lock+0x228/0xa20
[ 132.767753] ? lock_acquire+0xea/0x1f0
[ 132.768109] ? kernfs_seq_start+0x37/0x90
[ 132.768479] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.769055] dev_attr_show+0x20/0x50
[ 132.769391] ? sysfs_file_ops+0x46/0x60
[ 132.769856] sysfs_kf_seq_show+0xb2/0x110
[ 132.770230] kernfs_seq_show+0x27/0x30
[ 132.770682] seq_read+0x103/0x3d0
[ 132.771003] kernfs_fop_read+0x11e/0x190
[ 132.771365] __vfs_read+0x37/0x160
[ 132.771962] ? security_file_permission+0x9e/0xc0
[ 132.772705] vfs_read+0xab/0x150
[ 132.773154] SyS_read+0x58/0xc0
[ 132.773763] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.774386] RIP: 0033:0x7fe3d21a9a80
[ 132.775003] RSP: 002b:00007ffe84450168 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.775780] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe3d21a9a80
[ 132.776430] RDX: 0000000000000400 RSI: 00007ffe844501e0 RDI: 0000000000000004
[ 132.777170] RBP: 00007ffe84450620 R08: 0000000001fd6d90 R09: 0000000000000000
[ 132.777907] R10: 0000000000000055 R11: 0000000000000246 R12: 0000000000401a80
[ 132.778786] R13: 00007ffe84450870 R14: 0000000000000000 R15: 0000000000000000
[ 132.779870] lt-ndctl D 0 6322 6058 0x00000004
[ 132.780672] Call Trace:
[ 132.780914] __schedule+0x411/0xb10
[ 132.781238] schedule+0x40/0x90
[ 132.781543] schedule_preempt_disabled+0x18/0x30
[ 132.782019] __mutex_lock+0x487/0xa20
[ 132.782373] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.782878] mutex_lock_nested+0x1b/0x20
[ 132.783246] ? mutex_lock_nested+0x1b/0x20
[ 132.783648] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.784104] ? retint_kernel+0x2d/0x2d
[ 132.784450] ? trace_hardirqs_on_caller+0xf5/0x190
[ 132.784907] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 132.785336] ? retint_kernel+0x2d/0x2d
[ 132.785715] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.786156] dev_attr_show+0x20/0x50
[ 132.786485] ? sysfs_file_ops+0x46/0x60
[ 132.786860] sysfs_kf_seq_show+0xb2/0x110
[ 132.787236] kernfs_seq_show+0x27/0x30
[ 132.787603] seq_read+0x103/0x3d0
[ 132.787918] kernfs_fop_read+0x11e/0x190
[ 132.788401] __vfs_read+0x37/0x160
[ 132.788748] ? security_file_permission+0x9e/0xc0
[ 132.789182] vfs_read+0xab/0x150
[ 132.789485] SyS_read+0x58/0xc0
[ 132.789802] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.790230] RIP: 0033:0x7fc11000ea80
[ 132.790562] RSP: 002b:00007ffd8aa3b858 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.791253] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc11000ea80
[ 132.791918] RDX: 0000000000000400 RSI: 00007ffd8aa3b8d0 RDI: 0000000000000003
[ 132.792568] RBP: 00007ffd8aa3bd00 R08: 00000000021f2e30 R09: 00007fc110b15c12
[ 132.793221] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.793885] R13: 00007ffd8aa3c050 R14: 0000000000000000 R15: 0000000000000000
[ 132.794540] lt-ndctl D 0 6325 6105 0x00000004
[ 132.795054] Call Trace:
[ 132.795288] __schedule+0x411/0xb10
[ 132.795637] schedule+0x40/0x90
[ 132.795931] schedule_preempt_disabled+0x18/0x30
[ 132.796356] __mutex_lock+0x487/0xa20
[ 132.796716] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.797198] mutex_lock_nested+0x1b/0x20
[ 132.797566] ? mutex_lock_nested+0x1b/0x20
[ 132.797970] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.798434] ? kernfs_seq_start+0x2f/0x90
[ 132.798823] ? __mutex_lock+0x228/0xa20
[ 132.799183] ? lock_acquire+0xea/0x1f0
[ 132.799535] ? kernfs_seq_start+0x37/0x90
[ 132.799921] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.800361] dev_attr_show+0x20/0x50
[ 132.800713] ? sysfs_file_ops+0x46/0x60
[ 132.801069] sysfs_kf_seq_show+0xb2/0x110
[ 132.801437] kernfs_seq_show+0x27/0x30
[ 132.801801] seq_read+0x103/0x3d0
[ 132.802115] kernfs_fop_read+0x11e/0x190
[ 132.802476] __vfs_read+0x37/0x160
[ 132.802813] ? security_file_permission+0x9e/0xc0
[ 132.803245] vfs_read+0xab/0x150
[ 132.803551] SyS_read+0x58/0xc0
[ 132.803861] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.804285] RIP: 0033:0x7f075e511a80
[ 132.804673] RSP: 002b:00007fff66916aa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.805461] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f075e511a80
[ 132.806118] RDX: 0000000000000400 RSI: 00007fff66916b20 RDI: 0000000000000003
[ 132.806786] RBP: 00007fff66916f50 R08: 00000000013dae30 R09: 00007f075f018c12
[ 132.807428] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.808083] R13: 00007fff669172a0 R14: 0000000000000000 R15: 0000000000000000
[ 132.808760] lt-ndctl D 0 6326 6103 0x00000004
[ 132.809265] Call Trace:
[ 132.809502] __schedule+0x411/0xb10
[ 132.809844] schedule+0x40/0x90
[ 132.810140] schedule_preempt_disabled+0x18/0x30
[ 132.810566] __mutex_lock+0x487/0xa20
[ 132.810919] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.811398] mutex_lock_nested+0x1b/0x20
[ 132.811777] ? mutex_lock_nested+0x1b/0x20
[ 132.812157] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.812632] ? kernfs_seq_start+0x2f/0x90
[ 132.813003] ? __mutex_lock+0x228/0xa20
[ 132.813354] ? lock_acquire+0xea/0x1f0
[ 132.813717] ? kernfs_seq_start+0x37/0x90
[ 132.814091] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.814533] dev_attr_show+0x20/0x50
[ 132.814874] ? sysfs_file_ops+0x46/0x60
[ 132.815227] sysfs_kf_seq_show+0xb2/0x110
[ 132.815615] kernfs_seq_show+0x27/0x30
[ 132.815962] seq_read+0x103/0x3d0
[ 132.816270] kernfs_fop_read+0x11e/0x190
[ 132.816652] __vfs_read+0x37/0x160
[ 132.816973] ? security_file_permission+0x9e/0xc0
[ 132.817400] vfs_read+0xab/0x150
[ 132.817722] SyS_read+0x58/0xc0
[ 132.818019] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.818436] RIP: 0033:0x7f8cfd143a80
[ 132.818784] RSP: 002b:00007ffddd621de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.819466] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8cfd143a80
[ 132.820118] RDX: 0000000000000400 RSI: 00007ffddd621e60 RDI: 0000000000000003
[ 132.820776] RBP: 00007ffddd622290 R08: 0000000002400e30 R09: 00007f8cfdc4ac12
[ 132.821418] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.822174] R13: 00007ffddd6225e0 R14: 0000000000000000 R15: 0000000000000000
[ 132.822846] lt-ndctl D 0 6327 6065 0x00000004
[ 132.823351] Call Trace:
[ 132.823606] __schedule+0x411/0xb10
[ 132.823936] schedule+0x40/0x90
[ 132.824227] schedule_preempt_disabled+0x18/0x30
[ 132.824669] __mutex_lock+0x487/0xa20
[ 132.825010] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.825481] mutex_lock_nested+0x1b/0x20
[ 132.825862] ? mutex_lock_nested+0x1b/0x20
[ 132.826241] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.826711] ? kernfs_seq_start+0x2f/0x90
[ 132.827079] ? __mutex_lock+0x228/0xa20
[ 132.827432] ? lock_acquire+0xea/0x1f0
[ 132.827796] ? kernfs_seq_start+0x37/0x90
[ 132.828244] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.828704] dev_attr_show+0x20/0x50
[ 132.829035] ? sysfs_file_ops+0x46/0x60
[ 132.829385] sysfs_kf_seq_show+0xb2/0x110
[ 132.829774] kernfs_seq_show+0x27/0x30
[ 132.830124] seq_read+0x103/0x3d0
[ 132.830435] kernfs_fop_read+0x11e/0x190
[ 132.830818] __vfs_read+0x37/0x160
[ 132.831138] ? security_file_permission+0x9e/0xc0
[ 132.831572] vfs_read+0xab/0x150
[ 132.831893] SyS_read+0x58/0xc0
[ 132.832189] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.832633] RIP: 0033:0x7f5f1bdeda80
[ 132.832964] RSP: 002b:00007fffec6bb078 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.833678] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f5f1bdeda80
[ 132.834324] RDX: 0000000000000400 RSI: 00007fffec6bb0f0 RDI: 0000000000000003
[ 132.834985] RBP: 00007fffec6bb520 R08: 0000000000f31e30 R09: 00007f5f1c8f4c12
[ 132.835648] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.836286] R13: 00007fffec6bb870 R14: 0000000000000000 R15: 0000000000000000
[ 132.836951] lt-ndctl D 0 6328 6186 0x00000004
[ 132.837451] Call Trace:
[ 132.837706] __schedule+0x411/0xb10
[ 132.838034] schedule+0x40/0x90
[ 132.838425] schedule_preempt_disabled+0x18/0x30
[ 132.838867] __mutex_lock+0x487/0xa20
[ 132.839205] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.839696] mutex_lock_nested+0x1b/0x20
[ 132.840055] ? mutex_lock_nested+0x1b/0x20
[ 132.840430] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.840897] ? kernfs_seq_start+0x2f/0x90
[ 132.841265] ? __mutex_lock+0x228/0xa20
[ 132.841640] ? lock_acquire+0xea/0x1f0
[ 132.841986] ? kernfs_seq_start+0x37/0x90
[ 132.842355] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.842812] dev_attr_show+0x20/0x50
[ 132.843141] ? sysfs_file_ops+0x46/0x60
[ 132.843494] sysfs_kf_seq_show+0xb2/0x110
[ 132.843879] kernfs_seq_show+0x27/0x30
[ 132.844225] seq_read+0x103/0x3d0
[ 132.844538] kernfs_fop_read+0x11e/0x190
[ 132.844914] __vfs_read+0x37/0x160
[ 132.845232] ? security_file_permission+0x9e/0xc0
[ 132.845681] vfs_read+0xab/0x150
[ 132.845983] SyS_read+0x58/0xc0
[ 132.846276] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.846713] RIP: 0033:0x7fd70537ca80
[ 132.847046] RSP: 002b:00007ffe1d968be8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.847743] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd70537ca80
[ 132.848380] RDX: 0000000000000400 RSI: 00007ffe1d968c60 RDI: 0000000000000003
[ 132.849043] RBP: 00007ffe1d969090 R08: 00000000020fbe30 R09: 00007fd705e83c12
[ 132.849705] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.850408] R13: 00007ffe1d9693e0 R14: 0000000000000000 R15: 0000000000000000
[ 132.851070] lt-ndctl D 0 6332 6090 0x00000004
[ 132.851590] Call Trace:
[ 132.851824] __schedule+0x411/0xb10
[ 132.852151] schedule+0x40/0x90
[ 132.852444] schedule_preempt_disabled+0x18/0x30
[ 132.852884] __mutex_lock+0x487/0xa20
[ 132.853226] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.853720] mutex_lock_nested+0x1b/0x20
[ 132.854084] ? mutex_lock_nested+0x1b/0x20
[ 132.854499] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.855071] ? kernfs_seq_start+0x2f/0x90
[ 132.855444] ? __mutex_lock+0x228/0xa20
[ 132.855817] ? lock_acquire+0xea/0x1f0
[ 132.856160] ? kernfs_seq_start+0x37/0x90
[ 132.856530] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.856981] dev_attr_show+0x20/0x50
[ 132.857309] ? sysfs_file_ops+0x46/0x60
[ 132.857679] sysfs_kf_seq_show+0xb2/0x110
[ 132.858047] kernfs_seq_show+0x27/0x30
[ 132.858386] seq_read+0x103/0x3d0
[ 132.858715] kernfs_fop_read+0x11e/0x190
[ 132.859075] __vfs_read+0x37/0x160
[ 132.859388] ? security_file_permission+0x9e/0xc0
[ 132.859836] vfs_read+0xab/0x150
[ 132.860138] SyS_read+0x58/0xc0
[ 132.860430] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.860868] RIP: 0033:0x7f00fc1b2a80
[ 132.861198] RSP: 002b:00007ffe8183f8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.861887] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f00fc1b2a80
[ 132.862523] RDX: 0000000000000400 RSI: 00007ffe8183f930 RDI: 0000000000000003
[ 132.863168] RBP: 00007ffe8183fd60 R08: 000000000188ee30 R09: 00007f00fccb9c12
[ 132.863825] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.864459] R13: 00007ffe818400b0 R14: 0000000000000000 R15: 0000000000000000
[ 132.865117] lt-ndctl D 0 6335 6045 0x00000004
[ 132.865634] Call Trace:
[ 132.865867] __schedule+0x411/0xb10
[ 132.866189] schedule+0x40/0x90
[ 132.866476] schedule_preempt_disabled+0x18/0x30
[ 132.866912] __mutex_lock+0x487/0xa20
[ 132.867250] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.867737] mutex_lock_nested+0x1b/0x20
[ 132.868096] ? mutex_lock_nested+0x1b/0x20
[ 132.868467] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.868933] ? kernfs_seq_start+0x2f/0x90
[ 132.869301] ? __mutex_lock+0x228/0xa20
[ 132.869675] ? lock_acquire+0xea/0x1f0
[ 132.870019] ? kernfs_seq_start+0x37/0x90
[ 132.870387] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.870844] dev_attr_show+0x20/0x50
[ 132.871174] ? sysfs_file_ops+0x46/0x60
[ 132.871528] sysfs_kf_seq_show+0xb2/0x110
[ 132.872010] kernfs_seq_show+0x27/0x30
[ 132.872361] seq_read+0x103/0x3d0
[ 132.872693] kernfs_fop_read+0x11e/0x190
[ 132.873058] __vfs_read+0x37/0x160
[ 132.873373] ? security_file_permission+0x9e/0xc0
[ 132.873820] vfs_read+0xab/0x150
[ 132.874123] SyS_read+0x58/0xc0
[ 132.874415] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.874850] RIP: 0033:0x7f7620445a80
[ 132.875179] RSP: 002b:00007ffd0df5f188 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.875871] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7620445a80
[ 132.876510] RDX: 0000000000000400 RSI: 00007ffd0df5f200 RDI: 0000000000000003
[ 132.877153] RBP: 00007ffd0df5f630 R08: 0000000000e8fe30 R09: 00007f7620f4cc12
[ 132.877809] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.878449] R13: 00007ffd0df5f980 R14: 0000000000000000 R15: 0000000000000000
[ 132.879206] lt-ndctl D 0 6343 6039 0x00000004
[ 132.879732] Call Trace:
[ 132.879969] __schedule+0x411/0xb10
[ 132.880292] schedule+0x40/0x90
[ 132.880604] schedule_preempt_disabled+0x18/0x30
[ 132.881028] __mutex_lock+0x487/0xa20
[ 132.881365] ? acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.881858] mutex_lock_nested+0x1b/0x20
[ 132.882221] ? mutex_lock_nested+0x1b/0x20
[ 132.882615] acpi_nfit_flush_probe+0x3a/0x150 [nfit]
[ 132.883071] ? kernfs_seq_start+0x2f/0x90
[ 132.883437] ? __mutex_lock+0x228/0xa20
[ 132.883811] ? lock_acquire+0xea/0x1f0
[ 132.884159] ? kernfs_seq_start+0x37/0x90
[ 132.884533] wait_probe_show+0x25/0x60 [libnvdimm]
[ 132.884985] dev_attr_show+0x20/0x50
[ 132.885317] ? sysfs_file_ops+0x46/0x60
[ 132.885692] sysfs_kf_seq_show+0xb2/0x110
[ 132.886068] kernfs_seq_show+0x27/0x30
[ 132.886411] seq_read+0x103/0x3d0
[ 132.886741] kernfs_fop_read+0x11e/0x190
[ 132.887107] __vfs_read+0x37/0x160
[ 132.887424] ? security_file_permission+0x9e/0xc0
[ 132.887873] vfs_read+0xab/0x150
[ 132.888177] SyS_read+0x58/0xc0
[ 132.888593] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 132.889018] RIP: 0033:0x7f1dc06f3a80
[ 132.889346] RSP: 002b:00007ffdd14888c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 132.890045] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1dc06f3a80
[ 132.890709] RDX: 0000000000000400 RSI: 00007ffdd1488940 RDI: 0000000000000003
[ 132.891352] RBP: 00007ffdd1488d70 R08: 0000000001fb2e30 R09: 00007f1dc11fac12
[ 132.892009] R10: 0000000000000092 R11: 0000000000000246 R12: 00000000004074a0
[ 132.892680] R13: 00007ffdd14890c0 R14: 0000000000000000 R15: 0000000000000000
[PATCH v8 00/14] MAP_DIRECT for DAX RDMA and userspace flush
by Dan Williams
Changes since v7 [1]:
* Fix IOVA reuse race by leaving the dma scatterlist mapped until
unregistration time. Use iommu_unmap() in ib_umem_lease_break() to
force-invalidate the ibverbs memory registration. (David Woodhouse)
* Introduce iomap_can_allocate() as a way to check if any layouts are
present in the mmap write-fault path to prevent block map changes, and
start the lease break process when an allocating write-fault occurs.
This also removes the i_mapdcount bloat of 'struct inode' from v7.
(Dave Chinner)
* Provide generic_map_direct_{open,close,lease} to clean up the
filesystem wiring to implement MAP_DIRECT support (Dave Chinner)
* Abandon (defer to a potential new fcntl()) support for using
MAP_DIRECT on non-DAX files. With this change we can validate the
inode is MAP_DIRECT capable just once at mmap time rather than every
fault. (Dave Chinner)
* Arrange for lease_direct leases to also wait for the
/proc/sys/fs/lease-break-time period before calling break_fn. For
example, this allows the lease-holder time to quiesce RDMA operations
before the iommu starts throwing io-faults.
* Switch intel-iommu to use iommu_num_sg_pages().
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-October/012707.html
---
MAP_DIRECT is a mechanism that allows an application to establish a
mapping where the kernel will not change the block-map, or otherwise
dirty the block-map metadata of a file without notification. It supports
a "flush from userspace" model where persistent memory applications can
bypass the overhead of ongoing coordination of writes with the
filesystem, and it provides safety to RDMA operations involving DAX
mappings.
The kernel always has the ability to revoke access and convert the file
back to normal operation after performing a "lease break". Similar to
fcntl leases, there is no way for userspace to cancel the lease break
process once it has started; it can only delay it via the
/proc/sys/fs/lease-break-time setting.
MAP_DIRECT enables XFS to supplant the device-dax interface for
mmap-write access to persistent memory with no ongoing coordination with
the filesystem via fsync/msync syscalls.
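From userspace, requesting such a mapping would presumably be an ordinary mmap() call that combines MAP_SHARED_VALIDATE with the new flag. With MAP_SHARED_VALIDATE, Linux fails the call instead of silently ignoring a flag it does not understand. The sketch below tries that and falls back to a plain MAP_SHARED mapping. The flag value EXAMPLE_MAP_DIRECT is a placeholder invented for this example and is not taken from the series, so on current kernels only the fallback path is expected to succeed. The helper names are also assumptions.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03 /* value used by Linux headers that define it */
#endif

/* PLACEHOLDER: not a real kernel flag value; chosen only for illustration. */
#define EXAMPLE_MAP_DIRECT 0x200000

/*
 * Try a validated mapping with the placeholder flag; if the kernel rejects
 * it, fall back to MAP_SHARED. *is_direct reports which path was taken.
 * Returns the mapping, or MAP_FAILED if both attempts fail.
 */
void *map_direct_or_shared(int fd, size_t len, int *is_direct)
{
	assert(is_direct != NULL);

	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_SHARED_VALIDATE | EXAMPLE_MAP_DIRECT, fd, 0);
	if (addr != MAP_FAILED) {
		*is_direct = 1;
		return addr;
	}
	/* Flag unknown or unsupported for this file: use a normal mapping. */
	*is_direct = 0;
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Map one page of a scratch file and check that a store reads back. */
int demo_map_roundtrip(void)
{
	int is_direct = 0;
	int rc = -1;
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	FILE *scratch = tmpfile();

	if (scratch == NULL)
		return -1;
	if (ftruncate(fileno(scratch), (off_t)len) == 0) {
		char *map = map_direct_or_shared(fileno(scratch), len, &is_direct);
		if (map != MAP_FAILED) {
			strcpy(map, "hello");
			rc = strcmp(map, "hello") == 0 ? 0 : -1;
			printf("mapped with %s\n",
			       is_direct ? "placeholder flag" : "MAP_SHARED");
			munmap(map, len);
		}
	}
	fclose(scratch);
	return rc;
}
```

An application using the fallback path still has to call fsync/msync as usual; skipping that coordination is what the proposed flag is meant to allow.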
---
Dan Williams (14):
mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags
fs, mm: pass fd to ->mmap_validate()
fs: MAP_DIRECT core
xfs: prepare xfs_break_layouts() for reuse with MAP_DIRECT
fs, xfs, iomap: introduce iomap_can_allocate()
xfs: wire up MAP_DIRECT
iommu, dma-mapping: introduce dma_get_iommu_domain()
fs, mapdirect: introduce ->lease_direct()
xfs: wire up ->lease_direct()
device-dax: wire up ->lease_direct()
iommu: up-level sg_num_pages() from amd-iommu
iommu/vt-d: use iommu_num_sg_pages
IB/core: use MAP_DIRECT to fix / enable RDMA to DAX mappings
tools/testing/nvdimm: enable rdma unit tests
arch/alpha/include/uapi/asm/mman.h | 1
arch/mips/include/uapi/asm/mman.h | 1
arch/mips/kernel/vdso.c | 2
arch/parisc/include/uapi/asm/mman.h | 1
arch/tile/mm/elf.c | 3
arch/x86/mm/mpx.c | 3
arch/xtensa/include/uapi/asm/mman.h | 1
drivers/base/dma-mapping.c | 10 +
drivers/dax/Kconfig | 1
drivers/dax/device.c | 4
drivers/infiniband/core/umem.c | 90 +++++-
drivers/iommu/amd_iommu.c | 40 +--
drivers/iommu/intel-iommu.c | 30 +-
drivers/iommu/iommu.c | 27 ++
fs/Kconfig | 5
fs/Makefile | 1
fs/aio.c | 2
fs/mapdirect.c | 382 ++++++++++++++++++++++++++
fs/xfs/Kconfig | 4
fs/xfs/Makefile | 1
fs/xfs/xfs_file.c | 103 +++++++
fs/xfs/xfs_iomap.c | 3
fs/xfs/xfs_layout.c | 45 +++
fs/xfs/xfs_layout.h | 13 +
fs/xfs/xfs_pnfs.c | 30 --
fs/xfs/xfs_pnfs.h | 10 -
include/linux/dma-mapping.h | 3
include/linux/fs.h | 2
include/linux/iomap.h | 10 +
include/linux/iommu.h | 2
include/linux/mapdirect.h | 57 ++++
include/linux/mm.h | 17 +
include/linux/mman.h | 42 +++
include/rdma/ib_umem.h | 8 +
include/uapi/asm-generic/mman-common.h | 1
include/uapi/asm-generic/mman.h | 1
ipc/shm.c | 3
mm/internal.h | 2
mm/mmap.c | 28 +-
mm/nommu.c | 5
mm/util.c | 7
tools/include/uapi/asm-generic/mman-common.h | 1
tools/testing/nvdimm/Kbuild | 31 ++
tools/testing/nvdimm/config_check.c | 2
tools/testing/nvdimm/test/iomap.c | 14 +
45 files changed, 938 insertions(+), 111 deletions(-)
create mode 100644 fs/mapdirect.c
create mode 100644 fs/xfs/xfs_layout.c
create mode 100644 fs/xfs/xfs_layout.h
create mode 100644 include/linux/mapdirect.h
[PATCH] Fix mpage_writepage() for pages with buffers
by Matthew Wilcox
When using FAT on a block device which supports rw_page, we can hit
BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we
call clean_buffers() after unlocking the page we've written. Introduce a
new clean_page_buffers() which cleans all buffers associated with a page
and call it from within bdev_write_page().
Reported-by: Toshi Kani <toshi.kani(a)hpe.com>
Reported-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Tested-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Cc: stable(a)vger.kernel.org
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9941dc8342df..3fbe75bdd257 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -716,10 +716,12 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
set_page_writeback(page);
result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, true);
- if (result)
+ if (result) {
end_page_writeback(page);
- else
+ } else {
+ clean_page_buffers(page);
unlock_page(page);
+ }
blk_queue_exit(bdev->bd_queue);
return result;
}
diff --git a/fs/mpage.c b/fs/mpage.c
index 2e4c41ccb5c9..d97b003f1607 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -468,6 +468,16 @@ static void clean_buffers(struct page *page, unsigned first_unmapped)
try_to_free_buffers(page);
}
+/*
+ * For situations where we want to clean all buffers attached to a page.
+ * We don't need to calculate how many buffers are attached to the page,
+ * we just need to specify a number larger than the maximum number of buffers.
+ */
+void clean_page_buffers(struct page *page)
+{
+ clean_buffers(page, PAGE_SIZE);
+}
+
static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
void *data)
{
@@ -605,10 +615,8 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
if (bio == NULL) {
if (first_unmapped == blocks_per_page) {
if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
- page, wbc)) {
- clean_buffers(page, first_unmapped);
+ page, wbc))
goto out;
- }
}
bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
BIO_MAX_PAGES, GFP_NOFS|__GFP_HIGH);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c8dae555eccf..446b24cac67d 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -232,6 +232,7 @@ int generic_write_end(struct file *, struct address_space *,
loff_t, unsigned, unsigned,
struct page *, void *);
void page_zero_new_buffers(struct page *page, unsigned from, unsigned to);
+void clean_page_buffers(struct page *page);
int cont_write_begin(struct file *, struct address_space *, loff_t,
unsigned, unsigned, struct page **, void **,
get_block_t *, loff_t *);
4 years, 7 months
[ndctl PATCH] ndctl, test: rdma vs dax
by Dan Williams
Use the rxe (Soft-RoCE) driver to unit test the DAX paths in ibverbs
memory registration (ib_umem_get).
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
configure.ac | 11 +++
test/Makefile.am | 13 +++
test/rdma.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
test/rdma.sh | 54 +++++++++++++
4 files changed, 302 insertions(+)
create mode 100644 test/rdma.c
create mode 100755 test/rdma.sh
diff --git a/configure.ac b/configure.ac
index 5b103813ee6f..087df2f7b3a6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -94,6 +94,17 @@ PKG_CHECK_MODULES([UDEV], [libudev])
PKG_CHECK_MODULES([UUID], [uuid])
PKG_CHECK_MODULES([JSON], [json-c])
+AC_ARG_WITH([libibverbs],
+ AS_HELP_STRING([--with-libibverbs],
+ [Enable RDMA functionality. @<:@default=no@:>@]),
+ [], [with_libibverbs=no])
+if test "x$with_libibverbs" = "xyes"; then
+ AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
+ AC_MSG_ERROR([libibverbs not found.]))
+ AC_DEFINE(ENABLE_RDMA, 1, [Enable RDMA])
+fi
+AM_CONDITIONAL([ENABLE_RDMA], [test "x$with_libibverbs" = "xyes"])
+
AC_ARG_WITH([libpmem],
AS_HELP_STRING([--with-libpmem],
[Install with libpmem support. @<:@default=no@:>@]),
diff --git a/test/Makefile.am b/test/Makefile.am
index 9223628b2608..0be0d0ab8828 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -42,6 +42,11 @@ check_PROGRAMS +=\
dax-pmd \
device-dax \
mmap
+
+if ENABLE_RDMA
+TESTS += rdma.sh
+check_PROGRAMS += rdma
+endif
endif
LIBNDCTL_LIB =\
@@ -110,3 +115,11 @@ multi_pmem_LDADD = \
$(UUID_LIBS) \
$(KMOD_LIBS) \
../libutil.a
+
+rdma_SOURCES =\
+ rdma.c \
+ $(testcore)
+
+rdma_LDADD = \
+ $(LIBNDCTL_LIB) \
+ -libverbs
diff --git a/test/rdma.c b/test/rdma.c
new file mode 100644
index 000000000000..043483272162
--- /dev/null
+++ b/test/rdma.c
@@ -0,0 +1,224 @@
+/*
+ * Copyright (c) 2014-2017, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU Lesser General Public License,
+ * version 2.1, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT ANY
+ * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
+ * more details.
+ */
+#include <stdio.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <ctype.h>
+#include <errno.h>
+#include <unistd.h>
+#include <limits.h>
+#include <syslog.h>
+#include <sys/mman.h>
+#include <linux/mman.h>
+
+#include <util/size.h>
+#include <ndctl/libndctl.h>
+#include <infiniband/verbs.h>
+#include <ccan/array_size/array_size.h>
+
+static struct ibv_qp *create_qp(struct ibv_pd *pd, struct ibv_cq *cq)
+{
+ struct ibv_qp *qp;
+ struct ibv_qp_init_attr qp_attr = {
+ .send_cq = cq,
+ .recv_cq = cq,
+ .cap = {
+ .max_send_wr = 1,
+ .max_recv_wr = 1,
+ .max_send_sge = 1,
+ .max_recv_sge = 1,
+ },
+ .qp_type = IBV_QPT_RC,
+ };
+
+ qp = ibv_create_qp(pd, &qp_attr);
+ if (!qp)
+ return NULL;
+ if (qp_attr.cap.max_send_wr < 1 || qp_attr.cap.max_recv_wr < 1
+ || qp_attr.cap.max_send_sge < 1
+ || qp_attr.cap.max_recv_sge < 1) {
+ fprintf(stderr, "%s: insufficient queue pair capabilities\n",
+ __func__);
+ ibv_destroy_qp(qp);
+ return NULL;
+ }
+ return qp;
+}
+
+static int post_recv(struct ibv_qp *qp, struct ibv_mr *mr, void *addr,
+ size_t len)
+{
+ struct ibv_recv_wr wr = {
+ .sg_list = &(struct ibv_sge) {
+ .addr = (uint64_t) addr,
+ .length = len,
+ .lkey = mr->lkey
+ },
+ .num_sge = 1,
+ .next = NULL,
+ };
+ struct ibv_recv_wr *bad_wr;
+
+ return ibv_post_recv(qp, &wr, &bad_wr);
+}
+
+static int do_rdma(struct ndctl_ctx *ctx, int fd, unsigned long map_flags)
+{
+ int nr_devs, rc = -ENXIO;
+ void *addr;
+ struct ibv_pd *pd;
+ struct ibv_mr *mr;
+ struct ibv_cq *cq;
+ struct ibv_qp *qp;
+ struct ibv_context *ictx;
+ size_t map_len = 4*HPAGE_SIZE;
+ struct ibv_device **idevs, *idev;
+
+ addr = mmap(NULL, map_len, PROT_READ|PROT_WRITE, map_flags, fd, 0);
+ if (addr == MAP_FAILED) {
+ fprintf(stderr, "failed to map test file\n");
+ return -ENXIO;
+ }
+
+ idevs = ibv_get_device_list(&nr_devs);
+ if (!idevs || !nr_devs) {
+ fprintf(stderr, "ibverbs device not found\n");
+ goto err_dev;
+ }
+
+ idev = idevs[0];
+ ictx = ibv_open_device(idev);
+ if (ictx)
+ fprintf(stderr, "%s: opened dev: %s\n", __func__,
+ ibv_get_device_name(idev));
+ else {
+ fprintf(stderr, "%s: failed to open dev: %s\n", __func__,
+ ibv_get_device_name(idev));
+ goto err_open;
+ }
+
+ pd = ibv_alloc_pd(ictx);
+ if (!pd) {
+ fprintf(stderr, "%s: failed alloc_pd dev: %s\n", __func__,
+ ibv_get_device_name(idev));
+ goto err_pd;
+ }
+
+ mr = ibv_reg_mr(pd, addr, map_len, IBV_ACCESS_LOCAL_WRITE
+ | IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);
+ if (!mr) {
+ fprintf(stderr, "%s: failed reg_mr dev: %s\n", __func__,
+ ibv_get_device_name(idev));
+ goto err_mr;
+ }
+
+ cq = ibv_create_cq(ictx, 1, NULL, NULL, 0);
+ if (!cq) {
+ fprintf(stderr, "%s: failed create_cq dev: %s\n", __func__,
+ ibv_get_device_name(idev));
+ goto err_cq;
+ }
+
+ qp = create_qp(pd, cq);
+ if (!qp) {
+ fprintf(stderr, "%s: failed create_qp dev: %s\n", __func__,
+ ibv_get_device_name(idev));
+ goto err_qp;
+ }
+
+ rc = post_recv(qp, mr, addr, map_len);
+ if (rc) {
+ fprintf(stderr, "%s: failed post_recv (%d) dev: %s\n", __func__,
+ rc, ibv_get_device_name(idev));
+ goto err_post_recv;
+ }
+
+ fprintf(stderr, "%s: successful post_recv dev: %s\n", __func__,
+ ibv_get_device_name(idev));
+ rc = 0;
+err_post_recv:
+ ibv_destroy_qp(qp);
+err_qp:
+ ibv_destroy_cq(cq);
+err_cq:
+ ibv_dereg_mr(mr);
+err_mr:
+ ibv_dealloc_pd(pd);
+err_pd:
+ ibv_close_device(ictx);
+err_open:
+ ibv_free_device_list(idevs);
+err_dev:
+ munmap(addr, map_len);
+ return rc;
+}
+
+static int test_rdma(int fd, int loglevel)
+{
+ int err, i;
+ struct ndctl_ctx *ctx;
+ unsigned long test_flags[] = {
+ MAP_SHARED,
+ MAP_SHARED_VALIDATE | MAP_DIRECT,
+ };
+
+ err = ndctl_new(&ctx);
+ if (err < 0)
+ return err;
+
+ ndctl_set_log_priority(ctx, loglevel);
+
+ for (i = 0; i < (int) ARRAY_SIZE(test_flags); i++) {
+ unsigned long map_flags = test_flags[i];
+
+ err = do_rdma(ctx, fd, map_flags);
+ switch (map_flags) {
+ case MAP_SHARED:
+ if (err == 0) {
+ fprintf(stderr, "expected failure map_flags: %#lx\n",
+ map_flags);
+ return EXIT_FAILURE;
+ }
+ break;
+ case (MAP_SHARED_VALIDATE | MAP_DIRECT):
+ if (err != 0) {
+ fprintf(stderr, "expected success map_flags: %#lx\n",
+ map_flags);
+ return EXIT_FAILURE;
+ }
+ break;
+ default:
+ fprintf(stderr, "unhandled test case\n");
+ return EXIT_FAILURE;
+ }
+ }
+
+ ndctl_unref(ctx);
+ return err;
+}
+
+int __attribute__((weak)) main(int argc, char *argv[])
+{
+ int rc, fd;
+
+ if (argc < 2)
+ return -EINVAL;
+
+ fd = open(argv[1], O_RDWR);
+ rc = test_rdma(fd, LOG_DEBUG);
+ if (fd >= 0)
+ close(fd);
+ return rc;
+}
diff --git a/test/rdma.sh b/test/rdma.sh
new file mode 100755
index 000000000000..3b486d7b1680
--- /dev/null
+++ b/test/rdma.sh
@@ -0,0 +1,54 @@
+#!/bin/bash
+
+# Copyright(c) 2015-2017 Intel Corporation. All rights reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of version 2 of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+
+MNT=test_dax_mnt
+FILE=image
+NDCTL="../ndctl/ndctl"
+json2var="s/[{}\",]//g; s/:/=/g"
+blockdev=""
+
+err() {
+ echo "test-rdma: failed at line $1"
+ if [ -n "$blockdev" ]; then
+ umount /dev/$blockdev
+ else
+ rc=77
+ fi
+ rmdir $MNT
+ exit $rc
+}
+
+set -e
+mkdir -p $MNT
+trap 'err $LINENO' ERR
+
+rxe_cfg stop
+rxe_cfg start
+if ! rxe_cfg status | grep -q rxe0; then
+ rxe_cfg add eth0
+fi
+
+dev=$(./dax-dev)
+json=$($NDCTL list -N -n $dev)
+eval $(echo $json | sed -e "$json2var")
+rc=1
+
+# TODO test with sparse file, and a file that needs to do unwritten
+# extent conversion
+mkfs.xfs -f /dev/$blockdev
+mount /dev/$blockdev $MNT -o dax
+dd if=/dev/zero of=$MNT/$FILE bs=1G count=1
+./rdma $MNT/$FILE
+umount $MNT
+
+exit 0
4 years, 7 months
[ndctl PATCH 0/8] add an inject-error command to ndctl
by Vishal Verma
These patches add a new command to ndctl for error injection. They are
implemented such that the interface provided to a user is consistent
with the kernel, i.e. all media errors are expected/displayed in terms
of 512-byte sectors. The underlying ACPI DSMs take and return
byte-relative offsets/lengths, but these are converted to 512B sectors
for consistency.
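To make the unit conversion concrete, here is a minimal sketch of mapping
a byte-relative range onto 512B sectors and back. It is only an
illustration of the arithmetic: the names sector_range, bytes_to_sectors()
and sectors_to_bytes() are hypothetical and are not taken from the ndctl
patches.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical type for this sketch, not part of libndctl. */
struct sector_range {
	uint64_t first;	/* first affected sector */
	uint64_t count;	/* number of affected sectors */
};

/*
 * Map a byte range to the smallest run of 512-byte sectors that covers
 * it. A range that straddles a sector boundary touches both sectors.
 */
static struct sector_range bytes_to_sectors(uint64_t offset, uint64_t len)
{
	struct sector_range r;
	uint64_t last;

	assert(len > 0);	/* an empty range covers no sector */
	last = (offset + len - 1) >> SECTOR_SHIFT;
	r.first = offset >> SECTOR_SHIFT;
	r.count = last - r.first + 1;
	return r;
}

/* Opposite direction: express a sector range as byte offset and length. */
static void sectors_to_bytes(struct sector_range r, uint64_t *offset,
		uint64_t *len)
{
	*offset = r.first << SECTOR_SHIFT;
	*len = r.count << SECTOR_SHIFT;
}
```

Rounding outward to whole sectors is a choice made for this sketch: a
2-byte range starting at byte 511 is reported as two sectors because it
touches both of them.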
These also update unit tests to use the new error injection commands,
and add two new unit tests - first, to test the error injection commands
themselves, and second, to test BTT error clearing.
Vishal Verma (8):
libndctl: fix a memory leak in add_bus
ndctl, list: move the --human description to an include
libndctl: add APIs to get scrub count and to wait for a scrub
ccan/list: add a list_add_after helper
ndctl: add an inject-error command
ndctl/test: add a new unit test for inject-error
ndctl/test: update existing unit tests to use error-inject
ndctl/test: add a new unit test for BTT error clearing
Documentation/ndctl/Makefile.am | 1 +
Documentation/ndctl/human-option.txt | 5 +
Documentation/ndctl/ndctl-inject-error.txt | 108 +++++
Documentation/ndctl/ndctl-list.txt | 8 +-
Documentation/ndctl/ndctl.txt | 1 +
builtin.h | 1 +
ccan/list/list.h | 32 ++
contrib/ndctl | 5 +-
ndctl/Makefile.am | 3 +-
ndctl/inject-error.c | 745 +++++++++++++++++++++++++++++
ndctl/lib/libndctl.c | 86 ++++
ndctl/lib/libndctl.sym | 2 +
ndctl/lib/private.h | 1 +
ndctl/libndctl-nfit.h | 8 +
ndctl/libndctl.h.in | 2 +
ndctl/ndctl.c | 1 +
test/Makefile.am | 4 +-
test/btt-errors.sh | 152 ++++++
test/clear.sh | 5 +-
test/dax-errors.sh | 5 +-
test/daxdev-errors.sh | 17 +-
test/inject-error.sh | 89 ++++
util/json.c | 26 +
util/json.h | 3 +
util/size.h | 1 +
25 files changed, 1297 insertions(+), 14 deletions(-)
create mode 100644 Documentation/ndctl/human-option.txt
create mode 100644 Documentation/ndctl/ndctl-inject-error.txt
create mode 100644 ndctl/inject-error.c
create mode 100755 test/btt-errors.sh
create mode 100755 test/inject-error.sh
--
2.9.5
4 years, 7 months
[PATCH v3 0/4] add error injection commands to nfit_test
by Vishal Verma
v3:
patch 1,2:
- move the nfit_test kbuild update for badrange to patch 1 (Dan)
v2:
patch 1:
- change all instances of 'be' to 'bre' to avoid confusion with
big endian (Dan)
patch 2:
- move an injection related define to a local nfit_test header
since it is not used outside of nfit_test (Dan)
These patches add error injection support to nfit_test by emulating the
ACPI 6.2 ARS error injection commands. The commands are sent via the
ND_CMD_CALL interface, so only nfit_test needs to know the definitions
related to them.
Note that this patch set will break ndctl unit tests unless the ndctl
patches for error injection are also applied.
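As a rough picture of the bookkeeping involved, the sketch below is a
small user-space model of a bad-range table: injecting an error records a
range, and clearing poison removes the cleared span from every recorded
range, trimming or splitting entries as needed. It is an assumption-laden
illustration, not the code in drivers/nvdimm/badrange.c; the fixed-size
table and the names bad_list, bad_add(), bad_forget() and bad_contains()
are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BAD 16	/* arbitrary table size for this sketch */

struct bad_entry {
	uint64_t start;	/* first bad sector */
	uint64_t len;	/* number of bad sectors */
};

struct bad_list {
	struct bad_entry e[MAX_BAD];
	size_t nr;
};

/* Record an injected error; returns 0, or -1 if empty or table full. */
static int bad_add(struct bad_list *bl, uint64_t start, uint64_t len)
{
	if (len == 0 || bl->nr == MAX_BAD)
		return -1;
	bl->e[bl->nr++] = (struct bad_entry){ start, len };
	return 0;
}

/* Return 1 if the sector lies inside any recorded bad range. */
static int bad_contains(const struct bad_list *bl, uint64_t sector)
{
	size_t i;

	for (i = 0; i < bl->nr; i++)
		if (sector >= bl->e[i].start &&
				sector < bl->e[i].start + bl->e[i].len)
			return 1;
	return 0;
}

/* Remove [start, start + len) from every entry in the table. */
static int bad_forget(struct bad_list *bl, uint64_t start, uint64_t len)
{
	uint64_t clr_end = start + len;
	size_t i = 0;

	assert(len > 0);
	while (i < bl->nr) {
		struct bad_entry *be = &bl->e[i];
		uint64_t be_end = be->start + be->len;

		if (clr_end <= be->start || start >= be_end) {
			i++;				/* no overlap */
		} else if (start <= be->start && clr_end >= be_end) {
			/* fully covered: delete, then re-check slot i */
			*be = bl->e[--bl->nr];
		} else if (start <= be->start) {
			be->len = be_end - clr_end;	/* trim the front */
			be->start = clr_end;
			i++;
		} else if (clr_end >= be_end) {
			be->len = start - be->start;	/* trim the tail */
			i++;
		} else {
			/* cleared span is in the middle: split in two */
			if (bl->nr == MAX_BAD)
				return -1;
			be->len = start - be->start;
			bl->e[bl->nr++] = (struct bad_entry){ clr_end,
					be_end - clr_end };
			i++;
		}
	}
	return 0;
}
```

For example, after bad_add(&bl, 8, 8) marks sectors 8 through 15, calling
bad_forget(&bl, 10, 2) leaves two entries covering sectors 8-9 and 12-15.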
Dave Jiang (2):
libnvdimm: move poison list functions to a new 'badrange' file
nfit_test: add error injection DSMs
Vishal Verma (2):
libnvdimm, badrange: remove a WARN for list_empty
nfit_test: when clearing poison, also remove badrange entries
drivers/acpi/nfit/core.c | 2 +-
drivers/acpi/nfit/mce.c | 2 +-
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/badrange.c | 293 ++++++++++++++++++++++++++++++++++
drivers/nvdimm/bus.c | 24 +--
drivers/nvdimm/core.c | 260 +-----------------------------
drivers/nvdimm/nd-core.h | 3 +-
drivers/nvdimm/nd.h | 6 -
include/linux/libnvdimm.h | 21 ++-
tools/testing/nvdimm/Kbuild | 1 +
tools/testing/nvdimm/test/nfit.c | 199 +++++++++++++++++++----
tools/testing/nvdimm/test/nfit_test.h | 5 +
12 files changed, 503 insertions(+), 314 deletions(-)
create mode 100644 drivers/nvdimm/badrange.c
--
2.9.5
4 years, 7 months