[ndctl PATCH] ndctl, check: Add a sigbus handler to detect metadata corruption
by Vishal Verma
If we have poison/badblocks in the BTT metadata sections, the mmap-reads
happening in the checker will trigger a SIGBUS, and the program will
halt abruptly. Add a sigbus handler which notifies the user of this, and
prints out a relevant error message:
namespace5.0: namespace_check: checking namespace5.0
namespace5.0: btt_discover_arenas: found 1 BTT arena
namespace5.0: btt_check_arenas: checking arena 0
namespace5.0: namespace_check: Received a SIGBUS
namespace5.0: namespace_check: Metadata corruption found, recovery is not possible
error checking namespaces: Bad address
Cc: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jeff Moyer <jmoyer(a)redhat.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
ndctl/check.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/ndctl/check.c b/ndctl/check.c
index 3b30a98..3775c2e 100644
--- a/ndctl/check.c
+++ b/ndctl/check.c
@@ -13,6 +13,8 @@
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
+#include <setjmp.h>
+#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
@@ -40,6 +42,13 @@
#include <ndctl.h>
#endif
+static sigjmp_buf sj_env;
+
+static void sigbus_hdl(int sig, siginfo_t *siginfo, void *ptr)
+{
+ siglongjmp(sj_env, 1);
+}
+
static int repair_msg(struct btt_chk *bttc)
{
info(bttc, " Run with --repair to make the changes\n");
@@ -872,6 +881,7 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
int raw_mode, rc, disabled_flag = 0, open_flags;
struct btt_sb *btt_sb;
struct btt_chk *bttc;
+ struct sigaction act;
char path[50];
bttc = calloc(1, sizeof(*bttc));
@@ -882,6 +892,15 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
if (opts->verbose)
bttc->ctx.log_priority = LOG_DEBUG;
+ memset(&act, 0, sizeof(act));
+ act.sa_sigaction = sigbus_hdl;
+ act.sa_flags = SA_SIGINFO;
+
+ if (sigaction(SIGBUS, &act, 0)) {
+ err(bttc, "Unable to set sigaction\n");
+ return -errno;
+ }
+
bttc->opts = opts;
bttc->start_off = BTT_START_OFFSET;
bttc->sys_page_size = sysconf(_SC_PAGESIZE);
@@ -949,6 +968,18 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
goto out_sb;
}
+ /*
+ * This is where we jump to if we receive a SIGBUS, prior to doing any
+ * mmaped reads, and can safely abort
+ */
+ if (sigsetjmp(sj_env, 1)) {
+ err(bttc, "Received a SIGBUS\n");
+ err(bttc,
+ "Metadata corruption found, recovery is not possible\n");
+ rc = -EFAULT;
+ goto out_close;
+ }
+
rc = btt_info_read_verify(bttc, btt_sb, bttc->start_off);
if (rc) {
rc = btt_recover_first_sb(bttc);
--
2.9.3
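For readers unfamiliar with the technique, here is a minimal standalone sketch of the pattern the patch uses: install a handler with sigaction() and SA_SIGINFO, then use sigsetjmp()/siglongjmp() to escape a faulting read. It assumes Linux, where reading a MAP_SHARED mapping past the end of its backing file raises SIGBUS. That fault stands in here for poisoned BTT metadata. The names guarded_read() and demo_sigbus() and the /tmp path are illustrative only and are not part of ndctl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;

/* SIGBUS handler: unwind straight back to the sigsetjmp() in guarded_read() */
static void probe_sigbus(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)si;
	(void)ctx;
	siglongjmp(probe_env, 1);
}

/*
 * Read one byte from a mapping that may fault. Returns 0 and stores the
 * byte in *out on success, -1 if the read raised SIGBUS, or -2 if the
 * handler could not be installed. The previous SIGBUS disposition is
 * restored before returning.
 */
static int guarded_read(const volatile unsigned char *addr, unsigned char *out)
{
	struct sigaction act, old;
	int rc;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = probe_sigbus;
	act.sa_flags = SA_SIGINFO;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGBUS, &act, &old))
		return -2;

	/* savemask = 1: the jump also restores the mask, unblocking SIGBUS */
	if (sigsetjmp(probe_env, 1)) {
		rc = -1;		/* arrived here via siglongjmp() */
	} else {
		*out = *addr;		/* may raise SIGBUS */
		rc = 0;
	}

	sigaction(SIGBUS, &old, NULL);
	return rc;
}

/*
 * Demo: map two pages of a one-page file. Page 0 is backed by the file;
 * page 1 lies past EOF, so touching it raises SIGBUS. Stores the page 0
 * result in *first_rc and returns the page 1 result (-2 on setup failure).
 */
static int demo_sigbus(int *first_rc)
{
	char path[] = "/tmp/sigbus-demo-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	unsigned char byte, *map;
	int fd, rc;

	assert(page > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -2;
	unlink(path);		/* file persists while the mapping exists */
	if (ftruncate(fd, page)) {
		close(fd);
		return -2;
	}
	map = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -2;

	*first_rc = guarded_read(map, &byte);	/* inside the file */
	rc = guarded_read(map + page, &byte);	/* past EOF */
	munmap(map, 2 * page);
	return rc;
}
```

On Linux, demo_sigbus(&first) is expected to return -1 with first set to 0. Passing 1 as the savemask argument of sigsetjmp() matters. SIGBUS is blocked while its handler runs, and if the saved mask were not restored on the jump it would stay blocked, so a second fault would kill the process instead of being caught.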
5 years, 5 months
[PATCH 00/22] Introduce common scatterlist map function
by Logan Gunthorpe
Hi Everyone,
As part of my effort to enable P2P DMA transactions with PCI cards,
we've identified the need to be able to safely put IO memory into
scatterlists (and eventually other spots). This probably involves a
conversion from struct page to pfn_t but that migration is a ways off
and those decisions are yet to be made.
As an initial step in that direction, I've started cleaning up some of the
scatterlist code by trying to carve out a better defined layer between it
and its users. The longer term goal would be to remove sg_page or replace
it with something that can potentially fail.
This patchset is the first step in that effort. I've introduced
a common function to map scatterlist memory and converted all the common
kmap(sg_page()) cases. This removes about 66 sg_page calls (of ~331).
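To make the "something that can potentially fail" idea concrete, here is a small userspace model. The real sg_map() interface is defined in the first patch of the series and is not reproduced here. struct seg, seg_map(), seg_unmap() and seg_copy_out() are hypothetical names. The point is the calling convention: a converted caller checks the mapping result instead of assuming kmap(sg_page(sg)) always succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical segment: either ordinary RAM or I/O memory (e.g. a PCI BAR) */
enum seg_kind { SEG_RAM, SEG_IOMEM };

struct seg {
	enum seg_kind kind;
	void *cpu_addr;		/* base address; meaningful only for SEG_RAM */
	size_t offset;		/* start of the data within cpu_addr */
	size_t length;		/* bytes of data in this segment */
};

/*
 * Hypothetical map helper. Unlike kmap(sg_page(sg)), it has a failure
 * path: it returns NULL for memory it cannot hand out as a plain pointer
 * (I/O memory in this model) or for an offset outside the segment.
 */
static void *seg_map(const struct seg *s, size_t off)
{
	assert(s != NULL);
	if (s->kind != SEG_RAM || off >= s->length)
		return NULL;
	return (uint8_t *)s->cpu_addr + s->offset + off;
}

/* Counterpart to seg_map(); nothing to undo in this userspace model */
static void seg_unmap(const struct seg *s, void *addr)
{
	(void)s;
	(void)addr;
}

/* A typical converted caller: every mapping is checked before use */
static int seg_copy_out(const struct seg *s, void *dst)
{
	void *src = seg_map(s, 0);

	if (!src)
		return -1;
	memcpy(dst, src, s->length);
	seg_unmap(s, src);
	return 0;
}
```

With a RAM segment seg_copy_out() copies the data. With an I/O memory segment it returns -1, and the caller has to take a fallback path.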
Since this is a fairly large cleanup set that touches a wide swath of
the kernel, I have limited the people I've sent this to. I'd suggest we look
toward merging the first patch, and then I can send the individual subsystem
patches on to their respective maintainers and get them merged
independently. (This is to avoid the conflicts I created with my last
cleanup set... Sorry.) That said, I'm certainly open to other suggestions for
getting it merged.
The patchset is based on v4.11-rc6 and can be found in the sg_map
branch from this git tree:
https://github.com/sbates130272/linux-p2pmem.git
Thanks,
Logan
Logan Gunthorpe (22):
scatterlist: Introduce sg_map helper functions
nvmet: Make use of the new sg_map helper function
libiscsi: Make use of the new sg_map helper function
target: Make use of the new sg_map function at 16 call sites
drm/i915: Make use of the new sg_map helper function
crypto: hifn_795x: Make use of the new sg_map helper function
crypto: shash, caam: Make use of the new sg_map helper function
crypto: chcr: Make use of the new sg_map helper function
dm-crypt: Make use of the new sg_map helper in 4 call sites
staging: unisys: visorbus: Make use of the new sg_map helper function
RDS: Make use of the new sg_map helper function
scsi: ipr, pmcraid, isci: Make use of the new sg_map helper in 4 call sites
scsi: hisi_sas, mvsas, gdth: Make use of the new sg_map helper function
scsi: arcmsr, ips, megaraid: Make use of the new sg_map helper function
scsi: libfc, csiostor: Change to sg_copy_buffer in two drivers
xen-blkfront: Make use of the new sg_map helper function
mmc: sdhci: Make use of the new sg_map helper function
mmc: spi: Make use of the new sg_map helper function
mmc: tmio: Make use of the new sg_map helper function
mmc: sdricoh_cs: Make use of the new sg_map helper function
mmc: tifm_sd: Make use of the new sg_map helper function
memstick: Make use of the new sg_map helper function
crypto/shash.c | 9 +-
drivers/block/xen-blkfront.c | 33 +++++--
drivers/crypto/caam/caamalg.c | 8 +-
drivers/crypto/chelsio/chcr_algo.c | 28 +++---
drivers/crypto/hifn_795x.c | 32 ++++---
drivers/dma-buf/dma-buf.c | 3 +
drivers/gpu/drm/i915/i915_gem.c | 27 +++---
drivers/md/dm-crypt.c | 38 +++++---
drivers/memstick/host/jmb38x_ms.c | 23 ++++-
drivers/memstick/host/tifm_ms.c | 22 ++++-
drivers/mmc/host/mmc_spi.c | 26 +++--
drivers/mmc/host/sdhci.c | 35 ++++++-
drivers/mmc/host/sdricoh_cs.c | 14 ++-
drivers/mmc/host/tifm_sd.c | 88 +++++++++++++----
drivers/mmc/host/tmio_mmc.h | 12 ++-
drivers/mmc/host/tmio_mmc_dma.c | 5 +
drivers/mmc/host/tmio_mmc_pio.c | 24 +++++
drivers/nvme/target/fabrics-cmd.c | 16 +++-
drivers/scsi/arcmsr/arcmsr_hba.c | 16 +++-
drivers/scsi/csiostor/csio_scsi.c | 54 +----------
drivers/scsi/cxgbi/libcxgbi.c | 5 +
drivers/scsi/gdth.c | 9 +-
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 14 ++-
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 13 ++-
drivers/scsi/ipr.c | 27 +++---
drivers/scsi/ips.c | 8 +-
drivers/scsi/isci/request.c | 42 ++++----
drivers/scsi/libfc/fc_libfc.c | 49 ++--------
drivers/scsi/libiscsi_tcp.c | 32 ++++---
drivers/scsi/megaraid.c | 9 +-
drivers/scsi/mvsas/mv_sas.c | 10 +-
drivers/scsi/pmcraid.c | 19 ++--
drivers/staging/unisys/visorhba/visorhba_main.c | 12 ++-
drivers/target/iscsi/iscsi_target.c | 27 ++++--
drivers/target/target_core_rd.c | 3 +-
drivers/target/target_core_sbc.c | 122 +++++++++++++++++-------
drivers/target/target_core_transport.c | 18 ++--
drivers/target/target_core_user.c | 43 ++++++---
include/linux/scatterlist.h | 97 +++++++++++++++++++
include/scsi/libiscsi_tcp.h | 3 +-
include/target/target_core_backend.h | 4 +-
net/rds/ib_recv.c | 17 +++-
42 files changed, 739 insertions(+), 357 deletions(-)
--
2.1.4
[PATCH 0/5] libnvdimm: acpi updates and a revert
by Dan Williams
With Dave's recent fix [1], we can restore error clearing for btt i/o in
4.12.
ACPI 6.1 introduced new health state flags. Beyond reflecting them in
the dimmX/flags sysfs attribute we also need to handle the deeper
implications of the ACPI_NFIT_MEM_MAP_FAILED flag which changes
assumptions on how the driver discovers dimms. In the "map failed" case
there may be missing or no SPA entries associated with a dimm. Those dimms
should still be registered with libnvdimm so that the error state can be
communicated and recovery attempted.
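As a sketch of what reflecting these states involves, the following decodes the NFIT Memory Device flag bits as defined by ACPI 6.1 (bits 0-4 date from ACPI 6.0; bits 5 and 6 were added in 6.1). The MEM_* macro names, the text labels and dimm_should_register() are hypothetical and are not claimed to match the kernel's strings or logic. The last helper only encodes the rule stated above: a "map failed" dimm is registered even when it has no SPA entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* NFIT Memory Device to SPA Range flag bits (bits 5-6 are new in ACPI 6.1) */
#define MEM_SAVE_FAILED		(1u << 0)	/* last SAVE to the device failed */
#define MEM_RESTORE_FAILED	(1u << 1)	/* last RESTORE from the device failed */
#define MEM_FLUSH_FAILED	(1u << 2)	/* platform flush failed */
#define MEM_NOT_ARMED		(1u << 3)	/* device is not armed */
#define MEM_HEALTH_OBSERVED	(1u << 4)	/* health events seen before OS handoff */
#define MEM_HEALTH_ENABLED	(1u << 5)	/* firmware notifies OS of health events */
#define MEM_MAP_FAILED		(1u << 6)	/* firmware did not map the dimm to SPA */

/* Illustrative labels only; not necessarily the strings in dimmX/flags */
static const struct {
	unsigned int bit;
	const char *name;
} flag_names[] = {
	{ MEM_SAVE_FAILED, "save_fail" },
	{ MEM_RESTORE_FAILED, "restore_fail" },
	{ MEM_FLUSH_FAILED, "flush_fail" },
	{ MEM_NOT_ARMED, "not_armed" },
	{ MEM_HEALTH_OBSERVED, "smart_event" },
	{ MEM_HEALTH_ENABLED, "smart_notify" },
	{ MEM_MAP_FAILED, "map_fail" },
};

/*
 * Write the set flags as space-separated labels into buf. Entries that do
 * not fit are dropped whole. Returns the length of the resulting string.
 */
static size_t format_dimm_flags(uint16_t flags, char *buf, size_t len)
{
	size_t used = 0, i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		int n;

		if (!(flags & flag_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
	}
	return used;
}

/* Hypothetical rule: register a dimm if it has SPA entries or "map failed" */
static int dimm_should_register(uint16_t flags, unsigned int nr_spa)
{
	return nr_spa > 0 || (flags & MEM_MAP_FAILED) != 0;
}
```

For example, flags of MEM_NOT_ARMED | MEM_MAP_FAILED format as "not_armed map_fail", and such a dimm is still registered even with zero SPA entries.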
[1]: https://patchwork.kernel.org/patch/9680035/
---
Dan Williams (5):
Revert "libnvdimm: band aid btt vs clear poison locking"
acpi, nfit: add support for acpi 6.1 dimm state flags
tools/testing/nvdimm: test acpi 6.1 health state flags
acpi, nfit: support "map failed" dimms
acpi, nfit: limit ->flush_probe() to initialization work
drivers/acpi/nfit/core.c | 61 +++++++++++++++++++++++++++++++-------
drivers/acpi/nfit/nfit.h | 1 +
drivers/nvdimm/claim.c | 10 +-----
tools/testing/nvdimm/test/nfit.c | 40 +++++++++++++++++++++++--
4 files changed, 88 insertions(+), 24 deletions(-)
panics related to nfit_test?
by Linda Knippers
I'm trying to run the ndctl tests on 4.11-rc5. I've never run them before but I
think I correctly followed all the directions for building and installing the
tools/testing/nvdimm components as described in the ndctl README.md. I'm
seeing two problems that may be related and I'm wondering whether this could
be build/user error or something real.
1) Running the tests was causing my system to panic when the nfit_test module
is unloaded. I determined I don't actually have to run a test to cause the panic, just
modprobe the modules as listed in ndctl nfit_test_init(), then modprobe nfit_test,
then rmmod nfit_test. I'm doing this on a system without NVDIMMs. I get
the same thing on a system with NVDIMMs although the other modules are already
loaded.
This is the panic I get, very reproducibly.
[53617.173340] nfit_test nfit_test.0: failed to evaluate _FIT
<The above message is from the modprobe. Is that expected? The rest is after the rmmod.>
[53683.797952] BUG: unable to handle kernel NULL pointer dereference at (null)
[53683.837521] IP: __list_del_entry_valid+0x29/0xd0
[53683.861449] PGD 105f4fb067
[53683.861449] PUD 1054889067
[53683.874551] PMD 0
[53683.887664]
[53683.903937] Oops: 0000 [#1] SMP
[53683.918657] Modules linked in: nfit_test(O-) nd_pmem(O) nd_e820(O) nd_blk(O) nd_btt(O)
dax_pmem(O) dax(O) nfit(O) libnvdimm(O) nfit_test_iomap(O) ip6t_rpfilter ipt_REJECT nf_reject_ipv4
ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp vfat fat
kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ipmi_ssif aesni_intel
crypto_simd glue_helper cryptd sg hpilo iTCO_wdt
[53684.252765] hpwdt ipmi_si ipmi_devintf iTCO_vendor_support ioatdma i2c_i801 lpc_ich shpchp pcspkr
acpi_power_meter ipmi_msghandler dca wmi ip_tables xfs sd_mod mgag200 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm bnx2x tg3 mdio hpsa ptp i2c_core pps_core
libcrc32c scsi_transport_sas crc32c_intel
[53684.394684] CPU: 35 PID: 4087 Comm: rmmod Tainted: G W O 4.11.0-rc5+ #3
[53684.430295] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[53684.469368] task: ffff9cdbaca9ad00 task.stack: ffffbf3348cc8000
[53684.497175] RIP: 0010:__list_del_entry_valid+0x29/0xd0
[53684.521315] RSP: 0018:ffffbf3348ccbd90 EFLAGS: 00010007
[53684.545823] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[53684.579642] RDX: dead000000000200 RSI: ffff9cdbaf4268a0 RDI: ffffbf334e302000
[53684.613132] RBP: ffffbf3348ccbd90 R08: 0000000000000000 R09: ffffbf334e302000
[53684.646725] R10: 0000000000000004 R11: ffff9cdbaf4268a0 R12: ffffbf3348ccbdc8
[53684.680100] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9ce7a36f2400
[53684.713655] FS: 00007f1fab239740(0000) GS:ffff9ce7af040000(0000) knlGS:0000000000000000
[53684.751875] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[53684.778962] CR2: 0000000000000000 CR3: 000000106eb12000 CR4: 00000000003406e0
[53684.812949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[53684.847826] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[53684.883234] Call Trace:
[53684.896228] release_nodes+0x76/0x260
[53684.913359] devres_release_all+0x3c/0x60
[53684.932192] device_release_driver_internal+0x151/0x1f0
[53684.956700] driver_detach+0x3f/0x80
[53684.973569] bus_remove_driver+0x55/0xd0
[53684.992057] driver_unregister+0x2c/0x50
[53685.010575] platform_driver_unregister+0x12/0x20
[53685.032584] nfit_test_exit+0x10/0xaa9 [nfit_test]
[53685.055372] SyS_delete_module+0x1ba/0x220
[53685.074931] do_syscall_64+0x67/0x180
[53685.092329] entry_SYSCALL64_slow_path+0x25/0x25
[53685.114144] RIP: 0033:0x7f1faa70dc27
[53685.131113] RSP: 002b:00007ffc579ffa98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[53685.167000] RAX: ffffffffffffffda RBX: 0000000002560340 RCX: 00007f1faa70dc27
[53685.201314] RDX: 00007f1faa77e000 RSI: 0000000000000800 RDI: 00000000025603a8
[53685.234812] RBP: 0000000000000000 R08: 00007f1faa9d1060 R09: 00007f1faa77e000
[53685.267909] R10: 00007ffc579ff820 R11: 0000000000000202 R12: 00007ffc57a00922
[53685.301350] R13: 0000000000000000 R14: 0000000002560340 R15: 0000000002560010
[53685.335068] Code: 00 00 55 48 8b 07 48 ba 00 01 00 00 00 00 ad de 4c 8b 47 08 48 89 e5 48 39 d0
74 27 48 ba 00 02 00 00 00 00 ad de 49 39 d0 74 7e <4d> 8b 00 4c 39 c7 75 55 4c 8b 40 08 4c 39 c7 75
2b b8 01 00 00
[53685.427540] RIP: __list_del_entry_valid+0x29/0xd0 RSP: ffffbf3348ccbd90
[53685.459123] CR2: 0000000000000000
[53685.477027] ---[ end trace 2392c114f429911a ]---
[53685.503198] Kernel panic - not syncing: Fatal exception
[53685.528001] Kernel Offset: 0x2da00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
[53685.584866] ---[ end Kernel panic - not syncing: Fatal exception
2) If I skip the step of loading all the other modules on a system without
NVDIMMs and just load nfit_test, the system will panic. It sometimes panics
immediately in the nd code and sometimes a few seconds later. Here's an
example of a more immediate panic:
[ 81.125797] nfit_test nfit_test.0: failed to evaluate _FIT
[ 82.213983] BUG: unable to handle kernel
[ 82.213985] nd_bus ndbus1: nd_pmem.probe(btt6.0) = -19
[ 82.213990] nd_bus ndbus1: nd_pmem.probe(pfn6.0) = -19
[ 82.214012] nd_bus ndbus1: dax_pmem.probe(dax6.0) = -19
[ 82.214029] nd_pmem namespace7.0: nd_btt_probe: btt: <none>
[ 82.214031] btt7.1: nd_btt_release
[ 82.214035] nd_bus ndbus1: nd_pmem.probe(btt7.0) = -19
[ 82.214036] nd_pmem namespace7.0: nd_pfn_probe: pfn: <none>
[ 82.214037] pfn7.1: nd_pfn_release
[ 82.214043] nd_pmem namespace7.0: nd_dax_probe: dax: <none>
[ 82.214063] dax7.1: nd_dax_release
[ 82.214066] nd_pmem namespace7.0: unable to guarantee persistence of writes
[ 82.214078] nd_bus ndbus1: dax_pmem.probe(dax7.0) = -19
[ 82.214104] nd_bus ndbus1: nd_pmem.probe(pfn7.0) = -19
[ 82.215500] pmem7: detected capacity change from 0 to 4194304
[ 82.215505] nd_bus ndbus1: nd_pmem.probe(namespace7.0) = 0
[ 82.584056] paging request at fffffc8a4d260060
[ 82.603976] IP: kfree+0x4b/0x180
[ 82.618403] PGD 0
[ 82.618404]
[ 82.634670] Oops: 0000 [#1] SMP
[ 82.648749] Modules linked in: dax_pmem(O) nd_pmem(O) dax(O) nd_blk(O) nd_btt(O) nfit_test(O)
nfit(O) libnvdimm(O) nfit_test_iomap(O) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat
fat intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ipmi_ssif ghash_clmulni_intel pcbc aesni_intel crypto_simd
glue_helper cryptd ipmi_si sg ipmi_devintf iTCO_wdt
[ 82.977028] hpilo hpwdt iTCO_vendor_support wmi ipmi_msghandler ioatdma pcspkr acpi_power_meter
shpchp i2c_i801 lpc_ich dca ip_tables xfs sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm bnx2x tg3 mdio ptp hpsa i2c_core pps_core libcrc32c
crc32c_intel scsi_transport_sas
[ 83.109212] CPU: 1 PID: 3600 Comm: kworker/u145:3 Tainted: G O 4.11.0-rc5+ #3
[ 83.148180] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 83.187404] Workqueue: events_unbound async_run_entry_fn
[ 83.212597] task: ffff9aa1a4830000 task.stack: ffffac1085dfc000
[ 83.240553] RIP: 0010:kfree+0x4b/0x180
[ 83.258477] RSP: 0018:ffffac1085dffbf8 EFLAGS: 00010282
[ 83.284608] RAX: fffffc8a4d260040 RBX: ffffac1089801000 RCX: 0000000000000000
[ 83.320332] RDX: 0000656240000000 RSI: 0000000000000001 RDI: ffffac1089801000
[ 83.354911] RBP: ffffac1085dffc10 R08: 000000000001e6a0 R09: ffffffff883bc49c
[ 83.388249] R10: ffff9aa1af45e6a0 R11: fffffc4491b0bb00 R12: ffffffff883bc6b0
[ 83.421940] R13: ffffffff884e9693 R14: 0000000000000000 R15: 000000000000003b
[ 83.454797] FS: 0000000000000000(0000) GS:ffff9aa1af440000(0000) knlGS:0000000000000000
[ 83.492958] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 83.520259] CR2: fffffc8a4d260060 CR3: 0000000007a09000 CR4: 00000000003406e0
[ 83.553821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 83.587522] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 83.621249] Call Trace:
[ 83.633325] ? pinctrl_put+0x30/0x30
[ 83.650243] devres_free+0x23/0x30
[ 83.666292] devres_release+0x32/0x50
[ 83.683445] devm_pinctrl_put+0x23/0x40
[ 83.701820] pinctrl_bind_pins+0xf0/0x290
[ 83.720422] driver_probe_device+0xa5/0x470
[ 83.740234] __device_attach_driver+0x7e/0xe0
[ 83.760925] ? driver_allows_async_probing+0x30/0x30
[ 83.784698] bus_for_each_drv+0x68/0xb0
[ 83.802935] __device_attach+0xdd/0x160
[ 83.821835] device_initial_probe+0x13/0x20
[ 83.841870] bus_probe_device+0x92/0xa0
[ 83.860552] device_add+0x44b/0x610
[ 83.877357] ? __switch_to+0x23e/0x510
[ 83.895861] nd_async_device_register+0x12/0x50 [libnvdimm]
[ 83.923081] async_run_entry_fn+0x39/0x170
[ 83.942256] process_one_work+0x165/0x410
[ 83.961263] worker_thread+0x137/0x4c0
[ 83.978660] kthread+0x101/0x140
[ 83.993687] ? rescuer_thread+0x3b0/0x3b0
[ 84.012334] ? kthread_park+0x90/0x90
[ 84.028959] ret_from_fork+0x2c/0x40
[ 84.045767] Code: 96 00 00 00 b8 00 00 00 80 48 8b 15 30 d3 9f 00 48 01 d8 0f 83 d3 00 00 00 48 01
d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 1d 4a a3 00 <4c> 8b 58 20 41 f6 c3 01 0f 85 12 01 00 00 49 89 c3
49 8b 43 20
[ 84.134033] RIP: kfree+0x4b/0x180 RSP: ffffac1085dffbf8
[ 84.158740] CR2: fffffc8a4d260060
[ 84.174402] ---[ end trace 0f035cd21307487a ]---
Here's an example of a panic that happened a bit later.
[ 111.030442] nfit_test nfit_test.0: failed to evaluate _FIT
[ 119.845687] BUG: unable to handle kernel paging request at ffffe2d2af202360
[ 119.880981] IP: kmem_cache_free+0x5a/0x1f0
[ 119.900905] PGD 0
[ 119.900905]
[ 119.918093] Oops: 0000 [#1] SMP
[ 119.935350] Modules linked in: dax_pmem(O) nd_pmem(O) dax(O) nd_blk(O) nd_btt(O) nfit_test(O)
nfit(O) libnvdimm(O) nfit_test_iomap(O) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp vfat fat kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper
cryptd ipmi_ssif sg iTCO_wdt hpwdt iTCO_vendor_support
[ 120.260507] hpilo i2c_i801 ioatdma wmi shpchp lpc_ich pcspkr dca ipmi_si ipmi_devintf
ipmi_msghandler acpi_power_meter ip_tables xfs sd_mod mgag200 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm bnx2x tg3 mdio hpsa ptp i2c_core pps_core
libcrc32c scsi_transport_sas crc32c_intel
[ 120.388305] CPU: 2 PID: 1075 Comm: systemd-readahe Tainted: G O 4.11.0-rc5+ #3
[ 120.427776] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
[ 120.464931] task: ffff96af257aad00 task.stack: ffffb0ea8638c000
[ 120.491504] RIP: 0010:kmem_cache_free+0x5a/0x1f0
[ 120.512262] RSP: 0018:ffffb0ea8638f9e0 EFLAGS: 00010282
[ 120.535816] RAX: ffffe2d2af202340 RBX: ffffb0ea8808d000 RCX: ffff96a323fb8c00
[ 120.568021] RDX: 00006960c0000000 RSI: ffffb0ea8808d000 RDI: ffff96a03fc07ac0
[ 120.600166] RBP: ffffb0ea8638f9f8 R08: ffffb0ea8808d008 R09: ffffffffc0757dc5
[ 120.631954] R10: ffff96a32f49e660 R11: ffffe269918fee00 R12: ffff96a03fc07ac0
[ 120.665620] R13: 0000000000000014 R14: ffff96a323fb8c00 R15: 0000000000000000
[ 120.698552] FS: 00007fe16ec50740(0000) GS:ffff96a32f480000(0000) knlGS:0000000000000000
[ 120.735389] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 120.761414] CR2: ffffe2d2af202360 CR3: 000000046455b000 CR4: 00000000003406e0
[ 120.793444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 120.825045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 120.856860] Call Trace:
[ 120.868207] xfs_trans_free_item_desc+0x45/0x50 [xfs]
[ 120.893931] xfs_trans_free_items+0x80/0xb0 [xfs]
[ 120.917504] xfs_log_commit_cil+0x47c/0x5d0 [xfs]
[ 120.938651] __xfs_trans_commit+0x128/0x230 [xfs]
[ 120.959812] __xfs_trans_roll+0x6c/0xe0 [xfs]
[ 120.979722] xfs_trans_roll+0x25/0x40 [xfs]
[ 120.998656] xfs_defer_trans_roll+0x6b/0x170 [xfs]
[ 121.020117] xfs_defer_finish+0x7a/0x410 [xfs]
[ 121.040102] ? kvfree+0x35/0x40
[ 121.055078] xfs_finish_rename+0x3a/0x70 [xfs]
[ 121.076053] xfs_rename+0x75a/0xaa0 [xfs]
[ 121.094554] xfs_vn_rename+0xe4/0x140 [xfs]
[ 121.113325] vfs_rename+0x4d1/0x760
[ 121.129256] SyS_rename+0x359/0x3d0
[ 121.145260] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 121.166688] RIP: 0033:0x7fe16e0ad887
[ 121.183453] RSP: 002b:00007fff01d91ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 121.217581] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fe16e0ad887
[ 121.249689] RDX: 000056433f863170 RSI: 000056433f794010 RDI: 000056433f862430
[ 121.281785] RBP: 00007fe16ec506a0 R08: 000056433f863090 R09: 00007fe16ec50740
[ 121.313793] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000002
[ 121.346068] R13: 0000000000000007 R14: 000056433f863090 R15: 000000000000a000
[ 121.378075] Code: b8 00 00 00 80 4c 8b 4d 08 48 8b 15 b1 d8 9f 00 48 01 d8 0f 83 b7 00 00 00 48 01
d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 9e 4f a3 00 <4c> 8b 58 20 41 f6 c3 01 0f 85 56 01 00 00 49 89 c3
4c 8b 17 65
[ 121.468457] RIP: kmem_cache_free+0x5a/0x1f0 RSP: ffffb0ea8638f9e0
[ 121.495935] CR2: ffffe2d2af202360
[ 121.510867] ---[ end trace f947aa5ca41bdfb4 ]---
[GIT PULL] libnvdimm fixes for 4.11-rc7
by Dan Williams
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes
...to receive:
A small crop of lockdep, sleeping while atomic, and other fixes /
band-aids in advance of the full-blown reworks targeting the next
merge window. The largest change here is "libnvdimm: fix blk free
space accounting" which deletes a pile of buggy code that better
testing would have caught before merging. The next change that is
borderline too big for a late rc is switching the device-dax locking
from rcu to srcu; I couldn't think of a smaller way to make that fix.
The __copy_user_nocache fix will have a full replacement in 4.12 to
move those pmem special case considerations into the pmem driver. The
"libnvdimm: band aid btt vs clear poison locking" commit admits that
our error clearing support for btt went in broken, so we just disable
it in 4.11 and -stable. A replacement / full fix is in the pipeline
for 4.12.
Some of these would have been caught earlier had
CONFIG_DEBUG_ATOMIC_SLEEP been enabled on my development station. I
wonder if we should have:
config DEBUG_ATOMIC_SLEEP
default PROVE_LOCKING
...since I mistakenly thought I got both with CONFIG_PROVE_LOCKING=y.
These have received a build success notification from the 0day robot,
and some have appeared in a -next release with no reported issues.
---
The following changes since commit c02ed2e75ef4c74e41e421acb4ef1494671585e8:
Linux 4.11-rc4 (2017-03-26 14:15:16 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes
for you to fetch changes up to 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd:
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
(2017-04-12 13:45:18 -0700)
----------------------------------------------------------------
Dan Williams (6):
acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
libnvdimm: fix blk free space accounting
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
libnvdimm: band aid btt vs clear poison locking
device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
arch/x86/include/asm/pmem.h | 42 ++++++++++++++++++-------
drivers/acpi/nfit/core.c | 6 +++-
drivers/dax/Kconfig | 1 +
drivers/dax/dax.c | 13 ++++----
drivers/nvdimm/bus.c | 6 ++++
drivers/nvdimm/claim.c | 10 +++++-
drivers/nvdimm/dimm_devs.c | 77 +++++++--------------------------------------
7 files changed, 70 insertions(+), 85 deletions(-)
commit b03b99a329a14b7302f37c3ea6da3848db41c8c5
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon Mar 27 21:53:38 2017 -0700
acpi, nfit, libnvdimm: fix interleave set cookie calculation
(64-bit comparison)
While reviewing the -stable patch for commit 86ef58a4e35e "nfit,
libnvdimm: fix interleave set cookie calculation" Ben noted:
"This is returning an int, thus it's effectively doing a 32-bit
comparison and not the 64-bit comparison you say is needed."
Update the compare operation to be immune to this integer demotion problem.
Cc: <stable(a)vger.kernel.org>
Cc: Nicholas Moulin <nicholas.w.moulin(a)linux.intel.com>
Fixes: 86ef58a4e35e ("nfit, libnvdimm: fix interleave set cookie
calculation")
Reported-by: Ben Hutchings <ben(a)decadent.org.uk>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
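A generic illustration of the integer demotion problem Ben describes, assuming a qsort()-style comparator over 64-bit offsets. This is not the nfit code itself; struct set_map, cmp_map_buggy(), cmp_map() and sort_maps() are hypothetical names, and the fixed comparator uses the usual explicit < / > idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry of an interleave-set map */
struct set_map {
	uint64_t region_offset;
	uint32_t serial;
};

/*
 * Buggy pattern: the 64-bit difference is converted to int on return, so
 * only the low 32 bits survive on the usual ABIs. Offsets that differ only
 * above bit 31 compare as equal, or with the wrong sign.
 */
static int cmp_map_buggy(const void *a, const void *b)
{
	const struct set_map *m0 = a, *m1 = b;

	return m0->region_offset - m1->region_offset;
}

/* Fixed pattern: compare the 64-bit values directly and return -1, 0 or 1 */
static int cmp_map(const void *a, const void *b)
{
	const struct set_map *m0 = a, *m1 = b;

	if (m0->region_offset < m1->region_offset)
		return -1;
	if (m0->region_offset > m1->region_offset)
		return 1;
	return 0;
}

/* Sort map entries by region offset using the fixed comparator */
static void sort_maps(struct set_map *maps, size_t n)
{
	assert(n == 0 || maps != NULL);
	qsort(maps, n, sizeof(*maps), cmp_map);
}
```

With offsets 0 and 1 << 32, cmp_map_buggy() reports the entries as equal on typical compilers (converting an out-of-range value to int is implementation-defined in C), while cmp_map() orders them correctly.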
commit fe514739d8538783749d3ce72f78e5a999ea5668
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue Apr 4 15:08:36 2017 -0700
libnvdimm: fix blk free space accounting
Commit a1f3e4d6a0c3 "libnvdimm, region: update nd_region_available_dpa()
for multi-pmem support" reworked blk dpa (DIMM Physical Address)
accounting to comprehend multiple pmem namespace allocations aliasing
with a given blk-dpa range.
The following call trace is a result of failing to account for allocated
blk capacity.
WARNING: CPU: 1 PID: 2433 at
tools/testing/nvdimm/../../../drivers/nvdimm/names
4 size_store+0x6f3/0x930 [libnvdimm]
nd_region region5: allocation underrun: 0x0 of 0x1000000 bytes
[..]
Call Trace:
dump_stack+0x86/0xc3
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5f/0x80
size_store+0x6f3/0x930 [libnvdimm]
dev_attr_store+0x18/0x30
If a given blk-dpa allocation does not alias with any pmem ranges then
the full allocation should be accounted as busy space, not the size of
the current pmem contribution to the region.
The thinko that led to this confusion was not realizing that struct
resource management already guarantees no collisions between pmem
allocations and blk allocations on the same dimm. Also, we do not try to
support blk allocations in aliased pmem holes.
This patch also fixes a case where the available blk capacity goes negative.
Cc: <stable(a)vger.kernel.org>
Fixes: a1f3e4d6a0c3 ("libnvdimm, region: update nd_region_available_dpa() for multi-pmem support")
Reported-by: Dariusz Dokupil <dariusz.dokupil(a)intel.com>
Reported-by: Dave Jiang <dave.jiang(a)intel.com>
Reported-by: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Dave Jiang <dave.jiang(a)intel.com>
Tested-by: Vishal Verma <vishal.l.verma(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
commit 0beb2012a1722633515c8aaa263c73449636c893
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Fri Apr 7 09:47:24 2017 -0700
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
Holding the reconfig_mutex over a potential userspace fault sets up a
lockdep dependency chain between filesystem-DAX and the libnvdimm ioctl
path. Move the user access outside of the lock.
[ INFO: possible circular locking dependency detected ]
4.11.0-rc3+ #13 Tainted: G W O
-------------------------------------------------------
fallocate/16656 is trying to acquire lock:
(&nvdimm_bus->reconfig_mutex){+.+.+.}, at:
[<ffffffffa00080b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
but task is already holding lock:
(jbd2_handle){++++..}, at: [<ffffffff813b4944>]
start_this_handle+0x104/0x460
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (jbd2_handle){++++..}:
lock_acquire+0xbd/0x200
start_this_handle+0x16a/0x460
jbd2__journal_start+0xe9/0x2d0
__ext4_journal_start_sb+0x89/0x1c0
ext4_dirty_inode+0x32/0x70
__mark_inode_dirty+0x235/0x670
generic_update_time+0x87/0xd0
touch_atime+0xa9/0xd0
ext4_file_mmap+0x90/0xb0
mmap_region+0x370/0x5b0
do_mmap+0x415/0x4f0
vm_mmap_pgoff+0xd7/0x120
SyS_mmap_pgoff+0x1c5/0x290
SyS_mmap+0x22/0x30
entry_SYSCALL_64_fastpath+0x1f/0xc2
-> #1 (&mm->mmap_sem){++++++}:
lock_acquire+0xbd/0x200
__might_fault+0x70/0xa0
__nd_ioctl+0x683/0x720 [libnvdimm]
nvdimm_ioctl+0x8b/0xe0 [libnvdimm]
do_vfs_ioctl+0xa8/0x740
SyS_ioctl+0x79/0x90
do_syscall_64+0x6c/0x200
return_from_SYSCALL_64+0x0/0x7a
-> #0 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
__lock_acquire+0x16b6/0x1730
lock_acquire+0xbd/0x200
__mutex_lock+0x88/0x9b0
mutex_lock_nested+0x1b/0x20
nvdimm_bus_lock+0x21/0x30 [libnvdimm]
nvdimm_forget_poison+0x25/0x50 [libnvdimm]
nvdimm_clear_poison+0x106/0x140 [libnvdimm]
pmem_do_bvec+0x1c2/0x2b0 [nd_pmem]
pmem_make_request+0xf9/0x270 [nd_pmem]
generic_make_request+0x118/0x3b0
submit_bio+0x75/0x150
Cc: <stable(a)vger.kernel.org>
Fixes: 62232e45f4a2 ("libnvdimm: control (ioctl) messages for
nvdimm_bus and nvdimm devices")
Cc: Dave Jiang <dave.jiang(a)intel.com>
Reported-by: Vishal Verma <vishal.l.verma(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
commit 4aa5615e080a9855e607accc75b07ab79b252dde
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Fri Apr 7 12:25:52 2017 -0700
libnvdimm: band aid btt vs clear poison locking
The following warning results from holding a lane spinlock,
preempt_disable(), or the btt map spinlock and then trying to take the
reconfig_mutex to walk the poison list and potentially add new entries.
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
[..]
Call Trace:
dump_stack+0x85/0xc8
___might_sleep+0x184/0x250
__might_sleep+0x4a/0x90
__mutex_lock+0x58/0x9b0
? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
? acpi_nfit_forget_poison+0x79/0x80 [nfit]
? _raw_spin_unlock+0x27/0x40
mutex_lock_nested+0x1b/0x20
nvdimm_bus_lock+0x21/0x30 [libnvdimm]
nvdimm_forget_poison+0x25/0x50 [libnvdimm]
nvdimm_clear_poison+0x106/0x140 [libnvdimm]
nsio_rw_bytes+0x164/0x270 [libnvdimm]
btt_write_pg+0x1de/0x3e0 [nd_btt]
? blk_queue_enter+0x30/0x290
btt_make_request+0x11a/0x310 [nd_btt]
? blk_queue_enter+0xb7/0x290
? blk_queue_enter+0x30/0x290
generic_make_request+0x118/0x3b0
As a minimal fix, disable error clearing when the BTT is enabled for the
namespace. For the final fix a larger rework of the poison list locking
is needed.
Note that this is not a problem in the blk case since that path never
calls nvdimm_clear_poison().
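The locking problem and the band aid can be illustrated with a small userspace model. This is a hedged sketch, not kernel code. Every name in it (model_spin_lock(), model_mutex_lock(), struct model_ns, claimed_by_btt, model_btt_write()) is hypothetical. atomic_depth stands in for the preempt count that in_atomic() inspects, and the mutex stub only records when it would have slept in atomic context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: atomic_depth plays the role of the preempt count. */
static int atomic_depth;
static int sleep_in_atomic_bugs;

/* A spinlock (or preempt_disable()) makes the context atomic. */
static void model_spin_lock(void) { atomic_depth++; }
static void model_spin_unlock(void) { assert(atomic_depth > 0); atomic_depth--; }

/* A mutex may sleep, so taking one in atomic context is a bug. */
static void model_mutex_lock(void)
{
	if (atomic_depth > 0) {
		sleep_in_atomic_bugs++;
		fprintf(stderr, "BUG: sleeping function called from invalid context\n");
	}
}
static void model_mutex_unlock(void) { }

/* Hypothetical namespace state: claimed_by_btt means "BTT is enabled". */
struct model_ns {
	bool claimed_by_btt;
	int poison_cleared;
};

/* Clearing poison walks the poison list under a sleeping lock. */
static void model_clear_poison(struct model_ns *ns)
{
	model_mutex_lock();
	ns->poison_cleared++;
	model_mutex_unlock();
}

/* BTT write path: a lane/map lock is held across the media write. */
static void model_btt_write(struct model_ns *ns, bool band_aid)
{
	model_spin_lock();
	/* Band aid: skip error clearing when the BTT owns the namespace. */
	if (!(band_aid && ns->claimed_by_btt))
		model_clear_poison(ns);
	model_spin_unlock();
}
```

Calling model_btt_write() on a BTT-claimed namespace without the band aid records one sleep-in-atomic bug. With the band aid, the write proceeds but poison is left in place, which is why this is only a stopgap until the poison list locking is reworked.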
Cc: <stable(a)vger.kernel.org>
Fixes: 82bf1037f2ca ("libnvdimm: check and clear poison before writing to pmem")
Cc: Dave Jiang <dave.jiang(a)intel.com>
[jeff: dynamically disable error clearing in the btt case]
Suggested-by: Jeff Moyer <jmoyer(a)redhat.com>
Reviewed-by: Jeff Moyer <jmoyer(a)redhat.com>
Reported-by: Vishal Verma <vishal.l.verma(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
commit 956a4cd2c957acf638ff29951aabaa9d8e92bbc2
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Fri Apr 7 16:42:08 2017 -0700
device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
The following warning triggers with a new unit test that stresses the
device-dax interface.
===============================
[ ERR: suspicious RCU usage. ]
4.11.0-rc4+ #1049 Tainted: G O
-------------------------------
./include/linux/rcupdate.h:521 Illegal context switch in RCU
read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
2 locks held by fio/9070:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8d0739d7>]
__do_page_fault+0x167/0x4f0
#1: (rcu_read_lock){......}, at: [<ffffffffc03fbd02>]
dax_dev_huge_fault+0x32/0x620 [dax]
Call Trace:
dump_stack+0x86/0xc3
lockdep_rcu_suspicious+0xd7/0x110
___might_sleep+0xac/0x250
__might_sleep+0x4a/0x80
__alloc_pages_nodemask+0x23a/0x360
alloc_pages_current+0xa1/0x1f0
pte_alloc_one+0x17/0x80
__pte_alloc+0x1e/0x120
__get_locked_pte+0x1bf/0x1d0
insert_pfn.isra.70+0x3a/0x100
? lookup_memtype+0xa6/0xd0
vm_insert_mixed+0x64/0x90
dax_dev_huge_fault+0x520/0x620 [dax]
? dax_dev_huge_fault+0x32/0x620 [dax]
dax_dev_fault+0x10/0x20 [dax]
__do_fault+0x1e/0x140
__handle_mm_fault+0x9af/0x10d0
handle_mm_fault+0x16d/0x370
? handle_mm_fault+0x47/0x370
__do_page_fault+0x28c/0x4f0
trace_do_page_fault+0x58/0x2a0
do_async_page_fault+0x1a/0xa0
async_page_fault+0x28/0x30
Inserting a page table entry may trigger an allocation while we are
holding a read lock to keep the device instance alive for the duration
of the fault. Use srcu for this keep-alive protection.
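The difference between the two keep-alive schemes can be sketched as a userspace model. The names (model_rcu_read_lock(), model_srcu_read_lock(), model_pte_alloc(), nonsleep_depth) are hypothetical and are not the kernel RCU/SRCU API. The model assumes only the property the message relies on: sleeping is illegal inside a classic RCU read-side section, while an SRCU read-side section may sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: depth of read-side sections in which sleeping is illegal. */
static int nonsleep_depth;
static int illegal_sleeps;

/* SRCU-style readers are counted per index; a writer's synchronize step
 * (not modeled) would flip srcu_active_idx and wait for old readers. */
static int srcu_readers[2];
static int srcu_active_idx;

static void model_rcu_read_lock(void) { nonsleep_depth++; }
static void model_rcu_read_unlock(void) { assert(nonsleep_depth > 0); nonsleep_depth--; }

static int model_srcu_read_lock(void)
{
	int idx = srcu_active_idx;
	srcu_readers[idx]++;	/* keeps the device alive, but sleeping stays legal */
	return idx;
}

static void model_srcu_read_unlock(int idx)
{
	assert(srcu_readers[idx] > 0);
	srcu_readers[idx]--;
}

/* Allocating a page-table page may sleep. */
static void model_pte_alloc(void)
{
	if (nonsleep_depth > 0)
		illegal_sleeps++;
}

/* Fault path before the fix: allocation under rcu_read_lock(). */
static void model_fault_rcu(void)
{
	model_rcu_read_lock();
	model_pte_alloc();
	model_rcu_read_unlock();
}

/* Fault path after the fix: allocation under an SRCU read lock. */
static void model_fault_srcu(void)
{
	int idx = model_srcu_read_lock();
	model_pte_alloc();
	model_srcu_read_unlock(idx);
}
```

In this model, model_fault_rcu() records an illegal sleep, and model_fault_srcu() performs the same allocation without one.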
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd
Author: Dan Williams <dan.j.williams(a)intel.com>
Date: Thu Apr 6 09:04:31 2017 -0700
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem(), we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage of
__copy_user_nocache() with explicit cache management in these unaligned
cases.
Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.
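The head/tail decision for an unaligned transfer can be sketched as follows. This is a hedged model, not the patch itself. It assumes 64-byte cachelines and assumes that the nocache copy bypasses the cache only for naturally aligned stores: 8-byte aligned chunks for transfers of 8 bytes or more, and a single aligned 4-byte store for small copies. Anything else goes through the cache and needs an explicit write-back. plan_flush() and struct flush_plan are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CACHELINE 64 /* assumption: 64-byte cachelines */

/* Which cachelines need an explicit write-back after the copy. */
struct flush_plan {
	bool flush_head; /* line containing dest */
	bool flush_tail; /* line containing dest + len - 1 */
};

static struct flush_plan plan_flush(uintptr_t dest, size_t len)
{
	struct flush_plan p = { false, false };

	if (len == 0)
		return p;

	if (len < 8) {
		/* Only an aligned 4-byte copy is assumed to bypass the cache. */
		p.flush_head = (dest % 4 != 0) || (len != 4);
		return p;
	}

	if (dest % 8 != 0) {
		/* Unaligned head: write back its line, which covers up to the
		 * next cacheline boundary. */
		uintptr_t next = (dest + MODEL_CACHELINE - 1) &
				 ~(uintptr_t)(MODEL_CACHELINE - 1);
		size_t flushed = (size_t)(next - dest);

		p.flush_head = true;
		p.flush_tail = len > flushed && (len - flushed) % 8 != 0;
	} else {
		/* Aligned head: only a ragged tail can be left in the cache. */
		p.flush_tail = len % 8 != 0;
	}
	return p;
}
```

Usage: plan_flush(0x1000, 4096) needs no extra flush. plan_flush(0x1003, 4096) needs both the head line and the tail line written back, because the remainder after the head cacheline is not a multiple of 8.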
Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable(a)vger.kernel.org>
Cc: <x86(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Matthew Wilcox <mawilcox(a)microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
[PATCH v2 0/7] libnvdimm: acpi updates and a revert
by Dan Williams
Changes since v1:
* moved the rename of nfit_mem_dcr_init() to __nfit_mem_init() to "acpi,
nfit: support 'map failed' dimms"
* added "acpi, nfit: collate health state flags"
* added "tools/testing/nvdimm: fix nfit_test shutdown crashes"
---
With Dave's recent fix [1], we can restore error clearing for btt i/o in
4.12.
ACPI 6.1 introduced new health state flags. Beyond reflecting them in
the dimmX/flags sysfs attribute we also need to handle the deeper
implications of the ACPI_NFIT_MEM_MAP_FAILED flag which changes
assumptions about how the driver discovers dimms. In the "map failed" case
there may be missing or no SPA entries associated with a dimm. Those dimms
should still be registered with libnvdimm so that the error state can be
communicated and recovery attempted.
[1]: https://patchwork.kernel.org/patch/9680035/
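The two ideas in the cover letter, collating a dimm's health flags and still registering a dimm that has no SPA mapping, can be sketched with a simplified table. This is a hypothetical model, not the nfit driver. struct model_memdev, collate_flags() and dimm_is_discovered() are illustrative names. The flag bit values are assumptions that should be checked against the ACPI 6.1 NFIT definition, and spa_index == 0 is assumed here to mean "no SPA range".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; verify against the ACPI 6.1 NFIT spec. */
#define MODEL_FLAG_SAVE_FAILED (1u << 0)
#define MODEL_FLAG_MAP_FAILED  (1u << 6)

/* Hypothetical, simplified memory-device entry: one per dimm mapping. */
struct model_memdev {
	uint32_t device_handle; /* identifies the dimm */
	uint16_t spa_index;     /* assumption: 0 means no SPA range */
	uint16_t flags;
};

/* A dimm may appear in several entries: OR their health flags together. */
static uint16_t collate_flags(const struct model_memdev *md, size_t n,
			      uint32_t handle)
{
	uint16_t flags = 0;

	for (size_t i = 0; i < n; i++)
		if (md[i].device_handle == handle)
			flags |= md[i].flags;
	return flags;
}

/* require_spa models discovery that only follows SPA associations; with
 * it false, a "map failed" dimm is still found and can be registered. */
static bool dimm_is_discovered(const struct model_memdev *md, size_t n,
			       uint32_t handle, bool require_spa)
{
	for (size_t i = 0; i < n; i++) {
		if (md[i].device_handle != handle)
			continue;
		if (!require_spa || md[i].spa_index != 0)
			return true;
	}
	return false;
}
```

With a table in which dimm 0x2 has only an unmapped entry flagged MODEL_FLAG_MAP_FAILED, SPA-driven discovery misses it. Discovery that does not require an SPA still finds it, so its error state can be reported.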
---
Dan Williams (7):
Revert "libnvdimm: band aid btt vs clear poison locking"
acpi, nfit: add support for acpi 6.1 dimm state flags
tools/testing/nvdimm: test acpi 6.1 health state flags
acpi, nfit: support "map failed" dimms
acpi, nfit: collate health state flags
acpi, nfit: limit ->flush_probe() to initialization work
tools/testing/nvdimm: fix nfit_test shutdown crashes
drivers/acpi/nfit/core.c | 82 +++++++++++++++++++++++++++++++-------
drivers/acpi/nfit/nfit.h | 3 +
drivers/nvdimm/claim.c | 10 -----
tools/testing/nvdimm/test/nfit.c | 50 +++++++++++++++++++++--
4 files changed, 115 insertions(+), 30 deletions(-)