Dan Williams <dan.j.williams(a)intel.com> writes:
On Mon, Apr 15, 2019 at 1:32 PM Jeff Moyer <jmoyer(a)redhat.com>
wrote:
>
> Hi,
>
> We ran into a situation where one dimm's label area was corrupt (it had
> multiple entries for the same DPA), and so that nmem was disabled. The
> region as a whole was enabled, though. In this case, there was a single
> namespace that showed up as disabled. Trying to reconfigure that
> namespace resulted in the following panic:
I'm concerned that there is a seed device for userspace to get its
hopes up thinking it can do anything with the region.
That looked to be by design, to me. I thought we kept the region active
in case you wanted to create a dimm local (blk) namespace.
> BUG: unable to handle kernel NULL pointer dereference at
0000000000000024
> [ 108.386384] #PF error: [normal kernel read fault]
> [ 108.391090] PGD 11c3eae5067 P4D 11c3eae5067 PUD 11c262c9067 PMD 0
> [ 108.397267] Oops: 0000 [#1] SMP NOPTI
> [ 108.400936] CPU: 39 PID: 3838 Comm: ndctl Not tainted 5.1.0-rc5 #1
> [ 108.407114] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS
SE5C620.86B.0D.01.0286.011120190816 01/11/2019
> [ 108.417200] RIP: 0010:preamble_next+0xd/0x60 [libnvdimm]
> ...
>
> The problem is that there is no driver data associated with the
> nd_mapping in this case. Check for that.
>
> Signed-off-by: Jeff Moyer <jmoyer(a)redhat.com>
> Reported-by: Zhang Yi <yizhan(a)redhat.com>
>
> diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
> index f3d753d3169c..da1f28d8c9e6 100644
> --- a/drivers/nvdimm/label.c
> +++ b/drivers/nvdimm/label.c
> @@ -1217,7 +1217,7 @@ static int del_labels(struct nd_mapping *nd_mapping, u8 *uuid)
> return 0;
>
> /* no index || no labels == nothing to delete */
> - if (!preamble_next(ndd, &nsindex, &free, &nslot))
> + if (!ndd || !preamble_next(ndd, &nsindex, &free, &nslot))
> return 0;
This looks ok, but I think I'd prefer to fix init_active_labels() to
make sure it is failing region activation. I suspect something is
wrong with this logic, but nothing is immediately coming to mind.
I also worry that, if there had been available space in the region,
new namespace creation could fail in the same way.
/*
* If the dimm is disabled then we may need to prevent
* the region from being activated.
*/
if (!ndd) {
if (test_bit(NDD_LOCKED, &nvdimm->flags))
/* fail, label data may be unreadable */;
else if (test_bit(NDD_ALIASING, &nvdimm->flags))
/* fail, labels needed to disambiguate dpa */;
else
return 0;
dev_err(&nd_region->dev, "%s: is %s, failing
probe\n",
dev_name(&nd_mapping->nvdimm->dev),
test_bit(NDD_LOCKED, &nvdimm->flags)
? "locked" : "disabled");
return -ENXIO;
}
I also looked at this logic. The only thing wrong here, I think, is
that you don't need to disable the region for a locked dimm (unless all
dimms are locked).
I'll try to reproduce...
OK. If you can create custom labels, I'd say that would be the easiest
way to reproduce this.
Thanks!
Jeff