On Wed, Apr 26, 2017 at 12:43 PM, Jeff Moyer <jmoyer(a)redhat.com> wrote:
> Dan Williams <dan.j.williams(a)intel.com> writes:
>> In the case where a dimm does not have any associated flush hints the
>> ndrd->flush_wpq array may be uninitialized leading to crashes with the
>> following signature:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>> IP: region_visible+0x10f/0x160 [libnvdimm]
>>
>> Call Trace:
>> internal_create_group+0xbe/0x2f0
>> sysfs_create_groups+0x40/0x80
>> device_add+0x2d8/0x650
>> nd_async_device_register+0x12/0x40 [libnvdimm]
>> async_run_entry_fn+0x39/0x170
>> process_one_work+0x212/0x6c0
>> ? process_one_work+0x197/0x6c0
>> worker_thread+0x4e/0x4a0
>> kthread+0x10c/0x140
>> ? process_one_work+0x6c0/0x6c0
>> ? kthread_create_on_node+0x60/0x60
>> ret_from_fork+0x31/0x40
> Sorry for being dense, but I'm having a tough time connecting the dots
> here. How does region_visible trip over the missing (not uninitialized:
> you're actually walking off the end of the structure) flush_wpq array?
So, you're not dense, or you're at least as dense as I am, because I
didn't immediately understand where this failure was coming from either.
I just happened to trigger it while running patch 2 and thought the
current code looked unsafe by inspection.
> Anyway, the fix looks valid.
> Reviewed-by: Jeff Moyer <jmoyer(a)redhat.com>
Thanks!
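A minimal userspace sketch of the failure mode discussed in this thread, assuming a structure that ends in a flexible array of flush-hint pointers sized at allocation time. The type `region_data_sketch`, the `nr_hints` count, and the helpers `region_data_alloc()` and `region_data_hint()` are hypothetical illustrations, not libnvdimm code, and this is not the patch under review. It shows why reading `flush_wpq[0]` when a dimm has no flush hints walks off the end of the allocation, and one way to guard that access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a per-region structure whose trailing
 * flexible array holds flush-hint pointers. The field name flush_wpq
 * mirrors the thread; nr_hints and the helpers below are illustrative
 * assumptions, not libnvdimm code.
 */
struct region_data_sketch {
    size_t nr_hints;    /* number of valid entries in flush_wpq[] */
    void *flush_wpq[];  /* flexible array member: may have zero entries */
};

/* Allocate room for exactly nr_hints trailing pointers (possibly zero). */
static struct region_data_sketch *region_data_alloc(size_t nr_hints)
{
    struct region_data_sketch *rd;

    /* calloc zero-fills, so every allocated hint slot starts out NULL. */
    rd = calloc(1, sizeof(*rd) + nr_hints * sizeof(rd->flush_wpq[0]));
    if (rd)
        rd->nr_hints = nr_hints;
    return rd;
}

/*
 * Guarded accessor. An unguarded rd->flush_wpq[0] with nr_hints == 0
 * would read memory just past the end of the allocation, which is the
 * "walking off the end of the structure" case; here we return NULL
 * instead of touching memory that was never allocated.
 */
static void *region_data_hint(const struct region_data_sketch *rd, size_t idx)
{
    if (!rd || idx >= rd->nr_hints)
        return NULL;
    return rd->flush_wpq[idx];
}
```

Checking the count before indexing keeps the zero-hint case from reading unallocated memory: in the sketch, `region_data_hint(rd, 0)` returns `NULL` when `nr_hints` is 0 rather than dereferencing whatever happens to follow the object.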