-----Original Message-----
From: Coly Li <colyli(a)suse.de>
Sent: Thursday, September 3, 2020 4:37 PM
To: Mike Snitzer <snitzer(a)redhat.com>
Cc: Jan Kara <jack(a)suse.com>; Ira Weiny <ira.weiny(a)intel.com>; Pankaj Gupta
<pankaj.gupta.linux(a)gmail.com>; Vishal Verma <vishal.l.verma(a)intel.com>;
linux-nvdimm(a)lists.01.org; Adrian Huang12 <ahuang12(a)lenovo.com>
Subject: [External] Re: flood of "dm-X: error: dax access failed" due to 5.9 commit 231609785cbfb
On 2020/9/3 13:20, Coly Li wrote:
> On 2020/9/3 00:51, Mike Snitzer wrote:
>> On Wed, Sep 02 2020 at 12:46pm -0400, Coly Li <colyli(a)suse.de> wrote:
>>
>>> On 2020/9/3 00:44, Mike Snitzer wrote:
>>>> On Wed, Sep 02 2020 at 12:40pm -0400, Coly Li <colyli(a)suse.de> wrote:
>>>>
>>>>> On 2020/9/3 00:04, Mike Snitzer wrote:
>>>>>> 5.9 commit 231609785cbfb ("dax: print error message by pr_info()
>>>>>> in __generic_fsdax_supported()") switched from pr_debug() to
>>>>>> pr_info().
>>>>>>
>>>>>> The justification in the commit header is really inadequate. If
>>>>>> there is a problem that you need to drill down on, repeat the
>>>>>> testing after enabling dynamic debug.
>>>>>>
>>>>>> Otherwise, now all DM devices that aren't layered on DAX-capable
>>>>>> devices spew really confusing noise to users when they simply
>>>>>> activate their non-DAX DM devices:
>>>>>>
>>>>>> [66567.129798] dm-6: error: dax access failed (-5)
>>>>>> [66567.134400] dm-6: error: dax access failed (-5)
>>>>>> [66567.139152] dm-6: error: dax access failed (-5)
>>>>>> [66567.314546] dm-2: error: dax access failed (-95)
>>>>>> [66567.319380] dm-2: error: dax access failed (-95)
>>>>>> [66567.324254] dm-2: error: dax access failed (-95)
>>>>>> [66567.479025] dm-2: error: dax access failed (-95)
>>>>>> [66567.483713] dm-2: error: dax access failed (-95)
>>>>>> [66567.488722] dm-2: error: dax access failed (-95)
>>>>>> [66567.494061] dm-2: error: dax access failed (-95)
>>>>>> [66567.498823] dm-2: error: dax access failed (-95)
>>>>>> [66567.503693] dm-2: error: dax access failed (-95)
>>>>>>
>>>>>> commit 231609785cbfb must be reverted.
>>>>>>
>>>>>> Please advise, thanks.
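The numbers in parentheses in the flood above are negative errno values. A minimal userspace sketch, assuming Linux errno numbering (EIO = 5, EOPNOTSUPP = 95), that pulls the code out of such a line and names it; parse_dax_failure() and dax_err_name() are hypothetical helpers for illustration, not kernel functions:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the return code from a line such as
 * "dm-2: error: dax access failed (-95)".
 * Returns 1 and stores the code in *ret on success, 0 otherwise. */
int parse_dax_failure(const char *line, long *ret)
{
	const char *p = strstr(line, "dax access failed (");

	if (!p)
		return 0;
	return sscanf(p, "dax access failed (%ld)", ret) == 1;
}

/* Hypothetical helper: name the negative errno values seen in the flood.
 * Assumes Linux numbering; other codes are reported as "unknown". */
const char *dax_err_name(long ret)
{
	switch (-ret) {
	case EIO:
		return "EIO";        /* -5: I/O error */
	case EOPNOTSUPP:
		return "EOPNOTSUPP"; /* -95: operation not supported */
	default:
		return "unknown";
	}
}
```

Applied to the lines above, the dm-6 entries decode to EIO and the dm-2 entries to EOPNOTSUPP.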
>>>>>
>>>>> Adrian Huang from Lenovo posted a patch titled "dax: do not
>>>>> print error message for non-persistent memory block device".
>>>>>
>>>>> It fixes the issue, but there has been no response so far. Maybe we
>>>>> should take this fix.
>>>>
>>>> OK, yes, sounds like it. It was merged and is commit c2affe920b0e066
>>>> ("dax: do not print error message for non-persistent memory block
>>>> device")
>>>
>>> Thanks for letting me know this patch was merged; I am going to update
>>> my local copy :-)
>>
>> So the thing is, I'm running v5.9-rc3 (which includes this commit) but
>> I'm still seeing all these warnings when I run the lvm2 testsuite.
>> The reason _seems_ to be that the lvm2 testsuite uses brd devices
>> as test devices. So there is something about the brd device that
>> shows commit c2affe920b0e066 isn't enough :(
>
> [Resend and CC Adrian Huang]
>
> Hi Mike,
>
> Could you please apply and test the attached patch against v5.9-rc3?
>
> It seems the dax_dev pointer parameter of __generic_fsdax_supported()
> is not initialized (IMHO this is not a dm bug), therefore the && should
> be || when checking the DAX support state.
>
> I also added two pr_info() calls to print the variables' values; let's
> see whether my guess makes sense.
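The difference between the two conditions is easiest to see as a truth table. A minimal userspace sketch; early_exit_and() and early_exit_or() are hypothetical names, dax_dev stands for the pointer argument, and bdev_ok stands for the value bdev_dax_supported() would return:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the v5.9-rc3 check:
 *   if (!dax_dev && !bdev_dax_supported(bdev, blocksize))
 * Returns true when the early "not supported" exit would be taken. */
bool early_exit_and(const void *dax_dev, bool bdev_ok)
{
	return !dax_dev && !bdev_ok;
}

/* Hypothetical model of the proposed check:
 *   if (!dax_dev || !bdev_dax_supported(bdev, blocksize))
 * Short-circuits on a NULL dax_dev without consulting bdev_ok. */
bool early_exit_or(const void *dax_dev, bool bdev_ok)
{
	return !dax_dev || !bdev_ok;
}
```

The two forms disagree exactly when one input says "DAX capable" and the other does not. For a non-NULL dax_dev with bdev_dax_supported() returning 0, which is the combination printed during the lvchange step of Adrian's log, the && form falls through to the remaining checks while the || form exits early.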
I also suggest a change like this in drivers/md/dm.c:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fb0255d25e4b..566d8208df47 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -818,6 +818,8 @@ int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
 			return -ENOMEM;
 		}
+		memset(td, 0, sizeof(struct table_device));
+
 		td->dm_dev.mode = mode;
 		td->dm_dev.bdev = NULL;

The above change is meant to ensure that the *dax_dev passed into
__generic_fsdax_supported() is always NULL if the target does not support
DAX. But IMHO it is not strictly necessary; it just makes
__generic_fsdax_supported() return false sooner, together with the
following change from the previously attached patch:

-	if (!dax_dev && !bdev_dax_supported(bdev, blocksize)) {
+	if (!dax_dev || !bdev_dax_supported(bdev, blocksize)) {

I am not very familiar with the dm code, so CMIIW; this is just for your
information.

Coly Li

This does not help. See the following log.
-----------------
# lvm2-testsuite --only activate-minor
.......
[ 0:00] #activate-minor.sh:22+ aux prepare_vg 2
[ 0:00] ## preparing ramdisk device...ok (/dev/ram0)
[ 0:01] 6,3160,150710756,-;brd: module loaded
[ 0:01] ## preparing 2 devices...ok
[ 0:01] 6,3161,150730864,-;dax_dev: 0000000000000000
[ 0:01] 6,3162,150730866,-;bdev_dax_supported(): 0
[ 0:01] 6,3163,150730903,-;dax_dev: 0000000000000000
[ 0:01] 6,3164,150730905,-;bdev_dax_supported(): 0
[ 0:01] 6,3165,150731019,-;dax_dev: 0000000000000000
[ 0:01] 6,3166,150731020,-;bdev_dax_supported(): 0
[ 0:01] 6,3167,150731512,-;dax_dev: 0000000000000000
[ 0:01] 6,3168,150731514,-;bdev_dax_supported(): 0
[ 0:01] 6,3169,150731525,-;dax_dev: 0000000000000000
[ 0:01] 6,3170,150731525,-;bdev_dax_supported(): 0
[ 0:01] 6,3171,150731656,-;dax_dev: 0000000000000000
[ 0:01] 6,3172,150731657,-;bdev_dax_supported(): 0
.......
[ 0:01] lvchange $vg/foo -a y
[ 0:01] #activate-minor.sh:25+ lvchange LVMTEST12302vg/foo -a y
[ 0:01] /tmp/LVMTEST12302.W0HGxyzxst/dev/mapper/LVMTEST12302vg-foo not set up by udev: Falling back to direct node creation.
[ 0:01] 6,3173,150927070,-;dax_dev: 00000000f0a5865d
[ 0:01] 6,3174,150927072,-;bdev_dax_supported(): 0
[ 0:01] 6,3175,150927081,-;dax_dev: 00000000f0a5865d
[ 0:01] 6,3176,150927082,-;bdev_dax_supported(): 0
[ 0:01] 6,3177,150927241,-;dax_dev: 00000000f0a5865d
[ 0:01] 6,3178,150927242,-;bdev_dax_supported(): 0
----------------

-- Adrian
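The memset() suggestion rests on the fact that kmalloc()-family allocations are not zeroed (kzalloc() and kzalloc_node() are the zeroing variants). A userspace sketch of the same pattern, assuming the table_device is obtained from a non-zeroing allocator: malloc() stands in for kmalloc(), and calloc() would be the kzalloc() analogue. struct table_device_model, struct dm_dev_model and td_alloc() are hypothetical, trimmed-down stand-ins, not the real dm structures:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct dm_dev / struct table_device;
 * only the fields discussed in this thread are modelled. */
struct dm_dev_model {
	void *bdev;
	void *dax_dev;
	unsigned int mode;
};

struct table_device_model {
	struct dm_dev_model dm_dev;
};

/* Allocate without zeroing (like kmalloc()), then clear the whole
 * struct as the suggested memset() does, and set the two fields that
 * the dm.c hunk above assigns explicitly. */
struct table_device_model *td_alloc(unsigned int mode)
{
	struct table_device_model *td = malloc(sizeof(*td));

	if (!td)
		return NULL;
	/* Every byte becomes 0, so dax_dev reads back as NULL on Linux. */
	memset(td, 0, sizeof(*td));
	td->dm_dev.mode = mode;
	td->dm_dev.bdev = NULL;
	return td;
}
```

Zeroing only fixes the initial value. In Adrian's log, with the memset() applied, dax_dev is still non-NULL (00000000f0a5865d) once the LV is activated, which suggests that pointer is assigned a real value somewhere rather than left holding stale allocator contents.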