On Tue 15-09-20 15:12:21, Verma, Vishal L wrote:
On Tue, 2020-09-15 at 10:01 +0200, Jan Kara wrote:
> Hi!
>
> On Tue 15-09-20 11:03:29, colyli(a)suse.de wrote:
> > Could you please to take a look? I am offline in the next two weeks.
>
> I just had a look into this. IMHO the justification in 6180bb446a "dax: fix
> detection of dax support for non-persistent memory block devices" is just
> bogus and people got confused by the previous condition
>
> if (!dax_dev && !bdev_dax_supported(bdev, blocksize))
>
> which was bogus as well. bdev_dax_supported() always returns false for bdev
> that doesn't have dax_dev (naturally so). So in the original condition
> there was no point in calling bdev_dax_supported() if we know dax_dev is
> NULL.
>
> Then this was changed to:
>
> if (!dax_dev || !bdev_dax_supported(bdev, blocksize))
>
> which looks more sensible at the first sight. But only at the first sight -
> if you look at wider context, __generic_fsdax_supported() is the bulk of
> code that decides whether a device supports DAX so calling
> bdev_dax_supported() from it indeed doesn't look as such a great idea. So
> IMO the condition should be just:
>
> if (!dax_dev)
>
> I'll send a fix for this.
>
> Also there's the process question how this patch could get to Linus when
> any attempt to use DAX would immediately kill the machine like Mikulas
> spotted. This shows the that patch was untested with DAX by anybody on the
> path from the developer to Linus...
This was entirely my fault, and I apologize. I got confused as to what
state my branches were in, and I thought this had cleared our unit tests
etc, when it obviously hadn't. I'm going to take a harder look at my
personal branch/patch management process to make sure it doesn't happen
again!
No worries. Bugs happen and this was still caught rather early without real
harm caused... I was just ranting to make sure testing does happen in the
future :)
Honza
> > -------- 原始邮件 --------
> > 发件人: Mikulas Patocka <mpatocka(a)redhat.com>
> > 日期: 2020年9月14日周一半夜11:48
> > 收件人: Coly Li <colyli(a)suse.de>, Dan Williams
<dan.j.williams(a)intel.com>,
> > Dave Jiang <dave.jiang(a)intel.com>
> > 抄送: Jan Kara <jack(a)suse.com>, Vishal Verma
<vishal.l.verma(a)intel.com>,
> > Adrian Huang <ahuang12(a)lenovo.com>, Ira Weiny
<ira.weiny(a)intel.com>, Mike
> > Snitzer <snitzer(a)redhat.com>, Pankaj Gupta
<pankaj.gupta.linux(a)gmail.com>,
> > linux-nvdimm(a)lists.01.org
> > 主题: regression caused by patch 6180bb446ab624b9ab8bf201ed251ca87f07b413
> > ("dax: fix detection of dax support for non-persistent memory block
> > devices")
> >
> > Hi
> >
> > The patch 6180bb446ab624b9ab8bf201ed251ca87f07b413 ("dax: fix
detection of
> > dax support for non-persistent memory block devices") causes crash
when
> > attempting to mount the ext4 filesystem on /dev/pmem0 ("mkfs.ext4
> > /dev/pmem0; mount -t ext4 /dev/pmem0 /mnt/test"). The device
/dev/pmem0 is
> > emulated using the "memmap" kernel parameter.
> >
> > The patch causes infinite recursion and double-fault exception:
> >
> > __generic_fsdax_supported
> > bdev_dax_supported
> > __bdev_dax_supported
> > dax_supported
> > dax_dev->ops->dax_supported
> > generic_fsdax_supported
> > __generic_fsdax_supported
> >
> > Mikulas
> >
> >
> >
> > [ 17.500619] traps: PANIC: double fault, error_code: 0x0
> > [ 17.500619] double fault: 0000 [#1] PREEMPT SMP
> > [ 17.500620] CPU: 0 PID: 1326 Comm: mount Not tainted 5.9.0-rc1-bisect #
> > 10
> > [ 17.500620] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 17.500621] RIP: 0010:__generic_fsdax_supported+0x6a/0x500
> > [ 17.500622] Code: ff ff ff ff ff 7f 00 48 21 f3 48 01 c3 48 c1 e3 09 f6
> > c7 0e 0f 85 fa 01 00 00 48 85 ff 49 89 fd 74 11 be 00 10 00 00 4c 89 e7
> > <e8> b1 fe ff ff 84 c0 75 11 31 c0 48 83 c4 48 5b 5d 41 5c 41 5d 41
> > [ 17.500623] RSP: 0018:ffff88940b4fdff8 EFLAGS: 00010286
> > [ 17.500624] RAX: 0000000000000000 RBX: 00000007fffff000 RCX:
> > 0000000000000000
> > [ 17.500625] RDX: 0000000000001000 RSI: 0000000000001000 RDI:
> > ffff88940b34c300
> > [ 17.500625] RBP: 0000000000000000 R08: 0000000004000000 R09:
> > 8080808080808080
> > [ 17.500626] R10: 0000000000000000 R11: fefefefefefefeff R12:
> > ffff88940b34c300
> > [ 17.500626] R13: ffff88940b3dc000 R14: ffff88940badd000 R15:
> > 0000000000000001
> > [ 17.500627] FS: 00000000f7c25780(0000) GS:ffff88940fa00000(0000)
> > knlGS:0000000000000000
> > [ 17.500628] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> > [ 17.500628] CR2: ffff88940b4fdfe8 CR3: 000000140bd15000 CR4:
> > 00000000000006b0
> > [ 17.500628] Call Trace:
> > [ 17.500629] Modules linked in: uvesafb cfbfillrect cfbimgblt cn
> > cfbcopyarea fb fbdev ipv6 tun autofs4 binfmt_misc configfs af_packet
> > virtio_rng rng_core mousedev evdev pcspkr virtio_balloon button raid10
> > raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor
> > async_tx libcrc32c raid1 raid0 md_mod sd_mod t10_pi virtio_scsi virtio_net
> > net_failover psmouse scsi_mod failover
> > [ 17.500638] ---[ end trace 3c877fcb5b865459 ]---
> > [ 17.500638] RIP: 0010:__generic_fsdax_supported+0x6a/0x500
> > [ 17.500639] Code: ff ff ff ff ff 7f 00 48 21 f3 48 01 c3 48 c1 e3 09 f6
> > c7 0e 0f 85 fa 01 00 00 48 85 ff 49 89 fd 74 11 be 00 10 00 00 4c 89 e7
> > <e8> b1 fe ff ff 84 c0 75 11 31 c0 48 83 c4 48 5b 5d 41 5c 41 5d 41
> > [ 17.500640] RSP: 0018:ffff88940b4fdff8 EFLAGS: 00010286
> > [ 17.500641] RAX: 0000000000000000 RBX: 00000007fffff000 RCX:
> > 0000000000000000
> > [ 17.500641] RDX: 0000000000001000 RSI: 0000000000001000 RDI:
> > ffff88940b34c300
> > [ 17.500642] RBP: 0000000000000000 R08: 0000000004000000 R09:
> > 8080808080808080
> > [ 17.500642] R10: 0000000000000000 R11: fefefefefefefeff R12:
> > ffff88940b34c300
> > [ 17.500643] R13: ffff88940b3dc000 R14: ffff88940badd000 R15:
> > 0000000000000001
> > [ 17.500643] FS: 00000000f7c25780(0000) GS:ffff88940fa00000(0000)
> > knlGS:0000000000000000
> > [ 17.500644] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> > [ 17.500644] CR2: ffff88940b4fdfe8 CR3: 000000140bd15000 CR4:
> > 00000000000006b0
> > [ 17.500645] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 17.500941] Kernel Offset: disabled
> >
> >
--
Jan Kara <jack(a)suse.com>
SUSE Labs, CR