On Tue, Dec 19, 2017 at 05:11:38PM -0800, Dan Williams wrote:
On Fri, Nov 10, 2017 at 1:08 AM, Christoph Hellwig <hch(a)lst.de>
wrote:
>> + struct {
>> + /*
>> + * ZONE_DEVICE pages are never on an lru or handled by
>> + * a slab allocator, this points to the hosting device
>> + * page map.
>> + */
>> + struct dev_pagemap *pgmap;
>> + /*
>> + * inode association for MEMORY_DEVICE_FS_DAX page-idle
>> + * callbacks. Note that we don't use ->mapping since
>> + * that has hard coded page-cache assumptions in
>> + * several paths.
>> + */
>
> What assumptions? I'd much rather fix those up than having two fields
> that have the same functionality.
[ Reviving this old thread where you asked why I introduce page->inode
instead of reusing page->mapping ]
For example, xfs_vm_set_page_dirty() assumes that page->mapping being
non-NULL indicates a typical page cache page; that assumption is false
for DAX.
That means every single filesystem has an incorrect assumption for
DAX pages. xfs_vm_set_page_dirty() is derived directly from
__set_page_dirty_buffers(), which is the default function that
set_page_dirty() calls to do its work. Indeed, ext4 also calls
__set_page_dirty_buffers(), so whatever problem XFS has here with
DAX and racing truncates is going to manifest in ext4 as well.
My guess at a fix for this is to add
pagecache_page() checks to locations like this, but I worry about how
to find them all. Where pagecache_page() is:
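bool pagecache_page(struct page *page)
{
	/* No mapping: truncated, or never associated with a file. */
	if (!page->mapping)
		return false;
	/* DAX pages set ->mapping but are not page cache pages. */
	if (IS_DAX(page->mapping->host))
		return false;
	return true;
}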
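As a rough userspace sketch (not kernel code) of the distinction being
discussed here: a bare "page->mapping is non-NULL" test and a helper that
also rules out DAX disagree only on DAX pages. The struct definitions, the
is_dax field standing in for IS_DAX(), and the names mapping_check_only(),
pagecache_page_sketch() and count_disagreements() are all assumptions made
up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct inode { bool is_dax; };			/* models IS_DAX(inode) */
struct address_space { struct inode *host; };
struct page { struct address_space *mapping; };

/* The check in question: non-NULL ->mapping taken to mean "page cache page". */
static bool mapping_check_only(const struct page *page)
{
	return page->mapping != NULL;
}

/* Hypothetical helper in the spirit of pagecache_page(): also rule out DAX. */
static bool pagecache_page_sketch(const struct page *page)
{
	if (!page->mapping)
		return false;	/* truncated, or never mapped */
	if (page->mapping->host->is_dax)
		return false;	/* DAX page, not in the page cache */
	return true;
}

/* Walk three cases and count the pages on which the two checks disagree. */
static int count_disagreements(void)
{
	struct inode regular = { .is_dax = false }, dax = { .is_dax = true };
	struct address_space m_regular = { .host = &regular };
	struct address_space m_dax = { .host = &dax };
	struct page pages[] = {
		{ .mapping = &m_regular },	/* ordinary page cache page */
		{ .mapping = &m_dax },		/* DAX page with ->mapping set */
		{ .mapping = NULL },		/* truncated page */
	};
	int n = 0;

	/* Sanity check: both tests agree on the ordinary page cache page. */
	assert(mapping_check_only(&pages[0]) && pagecache_page_sketch(&pages[0]));

	for (size_t i = 0; i < sizeof(pages) / sizeof(pages[0]); i++)
		if (mapping_check_only(&pages[i]) != pagecache_page_sketch(&pages[i]))
			n++;
	return n;
}
```

In this toy model the only disagreement is the DAX page, which is the case
every existing "page->mapping != NULL" race check would have to be audited
for.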
This is likely to be a problem in lots more places if we have to
treat "has page been truncated away" race checks on dax mappings
differently to page cache mappings. This smells of a whack-a-mole
style bandaid to me....
Cheers,
Dave.
--
Dave Chinner
david(a)fromorbit.com