On Thu, May 07, 2015 at 09:53:13PM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo(a)kernel.org> wrote:
> > > Is handling kernel pagefault on the vmemmap completely out of the
> > > picture? So we would carve out a chunk of kernel address space
> > > for those pfns and use it for vmemmap and handle pagefault on it.
> > That's pretty clever. The page fault doesn't even have to do remote
> > TLB shootdown, because it only establishes mappings - so it's pretty
> > atomic, a bit like the minor vmalloc() area faults we are doing.
> > Some sort of LRA (least recently allocated) scheme could unmap the
> > area in chunks if it's beyond a certain size, to keep its size
> > bounded. That would be done from the same context and would use
> > remote TLB shootdown.
> > The only limitation I can see is that such faults would have to be
> > able to sleep, to do the allocation. So pfn_to_page() could not be
> > used in arbitrary contexts.
> So another complication would be that we cannot just unmap such pages
> when we want to recycle them, because the struct page in them might be
> in use - so all struct page uses would have to refcount the underlying
> page. We don't really do that today: code just looks up struct pages
> and assumes they never go away.
I still think this is doable. As I said in another email, I think we
should introduce a special pfn_to_page_dev|pmem|waffle|somethingyoulike()
for the places that are allowed to allocate the underlying struct page.
For instance, we can back this whole special vmemmap range with a single
default page filled with specially crafted struct pages that say the
memory is invalid (and make this mapping read-only, so all writes to
these special struct pages are forbidden).
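As a rough userspace model of the default-page idea (not kernel code), the sketch fills one page with fake struct pages carrying a hypothetical PG_INVALID_MEM flag, makes it read-only with mprotect(), and points every entry of a hypothetical special_vmemmap[] table at it. The table stands in for the vmemmap page table entries of the special range, and struct fake_page stands in for struct page.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for struct page (64 bytes on LP64 targets). */
struct fake_page {
    unsigned long flags;
    unsigned long pad[7];
};

#define PG_INVALID_MEM   (1UL << 0)  /* hypothetical "not real memory" flag */
#define NR_VMEMMAP_PAGES 16          /* vmemmap pages in the special range */

/* Stand-in for the PTEs covering the special vmemmap range. */
static const struct fake_page *special_vmemmap[NR_VMEMMAP_PAGES];

/*
 * Build one page full of invalid fake struct pages, make it read-only,
 * and point every vmemmap page of the special range at it.
 * Returns the number of fake struct pages per page, or 0 on error.
 */
static size_t back_range_with_default_page(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);

    /* Anonymous mappings are zero-filled, so only the flag needs setting. */
    struct fake_page *page = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 0;

    size_t per_page = (size_t)psz / sizeof(*page);
    for (size_t i = 0; i < per_page; i++)
        page[i].flags = PG_INVALID_MEM;

    /* From here on, any write to a fake struct page faults. */
    if (mprotect(page, (size_t)psz, PROT_READ) != 0) {
        munmap(page, (size_t)psz);
        return 0;
    }

    /* Every slot of the range shares the same read-only page. */
    for (size_t i = 0; i < NR_VMEMMAP_PAGES; i++)
        special_vmemmap[i] = page;
    return per_page;
}
```

Because every slot shares one page, covering the range costs a single page of memory no matter how large the range is.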
Then, once an authorized user comes along and needs a real struct page,
it triggers a page allocation that replaces the page full of fake,
invalid struct pages with a page of correct, valid struct pages that
can be manipulated by other parts of the kernel.
Regular pfn_to_page() would then check whether the pfn falls in the
special vmemmap range and, if so, test the struct page for a flag. If
it is the invalid-page flag, it returns NULL.
But once a proper struct page has been allocated, pfn_to_page() returns
the struct page as expected.
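The lookup rules in the two paragraphs above can be modelled in userspace as well. In this sketch, pfn_to_page_model() plays the role of the regular pfn_to_page() and never allocates, pfn_to_page_dev_model() plays the role of the proposed pfn_to_page_dev() and allocates a real chunk on first use, and release_special_chunk() models the end of the struct pages' lifetime. All names, the pfn range and the chunk size are hypothetical; a kernel version would rewrite vmemmap PTEs (and flush TLBs on release) instead of swapping array pointers, and would need the refcounting Ingo mentions before releasing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified struct page; only the flags word matters here. */
struct fake_page {
    unsigned long flags;
};

#define PG_INVALID_MEM   (1UL << 0)  /* hypothetical "no real struct page" flag */
#define NR_REGULAR_PFNS  256         /* ordinary memory, always has struct pages */
#define PAGES_PER_CHUNK  64          /* struct pages covered by one vmemmap page */
#define NR_CHUNKS        16          /* vmemmap pages in the special range */
#define SPECIAL_PFN_BASE 0x100000UL  /* hypothetical first pfn of the range */
#define SPECIAL_PFN_END  (SPECIAL_PFN_BASE + NR_CHUNKS * PAGES_PER_CHUNK)

static struct fake_page regular_vmemmap[NR_REGULAR_PFNS];
/* Shared default chunk: every entry is marked invalid and never written. */
static struct fake_page default_chunk[PAGES_PER_CHUNK];
/* Stand-in for the vmemmap page tables of the special range. */
static struct fake_page *vmemmap_chunk[NR_CHUNKS];

static void special_range_init(void)
{
    for (size_t i = 0; i < PAGES_PER_CHUNK; i++)
        default_chunk[i].flags = PG_INVALID_MEM;
    for (size_t c = 0; c < NR_CHUNKS; c++)
        vmemmap_chunk[c] = default_chunk;
}

static bool pfn_in_special_range(unsigned long pfn)
{
    return pfn >= SPECIAL_PFN_BASE && pfn < SPECIAL_PFN_END;
}

static struct fake_page *special_slot(unsigned long pfn)
{
    assert(pfn_in_special_range(pfn));
    unsigned long off = pfn - SPECIAL_PFN_BASE;
    return &vmemmap_chunk[off / PAGES_PER_CHUNK][off % PAGES_PER_CHUNK];
}

/* Regular lookup: never allocates; NULL if only a fake struct page exists. */
static struct fake_page *pfn_to_page_model(unsigned long pfn)
{
    if (pfn < NR_REGULAR_PFNS)
        return &regular_vmemmap[pfn];
    if (!pfn_in_special_range(pfn))
        return NULL;
    struct fake_page *page = special_slot(pfn);
    return (page->flags & PG_INVALID_MEM) ? NULL : page;
}

/* Authorized lookup: may replace the fake chunk with a real one. */
static struct fake_page *pfn_to_page_dev_model(unsigned long pfn)
{
    if (!pfn_in_special_range(pfn))
        return pfn_to_page_model(pfn);
    unsigned long c = (pfn - SPECIAL_PFN_BASE) / PAGES_PER_CHUNK;
    if (vmemmap_chunk[c] == default_chunk) {
        /* calloc() gives zeroed, valid (flag-free) struct pages. */
        struct fake_page *real = calloc(PAGES_PER_CHUNK, sizeof(*real));
        if (!real)
            return NULL;
        vmemmap_chunk[c] = real;
    }
    return special_slot(pfn);
}

/* End of lifetime: point the chunk back at the fake page, free the real one. */
static void release_special_chunk(unsigned long pfn)
{
    assert(pfn_in_special_range(pfn));
    unsigned long c = (pfn - SPECIAL_PFN_BASE) / PAGES_PER_CHUNK;
    if (vmemmap_chunk[c] != default_chunk) {
        free(vmemmap_chunk[c]);
        vmemmap_chunk[c] = default_chunk;
    }
}
```

After release, a stale pfn_to_page_model() call returns NULL again, which is how late users of the page would be caught.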
That way you will catch all invalid users of such pages, i.e. users that
use the page after its lifetime is over. You will also limit the
creation of the underlying proper struct pages to code that legitimately
needs a proper struct page for a given pfn.
You would also get a kernel write fault on any write to the page full
of fake struct pages, which would catch further misuse.
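The write-protection part of the idea can be demonstrated from userspace on Linux: a write into a read-only page raises SIGSEGV, which here stands in for the kernel write fault that would flag a buggy user of a fake struct page. write_to_fake_page_faults() and on_segv() are hypothetical names; in the kernel the fault would be reported by the page fault handler (typically as an oops) rather than caught with siglongjmp().

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS and sigaction under strict -std= */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Userspace stand-in for the kernel's write-fault report. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/*
 * Write one word into a read-only page, as a buggy user of a fake
 * struct page would. Returns true if the write faulted.
 */
static bool write_to_fake_page_faults(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);
    unsigned long *fake = mmap(NULL, (size_t)psz, PROT_READ,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (fake == MAP_FAILED)
        return false;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(fake, (size_t)psz);
        return false;
    }

    volatile bool faulted = false;
    if (sigsetjmp(fault_env, 1) == 0)
        *(volatile unsigned long *)fake = 0;  /* stray write: faults */
    else
        faulted = true;                       /* reached via siglongjmp() */

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    munmap(fake, (size_t)psz);
    return faulted;
}
```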
Anyway, this is how I envision it, and I think it would work for my
use case too (GPU, in my case :))