On 22.07.19 14:00, Christian Borntraeger wrote:
On 22.07.19 13:43, Cornelia Huck wrote:
> On Mon, 22 Jul 2019 13:20:18 +0200
> Christian Borntraeger <borntraeger(a)de.ibm.com> wrote:
>
>> On 22.07.19 12:56, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger(a)de.ibm.com) wrote:
>>>>
>>>>
>>>> On 18.07.19 16:30, Dan Williams wrote:
>>>>> On Thu, Jul 18, 2019 at 6:15 AM Vivek Goyal <vgoyal(a)redhat.com> wrote:
>>>>>>
>>>>>> On Wed, Jul 17, 2019 at 07:27:25PM +0200, Halil Pasic wrote:
>>>>>>> On Wed, 15 May 2019 15:27:03 -0400
>>>>>>> Vivek Goyal <vgoyal(a)redhat.com> wrote:
>>>>>>>
>>>>>>>> From: Stefan Hajnoczi <stefanha(a)redhat.com>
>>>>>>>>
>>>>>>>> Setup a dax device.
>>>>>>>>
>>>>>>>> Use the shm capability to find the cache entry and map it.
>>>>>>>>
>>>>>>>> The DAX window is accessed by the fs/dax.c infrastructure and must have
>>>>>>>> struct pages (at least on x86). Use devm_memremap_pages() to map the
>>>>>>>> DAX window PCI BAR and allocate struct page.
>>>>>>>>
>>>>>>>
>>>>>>> Sorry for being this late. I don't see any more recent version so I will
>>>>>>> comment here.
>>>>>>>
>>>>>>> I'm trying to figure out how is this supposed to work on s390. My concern
>>>>>>> is, that on s390 PCI memory needs to be accessed by special
>>>>>>> instructions. This is taken care of by the stuff defined in
>>>>>>> arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses
>>>>>>> the appropriate s390 instruction. However if the code does not use the
>>>>>>> linux abstractions for accessing PCI memory, but assumes it can be
>>>>>>> accessed like RAM, we have a problem.
>>>>>>>
>>>>>>> Looking at this patch, it seems to me, that we might end up with exactly
>>>>>>> the case described. For example AFAICT copy_to_iter() (3) resolves to
>>>>>>> the function in lib/iov_iter.c which does not seem to cater for s390
>>>>>>> oddities.
>>>>>>>
>>>>>>> I didn't have the time to investigate this properly, and since virtio-fs
>>>>>>> is virtual, we may be able to get around what is otherwise a
>>>>>>> limitation on s390. My understanding of these areas is admittedly
>>>>>>> shallow, and since I'm not sure I'll have much more time to
>>>>>>> invest in the near future I decided to raise concern.
>>>>>>>
>>>>>>> Any opinions?
>>>>>>
>>>>>> Hi Halil,
>>>>>>
>>>>>> I don't understand s390 and how PCI works there as well. Is there any
>>>>>> other transport we can use there to map IO memory directly and access
>>>>>> using DAX?
>>>>>>
>>>>>> BTW, is DAX supported for s390?
>>>>>>
>>>>>> I am also hoping somebody who knows better can chip in. Till that time,
>>>>>> we could still use virtio-fs on s390 without DAX.
>>>>>
>>>>> s390 has so-called "limited" dax support, see CONFIG_FS_DAX_LIMITED.
>>>>> In practice that means that support for PTE_DEVMAP is missing which
>>>>> means no get_user_pages() support for dax mappings. Effectively it's
>>>>> only useful for execute-in-place as operations like fork() and ptrace
>>>>> of dax mappings will fail.
>>>>
>>>>
>>>> This is only true for the dcssblk device driver (drivers/s390/block/dcssblk.c
>>>> and arch/s390/mm/extmem.c).
>>>>
>>>> For what it's worth, the dcssblk looks to Linux like normal memory (just above the
>>>> previously detected memory) that can be used like normal memory. In previous times
>>>> we even had struct pages for this memory - this was removed long ago (when it was
>>>> still xip) to reduce the memory footprint for large dcss blocks and small memory
>>>> guests.
>>>> Can the CONFIG_FS_DAX_LIMITED go away if we have struct pages for that memory?
>>>>
>>>> Now some observations:
>>>> - dcssblk is z/VM only (not KVM)
>>>> - Setting CONFIG_FS_DAX_LIMITED globally as a Kconfig option depending on whether
>>>>   a device driver is compiled in or not seems not flexible enough in case you
>>>>   have a device driver that does have struct pages and another one that doesn't
>>>> - I do not see a reason why we should not be able to map anything from QEMU
>>>>   into the guest real memory via an additional KVM memory slot.
>>>>   We would need to handle that in the guest somehow (and not as a PCI bar),
>>>>   register this with struct pages etc.
>
> You mean for ccw, right? I don't think we want pci to behave
> differently than everywhere else.
Yes for virtio-ccw. We would need to have a look at how virtio-ccw can create a memory
mapping with struct pages, so that DAX will work. (Dan, it is just struct pages that
you need, correct?)
>
>>>> - we must then look how we can create the link between the guest memory and the
>>>>   virtio-fs driver. For virtio-ccw we might be able to add a new ccw command or
>>>>   whatever. Maybe we could also piggy-back on some memory hotplug work from David
>>>>   Hildenbrand (add cc).
>>>>
>>>> Regarding limitations on the platform:
>>>> - while we do have PCI, the virtio devices are usually plugged via the ccw bus.
>>>>   That implies no PCI bars. I assume you use those PCI bars only to implicitly
>>>>   have the location of the shared memory
>>>>   Correct?
>>>
>>> Right.
>>
>> So in essence we just have to provide a vm_get_shm_region callback in the virtio-ccw
>> guest code?
>>
>> How many regions do we have to support? One region per device? Or many?
>> Even if we need more, this should be possible with 2 new CCWs, e.g. READ_SHM_BASE(id)
>> and READ_SHM_SIZE(id)
>
> I'd just add a single CCW with a control block containing id and size.
>
> The main issue is where we put those regions, and what happens if we
> use both virtio-pci and virtio-ccw on the same machine.
Then these 2 devices should get independent memory regions that are added in an
independent (but still exclusive) way.
I remember that one discussion was about who dictates the physical address
mapping. If I'm not wrong, PCI bars can be mapped freely by the guest
into the address space. So it would not just be querying the start+size.
Unless we want a pre-determined mapping (which might make more sense for
s390x).
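
To make the pre-determined variant a bit more concrete, here is a rough
sketch of what a single control block carrying id, start and size could look
like, plus the check that two regions (say one from virtio-pci and one from
virtio-ccw) stay exclusive. All names (shm_region_desc, shm_region_valid,
shm_regions_overlap) are made up for illustration; nothing here comes from
the virtio spec or from existing kernel code, and the real layout would have
to be defined by the transport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical control block for one "query shm region" CCW.
 * Assumption: the guest fills in id, the host fills in addr and len.
 */
struct shm_region_desc {
	uint8_t  id;    /* region id requested by the guest */
	uint64_t addr;  /* guest-physical start address */
	uint64_t len;   /* size in bytes */
};

/* True if the region is non-empty and addr + len does not wrap. */
bool shm_region_valid(const struct shm_region_desc *r)
{
	return r->len != 0 && r->addr <= UINT64_MAX - r->len;
}

/* True if two valid regions share at least one byte. */
bool shm_regions_overlap(const struct shm_region_desc *a,
			 const struct shm_region_desc *b)
{
	assert(shm_region_valid(a) && shm_region_valid(b));
	return a->addr < b->addr + b->len && b->addr < a->addr + a->len;
}
```

With a pre-determined mapping the host would have to guarantee that
shm_regions_overlap() is false for every pair of regions it hands out; if
the guest picks the addresses (as with PCI bars), the same check would sit
on the guest side instead.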
--
Thanks,
David / dhildenb