On 6/1/20 5:37 PM, Michal Suchánek wrote:
On Mon, Jun 01, 2020 at 05:31:50PM +0530, Aneesh Kumar K.V wrote:
> On 6/1/20 3:39 PM, Jan Kara wrote:
>> On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote:
>>> On 5/29/20 3:22 PM, Jan Kara wrote:
>>>> On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
>>>>> Thanks Michal. I also missed Jeff in this email thread.
>>>>
>>>> And I think you'll also need some of the sched maintainers for the prctl
>>>> bits...
>>>>
>>>>> On 5/29/20 3:03 PM, Michal Suchánek wrote:
>>>>>> Adding Jan
>>>>>>
>>>>>> On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
>>>>>>> With POWER10, architecture is adding new pmem flush and sync instructions.
>>>>>>> The kernel should prevent the usage of MAP_SYNC if applications are not using
>>>>>>> the new instructions on newer hardware.
>>>>>>>
>>>>>>> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
>>>>>>> the usage of MAP_SYNC. The kernel config option is added to allow the user
>>>>>>> to control whether MAP_SYNC should be enabled by default or not.
>>>>>>>
>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
>>>> ...
>>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>>> index 8c700f881d92..d5a9a363e81e 100644
>>>>>>> --- a/kernel/fork.c
>>>>>>> +++ b/kernel/fork.c
>>>>>>> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
>>>>>>> static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
>>>>>>> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
>>>>>>> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
>>>>>>> +#else
>>>>>>> +unsigned long default_map_sync_mask = 0;
>>>>>>> +#endif
>>>>>>> +
>>>>
>>>> I'm not sure CONFIG is really the right approach here. For a distro that would
>>>> basically mean to disable MAP_SYNC for all PPC kernels unless application
>>>> explicitly uses the right prctl. Shouldn't we rather initialize
>>>> default_map_sync_mask on boot based on whether the CPU we run on requires
>>>> new flush instructions or not? Otherwise the patch looks sensible.
>>>>
>>>
>>> yes that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
>>> But on a virtualized platform there is no easy way to detect that. We could
>>> ideally hook this into the nvdimm driver where we look at the new compat
>>> string ibm,persistent-memory-v2 and then disable MAP_SYNC
>>> if we find a device with the specific value.
>>
>> Hum, couldn't we set some flag for nvdimm devices with
>> "ibm,persistent-memory-v2" property and then check it during mmap(2) time
>> and when the device has this property and the mmap(2) caller doesn't have
>> the prctl set, we'd disallow MAP_SYNC? That should make things mostly
>> seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
>> devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
>> applications need to be aware of new instructions so this isn't that much
>> additional burden...
>
> I am not sure application would want to add that much details/knowledge
> about a platform in their code. I was expecting application to do
>
> #ifdef __ppc64__
> prctl(MAP_SYNC_ENABLE, 1, 0, 0, 0);
> #endif
> a = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
> MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
>
>
> For that code all the complexity that we add w.r.t ibm,persistent-memory-v2
> is not useful. Do you see a value in making all these device specific rather
> than a conditional on __ppc64__?
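
To make the application-side pattern above concrete, here is a minimal sketch of a wrapper that asks for MAP_SYNC and falls back to a plain shared mapping when the kernel refuses it. The helper name map_prefer_sync() is hypothetical and not part of the patch. The sketch assumes a denied MAP_SYNC is reported with EOPNOTSUPP, as mmap(2) does today for files that do not support it; the error code the proposed prctl gating would use is not stated in this thread. The prctl call itself is left out because MAP_SYNC_ENABLE is only a proposal and has no value in released headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers. These match the generic Linux UAPI
 * values, but check them against your architecture's headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: map `len` bytes of `fd`, preferring MAP_SYNC.
 * *got_sync is set to 1 if the synchronous mapping was granted and to 0
 * otherwise. Returns MAP_FAILED on error, like mmap(2).
 */
static void *map_prefer_sync(int fd, size_t len, int *got_sync)
{
    void *p;

    assert(got_sync != NULL);
    *got_sync = 0;

    /* A ppc64 application would issue the proposed opt-in prctl here,
     * before calling mmap(). */

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *got_sync = 1;
        return p;
    }

    /* EOPNOTSUPP: MAP_SYNC was refused for this file.
     * EINVAL: possibly a kernel without MAP_SHARED_VALIDATE.
     * Any other error is returned to the caller unchanged. */
    if (errno != EOPNOTSUPP && errno != EINVAL)
        return MAP_FAILED;

    /* Fallback: ordinary shared mapping; the caller must then rely on
     * msync()/fsync() for durability. */
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

A caller would check got_sync to decide whether user-space flush instructions are enough or whether it still needs msync()/fsync().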

If the vpmem devices continue to work with the old instruction on
POWER10 then it makes sense to make this per-device.

vPMEM doesn't have write_cache and hence it is synchronous even without
using any specific flush instruction. The question is do we want to have
different programming steps when running on vPMEM vs a persistent PMEM
device on ppc64.

I will work on the device specific ENABLE flag and then we can compare
the kernel complexity against the added benefit.

-aneesh
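
For reference while comparing the two approaches, the per-device check Jan suggested earlier in the thread can be modeled in a few lines of plain C. This is only a user-space model of the decision, not kernel code: the struct and function names are made up for illustration, and the real implementation would live in the nvdimm driver and the mmap path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
struct pmem_dev_model {
    /* Device advertised the "ibm,persistent-memory-v2" compat string,
     * i.e. it needs the new flush/sync instructions. */
    bool needs_new_flush;
};

struct task_model {
    /* Task opted in through the proposed prctl. */
    bool map_sync_enabled;
};

/* Return true if a MAP_SYNC mapping should be permitted. */
static bool map_sync_allowed(const struct pmem_dev_model *dev,
                             const struct task_model *task)
{
    assert(dev != NULL && task != NULL);

    /* Devices that work with the old instructions are never restricted. */
    if (!dev->needs_new_flush)
        return true;

    /* Otherwise the application must have declared that it uses the
     * new instructions. */
    return task->map_sync_enabled;
}
```

Under this model only applications that map a v2 device without opting in are refused; everything else behaves as before.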