On 09/02/2015 03:27 AM, Boaz Harrosh wrote:
> > Yet you're ignoring the fact that flushing the entire range of the
> > relevant VMAs may not be very efficient. It may be a very
> > large mapping with only a few pages that need flushing from the
> > cache, but you still iterate the mappings flushing GB ranges from
> > the cache at a time.
> >
> So actually you are wrong about this. We have a working system and as part
> of our testing rig we do performance measurements, constantly. Our random
> mmap 4k writes test performs very well and is on par with the random-direct-write
> implementation even though on every unmap, we do a VMA->start/end cl_flushing.
> The cl_flush operation is a no-op if the cacheline is not dirty and is a
> memory bus storm with all the CLs that are dirty. So the only cost
> is the iteration of vma->start-to-vma->end i+=64

I'd be curious what the cost is in practice. Do you have any actual
numbers for the cost of doing it this way?
Even if the instruction is a "noop", I'd expect the overhead to
really add up for a tens-of-gigabytes mapping, no matter how much the
CPU optimizes it.
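
For a rough sense of scale, here is a back-of-the-envelope sketch in plain
C (not kernel code) that only counts how many per-cacheline flush
instructions a vma->start-to-vma->end walk would issue. The names
flushes_for_range and CACHELINE_SIZE are made up for illustration, and the
64-byte line size is an assumption taken from the i+=64 above. It measures
nothing; it just counts iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, matching the i+=64 step quoted above. */
#define CACHELINE_SIZE 64ULL

/*
 * Hypothetical helper: number of per-cacheline flush instructions needed
 * to cover the byte range [start, end), one instruction per cache line.
 */
static uint64_t flushes_for_range(uint64_t start, uint64_t end)
{
	uint64_t first, last;

	assert(end >= start);

	/* Round start down and end up to cache line boundaries. */
	first = start & ~(CACHELINE_SIZE - 1);
	last = (end + CACHELINE_SIZE - 1) & ~(CACHELINE_SIZE - 1);

	return (last - first) / CACHELINE_SIZE;
}
```

With these assumptions, flushes_for_range(0, 4096) is 64 for a single 4k
page, while flushes_for_range(0, 32ULL << 30) is 536,870,912 for a 32GB
mapping, regardless of how few of those lines are actually dirty.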