On Wed, May 30 2018 at 9:07am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Mon, 28 May 2018, Dan Williams wrote:
> > On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > I measured it (with an NVMe backing store) and late cache flushing has 12%
> > > better performance than eager flushing with memcpy_flushcache().
> > I assume what you're seeing is ARM64 over-flushing the amount of dirty
> > data, so it becomes more efficient to do an amortized flush at the end?
> > However, that effectively makes memcpy_flushcache() unusable in the
> > way it can be used on x86. You claimed that ARM does not support
> > non-temporal stores, but it does; see the STNP instruction. I do not
> > want to see arch-specific optimizations in drivers, so either
> > write-through mappings are a potential answer that removes the need to
> > explicitly manage flushing, or just implement STNP in
> > memcpy_flushcache() like you did with MOVNT on x86.
> > > 131836 4k IOPS vs. 117016.
> > To be clear, this is memcpy_flushcache() vs. memcpy + flush?
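For illustration, here is a minimal sketch of the MOVNT-style approach mentioned above: non-temporal 8-byte stores followed by an sfence, so the copied data never lingers dirty in the CPU cache. This is not the kernel's memcpy_flushcache() implementation; the function name is made up, x86-64 is assumed, and the destination is assumed to be 8-byte aligned with a size that is a multiple of 8 (an ARM64 variant would use STNP in the same role):

#include <linux/types.h>

/*
 * Illustrative sketch only: copy 8-byte chunks with MOVNTI so the stores
 * bypass the cache, then fence so they are ordered before later writes.
 */
static void nt_memcpy_sketch(void *dst, const void *src, size_t n)
{
        unsigned long *d = dst;
        const unsigned long *s = src;
        size_t i;

        for (i = 0; i < n / sizeof(unsigned long); i++)
                asm volatile("movnti %1, %0" : "=m" (d[i]) : "r" (s[i]));

        /*
         * Non-temporal stores are weakly ordered; sfence makes them visible
         * before anything the caller writes afterwards (e.g. a commit flag).
         */
        asm volatile("sfence" ::: "memory");
}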
> I found out what caused the difference. I used dax_flush() in the version of
> dm-writecache that I had on the ARM machine (with kernel 4.14, because it is
> the last version where dax on a ramdisk works). I thought that dax_flush()
> flushes the cache, but it doesn't.
>
> When I replaced dax_flush() with arch_wb_cache_pmem(), the performance
> difference between early flushing and late flushing disappeared.
>
> So I think we can remove this per-architecture switch from dm-writecache.
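To make the comparison concrete, here is a minimal sketch of the two strategies, with made-up helper names (wc_write_eager/wc_write_late/wc_commit_late are not dm-writecache functions): eager flushing pushes every write out of the cache via memcpy_flushcache(), while late flushing does a plain memcpy() and writes the whole dirty range back once with arch_wb_cache_pmem() at commit time. The earlier numbers were skewed because dax_flush() did not actually flush anything on this setup, so the late-flush side was doing no cache maintenance at all:

#include <linux/string.h>       /* memcpy(), memcpy_flushcache() */
#include <linux/libnvdimm.h>    /* arch_wb_cache_pmem() */

/* Eager flushing: every copy to persistent memory is flushed as it goes. */
static void wc_write_eager(void *pmem_dst, const void *buf, size_t len)
{
        memcpy_flushcache(pmem_dst, buf, len);
}

/* Late flushing: plain copy now... */
static void wc_write_late(void *pmem_dst, const void *buf, size_t len)
{
        memcpy(pmem_dst, buf, len);
}

/* ...and one cache writeback over the whole dirty range at commit time. */
static void wc_commit_late(void *pmem_base, size_t dirty_len)
{
        arch_wb_cache_pmem(pmem_base, dirty_len);
        /*
         * The caller still needs a store fence (e.g. wmb()) before it
         * updates any on-media commit record.
         */
}

Once both sides perform real cache writeback, the eager and late variants perform comparably, which is what makes the per-architecture switch unnecessary.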
That is really great news. Can you submit an incremental patch that
layers on top of the linux-dm.git 'dm-4.18' branch?