> Jan Kara wrote on 2016-08-08:
> > On Fri 05-08-16 19:58:33, Boylston, Brian wrote:
>
> I used NVML 1.1 for the measurements. In this version and with the
> hardware that I used, the pmem_persist() flow is:
>
> pmem_persist()
>   pmem_flush()
>     Func_flush() == flush_clflush
>       CLFLUSH
>   pmem_drain()
>     Func_predrain_fence() == predrain_fence_empty
>       no-op
>
> So, I don't think that pmem_persist() does anything to cause the filesystem
> to flush metadata as it doesn't make any system calls?
Ah, you are right. I somehow misread what is in NVML sources. I agree with
Christoph that the _persist suffix is then misleading for the reasons he stated
but that's irrelevant to the test you did.
So it indeed seems that in your test movnt + sfence is an order of
magnitude faster than cached memcpy + clflush + sfence. I'm surprised, I have
to say.
movnt stores are posted to a write-combining (WC) buffer, which is evicted
to memory asynchronously once each line is filled.
clflush, on the other hand, is serialized, so it has to evict synchronously,
line by line. clflushopt, where newer CPUs support it, should be a lot faster
because flushes of different lines can execute concurrently instead of waiting
line by line. It would still be slower than an uncached copy, though.
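
For illustration, the two copy paths being compared could be sketched roughly
as below. This is not the NVML code; copy_clflush(), copy_movnt() and the
64-byte CACHELINE constant are made-up names for this sketch, and the movnt
variant assumes a 16-byte-aligned destination and a length that is a multiple
of 16. Only the SSE2 intrinsics from <emmintrin.h> are used.

```c
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_loadu_si128, _mm_clflush, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 64    /* assumed cache line size */

/* Cached copy, then CLFLUSH every destination line, then fence. */
static void copy_clflush(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);

    /* Round down to the start of the first cache line touched. */
    uintptr_t p = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);
    for (; p < (uintptr_t)dst + len; p += CACHELINE)
        _mm_clflush((const void *)p);

    _mm_sfence();
}

/*
 * Non-temporal copy: the stores go through the WC buffers and bypass
 * the cache, so no per-line flush is needed, only the final fence.
 * Assumes dst is 16-byte aligned and len is a multiple of 16.
 */
static void copy_movnt(void *dst, const void *src, size_t len)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

    _mm_sfence();
}
```

A clflushopt variant would look like copy_clflush() with the flush
instruction swapped, but it needs CPU and compiler support, so it is left
out here.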
-Toshi