On Fri, 18 May 2018, Dan Williams wrote:
> >> ...and I wonder what the benefit is of the 16-byte case? I
> >> assume the bulk of the benefit is limited to the 4 and 8 byte copy
> > dm-writecache uses 16-byte writes frequently, so it is needed for that.
> > If we split 16-byte write to two 8-byte writes, it would degrade
> > performance for architectures where memcpy_flushcache needs to flush the
> My question was how measurable it is to special case 16-byte
> transfers? I know Ingo is going to ask this question, so it would
> speed things along if this patch included performance benefit numbers
> for each special case in the changelog.
I tested it some time ago, and the movnti instruction has 2% better
throughput than the existing memcpy_flushcache function.
dm-writecache does one 16-byte write for every sector written and one
8-byte write for every sector clean-up, so the overhead is measurable.
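
To make the special cases under discussion concrete, here is a rough
user-space sketch of the idea. It is not the kernel patch. The names
copy_flushcache_sketch, generic_copy and copy_fence_sketch are made up
for illustration, generic_copy is only a plain memcpy standing in for
the existing generic path, and the destination is assumed to be
naturally aligned for 4-, 8- and 16-byte copies. On x86-64,
_mm_stream_si32 and _mm_stream_si64 compile to movnti.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>          /* _mm_stream_si32, _mm_stream_si64, _mm_sfence */
#define HAVE_MOVNTI 1
#endif

/* Hypothetical stand-in for the existing generic copy path. */
static void generic_copy(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

/*
 * Sketch: special-case 4-, 8- and 16-byte copies with non-temporal
 * stores, and fall back to the generic path for every other size.
 * Assumes dst is naturally aligned for the special-cased sizes.
 */
static void copy_flushcache_sketch(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);
#ifdef HAVE_MOVNTI
	int w32;
	long long w64[2];

	switch (n) {
	case 4:
		memcpy(&w32, src, 4);
		_mm_stream_si32((int *)dst, w32);
		return;
	case 8:
		memcpy(&w64[0], src, 8);
		_mm_stream_si64((long long *)dst, w64[0]);
		return;
	case 16:
		/* One call, two 8-byte non-temporal stores. */
		memcpy(w64, src, 16);
		_mm_stream_si64((long long *)dst, w64[0]);
		_mm_stream_si64((long long *)dst + 1, w64[1]);
		return;
	default:
		break;
	}
#endif
	generic_copy(dst, src, n);
}

/* Order earlier non-temporal stores before later stores. */
static void copy_fence_sketch(void)
{
#ifdef HAVE_MOVNTI
	_mm_sfence();
#endif
}
```

A caller would issue the copy and then copy_fence_sketch() before
depending on the ordering of the data. The sketch only shows the shape
of the special cases. It says nothing about the performance numbers,
which would have to come from measurements on the real code.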