-----Original Message-----
From: Elliott, Robert (Servers) [mailto:elliott@hpe.com]
Sent: April 14, 2019 11:21
To: Li,Rongqing <lirongqing(a)baidu.com>; Dan Williams
<dan.j.williams(a)intel.com>
Cc: linux-nvdimm <linux-nvdimm(a)lists.01.org>
Subject: RE: [PATCH][RFC] nvdimm: pmem: always flush nvdimm for write request
>> @@ -215,7 +216,7 @@ static blk_qc_t pmem_make_request(struct
request_queue *q, struct bio *bio)
>> if (do_acct)
>> nd_iostat_end(bio, start);
>>
>> - if (bio->bi_opf & REQ_FUA)
>> + if (bio->bi_opf & REQ_FUA || op_is_write(op))
>> nvdimm_flush(nd_region);
...
>> Before:
>> Jobs: 32 (f=32): [W(32)][14.2%][w=1884MiB/s][w=482k IOPS][eta
>> 01m:43s]
>> After:
>> Jobs: 32 (f=32): [W(32)][8.3%][w=2378MiB/s][w=609k IOPS][eta 01m:50s]
>>
>> -RongQing
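For reference, the before/after bandwidth quoted above works out to roughly a 26% throughput gain (the IOPS numbers, 482k to 609k, agree); a quick check:

```python
# Write bandwidth reported by the two fio runs quoted above (MiB/s).
before_mib_s = 1884
after_mib_s = 2378

# Relative throughput improvement, in percent.
improvement = (after_mib_s - before_mib_s) / before_mib_s * 100
print(f"improvement: {improvement:.1f}%")  # roughly 26%
```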
> Doing more work cannot be faster than doing less work, so something
> else must be happening here.

Dan Williams may know more.

> Please post the full fio job file and how you invoke it (i.e., with
> numactl).
The fio job file is below; we bind fio to a set of CPUs and a NUMA node with:

numactl --membind=0 taskset -c 2-24 ./fio test_io_raw
[global]
numjobs=32
direct=1
filename=/dev/pmem0.1
iodepth=32
ioengine=libaio
group_reporting=1
bs=4K
time_based=1
[write1]
rw=randwrite
runtime=60
stonewall
These tools help show what is happening on the CPUs and memory
channels:
perf top
62.40% [kernel] [k] memcpy_flushcache
21.17% [kernel] [k] fput
6.12% [kernel] [k] apic_timer_interrupt
0.89% [kernel] [k] rq_qos_done_bio
0.66% [kernel] [k] bio_endio
0.44% [kernel] [k] aio_complete_rw
0.39% [kernel] [k] blkdev_bio_end_io
0.31% [kernel] [k] entry_SYSCALL_64
0.26% [kernel] [k] bio_disassociate_task
0.23% [kernel] [k] read_tsc
0.21% fio [.] axmap_isset
0.20% [kernel] [k] ktime_get_raw_ts64
0.19% [vdso] [.] 0x7ffc475e2b30
0.18% [kernel] [k] gup_pgd_range
0.18% [kernel] [k] entry_SYSCALL_64_after_hwframe
0.16% [kernel] [k] __audit_syscall_exit
0.16% [kernel] [k] __x86_indirect_thunk_rax
0.13% [kernel] [k] copy_user_enhanced_fast_string
0.13% [kernel] [k] syscall_return_via_sysret
0.12% [kernel] [k] preempt_count_add
0.12% [kernel] [k] preempt_count_sub
0.11% [kernel] [k] __x64_sys_clock_gettime
0.11% [kernel] [k] tracer_hardirqs_off
0.10% [kernel] [k] native_write_msr
0.10% [kernel] [k] posix_get_monotonic_raw
0.10% fio [.] get_io_u
pcm.x
http://pasted.co/6fc93b42
pcm-memory.x -pmm
http://pasted.co/d5c0c96b
If fio is not bound to CPUs and a NUMA node, the performance is much lower,
but this optimization is suitable for both conditions; it sometimes gives
about a 40% improvement.
-Li