Greeting,
FYI, we noticed a -9.4% regression of fsmark.files_per_sec due to commit:
commit: c654a418dd5c3518db4d07a31892e38e1e986215 ("btrfs: Do mandatory tree block
check before submitting bio")
https://github.com/kdave/btrfs-devel.git for-next-20190129
in testcase: fsmark
on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory
with following parameters:
iterations: 8
disk: 1SSD
nr_threads: 4
fs: btrfs
filesize: 8K
test_size: 24G
sync_method: fsyncBeforeClose
nr_directories: 16d
nr_files_per_directory: 256fpd
cpufreq_governor: performance
ucode: 0x3d
test-description: The fsmark is a file system benchmark to test synchronous write
workloads, for example, mail servers workload.
test-url:
https://sourceforge.net/projects/fsmark/
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone
https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase/ucode:
gcc-7/performance/1SSD/8K/btrfs/8/x86_64-rhel-7.2/16d/256fpd/4/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-hsw-ep4/24G/fsmark/0x3d
commit:
2b0b391823 ("btrfs: extent_io: Kill the BUG_ON() in extent_writepages()")
c654a418dd ("btrfs: Do mandatory tree block check before submitting bio")
2b0b391823e8c62f c654a418dd5c3518db4d07a318
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
:4 25% 1:4
dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
:4 25% 1:4 dmesg.WARNING:stack_recursion
%stddev %change %stddev
\ | \
8075 -9.4% 7319 fsmark.files_per_sec
409.86 +9.9% 450.32 fsmark.time.elapsed_time
409.86 +9.9% 450.32 fsmark.time.elapsed_time.max
2.873e+08 ± 2% +9.9% 3.158e+08 fsmark.time.file_system_outputs
1288 ± 2% +4.0% 1340 ± 2% fsmark.time.involuntary_context_switches
191.75 +7.6% 206.25 fsmark.time.percent_of_cpu_this_job_got
767.34 +18.8% 911.36 fsmark.time.system_time
23143246 -3.1% 22429576 fsmark.time.voluntary_context_switches
23417504 ± 75% +108.1% 48730239 ± 26% cpuidle.C3.usage
281323 +5.9% 297968 interrupts.CAL:Function_call_interrupts
801.50 ± 70% +52.2% 1219 ± 48% meminfo.Mlocked
0.28 ± 8% -0.0 0.24 ± 5% mpstat.cpu.iowait%
8012562 ± 5% +20.6% 9662962 ± 2% softirqs.RCU
132717 -12.3% 116400 vmstat.system.cs
1.90 ± 5% -0.3 1.58 turbostat.C1%
23416890 ± 75% +108.1% 48729751 ± 26% turbostat.C3
394.25 ± 52% +61.3% 636.00 ± 46% numa-meminfo.node0.Mlocked
135337 ± 14% -22.8% 104431 ± 21% numa-meminfo.node1.Active(anon)
132863 ± 15% -24.3% 100611 ± 22% numa-meminfo.node1.AnonPages
98.00 ± 53% +62.0% 158.75 ± 46% numa-vmstat.node0.nr_mlock
33821 ± 14% -22.8% 26114 ± 21% numa-vmstat.node1.nr_active_anon
33203 ± 15% -24.2% 25160 ± 22% numa-vmstat.node1.nr_anon_pages
33821 ± 14% -22.8% 26114 ± 21% numa-vmstat.node1.nr_zone_active_anon
36119248 ± 2% +9.9% 39682300 proc-vmstat.nr_dirtied
46399 -3.3% 44882 proc-vmstat.nr_dirty
200.25 ± 70% +52.1% 304.50 ± 48% proc-vmstat.nr_mlock
282070 +1.9% 287337 proc-vmstat.nr_slab_unreclaimable
36117868 ± 2% +9.9% 39680736 proc-vmstat.nr_written
46479 -3.2% 44979 proc-vmstat.nr_zone_write_pending
10836003 +2.0% 11056929 proc-vmstat.numa_hit
10821896 +2.0% 11042823 proc-vmstat.numa_local
12662604 +1.8% 12885672 proc-vmstat.pgalloc_normal
1095792 +9.5% 1199656 proc-vmstat.pgfault
1.502e+08 ± 2% +10.0% 1.652e+08 proc-vmstat.pgpgout
97565 ± 5% -10.1% 87759 ± 4% slabinfo.btrfs_delayed_ref_head.active_objs
1917 ± 5% -10.1% 1724 ± 4%
slabinfo.btrfs_delayed_ref_head.active_slabs
97786 ± 5% -10.0% 87963 ± 4% slabinfo.btrfs_delayed_ref_head.num_objs
1917 ± 5% -10.1% 1724 ± 4% slabinfo.btrfs_delayed_ref_head.num_slabs
268953 ± 8% +31.4% 353404 ± 3% slabinfo.btrfs_extent_buffer.active_objs
4816 ± 8% +28.9% 6208 ± 3% slabinfo.btrfs_extent_buffer.active_slabs
279384 ± 8% +28.9% 360143 ± 3% slabinfo.btrfs_extent_buffer.num_objs
4816 ± 8% +28.9% 6208 ± 3% slabinfo.btrfs_extent_buffer.num_slabs
82642 ± 5% -11.7% 73008 ± 4% slabinfo.btrfs_path.active_objs
2296 ± 5% -11.7% 2028 ± 4% slabinfo.btrfs_path.active_slabs
82667 ± 5% -11.7% 73035 ± 4% slabinfo.btrfs_path.num_objs
2296 ± 5% -11.7% 2028 ± 4% slabinfo.btrfs_path.num_slabs
0.15 ±173% +0.6 0.73 ± 20%
perf-profile.calltrace.cycles-pp.__mutex_lock.btrfs_log_inode_parent.btrfs_log_dentry_safe.btrfs_sync_file.do_fsync
0.00 +0.7 0.69 ± 11%
perf-profile.calltrace.cycles-pp.check_leaf.csum_tree_block.btree_csum_one_bio.btree_submit_bio_hook.submit_one_bio
0.00 +0.8 0.76 ± 26%
perf-profile.calltrace.cycles-pp.alloc_reserved_file_extent.__btrfs_run_delayed_refs.btrfs_run_delayed_refs.btrfs_commit_transaction.btrfs_sync_file
0.73 ± 70% +1.0 1.70 ± 12%
perf-profile.calltrace.cycles-pp.__mutex_lock.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync
0.81 ± 69% +1.0 1.79 ± 12%
perf-profile.calltrace.cycles-pp.wait_log_commit.btrfs_sync_log.btrfs_sync_file.do_fsync.__x64_sys_fsync
0.44 ±108% +1.0 1.45 ± 11%
perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.wait_log_commit.btrfs_sync_log.btrfs_sync_file
1.98 ± 41% +1.4 3.34 ± 6%
perf-profile.calltrace.cycles-pp.btree_csum_one_bio.btree_submit_bio_hook.submit_one_bio.btree_write_cache_pages.do_writepages
1.97 ± 41% +1.4 3.33 ± 6%
perf-profile.calltrace.cycles-pp.csum_tree_block.btree_csum_one_bio.btree_submit_bio_hook.submit_one_bio.btree_write_cache_pages
2.18 ± 41% +1.4 3.55 ± 6%
perf-profile.calltrace.cycles-pp.submit_one_bio.btree_write_cache_pages.do_writepages.__filemap_fdatawrite_range.btrfs_write_marked_extents
2.18 ± 41% +1.4 3.55 ± 6%
perf-profile.calltrace.cycles-pp.btree_submit_bio_hook.submit_one_bio.btree_write_cache_pages.do_writepages.__filemap_fdatawrite_range
0.00 +1.4 1.38 ± 28%
perf-profile.calltrace.cycles-pp.btrfs_run_delayed_refs.btrfs_commit_transaction.btrfs_sync_file.do_fsync.__x64_sys_fsync
0.00 +1.4 1.38 ± 28%
perf-profile.calltrace.cycles-pp.__btrfs_run_delayed_refs.btrfs_run_delayed_refs.btrfs_commit_transaction.btrfs_sync_file.do_fsync
0.00 +2.1 2.10 ± 29%
perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.do_fsync.__x64_sys_fsync.do_syscall_64
29280 ± 7% +18.8% 34793 ± 11% sched_debug.cfs_rq:/.exec_clock.max
7228 ± 3% +28.7% 9299 ± 8% sched_debug.cfs_rq:/.exec_clock.stddev
94886 ± 6% +22.5% 116219 ± 14% sched_debug.cfs_rq:/.min_vruntime.max
21547 ± 6% +32.3% 28510 ± 12% sched_debug.cfs_rq:/.min_vruntime.stddev
79973 ± 2% +27.8% 102211 ± 18% sched_debug.cfs_rq:/.spread0.max
21548 ± 6% +32.3% 28510 ± 12% sched_debug.cfs_rq:/.spread0.stddev
262.03 ± 4% +15.4% 302.32 ± 8% sched_debug.cpu.curr->pid.avg
1176 +12.5% 1323 ± 3% sched_debug.cpu.curr->pid.stddev
192468 ± 6% +12.2% 215926 sched_debug.cpu.nr_load_updates.avg
199740 ± 6% +11.6% 222824 sched_debug.cpu.nr_load_updates.max
189228 ± 7% +12.5% 212828 sched_debug.cpu.nr_load_updates.min
422023 ± 2% +12.9% 476640 ± 2% sched_debug.cpu.nr_switches.stddev
421731 ± 2% +12.9% 476244 ± 2% sched_debug.cpu.sched_count.stddev
210686 ± 2% +12.9% 237967 ± 2% sched_debug.cpu.sched_goidle.stddev
240403 ± 4% +12.3% 270038 ± 6% sched_debug.cpu.ttwu_count.stddev
48292 ± 7% +18.1% 57018 sched_debug.cpu.ttwu_local.stddev
33.52 ± 8% -23.1% 25.79 ± 6% perf-stat.i.MPKI
8.742e+08 ± 9% +19.5% 1.045e+09 ± 5% perf-stat.i.branch-instructions
133650 -12.4% 117116 perf-stat.i.context-switches
2.56 ± 9% -15.6% 2.16 ± 4% perf-stat.i.cpi
1.68 ± 9% -19.5% 1.36 ± 13% perf-stat.i.cpu-migrations
1.516e+09 ± 8% +15.6% 1.753e+09 ± 6% perf-stat.i.dTLB-loads
5.121e+09 ± 8% +20.7% 6.181e+09 ± 6% perf-stat.i.instructions
0.43 ± 6% +17.8% 0.51 ± 4% perf-stat.i.ipc
30.72 ± 6% -22.7% 23.76 ± 4% perf-stat.overall.MPKI
2.22 ± 5% -14.8% 1.89 ± 4% perf-stat.overall.cpi
0.48 ± 10% -0.1 0.40 ± 5% perf-stat.overall.dTLB-load-miss-rate%
0.45 ± 6% +17.2% 0.53 ± 4% perf-stat.overall.ipc
8.72e+08 ± 9% +19.5% 1.042e+09 ± 5% perf-stat.ps.branch-instructions
133314 -12.4% 116849 perf-stat.ps.context-switches
1.68 ± 9% -19.5% 1.35 ± 13% perf-stat.ps.cpu-migrations
1.513e+09 ± 8% +15.6% 1.749e+09 ± 6% perf-stat.ps.dTLB-loads
5.108e+09 ± 8% +20.7% 6.167e+09 ± 6% perf-stat.ps.instructions
2.101e+12 ± 9% +32.5% 2.783e+12 ± 5% perf-stat.total.instructions
fsmark.time.system_time
940 +-+----------------------O----O---------------------------------------+
920 +-+ O O OOOO |
O O OOO O O O |
900 +O+ O O O |
880 +-O |
| O O O O |
860 +-+ O O O |
840 +-+ O |
820 +-+ |
| |
800 +-+ + |
780 +-+ + :+ + + + + + + + + ++ + |
|+ : + ++ + + + + + : + + + :+ : + : :++. + + +.++ ++ + |
760 +-+++ + + +++ ++ + +++ ++ + + + + ++++ + +|
740 +-+-------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen