Hi Michal Hocko,
On Wed, Nov 24, 2021 at 06:01:12PM +0100, Michal Hocko wrote:
On Wed 24-11-21 16:34:35, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a 10.3% improvement of hackbench.throughput due to commit:
> >
> >
> > commit: 58056f77502f3567b760c9a8fc8d2e9081515b2d ("memcg, kmem: further
> > deprecate kmem.limit_in_bytes")
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> I am really surprised to see an improvement from this patch. I do not
> expect your benchmarking would be using kmem limit. The above patch
> hasn't really removed the page counter out of the picture so there
> shouldn't be any real reason for performance improvement. I strongly
> suspect this is just some benchmark artifact or unreliable evaluation.
Fengwei Yin helped analyze this improvement further.

The patch changed the behavior of obj_cgroup_charge_pages(). The change
shows up in the perf call stack as the following line:
      5.63 ± 11%      -5.6        0.00        perf-profile.calltrace.cycles-pp.page_counter_try_charge.obj_cgroup_charge_pages.obj_cgroup_charge.kmem_cache_alloc_node.__alloc_skb
So Fengwei prepared a patch (attached as mod.patch) that reverts the
changes 58056f7750 made to obj_cgroup_charge_pages. With that patch
applied, performance is similar to 16f6bf266c and the improvement
disappears.
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
gcc-9/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/hackbench/0x700001e
commit:
16f6bf266c ("mm/list_lru.c: prefer struct_size over open coded arithmetic")
58056f7750 ("memcg, kmem: further deprecate kmem.limit_in_bytes")
ae12af515d ('58056f7750' minus 'changes in obj_cgroup_charge_pages',
attached mod.patch)
16f6bf266c94017c 58056f77502f3567b760c9a8fc8 ae12af515da0d557c25f86e89b0
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
    124966            +8.8%     136017 ±  2%      -0.1%     124791 ±  2%  hackbench.throughput
...
      5.41 ± 12%      -5.4        0.00            +0.3        5.73 ± 13%  perf-profile.calltrace.cycles-pp.page_counter_try_charge.obj_cgroup_charge_pages.obj_cgroup_charge.kmem_cache_alloc_node.__alloc_skb
Detailed comparison data is attached as 16f6b-58056-ae12a.

In brief, the result proves what we suspected: the original patch removed
the line

-               !page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {

and removing that extra charge is what improved hackbench throughput.
Thanks.