On 3/5/20 8:39 PM, Mel Gorman wrote:
On Thu, Mar 05, 2020 at 07:15:40PM +0800, Chen, Rong A wrote:
>
> On 3/5/2020 6:12 PM, Mel Gorman wrote:
>> On Thu, Mar 05, 2020 at 10:58:22AM +0800, Rong Chen wrote:
>>> Hi,
>>>
>>> I tested on branch tip/sched/core, the regression is still there.
>>>
>>>
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/ucode:
>>>
>>> gcc-7/performance/x86_64-rhel-7.6/debian-x86_64-phoronix/lkp-nhm-2ep1/aom-av1-1.2.0/phoronix-test-suite/0x1d
>>>
>>> commit:
>>> 6d4d22468dae3d8757af9f8b81b848a76ef4409d ("sched/fair: Reorder enqueue/dequeue_task_fair path")
>>> 6499b1b2dd1b8d404a16b9fbbf1af6b9b3c1d83d ("sched/numa: Replace runnable_load_avg by load_avg")
>>> 253e2b69ef8fda4d9345ff496b12058faaeeff6b ("sched/fair: fix statistics for find_idlest_group()")
>>>
>> Are you sure? I ask because tip/sched/core does not have the
>> patch "sched/fair: fix statistics for find_idlest_group" in it
>> nor is commit 253e2b69ef8fda4d9345ff496b12058faaeeff6b part of the
>> tip/sched/core history. You'd need to test the full series in the current
>> tip/sched/core with minimally 289de3598481 ("sched/fair: Fix statistics
>> for find_idlest_group()") from tip/sched/urgent on top. That's still
>> missing two fixes but one is a build issue and the other is a missing
>> rcu_read_lock that is unlikely to cause corruption unless there is a
>> hotplug event during the test.
> Yes, commit 6d4d22468dae3 is not from tip/sched/core, I created it based on
> tip/sched/core.
>
Understood.
> $ git log --oneline 6d4d22468dae3d8757af9f8b81b848a76ef4409d~..253e2b69ef8fda4d9345ff496b12058faaeeff6b
> 253e2b69ef8fd sched/fair: fix statistics for find_idlest_group()
> a0f03b617c3b2 sched/numa: Stop an exhastive search if a reasonable swap
> candidate or idle CPU is found
> 88cca72c9673e sched/numa: Bias swapping tasks based on their preferred node
> 5fb52dd93a2fe sched/numa: Find an alternative idle CPU if the CPU is part of
> an active NUMA balance
> ff7db0bf24db9 sched/numa: Prefer using an idle CPU as a migration target
> instead of comparing tasks
> 070f5e860ee2b sched/fair: Take into account runnable_avg to classify group
> 9f68395333ad7 sched/pelt: Add a new runnable average signal
> 0dacee1bfa70e sched/pelt: Remove unused runnable load average
> fb86f5b211924 sched/numa: Use similar logic to the load balancer for moving
> between domains with spare capacity
> 6499b1b2dd1b8 sched/numa: Replace runnable_load_avg by load_avg
> 6d4d22468dae3 sched/fair: Reorder enqueue/dequeue_task_fair path
>
Excellent. Now in your previous mail, the regression was reported based
on the following:

commit:
6d4d22468dae3d8757af9f8b81b848a76ef4409d ("sched/fair: Reorder enqueue/dequeue_task_fair path")
6499b1b2dd1b8d404a16b9fbbf1af6b9b3c1d83d ("sched/numa: Replace runnable_load_avg by load_avg")
253e2b69ef8fda4d9345ff496b12058faaeeff6b ("sched/fair: fix statistics for find_idlest_group()")

Can you confirm whether the report is based on just those commits or the
entire series? If it's not the entire series, can you give me the report
for the full series please? We know for a fact that this series is not
bisection safe in terms of performance.
Hi,
Sorry for the late reply. Yes, the report is based on the entire series;
the result for the head commit is 0.03:
commit:
a0f03b617c3b2644d3d47bf7d9e60aed01bd5b10 ("sched/numa: Stop an exhastive search if a reasonable swap candidate or idle CPU is found")

$ ag -A9 phoronix-test-suite.aom-av1.0.frames_per_second /result/phoronix-test-suite/performance-aom-av1-1.2.0-ucode=0x1d/lkp-nhm-2ep1/debian-x86_64-phoronix/x86_64-rhel-7.6/gcc-7/a0f03b617c3b2644d3d47bf7d9e60aed01bd5b10/matrix.json
2: "phoronix-test-suite.aom-av1.0.frames_per_second": [
3- 0.03,
4- 0.03,
5- 0.03,
6- 0.03,
7- 0.03,
8- 0.03,
9- 0.03,
10- 0.03
11- ],
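
To summarize the samples rather than read them from the grep output, a minimal Python sketch follows. It assumes matrix.json is a JSON object mapping metric names to lists of per-run samples, as the excerpt above suggests. The helper name summarize_metric is hypothetical, and the inline data stands in for loading the real file with json.load().

```python
import json
import statistics

METRIC = "phoronix-test-suite.aom-av1.0.frames_per_second"


def summarize_metric(matrix, metric=METRIC):
    """Return (count, mean, min, max) for one metric's list of samples."""
    samples = matrix[metric]
    return len(samples), statistics.mean(samples), min(samples), max(samples)


# Inline stand-in for the matrix.json excerpt quoted above (assumption:
# the real file has the same shape and may contain other metrics too).
matrix = json.loads(
    '{"phoronix-test-suite.aom-av1.0.frames_per_second":'
    " [0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03]}"
)

count, mean, low, high = summarize_metric(matrix)
print(f"{count} runs: mean={mean:.2f} min={low:.2f} max={high:.2f}")
```

Identical min and max across all runs indicate the result is stable from run to run rather than noisy.
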
>> Also, can you tell me more about the hardware? The stats indicate it's NUMA
>> but it only has 16 threads which seems very low for a modern NUMA machine.
>>
> root@lkp-nhm-2ep1:~# lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Thread(s) per core: 2
> Core(s) per socket: 4
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 26
> Model name: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
> Stepping: 5
> CPU MHz: 1655.106
> CPU max MHz: 2927.0000
> CPU min MHz: 1596.0000
> BogoMIPS: 5852.83
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 8192K
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
> cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp
> tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d
>
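
The thread count in the lscpu output above follows from the topology: sockets times cores per socket times threads per core. A minimal Python sketch of that arithmetic, assuming the usual "Key: value" lscpu text layout (the helper name logical_cpus is hypothetical, and the sample is trimmed to the relevant fields):

```python
def logical_cpus(lscpu_text):
    """Compute the logical CPU count from lscpu-style 'Key: value' lines."""
    fields = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no "key: value" shape
            fields[key.strip()] = value.strip()
    return (
        int(fields["Socket(s)"])
        * int(fields["Core(s) per socket"])
        * int(fields["Thread(s) per core"])
    )


# Fields copied from the lscpu output quoted above.
sample = """\
CPU(s): 16
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
"""

total = logical_cpus(sample)
print(total)
```

With 2 sockets, 4 cores per socket and 2 threads per core this gives the 16 logical CPUs reported, split evenly across the two NUMA nodes.
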
Ok, that makes some sense. It's an 11-year-old Nehalem machine that is
no longer manufactured. No wonder I did not catch anything in my own
tests.
Can you tell me if this regression is machine-specific or are you seeing
it on a range of machines, particularly newer ones?
I tested on a Cascade Lake machine and don't see the regression there,
so it looks machine-specific, at least based on these two machines.
Best Regards,
Rong Chen