maillist inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I9EHKI
CVE: NA

Reference: https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@amd.com/

--------------------------------

With the curr entity's eligibility check, a wakeup preemption is very
likely when an entity with positive lag joins the runqueue pushing the
avg_vruntime of the runqueue backwards, making the vruntime of the
current entity ineligible. This leads to aggressive wakeup preemption
which was previously guarded by wakeup_granularity_ns in legacy CFS.
Below figure depicts one such aggressive preemption scenario with EEVDF
in DeathStarBench [1]:

                               deadline for Nginx
                                        |
        +-------+  |                    |
    /-- | Nginx | -|------------------> |
    |   +-------+  |                    |
    |              |
    |   -----------|-------------------------------> vruntime timeline
    |              \--> rq->avg_vruntime
    |
    |   wakes service on the same runqueue since system is busy
    |
    |   +---------+|
    \-->| Service || (service has +ve lag pushes avg_vruntime backwards)
        +---------+|
          |        |
 wakeup   |     +--|-----+                       |
 preempts \---->| N|ginx | --------------------> | {deadline for Nginx}
                +--|-----+                       |
                (Nginx ineligible)
        -----------|-------------------------------> vruntime timeline
                   \--> rq->avg_vruntime

When NGINX server is involuntarily switched out, it cannot accept any
incoming request, leading to longer turn around time for the clients and
thus loss in DeathStarBench throughput.

    ==================================================================
    Test          : DeathStarBench
    Units         : Normalized latency
    Interpretation: Lower is better
    Statistic     : Mean
    ==================================================================
    tip             1.00
    eevdf           1.14 (+14.61%)

For current running task, skip eligibility check in pick_eevdf() if it
has not exhausted the slice promised to it during selection despite the
situation having changed since. The behavior is guarded by
RUN_TO_PARITY_WAKEUP sched_feat to simplify testing.

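
The decision described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel
code: struct toy_entity, toy_eligible(), toy_slice_left() and toy_pick()
are hypothetical names, only two entities are considered, and "slice not
exhausted" is approximated as vruntime < deadline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the scheduler entity. */
struct toy_entity {
    int64_t vruntime;   /* virtual runtime consumed so far */
    int64_t deadline;   /* virtual deadline of the current slice */
};

/* EEVDF eligibility: non-negative lag, i.e. vruntime <= avg_vruntime. */
static bool toy_eligible(const struct toy_entity *se, int64_t avg_vruntime)
{
    return se->vruntime <= avg_vruntime;
}

/* Assumption: the slice lasts until vruntime reaches the deadline. */
static bool toy_slice_left(const struct toy_entity *se)
{
    return se->vruntime < se->deadline;
}

/*
 * Pick between the running entity and a freshly woken one.
 * run_to_parity_wakeup models the RUN_TO_PARITY_WAKEUP feature bit.
 */
static const struct toy_entity *
toy_pick(const struct toy_entity *curr, const struct toy_entity *woken,
         int64_t avg_vruntime, bool run_to_parity_wakeup)
{
    bool woken_ok = toy_eligible(woken, avg_vruntime);

    /* Skip curr's eligibility check while its promised slice lasts. */
    if (run_to_parity_wakeup && toy_slice_left(curr))
        return curr;

    /* Otherwise an ineligible curr loses to an eligible woken entity. */
    if (!toy_eligible(curr, avg_vruntime))
        return woken_ok ? woken : curr;

    /* Both eligible: earliest virtual deadline first. */
    if (woken_ok && woken->deadline < curr->deadline)
        return woken;

    return curr;
}
```

With curr at vruntime 100 and deadline 130, a woken entity at vruntime 80
and avg_vruntime pushed back to 90, the model picks the woken entity when
the flag is off and keeps curr when the flag is on, until curr's vruntime
reaches its deadline.
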
With RUN_TO_PARITY_WAKEUP enabled, performance loss seen with
DeathStarBench since the merge of EEVDF disappears. Following are the
results from testing on a Dual Socket 3rd Generation EPYC server
(2 x 64C/128T):

    ==================================================================
    Test          : DeathStarBench
    Units         : Normalized throughput
    Interpretation: Higher is better
    Statistic     : Mean
    ==================================================================
    Pinning      scaling     tip            run-to-parity-wakeup(pct imp)
     1CCD           1       1.00            1.16 (%diff: 16%)
     2CCD           2       1.00            1.03 (%diff: 3%)
     4CCD           4       1.00            1.12 (%diff: 12%)
     8CCD           8       1.00            1.05 (%diff: 6%)

With spec_rstack_overflow=off, the DeathStarBench performance with the
proposed solution is same as the performance on v6.5 release before
EEVDF was merged.

This may lead to newly waking task waiting longer for its turn on the
CPU, however, testing on the same system did not reveal any consistent
regressions with the standard benchmarks.

Link: https://github.com/delimitrou/DeathStarBench/ [1]
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>