Commit 04da62af authored by Wenyu Huang, committed by liukai

sched/fair: Fix qos_timer deadlock when cpuhp offline

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB7GK5



--------------------------------

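When a CPU is taken offline through CPU hotplug while
qos_overload_timer_handler() is running concurrently, an ABBA deadlock can
occur. qos_overload_timer_handler() needs the rq lock, but the hotplug path
has already taken that lock (in rq_attach_root()) and then, in
unthrottle_qos_cfs_rqs(), waits in hrtimer_cancel() for the running qos_timer
handler to finish. Neither side can make progress, which results in a hard
lockup like: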

[359230.788754] Call trace:
[359230.788755] hrtimer_active+0x7c/0xec
[359230.788757] hrtimer_cancel+0x3c/0x60
[359230.788758] unthrottle_qos_cfs_rqs+0xbc/0x110
[359230.788760] unthrottle_offline_cfs_rqs+0x40/0x150
[359230.788762] rq_offline_fair+0x60/0x70
[359230.788764] set_rq_offline.part.0+0x54/0xf4
[359230.788765] set_rq_offline+0x34/0x44
[359230.788767] rq_attach_root+0x1e8/0x260
[359230.788768] cpu_attach_domain+0x244/0x430
[359230.788770] detach_destroy_domains+0xbc/0x140
[359230.788772] partition_sched_domains_locked+0x23c/0x314
[359230.788774] rebuild_sched_domains_locked+0x1f0/0x270
[359230.788776] cpuset_hotplug_workfn+0x514/0x74c
[359230.788777] process_one_work+0x34c/0x800
[359230.788779] worker_thread+0xa8/0x500
[359230.788780] kthread+0x1e0/0x220
[359230.788782] ret_from_fork+0x10/0x18
[359230.788783] Kernel panic - not syncing: Hard LOCKUP

Fix it by calling __unthrottle_qos_cfs_rqs() instead of
unthrottle_qos_cfs_rqs() in unthrottle_offline_cfs_rqs(), and by dropping the
second unthrottle_qos_cfs_rqs() call at the end of that function, so that
cancel_qos_timer() is no longer triggered on the CPU hotplug offline path.

Fixes: 926b9b0c ("sched: Throttle qos cfs_rq when current cpu is running online task")
Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Signed-off-by: Wenyu Huang <huangwenyu5@huawei.com>
Signed-off-by: Liu Kai <liukai284@huawei.com>
parent 95bc6947
+2 −4
@@ -152,6 +152,7 @@ unsigned int sysctl_overload_detect_period = 5000; /* in ms */
 unsigned int sysctl_offline_wait_interval = 100;  /* in ms */
 static int one_thousand = 1000;
 static int hundred_thousand = 100000;
+static int __unthrottle_qos_cfs_rqs(int cpu);
 static int unthrottle_qos_cfs_rqs(int cpu);
 static bool qos_smt_expelled(int this_cpu);
 #endif
@@ -6686,7 +6687,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	 */
 	rq_clock_start_loop_update(rq);
 #ifdef CONFIG_QOS_SCHED
-	unthrottle_qos_cfs_rqs(cpu_of(rq));
+	__unthrottle_qos_cfs_rqs(cpu_of(rq));
 #endif
 
 	rcu_read_lock();
@@ -6713,9 +6714,6 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	rcu_read_unlock();
 
 	rq_clock_stop_loop_update(rq);
-#ifdef CONFIG_QOS_SCHED
-	unthrottle_qos_cfs_rqs(cpu_of(rq));
-#endif
 }
 
 bool cfs_task_bw_constrained(struct task_struct *p)