Commit dbc510f3 authored by Wei Li's avatar Wei Li Committed by Zheng Yejian
Browse files

tracing/timerlat: Fix a race during cpuhp processing

stable inclusion
from stable-v6.6.55
commit a6e9849063a6c8f4cb2f652a437e44e3ed24356c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAYRAT
CVE: CVE-2024-49866

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a6e9849063a6c8f4cb2f652a437e44e3ed24356c

--------------------------------

commit 829e0c9f0855f26b3ae830d17b24aec103f7e915 upstream.

There is another found exception that the "timerlat/1" thread was
scheduled on CPU0, and lead to timer corruption finally:

```
ODEBUG: init active (active state 0) object: ffff888237c2e108 object type: hrtimer hint: timerlat_irq+0x0/0x220
WARNING: CPU: 0 PID: 426 at lib/debugobjects.c:518 debug_print_object+0x7d/0xb0
Modules linked in:
CPU: 0 UID: 0 PID: 426 Comm: timerlat/1 Not tainted 6.11.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:debug_print_object+0x7d/0xb0
...
Call Trace:
 <TASK>
 ? __warn+0x7c/0x110
 ? debug_print_object+0x7d/0xb0
 ? report_bug+0xf1/0x1d0
 ? prb_read_valid+0x17/0x20
 ? handle_bug+0x3f/0x70
 ? exc_invalid_op+0x13/0x60
 ? asm_exc_invalid_op+0x16/0x20
 ? debug_print_object+0x7d/0xb0
 ? debug_print_object+0x7d/0xb0
 ? __pfx_timerlat_irq+0x10/0x10
 __debug_object_init+0x110/0x150
 hrtimer_init+0x1d/0x60
 timerlat_main+0xab/0x2d0
 ? __pfx_timerlat_main+0x10/0x10
 kthread+0xb7/0xe0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2d/0x40
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
```

After tracing the scheduling event, it was discovered that the migration
of the "timerlat/1" thread was performed during thread creation. Further
analysis confirmed that it is because the CPU online processing for
osnoise is implemented through workers, which is asynchronous with the
offline processing. When the worker was scheduled to create a thread, the
CPU may has already been removed from the cpu_online_mask during the offline
process, resulting in the inability to select the right CPU:

T1                       | T2
[CPUHP_ONLINE]           | cpu_device_down()
osnoise_hotplug_workfn() |
                         |     cpus_write_lock()
                         |     takedown_cpu(1)
                         |     cpus_write_unlock()
[CPUHP_OFFLINE]          |
    cpus_read_lock()     |
    start_kthread(1)     |
    cpus_read_unlock()   |

To fix this, skip online processing if the CPU is already offline.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240924094515.3561410-4-liwei391@huawei.com


Fixes: c8895e27 ("trace/osnoise: Support hotplug operations")
Signed-off-by: default avatarWei Li <liwei391@huawei.com>
Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: default avatarZheng Yejian <zhengyejian1@huawei.com>
parent ed7fbeaf
Loading
Loading
Loading
Loading
+2 −0
Original line number Diff line number Diff line
@@ -2095,6 +2095,8 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
	mutex_lock(&interface_lock);
	cpus_read_lock();

	if (!cpu_online(cpu))
		goto out_unlock;
	if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
		goto out_unlock;