Commit f3ac4475 authored by Chen Ridong

cpuset: fix race between rebuild scheduler domains and hotplug work

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9ER36



--------------------------------

When CPUs are offlined, the hotplug path holds cpu_hotplug_lock and calls
cpuset_hotplug_workfn asynchronously. cpuset_hotplug_workfn acquires and
releases cpuset_mutex repeatedly while it updates the cpusets, and
cpu_hotplug_lock is released before cpuset_hotplug_workfn finishes. As a
result, interfaces such as cpuset_write_resmask, which hold both locks, may
rebuild the scheduler domains while some cpusets have not been refreshed
yet. The generated domains can then contain CPUs that are being offlined,
which leads to a panic.
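
The ordering problem can be replayed in a small single-threaded userspace
model. This is a hypothetical sketch, not kernel code. The names
race_model, model_offline_cpu, model_hotplug_work,
model_generate_domain and model_domain_has_inactive_cpu are invented for
illustration, and each mask is assumed to be a 64-bit bitmap where bit N
stands for CPU N. The model leaves out the locking and only shows what
happens when domains are generated after a CPU has left the active mask
but before the deferred cpuset refresh has run.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical single-threaded replay of the race window (not kernel
 * code). Masks are 64-bit bitmaps: bit N set means CPU N is present.
 */
struct race_model {
	uint64_t active;    /* stands in for cpu_active_mask */
	uint64_t effective; /* stands in for one cpuset's effective CPUs */
};

/* Hotplug path: the CPU leaves the active mask right away. */
static void model_offline_cpu(struct race_model *m, int cpu)
{
	m->active &= ~(UINT64_C(1) << cpu);
}

/* Deferred cpuset_hotplug_workfn stand-in: refresh the cpuset later. */
static void model_hotplug_work(struct race_model *m)
{
	m->effective &= m->active;
}

/* generate_sched_domains stand-in: build a domain from the cpuset. */
static uint64_t model_generate_domain(const struct race_model *m)
{
	return m->effective;
}

/* True if the domain still contains a CPU that is no longer active. */
static bool model_domain_has_inactive_cpu(const struct race_model *m,
					  uint64_t dom)
{
	return (dom & ~m->active) != 0;
}
```

Offlining CPU 3 from a 4-CPU model and generating a domain before
model_hotplug_work runs yields the stale mask 0xF, which still contains
CPU 3. Generating it after the refresh yields 0x7.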

As commit 406100f3 ("cpuset: fix race between hotplug work and later
CPU offline") mentioned, this problem happens in cgroup v2.

This problem can also happen in cgroup v1 under a stress test that
repeatedly onlines and offlines CPUs while writing cpuset.cpus to rebuild
domains with sched_load_balance off:

CPU: 16 PID: 2815 Comm: bash Not tainted 4.19.90+ #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-4
RIP: 0010:build_sched_domains+0x3c5/0xe50
Code: 10 c7 43 48 9a 04 00 00 89 4b 50 f6 c4 02 74 28 48 63 ce 48 8b 45 20 48 8b 0c cd 0
RSP: 0018:ffffc90001e13d58 EFLAGS: 00000202
RAX: 2eeb0d75c0854802 RBX: ffff888103af4600 RCX: ffffffff81046d20
RDX: 0000000000000040 RSI: 0000000000000040 RDI: ffff888103af4738
RBP: ffff88810162c600 R08: ffff888103c858a0 R09: 00000000fffd7bac
R10: 00000000fffd7bac R11: 0000000000000401 R12: 0000000000000002
R13: ffff888103c858a0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f70a5bf6740(0000) GS:ffff888237800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc660b0020 CR3: 000000010d716000 CR4: 00000000000006e0
Call Trace:
 partition_sched_domains+0x240/0x31a
 cpuset_write_resmask+0x51c/0x770
 kernfs_fop_write+0xf8/0x180
 vfs_write+0xaf/0x190
 ksys_write+0x52/0xc0
 do_syscall_64+0x47/0x170
 entry_SYSCALL_64_after_hwframe+0x5c/0xc1

All CPUs in the domains passed to partition_and_rebuild_sched_domains
must be active, so the domains should be checked after
generate_sched_domains returns them.
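
The check described above can be modelled in userspace as a subset test
of every generated domain against the active mask. This is a hedged
sketch and not the code of this patch. model_cpumask_t,
model_cpumask_subset and model_domains_are_active are hypothetical names,
and masks are assumed to be 64-bit bitmaps. Inside the kernel, the
comparable primitive would be cpumask_subset() against cpu_active_mask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model (not kernel code): a cpumask is a 64-bit
 * bitmap in which bit N set means CPU N is in the mask.
 */
typedef uint64_t model_cpumask_t;

/* Model of a subset test: true if every CPU in @a is also in @b. */
static bool model_cpumask_subset(model_cpumask_t a, model_cpumask_t b)
{
	return (a & ~b) == 0;
}

/*
 * Check every domain produced by domain generation before it would be
 * handed to the rebuild step. Returns false (the caller should skip the
 * rebuild) if any domain contains a CPU missing from @active_mask.
 */
static bool model_domains_are_active(const model_cpumask_t *doms, int ndoms,
				     model_cpumask_t active_mask)
{
	for (int i = 0; i < ndoms; i++) {
		if (!model_cpumask_subset(doms[i], active_mask))
			return false;
	}
	return true;
}
```

With CPUs 0 to 2 active (mask 0x7), the domain set {0x3, 0xC} is rejected
because 0xC still includes the offlined CPU 3. The set {0x3, 0x4} passes.
If the rebuild is skipped, the refresh done later by cpuset_hotplug_workfn
is the point at which consistent domains would be rebuilt.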

Fixes: 388afd85 ("cpuset: remove async hotplug propagation work")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
parent 28b2cacc