cpuset: fix race between rebuild scheduler domains and hotplug work
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9ER36

--------------------------------

When offlining cpus, the hotplug path holds cpu_hotplug_lock and calls
cpuset_hotplug_workfn asynchronously. The work function takes and releases
cpuset_mutex repeatedly to update cpusets, and cpu_hotplug_lock is released
before cpuset_hotplug_workfn finishes. This means that interfaces holding
both locks, such as cpuset_write_resmask, may rebuild scheduler domains
while some cpusets have not been refreshed yet. The generated domains may
then contain offlining cpus, which leads to a panic.

As commit 406100f3 ("cpuset: fix race between hotplug work and later CPU
offline") mentioned, this problem happens in cgroup v2. It can also happen
in a cgroup v1 pressure test, which onlines and offlines cpus and sets
cpuset.cpus to rebuild domains with sched_load_balance off.

CPU: 16 PID: 2815 Comm: bash Not tainted 4.19.90+ #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-4
RIP: 0010:build_sched_domains+0x3c5/0xe50
Code: 10 c7 43 48 9a 04 00 00 89 4b 50 f6 c4 02 74 28 48 63 ce 48 8b 45 20 48 8b 0c cd 0
RSP: 0018:ffffc90001e13d58 EFLAGS: 00000202
RAX: 2eeb0d75c0854802 RBX: ffff888103af4600 RCX: ffffffff81046d20
RDX: 0000000000000040 RSI: 0000000000000040 RDI: ffff888103af4738
RBP: ffff88810162c600 R08: ffff888103c858a0 R09: 00000000fffd7bac
R10: 00000000fffd7bac R11: 0000000000000401 R12: 0000000000000002
R13: ffff888103c858a0 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f70a5bf6740(0000) GS:ffff888237800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc660b0020 CR3: 000000010d716000 CR4: 00000000000006e0
Call Trace:
 partition_sched_domains+0x240/0x31a
 cpuset_write_resmask+0x51c/0x770
 kernfs_fop_write+0xf8/0x180
 vfs_write+0xaf/0x190
 ksys_write+0x52/0xc0
 do_syscall_64+0x47/0x170
 entry_SYSCALL_64_after_hwframe+0x5c/0xc1

The cpus in the domains passed to partition_and_rebuild_sched_domains must
be guaranteed to be active, so the domains should be checked after
generate_sched_domains.

Fixes: 388afd85 ("cpuset: remove async hotplug propagation work")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
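
For illustration only, the check described above (every generated domain
must contain only active cpus) could look roughly like the userspace model
below. This is a sketch, not the code in this patch: cpu_mask_t,
mask_is_subset and domains_all_active are hypothetical names, and a plain
64-bit mask stands in for the kernel's cpumask type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per cpu, at most 64 cpus. */
typedef uint64_t cpu_mask_t;

/* True if every cpu set in 'dom' is also set in 'active'. */
static bool mask_is_subset(cpu_mask_t dom, cpu_mask_t active)
{
	return (dom & ~active) == 0;
}

/*
 * Return true only if each of the 'ndoms' generated domains spans
 * active cpus exclusively. A caller modelled on this sketch would
 * skip the rebuild when this returns false (assumption, not taken
 * from the patch itself).
 */
static bool domains_all_active(const cpu_mask_t *doms, size_t ndoms,
			       cpu_mask_t active)
{
	for (size_t i = 0; i < ndoms; i++) {
		if (!mask_is_subset(doms[i], active))
			return false;
	}
	return true;
}
```

In this model, a domain that still lists a cpu which has already left the
active mask makes domains_all_active return false, which corresponds to the
stale-cpuset case in the trace above.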