hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8YPFU
CVE: NA

-------------------------------------------------

When I ran a stress test that mixes PCIe hotplug events with removal
operations through sysfs, I got a hung task, and the following call
trace was printed.

[ 242.683775] INFO: task irq/52-pciehp:128 blocked for more than 120 seconds.
[ 242.684615] Not tainted 6.6.0+ #168
[ 242.685065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.685932] task:irq/52-pciehp state:D stack:0 pid:128 ppid:2 flags:0x00000008
[ 242.686858] Call trace:
[ 242.687178] __switch_to+0xa8/0xc8
[ 242.687566] __schedule+0x1ec/0x4a0
[ 242.688019] schedule+0x2c/0x80
[ 242.688382] schedule_preempt_disabled+0x18/0x30
[ 242.688905] __mutex_lock.isra.0+0x140/0x2e0
[ 242.689385] __mutex_lock_slowpath+0x1c/0x30
[ 242.689867] mutex_lock+0x6c/0x88
[ 242.690239] pci_lock_rescan_remove+0x24/0x38
[ 242.690346] EXT4-fs error (device vda): ext4_lookup:1857: inode #37226: comm systemd-journal: deleted inode referenced: 41314
[ 242.690730] pciehp_configure_device+0x34/0x168
[ 242.692589] board_added+0xf8/0x160
[ 242.692979] __pciehp_enable_slot+0x44/0xf8
[ 242.693446] pciehp_enable_slot+0x40/0xd8
[ 242.693892] pciehp_handle_presence_or_link_change+0xfc/0x208
[ 242.694530] pciehp_ist+0x1c4/0x1d0
[ 242.694920] irq_thread_fn+0x34/0xb8
[ 242.695332] irq_thread+0xd8/0x198
[ 242.695725] kthread+0xe4/0xf0
[ 242.696072] ret_from_fork+0x10/0x20
[ 242.696483] INFO: task bash:432 blocked for more than 120 seconds.
[ 242.697176] Not tainted 6.6.0+ #168
[ 242.697625] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.698495] task:bash state:D stack:0 pid:432 ppid:400 flags:0x00000008
[ 242.699438] Call trace:
[ 242.699726] __switch_to+0xa8/0xc8
[ 242.700116] __schedule+0x1ec/0x4a0
[ 242.700511] schedule+0x2c/0x80
[ 242.700867] __synchronize_irq+0x80/0xc0
[ 242.701308] __free_irq+0xf0/0x308
[ 242.701691] free_irq+0x3c/0x90
[ 242.702046] pcie_shutdown_notification+0x48/0x90
[ 242.702574] pciehp_remove+0x30/0x60
[ 242.702977] pcie_port_remove_service+0x40/0x70
[ 242.703495] device_remove+0x78/0x90
[ 242.703934] __device_release_driver+0x150/0x1c8
[ 242.704453] device_release_driver+0x34/0x58
[ 242.704929] bus_remove_device+0xd8/0x158
[ 242.705374] device_del+0x168/0x2f8
[ 242.705766] device_unregister+0x28/0x88
[ 242.706205] remove_iter+0x34/0x50
[ 242.706585] device_for_each_child+0x64/0xb8
[ 242.707061] pcie_portdrv_remove+0x40/0xb0
[ 242.707525] pci_device_remove+0x44/0xa8
[ 242.708006] device_remove+0x54/0x90
[ 242.708407] __device_release_driver+0x150/0x1c8
[ 242.708921] device_release_driver+0x34/0x58
[ 242.709396] pci_stop_dev+0x44/0xa8
[ 242.709785] pci_stop_bus_device+0x64/0x80
[ 242.710240] pci_stop_and_remove_bus_device_locked+0x60/0xd8
[ 242.710866] remove_store+0x98/0xb0
[ 242.711264] dev_attr_store+0x20/0x40
[ 242.711672] sysfs_kf_write+0x4c/0x68
[ 242.712142] kernfs_fop_write_iter+0x130/0x1c8
[ 242.712641] new_sync_write+0xa4/0x128
[ 242.713060] vfs_write.part.0+0x128/0x168
[ 242.713503] vfs_write+0x9c/0xf0
[ 242.713864] ksys_write+0x74/0x108
[ 242.714245] __arm64_sys_write+0x24/0x38
[ 242.714682] invoke_syscall.constprop.0+0x54/0xe8
[ 242.715214] el0_svc_common.constprop.0+0x44/0xc8
[ 242.715752] do_el0_svc+0x1c/0x40
[ 242.716125] el0_svc+0x4c/0xd8
[ 242.716470] el0t_64_sync_handler+0xc0/0xc8
[ 242.716936] el0t_64_sync+0x1a4/0x1a8

When we remove a slot through sysfs,
'pci_stop_and_remove_bus_device_locked()' is called. This function takes
the global mutex 'pci_rescan_remove_lock' and then removes the slot.
If the irq thread 'pciehp_ist' is still running, we wait until it exits.
If a pciehp interrupt arrives immediately after we start removing the
slot through sysfs, but before we free the pciehp irq in
'pci_stop_and_remove_bus_device_locked()', 'pciehp_ist' hangs because
the global mutex 'pci_rescan_remove_lock' is held by the sysfs
operation. The sysfs operation, in turn, is waiting for the pciehp irq
thread 'pciehp_ist' to end. Neither side can make progress, so a hung
task occurs.

So these two kinds of operation, removal through the attention button
and removal through /sys/devices/pci***/remove, must not be executed at
the same time. This patch adds a global variable to mark that one of
these operations is in progress. While this variable is set, any other
requested operation is rejected.

We use a global variable 'slot_being_removed_rescanned' to mark whether
a slot is being removed or rescanned. This causes a slot hotplug
operation to be delayed whenever another slot is being removed or
rescanned. But if the two slots are under different root ports, they
should not influence each other. This patch makes the flag
'slot_being_removed_rescanned' per root port, so that a hotplug
operation on one slot doesn't influence slots below another root port.

We record the root port in struct pci_dev when the pci device is
initialized and added to the system, instead of calling
'pcie_find_root_port()' to find the root port when we need it. This is
because iterating the pci tree needs the protection of
'pci_lock_rescan_remove()', and taking that lock here would make the
problem more complex because the lock is very coarse-grained. We don't
need to worry about use-after-free because child pci devices are always
removed before the root port device is removed.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>

Conflicts:
	drivers/pci/hotplug/pciehp_ctrl.c
	drivers/pci/hotplug/pciehp_hpc.c
	include/linux/pci.h
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
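
For illustration only, the per-root-port flag described in the commit
message can be sketched in userspace C11 as below. This is not the
patch's code: 'struct root_port', 'struct slot_dev', the 'rpdev' field
and the helpers 'slot_op_try_begin()' / 'slot_op_end()' are hypothetical
names, and the assumption is that each device caches a pointer to its
root port at init time and that the flag is claimed with an atomic
exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a root port; not the kernel's struct pci_dev. */
struct root_port {
	const char *name;
	/* True while a remove/rescan operation is running below this port. */
	atomic_bool slot_being_removed_rescanned;
};

/* Hypothetical device that records its root port once, when it is added. */
struct slot_dev {
	const char *name;
	struct root_port *rpdev; /* cached, so no tree walk is needed later */
};

/*
 * Try to claim the root port for a remove/rescan operation.
 * Returns true on success, false if another operation already holds it.
 */
static bool slot_op_try_begin(struct slot_dev *dev)
{
	/* atomic_exchange returns the previous value: true means busy. */
	return !atomic_exchange(&dev->rpdev->slot_being_removed_rescanned, true);
}

/* Release the claim taken by a successful slot_op_try_begin(). */
static void slot_op_end(struct slot_dev *dev)
{
	bool was_busy =
		atomic_exchange(&dev->rpdev->slot_being_removed_rescanned, false);

	/* Ending an operation that was never begun is a caller bug. */
	assert(was_busy);
	(void)was_busy;
}
```

In this sketch, a second operation on a slot below the same root port is
refused until the first one calls 'slot_op_end()', while a slot below a
different root port can proceed independently.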