Commit 762c10db authored by Xiongfeng Wang's avatar Xiongfeng Wang Committed by Xie XiuQi
Browse files

pciehp: fix a race between pciehp and removing operations by sysfs



hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

When I run a stress test about pcie hotplug and removing operations by
sysfs, I got a hange task, and the following call trace is printed.

 INFO: task irq/746-pciehp:41551 blocked for more than 120 seconds.
       Tainted: P        W  OE     4.19.25-
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 irq/746-pciehp  D    0 41551      2 0x00000228
 Call trace:
  __switch_to+0x94/0xe8
  __schedule+0x270/0x8b0
  schedule+0x2c/0x88
  schedule_preempt_disabled+0x14/0x20
  __mutex_lock.isra.1+0x1fc/0x540
  __mutex_lock_slowpath+0x24/0x30
  mutex_lock+0x80/0xa8
  pci_lock_rescan_remove+0x20/0x28
  pciehp_configure_device+0x30/0x140
  pciehp_handle_presence_or_link_change+0x35c/0x4b0
  pciehp_ist+0x1cc/0x1d0
  irq_thread_fn+0x30/0x80
  irq_thread+0x128/0x200
  kthread+0x134/0x138
  ret_from_fork+0x10/0x18
 INFO: task bash:6424 blocked for more than 120 seconds.
       Tainted: P        W  OE     4.19.25-
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 bash            D    0  6424   2231 0x00000200
 Call trace:
  __switch_to+0x94/0xe8
  __schedule+0x270/0x8b0
  schedule+0x2c/0x88
  schedule_timeout+0x224/0x448
  wait_for_common+0x198/0x2a0
  wait_for_completion+0x28/0x38
  kthread_stop+0x60/0x190
  __free_irq+0x1c0/0x348
  free_irq+0x40/0x88
  pcie_shutdown_notification+0x54/0x80
  pciehp_remove+0x30/0x50
  pcie_port_remove_service+0x3c/0x58
  device_release_driver_internal+0x1b4/0x250
  device_release_driver+0x28/0x38
  bus_remove_device+0xd4/0x160
  device_del+0x128/0x348
  device_unregister+0x24/0x78
  remove_iter+0x48/0x58
  device_for_each_child+0x6c/0xb8
  pcie_port_device_remove+0x2c/0x48
  pcie_portdrv_remove+0x5c/0x68
  pci_device_remove+0x48/0xd8
  device_release_driver_internal+0x1b4/0x250
  device_release_driver+0x28/0x38
  pci_stop_bus_device+0x84/0xb8
  pci_stop_and_remove_bus_device_locked+0x24/0x40
  remove_store+0xa4/0xb8
  dev_attr_store+0x44/0x60
  sysfs_kf_write+0x58/0x80
  kernfs_fop_write+0xe8/0x1f0
  __vfs_write+0x60/0x190
  vfs_write+0xac/0x1c0
  ksys_write+0x6c/0xd8
  __arm64_sys_write+0x24/0x30
  el0_svc_common+0xa0/0x180
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc

When we remove a slot by sysfs.
'pci_stop_and_remove_bus_device_locked()' will be called. This function
will get the global mutex lock 'pci_rescan_remove_lock', and remove the
slot. If the irq thread 'pciehp_ist' is still running, we will wait
until it exits.

If a pciehp interrupt happens immediately after we remove the slot by
sysfs, but before we free the pciehp irq in
'pci_stop_and_remove_bus_device_locked()'. 'pciehp_ist' will hung
because the global mutex lock 'pci_rescan_remove_lock' is held by the
sysfs operation. But the sysfs operation is waiting for the pciehp irq
thread 'pciehp_ist' ends. Then a hung task occurs.

So this two kinds of operation, removing through attention buttion and
removing through /sys/devices/pci***/remove, should not be excuted at
the same time. This patch add a global variable to mark that one of these
operations is under processing. When this variable is set,  if another
operation is requested, it will be rejected.

Signed-off-by: default avatarXiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: default avatarHanjun Guo <guohanjun@huawei.com>
Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
parent 541d167d
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment