Skip to content
Commit a4691038 authored by James Smart's avatar James Smart Committed by Martin K. Petersen
Browse files

scsi: lpfc: Fix unload hang after back to back PCI EEH faults

When injecting EEH errors the port is getting hung up waiting on the node
list to empty, message number 0233. The driver is stuck at this point and
also can't unload. The driver makes transport remoteport delete calls which
try to abort I/O's, but the EEH daemon has already called the driver to
detach and the detachment has set the global FC_UNLOADING flag.  There are
several code paths that will avoid I/O cleanup if the FC_UNLOADING flag is
set, resulting in transports waiting for I/O while the driver is waiting on
transports to clean up.

Additionally, during study of the list, a locking issue was found in
lpfc_sli_abort_iocb_ring that could corrupt the list.

A special case was added to the lpfc_cleanup() routine to call
lpfc_sli_flush_rings() if the driver is FC_UNLOADING and if the pci-slot
is offline (e.g. EEH).

The SLI4 part of lpfc_sli_abort_iocb_ring() is changed to use the
ring_lock.  Also added code to cancel the I/Os if the pci-slot is offline
and added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags
to prevent trying to send an I/O that we cannot handle.

Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.com


Co-developed-by: default avatarJustin Tee <justin.tee@broadcom.com>
Signed-off-by: default avatarJustin Tee <justin.tee@broadcom.com>
Signed-off-by: default avatarJames Smart <jsmart2021@gmail.com>
Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
parent 35ed9613
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment