Commit 8b4d9469 authored by Damien Le Moal's avatar Damien Le Moal
Browse files

ata: libata-scsi: Fix delayed scsi_rescan_device() execution



Commit 6aa0365a ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
   device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
   not atomic and a suspend PM event may be triggered between them,
   casuing scsi_rescan_device() to be called on a suspended device and
   in that function blocking while holding the scsi device lock. This
   would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:

for (( i=0; i<10; i++ )); do
	echo "+2 > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done

that triggers a resume 2 seconds after starting suspending a system can
quickly lead to a PM deadlock preventing the system from correctly
resuming.

Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.

Fixes: 6aa0365a ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: default avatarDamien Le Moal <dlemoal@kernel.org>
Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
Reviewed-by: default avatarNiklas Cassel <niklas.cassel@wdc.com>
Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
parent ff48b378
Loading
Loading
Loading
Loading
+16 −0
Original line number Diff line number Diff line
@@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET

static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
{
	/*
	 * We are about to suspend the port, so we do not care about
	 * scsi_rescan_device() calls scheduled by previous resume operations.
	 * The next resume will schedule the rescan again. So cancel any rescan
	 * that is not done yet.
	 */
	cancel_delayed_work_sync(&ap->scsi_rescan_task);

	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
}

static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
{
	/*
	 * We are about to suspend the port, so we do not care about
	 * scsi_rescan_device() calls scheduled by previous resume operations.
	 * The next resume will schedule the rescan again. So cancel any rescan
	 * that is not done yet.
	 */
	cancel_delayed_work_sync(&ap->scsi_rescan_task);

	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
}

+15 −18
Original line number Diff line number Diff line
@@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
	struct ata_link *link;
	struct ata_device *dev;
	unsigned long flags;
	bool delay_rescan = false;
	int ret = 0;

	mutex_lock(&ap->scsi_scan_mutex);
	spin_lock_irqsave(ap->lock, flags);
@@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
		ata_for_each_dev(dev, link, ENABLED) {
			struct scsi_device *sdev = dev->sdev;

			/*
			 * If the port was suspended before this was scheduled,
			 * bail out.
			 */
			if (ap->pflags & ATA_PFLAG_SUSPENDED)
				goto unlock;

			if (!sdev)
				continue;
			if (scsi_device_get(sdev))
				continue;

			/*
			 * If the rescan work was scheduled because of a resume
			 * event, the port is already fully resumed, but the
			 * SCSI device may not yet be fully resumed. In such
			 * case, executing scsi_rescan_device() may cause a
			 * deadlock with the PM code on device_lock(). Prevent
			 * this by giving up and retrying rescan after a short
			 * delay.
			 */
			delay_rescan = sdev->sdev_gendev.power.is_suspended;
			if (delay_rescan) {
				scsi_device_put(sdev);
				break;
			}

			spin_unlock_irqrestore(ap->lock, flags);
			scsi_rescan_device(sdev);
			ret = scsi_rescan_device(sdev);
			scsi_device_put(sdev);
			spin_lock_irqsave(ap->lock, flags);

			if (ret)
				goto unlock;
		}
	}

unlock:
	spin_unlock_irqrestore(ap->lock, flags);
	mutex_unlock(&ap->scsi_scan_mutex);

	if (delay_rescan)
	/* Reschedule with a delay if scsi_rescan_device() returned an error */
	if (ret)
		schedule_delayed_work(&ap->scsi_rescan_task,
				      msecs_to_jiffies(5));
}