Commit 91c11d5f authored by Sagi Grimberg's avatar Sagi Grimberg Committed by Christoph Hellwig
Browse files

nvme-rdma: stop auth work after tearing down queues in error recovery



when starting error recovery there might be a authentication work
running, and it involves I/O commands. Given the controller is tearing
down there is no chance for the I/O to complete other than timing out
which may unnecessarily take a full io timeout.

So first tear down the queues, fail/cancel all inflight I/O (including
potentially authentication) and only then stop authentication. This
ensures that failover is not stalled due to blocked authentication I/O.

Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
Reviewed-by: default avatarChaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
parent 1f1a4f89
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
@@ -1153,13 +1153,13 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
	struct nvme_rdma_ctrl *ctrl = container_of(work,
			struct nvme_rdma_ctrl, err_work);

	nvme_auth_stop(&ctrl->ctrl);
	nvme_stop_keep_alive(&ctrl->ctrl);
	flush_work(&ctrl->ctrl.async_event_work);
	nvme_rdma_teardown_io_queues(ctrl, false);
	nvme_start_queues(&ctrl->ctrl);
	nvme_rdma_teardown_admin_queue(ctrl, false);
	nvme_start_admin_queue(&ctrl->ctrl);
	nvme_auth_stop(&ctrl->ctrl);

	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
		/* state change failure is ok if we started ctrl delete */