Commit 9078e843 authored by Shay Drory, committed by Saeed Mahameed

net/mlx5: Avoid recovery in probe flows



Currently, recovery is done without considering whether the device is
still in probe flow.
This may lead to recovery before the device has finished probing
successfully, e.g. while mlx5_init_one() is running. The recovery flow
uses functionality that is loaded only by mlx5_init_one(), and there
is no point in running recovery unless mlx5_init_one() has finished
successfully.

Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.

Fixes: 51d138c2 ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
parent 44aee8ea
+6 −0
@@ -674,6 +674,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
 	dev = container_of(priv, struct mlx5_core_dev, priv);
 	devlink = priv_to_devlink(dev);
 
+	mutex_lock(&dev->intf_state_mutex);
+	if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) {
+		mlx5_core_err(dev, "health works are not permitted at this stage\n");
+		return;
+	}
+	mutex_unlock(&dev->intf_state_mutex);
 	enter_error_state(dev, false);
 	if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) {
 		devl_lock(devlink);
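
The check added in this hunk follows a lock, test flag, unlock pattern: take intf_state_mutex (which waits for an in-progress probe to release it), skip recovery if health work is being dropped, and otherwise release the mutex and continue. As displayed here, the early return sits between mutex_lock() and mutex_unlock(), so that path appears to return with the mutex still held. The following is a minimal userspace sketch of the same pattern that releases the lock on both paths. It uses pthreads rather than kernel primitives, and every name in it (fake_dev, fake_err_work, and so on) is a hypothetical stand-in, not part of the mlx5 driver.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver state; not the mlx5 definitions. */
struct fake_dev {
	pthread_mutex_t intf_state_mutex;
	bool drop_new_health_work; /* models the MLX5_DROP_NEW_HEALTH_WORK bit */
	int recoveries;            /* counts how many times recovery ran */
};

static void fake_dev_init(struct fake_dev *dev, bool drop)
{
	pthread_mutex_init(&dev->intf_state_mutex, NULL);
	dev->drop_new_health_work = drop;
	dev->recoveries = 0;
}

/*
 * Models the error work handler: returns true if recovery ran,
 * false if it was skipped because health work is not permitted.
 */
static bool fake_err_work(struct fake_dev *dev)
{
	/* Blocks until any holder of the mutex (e.g. a probe flow) is done. */
	pthread_mutex_lock(&dev->intf_state_mutex);
	if (dev->drop_new_health_work) {
		fprintf(stderr, "health works are not permitted at this stage\n");
		/* Release the lock before the early return. */
		pthread_mutex_unlock(&dev->intf_state_mutex);
		return false;
	}
	pthread_mutex_unlock(&dev->intf_state_mutex);

	/* Stand-in for entering the error state and running recovery. */
	dev->recoveries++;
	return true;
}

/* Returns true if the mutex is currently free (trylock succeeds). */
static bool fake_dev_mutex_is_free(struct fake_dev *dev)
{
	if (pthread_mutex_trylock(&dev->intf_state_mutex) != 0)
		return false;
	pthread_mutex_unlock(&dev->intf_state_mutex);
	return true;
}
```

Calling fake_err_work() on a device initialized with drop set to true skips recovery and leaves the mutex free; with drop set to false it runs the recovery stand-in once. In kernel code the same effect is often achieved with a single unlock label reached by goto, which keeps the unlock in one place.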