Commit 7065b774 authored by Shay Drory, committed by zhaoxiaoqiang11

net/mlx5: Avoid recovery in probe flows

stable inclusion
from stable-v5.10.163
commit 670b20617346d4d4e9ddbeed971037fdb98b19d3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7PJ9N

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=670b20617346d4d4e9ddbeed971037fdb98b19d3



----------------------------------------------------

[ Upstream commit 9078e843 ]

Currently, recovery is done without considering whether the device is
still in the probe flow.
This may lead to recovery starting before the device has finished
probing successfully, e.g. while mlx5_init_one() is running. The recovery
flow uses functionality that is loaded only by mlx5_init_one(), and there
is no point in running recovery before mlx5_init_one() has finished
successfully.

Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.

Fixes: 51d138c2 ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: zhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com>
parent b66487cb
+6 −0
@@ -618,6 +618,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
 	priv = container_of(health, struct mlx5_priv, health);
 	dev = container_of(priv, struct mlx5_core_dev, priv);

+	mutex_lock(&dev->intf_state_mutex);
+	if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) {
+		mlx5_core_err(dev, "health works are not permitted at this stage\n");
+		return;
+	}
+	mutex_unlock(&dev->intf_state_mutex);
 	enter_error_state(dev, false);
 	if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) {
 		if (mlx5_health_try_recover(dev))
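
Review note: in the hunk as shown, the early `return` taken when MLX5_DROP_NEW_HEALTH_WORK is set exits with `intf_state_mutex` still held, since `mutex_unlock()` is only reached on the fall-through path. The following is a minimal userspace sketch of the same guard that releases the lock on both paths. It is not driver code: `struct model_dev`, `model_fatal_err_work()` and the other `model_*` names are hypothetical stand-ins, a pthread mutex models `dev->intf_state_mutex`, and a plain boolean models the health flag bit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the driver state used by the hunk. */
struct model_dev {
	pthread_mutex_t intf_state_mutex; /* models dev->intf_state_mutex */
	bool drop_new_health_work;        /* models MLX5_DROP_NEW_HEALTH_WORK */
	int recoveries;                   /* counts simulated recovery attempts */
};

static void model_dev_init(struct model_dev *dev, bool drop)
{
	assert(dev != NULL);
	pthread_mutex_init(&dev->intf_state_mutex, NULL);
	dev->drop_new_health_work = drop;
	dev->recoveries = 0;
}

/* Returns true if the simulated recovery ran, false if it was skipped. */
static bool model_fatal_err_work(struct model_dev *dev)
{
	assert(dev != NULL);

	pthread_mutex_lock(&dev->intf_state_mutex);
	if (dev->drop_new_health_work) {
		fprintf(stderr, "health works are not permitted at this stage\n");
		/* Release the lock on the early-exit path as well. */
		pthread_mutex_unlock(&dev->intf_state_mutex);
		return false;
	}
	pthread_mutex_unlock(&dev->intf_state_mutex);

	dev->recoveries++; /* placeholder for enter_error_state() and recovery */
	return true;
}

/* Returns true if the mutex can be taken, i.e. no path left it held. */
static bool model_mutex_is_free(struct model_dev *dev)
{
	if (pthread_mutex_trylock(&dev->intf_state_mutex) != 0)
		return false;
	pthread_mutex_unlock(&dev->intf_state_mutex);
	return true;
}
```

With the flag set, `model_fatal_err_work()` skips recovery and `model_mutex_is_free()` still reports the lock as available afterwards; dropping the unlock before the early `return` would make that check fail.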