Commit 83262ab5 authored by Yishai Hadas's avatar Yishai Hadas Committed by Ye Bin
Browse files

RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error

mainline inclusion
from mainline-v6.14-rc1
commit abb604a1a9c87255c7a6f3b784410a9707baf467
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBPC5V
CVE: CVE-2025-21732

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb604a1a9c87255c7a6f3b784410a9707baf467

--------------------------------

This patch addresses a race condition for an ODP MR that can result in a
CQE with an error on the UMR QP.

During the __mlx5_ib_dereg_mr() flow, the following sequence of calls
occurs:

mlx5_revoke_mr()
 mlx5r_umr_revoke_mr()
 mlx5r_umr_post_send_wait()

At this point, the lkey is freed from the hardware's perspective.

However, concurrently, mlx5_ib_invalidate_range() might be triggered by
another task attempting to invalidate a range for the same freed lkey.

This task will:
 - Acquire the umem_odp->umem_mutex lock.
 - Call mlx5r_umr_update_xlt() on the UMR QP.
 - Since the lkey has already been freed, this can lead to a CQE error,
   causing the UMR QP to enter an error state [1].

To resolve this race condition, the umem_odp->umem_mutex lock is now also
acquired as part of the mlx5_revoke_mr() scope.  Upon successful revoke,
we set umem_odp->private which points to that MR to NULL, preventing any
further invalidation attempts on its lkey.

[1] From dmesg:

   infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error
   cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2

   WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
   Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core
   CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
   RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
   [..]
   Call Trace:
   <TASK>
   mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib]
   mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib]
   __mmu_notifier_invalidate_range_start+0x1e1/0x240
   zap_page_range_single+0xf1/0x1a0
   madvise_vma_behavior+0x677/0x6e0
   do_madvise+0x1a2/0x4b0
   __x64_sys_madvise+0x25/0x30
   do_syscall_64+0x6b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: e6fb246c ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/r/68a1e007c25b2b8fe5d625f238cc3b63e5341f77.1737290229.git.leon@kernel.org


Signed-off-by: default avatarYishai Hadas <yishaih@nvidia.com>
Reviewed-by: default avatarArtemy Kovalyov <artemyko@nvidia.com>
Signed-off-by: default avatarLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: default avatarJason Gunthorpe <jgg@nvidia.com>

Conflicts:
	drivers/infiniband/hw/mlx5/mr.c
[fix context diff]
Signed-off-by: default avatarYe Bin <yebin10@huawei.com>
parent 20d7e01c
Loading
Loading
Loading
Loading
+15 −2
Original line number Diff line number Diff line
@@ -1877,9 +1877,14 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
{
	struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
	struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;
	bool is_odp = is_odp_mr(mr);
	int ret = 0;

	if (is_odp)
		mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex);

	if (mr->mmkey.cacheable && !mlx5r_umr_revoke_mr(mr) && !cache_ent_find_and_store(dev, mr))
		return 0;
		goto out;

	if (ent) {
		xa_lock_irq(&ent->mkeys);
@@ -1887,7 +1892,15 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
		mr->mmkey.cache_ent = NULL;
		xa_unlock_irq(&ent->mkeys);
	}
	return destroy_mkey(dev, mr);
	ret = destroy_mkey(dev, mr);
out:
	if (is_odp) {
		if (!ret)
			to_ib_umem_odp(mr->umem)->private = NULL;
		mutex_unlock(&to_ib_umem_odp(mr->umem)->umem_mutex);
	}

	return ret;
}

int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
+2 −0
Original line number Diff line number Diff line
@@ -250,6 +250,8 @@ static bool mlx5_ib_invalidate_range(struct mmu_interval_notifier *mni,
	if (!umem_odp->npages)
		goto out;
	mr = umem_odp->private;
	if (!mr)
		goto out;

	start = max_t(u64, ib_umem_start(umem_odp), range->start);
	end = min_t(u64, ib_umem_end(umem_odp), range->end);