RDMA/hns: Fix mbox timing out by adding retry mechanism
maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IBSLDS
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/leon-for-rc&id=9747c0c7791d4a5a62018a0c9c563dd2e6f6c1c0

----------------------------------------------------------------------

If a QP is modified to the error state and a flush CQE process is
triggered, the subsequent QP destruction mbox can still be posted
successfully, but it is blocked in HW until the flush CQE process
finishes. This causes further mbox postings to time out in the driver.
The blocking time depends on the QP depth. In the extreme case where
the SQ depth and RQ depth are both 32K, the blocking time can reach
about 135ms.

This patch adds a retry mechanism for mbox posting. On each try, FW
waits 15ms for HW to complete the previous mbox; otherwise it returns a
timeout error code to the driver. Allowing for other time consumed in
FW, set 8 tries for mbox posting with a 5ms gap before each retry, so
that the overall timeout limit is sufficient.

Fixes: 0425e3e6 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250208105930.522796-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
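
The retry policy described in the message (8 tries, a 5ms gap before each retry, retry only on timeout) can be illustrated with a minimal user-space sketch. This is not the driver code from the patch: the names mbox_post_with_retry, fake_hw, fake_post and fake_sleep_ms are hypothetical, and the hardware and the sleep are simulated so the loop can be exercised without a device.

```c
#include <assert.h>
#include <errno.h>

/* Values taken from the commit message. */
#define MBOX_MAX_TRIES    8   /* total posting attempts */
#define MBOX_RETRY_GAP_MS 5   /* gap before each retry */

/* Hypothetical callback types, used only for this illustration. */
typedef int (*mbox_post_fn)(void *ctx);
typedef void (*sleep_ms_fn)(unsigned int ms);

/*
 * Post an mbox, retrying only when the post reports a timeout.
 * Returns 0 on success, -ETIMEDOUT if every try timed out, or the
 * first non-timeout error unchanged.
 */
int mbox_post_with_retry(mbox_post_fn post, void *ctx, sleep_ms_fn sleep_ms)
{
    int ret = -ETIMEDOUT;

    assert(post && sleep_ms);

    for (int attempt = 0; attempt < MBOX_MAX_TRIES; attempt++) {
        if (attempt > 0)
            sleep_ms(MBOX_RETRY_GAP_MS);  /* wait before a retry */

        ret = post(ctx);
        if (ret != -ETIMEDOUT)
            break;                        /* success or a hard error */
    }
    return ret;
}

/* Simulated HW: stays blocked for a number of posting attempts. */
struct fake_hw {
    int busy_tries;  /* attempts that will still time out */
    int posts;       /* attempts seen so far */
};

int fake_post(void *ctx)
{
    struct fake_hw *hw = ctx;

    hw->posts++;
    if (hw->busy_tries > 0) {
        hw->busy_tries--;
        return -ETIMEDOUT;  /* previous mbox still blocked in HW */
    }
    return 0;
}

/* Simulated sleep: records the requested delay instead of blocking. */
unsigned int slept_ms;

void fake_sleep_ms(unsigned int ms)
{
    slept_ms += ms;
}
```

Calling mbox_post_with_retry(fake_post, &hw, fake_sleep_ms) with a fake_hw that is busy for 3 attempts succeeds on the fourth post. Taking the figures in the message at face value, 8 tries of 15ms each plus 7 gaps of 5ms give roughly 8 x 15 + 7 x 5 = 155ms, which is above the 135ms worst-case blocking time quoted above. This arithmetic is an estimate and ignores the other FW overhead the message mentions.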