RDMA/hns: Fix simultaneous reset and resource deregistration
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I87LFL

--------------------------------------------------------------------------

In the current solution, the pseudo WC lets user mode detect the device
error early and release its context resources. As a result, a hardware
reset and the release of context resources are very likely to happen at
the same time.

During a hardware reset, the MBOX cannot instruct the hardware to stop
accessing memory, yet the corresponding resources are still released
during the reset. The hardware is unaware that the driver has freed
them, so its remaining tasks access invalid memory and a RAS alarm is
reported.

If the driver detects this scenario, it does not release the resources
immediately. Instead, it records them in a linked list and releases
them when the RoCE driver is unloaded. This way the hardware never
accesses invalid memory and the driver does not leak memory.

Fixes: 306b8c76 ("RDMA/hns: Do not destroy QP resources in the hw resetting phase")
Signed-off-by: wenglianfa <wenglianfa@huawei.com>
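
For illustration only, the deferred-release idea described above can be sketched in plain user-space C. This is not the code from the patch: struct demo_dev, struct deferred_res, demo_release() and demo_remove() are hypothetical names, and a real driver would presumably use the kernel's list helpers and locking rather than a hand-rolled singly linked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types; they only model the idea, not the hns driver. */
struct deferred_res {
	void *buf;                      /* memory the hardware may still access */
	struct deferred_res *next;
};

struct demo_dev {
	bool resetting;                 /* true while a hardware reset is in progress */
	struct deferred_res *deferred;  /* resources parked until driver removal */
};

/*
 * Release a resource. If a reset is in progress, park the buffer on the
 * deferred list instead of freeing it. Returns true if freed immediately.
 */
static bool demo_release(struct demo_dev *dev, void *buf)
{
	assert(dev);

	if (dev->resetting) {
		struct deferred_res *node = malloc(sizeof(*node));

		/* If tracking fails, keep buf allocated rather than free it
		 * while the hardware might still be using it. */
		if (!node)
			return false;
		node->buf = buf;
		node->next = dev->deferred;
		dev->deferred = node;
		return false;
	}

	free(buf);
	return true;
}

/* Driver removal: free everything that was deferred. Returns the count. */
static size_t demo_remove(struct demo_dev *dev)
{
	size_t n = 0;

	assert(dev);
	while (dev->deferred) {
		struct deferred_res *node = dev->deferred;

		dev->deferred = node->next;
		free(node->buf);
		free(node);
		n++;
	}
	return n;
}
```

In this sketch a release request made while resetting is set never frees memory; the buffer stays valid until demo_remove() runs, which mirrors the "record now, free at driver removal" behaviour the message describes.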