Commit 9e272ed6 authored by Yangyang Li, committed by Leon Romanovsky

RDMA/hns: Disable local invalidate operation

When function reset and local invalidate are mixed, HNS RoCEE may hang.
Explaining the cause of the problem requires two hardware-internal
concepts:

    1. Execution queue: The queue of instructions that the hardware
    executes. Function reset and local invalidate are both queued for
    execution in this queue.

    2. Local queue: A queue that stores local operation instructions. The
    instructions in the local queue will be sent to the execution queue
    for execution. The instructions in the local queue will not be removed
    until the execution is completed.

The reason for the problem is as follows:

    1. There is a function reset instruction in the execution queue, which
    is currently being executed. A necessary condition for the successful
    execution of function reset is: the hardware pipeline needs to empty
    the instructions that were not completed before;

    2. A local invalidate instruction at the head of the local queue is
    sent to the execution queue. Now there are two instructions in the
    execution queue, the first is the function reset instruction, and the
    second is the local invalidate instruction, which will be executed in
    sequence;

    3. The user has issued many local invalidate operations, causing the
    local queue to be filled up.

    4. The user issues another local operation command, which queues to
    enter the local queue. Because the local queue is full and cannot
    accept new instructions, this instruction is temporarily held in the
    hardware pipeline.

    5. The function reset is waiting for the hardware pipeline stage to
    drain the instructions issued before it. But the pipeline stage still
    caches a local invalidate instruction, so the function reset cannot
    complete, and the instructions after it cannot be executed.
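
The five steps above can be sketched as a toy model. This is only an
illustration of the circular wait, not a description of the real hardware:
the class, the function names, the queue capacity of 4 and the rule that one
local queue entry is dispatched at a time are all assumptions made for the
sketch.

```python
from collections import deque

FUNC_RESET = "function_reset"
LOCAL_INV = "local_invalidate"


class ToyRoCEE:
    """Toy model of the execution queue, local queue and pipeline stage.

    All names and the capacity value are illustrative assumptions,
    not taken from the hardware or the driver.
    """

    def __init__(self, local_queue_capacity=4):
        self.capacity = local_queue_capacity
        self.exec_queue = deque()   # instructions executed strictly in order
        self.local_queue = deque()  # entries stay until execution completes
        self.pipeline = deque()     # local ops stalled by a full local queue
        self.dispatched = False     # local queue head already sent to exec_queue?

    def post_function_reset(self):
        self.exec_queue.append(FUNC_RESET)

    def post_local_invalidate(self):
        # Enter the local queue if there is room; otherwise the
        # instruction is held in the pipeline stage.
        if len(self.local_queue) < self.capacity:
            self.local_queue.append(LOCAL_INV)
        else:
            self.pipeline.append(LOCAL_INV)

    def step(self):
        """Advance by one action; return True if anything progressed."""
        # Send the head of the local queue to the execution queue.
        if self.local_queue and not self.dispatched:
            self.exec_queue.append(LOCAL_INV)
            self.dispatched = True
            return True
        if not self.exec_queue:
            return False
        if self.exec_queue[0] == FUNC_RESET:
            # Function reset completes only once the pipeline is drained.
            if self.pipeline:
                return False  # blocked, and so is everything behind it
            self.exec_queue.popleft()
            return True
        # A local invalidate at the head executes and leaves both queues.
        self.exec_queue.popleft()
        self.local_queue.popleft()
        self.dispatched = False
        # The freed slot lets a stalled instruction enter the local queue.
        if self.pipeline:
            self.local_queue.append(self.pipeline.popleft())
        return True

    def run(self, max_steps=1000):
        """Run until idle or stuck; return True if all work drained."""
        for _ in range(max_steps):
            if not self.step():
                break
        return not (self.exec_queue or self.local_queue or self.pipeline)


def simulate(n_local_ops, capacity=4, with_reset=True):
    dev = ToyRoCEE(capacity)
    if with_reset:
        dev.post_function_reset()
    for _ in range(n_local_ops):
        dev.post_local_invalidate()
    return dev.run()
```

In this model, `simulate(5, capacity=4)` gets stuck: the reset waits for the
pipeline, the pipeline waits for a free local queue slot, and the local queue
waits for the execution queue. The same load without a function reset, or a
reset with no more local operations than the queue can hold, drains normally.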

These factors together cause a deadlock in the hardware execution logic,
and as a result RoCEE stops responding. Considering that the local
operation command may potentially cause RoCEE to hang, this feature is no
longer supported.

Fixes: e93df010 ("RDMA/hns: Support local invalidate for hip08 in kernel space")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
parent b75927cf