Skip to content
Commit 79867462 authored by Nicolai Hähnle's avatar Nicolai Hähnle Committed by Alex Deucher
Browse files

drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs



Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences were
never signaled, the buffer object was effectively leaked. Worse, the
buffer was stuck wherever it happened to be at the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

    [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
    [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
    [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
    [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
    [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
    [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
    [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
    [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
    [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
    [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
    [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
    [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
    [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
    [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
    [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
    [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
    [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

Note: The correctness of this change depends on the earlier commit
"drm/amd/sched: move adding finish callback to amd_sched_job_begin"

v2: set an error on the finished fence

Signed-off-by: default avatarNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
Reviewed-by: default avatarAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
parent 29d25355
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment