Commit 2bbab7ce authored by Yifan Zhang's avatar Yifan Zhang Committed by Alex Deucher
Browse files

drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure



KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress test.

Note: Google Test filter = KFDSVMRangeTest.*
[==========] Running 18 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 18 tests from KFDSVMRangeTest
[ RUN      ] KFDSVMRangeTest.BasicSystemMemTest
[       OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms)
[ RUN      ] KFDSVMRangeTest.SetGetAttributesTest
[          ] Get default atrributes
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure
Value of: expectedDefaultResults[i]
  Actual: 4
Expected: outputAttributes[i].type
Which is: 2
[          ] Setting/Getting atrributes
[  FAILED  ]

the root cause is that svm work queue has not finished when svm_range_get_attr is called, thus
some garbage svm interval tree data make svm_range_get_attr get wrong result. Flush work queue before
iterate svm interval tree.

Signed-off-by: default avatarYifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
parent 3919a485
Loading
Loading
Loading
Loading
+8 −0
Original line number Diff line number Diff line
@@ -3030,6 +3030,14 @@ svm_range_get_attr(struct kfd_process *p, uint64_t start, uint64_t size,
	pr_debug("svms 0x%p [0x%llx 0x%llx] nattr 0x%x\n", &p->svms, start,
		 start + size - 1, nattr);

	/* Flush pending deferred work to avoid racing with deferred actions from
	 * previous memory map changes (e.g. munmap). Concurrent memory map changes
	 * can still race with get_attr because we don't hold the mmap lock. But that
	 * would be a race condition in the application anyway, and undefined
	 * behaviour is acceptable in that case.
	 */
	flush_work(&p->svms.deferred_list_work);

	mmap_read_lock(mm);
	if (!svm_range_is_valid(mm, start, size)) {
		pr_debug("invalid range\n");