Commit d7a8680e authored by John Harrison

drm/i915: Improve long running compute w/a for GuC submission



A workaround was added to the driver to allow compute workloads to run
'forever' by disabling the pre-emption timeout on the RCS engine for
Gen12. It is not totally unbounded, as the heartbeat will kick in
eventually and cause a reset of the hung engine.

However, this does not work well in GuC submission mode. In GuC mode,
the pre-emption timeout is how GuC detects hung contexts and triggers
a per-engine reset. Thus, disabling the timeout also means losing all
per-engine reset ability. A full GT reset will still occur when the
heartbeat finally expires, but that is a much more destructive and
undesirable mechanism.

The purpose of the workaround is actually to give compute tasks longer
to reach a pre-emption point after a pre-emption request has been
issued. This is necessary because Gen12 does not support mid-thread
pre-emption and compute tasks can have long running threads.

So, rather than disabling the timeout completely, just set it to a
'long' value.
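
As a rough illustration of the change described above, the following
standalone C sketch models the old and new selection of the default
pre-emption timeout. It is not driver code: struct engine_model and both
function names are hypothetical, and the two constants simply mirror the
Kconfig defaults that appear in this patch (640 ms and 7500 ms).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Defaults mirrored from the Kconfig entries in this patch (milliseconds). */
#define PREEMPT_TIMEOUT_MS         640UL
#define PREEMPT_TIMEOUT_COMPUTE_MS 7500UL

/* Hypothetical stand-in for the engine state the driver inspects. */
struct engine_model {
	int graphics_ver;       /* models GRAPHICS_VER(i915) */
	bool has_rcs_reg_state; /* models I915_ENGINE_HAS_RCS_REG_STATE */
};

/* Old behaviour: a value of 0 disables the timeout on Gen12 RCS-state engines. */
unsigned long old_preempt_timeout_ms(const struct engine_model *e)
{
	assert(e != NULL);
	if (e->graphics_ver == 12 && e->has_rcs_reg_state)
		return 0;
	return PREEMPT_TIMEOUT_MS;
}

/* New behaviour: the same engines get a long, but finite, timeout. */
unsigned long new_preempt_timeout_ms(const struct engine_model *e)
{
	assert(e != NULL);
	if (e->graphics_ver == 12 && e->has_rcs_reg_state)
		return PREEMPT_TIMEOUT_COMPUTE_MS;
	return PREEMPT_TIMEOUT_MS;
}
```

In this model, a Gen12 engine with RCS register state moves from a
timeout of 0 (disabled) to 7500 ms, while every other engine keeps the
640 ms default. Because the value is non-zero, a timeout-based hang
detector such as the one described for GuC mode still has a deadline to
act on.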

v2: Review feedback from Tvrtko - must hard code the 'long' value
instead of determining it algorithmically. So make it an extra CONFIG
definition. Also, remove the execlist-centric comment from the
existing pre-emption timeout CONFIG option, given that it applies to
more than just execlists.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-5-John.C.Harrison@Intel.com
parent 47daf84a
+22 −4
@@ -57,10 +57,28 @@ config DRM_I915_PREEMPT_TIMEOUT
 	default 640 # milliseconds
 	help
 	  How long to wait (in milliseconds) for a preemption event to occur
-	  when submitting a new context via execlists. If the current context
-	  does not hit an arbitration point and yield to HW before the timer
-	  expires, the HW will be reset to allow the more important context
-	  to execute.
+	  when submitting a new context. If the current context does not hit
+	  an arbitration point and yield to HW before the timer expires, the
+	  HW will be reset to allow the more important context to execute.
+
+	  This is adjustable via
+	  /sys/class/drm/card?/engine/*/preempt_timeout_ms
+
+	  May be 0 to disable the timeout.
+
+	  The compiled in default may get overridden at driver probe time on
+	  certain platforms and certain engines which will be reflected in the
+	  sysfs control.
+
+config DRM_I915_PREEMPT_TIMEOUT_COMPUTE
+	int "Preempt timeout for compute engines (ms, jiffy granularity)"
+	default 7500 # milliseconds
+	help
+	  How long to wait (in milliseconds) for a preemption event to occur
+	  when submitting a new context to a compute capable engine. If the
+	  current context does not hit an arbitration point and yield to HW
+	  before the timer expires, the HW will be reset to allow the more
+	  important context to execute.
 
 	  This is adjustable via
 	  /sys/class/drm/card?/engine/*/preempt_timeout_ms
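
The help text above says the effective value is reflected in the
preempt_timeout_ms sysfs control. A minimal userspace sketch for reading
it back might look like the following. Both function names are
hypothetical, the caller is expected to have resolved the wildcards in
the sysfs path already, and the attribute is assumed to contain a single
non-negative decimal number of milliseconds followed by a newline.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a preempt_timeout_ms attribute.
 * Returns the timeout in milliseconds (0 means disabled), or -1 if the
 * text is not a plain non-negative decimal number.
 */
long parse_timeout_ms(const char *text)
{
	char *end = NULL;
	long value;

	/* Reject NULL, empty input, signs and leading whitespace. */
	if (text == NULL || !isdigit((unsigned char)text[0]))
		return -1;

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno != 0)
		return -1;

	/* Allow only an optional trailing newline after the digits. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	return value;
}

/* Read and parse the attribute at an already-resolved path. */
long read_timeout_ms(const char *path)
{
	char buf[32];
	long ret = -1;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) != NULL)
		ret = parse_timeout_ms(buf);
	fclose(f);
	return ret;
}
```

Keeping the parsing separate from the file access makes it possible to
check the parsing logic without i915 hardware. On a Gen12 system with
this patch applied and the default configuration, the engines affected by
the override would be expected to report 7500 rather than 0.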
+7 −2
@@ -508,9 +508,14 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
 	engine->props.timeslice_duration_ms =
 		CONFIG_DRM_I915_TIMESLICE_DURATION;
 
-	/* Override to uninterruptible for OpenCL workloads. */
+	/*
+	 * Mid-thread pre-emption is not available in Gen12. Unfortunately,
+	 * some compute workloads run quite long threads. That means they get
+	 * reset due to not pre-empting in a timely manner. So, bump the
+	 * pre-emption timeout value to be much higher for compute engines.
+	 */
 	if (GRAPHICS_VER(i915) == 12 && (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE))
-		engine->props.preempt_timeout_ms = 0;
+		engine->props.preempt_timeout_ms = CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE;
 
 	/* Cap properties according to any system limits */
 #define CLAMP_PROP(field) \