Skip to content
  1. Aug 02, 2023
  2. Aug 01, 2023
  3. Jul 31, 2023
    • Janusz Krzysztofik's avatar
      drm/i915: Fix premature release of request's reusable memory · 946e047a
      Janusz Krzysztofik authored
      Infinite waits for completion of GPU activity have been observed in CI,
      mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
      igt@perf@stress-open-close.  Root cause analysis, based of ftrace dumps
      generated with a lot of extra trace_printk() calls added to the code,
      revealed loops of request dependencies being accidentally built,
      preventing the requests from being processed, each waiting for completion
      of another one's activity.
      
      After we substitute a new request for a last active one tracked on a
      timeline, we set up a dependency of our new request to wait on completion
      of current activity of that previous one.  While doing that, we must take
      care of keeping the old request still in memory until we use its
      attributes for setting up that await dependency, or we can happen to set
      up the await dependency on an unrelated request that already reuses the
      memory previously allocated to the old one, already released.  Combined
      with perf adding consecutive kernel context remote requests to different
      user context timelines, unresolvable loops of await dependencies can be
      built, leading do infinite waits.
      
      We obtain a pointer to the previous request to wait upon when we
      substitute it with a pointer to our new request in an active tracker,
      e.g. in intel_timeline.last_request.  In some processing paths we protect
      that old request from being freed before we use it by getting a reference
      to it under RCU protection, but in others, e.g.  __i915_request_commit()
      -> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
      we don't.  But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU,
      that RCU protection is not sufficient against reuse of memory.
      
      We could protect i915_request's memory from being prematurely reused by
      calling its release function via call_rcu() and using rcu_read_lock()
      consequently, as proposed in v1.  However, that approach leads to
      significant (up to 10 times) increase of SLAB utilization by i915_request
      SLAB cache.  Another potential approach is to take a reference to the
      previous active fence.
      
      When updating an active fence tracker, we first lock the new fence,
      substitute a pointer of the current active fence with the new one, then we
      lock the substituted fence.  With this approach, there is a time window
      after the substitution and before the lock when the request can be
      concurrently released by an interrupt handler and its memory reused, then
      we may happen to lock and return a new, unrelated request.
      
      Always get a reference to the current active fence first, before
      replacing it with a new one.  Having it protected from premature release
      and reuse, lock it and then replace with the new one but only if not
      yet signalled via a potential concurrent interrupt nor replaced with
      another one by a potential concurrent thread, otherwise retry, starting
      from getting a reference to the new current one.  Adjust users to not
      get a reference to the previous active fence themselves and always put the
      reference got by __i915_active_fence_set() when no longer needed.
      
      v3: Fix lockdep splat reports and other issues caused by incorrect use of
          try_cmpxchg() (use (cmpxchg() != prev) instead)
      v2: Protect request's memory by getting a reference to it in favor of
          delegating its release to call_rcu() (Chris)
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
      Fixes: df9f85d8
      
       ("drm/i915: Serialise i915_active_fence_set() with itself")
      Suggested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.6+
      Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
      946e047a
  4. Jul 28, 2023
    • Alan Previn's avatar
      drm/i915/selftest/gsc: Ensure GSC Proxy init completes before selftests · b1cef13e
      Alan Previn authored
      
      
      On MTL, if the GSC Proxy init flows haven't completed, submissions to the
      GSC engine will fail. Those init flows are dependent on the mei's
      gsc_proxy component that is loaded in parallel with i915 and a
      worker that could potentially start after i915 driver init is done.
      
      That said, all subsytems that access the GSC engine today does check
      for such init flow completion before using the GSC engine. However,
      selftests currently don't wait on anything before starting.
      
      To fix this, add a waiter function at the start of __run_selftests
      that waits for gsc-proxy init flows to complete. Selftests shouldn't
      care if the proxy-init failed as that should be flagged elsewhere.
      
      Difference from prior versions:
         v7: - Change the fw status to INTEL_UC_FIRMWARE_LOAD_FAIL if the
               proxy-init fails so that intel_gsc_uc_fw_proxy_get_status
               catches it. (Daniele)
         v6: - Add a helper that returns something more than a boolean
               so we selftest can stop waiting if proxy-init hadn't
               completed but failed (Daniele).
         v5: - Move the call to __wait_gsc_proxy_completed from common
               __run_selftests dispatcher to the group-level selftest
               function (Trvtko).
             - change the pr_info to pr_warn if we hit the timeout.
         v4: - Remove generalized waiters function table framework (Tvrtko).
             - Remove mention of CI-framework-timeout from comments (Tvrtko).
         v3: - Rebase to latest drm-tip.
         v2: - Based on internal testing, increase the timeout for gsc-proxy
               specific case to 8 seconds.
      
      Signed-off-by: default avatarAlan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: default avatarDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Signed-off-by: default avatarDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230720230126.375566-1-alan.previn.teres.alexis@intel.com
      b1cef13e
  5. Jul 26, 2023
  6. Jul 25, 2023
  7. Jul 24, 2023
  8. Jul 21, 2023
    • Daniele Ceraolo Spurio's avatar
      drm/i915/huc: check HuC and GuC version compatibility on MTL · e4731b51
      Daniele Ceraolo Spurio authored
      
      
      Due to a change in the auth flow on MTL, GuC 70.7.0 and newer will only
      be able to authenticate HuC 8.5.1 and newer. The plan is to update the 2
      binaries synchronously in linux-firmware so that the fw repo always has
      a matching pair that works; still, it's better to check in the kernel so
      we can print an error message and abort HuC loading if the binaries are
      out of sync instead of failing the authentication.
      
      v2: Add clarification comment, fix typo in commit msg, clean up variable
      declaration (John)
      
      Signed-off-by: default avatarDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> #v1
      Reviewed-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230717234905.117114-1-daniele.ceraolospurio@intel.com
      e4731b51
  9. Jul 19, 2023
  10. Jul 17, 2023
  11. Jul 13, 2023
  12. Jul 12, 2023
    • Andrzej Hajda's avatar
      drm/i915/perf: add sentinel to xehp_oa_b_counters · 2f42c5af
      Andrzej Hajda authored
      Arrays passed to reg_in_range_table should end with empty record.
      
      The patch solves KASAN detected bug with signature:
      BUG: KASAN: global-out-of-bounds in xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]
      Read of size 4 at addr ffffffffa1555d90 by task perf/1518
      
      CPU: 4 PID: 1518 Comm: perf Tainted: G U 6.4.0-kasan_438-g3303d06107f3+ #1
      Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPFWI1.R00.3223.D80.2305311348 05/31/2023
      Call Trace:
      <TASK>
      ...
      xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]
      
      Fixes: 0fa9349d
      
       ("drm/i915/perf: complete programming whitelisting for XEHPSDV")
      Signed-off-by: default avatarAndrzej Hajda <andrzej.hajda@intel.com>
      Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Reviewed-by: default avatarNirmoy Das <nirmoy.das@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230711153410.1224997-1-andrzej.hajda@intel.com
      2f42c5af
  13. Jul 10, 2023
  14. Jul 08, 2023
  15. Jul 07, 2023
  16. Jul 06, 2023
  17. Jul 04, 2023
    • Lucas De Marchi's avatar
      drm/i915/gt: Also check set bits in clr_set() · e3affc7c
      Lucas De Marchi authored
      
      
      When checking if the workarounds were applied successfully, the
      read-back mask should also contain the bits being set: it's possible
      that in a call to wa_write_clr_set(), the cleared bits are not a
      superset of the set bits.
      
      Signed-off-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: default avatarKenneth Graunke <kenneth@whitecape.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-8-lucas.demarchi@intel.com
      e3affc7c
    • Lucas De Marchi's avatar
      drm/i915/gt: Remove bogus comment on IVB_FBC_RT_BASE_UPPER · 03286f94
      Lucas De Marchi authored
      
      
      The comment on the parameter being 0 to avoid the read back doesn't
      apply as this is not a call to wa_add(), but rather to
      wa_write_clr_set(). So, this register is actually checked and it's
      according to the Bspec that the register is RW, not RO.
      
      Signed-off-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: default avatarKenneth Graunke <kenneth@whitecape.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-7-lucas.demarchi@intel.com
      03286f94
    • Lucas De Marchi's avatar
      drm/i915/gt: Enable read back on XEHP_FF_MODE2 · 9a54a7c3
      Lucas De Marchi authored
      
      
      Contrary to GEN12_FF_MODE2, platforms using XEHP_FF_MODE2 are not
      affected by Wa_1608008084, hence read back can be enabled.
      
      Signed-off-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: default avatarKenneth Graunke <kenneth@whitecape.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-6-lucas.demarchi@intel.com
      9a54a7c3
    • Lucas De Marchi's avatar
      drm/i915/gt: Drop read from GEN8_L3CNTLREG in ICL workaround · fc311f11
      Lucas De Marchi authored
      
      
      Now that non-masked registers are already read before programming the
      context reads, the additional read became redudant, so remove it.
      
      Signed-off-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: default avatarKenneth Graunke <kenneth@whitecape.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-5-lucas.demarchi@intel.com
      fc311f11
    • Lucas De Marchi's avatar
      drm/i915/gt: Fix context workarounds with non-masked regs · 28cf243a
      Lucas De Marchi authored
      
      
      Most of the context workarounds tweak masked registers, but not all. For
      masked registers, when writing the value it's sufficient to just write
      the wa->set_bits since that will take care of both the clr and set bits
      as well as not overwriting other bits.
      
      However there are some workarounds, the registers are non-masked. Up
      until now the driver was simply emitting a MI_LOAD_REGISTER_IMM with the
      set_bits to program the register via the GPU in the WA bb. This has the
      side effect of overwriting the content of the register outside of bits
      that should be set and also doesn't handle the bits that should be
      cleared.
      
      Kenneth reported that on DG2, mesa was seeing a weird behavior due to
      the kernel programming of L3SQCREG5 in dg2_ctx_gt_tuning_init(). With
      the GPU idle, that register could be read via intel_reg as 0x00e001ff,
      but during a 3D workload it would change to 0x0000007f. So the
      programming of that tuning was affecting more than the bits in
      L3_PWM_TIMER_INIT_VAL_MASK. Matt Roper noticed the lack of rmw for the
      context workarounds due to the use of MI_LOAD_REGISTER_IMM.
      
      So, for registers that are not masked, read its value via mmio, modify
      and then set it in the buffer to be written by the GPU. This should take
      care in a simple way of programming just the bits required by the
      tuning/workaround. If in future there are registers that involved that
      can't be read by the CPU, a more complex approach may be required like
      a) issuing additional instructions to read and modify; or b) scan the
      golden context and patch it in place before saving it; or something
      else. But for now this should suffice.
      
      Scanning the context workarounds for all platforms, these are the
      impacted ones with the respective registers
      
      	mtl: DRAW_WATERMARK
      	mtl/dg2: XEHP_L3SQCREG5, XEHP_FF_MODE2
      
      ICL has some non-masked registers in the context workarounds:
      GEN8_L3CNTLREG, IVB_FBC_RT_BASE and VB_FBC_RT_BASE_UPPER, but there
      shouldn't be an impact. The first is already being manually read and the
      other 2 are intentionally overwriting the entire register. Same
      reasoning applies to GEN12_FF_MODE2: the WA is intentionally
      overwriting all the bits to avoid a read-modify-write.
      
      v2:  Reword commit message wrt GEN12_FF_MODE2 and the changed behavior
      on preparatory patches.
      v3: Also skip reading if clear|set bits covers everything
      
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23783#note_1968971
      Signed-off-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: default avatarKenneth Graunke <kenneth@whitecape.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-4-lucas.demarchi@intel.com
      28cf243a
    • Lucas De Marchi's avatar
      drm/i915/gt: Clear all bits from GEN12_FF_MODE2 · e8f7df16
      Lucas De Marchi authored
      
      
      Right now context workarounds don't do a rmw and instead only write to
      the register. Since 2 separate programmings to the same register are
      coalesced into a single write, this is not problematic for
      GEN12_FF_MODE2 since both TDS and GS timer are going to be written
      together and the other remaining bits be zeroed.
      
      However in order to fix other workarounds that may want to preserve the
      unrelated bits in the same register, context workarounds need to
      be changed to a rmw. To prepare for that, move the programming of
      GEN12_FF_MODE2 to a single place so the value passed for "clear" can
      be all the bits. Otherwise the second workaround would be dropped as
      it'd be detected as overwriting a previously programmed workaround.
      
      Signed-off-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-3-lucas.demarchi@intel.com
      e8f7df16
    • Lucas De Marchi's avatar
      drm/i915/gt: Move wal_get_fw_for_rmw() · f567947b
      Lucas De Marchi authored
      
      
      Move helper function to get all the forcewakes required by the wa list
      to the top, so it can be re-used by other functions.
      
      Signed-off-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: default avatarKenneth Graunke <kenneth@whitecape.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230630203509.1635216-2-lucas.demarchi@intel.com
      f567947b
  18. Jun 26, 2023
  19. Jun 24, 2023
    • Matt Roper's avatar
      drm/i915: Extend Wa_14015795083 platforms · 6580176f
      Matt Roper authored
      
      
      This workaround was already implemented for DG2, PVC, and some steppings
      of MTL, but the workaround database has now been updated to extend this
      workaround to TGL, RKL, DG1, and ADL.
      
      v2:
       - Skip readback verification for these extra gen12lp platforms.  On
         some of the platforms, the firmware locks this register, preventing
         the driver from making any modifications.  We should still try to
         apply the workaround, but if the register is locked and the value
         doesn't stick, that's semi-expected and not something we want to flag
         as a driver error on debug builds.
      
      Signed-off-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Reviewed-by: default avatarHaridhar Kalvala <haridhar.kalvala@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230616225041.3922719-1-matthew.d.roper@intel.com
      6580176f
  20. Jun 22, 2023