Skip to content
  1. Nov 06, 2020
  2. Nov 03, 2020
  3. Oct 29, 2020
  4. Oct 28, 2020
    • Chris Wilson's avatar
      drm/i915/gem: Avoid synchronous binds deep within locks · c071ab8c
      Chris Wilson authored
      On bxt, we require a VT'd w/a to serialise all GGTT updates with memory
      transfers, and use stop_machine() for this purpose. stop_machine() is a
      global serialisation barrier and so dangerous to use from within
      critical sections, as the stop_machine() will wait for all cpus to enter
      the stop_machine callback, and those cpus may be waiting for the
      critical section already held.
      
      Fixes: d7085b0f
      
       ("drm/i915/gem: Poison stolen pages before use")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201027184759.29888-1-chris@chris-wilson.co.uk
      c071ab8c
  5. Oct 23, 2020
    • Chris Wilson's avatar
      drm/i915/selftests: Exercise intel_timeline_read_hwsp() · 6e7a21e7
      Chris Wilson authored
      
      
      intel_timeline_read_hwsp() is used to support semaphore waits between
      engines, that may themselves be deferred for arbitrary periods -- that
      is the read of the target request's HWSP is at an indeterminant point in
      the future. To support this, we need to prevent overwriting a HWSP that
      is being watched across a seqno wrap (otherwise the next request will
      write its value into the old HWSP preventing the watcher from making
      progress, ad infinitum.) To simulate the observer across a wrap, let's
      create a request that reads from the HWSP and dispatch it at different
      points around a wrap to see if the value is lost.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201021220411.5777-2-chris@chris-wilson.co.uk
      6e7a21e7
    • Chris Wilson's avatar
      drm/i915/gt: Use the local HWSP offset during submission · c10f6019
      Chris Wilson authored
      We wrap the timeline on construction of the next request, but there may
      still be requests in flight that have not yet finalized the breadcrumb.
      (The breadcrumb is delayed as we need engine-local offsets, and for the
      virtual engine that is not known until execution.) As such, by the time
      we write to the timeline's HWSP offset it may have changed, and we
      should use the value we preserved in the request instead.
      
      Though the window is small and infrequent (at full flow we can expect a
      timeline's seqno to wrap once every 30 minutes), the impact of writing
      the old seqno into the new HWSP is severe: the old requests are never
      completed, and the new requests are completed before they are even
      submitted.
      
      Fixes: ebece753
      
       ("drm/i915: Keep timeline HWSP allocated until idle across the system")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.2+
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
      c10f6019
    • Chris Wilson's avatar
      drm/i915/selftests: Skip RPS tests on Ironlake (only IPS) · b1cff585
      Chris Wilson authored
      
      
      Since Ironlake uses intel_ips.ko for its dynamic frequency adjustment,
      we do not have direct control over the frequency management so such
      tests are defunct. Similarly, we can't check the gen6+ RPS registers on
      Ironlake.
      
      Hopefully this catches all the invalid tests now that Ironlake has
      rejoined the dynamic GPU frequency club. There is an opportunity for the
      reader to add tests to exercise MEMINTRSTS and co.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201022210814.23004-1-chris@chris-wilson.co.uk
      b1cff585
  6. Oct 22, 2020
    • Tvrtko Ursulin's avatar
      drm/i915/pmu: Fix CPU hotplug with multiple GPUs · 537f9c84
      Tvrtko Ursulin authored
      
      
      Since we keep a driver global mask of online CPUs and base the decision
      whether PMU needs to be migrated upon it, we need to make sure the
      migration is done for all registered PMUs (so GPUs).
      
      To do this we need to track the current CPU for each PMU and base the
      decision on whether to migrate on a comparison between global and local
      state.
      
      At the same time, since dynamic CPU hotplug notification slots are a
      scarce resource and given how we already register the multi instance type
      state, we can and should add multiple instance of the i915 PMU to this
      same state and not allocate a new one for every GPU.
      
      v2:
       * Use pr_notice. (Chris)
      
      v3:
       * Handle a nasty interaction where unregistration which triggers a false
         CPU offline event. (Chris)
      
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201020161144.678668-1-tvrtko.ursulin@linux.intel.com
      537f9c84
    • Tvrtko Ursulin's avatar
      drm/i915/pmu: Handle PCI unbind · b00bccb3
      Tvrtko Ursulin authored
      
      
      Mark the device as closed and keep references to driver data alive to
      allow for safe driver unbind with active PMU clients. Perf core does not
      otherwise handle this case so we have to do it manually like this.
      
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201020100822.543332-1-tvrtko.ursulin@linux.intel.com
      b00bccb3
    • Chris Wilson's avatar
      drm/i915/gem: Flush coherency domains on first set-domain-ioctl · 44c2200a
      Chris Wilson authored
      
      
      Avoid skipping what appears to be a no-op set-domain-ioctl if the cache
      coherency state is inconsistent with our target domain. This also has
      the utility of using the population of the pages to validate the backing
      store.
      
      The danger in skipping the first set-domain is leaving the cache
      inconsistent and submitting stale data, or worse leaving the clean data
      in the cache and not flushing it to the GPU. The impact should be small
      as it requires a no-op set-domain as the very first ioctl in a
      particular sequence not found in typical userspace.
      
      Reported-by: default avatarZbigniew Kempczyński <zbigniew.kempczynski@intel.com>
      Fixes: 754a2544
      
       ("drm/i915: Skip object locking around a no-op set-domain ioctl")
      Testcase: igt/gem_mmap_offset/blt-coherency
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
      Cc: <stable@vger.kernel.org> # v5.2+
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201019203825.10966-1-chris@chris-wilson.co.uk
      44c2200a
    • Matthew Auld's avatar
      drm/i915/region: fix max size calculation · 83ebef47
      Matthew Auld authored
      We are incorrectly limiting the max allocation size as per the mm
      max_order, which is effectively the largest power-of-two that we can fit
      in the region size. However, it's normal to setup the region or
      allocator with a non-power-of-two size(for example 3G), which we should
      already handle correctly, except it seems for the early too-big-check.
      
      v2: make sure we also exercise the I915_BO_ALLOC_CONTIGUOUS path, which
      is quite different, since for that we are actually limited by the
      largest power-of-two that we can fit within the region size. (Chris)
      
      Fixes: b908be54
      
       ("drm/i915: support creating LMEM objects")
      Signed-off-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: CQ Tang <cq.tang@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201021103606.241395-1-matthew.auld@intel.com
      83ebef47
    • Chris Wilson's avatar
      drm/i915/selftests: Flush the old heartbeat more gently · 8f2b4b68
      Chris Wilson authored
      
      
      In order to test how fast the heartbeat can respond, we measure with the
      interval set to its minimum. Before we measure though, we want to be
      sure we start with a fresh pulse, and so wait until any old one is
      complete. During that wait though, we were continually flushing the
      work, and so continually re-evaluating to see if the pulse was complete,
      and each attempt would count as an unresponsive system. If the engine
      did not complete the request in the couple of busy-spins, it would flag
      an error. This is unfortunate, so let's not busy-spin waiting for the
      old heartbeat, but terminate it and start afresh.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201019142841.32273-1-chris@chris-wilson.co.uk
      8f2b4b68
  7. Oct 20, 2020
  8. Oct 19, 2020
  9. Oct 16, 2020
    • Chris Wilson's avatar
      drm/i915: Use the active reference on the vma while capturing · 178536b8
      Chris Wilson authored
      During error capture, we need to take a reference to the vma from before
      the reset in order to catpure the contents of the vma later. Currently
      we are using both an active reference and a kref, but due to nature of
      the i915_vma reference handling, that kref is on the vma->obj and not
      the vma itself. This means the vma may be destroyed as soon as it is
      idle, that is in between the i915_active_release(&vma->active) and the
      i915_vma_put(vma):
      
      <3> [197.866181] BUG: KASAN: use-after-free in intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
      <3> [197.866339] Read of size 8 at addr ffff8881258cb800 by task gem_exec_captur/1041
      <3> [197.866467]
      <4> [197.866512] CPU: 2 PID: 1041 Comm: gem_exec_captur Not tainted 5.9.0-g5e4234f97efba-kasan_200+ #1
      <4> [197.866521] Hardware name: Intel Corp. Broxton P/Apollolake RVP1A, BIOS APLKRVPA.X64.0150.B11.1608081044 08/08/2016
      <4> [197.866530] Call Trace:
      <4> [197.866549]  dump_stack+0x99/0xd0
      <4> [197.866760]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
      <4> [197.866783]  print_address_description.constprop.8+0x3e/0x60
      <4> [197.866797]  ? kmsg_dump_rewind_nolock+0xd4/0xd4
      <4> [197.866819]  ? lockdep_hardirqs_off+0xd4/0x120
      <4> [197.867037]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
      <4> [197.867249]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
      <4> [197.867270]  kasan_report.cold.10+0x1f/0x37
      <4> [197.867492]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
      <4> [197.867710]  intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
      <4> [197.867949]  i915_gpu_coredump.part.29+0x150/0x7b0 [i915]
      <4> [197.868186]  i915_capture_error_state+0x5e/0xc0 [i915]
      <4> [197.868396]  intel_gt_handle_error+0x6eb/0xa20 [i915]
      <4> [197.868624]  ? intel_gt_reset_global+0x370/0x370 [i915]
      <4> [197.868644]  ? check_flags+0x50/0x50
      <4> [197.868662]  ? __lock_acquire+0xd59/0x6b00
      <4> [197.868678]  ? register_lock_class+0x1ad0/0x1ad0
      <4> [197.868944]  i915_wedged_set+0xcf/0x1b0 [i915]
      <4> [197.869147]  ? i915_wedged_get+0x90/0x90 [i915]
      <4> [197.869371]  ? i915_wedged_get+0x90/0x90 [i915]
      <4> [197.869398]  simple_attr_write+0x153/0x1c0
      <4> [197.869428]  full_proxy_write+0xee/0x180
      <4> [197.869442]  ? __sb_start_write+0x1f3/0x310
      <4> [197.869465]  vfs_write+0x1a3/0x640
      <4> [197.869492]  ksys_write+0xec/0x1c0
      <4> [197.869507]  ? __ia32_sys_read+0xa0/0xa0
      <4> [197.869525]  ? lockdep_hardirqs_on_prepare+0x32b/0x4e0
      <4> [197.869541]  ? syscall_enter_from_user_mode+0x1c/0x50
      <4> [197.869566]  do_syscall_64+0x33/0x80
      <4> [197.869579]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      <4> [197.869590] RIP: 0033:0x7fd8b7aee281
      <4> [197.869604] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
      <4> [197.869613] RSP: 002b:00007ffea3b72008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      <4> [197.869625] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd8b7aee281
      <4> [197.869633] RDX: 0000000000000002 RSI: 00007fd8b81a82e7 RDI: 000000000000000d
      <4> [197.869641] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000034
      <4> [197.869650] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd8b81a82e7
      <4> [197.869658] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000
      <3> [197.869707]
      <3> [197.869757] Allocated by task 1041:
      <4> [197.869833]  kasan_save_stack+0x19/0x40
      <4> [197.869843]  __kasan_kmalloc.constprop.5+0xc1/0xd0
      <4> [197.869853]  kmem_cache_alloc+0x106/0x8e0
      <4> [197.870059]  i915_vma_instance+0x212/0x1930 [i915]
      <4> [197.870270]  eb_lookup_vmas+0xe06/0x1d10 [i915]
      <4> [197.870475]  i915_gem_do_execbuffer+0x131d/0x4080 [i915]
      <4> [197.870682]  i915_gem_execbuffer2_ioctl+0x103/0x5d0 [i915]
      <4> [197.870701]  drm_ioctl_kernel+0x1d2/0x270
      <4> [197.870710]  drm_ioctl+0x40d/0x85c
      <4> [197.870721]  __x64_sys_ioctl+0x10d/0x170
      <4> [197.870731]  do_syscall_64+0x33/0x80
      <4> [197.870740]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      <3> [197.870748]
      <3> [197.870798] Freed by task 22:
      <4> [197.870865]  kasan_save_stack+0x19/0x40
      <4> [197.870875]  kasan_set_track+0x1c/0x30
      <4> [197.870884]  kasan_set_free_info+0x1b/0x30
      <4> [197.870894]  __kasan_slab_free+0x111/0x160
      <4> [197.870903]  kmem_cache_free+0xcd/0x710
      <4> [197.871109]  i915_vma_parked+0x618/0x800 [i915]
      <4> [197.871307]  __gt_park+0xdb/0x1e0 [i915]
      <4> [197.871501]  ____intel_wakeref_put_last+0xb1/0x190 [i915]
      <4> [197.871516]  process_one_work+0x8dc/0x15d0
      <4> [197.871525]  worker_thread+0x82/0xb30
      <4> [197.871535]  kthread+0x36d/0x440
      <4> [197.871545]  ret_from_fork+0x22/0x30
      <3> [197.871553]
      <3> [197.871602] The buggy address belongs to the object at ffff8881258cb740
       which belongs to the cache i915_vma of size 968
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2553
      Fixes: 2850748e
      
       ("drm/i915: Pull i915_vma_pin under the vm->mutex")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.5+
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201016092527.29039-1-chris@chris-wilson.co.uk
      178536b8
    • Chris Wilson's avatar
      drm/i915/gt: Confirm the context survives execution · 89db9537
      Chris Wilson authored
      
      
      Repeat our sanitychecks from before execution to after execution. One
      expects that if we were to see these, the gpu would already be on fire,
      but the timing may be informative.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201015190816.31763-1-chris@chris-wilson.co.uk
      89db9537
    • Chris Wilson's avatar
      drm/i915/gt: Cleanup kasan warning for on-stack (unsigned long) casting · 6971e07b
      Chris Wilson authored
      
      
      Kasan is gving a warning for passing a u32 parameter into find_first_bit
      (casting to a unsigned long *, with appropriate length restrictions):
      
      [   44.678262] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x2e/0x50
      [   44.678295] Read of size 8 at addr ffff888233f4fc30 by task core_hotunplug/474
      [   44.678326]
      [   44.678358] CPU: 0 PID: 474 Comm: core_hotunplug Not tainted 5.9.0+ #608
      [   44.678465] Hardware name: BESSTAR (HK) LIMITED GN41/Default string, BIOS BLT-BI-MINIPC-F4G-EX3R110-GA65A-101-D 10/12/2018
      [   44.678500] Call Trace:
      [   44.678534]  dump_stack+0x84/0xba
      [   44.678569]  print_address_description.constprop.0+0x21/0x220
      [   44.678605]  ? kmsg_dump_rewind_nolock+0x5f/0x5f
      [   44.678638]  ? _raw_spin_lock_irqsave+0x6d/0xb0
      [   44.678669]  ? _raw_write_lock_irqsave+0xb0/0xb0
      [   44.678702]  ? set_task_cpu+0x1e0/0x1e0
      [   44.678733]  ? find_first_bit+0x2e/0x50
      [   44.678763]  kasan_report.cold+0x20/0x42
      [   44.678794]  ? find_first_bit+0x2e/0x50
      [   44.678825]  __asan_load8+0x69/0x90
      [   44.678856]  find_first_bit+0x2e/0x50
      [   44.679027]  __caps_show.isra.0+0x9e/0x1f0 [i915]
      
      Since we are only using the shorter type for our own convenience,
      accommodate kasan and use unsigned long.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201013110845.16127-1-chris@chris-wilson.co.uk
      6971e07b
    • Chris Wilson's avatar
      drm/i915/gt: Undo forced context restores after trivial preemptions · bb65548e
      Chris Wilson authored
      We may try to preempt the currently executing request, only to find that
      after unravelling all the dependencies that the original executing
      context is still the earliest in the topological sort and re-submitted
      back to HW (if we do detect some change in the ELSP that requires
      re-submission). However, due to the way we check for wrap-around during
      the unravelling, we mark any context that has been submitted just once
      (i.e. with the rq->wa_tail set, but the ring->tail earlier) as
      potentially wrapping and requiring a forced restore on resubmission.
      This was expected to be not a problem, as it was anticipated that most
      unwinding for preemption would result in a context switch and the few
      that did not would be lost in the noise. It did not take long for
      someone to find one particular workload where the cost of those extra
      context restores was measurable.
      
      However, since we know the wa_tail is of fixed size, and we know that a
      request must be larger than the wa_tail itself, we can safely maintain
      the check for request wrapping and check against a slightly future point
      in the ring that includes an expected wa_tail. (That is if the
      ring->tail is already set to rq->wa_tail, including another 8 bytes in
      the check does not invalidate the incremental wrap detection.)
      
      Fixes: 8ab3a381
      
       ("drm/i915/gt: Incrementally check for rewinding")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Bruce Chang <yu.bruce.chang@intel.com>
      Cc: Ramalingam C <ramalingam.c@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.4+
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
      bb65548e
    • Chris Wilson's avatar
      drm/i915/gt: Delay execlist processing for tgl · 6ca7217d
      Chris Wilson authored
      When running gem_exec_nop, it floods the system with many requests (with
      the goal of userspace submitting faster than the HW can process a single
      empty batch). This causes the driver to continually resubmit new
      requests onto the end of an active context, a flood of lite-restore
      preemptions. If we time this just right, Tigerlake hangs.
      
      Inserting a small delay between the processing of CS events and
      submitting the next context, prevents the hang. Naturally it does not
      occur with debugging enabled. The suspicion then is that this is related
      to the issues with the CS event buffer, and inserting an mmio read of
      the CS pointer status appears to be very successful in preventing the
      hang. Other registers, or uncached reads, or plain mb, do not prevent
      the hang, suggesting that register is key -- but that the hang can be
      prevented by a simple udelay, suggests it is just a timing issue like
      that encountered by commit 233c1ae3
      
       ("drm/i915/gt: Wait for CSB
      entries on Tigerlake"). Also note that the hang is not prevented by
      applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
      between requests.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Bruce Chang <yu.bruce.chang@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Acked-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
      6ca7217d
    • Chris Wilson's avatar
      drm/i915/gem: Support parsing of oversize batches · 57b2d834
      Chris Wilson authored
      
      
      Matthew Auld noted that on more recent systems (such as the parser for
      gen9) we may have objects that are larger than expected by the GEM uAPI
      (i.e. greater than u32). These objects would have incorrect implicit
      batch lengths, causing the parser to reject them for being incomplete,
      or worse.
      
      Based on a patch by Matthew Auld.
      
      Reported-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Fixes: 435e8fc0
      
       ("drm/i915: Allow parsing of unsized batches")
      Testcase: igt/gem_exec_params/larger-than-life-batch
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://patchwork.freedesktop.org/patch/msgid/20201015115954.871-1-chris@chris-wilson.co.uk
      57b2d834
  10. Oct 15, 2020
    • Chris Wilson's avatar
      drm/i915/gt: Fixup tgl mocs for PTE tracking · a04ac827
      Chris Wilson authored
      Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
      caused display glitches. Though it is documented as deprecated (and so
      likely behaves as uncached) use the follow-pte bit and force it out of
      L3 cache.
      
      Fixes: 4d8a5cfe
      
       ("drm/i915/gt: Initialize reserved and unspecified MOCS indices")
      Testcase: igt/kms_frontbuffer_tracking
      Testcase: igt/kms_big_fb
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
      Cc: Lucas De Marchi <lucas.demarchi@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
      a04ac827
    • Ville Syrjälä's avatar
      drm/i915: Enable eLLC caching of display buffers for SKL+ · c0888e9e
      Ville Syrjälä authored
      
      
      Since SKL the eLLC has been sitting on the far side of the system
      agent, meaning the display engine can utilize it. Let's enable that.
      
      I chose WB for the caching mode, because my numbers are indicating
      that WT might actually be WB and WC might actually be UC. I'm not
      100% sure that is indeed the case but at least my simple rendercopy
      based benchmark didn't see any difference in performance.
      
      Also if I configure things to do LLCeLLC+WT I still get cache dirt
      on my screen, suggesting that is in fact operating in WB mode
      anyway. This is also the reason I had to fix the MOCS target cache
      to really say PTE rather than LLC+eLLC.
      Since SKL the eLLC has been sitting on the far side of the system agent,
      meaning the display engine can utilize it. Let's enable that.
      
      Eero's earlier benchmarks numbers:
      "* Results in GfxBench and Unigine (Valley/Heaven) tests were within daily
         variation on the tested SKL machines
      
       * SKL GT4e (128MB eLLC) / Wayland / Weston:
         +15-20% SynMark TexMem512 (512MB of textures)
         +4-6% SynMark TerrainFly*, CSCloth, ShMapVsm
         -5-10% SynMark TexMem128 (128MB of textures)
      
       * SKL GT3e (64MB eLLC) / Xorg / Unity:
         +4-8% GpuTest Triangle fullscreen (FullHD)
         -5-10% GpuTest Triangle windowed (1/2 screen)
      
       * SKL GT2 (no eLLC) / Xorg / Unity:
         * Some of the higher FPS SynMark pixel and vertex shader tests
           are few percent higher, more than daily variance
         => Do you see any reason why this machine would be impacted
            although it doesn't eLLC?"
      
      Caveats:
      - Still haven't tested with a prime setup
      - Still not entirely sure this a good idea, but I've been
        using it on my cfl anyway :)
      
      v2: Split the MOCS PTE change out
      
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-3-ville.syrjala@linux.intel.com
      Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-3-chris@chris-wilson.co.uk
      c0888e9e
    • Ville Syrjälä's avatar
      drm/i915: Fix MOCS PTE setting for gen9+ · 36b6b681
      Ville Syrjälä authored
      
      
      Fix up the MOCS PTE setting to really get the LLC cacheability
      from the PTE rather than hardocoding it to LLC or LLC+eLLC.
      
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-2-ville.syrjala@linux.intel.com
      Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-2-chris@chris-wilson.co.uk
      36b6b681
    • Ville Syrjälä's avatar
      drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init · d46b60a2
      Ville Syrjälä authored
      
      
      Currently we leave the cache_level of the initial fb obj
      set to NONE. This means on eLLC machines the first pin_to_display()
      will try to switch it to WT which requires a vma unbind+bind.
      If that happens during the fbdev initialization rcu does not
      seem operational which causes the unbind to get stuck. To
      most appearances this looks like a dead machine on boot.
      
      Avoid the unbind by already marking the object cache_level
      as WT when creating it. We still do an excplicit ggtt pin
      which will rewrite the PTEs anyway, so they will match whatever
      cache level we set.
      
      Cc: <stable@vger.kernel.org> # v5.7+
      Suggested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com
      Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
      d46b60a2
  11. Oct 13, 2020
    • Ayaz A Siddiqui's avatar
      drm/i915/gt: Initialize reserved and unspecified MOCS indices · 4d8a5cfe
      Ayaz A Siddiqui authored
      
      
      In order to avoid functional breakage of mis-programmed applications that
      have grown to depend on unused MOCS entries, we are programming
      those entries to be equal to fully cached ("L3 + LLC") entry.
      
      These reserved and unspecified entries should not be used as they may be
      changed to less performant variants with better coherency in the future
      if more entries are needed.
      
      v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for
      programming default value, setting I915_MOCS_PTE index of tgl_mocs_table
      with desired value.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Lucas De Marchi <lucas.demarchi@intel.com>
      Cc: Tomasz Lis <tomasz.lis@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Francisco Jerez <currojerez@riseup.net>
      Cc: Mathew Alwin <alwin.mathew@intel.com>
      Cc: Mcguire Russell W <russell.w.mcguire@intel.com>
      Cc: Spruit Neil R <neil.r.spruit@intel.com>
      Cc: Zhou Cheng <cheng.zhou@intel.com>
      Cc: Benemelis Mike G <mike.g.benemelis@intel.com>
      
      Signed-off-by: default avatarAyaz A Siddiqui <ayaz.siddiqui@intel.com>
      Reviewed-by: default avatarLucas De Marchi <lucas.demarchi@intel.com>
      Acked-by: default avatarJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com
      Cc: stable@vger.kernel.org
      4d8a5cfe
  12. Oct 07, 2020
    • Chris Wilson's avatar
      drm/i915/gt: Track the most recent pulse for the heartbeat · bf9bd6a5
      Chris Wilson authored
      
      
      Since we track the idle_pulse for flushing the barriers and avoid
      re-emitting the pulse upon idling if no futher action is required, this
      also impacts the heartbeat. Before emitting a fresh heartbeat, we look
      at the engine idle status and assume that if the pulse was the last
      request emitted along the heartbeat, the engine is idling and a
      heartbeat pulse not required. This assumption fails, but we can reuse
      the idle pulse as the heartbeat if we are yet to emit one, and so track
      the status of that pulse for our engine health check.
      
      This impacts tgl/rcs0 as we rely on the heartbeat for our healthcheck for
      the normal preemption detection mechanism is disabled by default.
      
      Testcase: igt/gem_exec_schedule/preempt-hang/rcs0 #tgl
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201006094653.7558-1-chris@chris-wilson.co.uk
      bf9bd6a5
  13. Oct 06, 2020
    • Tvrtko Ursulin's avatar
      drm/i915: Fix DMA mapped scatterlist lookup · 934941ed
      Tvrtko Ursulin authored
      
      
      As the previous patch fixed the places where we walk the whole scatterlist
      for DMA addresses, this patch fixes the random lookup functionality.
      
      To achieve this we have to add a second lookup iterator and add a
      i915_gem_object_get_sg_dma helper, to be used analoguous to existing
      i915_gem_object_get_sg_dma. Therefore two lookup caches are maintained per
      object and they are flushed at the same point for simplicity. (Strictly
      speaking the DMA cache should be flushed from i915_gem_gtt_finish_pages,
      but today this conincides with unsetting of the pages in general.)
      
      Partial VMA view is then fixed to use the new DMA lookup and properly
      query sg length.
      
      v2:
       * Checkpatch.
      
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: Tom Murphy <murphyt7@tcd.ie>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-2-tvrtko.ursulin@linux.intel.com
      934941ed
    • Tvrtko Ursulin's avatar
      drm/i915: Fix DMA mapped scatterlist walks · 8a473dba
      Tvrtko Ursulin authored
      
      
      When walking DMA mapped scatterlists sg_dma_len has to be used since it
      can be different (coalesced) from the backing store entry.
      
      This also means we have to end the walk when encountering a zero length
      DMA entry and cannot rely on the normal sg list end marker.
      
      Both issues were there in theory for some time but were hidden by the fact
      Intel IOMMU driver was never coalescing entries. As there are ongoing
      efforts to change this we need to start handling it.
      
      v2:
       * Use unsigned int for local storing sg_dma_len. (Logan)
      
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      References: 85d1225e ("drm/i915: Introduce & use new lightweight SGL iterators")
      References: b31144c0
      
       ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()")
      Reported-by: default avatarTom Murphy <murphyt7@tcd.ie>
      Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter
      Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com
      8a473dba
    • Chris Wilson's avatar
      drm/i915/gt: Scrub HW state on remove · 25dc89d5
      Chris Wilson authored
      
      
      Currently we do a final scrub of the HW state upon release. However,
      when rebinding the device, this is too late as the device may either
      have been partially rebound or the device is no longer accessible. If
      the device has been removed before release, the reset goes astray
      leaving the device in an inconsistent state, unlikely to work without a
      full PCI reset. Furthermore, if the device is partially rebound before
      the HW scrubbing, there may be leftover HW state that should have been
      scrubbed. Either way, we need to push the scrubbing earlier before the
      removal, so into unregister. The danger is that on older machines,
      resetting the GPU also impact the display engine and so the reset should
      be after modesetting is disabled (and before reuse we need to recover
      modesetting).
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2508
      Testcase: igt/core_hotunplug
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200929112639.24223-1-chris@chris-wilson.co.uk
      25dc89d5
    • Chris Wilson's avatar
      drm/i915: Skip over MI_NOOP when parsing · a6c5e2ae
      Chris Wilson authored
      
      
      Though less likely in practice, igt uses MI_NOOP frequently to pad out
      its batch buffers. The lookup and valiation of so many MI_NOOP command
      descriptions is noticeable, though the side-effect of poisoning the
      last-validated-command cache is more likely to impact upon real CS.
      
      Testcase: igt/gen9_exec_parse/bb-large
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201001102632.18789-1-chris@chris-wilson.co.uk
      a6c5e2ae
  14. Oct 01, 2020
  15. Sep 29, 2020
    • Chris Wilson's avatar
      drm/i915: Avoid mixing integer types during batch copies · b7eeb2b4
      Chris Wilson authored
      Be consistent and use unsigned long throughout the chunk copies to
      avoid the inherent clumsiness of mixing integer types of different
      widths and signs. Failing to take acount of a wider unsigned type when
      using min_t can lead to treating it as a negative, only for it flip back
      to a large unsigned value after passing a boundary check.
      
      Fixes: ed13033f
      
       ("drm/i915/cmdparser: Only cache the dst vmap")
      Testcase: igt/gen9_exec_parse/bb-large
      Reported-by: default avatar"Candelaria, Jared" <jared.candelaria@intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: "Candelaria, Jared" <jared.candelaria@intel.com>
      Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com>
      Cc: <stable@vger.kernel.org> # v4.9+
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200928215942.31917-1-chris@chris-wilson.co.uk
      b7eeb2b4