  1. Sep 29, 2020
    • drm/i915: Cancel outstanding work after disabling heartbeats on an engine · 7a991cd3
      Chris Wilson authored
      We only allow persistent requests to remain on the GPU past the closure
      of their containing context (and process) so long as they are continuously
      checked for hangs or allow other requests to preempt them, as we need to
      ensure forward progress of the system. If we allow persistent contexts
      to remain on the system after the hangcheck mechanism is disabled,
      the system may grind to a halt. On disabling the mechanism, we sent a
      pulse along the engine to remove all executing contexts from the engine
      which would check for hung contexts -- but we did not prevent those
      contexts from being resubmitted if they survived the final hangcheck.
      
      Fixes: 9a40bddd ("drm/i915/gt: Expose heartbeat interval via sysfs")
      Testcase: igt/gem_ctx_persistence/heartbeat-stop
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.7+
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
  2. Sep 26, 2020
    • drm/i915/gem: Hold request reference for canceling an active context · badef44d
      Chris Wilson authored
      We have to be very careful while walking the timeline->requests list
      under the RCU guard, as the requests (and so rq->link) use
      SLAB_TYPESAFE_BY_RCU and so the requests may be reallocated within an
      rcu grace period. As the requests are reallocated, they are removed from
      one list and placed on another, and if we are iterating over that
      request at that moment, the list iteration jumps from one list to the
      next and promptly gets confused. Verify we hold the request reference
      to ensure that the request is not added to a new list behind our backs.
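
      The "only trust a request once we hold a reference" idea above can
      be sketched in userspace with C11 atomics. This is an illustration
      under assumptions, not the driver code: struct request,
      request_get_unless_zero() and request_put() are hypothetical names,
      and the kernel uses its own refcount helpers. With type-stable slab
      memory the pointer always refers to some request, so the walker only
      proceeds when it can raise a non-zero reference count; a real walker
      would typically also re-check that the request still belongs to the
      expected timeline, which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted request; not the i915 structure. */
struct request {
	atomic_int refcount;
};

/*
 * Take a reference only if the request is still alive (count > 0).
 * Returns false if the count already dropped to zero, i.e. the request
 * has been freed and its memory may be about to be reused.
 */
static bool request_get_unless_zero(struct request *rq)
{
	int old;

	assert(rq);
	old = atomic_load(&rq->refcount);
	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&rq->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with request_get_unless_zero(). */
static void request_put(struct request *rq)
{
	assert(rq);
	atomic_fetch_sub(&rq->refcount, 1);
}
```

      A list walker in this style would stop (or skip the node) as soon as
      request_get_unless_zero() returns false, rather than following the
      node's link pointer onto whatever list it has been moved to.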
      
      <4> [582.745252] general protection fault, probably for non-canonical address 0xcccccccccccccd5c: 0000 [#1] PREEMPT SMP PTI
      <4> [582.745297] CPU: 0 PID: 1475 Comm: gem_ctx_persist Not tainted 5.9.0-rc1-CI-CI_DRM_8908+ #1
      <4> [582.745304] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
      <4> [582.745317] RIP: 0010:__lock_acquire+0x2c3/0x1f40
      <4> [582.745323] Code: 00 65 8b 05 c7 8a ef 7e 85 c0 0f 85 b4 07 00 00 44 8b 9d c4 08 00 00 45 85 db 0f 84 0f 01 00 00 ba 05 00 00 00 e9 c8 06 00 00 <48> 81 3f c0 89 c7 82 b8 00 00 00 00 41 0f 45 c0 83 fe 01 41 89 c3
      <4> [582.745334] RSP: 0018:ffffc9000461bc40 EFLAGS: 00010002
      <4> [582.745340] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
      <4> [582.745345] RDX: 0000000000000000 RSI: 0000000000000000 RDI: cccccccccccccd5c
      <4> [582.745350] RBP: ffff8881ec4a2880 R08: 0000000000000001 R09: 0000000000000001
      <4> [582.745356] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
      <4> [582.745361] R13: 0000000000000000 R14: 0000000000000000 R15: cccccccccccccd5c
      <4> [582.745367] FS:  00007fb44da78e40(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
      <4> [582.745373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [582.745378] CR2: 00007fb44daad040 CR3: 0000000268428000 CR4: 0000000000350ef0
      <4> [582.745383] Call Trace:
      <4> [582.745390]  ? __lock_acquire+0x913/0x1f40
      <4> [582.745397]  lock_acquire+0xb5/0x3c0
      <4> [582.745526]  ? kill_engines+0x19a/0x4b0 [i915]
      <4> [582.745533]  ? find_held_lock+0x2d/0x90
      <4> [582.745541]  _raw_spin_lock_irq+0x30/0x40
      <4> [582.745635]  ? kill_engines+0x19a/0x4b0 [i915]
      <4> [582.745727]  kill_engines+0x19a/0x4b0 [i915]
      <4> [582.745820]  context_close+0x195/0x410 [i915]
      <4> [582.745912]  i915_gem_context_close+0x5b/0x160 [i915]
      <4> [582.745994]  i915_driver_postclose+0x14/0x40 [i915]
      <4> [582.746003]  drm_file_free.part.13+0x240/0x290
      <4> [582.746009]  drm_release_noglobal+0x16/0x50
      <4> [582.746016]  __fput+0xa5/0x250
      <4> [582.746021]  task_work_run+0x6e/0xb0
      <4> [582.746028]  exit_to_user_mode_prepare+0x178/0x180
      <4> [582.746034]  syscall_exit_to_user_mode+0x36/0x220
      <4> [582.746040]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      <4> [582.746045] RIP: 0033:0x7fb44d1dc421
      <4> [582.746050] Code: f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 8b 05 ea cf 20 00 85 c0 75 16 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10
      <4> [582.746062] RSP: 002b:00007ffed2e83818 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      <4> [582.746069] RAX: 0000000000000000 RBX: 0000556410bfe840 RCX: 00007fb44d1dc421
      <4> [582.746075] RDX: 000000000000000a RSI: 00000000c0406469 RDI: 0000000000000008
      <4> [582.746080] RBP: 0000000000000008 R08: 00007fb44d1c51cc R09: 00007fb44d1c5240
      <4> [582.746086] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000fffffffb
      <4> [582.746091] R13: 0000000000000006 R14: 0000000000000000 R15: 000000000000000a
      <4> [582.746099] Modules linked in: vgem mei_hdcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio btusb btrtl btbcm btintel x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul bluetooth ghash_clmulni_intel ecdh_generic ecc i915 r8169 realtek mei_me mei snd_hda_intel i2c_hid snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm pinctrl_geminilake pinctrl_intel prime_numbers [last unloaded: test_drm_mm]
      
      Fixes: 736e785f ("drm/i915/gem: Reduce context termination list iteration guard to RCU")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-2-chris@chris-wilson.co.uk
    • drm/i915: Redo "Remove i915_request.lock requirement for execution callbacks" · 35faeb7d
      Chris Wilson authored
      The reordering and rebasing of commit 2e4c6c1a ("drm/i915: Remove
      i915_request.lock requirement for execution callbacks") caused it to
      revert an earlier correction. Let us restore commit 99f0a640d464
      ("drm/i915: Remove requirement for holding i915_request.lock for
      breadcrumbs").
      
      Fixes: 2e4c6c1a ("drm/i915: Remove i915_request.lock requirement for execution callbacks")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-1-chris@chris-wilson.co.uk
  3. Sep 24, 2020
    • drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex · 102f5aa4
      Chris Wilson authored
      
      
      Since the debugfs may peek into the GEM contexts as the corresponding
      client/fd is being closed, we may try and follow a dangling pointer.
      However, the context closure itself is serialised with the ctx->mutex,
      so if we hold that mutex as we inspect the state coupled in the context,
      we know the pointers within the context are stable and will remain valid
      as we inspect their tables.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: CQ Tang <cq.tang@intel.com>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200723172119.17649-3-chris@chris-wilson.co.uk
  4. Sep 23, 2020
  5. Sep 22, 2020
    • drm/i915: check i915_vm_alloc_pt_stash for errors · 1604cb2a
      Matthew Auld authored
      If we are really unlucky and encounter an error during
      i915_vm_alloc_pt_stash, we end up passing an empty pt/pd stash all the
      way down into the low-level ppgtt alloc code, leading to explosions,
      since it expects at least the required number of pt/pd for the va range.
      
      [  211.981418] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  211.981421] #PF: supervisor read access in kernel mode
      [  211.981422] #PF: error_code(0x0000) - not-present page
      [  211.981424] PGD 80000008439cb067 P4D 80000008439cb067 PUD 84a37f067 PMD 0
      [  211.981427] Oops: 0000 [#1] SMP PTI
      [  211.981428] CPU: 1 PID: 1301 Comm: i915_selftest Tainted: G     U    I       5.9.0-rc5+ #3
      [  211.981430] Hardware name:  /NUC6i7KYB, BIOS KYSKLi70.86A.0050.2017.0831.1924 08/31/2017
      [  211.981521] RIP: 0010:__gen8_ppgtt_alloc+0x1ed/0x3c0 [i915]
      [  211.981523] Code: c1 48 c7 c7 5d 5d fe c0 65 ff 0d ee 1d 03 3f e8 d9 91 1f e2 8b 55 c4 31 c0 48 8b 75 b8 85 d2 0f 95 c0 48 8b 1c c6 48 89 45 98 <48> 8b 03 48 8b 90 58 02 00 00 48 85 d2 0f 84 07 ea 15 00 48 81 fa
      [  211.981526] RSP: 0018:ffffba2cc0eb3970 EFLAGS: 00010202
      [  211.981527] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000004
      [  211.981529] RDX: 0000000000000002 RSI: ffff9be998bdb8c0 RDI: ffff9be99c844300
      [  211.981530] RBP: ffffba2cc0eb39d8 R08: 0000000000000640 R09: ffff9be97cdfd000
      [  211.981531] R10: ffff9be97cdfd614 R11: 0000000000000000 R12: 0000000000000000
      [  211.981532] R13: ffff9be98607ba20 R14: ffff9be995a0b400 R15: ffffba2cc0eb39e8
      [  211.981534] FS:  00007f0f10b31000(0000) GS:ffff9be99fc40000(0000) knlGS:0000000000000000
      [  211.981536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  211.981538] CR2: 0000000000000000 CR3: 000000084d74e006 CR4: 00000000003706e0
      [  211.981539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  211.981541] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  211.981542] Call Trace:
      [  211.981609]  gen8_ppgtt_alloc+0x79/0x90 [i915]
      [  211.981678]  ppgtt_bind_vma+0x36/0x80 [i915]
      [  211.981756]  __vma_bind+0x39/0x40 [i915]
      [  211.981818]  fence_work+0x21/0x98 [i915]
      [  211.981879]  fence_notify+0x8d/0x128 [i915]
      [  211.981939]  __i915_sw_fence_complete+0x62/0x240 [i915]
      [  211.982018]  i915_vma_pin_ww+0x1ee/0x9c0 [i915]
      
      Fixes: cd0452aa ("drm/i915: Preallocate stashes for vma page-directories")
      Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200921160844.73186-1-matthew.auld@intel.com
  6. Sep 21, 2020
  7. Sep 18, 2020
    • drm/i915/gt: Remove defunct intel_virtual_engine_get_sibling() · 29545e5c
      Chris Wilson authored
      
      
      As the last user was eliminated in commit e21fecdcde40 ("drm/i915/gt:
      Distinguish the virtual breadcrumbs from the irq breadcrumbs"), we can
      remove the function. One less implementation detail creeping beyond its
      scope.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200826132811.17577-16-chris@chris-wilson.co.uk
    • drm/i915: Reduce GPU error capture mutex hold time · f2acf740
      Chris Wilson authored
      
      
      Shrink the hold time for the error capture mutex to just around the
      acquire/release of the PTE used for reading back the object via the
      Global GTT. For platforms that do not need the GGTT read back, we can
      skip the mutex entirely and allow concurrent error capture. Where we do
      use the GGTT, by restricting the hold time around the slow readback and
      compression, we are more resilient against softlockups (khungtaskd) as
      the heartbeat may well also trigger an error while the first is
      ongoing, and this allows the heartbeat reset to skip past the capture and
      not be stalled.
      
      Testcase: igt/gem_exec_capture/many-*
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-3-chris@chris-wilson.co.uk
    • drm/i915: Break up error capture compression loops with cond_resched() · 293f43c8
      Chris Wilson authored
      
      
      As the error capture will compress user buffers as directed by the
      user, it can take an arbitrary amount of time and space. Break up the
      compression loops with a call to cond_resched(), which will allow other
      processes to schedule (avoiding the soft lockups) and also serve as a
      warning should we try to make this loop atomic in the future.
      
      Testcase: igt/gem_exec_capture/many-*
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-2-chris@chris-wilson.co.uk
    • drm/i915/gt: Show engine properties in the pretty printer · 0bda4b80
      Chris Wilson authored
      
      
      When debugging the engine state, include the user properties that may,
      or may not, have been adjusted by the user/test.
      
      For example,
      vecs0
      	...
      	Properties:
      		heartbeat_interval_ms: 2500 [default 2500]
      		max_busywait_duration_ns: 8000 [default 8000]
      		preempt_timeout_ms: 640 [default 640]
      		stop_timeout_ms: 100 [default 100]
      		timeslice_duration_ms: 1 [default 1]
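
      The "current value [default value]" layout shown above can be
      reproduced with a small formatter. This is only a sketch of the
      output format: struct engine_prop and format_prop() are hypothetical
      names introduced for illustration, and the driver's printer is not
      shown in this log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one tunable: its current and default value. */
struct engine_prop {
	const char *name;
	unsigned long value;
	unsigned long def;
};

/*
 * Format one property as "name: value [default def]".
 * Returns the number of characters that snprintf() would have written.
 */
static int format_prop(char *buf, size_t len, const struct engine_prop *p)
{
	assert(buf && p);
	return snprintf(buf, len, "%s: %lu [default %lu]",
			p->name, p->value, p->def);
}
```

      Printing both numbers side by side makes it obvious at a glance
      whether a test or user has changed a property from its default.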
      
      Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-1-chris@chris-wilson.co.uk
  8. Sep 17, 2020
  9. Sep 15, 2020
    • drm/i915/gt: Use a mmio read of the CSB in case of failure · 4ff64bcf
      Chris Wilson authored
      
      
      If we find the GPU didn't update the CSB within 50us, we currently fail
      and eventually reset the GPU. Let's report the value from the mmio space
      as a last resort; it may just stave off an unnecessary GPU reset.
      
      References: HSDES#22011327657
      Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-4-chris@chris-wilson.co.uk
    • drm/i915/gt: Apply the CSB w/a for all · 884c4074
      Chris Wilson authored
      Since we expect to inline the csb_parse() routines, the w/a for the
      stale CSB data on Tigerlake will be pulled into process_csb(), and so we
      might as well simply reuse the logic for all, and so will hopefully
      avoid any strange behaviour on Icelake that was not covered by our
      previous w/a.
      
      References: d8f50531 ("drm/i915/icl: Forcibly evict stale csb entries")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Bruce Chang <yu.bruce.chang@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-3-chris@chris-wilson.co.uk
    • drm/i915/gt: Wait for CSB entries on Tigerlake · 233c1ae3
      Chris Wilson authored
      On Tigerlake, we are seeing a repeat of commit d8f50531 ("drm/i915/icl:
      Forcibly evict stale csb entries") where, presumably, due to a missing
      Global Observation Point synchronisation, the write pointer of the CSB
      ringbuffer is updated _prior_ to the contents of the ringbuffer. That is,
      we see the GPU report more context-switch entries for us to parse, but
      those entries have not been written, leading us to process stale events,
      and eventually report a hung GPU.
      
      However, this effect appears to be much more severe than we previously
      saw on Icelake (though it might be best if we try the same approach
      there as well and measure), and Bruce suggested the good idea of resetting
      the CSB entry after use so that we can detect when it has been updated by
      the GPU. By instrumenting how long that may be, we can set a reliable
      upper bound for how long we should wait for:
      
          513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)
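
      The "reset the entry after use, then wait for it to change" approach
      can be sketched as below. This is a userspace illustration under
      assumptions: CSB_INVALID, CSB_MAX_RETRIES and csb_read() are
      hypothetical names, the sentinel is assumed to be a value the
      hardware never writes, and the retry bound is an arbitrary figure
      chosen above the measured maximum of 1061 retries rather than the
      value used by the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: assumed never to be a valid entry from the GPU. */
#define CSB_INVALID UINT64_MAX
/* Hypothetical upper bound on polling, picked above the observed maximum. */
#define CSB_MAX_RETRIES 2000u

/*
 * Read one 64b entry from the slot. If the slot still holds the sentinel,
 * the producer has advanced its write pointer before writing the data, so
 * poll a bounded number of times. On success the slot is reset to the
 * sentinel so that a later stale read is detectable.
 */
static bool csb_read(volatile uint64_t *slot, uint64_t *out)
{
	unsigned int retry;

	assert(slot && out);
	for (retry = 0; retry <= CSB_MAX_RETRIES; retry++) {
		uint64_t entry = *slot;

		if (entry != CSB_INVALID) {
			*slot = CSB_INVALID; /* mark the entry as consumed */
			*out = entry;
			return true;
		}
	}
	return false; /* never updated within the bound: caller must recover */
}
```

      Because every consumed slot is overwritten with the sentinel, a
      second read of the same slot fails instead of silently replaying the
      old event, which is what turns a stale entry into a detectable wait.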
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
      References: d8f50531 ("drm/i915/icl: Forcibly evict stale csb entries")
      References: HSDES#22011327657, HSDES#1508287568
      Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Bruce Chang <yu.bruce.chang@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: stable@vger.kernel.org # v5.4
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
    • drm/i915/gt: Widen CSB pointer to u64 for the parsers · f24a44e5
      Chris Wilson authored
      
      
      A CSB entry is 64b, and it is simpler for us to treat it as an array of
      64b entries than as an array of pairs of 32b entries.
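
      To illustrate the two views of the same buffer, the sketch below
      builds a 64b entry from a pair of 32b words and splits it again. It
      assumes the low dword of each entry is stored first (little-endian
      layout); csb_lower(), csb_upper() and csb_from_pair() are
      hypothetical helper names, not functions from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Low and high halves of a 64b entry. */
static inline uint32_t csb_lower(uint64_t entry)
{
	return (uint32_t)entry;
}

static inline uint32_t csb_upper(uint64_t entry)
{
	return (uint32_t)(entry >> 32);
}

/*
 * Pair-of-dwords view: entry i occupies dwords[2 * i] (assumed low half)
 * and dwords[2 * i + 1] (assumed high half).
 */
static inline uint64_t csb_from_pair(const uint32_t *dwords, unsigned int i)
{
	assert(dwords);
	return (uint64_t)dwords[2 * i] | ((uint64_t)dwords[2 * i + 1] << 32);
}
```

      With the 64b view, a parser indexes entry i directly and extracts
      the halves only where it needs them, instead of carrying the 2 * i
      arithmetic through every access.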
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
    • drm/i915/gt: Check for a registered driver with IPS · 6cb304b3
      Chris Wilson authored
      
      
      If the ips module calls into the driver during an unbind/bind cycle, we
      may see the driver while it has unregistered itself from ips and try and
      dereference a NULL ips_mchdev pointer.
      
      <1> [211.928844] BUG: kernel NULL pointer dereference, address: 0000000000000014
      <1> [211.928861] #PF: supervisor read access in kernel mode
      <1> [211.928871] #PF: error_code(0x0000) - not-present page
      <6> [211.928881] PGD 0 P4D 0
      <4> [211.928890] Oops: 0000 [#1] PREEMPT SMP PTI
      <4> [211.928900] CPU: 3 PID: 327 Comm: ips-monitor Not tainted 5.9.0-rc5-CI-CI_DRM_9008+ #1
      <4> [211.928914] Hardware name: Hewlett-Packard HP EliteBook 8440p/172A, BIOS 68CCU Ver. F.24 09/13/2013
      <4> [211.929056] RIP: 0010:mchdev_get+0x5a/0x180 [i915]
      <4> [211.929067] Code: c0 5a 74 0d 80 3d f1 53 29 00 00 0f 84 ab 00 00 00 48 8b 1d c8 a8 29 00 e8 d3 18 89 e1 85 c0 74 09 80 3d d1 53 29 00 00 74 65 <8b> 4b 14 48 8d 7b 14 85 c9 0f 84 09 01 00 00 8d 51 01 89 c8 f0 0f
      <4> [211.929095] RSP: 0018:ffffc900002efe60 EFLAGS: 00010202
      <4> [211.929105] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff8881297acf40
      <4> [211.929118] RDX: 0000000000000000 RSI: ffffffff8264e2c0 RDI: ffff8881297ad820
      <4> [211.929130] RBP: ffffc900002efe68 R08: ffff8881297ad820 R09: 00000000fffffffe
      <4> [211.929143] R10: ffff8881297acf40 R11: 00000000fff74c96 R12: ffff8881294dfa18
      <4> [211.929155] R13: 0000000000000067 R14: ffff888126eff640 R15: ffff888126efe840
      <4> [211.929168] FS:  0000000000000000(0000) GS:ffff888133d80000(0000) knlGS:0000000000000000
      <4> [211.929182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [211.929194] CR2: 0000000000000014 CR3: 0000000002610000 CR4: 00000000000006e0
      <4> [211.929206] Call Trace:
      <4> [211.929294]  i915_read_mch_val+0x15/0x380 [i915]
      <4> [211.929309]  ? ips_monitor+0x3fb/0x630 [intel_ips]
      <4> [211.929321]  ips_monitor+0x53c/0x630 [intel_ips]
      <4> [211.929334]  ? ips_gpu_lower+0x30/0x30 [intel_ips]
      <4> [211.929348]  kthread+0x14d/0x170
      <4> [211.929358]  ? kthread_park+0x80/0x80
      <4> [211.929369]  ret_from_fork+0x22/0x30
      <4> [211.929382] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm mei_me mei intel_ips lpc_ich ptp prime_numbers pps_core
      <4> [211.929437] CR2: 0000000000000014
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915105113.26564-1-chris@chris-wilson.co.uk
    • drm/i915/gt: Clear the buffer pool age before use · 9bb34ff2
      Chris Wilson authored
      If we create a new node, it is possible for the slab allocator to return
      us a recently freed node. If that node was just retired, it will retain
      the current jiffy as its node->age. There is then a minuscule window
      where, as that node is retired, it will appear on the free list with an
      incorrect age and be eligible for reuse by one thread, and then by a
      second thread as the correct node->age is written.
      
      Fixes: 06b73c2d ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-3-chris@chris-wilson.co.uk
    • drm/i915/gem: Prevent using pgprot_writecombine() if PAT is not supported · 121ba69f
      Chris Wilson authored
      Let's not try and use PAT attributes for I915_MAP_WC if the CPU doesn't
      support PAT.
      
      Fixes: 6056e500 ("drm/i915/gem: Support discontiguous lmem object maps")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Cc: <stable@vger.kernel.org> # v5.6+
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-2-chris@chris-wilson.co.uk
    • drm/i915/gem: Avoid implicit vmap for highmem on x86-32 · 060bb115
      Chris Wilson authored
      On 32b, highmem uses a finite set of indirect PTEs (i.e. vmap) to provide
      virtual mappings of the high pages.  As these are finite, map_new_virtual()
      must wait for some other kmap() to finish when it runs out. If we map a
      large number of objects, there is no method for it to tell us to release
      the mappings, and we deadlock.
      
      However, if we make an explicit vmap of the page, that mapping uses a
      larger vmalloc arena and is also able to tell us to release unwanted
      mappings. Most importantly, it will fail and propagate an error instead
      of waiting forever.
      
      Fixes: fb8621d3 ("drm/i915: Avoid allocating a vmap arena for a single page") #x86-32
      References: e87666b5 ("drm/i915/shrinker: Hook up vmap allocation failure notifier")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Cc: <stable@vger.kernel.org> # v4.7+
      Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-1-chris@chris-wilson.co.uk
  10. Sep 07, 2020