Skip to content
  1. Jul 09, 2018
    • Chris Wilson's avatar
      drm/i915/selftests: Prevent background reaping of active objects · 932cac10
      Chris Wilson authored
      
      
      igt_mmap_offset_exhaustion() wants to test what happens when the mmap
      space is filled with zombie objects, objects discarded by userspace but
      still active on the GPU. As they are only protected by the active
      reference, we have to be certain that active reference is kept while we
      peek into our dangling pointer. That active reference should not be
      freed until we retire, but we do that retirement from a background
      thread. This leaves us with a subtle timing problem, exacerbated and
      highlighted by KASAN:
      
      <3>[  132.380399] BUG: KASAN: use-after-free in drm_gem_create_mmap_offset+0x8c/0xd0
      <3>[  132.380430] Read of size 8 at addr ffff8801e13245f8 by task drv_selftest/5822
      
      <4>[  132.380470] CPU: 0 PID: 5822 Comm: drv_selftest Tainted: G     U            4.18.0-rc3-g7ae7763aa2be-kasan_48+ #1
      <4>[  132.380473] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
      <4>[  132.380475] Call Trace:
      <4>[  132.380481]  dump_stack+0x7c/0xbb
      <4>[  132.380487]  print_address_description+0x65/0x270
      <4>[  132.380493]  kasan_report+0x25b/0x380
      <4>[  132.380497]  ? drm_gem_create_mmap_offset+0x8c/0xd0
      <4>[  132.380503]  drm_gem_create_mmap_offset+0x8c/0xd0
      <4>[  132.380584]  i915_gem_object_create_mmap_offset+0x6d/0x100 [i915]
      <4>[  132.380650]  igt_mmap_offset_exhaustion+0x462/0x940 [i915]
      <4>[  132.380714]  ? i915_gem_close_object+0x740/0x740 [i915]
      <4>[  132.380784]  ? igt_gem_huge+0x269/0x3d0 [i915]
      <4>[  132.380865]  __i915_subtests+0x5a/0x160 [i915]
      <4>[  132.380936]  __run_selftests+0x1a2/0x2f0 [i915]
      <4>[  132.381008]  i915_live_selftests+0x4e/0x80 [i915]
      <4>[  132.381071]  i915_pci_probe+0xd8/0x1b0 [i915]
      <4>[  132.381077]  pci_device_probe+0x1c5/0x3a0
      <4>[  132.381087]  driver_probe_device+0x6b6/0xcb0
      <4>[  132.381094]  __driver_attach+0x22d/0x2c0
      <4>[  132.381100]  ? driver_probe_device+0xcb0/0xcb0
      <4>[  132.381103]  bus_for_each_dev+0x113/0x1a0
      <4>[  132.381108]  ? check_flags.part.24+0x450/0x450
      <4>[  132.381112]  ? subsys_dev_iter_exit+0x10/0x10
      <4>[  132.381123]  bus_add_driver+0x38b/0x6e0
      <4>[  132.381131]  driver_register+0x189/0x400
      <4>[  132.381136]  ? 0xffffffffc12d8000
      <4>[  132.381140]  do_one_initcall+0xa0/0x4c0
      <4>[  132.381145]  ? initcall_blacklisted+0x180/0x180
      <4>[  132.381152]  ? do_init_module+0x4a/0x54c
      <4>[  132.381156]  ? rcu_lockdep_current_cpu_online+0xdc/0x130
      <4>[  132.381161]  ? kasan_unpoison_shadow+0x30/0x40
      <4>[  132.381169]  do_init_module+0x1b5/0x54c
      <4>[  132.381177]  load_module+0x619e/0x9b70
      <4>[  132.381202]  ? module_frob_arch_sections+0x20/0x20
      <4>[  132.381211]  ? vfs_read+0x257/0x2f0
      <4>[  132.381214]  ? vfs_read+0x257/0x2f0
      <4>[  132.381221]  ? kernel_read+0x8b/0x130
      <4>[  132.381231]  ? copy_strings_kernel+0x120/0x120
      <4>[  132.381244]  ? __se_sys_finit_module+0x17c/0x1a0
      <4>[  132.381248]  __se_sys_finit_module+0x17c/0x1a0
      <4>[  132.381252]  ? __ia32_sys_init_module+0xa0/0xa0
      <4>[  132.381261]  ? __se_sys_newstat+0x77/0xd0
      <4>[  132.381265]  ? cp_new_stat+0x590/0x590
      <4>[  132.381269]  ? kmem_cache_free+0x2f0/0x340
      <4>[  132.381285]  do_syscall_64+0x97/0x400
      <4>[  132.381292]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      <4>[  132.381295] RIP: 0033:0x7eff4af46839
      <4>[  132.381297] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
      <4>[  132.381426] RSP: 002b:00007ffcd84f4cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      <4>[  132.381432] RAX: ffffffffffffffda RBX: 000055dfdeb429a0 RCX: 00007eff4af46839
      <4>[  132.381435] RDX: 0000000000000000 RSI: 000055dfdeb43670 RDI: 0000000000000004
      <4>[  132.381437] RBP: 000055dfdeb43670 R08: 0000000000000004 R09: 0000000000000000
      <4>[  132.381440] R10: 00007ffcd84f4e60 R11: 0000000000000246 R12: 0000000000000000
      <4>[  132.381442] R13: 000055dfdeb3bec0 R14: 0000000000000000 R15: 000000000000003b
      
      <3>[  132.381466] Allocated by task 5822:
      <4>[  132.381485]  kmem_cache_alloc+0xdf/0x2e0
      <4>[  132.381546]  i915_gem_object_create_internal+0x24/0x1e0 [i915]
      <4>[  132.381609]  igt_mmap_offset_exhaustion+0x257/0x940 [i915]
      <4>[  132.381677]  __i915_subtests+0x5a/0x160 [i915]
      <4>[  132.381742]  __run_selftests+0x1a2/0x2f0 [i915]
      <4>[  132.381806]  i915_live_selftests+0x4e/0x80 [i915]
      <4>[  132.381865]  i915_pci_probe+0xd8/0x1b0 [i915]
      <4>[  132.381868]  pci_device_probe+0x1c5/0x3a0
      <4>[  132.381871]  driver_probe_device+0x6b6/0xcb0
      <4>[  132.381874]  __driver_attach+0x22d/0x2c0
      <4>[  132.381877]  bus_for_each_dev+0x113/0x1a0
      <4>[  132.381880]  bus_add_driver+0x38b/0x6e0
      <4>[  132.381884]  driver_register+0x189/0x400
      <4>[  132.381886]  do_one_initcall+0xa0/0x4c0
      <4>[  132.381889]  do_init_module+0x1b5/0x54c
      <4>[  132.381892]  load_module+0x619e/0x9b70
      <4>[  132.381895]  __se_sys_finit_module+0x17c/0x1a0
      <4>[  132.381898]  do_syscall_64+0x97/0x400
      <4>[  132.381901]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      <3>[  132.381914] Freed by task 150:
      <4>[  132.381931]  kmem_cache_free+0xb7/0x340
      <4>[  132.381995]  __i915_gem_free_objects+0x875/0xf50 [i915]
      <4>[  132.382054]  __i915_gem_free_work+0x69/0xb0 [i915]
      <4>[  132.382058]  process_one_work+0x78b/0x1740
      <4>[  132.382061]  worker_thread+0x82/0xb80
      <4>[  132.382064]  kthread+0x30c/0x3d0
      <4>[  132.382067]  ret_from_fork+0x3a/0x50
      
      <3>[  132.382081] The buggy address belongs to the object at ffff8801e1324500
                         which belongs to the cache drm_i915_gem_object of size 1168
      <3>[  132.382133] The buggy address is located 248 bytes inside of
                         1168-byte region [ffff8801e1324500, ffff8801e1324990)
      <3>[  132.382179] The buggy address belongs to the page:
      <0>[  132.382202] page:ffffea000784c800 count:1 mapcount:0 mapping:ffff8801dedf6500 index:0xffff8801e1323ec0 compound_mapcount: 0
      <0>[  132.382251] flags: 0x8000000000008100(slab|head)
      <1>[  132.382274] raw: 8000000000008100 ffff8801d6317440 ffff8801d6317440 ffff8801dedf6500
      <1>[  132.382307] raw: ffff8801e1323ec0 0000000000140013 00000001ffffffff 0000000000000000
      <1>[  132.382339] page dumped because: kasan: bad access detected
      
      <3>[  132.382373] Memory state around the buggy address:
      <3>[  132.382395]  ffff8801e1324480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      <3>[  132.382426]  ffff8801e1324500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      <3>[  132.382457] >ffff8801e1324580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      <3>[  132.382488]                                                                 ^
      <3>[  132.382517]  ffff8801e1324600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      <3>[  132.382548]  ffff8801e1324680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      This patch tricks the system into running without the background retire
      thread, until after we finish the test. The only reaping should then be
      performed by the mmap offset routine to reclaim the space as required.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-1-chris@chris-wilson.co.uk
      932cac10
    • Chris Wilson's avatar
      drm/i915/selftests: Replace wait-on-timeout with explicit timeout · d9a13ce3
      Chris Wilson authored
      
      
      In igt_flush_test() we install a background timer in order to ensure
      that the wait completes within a certain time. We can now tell the wait
      that it has to complete within a timeout, and so no longer need the
      background timer.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-3-chris@chris-wilson.co.uk
      d9a13ce3
    • Chris Wilson's avatar
      drm/i915: Provide a timeout to i915_gem_wait_for_idle() on setup · 2621cefa
      Chris Wilson authored
      
      
      With a broken GPU we expect it to fail during the initial
      GPU setup where do a couple of context switches to record the defaults.
      This is a task that takes a few milliseconds even on the slowest of
      devices, but we may have to wait 60s for hangcheck to give in and
      declare the machine inoperable. In this a case where any gpu hang is
      unacceptable, both from a timeliness and practical standpoint.
      
      We can therefore set a timeout on our wait-for-idle that is shorter than
      the hangcheck (which may be up to 60s for a declaring a wedged driver)
      and so detect the broken GPU much more quickly during driver load (and
      so prevent stalling userspace for ages).
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-2-chris@chris-wilson.co.uk
      2621cefa
    • Chris Wilson's avatar
      drm/i915: Provide a timeout to i915_gem_wait_for_idle() · ec625fb9
      Chris Wilson authored
      
      
      Usually we have no idea about the upper bound we need to wait to catch
      up with userspace when idling the device, but in a few situations we
      know the system was idle beforehand and can provide a short timeout in
      order to very quickly catch a failure, long before hangcheck kicks in.
      
      In the following patches, we will use the timeout to curtain two overly
      long waits, where we know we can expect the GPU to complete within a
      reasonable time or declare it broken.
      
      In particular, with a broken GPU we expect it to fail during the initial
      GPU setup where do a couple of context switches to record the defaults.
      This is a task that takes a few milliseconds even on the slowest of
      devices, but we may have to wait 60s for hangcheck to give in and
      declare the machine inoperable. In this a case where any gpu hang is
      unacceptable, both from a timeliness and practical standpoint.
      
      The other improvement is that in selftests, we do not need to arm an
      independent timer to inject a wedge, as we can just limit the timeout on
      the wait directly.
      
      v2: Include the timeout parameter in the trace.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
      ec625fb9
    • Chris Wilson's avatar
      drm/i915/selftests: Magic numbers for old Y-tiling · e1479132
      Chris Wilson authored
      
      
      i915g has a slightly different tiling layout, and so requires a
      different reference swizzle pattern.
      
      Testcase: igt/drv_selftests/live_objects #gdg
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Acked-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180707100405.817-1-chris@chris-wilson.co.uk
      e1479132
  2. Jul 07, 2018
  3. Jul 06, 2018