Skip to content
  1. May 22, 2019
    • Tvrtko Ursulin's avatar
      drm/i915: Engine discovery query · c5d3e39c
      Tvrtko Ursulin authored
      
      
      Engine discovery query allows userspace to enumerate engines, probe their
      configuration features, all without needing to maintain the internal PCI
      ID based database.
      
      A new query for the generic i915 query ioctl is added named
      DRM_I915_QUERY_ENGINE_INFO, together with accompanying structure
      drm_i915_query_engine_info. The address of latter should be passed to the
      kernel in the query.data_ptr field, and should be large enough for the
      kernel to fill out all known engines as struct drm_i915_engine_info
      elements trailing the query.
      
      As with other queries, setting the item query length to zero allows
      userspace to query minimum required buffer size.
      
      Enumerated engines have common type mask which can be used to query all
      hardware engines, versus engines userspace can submit to using the execbuf
      uAPI.
      
      Engines also have capabilities which are per engine class namespace of
      bits describing features not present on all engine instances.
      
      v2:
       * Fixed HEVC assignment.
       * Reorder some fields, rename type to flags, increase width. (Lionel)
       * No need to allocate temporary storage if we do it engine by engine.
         (Lionel)
      
      v3:
       * Describe engine flags and mark mbz fields. (Lionel)
       * HEVC only applies to VCS.
      
      v4:
       * Squash SFC flag into main patch.
       * Tidy some comments.
      
      v5:
       * Add uabi_ prefix to engine capabilities. (Chris Wilson)
       * Report exact size of engine info array. (Chris Wilson)
       * Drop the engine flags. (Joonas Lahtinen)
       * Added some more reserved fields.
       * Move flags after class/instance.
      
      v6:
       * Do not check engine info array was zeroed by userspace but zero the
         unused fields for them instead.
      
      v7:
       * Simplify length calculation loop. (Lionel Landwerlin)
      
      v8:
       * Remove MBZ comments where not applicable.
       * Rename ABI flags to match engine class define naming.
       * Rename SFC ABI flag to reflect it applies to VCS and VECS.
       * SFC is wired to even _logical_ engine instances.
       * SFC applies to VCS and VECS.
       * HEVC is present on all instances on Gen11. (Tony)
       * Simplify length calculation even more. (Chris Wilson)
       * Move info_ptr assigment closer to loop for clarity. (Chris Wilson)
       * Use vdbox_sfc_access from runtime info.
       * Rebase for RUNTIME_INFO.
       * Refactor for lower indentation.
       * Rename uAPI class/instance to engine_class/instance to avoid C++
         keyword.
      
      v9:
       * Rebase for s/num_rings/num_engines/ in RUNTIME_INFO.
      
      v10:
       * Use new copy_query_item.
      
      v11:
       * Consolidate with struct i915_engine_class_instnace.
      
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tony Ye <tony.ye@intel.com>
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> # v7
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190522090054.6007-1-tvrtko.ursulin@linux.intel.com
      c5d3e39c
    • Tvrtko Ursulin's avatar
      drm/i915/icl: Add WaDisableBankHangMode · cbe3e1d1
      Tvrtko Ursulin authored
      Disable GPU hang by default on unrecoverable ECC cache errors.
      
      v2:
       * Rebase.
      
      v3:
       * Use intel_uncore_read. (Chris)
      
      Fixes: cc38cae7
      
       ("drm/i915/icl: Introduce initial Icelake Workarounds")
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Acked-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190520110442.403-2-tvrtko.ursulin@linux.intel.com
      cbe3e1d1
    • Tvrtko Ursulin's avatar
      drm/i915/selftests: Verify context workarounds · fde93886
      Tvrtko Ursulin authored
      
      
      Test context workarounds have been correctly applied in newly created
      contexts.
      
      To accomplish this the existing engine_wa_list_verify helper is extended
      to take in a context from which reading of the workaround list will be
      done.
      
      Context workaround verification is done from the existing subtests, which
      have been renamed to reflect they are no longer only about GT and engine
      workarounds.
      
      v2:
       * Test after resets and refactor to use intel_context more. (Chris)
      
      v3:
       * Use ce->engine->i915 instead of ce->gem_context->i915. (Chris)
       * gem_engine_iter.idx is engine->id + 1. (Chris)
      
      v4:
       * Make local function static.
      
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190520142546.12493-1-tvrtko.ursulin@linux.intel.com
      fde93886
    • Chris Wilson's avatar
      drm/i915: Allow specification of parallel execbuf · a88b6e4c
      Chris Wilson authored
      
      
      There is a desire to split a task onto two engines and have them run at
      the same time, e.g. scanline interleaving to spread the workload evenly.
      Through the use of the out-fence from the first execbuf, we can
      coordinate secondary execbuf to only become ready simultaneously with
      the first, so that with all things idle the second execbufs are executed
      in parallel with the first. The key difference here between the new
      EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
      waits for the completion of the first request (so that all of its
      rendering results are visible to the second execbuf, the more common
      userspace fence requirement).
      
      Since we only have a single input fence slot, userspace cannot mix an
      in-fence and a submit-fence. It has to use one or the other! This is not
      such a harsh requirement, since by virtue of the submit-fence, the
      secondary execbuf inherit all of the dependencies from the first
      request, and for the application the dependencies should be common
      between the primary and secondary execbuf.
      
      Suggested-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Testcase: igt/gem_exec_fence/parallel
      Link: https://github.com/intel/media-driver/pull/546
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-10-chris@chris-wilson.co.uk
      a88b6e4c
    • Chris Wilson's avatar
      drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson authored
      
      
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
      ee113690
    • Chris Wilson's avatar
      drm/i915: Extend execution fence to support a callback · f71e01a7
      Chris Wilson authored
      
      
      In the next patch, we will want to configure the slave request
      depending on which physical engine the master request is executed on.
      For this, we introduce a callback from the execute fence to convey this
      information.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk
      f71e01a7
    • Chris Wilson's avatar
      drm/i915: Apply an execution_mask to the virtual_engine · 78e41ddd
      Chris Wilson authored
      
      
      Allow the user to direct which physical engines of the virtual engine
      they wish to execute one, as sometimes it is necessary to override the
      load balancing algorithm.
      
      v2: Only kick the virtual engines on context-out if required
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
      78e41ddd
    • Chris Wilson's avatar
      drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson authored
      
      
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always take priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is an virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
      6d06779e
    • Chris Wilson's avatar
      drm/i915: Allow userspace to clone contexts on creation · b81dde71
      Chris Wilson authored
      
      
      A usecase arose out of handling context recovery in mesa, whereby they
      wish to recreate a context with fresh logical state but preserving all
      other details of the original. Currently, they create a new context and
      iterate over which bits they want to copy across, but it would much more
      convenient if they were able to just pass in a target context to clone
      during creation. This essentially extends the setparam during creation
      to pull the details from a target context instead of the user supplied
      parameters.
      
      The ideal here is that we don't expose control over anything more than
      can be obtained via CONTEXT_PARAM. That is userspace retains explicit
      control over all features, and this api is just convenience.
      
      For example, you could replace
      
      	struct context_param p = { .param = CONTEXT_PARAM_VM };
      
      	param.ctx_id = old_id;
      	gem_context_get_param(&p.param);
      
      	new_id = gem_context_create();
      
      	param.ctx_id = new_id;
      	gem_context_set_param(&p.param);
      
      	gem_vm_destroy(param.value); /* drop the ref to VM_ID handle */
      
      with
      
      	struct create_ext_param p = {
      	  { .name = CONTEXT_CREATE_CLONE },
      	  .clone_id = old_id,
      	  .flags = CLONE_FLAGS_VM
      	}
      	new_id = gem_context_create_ext(&p);
      
      and not have to worry about stray namespace pollution etc.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-5-chris@chris-wilson.co.uk
      b81dde71
    • Chris Wilson's avatar
      drm/i915: Re-expose SINGLE_TIMELINE flags for context creation · 8319f44c
      Chris Wilson authored
      
      
      The SINGLE_TIMELINE flag can be used to create a context such that all
      engine instances within that context share a common timeline. This can
      be useful for mixing operations between real and virtual engines, or
      when using a composite context for a single client API context.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-4-chris@chris-wilson.co.uk
      8319f44c
    • Chris Wilson's avatar
      drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] · e620f7b3
      Chris Wilson authored
      
      
      Allow the user to specify a local engine index (as opposed to
      class:index) that they can use to refer to a preset engine inside the
      ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
      This will be useful for setting SSEU parameters on virtual engines that
      are local to the context and do not have a valid global class:instance
      lookup.
      
      Note that due to the ambiguity in using class:instance with
      ctx->engines[], if a user supplied engine map is active the user must
      specify the engine to alter by its index into the ctx->engines[].
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-3-chris@chris-wilson.co.uk
      e620f7b3
    • Chris Wilson's avatar
      drm/i915: Allow a context to define its set of engines · 976b55f0
      Chris Wilson authored
      
      
      Over the last few years, we have debated how to extend the user API to
      support an increase in the number of engines, that may be sparse and
      even be heterogeneous within a class (not all video decoders created
      equal). We settled on using (class, instance) tuples to identify a
      specific engine, with an API for the user to construct a map of engines
      to capabilities. Into this picture, we then add a challenge of virtual
      engines; one user engine that maps behind the scenes to any number of
      physical engines. To keep it general, we want the user to have full
      control over that mapping. To that end, we allow the user to constrain a
      context to define the set of engines that it can access, order fully
      controlled by the user via (class, instance). With such precise control
      in context setup, we can continue to use the existing execbuf uABI of
      specifying a single index; only now it doesn't automagically map onto
      the engines, it uses the user defined engine map from the context.
      
      v2: Fixup freeing of local on success of get_engines()
      v3: Allow empty engines[]
      v4: s/nengine/num_engines/
      v5: Replace 64 limit on num_engines with a note that execbuf is
      currently limited to only using the first 64 engines.
      v6: Actually use the engines_mutex to guard the ctx->engines.
      
      Testcase: igt/gem_ctx_engines
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk
      976b55f0
    • Chris Wilson's avatar
      drm/i915: Restore control over ppgtt for context creation ABI · 7f3f317a
      Chris Wilson authored
      
      
      Having hid the partially exposed new ABI from the PR, put it back again
      for completion of context recovery. A significant part of context
      recovery is the ability to reuse as much of the old context as is
      feasible (to avoid expensive reconstruction). The biggest chunk kept
      hidden at the moment is fine-control over the ctx->ppgtt (the GPU page
      tables and associated translation tables and kernel maps), so make
      control over the ctx->ppgtt explicit.
      
      This allows userspace to create and share virtual memory address spaces
      (within the limits of a single fd) between contexts they own, along with
      the ability to query the contexts for the vm state.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-1-chris@chris-wilson.co.uk
      7f3f317a
  2. May 20, 2019
    • Ville Syrjälä's avatar
      drm/i915: Bump gen7+ fb size limits to 16kx16k · 5c000fb3
      Ville Syrjälä authored
      
      
      With gtt remapping in place we can use arbitrarily large
      framebuffers. Let's bump the limits to 16kx16k on gen7+.
      The limit was chosen to match the maximum 2D surface size
      of the 3D engine.
      
      With the remapping we could easily go higher than that for the
      display engine. However the modesetting ddx will blindly assume
      it can handle whatever is reported via kms. The oversized
      buffer dimensions are not caught by glamor nor Mesa until
      finally an assert will trip when genxml attempts to pack the
      SURFACE_STATE. So we pick a safe limit to avoid the X server
      from crashing (or potentially misbehaving if the genxml asserts
      are compiled out).
      
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110187
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-9-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      5c000fb3
    • Ville Syrjälä's avatar
      drm/i915: Bump fb stride limit to 128KiB for gen4+ and 256KiB for gen7+ · 20330129
      Ville Syrjälä authored
      
      
      With gtt remapping plugged in we can simply raise the stride
      limit on gen4+. Let's just pick the limit to match the render
      engine max stride (256KiB on gen7+, 128KiB on gen4+).
      
      No remapping CCS because the virtual address of each page actually
      matters due to the new hash mode
      (WaCompressedResourceDisplayNewHashMode:skl,kbl etc.), and no
      remapping on gen2/3 due extra complications from fence alignment
      and gen2 2KiB GTT tile size. Also no real benefit since the
      display engine limits already match the other limits.
      
      v2: Rebase due to is_ccs_modifier()
      v3: Tweak the comment and commit msg
      v4: Fix gen4+ stride limit to be 128KiB
      
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v3
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-8-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      20330129
    • Ville Syrjälä's avatar
      drm/i915: Align dumb buffer stride to 4k to allow for gtt remapping · aa5ca8b7
      Ville Syrjälä authored
      
      
      Align dumb buffer stride to 4k if the fb will be big enough to
      require gtt remapping.
      
      v2: Leave the stride alone for buffers that look to be for the cursor
      v3: Make it not a hack (Daniel)
      
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-7-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      aa5ca8b7
    • Ville Syrjälä's avatar
      drm/i915: Overcome display engine stride limits via GTT remapping · 54d4d719
      Ville Syrjälä authored
      
      
      The display engine stride limits are getting in our way. On SKL+
      we are limited to 8k pixels, which is easily exceeded with three
      4k displays. To overcome this limitation we can remap the pages
      in the GTT to provide the display engine with a view of memory
      with a smaller stride.
      
      The code is mostly already there as We already play tricks with
      the plane surface address and x/y offsets.
      
      A few caveats apply:
      * linear buffers need the fb stride to be page aligned, as
        otherwise the remapped lines wouldn't start at the same
        spot
      * compressed buffers can't be remapped due to the new
        ccs hash mode causing the virtual address of the pages
        to affect the interpretation of the compressed data. IIRC
        the old hash was limited to the low 12 bits so if we were
        using that mode we could remap. As it stands we just refuse
        to remapp with compressed fbs.
      * no remapping gen2/3 as we'd need a fence for the remapped
        vma, which we currently don't have. Need to deal with the
        fence POT requirements, and do something about the gen2
        gtt page size vs tile size difference
      
      v2: Rebase due to is_ccs_modifier()
          Fix up the skl+ stride_mult mess
          memset() the gtt_view because otherwise we could leave
          junk in plane[1] when going from 2 plane to 1 plane format
      v3: intel_check_plane_stride() was split out
      v4: Drop the aligned viewport stuff, it was meant for ccs which
          can't be remapped anyway
      v5: Introduce intel_plane_can_remap()
          Reorder the code so that plane_state->view gets filled
          even for invisible planes, otherwise we'd keep using
          stale values and could explode during remapping. The new
          logic never remaps invisible planes since we don't have
          a viewport, and instead pins the full fb instead
      v6: Fix plane src coord checks after remapping by moving
          plane_state->base.src to the final plane x/y offsets.
          Allow intel_plane_check_stride() to fail even with
          remapping (can happen at least with a linear 64bpp
          fb with a 4k plane and a suitably inconvenient src
          coordinates).
          Improve aux plane FIXME (Daniel)
          Move some code shuffling into a separate patch (Daniel)
      
      Testcase: igt/kms_big_fb
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-6-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      54d4d719
    • Ville Syrjälä's avatar
      drm/i915: Shuffle stride checking code around · a88c40eb
      Ville Syrjälä authored
      
      
      Reorganize some fb stride checking code a bit to prepare for
      gtt remapping. And do a bit of s/pitch/stride/ renaming in the
      process for a bit more uniformity (apart from the whole
      fb->pitches[] thing).
      
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-5-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      a88c40eb
    • Ville Syrjälä's avatar
      drm/i915/selftests: Add live vma selftest · bb211c3d
      Ville Syrjälä authored
      
      
      Add a live selftest to excercise rotated/remapped vmas. We simply
      write through the rotated/remapped vma, and confirm that the data
      appears in the right page when read through the normal vma.
      
      Not sure what the fallout of making all rotated/remapped vmas
      mappable/fenceable would be, hence I just hacked it in the test.
      
      v2: Grab rpm reference (Chris)
          GEM_BUG_ON(view.type not as expected) (Chris)
          Allow CAN_FENCE for rotated/remapped vmas (Chris)
          Update intel_plane_uses_fence() to ask for a fence
          only for normal vmas on gen4+
      v3: Deal with intel_wakeref_t
      v4: Rebase
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-4-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      bb211c3d
    • Ville Syrjälä's avatar
      drm/i915/selftests: Add mock selftest for remapped vmas · e2e394bf
      Ville Syrjälä authored
      
      
      Extend the rotated vma mock selftest to cover remapped vmas as
      well.
      
      TODO: reindent the loops I guess? Left like this for now to
      ease review
      
      v2: Include the vma type in the error message (Chris)
      v3: Deal with trimmed sg
      v4: Drop leftover debugs
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-3-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      e2e394bf
    • Ville Syrjälä's avatar
      drm/i915: Add a new "remapped" gtt_view · 1a74fc0b
      Ville Syrjälä authored
      
      
      To overcome display engine stride limits we'll want to remap the
      pages in the GTT. To that end we need a new gtt_view type which
      is just like the "rotated" type except not rotated.
      
      v2: Use intel_remapped_plane_info base type
          s/unused/unused_mbz/ (Chris)
          Separate BUILD_BUG_ON()s (Chris)
          Use I915_GTT_PAGE_SIZE (Chris)
      v3: Use i915_gem_object_get_dma_address() (Chris)
          Trim the sg (Tvrtko)
      v4: Actually trim this time. Limit the max length
          to one row of pages to keep things simple
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-2-ville.syrjala@linux.intel.com
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      1a74fc0b
  3. May 17, 2019
    • Chris Wilson's avatar
      drm/i915/execlists: Drop promotion on unsubmit · 4cc79cbb
      Chris Wilson authored
      
      
      With the disappearance of NEWCLIENT, we no longer need to provide the
      priority boost on preemption in order to prevent repeated gazumping,
      and we can remove the dead code.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-5-chris@chris-wilson.co.uk
      4cc79cbb
    • Chris Wilson's avatar
      drm/i915: Downgrade NEWCLIENT to non-preemptive · 68fc728b
      Chris Wilson authored
      Commit 1413b2bc ("drm/i915: Trim NEWCLIENT boosting") had the
      intended consequence of not allowing a sequence of work that merely
      crossed into a new engine the privilege to be promoted to NEWCLIENT
      status. It also had the unintended consequence of actually making
      NEWCLIENT effective on heavily oversubscribed transcode machines and
      impacting upon their throughput.
      
      If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
      those clients, using the NEWCLIENT boost that will be scheduled as
      
      	rcsA x 30, (rcsB, vcs) x 30
      
      where as before it would have been
      
      	(rcsA, rcsB, vcs) x 30
      
      That is with NEWCLIENT only boosting the first request of each client,
      we would execute all rcsA requests prior to running on the vcs engines;
      acruing a lot of dead time as compared to the previous case where the
      vcs engine would be started in parallel to processing the second client.
      
      The previous patch has the effect of delaying submission until it is
      required by a third party (either the user with an explicit wait, or by
      another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
      which has the effect of removing its preemptive grant and reducing it to
      the same level as any other user interaction -- that it will not be
      promoted above the interengine dependencies, and so preventing NEWCLIENTS
      from starving other engines. This a large nerf to the rrul properties of
      the current NEWCLIENT, but it still does give prioritised submission to
      new requests from light workloads.
      
      References: b16c7651 ("drm/i915: Priority boost for new clients")
      Fixes: 1413b2bc
      
       ("drm/i915: Trim NEWCLIENT boosting") # customer impact
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
      68fc728b
    • Chris Wilson's avatar
      drm/i915: Bump signaler priority on adding a waiter · 6e7eb7a8
      Chris Wilson authored
      
      
      The handling of the no-preemption priority level imposes the restriction
      that we need to maintain the implied ordering even though preemption is
      disabled. Otherwise we may end up with an AB-BA deadlock across multiple
      engine due to a real preemption event reordering the no-preemption
      WAITs. To resolve this issue we currently promote all requests to WAIT
      on unsubmission, however this interferes with the timeslicing
      requirement that we do not apply any implicit promotion that will defeat
      the round-robin timeslice list. (If we automatically promote the active
      request it will go back to the head of the queue and not the tail!)
      
      So we need implicit promotion to prevent reordering around semaphores
      where we are not allowed to preempt, and we must avoid implicit
      promotion on unsubmission. So instead of at unsubmit, if we apply that
      implicit promotion on adding the dependency, we avoid the semaphore
      deadlock and we also reduce the gains made by the promotion for user
      space waiting. Furthermore, by keeping the earlier dependencies at a
      higher level, we reduce the search space for timeslicing without
      altering runtime scheduling too badly (no dependencies at all will be
      assigned a higher priority for rrul).
      
      v2: Limit the bump to external edges (as originally intended) i.e.
      between contexts and out to the user.
      
      Testcase: igt/gem_concurrent_blit
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
      6e7eb7a8
    • Chris Wilson's avatar
      drm/i915/hdcp: Use both bits for device_count · af461ff3
      Chris Wilson authored
      
      
      Smatch spotted:
      drivers/gpu/drm/i915//intel_hdcp.c:1406 hdcp2_authenticate_repeater_topology() warn: should this be a bitwise op?
      
      and indeed looks to be suspect that we do need to use a bitwise or to
      combine the two register fields into one counter.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ramalingam C <ramalingam.c@intel.com>
      Reviewed-by: default avatarRamalingam C <ramalingam.c@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190517102225.3069-3-chris@chris-wilson.co.uk
      af461ff3
    • Chris Wilson's avatar
      drm/i915/dp: Initialise locals for static analysis · 96ac0813
      Chris Wilson authored
      
      
      Just to squelch an smatch warning that doesn't see the with_() being
      taken unconditionally:
      drivers/gpu/drm/i915//intel_dp.c:230 intel_dp_get_fia_supported_lane_count() error: uninitialized symbol 'lane_info'.
      drivers/gpu/drm/i915//intel_dp.c:5338 intel_digital_port_connected() error: uninitialized symbol 'is_connected'.
      
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarImre Deak <imre.deak@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190517102225.3069-2-chris@chris-wilson.co.uk
      96ac0813
    • Chris Wilson's avatar
      drm/i915: Truly bump ready tasks ahead of busywaits · 17db337f
      Chris Wilson authored
      In commit b7404c7e ("drm/i915: Bump ready tasks ahead of
      busywaits"), I tried cutting a corner in order to not install a signal
      for each of our dependencies, and only listened to requests on which we
      were intending to busywait. The compromise that was made was that
      instead of then being able to promote the request with a full
      NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we
      had cleared the semaphore chain, we settled for only using the NEWCLIENT
      boost. With an over saturated system with multiple NEWCLIENTS in flight
      at any time, this was found to be an inadequate promotion and left us
      with a much poorer scheduling order than prior to using semaphores.
      
      The outcome of this patch, is that all requests have NOSEMAPHORE
      priority when they have no dependencies and are ready to run and not
      busywait, restoring the pre-semaphore ordering on saturated systems.
      
      We can demonstrate the effect of poor scheduling order by oversaturating
      the system using gem_wsim on a system with multiple vcs engines
      (i.e running the same workloads across more clients than required for
      peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context):
      
      x v5.1 (normalized)
      + tip
      * fix
      +------------------------------------------------------------------------+
      |                                                                    x   |
      |                                                                    x   |
      |                                                                    x   |
      |                                                                    x   |
      |                                                                   %x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |         +                                                        %#xx  |
      |         +                                                        %#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%##x  |
      |         +++                                                     %%##x  |
      |         +++                                                     %%##x  |
      |         +++                                                     %%##x  |
      |        ++++                                                     %%##x  |
      |        ++++                                                     %%##x  |
      |        ++++                                                     %%##xx |
      |        ++++                                                     %###xx |
      |        ++++                                                     %###xx |
      |        ++++                                                     %###xx |
      |        ++++                                                     %###xx |
      |        ++++ +                                                   %#O#xx |
      |        ++++ +                                                   %#O#xx |
      |        ++++++ +                                                 %#O#xx |
      |       ++++++++++                                                %OOOxxx|
      |       ++++++++++       +                                       %#OOO#xx|
      |     + ++++++++++++ ++ +++++    +                        ++    @@OOOO#xx|
      |                                                                   |A_| |
      ||__________M_______A____________________|                               |
      |                                                                 |A_|   |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       0.99456       1.00628      0.999985     1.0001545  0.0024387139
      + 120      0.873021       1.00037      0.884134    0.90148752   0.039190862
      Difference at 99.5% confidence
      	-0.098667 +/- 0.0110762
      	-9.86517% +/- 1.10745%
      	(Student's t, pooled s = 0.0277657)
      % 120      0.990207       1.00165     0.9970265    0.99699748     0.0021024
      Difference at 99.5% confidence
      	-0.003157 +/- 0.000908245
      	-0.315651% +/- 0.0908105%
      	(Student's t, pooled s = 0.00227678)
      
      Fixes: b7404c7e
      
       ("drm/i915: Bump ready tasks ahead of busywaits")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
      17db337f
    • Chris Wilson's avatar
      drm/i915: Mark semaphores as complete on unsubmit out if payload was started · dba5a7f3
      Chris Wilson authored
      Avoid charging us for the presumed busywait if the request was preempted
      after successfully using semaphores to reduce inter-engine latency.
      
      v2: Bump the priority to reflect the lack of semaphores now required.
      
      References: ca6e56f6
      
       ("drm/i915: Disable semaphore busywaits on saturated systems")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-1-chris@chris-wilson.co.uk
      dba5a7f3
  4. May 14, 2019
    • Imre Deak's avatar
      drm/i915: Assert that TypeC ports are not used for eDP · 4e309baf
      Imre Deak authored
      
      
      Add an assert that we don't use TypeC ports for eDP. That may in theory
      be possible on TypeC legacy ports, but I'm not sure if that's a
      practical scenario, so let's deal with that only if there's a use case.
      Adding support for that wouldn't be too difficult, since TypeC mode
      switching is not possible on TypeC legacy ports.
      
      Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-12-imre.deak@intel.com
      4e309baf
    • Imre Deak's avatar
      drm/i915: Avoid taking the PPS lock for non-eDP/VLV/CHV · b4c7ea63
      Imre Deak authored
      
      
      On ICL we have to make sure that we enable the AUX power domain in a
      controlled way (corresponding to the port's actual TypeC mode). Since
      the PPS lock - which takes an AUX power ref - is only needed on
      eDP on all platforms and eDP/DP on VLV/CHV avoid taking it in all
      other cases.
      
      v2:
      - Clarify commit log about the condition for taking the PPS lock.
        (Ville)
      
      Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-11-imre.deak@intel.com
      b4c7ea63
    • Imre Deak's avatar
      drm/i915: Replace use of PLLS power domain with DISPLAY_CORE domain · 08d8e170
      Imre Deak authored
      
      
      There isn't a separate power domain specific to PLLs. When programming
      them we require the same power domain to be enabled which is needed when
      accessing other display core parts (not specific to any
      pipe/port/transcoder). This corresponds to the DISPLAY_CORE domain added
      previously in this patchset, so use that instead to save bits in the
      power domain mask.
      
      Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-10-imre.deak@intel.com
      08d8e170
    • Imre Deak's avatar
      drm/i915: Remove the unneeded AUX power ref from intel_dp_hpd_pulse() · 6f08ebe7
      Imre Deak authored
      The power get/put was added in
      
      commit 1c767b33 ("drm/i915: take display port power domain in DP HPD handler")
      Author: Imre Deak <imre.deak@intel.com>
      Date:   Mon Aug 18 14:42:42 2014 +0300
      
      to account for the HW access in ibx_digital_port_connected(). This
      latter call was in turn removed in
      
      commit 7d23e3c3
      
       ("drm/i915: Cleaning up intel_dp_hpd_pulse")
      Author: Shubhangi Shrivastava <shubhangi.shrivastava@intel.com>
      Date:   Wed Mar 30 18:05:23 2016 +0530
      
      after which we didn't actually need the power reference.
      
      One way we are accessing the HW during HPD pulse handling is via DP AUX
      transfers, but the transfer function takes its own reference, so doesn't
      need the reference in intel_dp_hpd_pulse().
      
      The other spot is in
      
      	intel_psr_short_pulse()->intel_psr_disable_locked()
      
      but that can only happen when the panel is enabled with the
      corresponding modeset already holding the required power reference.
      
      v2:
      - Remove the unneeded power get/put from intel_psr_disable_locked().
        (Ville)
      - Checkpatch commit quoting format fix in the commit log.
      
      Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: José Roberto de Souza <jose.souza@intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-9-imre.deak@intel.com
      6f08ebe7
    • Imre Deak's avatar
      drm/i915: Remove the unneeded AUX power ref from intel_dp_detect() · 6cfe7ec0
      Imre Deak authored
      
      
      We don't need the AUX power for the whole duration of the detect, only
      when we're doing AUX transfers. The AUX transfer function takes its own
      reference on the AUX power domain already. The two places during detect
      which access display core registers (not specific to a
      pipe/port/transcoder) only need the power domain that is required for
      that access. That power domain is equivalent to the device global power
      domain on most platforms (enabled whenever we hold a runtime PM
      reference) except on CHV/VLV where it's equivalent to the display power
      well.
      
      Add a new power domain that reflects the above, and use this at the two
      spots accessing registers. With that we can avoid taking the AUX
      reference for the whole duration of the detect function.
      
      Put the domains asynchronously to avoid the unneeded on-off-on toggling.
      
      Also adapt the idea from with_intel_runtime_pm et al. for making it easy
      to write short sequences where a display power ref is needed.
      
      v2: (Ville)
      - Add with_intel_display_power() helper to simplify things.
      - s/bool res/bool is_connected/
      
      Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-8-imre.deak@intel.com
      6cfe7ec0
    • Imre Deak's avatar
      drm/i915: WARN for eDP encoders in intel_dp_detect_dpcd() · ad5125d6
      Imre Deak authored
      
      
      We are not calling this function for eDP, so add an early assert about
      this for clarity.
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-7-imre.deak@intel.com
      ad5125d6
    • Imre Deak's avatar
      drm/i915: Disable power asynchronously during DP AUX transfers · f39194a7
      Imre Deak authored
      
      
      In a follow-up patch we will restrict holding the reference on the AUX
      power domain to the AUX transfer function. To avoid the unnecessary
      on-off-on power togglings drop the reference asynchronously.
      
      There is no reason we couldn't do this in general and also put the
      reference asynchronously in pps_unlock(); but that's a separate change
      that can be done as a follow-up.
      
      Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-6-imre.deak@intel.com
      f39194a7
    • Imre Deak's avatar
      drm/i915: Add support for asynchronous display power disabling · e0da2d63
      Imre Deak authored
      
      
      By disabling a power domain asynchronously we can restrict holding a
      reference on that power domain to the actual code sequence that
      requires the power to be on for the HW access it's doing, by also
      avoiding unneeded on-off-on togglings of the power domain (since the
      disabling happens with a delay).
      
      One benefit is potential power saving due to the following two reasons:
      1. The fact that we will now be holding the reference only for the
         necessary duration by the end of the patchset. While simply not
         delaying the disabling has the same benefit, it has the problem that
         frequent on-off-on power switching has its own power cost (see the 2.
         point below) and the debug trace for power well on/off events will
         cause a lot of dmesg spam (see details about this further below).
      2. Avoiding the power cost of freuqent on-off-on power switching. This
         requires us to find the optimal disabling delay based on the measured
         power cost of on->off and off->on switching of each power well vs.
         the power of keeping the given power well on.
      
         In this patchset I'm not providing this optimal delay for two
         reasons:
         a) I don't have the means yet to perform the measurement (with high
            enough signal-to-noise ratio, or with the help of an energy
            counter that takes switching into account). I'm currently looking
            for a way to measure this.
      
         b) Before reducing the disabling delay we need an alternative way for
            debug tracing powerwell on/off events. Simply avoiding/throttling
            the debug messages is not a solution, see further below.
      
         Note that even in the case where we can't measure any considerable
         power cost of frequent on-off switching of powerwells, it still would
         make sense to do the disabling asynchronously (with 0 delay) to avoid
         blocking on the disabling. On VLV I measured this disabling time
         overhead to be 1ms on average with a worst case of 4ms.
      
      In the case of the AUX power domains on ICL we would also need to keep
      the sequence where we hold the power reference short, the way it would
      be by the end of this patchset where we hold it only for the actual AUX
      transfer. Anything else would make the locking we need for ICL TypeC
      ports (whenever we hold a reference on any AUX power domain) rather
      problematic, adding for instance unnecessary lockdep dependencies to
      the required TypeC port lock.
      
      I chose the disabling delay to be 100msec for now to avoid the unneeded
      toggling (and so not to introduce dmesg spamming) in the DP MST sideband
      signaling code. We could optimize this delay later, once we have the
      means to measure the switching power cost (see above).
      
      Note that simply removing/throttling the debug tracing for power well
      on/off events is not a solution. We need to know the exact spots of
      these events and cannot rely only on incorrect register accesses caught
      (due to not holding a wakeref at the time of access). Incorrect
      powerwell enabling/disabling could lead to other problems, for instance
      we need to keep certain powerwells enabled for the duration of modesets
      and AUX transfers.
      
      v2:
      - Clarify the commit log parts about power cost measurement and the
        problem of simply removing/throttling debug tracing. (Chris)
      - Optimize out local wakeref vars at intel_runtime_pm_put_raw() and
        intel_display_power_put_async() call sites if
        CONFIG_DRM_I915_DEBUG_RUNTIME_PM=n. (Chris)
      - Rebased on v2 of the wakeref w/o power-on guarantee patch.
      - Add missing docbook headers.
      v3:
      - Checkpatch spelling/missing-empty-line fix.
      v4:
      - Fix unintended local wakeref var optimization when using
        call-arguments with side-effects, by using inline funcs instead of
        macros. In this patch in particular this will fix the
        intel_display_power_grab_async_put_ref()->intel_runtime_pm_put_raw()
        call).
      
        No size change in practice (would be the same disregarding the
        corresponding change in intel_display_power_grab_async_put_ref()):
        $ size i915-macro.ko
           text	   data	    bss	    dec	    hex	filename
        2455190	 105890	  10272	2571352	 273c58	i915-macro.ko
        $ size i915-inline.ko
           text	   data	    bss	    dec	    hex	filename
        2455195	 105890	  10272	2571357	 273c5d	i915-inline.ko
      
        Kudos to Stan for reporting the raw-wakeref WARNs this issue caused. His
        config has CONFIG_DRM_I915_DEBUG_RUNTIME_PM=n, which I didn't retest
        after v1, and we are also not testing this config in CI.
      
        Now tested both with CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y/n on ICL,
        connecting both Chamelium and regular DP, HDMI sinks.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
      Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190513192533.12586-1-imre.deak@intel.com
      e0da2d63
    • Imre Deak's avatar
      drm/i915: Verify power domains state during suspend in all cases · ee70080a
      Imre Deak authored
      
      
      There is no reason why we couldn't verify the power domains state during
      suspend in all cases, so do that. I overlooked this when originally
      adding the check.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-4-imre.deak@intel.com
      ee70080a
    • Imre Deak's avatar
      drm/i915: Force printing wakeref tacking during pm_cleanup · dbf99c1f
      Imre Deak authored
      
      
      Make sure we print and drop the wakeref tracking info during pm_cleanup
      even if there are wakeref holders (either raw-wakeref or wakelock
      holders). Dropping the wakeref tracking means that a late put on the ref
      will WARN since the wakeref will be unknown, but that is rightly so,
      since the put is late and we want to catch that case.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-3-imre.deak@intel.com
      dbf99c1f
    • Imre Deak's avatar
      drm/i915: Add support for tracking wakerefs w/o power-on guarantee · 4547c255
      Imre Deak authored
      
      
      It's useful to track runtime PM refs that don't guarantee a device
      power-on state to the rest of the driver. One such case is holding a
      reference that will be put asynchronously, during which normal users
      without their own reference shouldn't access the HW. A follow-up patch
      will add support for disabling display power domains asynchronously
      which needs this.
      
      For this we can split wakeref_count into a low half-word tracking
      all references (raw-wakerefs) and a high half-word tracking
      references guaranteeing a power-on state (wakelocks).
      
      Follow-up patches will make use of the API added here.
      
      While at it add the missing docbook header for the unchecked
      display-power and runtime_pm put functions.
      
      No functional changes, except for printing leaked raw-wakerefs
      and wakelocks separately in intel_runtime_pm_cleanup().
      
      v2:
      - Track raw wakerefs/wakelocks in the low/high half-word of
        wakeref_count, instead of adding a new counter. (Chris)
      v3:
      - Add a struct_member(T, m) helper instead of open-coding it. (Chris)
      - Checkpatch indentation formatting fix.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-2-imre.deak@intel.com
      4547c255
    • Imre Deak's avatar
      drm/i915/icl: More workaround for port F detection due to broken VBTs · 1aa37508
      Imre Deak authored
      
      
      Add another ICL-Y PCIID that proved to have only 5 ports to the
      corresponding PCIID list.
      
      Meanwhile I'm trying to get a complete list of all PCIIDs with less than
      6 ports and/or get a VBT fix to mark these ports non-existent, but until
      then the only way is to go one-by-one.
      
      This fixes the following error on machines with less than 6 port:
      
      	[drm:intel_power_well_enable [i915]] enabling AUX F
      	------------[ cut here ]------------
      	WARN_ON(intel_wait_for_register(&dev_priv->uncore, regs->driver, (0x1 << ((pw_idx) * 2)), (0x1 << ((pw_idx) * 2)), 1))
      
      (Internal reference: BSpec/Index/20584/Issues, HSD/1306084116)
      
      Cc: Mika Kahola <mika.kahola@intel.com>
      References: https://bugs.freedesktop.org/show_bug.cgi?id=108915
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarMika Kahola <mika.kahola@intel.com>
      Tested-by: default avatarMika Kahola <mika.kahola@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190510140255.25215-1-imre.deak@intel.com
      1aa37508