git.monstr.eu Git - linux-2.6-microblaze.git/log

drm/i915: Eliminate lots of iterations over the execobjects array

The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.

Reservation is then split into phases. As we lookup up the VMA, we
try and bind it back into active location. Only if that fails, do we add
it to the unbound list for phase 2. In phase 2, we try and add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate reservation
phase). During the reservation phase, we only evict from the VM between
passes (rather than currently as we try to fit every new VMA). In
testing with Unreal Engine's Atlantis demo which stresses the eviction
logic on gen7 class hardware, this speed up the framerate by a factor of
2.

The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track active VMA as we perform the flushes and
synchronisation required.

The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.

v2: Add a Theory of Operation spiel.
v3: Fall back to slow relocations in preparation for flushing userptrs.
v4: Document struct members, factor out eb_validate_vma(), add a few
more comments to explain some magic and hide other magic behind macros.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations

If we write a relocation into the buffer, we require our own implicit
synchronisation added after the start of the execbuf, outside of the
user's control. As we may end up clflushing, or doing the patch itself
on the GPU, asynchronously we need to look at the implicit serialisation
on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
object.

If the user does trigger a stall for relocations, we make sure the stall
is complete enough so that the batch is not submitted before we complete
those relocations.

Fixes: 77ae9957897d ("drm/i915: Enable userspace to opt-out of implicit fencing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Pass vma to relocate entry

We can simplify our tracking of pending writes in an execbuf to the
single bit in the vma->exec_entry->flags, but that requires the
relocation function knowing the object's vma. Pass it along.

Note we have only been using a single bit to track flushing since

commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

unconditionally flushed all render caches before the breadcrumb and

commit 6ac42f4148bc27e5ffd18a9ab0eac57f58822af4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Jul 21 12:25:01 2012 +0200

    drm/i915: Replace the complex flushing logic with simple invalidate/flush all

did away with the explicit GPU domain tracking. This was then codified
into the ABI with NO_RELOC in

commit ed5982e6ce5f106abcbf071f80730db344a6da42
Author: Daniel Vetter <daniel.vetter@ffwll.ch> # Oi! Patch stealer!
Date:   Thu Jan 17 22:23:36 2013 +0100

    drm/i915: Allow userspace to hint that the relocations were known

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Store a direct lookup from object handle to vma

The advent of full-ppgtt lead to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hastable on demand in a background task and
serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.

v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Fix retrieval of hangcheck stats

The default context is always supported (as it contains the global
hangcheck stats) and the contexts for hangcheck are not limited
to any ring.

This was dropped in 2013 because it was supposed to have been included
with Ben's full-ppgtt patch set. It never landed and the bug remains.

References: https://bugs.freedesktop.org/show_bug.cgi?id=65845
Link: http://patchwork.freedesktop.org/patch/msgid/1372175222-27622-1-git-send-email-mika.kuoppala@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616132849.29597-1-chris@chris-wilson.co.uk

drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty

For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.

Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counter-act the increase of such calls for
GPU only objects in the previous patch.

v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Mark CPU cache as dirty on every transition for CPU writes

Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simply question to answer.

The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
block the caller (e.g. execbuf, flips).

v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.

v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.

Reported-by: Dongwon Kim <dongwon.kim@intel.com>
Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make i915_vma_destroy() static

i915_vma_destroy() is now not used outside of i915_vma.c so we can
remove the export and make the function static.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616123508.12673-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

drm/i915: Actually attach the tv_format property to the SDVO connector

Attach the tv_format property to the SDVO connector instead of passing
a '0' in place of the pointer to the property. This got broken when
the SDVO connector properties were converted to atomic.

We can thank sparse for catching this:
drivers/gpu/drm/i915/intel_sdvo.c:2742:75: warning: Using plain integer as NULL pointer

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: 630d30a4ee27 ("drm/i915: Convert intel_sdvo connector properties to atomic.")
Link: http://patchwork.freedesktop.org/patch/msgid/20170615172308.10121-1-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Merge tag 'gvt-next-2017-06-08' of https://github.com/01org/gvt-linux into drm-intel-next-queued

gvt-next-2017-06-08

First gvt-next pull for 4.13:
- optimization for per-VM mmio save/restore (Changbin)
- optimization for mmio hash table (Changbin)
- scheduler optimization with event (Ping)
- vGPU reset refinement (Fred)
- other misc refactor and cleanups, etc.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608093547.bjgs436e3iokrzdm@zhen-hp.sh.intel.com

Revert "drm/i915/skl: New ddb allocation algorithm"

This reverts commit bb9d85f6e9de8fef5236c076530eab67a2f2431b.

New ddb allocation algorithm is a show stopper on my SKL system.

Besides not be able to get external DP 4k@60 (through USB type C),
It fully hang my screen when unplugging the USB type C.

Bugzilla: https://patchwork.freedesktop.org/patch/161571/
Fixes: bb9d85f6e9de ("drm/i915/skl: New ddb allocation algorithm")
Cc: Mahesh Kumar <mahesh1.kumar@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497376350-3400-1-git-send-email-rodrigo.vivi@intel.com

drm/i915/glk: Add cold boot sequence for GLK DSI

As per BSEPC, if device ready bit is '0' in enable IO sequence
then its a cold boot/reset scenario eg: S3/S4 resume. If cold boot
scenario detected in enable IO, then prepare port immediately.
In normal boot scenario, prepare port after glk_dsi_device_ready().
Without cold boot sequence enabled, features like S3/S4 doesn't work.

Signed-off-by: Madhav Chauhan <madhav.chauhan@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497340095-5877-2-git-send-email-madhav.chauhan@intel.com

drm/i915/glk: Split GLK DSI device ready functionality

This patch divides glk_dsi_device_ready() function into
two part. First part will program LP wake and MIPI DSI mode
to MIPI_CTRL reg using newly defined function glk_dsi_enable_io().
glk_dsi_enable_io() will be called from intel_dsi_pre_enable.
Second part will do remaining device ready activities using
the existing function glk_dsi_device_ready().

Signed-off-by: Madhav Chauhan <madhav.chauhan@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497340095-5877-1-git-send-email-madhav.chauhan@intel.com

drm/i915: Don't enable backlight at setup time.

Maarten and Ville noticed that we are enabling backlight via DP aux very
early in the modeset_init path via the intel_dp_aux_setup_backlight()
function, since commit e7156c833903 ("drm/i915: Add Backlight Control using
DPCD for eDP connectors (v9)"). Looks like all we need to do during
_setup_backlight() is read the current brightness state instead of
modifying it.

v2: Rewrote commit message.

Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Yetunde Adebisi <yetundex.adebisi@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Puthikorn Voravootivat <puthik@chromium.org>
Fixes: e7156c833903 ("drm/i915: Add Backlight Control using DPCD for eDP connectors (v9)")
Link: http://patchwork.freedesktop.org/patch/msgid/1497384239-2965-1-git-send-email-dhinakaran.pandiyan@intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915/cnl: make function cnl_ddi_dp_set_dpll_hw_state static

The function cnl_ddi_dp_set_dpll_hw_state does not need to be in global
scope, so make it static.

Cleans up sparse warning:
"symbol 'cnl_ddi_dp_set_dpll_hw_state' was not declared. Should it
be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170613134751.29196-1-colin.king@canonical.com
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Remove pipe A quirk remnants

With 830 the only thing needing pipe quirks, we can just drop the quirk
defines and replace the checks with IS_I830() checks.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-8-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Drop pipe A quirk for Thinkapd T60

The pipe A force quirk shouldn't needed except on 830. So let's nuke it
for the IBM Thinkpad T60 945 machines. This quirk pre-dates
KMS so it's usefulness is doubtful at best now.

The original bug report [1] describes the symptoms as "system hang on
closing T60 panel lid", and we already dropped a similar quirk for
another 945 machine in
commit 736a69ca8c99 ("drm/i915: Drop PIPE-A quirk for 945GSE HP Mini")
so I'm hopeful we can drop this one as well.

The quirk was added into xf86-video-intel in
commit 08903abe4dc0 ("Add pipe a force enable quirk for Lenovo T60")

[1] https://bugs.freedesktop.org/show_bug.cgi?id=16494

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-7-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Drop pipe A quirk for Toshiba Protege R205-S209

The pipe A force quirk shouldn't needed except on 830. So let's nuke it
for the Toshiba Protege R-205/S-209 945 machines. This quirk pre-dates
KMS so it's usefulness is doubtful at best now.

Unfortunately the original bug report [1] isn't very helpful since it
doesn't describe the symptoms. And the commit message in xf86-video-intel
commit ecdb5963ef68 ("Add pipe A force enable quirk for Toshiba Portege R205-S209")
is not much help either.

However, if we assume the problem was the typical "closing the lid
hangs the box" type of thing, we already nuked the quirk for another
945 machine in
commit 736a69ca8c99 ("drm/i915: Drop PIPE-A quirk for 945GSE HP Mini")
and so I hope we can drop this one as well.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=14944

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-6-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Add i830 "pipes power well"

830 more or less requires both pipes and DPLLs to remain on as long
as either pipe is needed. However, when neither pipe is actually needed,
we can save a bit of power by turning everything off. To do that we add
a new "power well" that turns both pipes and DPLLs on and off in the
right order. Seems to save ~50mW on my Fujitsu-Siemens Lifebook S6010.

This also avoids having to abuse the load detection to force pipe A on
at init time. That was never very robust, and it only worked for one
pipe, whereas 830 really needs both pipes enabled. As a bonus the 830
pipe quirk is now a bit more isolated from the rest of the mode setting
infrastructure, which should mean that it's much less likely someone
will accidentally break it in the future. The extra cost is of course
slight code duplication, but that seems like a worthwile tradeoff here.

v2; s/BIT/BIT_ULL/

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-5-ville.syrjala@linux.intel.com
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Use a loop for the "three times for luck" DPLL procedure

The magic "enable the DPLL three times" sequence feels like it
deserves a loop.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-4-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Plumb the correct acquire ctx into intel_crtc_disable_noatomic()

If intel_crtc_disable_noatomic() were to ever get called during resume
we'd end up deadlocking since resume has its own acqcuire_ctx but
intel_crtc_disable_noatomic() still tries to use the
mode_config.acquire_ctx. Pass down the correct acquire ctx from the top.

Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: e2c8b8701e2d ("drm/i915: Use atomic helpers for suspend, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-3-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Fix deadlock witha the pipe A quirk during resume

Pass down the correct acquire context to the pipe A quirk load detect
hack during display resume. Avoids deadlocking the entire thing.

Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: e2c8b8701e2d ("drm/i915: Use atomic helpers for suspend, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-2-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Split vma exec_link/evict_link

Currently the vma has one link member that is used for both holding its
place in the execbuf reservation list, and in any eviction list. This
dual property is quite tricky and error prone.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-3-chris@chris-wilson.co.uk

drm/i915: Use vma->exec_entry as our double-entry placeholder

This has the benefit of not requiring us to manipulate the
vma->exec_link list when tearing down the execbuffer, and is a
marginally cheaper test to detect the user error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-2-chris@chris-wilson.co.uk

drm/i915: Amalgamate execbuffer parameter structures

Combine the two slightly overlapping parameter structures we pass around
the execbuffer routines into one.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-1-chris@chris-wilson.co.uk

drm/i915/perf: add GLK support

Add OA support for Geminilake (pretty much identical to Broxton), and
also add the associated OA configurations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170613112309.4088-2-lionel.g.landwerlin@intel.com

drm/i915/perf: add KBL support

Add OA support for Kabylake (pretty much identical to Skylake), and
also add the associated OA configurations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: add KBL GT2/GT3 check macros

Add macros to detect GT2/GT3 skus so we can apply the proper OA
configuration later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915/perf: remove perf.hook_lock

In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.

dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.

The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915/perf: per-gen timebase for checking sample freq

An oa_exponent_to_ns() utility and per-gen timebase constants where
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.

Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.

v2:
Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915/perf: Add more OA configs for BDW, CHV, SKL + BXT

These are auto generated from an XML description of metric sets,
currently maintained in gputop, ref:

https://github.com/rib/gputop
> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py

$ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915/perf: Add OA unit support for Gen 8+

Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

v5: Drain submitted requests when enabling metric set to ensure no
    lite-restore erases the context image we just updated (Lionel)

v6: In addition to drain, switch to kernel context & update all
    context in place (Chris)

v7: Add missing mutex_unlock() if switching to kernel context fails
    (Matthew)

v8: Simplify OA period/flex-eu-counters programming by using the
    batchbuffer instead of modifying ctx-image (Lionel)

v9: Back to updating the context image (due to erroneous testing,
    batchbuffer programming the OA unit doesn't actually work)
    (Lionel)
    Pin context before updating context image (Chris)
    Drop MMIO programming now that we switch to a kernel context with
    right values in initial context image (Chris)

v10: Just pin_map the contexts we want to modify or let the
     configuration happen on first use (Chris)

v11: Update kernel context OA config through the batchbuffer rather
     than on the fly ctx-image update (Lionel)

v12: Rework OA context registers update again by swithing away from
     user contexts and reconfiguring the kernel context through the
     batchbuffer and updating all the other contexts' context image.
     Also take care to lock slice/subslice configuration when OA is
     on. (Lionel)

v13: Request rpcs updates on all engine when updating the OA config
     (Lionel)

v14: Drop any kind of rpcs management now that we monitor sseu
     configuration changes in a later patch (Lionel)
     Remove usleep after programming the NOA configs on Gen8+, this
     doesn't seem to be needed (Lionel)

v15: Respect coding style for block comments (Chris)

v16: Add missing i915_add_request() in case we fail to emit OA
     configuration (Matthew)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915/perf: Add 'render basic' Gen8+ OA unit configs

Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:

https://github.com/rib/gputop
> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py

$ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic

v2: add newlines to debug messages + fix comment (Matthew Auld)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915/perf: rework mux configurations queries

Gen8+ might have mux configurations per slices/subslices. Depending on
whether slices/subslices have been fused off, only part of the
configuration needs to be applied. This change reworks the mux
configurations query mechanism to allow more than one set of registers
to be programmed.

v2: s/n_mux_regs/n_mux_configs/ (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: expose _SUBSLICE_MASK GETPARM

Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices can be enabled. This information is
required, for example, to be able to analyse some OA counter reports
where the counter configuration depends on the HW sub slice
configuration.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: expose _SLICE_MASK GETPARM

Enables userspace to determine the maximum number of slices that can
be enabled on the device and also know what specific slices can be
enabled. This information is required, for example, to be able to
analyse some OA counter reports where the counter configuration
depends on the HW slice configuration.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: Reinstate reservation_object zapping for batch_pool objects

I removed the zapping of the reservation_object->fence array of shared
fences prematurely. We don't yet have the code to zap that array when
retiring the object, and so currently it remains possible to continually
grow the shared array trapping requests when reusing the batch_pool
object across many timelines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170518094638.5469-4-chris@chris-wilson.co.uk

drm/i915: Spin for struct_mutex inside shrinker

Having resolved whether or not we would deadlock upon a call to
mutex_lock(&dev->struct_mutex), we can then spin for the contended
struct_mutex if we are not the owner. We cannot afford to simply block
and wait for the mutex, as the owner may itself be waiting for the
allocator -- i.e. a cyclic deadlock. This should significantly improve
the chance of running the shrinker for other processes whilst the GPU is
busy.

A more balanced approach would be to optimistically spin whilst the
mutex owner was on the cpu and there was an opportunity to acquire the
mutex for ourselves quickly. However, that requires support from
kernel/locking/ and a new mutex_spin_trylock() primitive.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Only restrict noreclaim in the early shrink passes

In our first pass, we do not want to use reclaim at all as we want to
solely reap the i915 buffer caches (its purgeable pages). But we don't
mind it initiates IO or pulls via the FS (but it shouldn't anyway as we
say no to reclaim!). Just drop the GFP_IO constraint for simplicity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Remove __GFP_NORETRY from our buffer allocator

I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim of our dirty buffers and relies on
reclaim via kswapd. As a result, a single pass of direct reclaim is
unreliable when i915 occupies the majority of available memory, and the
only means of effectively waiting on kswapd to amke progress is by not
setting the __GFP_NORETRY flag and lopping. That leaves us with the
dilemma of invoking the oomkiller instead of propagating the allocation
failure back to userspace where it can be handled more gracefully (one
hopes). In the future we may have __GFP_MAYFAIL to allow repeats up until
we genuinely run out of memory and the oomkiller would have been invoked.
Until then, let the oomkiller wreck havoc.

v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
v3: Update comments that direct reclaim only appears to be ignoring our
dirty buffers!

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Encourage our shrinker more when our shmemfs allocations fails

Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") made the bold decision to try and
avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
failed after attempting to free enough buffer objects. In short, it
appears we were giving up too easily (even before we start wondering if
one pass of reclaim is as strong as we would like). Part of the problem
is that if we only shrink just enough pages for our expected allocation,
the likelihood of those pages becoming available to us is less than 100%
To counter-act that we ask for twice the number of pages to be made
available. Furthermore, we allow the shrinker to pull pages from the
active list in later passes.

v2: Be a little more cautious in paging out gfx buffers, and leave that
to a more balanced approach from shrink_slab(). Important when combined
with "drm/i915: Start writeback from the shrinker" as anything shrunk is
immediately swapped out and so should be more conservative.

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk

drm/i915/cfl: Basic DDI plumbing for Coffee Lake.

All here is pretty much like Kabylake.

Including CFL-U has to use same ddi translation table
as KBL-U for now.

v2: Include missed IS_COFFEELAKE on edp trans table. (DK)
    Handle CFL-U with same translation table as KBL-U. (DK and
    confirmed with HW engineers)

v3: Adding missed case for IS_CFL_ULT. (DK).

v4: Duh! Now with the real IS_CFL_ULT instead of KBL one. (DK)
    Also use IS_GEN9_BC when possible. (DK)

Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497045770-21302-1-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Enable wrpll computation for CNL

Enable wrpll computation for Cannonlake platform to support
pll's required for HDMI output. The patch contains the following features

- compute Cannonlake port clock programming
  dividers P, Q, and K.
- compute PLL parameters for Cannonlake. These parameters
  set the values on DPLL registers.
- find the register values to program wrpll for Cannonlake.
  The reference clock can be either 19.2MHz or 24MHz.

v2: rebase
v3: squash wrpll patches into one (Rodrigo)
v4: switch order of getting even dividers (Paulo)
    update divider register values for PDiv and KDiv (Paulo)
    update wrpll computation algorithm (Paulo)
v5: Remove ref clock division by 1000. (Rodrigo)
v6: Rodrigo rebasing on top of latest code.

Signed-off-by: Kahola, Mika <mika.kahola@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-18-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: LSPCON support is gen9+

There is no platform specific change needed for LSPCON
support on Cannonlake. So let's make it gen9+.

Cc: Shashank Sharma <shashank.sharma@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-17-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Enable fifo underrun for Cannonlake.

Also in a way that reuse bdw+ for all next platforms.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-16-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Fix Cannonlake scaler mode programing.

As Geminilake scalers Cannonlake also don't need and don't have
the "high quality" mode programming.

Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-15-git-send-email-rodrigo.vivi@intel.com

drm/i915: Use HAS_CSR instead of gen number on DMC load.

Since we have HAS_CSR tied to the platform definition
let's use this instead of checking per platform.

One less thing to worry when adding support to new platforms.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Animesh Manna<animesh.manna@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-14-git-send-email-rodrigo.vivi@intel.com

drm/i915/DMC/CNL: Load DMC on CNL

This patch loads the DMC on CNL.The firmware version
is 1.04.

v2: (Rodrigo) Remove MODULE_FIRMWARE.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-13-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Enable loadgen_select bit for vswing sequence

vswing programming sequence step 2 requires the Loadgen_select bit to
be set in PORT_TX_DW4 lane reigsters per table defined by Bit rate and
lane width. Implemented the change that was marked as FIXME in the
driver.

v2: (Rodrigo) checkpatch fixes.

Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-12-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Implement voltage swing sequence.

This is an important part of the DDI initalization as well as
for changing the voltage during DisplayPort link training.

This new sequence for Cannonlake is more like Broxton style
but still with different registers, different table and
different steps.

v2: Do not write to DW4_GRP to avoid overwrite individual loadgen.
    Fix PORT_CL_DW5 SUS Clock Config set.
v3: As previous platforms use only eDP table if low voltage was
    requested.
v4: fix Werror:maybe uninitialized (Paulo)
v5: Rebase on top of dw2_swing_sel changes
    on previous patches.
v6: Using flexible SCALING_MODE_SEL(x).

Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-11-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Add DDI Buffer translation tables for Cannonlake.

These tables are used on voltage wswing sequence initialization
on Cannonlake.

It is a complete new format now in use by the voltage swing team,
not following any other standard in use by any other platform.
Also the registers are different as well. So let's redefine
the translation table for Cannonlake.

The table is huge. So we minimized with the fields that are
different or might be different anytime soon. The common
values will be hardcoded on the voltage swing sequence.

v2: Merge the lower and the upper bits to match the spec table
    and make review easier. This was possible with the good
    idea for Manasi with a better way to handle it on the bit
    macro definition presented on previous patch.
    Credits-to: Manasi

Cc: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-10-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Add registers related to voltage swing sequences.

This are the registers and bits needed for the voltage swing
sequence on Cannonlake.

v2: Remove CL_DW5 that was wrongly defined.
v3: Use (1 << 1) instead of (1<<1) as Paulo suggested
    Change DW2 swing sel upper and lower macros to do the
    bit selection instead of definint a table that doesn't
    match the spec. It is based on a Manasi version of it.
    Credits-to: Manasi.
v4: Let SCALING_MODE_SEL flexible. (Manasi)

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-9-git-send-email-rodrigo.vivi@intel.com

drm/i915: Add MMIO helper for 6 ports with different offsets.

Also new registers can have different mmio offsets
per different lane per port.

v2: Use _PICK as PORT3 instead of creating a new
    macro with if per port.
v3: Use _PICK directly on MMIO_PORT6. While MMIO_PORT
    isn't flexible enough let's continue with MMIO_PORT6
    as we have MMIO_PORT3.

Cc: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-8-git-send-email-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915/cnl: Initialize PLLs

Although CNL follows PLL initialization more like Skylake
than Broxton we have a completely different initialization
sequence and registers used.

One big difference from SKL is that CDCLK PLL is now
exclusive (ADPLL) and for DDIs and MIPI we need to use
DFGPLLs 0, 1 or 2.

v2: Accept all Ander's suggestions and fixes:
    - Registers and bits names prefix
    - Group pll functions
    - bits masks fixes
    - remove read and modify on cfgcr1
    - fix cfgcr0 setup
v3: Set SSC_ENABLE for DP.
    Fix HDMI_MODE cfgcr0.
    Avoid touch cfgcr0 on DP.
    Add missed else on dpll_mgr definition so we use cnl one, not hsw.
v3: Centra freq should be always set to default and change bits
    definitions to (1 << 1) instead of (1<<1). (by Paulo)
v4: Rebased.

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kahola, Mika <mika.kahola@intel.com>
Reviewed-by: Ander Conselvan De Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-7-git-send-email-rodrigo.vivi@intel.com

drm/i915: Configure DPLL's for Cannonlake

DPLL's are defined in DPCLKA_CFGCR0 register (0x6C200). Let's use these
definitions when computing dpll's for ddi ports.

v2: (Rodrigo) Remove register that was defined in another patch with
fixed name and more bits.

Signed-off-by: Kahola, Mika <mika.kahola@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-6-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: DDI - PLL mapping

One of the steps for PLL (un)initialization is to (un)map
the correspondent DDI that is actually using that PLL.

So, let's do this step following the places already stablished
and used so far, although spec put this as part of PLL
initialization sequences.

v2: Use proper prefix on bits names as suggested by Ander.
v3: Add missed "~". Without that the logic was inverted
    so we were disabling interrupts.
    Credits-to: Clinton
    Credits-to: Art
v4: Spec is getting updated to do DDI -> PLL mapping
    and clock on in 2 separated reg writes. (Paulo)
    Also update bits definitions to use space
    (1 << 1) instead of (1<<1). (Paulo)

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Art Runyan <arthur.j.runyan@intel.com>
Cc: Clint Taylor <clinton.a.taylor@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kahola, Mika <mika.kahola@intel.com>
Cc: Ander Conselvan De Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Kahola, Mika <mika.kahola@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-5-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Allow dynamic cdclk changes on CNL

All the low level cdclk bits are present, so let's add the required
hooks to reconfigure cdclk on the fly.

Cannonlake also needs to adjust the minimal pixel rate
as gen9 platforms. Specially for the Azalia audio case.

v2: Rebase due to cnl_sanitize_cdclk()
v3: Rebased by Rodrigo on top of Ville's cdclk rework.
v4: Rebase moving cnl_calc_cdclk up to follow same order
    as previous platforms.
v2: Squash drm/i915/cnl: Adjust min pixel rate. to address
    the current limitation where CDCLK cannot be set to 168MHz
    if audio is used with 96MHz. (Imre)
v3: adjust some of the clock limits within
    bdw_adjust_min_pipe_pixel_rate. (Ville/DK/Imre).
    Fix commit message messed by squash.

Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Sanyog Kale <sanyog.r.kale@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-4-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Implement CNL display init/unit sequence

Implement the CNL display init/uninit sequence as outlined in Bspec.

Quite similar to SKL/BXT. The main complicaiton is probably the extra
procmon setup we must do based on the process/voltage information we
can read out from some register.

v2: s/skl_dbuf/gen9_dbuf/ to follow upstream
    bxt needed a cdclk sanitize step, so let's add it for cnl too
v3: s/CHICKEN_MISC_1/CHICKEN_MISC_2/ (Ander)
v4: Rebased by Rodrigo after Ville's cdclk rework
v5: Removed unecessary Aux IO forced enable/disable, Fix DW10 setup
    Fix procpon Mask. (Credits-to Paulo and Clint)
    Remove A0 workaround.
v6: Rebased on top of recent code (Rodrigo).
v7: Respect the order of sanitize_ after set_
    (Done by Rodrigo, Requested by Ville)
v8: Commit message updated to matvh v5 changes besides
    Remove unused DW8 and an extra blank line. (all noticed
    by Imre).
v9: Remove __attribute__((unused)) added on latest version
    of drm/i915/cnl: Implement .set_cdclk() for CNL.

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Clint Taylor <clinton.a.taylor@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-3-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Implement .set_cdclk() for CNL

Add support for changing the cdclk frequency on CNL. Again, quite
similar to BXT, but there are some annoying differences which means
trying to share more code might not be feasible:
* PLL ratio now lives in the PLL enable register
* pcode came from SKL, not from BXT

We support three cdclk frequencies: 168,336,528 Mhz. The first two
use the same PLL frequency, the last one uses a different one meaning
we once again may need to toggle the PLL off and on when changing
cdclk.

v2: Rebased by Rodrigo on top of Ville's cdclk rework.
v3: Respect order of set_ bellow get_ (Ville)
v4: Added __attribute__((unused)) to avoid broken compilation with Werror.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-2-git-send-email-rodrigo.vivi@intel.com

drm/i915/cnl: Implement .get_display_clock_speed() for CNL

Add support for reading out the cdclk frequency from the hardware on
CNL. Very similar to BXT, with a few new twists and turns:
* the PLL is now called CDCLK PLL, not DE PLL
* reference clock can be 24 MHz in addition to the 19.2 MHz BXT had
* the ratio now lives in the PLL enable register
* Only 1x and 2x CD2X dividers are supported

v2: Deal with PLL lock bit the same way as BXT/SKL do now
v3: DSSM refclk indicator is bit 31 not 24 (Ander)
v4: Rebased by Rodrigo after Ville's cdclk rework.
v5: Set cdclk to the ref clock as previous platforms. (Imre)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-1-git-send-email-rodrigo.vivi@intel.com

drm/i915: Pass atomic state to backlight enable/disable/set callbacks.

Pass crtc_state to the enable callback, and connector_state to all callbacks.
This will eliminate the need to guess for the correct pipe in these
callbacks.

The crtc state is required for pch_enable_backlight to obtain the correct
cpu_transcoder.

intel_dp_aux_backlight's setup function is called before hw readout, so
crtc_state and connector_state->best_encoder are NULL in the enable()
and set() callbacks.

This fixes the following series of warns from intel_get_pipe_from_connector:
[  219.968428] ------------[ cut here ]------------
[  219.968481] WARNING: CPU: 3 PID: 2457 at
drivers/gpu/drm/i915/intel_display.c:13881
intel_get_pipe_from_connector+0x62/0x90 [i915]
[  219.968483]
WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex))
[  219.968485] Modules linked in: nls_iso8859_1 snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec
snd_hda_core snd_hwdep snd_pcm intel_rapl x86_pkg_temp_thermal coretemp
kvm_intel snd_seq_midi snd_seq_midi_event kvm snd_rawmidi irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_seq
snd_seq_device serio_raw snd_timer aesni_intel aes_x86_64 crypto_simd
glue_helper cryptd lpc_ich snd mei_me shpchp soundcore mei rfkill_gpio
mac_hid intel_pmc_ipc parport_pc ppdev lp parport ip_tables x_tables
autofs4 hid_generic usbhid igb ahci i915 xhci_pci dca xhci_hcd ptp
sdhci_pci sdhci libahci pps_core i2c_hid hid video
[  219.968573] CPU: 3 PID: 2457 Comm: kworker/u8:3 Tainted: G        W
4.10.0-tip-201703010159+ #2
[  219.968575] Hardware name: Intel Corp. Broxton P/NOTEBOOK, BIOS
APLKRVPA.X64.0144.B10.1606270006 06/27/2016
[  219.968627] Workqueue: events_unbound intel_atomic_commit_work [i915]
[  219.968629] Call Trace:
[  219.968640]  dump_stack+0x63/0x87
[  219.968646]  __warn+0xd1/0xf0
[  219.968651]  warn_slowpath_fmt+0x4f/0x60
[  219.968657]  ? drm_printk+0x97/0xa0
[  219.968708]  intel_get_pipe_from_connector+0x62/0x90 [i915]
[  219.968756]  intel_panel_enable_backlight+0x19/0xf0 [i915]
[  219.968804]  intel_edp_backlight_on.part.22+0x33/0x40 [i915]
[  219.968852]  intel_edp_backlight_on+0x18/0x20 [i915]
[  219.968900]  intel_enable_ddi+0x94/0xc0 [i915]
[  219.968950]  intel_encoders_enable.isra.93+0x77/0x90 [i915]
[  219.969000]  haswell_crtc_enable+0x310/0x7f0 [i915]
[  219.969051]  intel_update_crtc+0x58/0x100 [i915]
[  219.969101]  skl_update_crtcs+0x218/0x240 [i915]
[  219.969153]  intel_atomic_commit_tail+0x350/0x1000 [i915]
[  219.969159]  ? vtime_account_idle+0xe/0x50
[  219.969164]  ? finish_task_switch+0x107/0x250
[  219.969214]  intel_atomic_commit_work+0x12/0x20 [i915]
[  219.969219]  process_one_work+0x153/0x3f0
[  219.969223]  worker_thread+0x12b/0x4b0
[  219.969227]  kthread+0x101/0x140
[  219.969230]  ? rescuer_thread+0x340/0x340
[  219.969233]  ? kthread_park+0x90/0x90
[  219.969237]  ? do_syscall_64+0x6e/0x180
[  219.969243]  ret_from_fork+0x2c/0x40
[  219.969246] ---[ end trace 0a8fa19387b9ad6d ]---

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100022
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170612102115.23665-4-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Pass connector state to intel_panel_set_backlight_acpi

Passing the state is also needed to convert the backlight functions
to use the correct state instead of looking it up.

This is done as a separate commit to allow easier bisecting.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100022
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170612102115.23665-3-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Pass crtc_state and connector state to backlight enable/disable functions

The backlight functions need to determine the pipe and the transcoder the
backlight will be enabled on, so pass crtc_state instead of trying to
dereference the state without holding locks.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100022
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170612102115.23665-2-maarten.lankhorst@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Fix GVT-g PVINFO version compatibility check

Current it's strictly checked if PVINFO version matches 1.0
for GVT-g i915 guest which doesn't help for compatibility at
all and forces GVT-g host can't extend PVINFO easily with version
bump for real compatibility check.

This fixes that to check minimal required PVINFO version instead.

v2:
- drop unneeded version macro
- use only major version for sanity check

v3:
- fix up PVInfo value with kernel type
- one indent fix

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609074805.5101-1-zhenyuw@linux.intel.com

drm/i915/cfl: Coffee Lake reuses Kabylake DMC.

both platforms. We haven't recieved any separated release
specifically for Coffee Lake so let's just re-use what
is already there for Kabylake.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497038550-30910-1-git-send-email-rodrigo.vivi@intel.com

drm/i915/huc: Load HuC on Coffee Lake

Coffee Lake reuses Kabylake's HUC firmware.

v2: Change Coffeelake to Coffee Lake

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496965704-23610-2-git-send-email-anusha.srivatsa@intel.com

drm/i915/guc: Load GuC on Coffee Lake

Coffee Lake reuses Kabylake's GuC.

v2: Change Coffeelake to Coffee Lake

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496965704-23610-1-git-send-email-anusha.srivatsa@intel.com

drm/i915/cfl: Add Coffee Lake PCI IDs for U Sku.

Add PCI Ids for U Skus of Coffeelake.

v2: Use intel_coffeelake_gt3_info, in accordance to-
Rodrigo's patch:

v3: rebased

v3: Remove unused INTEL_CFL_IDS(Rodrigo).

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496965267-21725-3-git-send-email-anusha.srivatsa@intel.com

drm/i915/cfl: Add Coffee Lake PCI IDs for H Sku.

Add PCI Ids for H Sku by following the BSpec.

v2: Remove unused INTEL_CFL_IDS.(Rodrigo).
v3: Add missing IDs(Rodrigo)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496965267-21725-2-git-send-email-anusha.srivatsa@intel.com

drm/i915/cfl: Add Coffee Lake PCI IDs for S Skus.

Add PCI Ids for S Sku following the BSpec.

v2: Remove the unused INTEL_CFL_IDS.(Rodrigo)
v3: Add missing IDs(Rodrigo)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496965267-21725-1-git-send-email-anusha.srivatsa@intel.com

drm/i915/glk: Remove the alpha_support flag

Geminilake is now included in CI, making it part of the pre-merge
criteria. The support should be in good enough shape, so let's remove
the alpha_support flag.

Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608114800.17201-1-ander.conselvan.de.oliveira@intel.com

drm/i915/cfl: Introduce Display workarounds for Coffee Lake.

The whole Display engine for Coffee Lake is pretty much
identical to the Kabylake. For this reason let's reuse
all display related production workardounds here even though
CFL is not explicit listed at Display workarounds page at Spec.

v2: moved intel_pm.c chunck to this patch in order to address
all display related w/a in a single place.

Cc: Arthur Runyan <arthur.j.runyan@intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496937000-8450-3-git-send-email-rodrigo.vivi@intel.com

drm/i915/cfl: Coffee Lake uses CNP PCH.

So let's force it on the virtual detection.

Also it is still the only silicon for now on this PCH,
so WARN otherwise.

v2: Rebased on top of Cannonlake and added the missed
debug message as pointed by DK.

Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496937000-8450-2-git-send-email-rodrigo.vivi@intel.com

drm/i915/cfl: Introduce Coffee Lake platform definition.

Coffee Lake is a Intel® Processor containing Intel® HD Graphics
following Kabylake.

It is Gen9 graphics based platform on top of CNP PCH.

Let's start by adding the platform definition based on previous
platforms but yet as preliminary_hw_support.

On following patches we will start adding PCI IDs and the
platform specific changes.

v2: Also add BS2 ring that is present on GT3. As on KBL, according
    spec: "GT3 also has additional media blocks with second instance
    of VEBox and VDBox each", i.e. BSD2 ring in our case. Noticed
    when reviewing PCI ID patches.

v3: CFL_PLATFORM instead for CFL_FEATURES because it contains
    Platform information and no new features when compared to
    BDW_FEATURES definition.

v4: Rebased on top of Cannonlake patches.

Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496937000-8450-1-git-send-email-rodrigo.vivi@intel.com

drm/i915: Remove the spin-request during execbuf await_request

Originally we would enable and disable the breadcrumb interrupt
immediately on demand. This was slow enough to have a large impact
(>30%) on tasks that hopped between engines. However, by using a shadow
to keep the irq alive for an extra interrupt (see commit 67b807a89230
("drm/i915: Delay disabling the user interrupt for breadcrumbs")) and
by recently reducing the cost in adding ourselves to the signal tree, we
no longer need to spin-request during await_request to avoid delays in
throughput tests. Without the earlier patches to stop the wakeup when
signaling if the irq was already active, we saw no improvement in
execbuf overhead (and corresponding contention in other clients) despite
the removal of the spinner in a simple test like glxgears. This means
there will be scenarios where now we spend longer enabling the interrupt
than we would have spent spinning, but these are not likely to have as
noticeable an impact as the high frequency test cases (where there
should not be any regression).

Ulterior motive: generalising the engine->sync_to to handle different
types of semaphores and non-semaphores.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608111405.16466-4-chris@chris-wilson.co.uk

drm/i915: Skip adding the request to the signal tree is complete

Enabling the interrupt for the signaler takes a finite amount of time (a
few microseconds) during which it is possible for the request to
complete. Check afterwards and skip adding the request to the signal
rbtree if it complete.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608111405.16466-3-chris@chris-wilson.co.uk

drm/i915: Report back whether the irq was armed when adding the waiter

The important condition that we need to check after enabling the
interrupt for signaling is whether the request completed in the process
(and so we missed that interrupt). A large cost in enabling the
signaling (rather than waiters) is in waking up the auxiliary signaling
thread, but we only need to do so to catch that missed interrupt. If we
know we didn't miss any interrupts (because we didn't arm the interrupt)
then we can skip waking the auxiliary thread.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608111405.16466-2-chris@chris-wilson.co.uk

drm/i915: Check signaled state after enabling signaling

Setting up the irq to signal the request completion takes a finite
amount of time, during which it is possible that the request already
completed. Check afterwards, just in case, so that we can respond
immediately.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608111405.16466-1-chris@chris-wilson.co.uk

drm/i915/guc: Clear enable_guc_loading in case of init failure

And prevent calling i915_ggtt_disable_guc twice (the first when GuC init
failed, and the second time during driver unload / intel_uc_fini_hw),
and hitting the GEM_BUG_ON.

v2: Clear enable_guc_loading unconditionally (Michal)
Make sure guc_free_load_err_log is still called (Daniele)
Don't shoot the messenger (Chris)

Fixes: 3950bf3dbff10 ("drm/i915/guc: Add onion teardown to the GuC
setup")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170605171251.9905-1-michel.thierry@intel.com

drm/i915: Move the unclaimed mmio detection into the powerwell for KMS

Replace the large comment about requiring the powerwell for
intel_uncore_arm_unclaimed_mmio_detection() by moving the arming of the
mmio error detection into the powerwell held for modesetting. Thereby
also accomplishing the goal of only arming the mmio detection after a
full modeset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504115508.13571-1-chris@chris-wilson.co.uk
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915/gvt: Refine virtual reset function

during the emulation of virtual reset:
1. only reset the engine related mmio ending with MMIO
offset Master_IRQ, not include display stuff.

2. fences are not required to set default
value as well to prevent screen flicking.

this will fix the issue of Guest screen hang while running
Force tdr in Linux guest.

v2:
- only reset the engine related mmio. (Zhenyu & Zhiyuan)
v3:
- IMR/Ring mode registers are not save/restored. (Changbin)
v4:
- redefine the MMIO reset offset for easy understanding. (Zhenyu)
- pvinfo can be reset. (Zhenyu)
v5:
- add more comments for mmio reset. (Zhenyu)

Cc: Changbin Du <changbin.du@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Lv zhiyuan <zhiyuan.lv@intel.com>
Cc: Zhang Yulei <yulei.zhang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Fix GDRST vreg state after reset

Emulating the GDRST read behavior correctly to ack the
guest reset request.

v2:
- split the original patch into two:
GDRST read handler and virtual gpu reset. (Zhenyu)
v3:
- emulate the GDRST read right after write. (Zhenyu)

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhang Yulei <yulei.zhang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Tuning the size of MMIO hash lookup table to 2048

On Skylake platform, The traced virtual mmio registers are up to 2039.
So tuning the hash table size to improve lookup performance.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Add helper for tuning MMIO hash table

We count all the tracked virtual MMIO registers, which can help us to
tune the MMIO hash table.

v2: Move num_tracked_mmio into gvt structure.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Make the MMIO attribute wrappers be inline

Function calls are expensive. I have see obvious overhead call to
these wrappers in perf data, especially from the cmd parser side.
So make these simple wrappers be inline to kill them all.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Make mmio_attribute as type u8 to save 1.5MB memory

Type u8 is big enough to contain all MMIO attribute flags. As the
total MMIO size is 2MB so we saved 1.5MB memory.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Cleanup struct intel_gvt_mmio_info

The size, length, addr_mask fields actually are not necessary. Every
tracked mmio has DWORD size, and addr_mask is a legacy field.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Optimize MMIO register handling for some large MMIO blocks

Some of traced MMIO registers are a large continuous section. These
stuffed the MMIO lookup hash table and so waste lots of memory and
get much lower lookup performance.

Here we picked out these sections by special handling. These sections
include:
  o Display pipe registers, total 768.
  o The PVINFO page, total 1024.
  o MCHBAR_MIRROR, total 65536.
  o CSR_MMIO, total 3072.

So we removed 70,400 items from the hash table, and speed up guest
boot time by ~500ms.

v2:
  o add a local function find_mmio_block().
  o fix comments.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: add gtt_invalidate API to flush the GTT TLB

add gtt_invalidate API to handle the GTT TLB flush instead of
hiding in write_pte64 function. This can avoid overkill when using
write_pte64

Suggested-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Add runtime_pm get/put to proctect MMIO accessing

In some cases, GVT-g is accessing MMIO without holding runtime_pm
and this patch can add the inline API for doing the runtime_pm get/put
to make sure when accessing HW MMIO the i915 HW is really powered on.

Suggested-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: remove redundant -Wall

This flag is already set in the top level Makefile of the kernel.

Also, by having set CONFIG_DRM_I915_GVT, thereby appending -Wall to
ccflags, you undo all the -Wno-* cflags previously set in the Make
variable KBUILD_CFLAGS.

For example:

cc foo.c -Wall -Wno-format -Wall

resets -Wformat.

Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Legacy HSW related MMIO handler clean up

remove all the legacy pre-BDW mmio handlers and the corresponding
usage/definition since pre-BDW platforms are not supported in GVT
environment.

v2:
- clean up all the left dirty code before BDW, e.g
all D_HSW usage and itself, D_IVB, D_PRE_BDW. (Zhenyu)
v3:
- change is based on gvt-staging. (Zhenyu)

Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Trigger scheduling after context complete

The time based scheduler poll context busy status at every
micro-second during vGPU switch, it will make GPU idle for a while
when the context is very small and completed before the next
micro-second arrival. Trigger scheduling immediately after context
complete will eliminate GPU idle and improve performance.

Create two vGPU with same type, run Heaven simultaneously:
Before this patch:
+---------+----------+----------+
|         |  vGPU1   |   vGPU2  |
+---------+----------+----------+
|  Heaven |  357     |    354   |
+-------------------------------+

After this patch:
+---------+----------+----------+
|         |  vGPU1   |   vGPU2  |
+---------+----------+----------+
|  Heaven |  397     |    398   |
+-------------------------------+

v2: Let need_reschedule protect by gvt-lock.

Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Support event based scheduling

This patch decouple the time slice calculation and scheduler, let
other event be able to trigger scheduling without impact the
calculation for QoS.

v2: add only one new enum definition.
v3: fix typo.

Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Delete gvt_dbg_cmd() in cmd_parser_exec()

Since cmd message have been recorded in trace, gvt_dbg_cmd isn't
necessary. This will reduce much of dmesg as gvt_dbg_cmd is repeated
on each workload.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: Change flood gvt dmesg into trace

Currently gvt dmesg is so heavy at drm.debug=0x2 that guest and
host almost couldn't run on xengt.

This patch transfer these repeated messages into trace, so dmesg
is light at drm.debug=0x2, and user could get the target message through
trace event and trace filter.

Suggested-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: clean up the unused last_ctx_submit_time of struct intel_vgpu

Clean up it as it is not used now.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: add RING_INSTDONE and SC_INSTDONE mmio handler in GVT-g

kernel hangcheck needs to check RING_INSTDONE and SC_INSTDONE registers'
state to know if hardware is still running. In GVT-g environment, we need
to emulate these registers changing for all the guests although they are
not render owner. Here we return the physical state for all the guests,
then if INSTDONE is changing guest can know hardware is still running
although its workload is pending.

Read INSTDONE isn't one correct way to know if guest trigger gfx reset,
especially with Linux guest, it will read ACTH first, then check INSTDONE
and SUBSLICE registers to check if hardware is still running, at last
trigger gfx reset when it finds all the registers is frozen. In Windows
guest, read INSTDONE usually happens when OS detect TDR.

With the difference between Windows and Linux guest, "disable_warn_untrack"
may let debug log run into wrong state(Linux guest trigger hangcheck
with no ACTHD changed, then check INSTDONE), but actually there is no TDR
happened.

The new policy is always WARN with untrack MMIO r/w. Bad effect is many
noisy untrack mmio warning logs exist when real TDR happen. Even so you can
control the log output or not by setting the debug mask bit.

v2: remove log in instdone_mmio_read

Suggested-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: implement per-vm mmio switching optimization

Commit ab9da627906a ("drm/i915: make context status notifier head be
per engine") gives us a chance to inspect every single request. Then
we can eliminate unnecessary mmio switching for same vGPU. We only
need mmio switching for different VMs (including host).

This patch introduced a new general API intel_gvt_switch_mmio() to
replace the old intel_gvt_load/restore_render_mmio(). This function
can be further optimized for vGPU to vGPU switching.

To support individual ring switch, we track the owner who occupy
each ring. When another VM or host request a ring we do the mmio
context switching. Otherwise no need to switch the ring.

This optimization is very useful if only one guest has plenty of
workloads and the host is mostly idle. The best case is no mmio
switching will happen.

v2:
o fix missing ring switch issue. (chuanxiao)
o support individual ring switch.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915/gvt: refactor function intel_vgpu_submit_execlist

The function intel_vgpu_submit_execlist could be more simpler. It
actually does:
  1) validate the submission. The first context must be valid,
     and all two must be privilege_access.
  2) submit valid contexts. The first one need emulate schedule_in.

We do not need a bitmap, valid desc copy valid_desc. Local variable
emulate_schedule_in also can be optimized out.

v2: dump desc content in err msg (Zhi Wang)

Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>