From 3d7a64b992eab27be75c6cf734be4e87fc88e4b5 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon, 4 Feb 2019 08:41:05 +0000
Subject: drm/i915: Allow normal clients to always preempt idle priority
 clients
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When first enabling preemption, we hesitated from making it a free-for-all
where every higher priority client would force a preempt-to-idle cycle
and take over from all lower priority clients. We hesitated because we
were uncertain just how well preemption would work in practice, whether
the preemption latency itself would detract from the latency gains for
higher priority tasks and whether it would work at all. Since
introducing preemption, we have been enabling it for more common tasks,
even giving normal clients a small preemptive boost when they first
start (to aide fairness and improve interactivity). Now lets take one
step further and give permission for all normal (priority:0) clients to
preempt any idle (priority:<0) task so that users running long compute
jobs do not overly impact other jobs (i.e. their desktop) and the system
remains responsive under such idle loads.

References: f6322eddaff7 ("drm/i915/preemption: Allow preemption between submission ports")
References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com>
Cc: "Stead, Alan" <alan.stead@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204084116.3013-1-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 34d0a148e664..1398eb81dee6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -592,7 +592,20 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 
 static inline bool __execlists_need_preempt(int prio, int last)
 {
-	return prio > max(0, last);
+	/*
+	 * Allow preemption of low -> normal -> high, but we do
+	 * not allow low priority tasks to preempt other low priority
+	 * tasks under the impression that latency for low priority
+	 * tasks does not matter (as much as background throughput),
+	 * so kiss.
+	 *
+	 * More naturally we would write
+	 *	prio >= max(0, last);
+	 * except that we wish to prevent triggering preemption at the same
+	 * priority level: the task that is running should remain running
+	 * to preserve FIFO ordering of dependencies.
+	 */
+	return prio > max(I915_PRIORITY_NORMAL - 1, last);
 }
 
 static inline void
-- 
cgit v1.2.3


From 1413b2bc0717036a5a653eef20cc3ae4cc66501a Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon, 4 Feb 2019 15:01:01 +0000
Subject: drm/i915: Trim NEWCLIENT boosting

Limit the NEWCLIENT boost to only give its small priority boost to fresh
clients only that have no dependencies.

The idea for using NEWCLIENT boosting, commit b16c765122f9 ("drm/i915:
Priority boost for new clients"), is that short-lived streams are often
interactive and require lower latency -- and that by executing those
ahead of the long running hogs, the short-lived clients do little to
interfere with the system throughput by virtue of their short-lived
nature. However, we were only considering the client's own timeline for
determining whether or not it was a fresh stream. This allowed for
compositors to wake up before their vblank and bump all of its client
streams. However, in testing with media-bench this results in chaining
all cooperating contexts together preventing us from being able to
reorder contexts to reduce bubbles (pipeline stalls), overall increasing
latency, and reducing system throughput. The exact opposite of our
intent. The compromise of applying the NEWCLIENT boost to strictly fresh
clients (that do not wait upon anything else) should maintain the
"real-time response under load" characteristics of FQ_CODEL, without
locking together the long chains of dependencies across the system.

References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204150101.30759-1-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_request.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9ed5baf157a3..9383a9fb4893 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -968,7 +968,7 @@ void i915_request_add(struct i915_request *request)
 		 * Allow interactive/synchronous clients to jump ahead of
 		 * the bulk clients. (FQ_CODEL)
 		 */
-		if (!prev || i915_request_completed(prev))
+		if (list_empty(&request->sched.signalers_list))
 			attr.priority |= I915_PRIORITY_NEWCLIENT;
 
 		engine->schedule(request, &attr);
-- 
cgit v1.2.3


From 87f1ef225242d40f99e0ddf0f22020dc2b73601b Mon Sep 17 00:00:00 2001
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date: Tue, 5 Feb 2019 09:50:28 +0000
Subject: drm/i915: Record the sseu configuration per-context & engine

We want to expose the ability to reconfigure the slices, subslice and
eu per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

v4: rename i915_gem_sseu into intel_sseu (Chris)
    use to_intel_context() (Chris)

v5: More to_intel_context() (Tvrtko)
    Switch intel_sseu from union to struct (Tvrtko)
    Move context default sseu in existing loop (Chris)

v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)

Tvrtko Ursulin:

v7:
 * Pass intel_sseu by pointer instead of value to make_rpcs.
 * Rebase for make_rpcs changes.

v8:
 * Rebase for RPCS edit on pin.

v9:
 * Rebase for context image setup changes.

v10:
 * Rename dev_priv to i915. (Chris Wilson)

v11:
 * Rebase.

v12:
 * Rebase for IS_GEN changes.

v13:
 * Rebase for RUNTIME_INFO.

v14:
 * Rebase for intel_context_init.

v15:
 * Rebase for drm-tip changes.

v16:
 * Moved struct intel_sseu definition to i915_gem_context.h.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-1-tvrtko.ursulin@linux.intel.com
---
 drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c |  3 +++
 drivers/gpu/drm/i915/i915_gem_context.h | 14 ++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 31 ++++++++++++++++---------------
 4 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 534e52e3a8da..45b837bc8f9e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3305,6 +3305,20 @@ mkwrite_device_info(struct drm_i915_private *dev_priv)
 	return (struct intel_device_info *)INTEL_INFO(dev_priv);
 }
 
+static inline struct intel_sseu
+intel_device_default_sseu(struct drm_i915_private *i915)
+{
+	const struct sseu_dev_info *sseu = &RUNTIME_INFO(i915)->sseu;
+	struct intel_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 /* modesetting */
 extern void intel_modeset_init_hw(struct drm_device *dev);
 extern int intel_modeset_init(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6faf1f6faab5..d3887c27c3ba 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -330,6 +330,9 @@ intel_context_init(struct intel_context *ce,
 
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
+
+	/* Use the whole device by default */
+	ce->sseu = intel_device_default_sseu(ctx->i915);
 }
 
 static struct i915_gem_context *
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 6ba40ff6b91f..919f6f0a0f7a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -31,6 +31,7 @@
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -53,6 +54,16 @@ struct intel_context_ops {
 	void (*destroy)(struct intel_context *ce);
 };
 
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+struct intel_sseu {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 min_eus_per_subslice;
+	u8 max_eus_per_subslice;
+};
+
 /**
  * struct i915_gem_context - client state
  *
@@ -173,6 +184,9 @@ struct i915_gem_context {
 		int pin_count;
 
 		const struct intel_context_ops *ops;
+
+		/** sseu: Control eu/slice partitioning */
+		struct intel_sseu sseu;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a9eb0211ce77..7682f1e71d55 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1266,7 +1266,8 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 	return i915_vma_pin(vma, 0, 0, flags);
 }
 
-static u32 make_rpcs(struct drm_i915_private *dev_priv);
+static u32
+make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
 
 static void
 __execlists_update_reg_state(struct intel_engine_cs *engine,
@@ -1281,7 +1282,8 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
 
 	/* RPCS */
 	if (engine->class == RENDER_CLASS)
-		regs[CTX_R_PWR_CLK_STATE + 1] = make_rpcs(engine->i915);
+		regs[CTX_R_PWR_CLK_STATE + 1] = make_rpcs(engine->i915,
+							  &ce->sseu);
 }
 
 static struct intel_context *
@@ -2432,18 +2434,19 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 }
 
 static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu)
 {
-	bool subslice_pg = RUNTIME_INFO(dev_priv)->sseu.has_subslice_pg;
-	u8 slices = hweight8(RUNTIME_INFO(dev_priv)->sseu.slice_mask);
-	u8 subslices = hweight8(RUNTIME_INFO(dev_priv)->sseu.subslice_mask[0]);
+	const struct sseu_dev_info *sseu = &RUNTIME_INFO(i915)->sseu;
+	bool subslice_pg = sseu->has_subslice_pg;
+	u8 slices = hweight8(ctx_sseu->slice_mask);
+	u8 subslices = hweight8(ctx_sseu->subslice_mask);
 	u32 rpcs = 0;
 
 	/*
 	 * No explicit RPCS request is needed to ensure full
 	 * slice/subslice/EU enablement prior to Gen9.
 	*/
-	if (INTEL_GEN(dev_priv) < 9)
+	if (INTEL_GEN(i915) < 9)
 		return 0;
 
 	/*
@@ -2471,7 +2474,7 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * subslices are enabled, or a count between one and four on the first
 	 * slice.
 	 */
-	if (IS_GEN(dev_priv, 11) && slices == 1 && subslices >= 4) {
+	if (IS_GEN(i915, 11) && slices == 1 && subslices >= 4) {
 		GEM_BUG_ON(subslices & 1);
 
 		subslice_pg = false;
@@ -2484,10 +2487,10 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (RUNTIME_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		u32 mask, val = slices;
 
-		if (INTEL_GEN(dev_priv) >= 11) {
+		if (INTEL_GEN(i915) >= 11) {
 			mask = GEN11_RPCS_S_CNT_MASK;
 			val <<= GEN11_RPCS_S_CNT_SHIFT;
 		} else {
@@ -2512,18 +2515,16 @@ make_rpcs(struct drm_i915_private *dev_priv)
 		rpcs |= GEN8_RPCS_ENABLE | GEN8_RPCS_SS_CNT_ENABLE | val;
 	}
 
-	if (RUNTIME_INFO(dev_priv)->sseu.has_eu_pg) {
+	if (sseu->has_eu_pg) {
 		u32 val;
 
-		val = RUNTIME_INFO(dev_priv)->sseu.eu_per_subslice <<
-		      GEN8_RPCS_EU_MIN_SHIFT;
+		val = ctx_sseu->min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
 		val &= GEN8_RPCS_EU_MIN_MASK;
 
 		rpcs |= val;
 
-		val = RUNTIME_INFO(dev_priv)->sseu.eu_per_subslice <<
-		      GEN8_RPCS_EU_MAX_SHIFT;
+		val = ctx_sseu->max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
 		val &= GEN8_RPCS_EU_MAX_MASK;
 
-- 
cgit v1.2.3


From ec431eae8fc51c1b4555ec5f566e081804207657 Mon Sep 17 00:00:00 2001
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date: Tue, 5 Feb 2019 09:50:29 +0000
Subject: drm/i915/perf: lock powergating configuration to default when active

If some of the contexts submitting workloads to the GPU have been
configured to shutdown slices/subslices, we might loose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer but some concerns where raised about the cost
of reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after context switch). This means for a window
of time during the recording, counters recorded by the OA unit might
be invalid. This requires userspace dealing with OA reports to discard
the invalid values.

Minimizing the reprogramming could be implemented by tracking of the
last programmed configuration somewhere in GGTT and use MI_PREDICATE
to discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the
workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active.

On most platforms there are no issues with this apart from a performance
penality for some media workloads that benefit from running on a partially
powergated GPU. We already prevent RC6 from affecting the programming so
it doesn't sound completely unreasonable to hold on powergating for the
same reason.

On Icelake however there would a functional problem if the slices not-
containing the VME block were left enabled with a running media workload
which explicitly disabled them. To avoid a GPU hang in this case, on
Icelake we lock the enablement to only slices which contain VME blocks.
Downside is that it means degraded GPU performance when OA is active but
there is no known alternative solution for this.

v2: Leave RPCS programming in intel_lrc.c (Lionel)

v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
    More to_intel_context() (Tvrtko)
    s/dev_priv/i915/ (Tvrtko)

Tvrtko Ursulin:

v4:
 * Rebase for make_rpcs changes.

v5:
 * Apply OA restriction from make_rpcs directly.

v6:
 * Rebase for context image setup changes.

v7:
 * Move stream assignment before metric enable.

v8-9:
 * Rebase.

v10:
 * Squashed with ICL support patch.

Bspec: 21140
Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v9
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-2-tvrtko.ursulin@linux.intel.com
---
 drivers/gpu/drm/i915/i915_perf.c | 13 ++++++++----
 drivers/gpu/drm/i915/intel_lrc.c | 46 ++++++++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_lrc.h |  2 ++
 3 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 727118301f91..9ebf99f3d8d3 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
+
+	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(dev_priv,
+			       &to_intel_context(ctx,
+						 dev_priv->engine[RCS])->sseu));
 }
 
 /*
@@ -2098,21 +2103,21 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	if (ret)
 		goto err_lock;
 
+	stream->ops = &i915_oa_stream_ops;
+	dev_priv->perf.oa.exclusive_stream = stream;
+
 	ret = dev_priv->perf.oa.ops.enable_metric_set(stream);
 	if (ret) {
 		DRM_DEBUG("Unable to enable metric set\n");
 		goto err_enable;
 	}
 
-	stream->ops = &i915_oa_stream_ops;
-
-	dev_priv->perf.oa.exclusive_stream = stream;
-
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	return 0;
 
 err_enable:
+	dev_priv->perf.oa.exclusive_stream = NULL;
 	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7682f1e71d55..d99c462e2c09 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1266,9 +1266,6 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 	return i915_vma_pin(vma, 0, 0, flags);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
-
 static void
 __execlists_update_reg_state(struct intel_engine_cs *engine,
 			     struct intel_context *ce)
@@ -1282,8 +1279,8 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
 
 	/* RPCS */
 	if (engine->class == RENDER_CLASS)
-		regs[CTX_R_PWR_CLK_STATE + 1] = make_rpcs(engine->i915,
-							  &ce->sseu);
+		regs[CTX_R_PWR_CLK_STATE + 1] = gen8_make_rpcs(engine->i915,
+							       &ce->sseu);
 }
 
 static struct intel_context *
@@ -2433,13 +2430,12 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu)
+u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *req_sseu)
 {
 	const struct sseu_dev_info *sseu = &RUNTIME_INFO(i915)->sseu;
 	bool subslice_pg = sseu->has_subslice_pg;
-	u8 slices = hweight8(ctx_sseu->slice_mask);
-	u8 subslices = hweight8(ctx_sseu->subslice_mask);
+	struct intel_sseu ctx_sseu;
+	u8 slices, subslices;
 	u32 rpcs = 0;
 
 	/*
@@ -2449,6 +2445,34 @@ make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu)
 	if (INTEL_GEN(i915) < 9)
 		return 0;
 
+	/*
+	 * If i915/perf is active, we want a stable powergating configuration
+	 * on the system.
+	 *
+	 * We could choose full enablement, but on ICL we know there are use
+	 * cases which disable slices for functional, apart for performance
+	 * reasons. So in this case we select a known stable subset.
+	 */
+	if (!i915->perf.oa.exclusive_stream) {
+		ctx_sseu = *req_sseu;
+	} else {
+		ctx_sseu = intel_device_default_sseu(i915);
+
+		if (IS_GEN(i915, 11)) {
+			/*
+			 * We only need subslice count so it doesn't matter
+			 * which ones we select - just turn off low bits in the
+			 * amount of half of all available subslices per slice.
+			 */
+			ctx_sseu.subslice_mask =
+				~(~0 << (hweight8(ctx_sseu.subslice_mask) / 2));
+			ctx_sseu.slice_mask = 0x1;
+		}
+	}
+
+	slices = hweight8(ctx_sseu.slice_mask);
+	subslices = hweight8(ctx_sseu.subslice_mask);
+
 	/*
 	 * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
 	 * wide and Icelake has up to eight subslices, specfial programming is
@@ -2518,13 +2542,13 @@ make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu)
 	if (sseu->has_eu_pg) {
 		u32 val;
 
-		val = ctx_sseu->min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
+		val = ctx_sseu.min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
 		val &= GEN8_RPCS_EU_MIN_MASK;
 
 		rpcs |= val;
 
-		val = ctx_sseu->max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
+		val = ctx_sseu.max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
 		val &= GEN8_RPCS_EU_MAX_MASK;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 3d86c27c6b32..f1aec8a6986f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -112,4 +112,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							const char *prefix),
 				   unsigned int max);
 
+u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
+
 #endif /* _INTEL_LRC_H_ */
-- 
cgit v1.2.3


From 7810858412a0ab8b8ebb97d301dd601808968c88 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Tue, 5 Feb 2019 09:50:30 +0000
Subject: drm/i915: Add timeline barrier support

Timeline barrier allows serialization between different timelines.

After calling i915_timeline_set_barrier with a request, all following
submissions on this timeline will be set up as depending on this request,
or barrier. Once the barrier has been completed it automatically gets
cleared and things continue as normal.

This facility will be used by the upcoming context SSEU code.

v2:
 * Assert barrier has been retired on timeline_fini. (Chris Wilson)
 * Fix mock_timeline.

v3:
 * Improved comment language. (Chris Wilson)

v4:
 * Maintain ordering with previous barriers set on the timeline.

v5:
 * Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-3-tvrtko.ursulin@linux.intel.com
---
 drivers/gpu/drm/i915/i915_request.c            | 17 +++++++++++++++++
 drivers/gpu/drm/i915/i915_timeline.c           | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_timeline.h           | 22 ++++++++++++++++++++++
 drivers/gpu/drm/i915/selftests/mock_timeline.c |  1 +
 4 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9383a9fb4893..6512630b59b8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -526,6 +526,19 @@ out:
 	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
 }
 
+static int add_barrier(struct i915_request *rq, struct i915_gem_active *active)
+{
+	struct i915_request *barrier =
+		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+
+	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
+}
+
+static int add_timeline_barrier(struct i915_request *rq)
+{
+	return add_barrier(rq, &rq->timeline->barrier);
+}
+
 /**
  * i915_request_alloc - allocate a request structure
  *
@@ -668,6 +681,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 */
 	rq->head = rq->ring->emit;
 
+	ret = add_timeline_barrier(rq);
+	if (ret)
+		goto err_unwind;
+
 	ret = engine->request_alloc(rq);
 	if (ret)
 		goto err_unwind;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 5ea3af393ffe..dcff3ae96683 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -163,6 +163,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
 
 	spin_lock_init(&timeline->lock);
 
+	init_request_active(&timeline->barrier, NULL);
 	init_request_active(&timeline->last_request, NULL);
 	INIT_LIST_HEAD(&timeline->requests);
 
@@ -235,6 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 {
 	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
+	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
 	hwsp_free(timeline);
@@ -309,6 +311,25 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 	__i915_vma_unpin(tl->hwsp_ggtt);
 }
 
+int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
+{
+	struct i915_request *old;
+	int err;
+
+	lockdep_assert_held(&rq->i915->drm.struct_mutex);
+
+	/* Must maintain ordering wrt existing barriers */
+	old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex);
+	if (old) {
+		err = i915_request_await_dma_fence(rq, &old->fence);
+		if (err)
+			return err;
+	}
+
+	i915_gem_active_set(&tl->barrier, rq);
+	return 0;
+}
+
 void __i915_timeline_free(struct kref *kref)
 {
 	struct i915_timeline *timeline =
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 8caeb66d1cd5..d167e04073c5 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -74,6 +74,16 @@ struct i915_timeline {
 	 */
 	struct i915_syncmap *sync;
 
+	/**
+	 * Barrier provides the ability to serialize ordering between different
+	 * timelines.
+	 *
+	 * Users can call i915_timeline_set_barrier which will make all
+	 * subsequent submissions to this timeline be executed only after the
+	 * barrier has been completed.
+	 */
+	struct i915_gem_active barrier;
+
 	struct list_head link;
 	const char *name;
 	struct drm_i915_private *i915;
@@ -155,4 +165,16 @@ void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
 void i915_timelines_fini(struct drm_i915_private *i915);
 
+/**
+ * i915_timeline_set_barrier - orders submission between different timelines
+ * @timeline: timeline to set the barrier on
+ * @rq: request after which new submissions can proceed
+ *
+ * Sets the passed in request as the serialization point for all subsequent
+ * submissions on @timeline. Subsequent requests will not be submitted to GPU
+ * until the barrier has been completed.
+ */
+int i915_timeline_set_barrier(struct i915_timeline *timeline,
+			      struct i915_request *rq);
+
 #endif
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index cf39ccd9fc05..e5659aaa856d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -15,6 +15,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 	spin_lock_init(&timeline->lock);
 
+	init_request_active(&timeline->barrier, NULL);
 	init_request_active(&timeline->last_request, NULL);
 	INIT_LIST_HEAD(&timeline->requests);
 
-- 
cgit v1.2.3


From e46c2e99f60043bfdb63f18fe775db1f688906d9 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Tue, 5 Feb 2019 09:50:31 +0000
Subject: drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We want to allow userspace to reconfigure the subslice configuration on a
per context basis.

This is required for the functional requirement of shutting down non-VME
enabled sub-slices on Gen11 parts.

To do so, we expose a context parameter to allow adjustment of the RPCS
register stored within the context image (and currently not accessible via
LRI).

If the context is adjusted before first use or whilst idle, the adjustment
is for "free"; otherwise if the context is active we queue a request to do
so (using the kernel context), following all other activity by that
context, which is also marked as barrier for all following submission
against the same context.

Since the overhead of device re-configuration during context switching can
be significant, especially in multi-context workloads, we limit this new
uAPI to only support the Gen11 VME use case. In this use case either the
device is fully enabled, and exactly one slice and half of the subslices
are enabled.

Example usage:

	struct drm_i915_gem_context_param_sseu sseu = { };
	struct drm_i915_gem_context_param arg = {
		.param = I915_CONTEXT_PARAM_SSEU,
		.ctx_id = gem_context_create(fd),
		.size = sizeof(sseu),
		.value = to_user_pointer(&sseu)
	};

	/* Query device defaults. */
	gem_context_get_param(fd, &arg);

	/* Set VME configuration on a 1x6x8 part. */
	sseu.slice_mask = 0x1;
	sseu.subslice_mask = 0xe0;
	gem_context_set_param(fd, &arg);

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu()
    (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context
    (Chris)

v7: Synchronize the requests following a powergating setting change using
    a global dependency (Chris)
    Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
    Disable RPCS configuration setting for non capable users
    (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
    s/dev_priv/i915/ (Tvrtko)
    Change uapi class/instance fields to u16 (Tvrtko)
    Bump mask fields to 64bits (Lionel)
    Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
    reconfiguring powergated slice/subslices (Chris)
    Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
 * Update for upstream changes.
 * Request submit needs a RPM reference.
 * Reject on !FULL_PPGTT for simplicity.
 * Pull out get/set param to helpers for readability and less indent.
 * Use i915_request_await_dma_fence in add_global_barrier to skip waits
   on the same timeline and avoid GEM_BUG_ON.
 * No need to explicitly assign a NULL pointer to engine in legacy mode.
 * No need to move gen8_make_rpcs up.
 * Factored out global barrier as prep patch.
 * Allow to only CAP_SYS_ADMIN if !Gen11.

v11:
 * Remove engine vfunc in favour of local helper. (Chris Wilson)
 * Stop retiring requests before updates since it is not needed
   (Chris Wilson)
 * Implement direct CPU update path for idle contexts. (Chris Wilson)
 * Left side dependency needs only be on the same context timeline.
   (Chris Wilson)
 * It is sufficient to order the timeline. (Chris Wilson)
 * Reject !RCS configuration attempts with -ENODEV for now.

v12:
 * Rebase for make_rpcs.

v13:
 * Centralize SSEU normalization to make_rpcs.
 * Type width checking (uAPI <-> implementation).
 * Gen11 restrictions uAPI checks.
 * Gen11 subslice count differences handling.
 Chris Wilson:
 * args->size handling fixes.
 * Update context image from GGTT.
 * Postpone context image update to pinning.
 * Use i915_gem_active_raw instead of last_request_on_engine.

v14:
 * Add activity tracker on intel_context to fix the lifetime issues
   and simplify the code. (Chris Wilson)

v15:
 * Fix context pin leak if no space in ring by simplifying the
   context pinning sequence.

v16:
 * Rebase for context get/set param locking changes.
 * Just -ENODEV on !Gen11. (Joonas)

v17:
 * Fix one Gen11 subslice enablement rule.
 * Handle error from i915_sw_fence_await_sw_fence_gfp. (Chris Wilson)

v18:
 * Update commit message. (Joonas)
 * Restrict uAPI to VME use case. (Joonas)

v19:
 * Rebase.

v20:
 * Rebase for ce->active_tracker.

v21:
 * Rebase for IS_GEN changes.

v22:
 * Reserve uAPI for flags straight away. (Chris Wilson)

v23:
 * Rebase for RUNTIME_INFO.

v24:
 * Added some headline docs for the uapi usage. (Joonas/Chris)

v25:
 * Renamed class/instance to engine_class/engine_instance to avoid clash
   with C++ keyword. (Tony Ye)

v26:
 * Rebased for runtime pm api changes.

v27:
 * Rebased for intel_context_init.
 * Wrap commit msg to 75.

v28:
 (Chris Wilson)
 * Use i915_gem_ggtt.
 * Use i915_request_await_dma_fence to show a better example.

v29:
 * i915_timeline_set_barrier can now fail. (Chris Wilson)

v30:
 * Capture some acks.

v31:
 * Drop the WARN_ON from use controllable paths. (Chris Wilson)
 * Use overflows_type for all checks.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107634
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Stéphane Marchesin <marcheu@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-4-tvrtko.ursulin@linux.intel.com
---
 drivers/gpu/drm/i915/i915_gem_context.c | 340 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h |   6 +
 drivers/gpu/drm/i915/intel_lrc.c        |   4 +-
 include/uapi/drm/i915_drm.h             |  64 ++++++
 4 files changed, 411 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d3887c27c3ba..2d3e1ce9cc76 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -89,6 +89,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_trace.h"
+#include "intel_lrc_reg.h"
 #include "intel_workarounds.h"
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
@@ -321,6 +322,15 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
+static void intel_context_retire(struct i915_gem_active *active,
+				 struct i915_request *rq)
+{
+	struct intel_context *ce =
+		container_of(active, typeof(*ce), active_tracker);
+
+	intel_context_unpin(ce);
+}
+
 void
 intel_context_init(struct intel_context *ce,
 		   struct i915_gem_context *ctx,
@@ -333,6 +343,8 @@ intel_context_init(struct intel_context *ce,
 
 	/* Use the whole device by default */
 	ce->sseu = intel_device_default_sseu(ctx->i915);
+
+	init_request_active(&ce->active_tracker, intel_context_retire);
 }
 
 static struct i915_gem_context *
@@ -850,6 +862,56 @@ out:
 	return 0;
 }
 
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+	int ret;
+
+	if (args->size == 0)
+		goto out;
+	else if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.flags || user_sseu.rsvd)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.engine_class,
+					  user_sseu.engine_instance);
+	if (!engine)
+		return -EINVAL;
+
+	/* Only use for mutex here is to serialize get_param and set_param. */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ce = to_intel_context(ctx, engine);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+out:
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -862,15 +924,17 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	if (!ctx)
 		return -ENOENT;
 
-	args->size = 0;
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 		ret = -EINVAL;
 		break;
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
+		args->size = 0;
 		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
 		break;
 	case I915_CONTEXT_PARAM_GTT_SIZE:
+		args->size = 0;
+
 		if (ctx->ppgtt)
 			args->value = ctx->ppgtt->vm.total;
 		else if (to_i915(dev)->mm.aliasing_ppgtt)
@@ -879,14 +943,20 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 			args->value = to_i915(dev)->ggtt.vm.total;
 		break;
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		args->size = 0;
 		args->value = i915_gem_context_no_error_capture(ctx);
 		break;
 	case I915_CONTEXT_PARAM_BANNABLE:
+		args->size = 0;
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
 	case I915_CONTEXT_PARAM_PRIORITY:
+		args->size = 0;
 		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -896,6 +966,270 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int gen8_emit_rpcs_config(struct i915_request *rq,
+				 struct intel_context *ce,
+				 struct intel_sseu sseu)
+{
+	u64 offset;
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	offset = i915_ggtt_offset(ce->state) +
+		 LRC_STATE_PN * PAGE_SIZE +
+		 (CTX_R_PWR_CLK_STATE + 1) * 4;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = gen8_make_rpcs(rq->i915, &sseu);
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int
+gen8_modify_rpcs_gpu(struct intel_context *ce,
+		     struct intel_engine_cs *engine,
+		     struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = engine->i915;
+	struct i915_request *rq, *prev;
+	intel_wakeref_t wakeref;
+	int ret;
+
+	GEM_BUG_ON(!ce->pin_count);
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	/* Submitting requests etc needs the hw awake. */
+	wakeref = intel_runtime_pm_get(i915);
+
+	rq = i915_request_alloc(engine, i915->kernel_context);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		goto out_put;
+	}
+
+	/* Queue this switch after all other activity by this context. */
+	prev = i915_gem_active_raw(&ce->ring->timeline->last_request,
+				   &i915->drm.struct_mutex);
+	if (prev && !i915_request_completed(prev)) {
+		ret = i915_request_await_dma_fence(rq, &prev->fence);
+		if (ret < 0)
+			goto out_add;
+	}
+
+	/* Order all following requests to be after. */
+	ret = i915_timeline_set_barrier(ce->ring->timeline, rq);
+	if (ret)
+		goto out_add;
+
+	ret = gen8_emit_rpcs_config(rq, ce, sseu);
+	if (ret)
+		goto out_add;
+
+	/*
+	 * Guarantee context image and the timeline remains pinned until the
+	 * modifying request is retired by setting the ce activity tracker.
+	 *
+	 * But we only need to take one pin on the account of it. Or in other
+	 * words transfer the pinned ce object to tracked active request.
+	 */
+	if (!i915_gem_active_isset(&ce->active_tracker))
+		__intel_context_pin(ce);
+	i915_gem_active_set(&ce->active_tracker, rq);
+
+out_add:
+	i915_request_add(rq);
+out_put:
+	intel_runtime_pm_put(i915, wakeref);
+
+	return ret;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  struct intel_sseu sseu)
+{
+	struct intel_context *ce = to_intel_context(ctx, engine);
+	int ret;
+
+	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
+	GEM_BUG_ON(engine->id != RCS);
+
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	/* Nothing to do if unmodified. */
+	if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
+		goto out;
+
+	/*
+	 * If context is not idle we have to submit an ordered request to modify
+	 * its context image via the kernel context. Pristine and idle contexts
+	 * will be configured on pinning.
+	 */
+	if (ce->pin_count)
+		ret = gen8_modify_rpcs_gpu(ce, engine, sseu);
+
+	if (!ret)
+		ce->sseu = sseu;
+
+out:
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	return ret;
+}
+
+static int
+user_to_context_sseu(struct drm_i915_private *i915,
+		     const struct drm_i915_gem_context_param_sseu *user,
+		     struct intel_sseu *context)
+{
+	const struct sseu_dev_info *device = &RUNTIME_INFO(i915)->sseu;
+
+	/* No zeros in any field. */
+	if (!user->slice_mask || !user->subslice_mask ||
+	    !user->min_eus_per_subslice || !user->max_eus_per_subslice)
+		return -EINVAL;
+
+	/* Max > min. */
+	if (user->max_eus_per_subslice < user->min_eus_per_subslice)
+		return -EINVAL;
+
+	/*
+	 * Some future proofing on the types since the uAPI is wider than the
+	 * current internal implementation.
+	 */
+	if (overflows_type(user->slice_mask, context->slice_mask) ||
+	    overflows_type(user->subslice_mask, context->subslice_mask) ||
+	    overflows_type(user->min_eus_per_subslice,
+			   context->min_eus_per_subslice) ||
+	    overflows_type(user->max_eus_per_subslice,
+			   context->max_eus_per_subslice))
+		return -EINVAL;
+
+	/* Check validity against hardware. */
+	if (user->slice_mask & ~device->slice_mask)
+		return -EINVAL;
+
+	if (user->subslice_mask & ~device->subslice_mask[0])
+		return -EINVAL;
+
+	if (user->max_eus_per_subslice > device->max_eus_per_subslice)
+		return -EINVAL;
+
+	context->slice_mask = user->slice_mask;
+	context->subslice_mask = user->subslice_mask;
+	context->min_eus_per_subslice = user->min_eus_per_subslice;
+	context->max_eus_per_subslice = user->max_eus_per_subslice;
+
+	/* Part specific restrictions. */
+	if (IS_GEN(i915, 11)) {
+		unsigned int hw_s = hweight8(device->slice_mask);
+		unsigned int hw_ss_per_s = hweight8(device->subslice_mask[0]);
+		unsigned int req_s = hweight8(context->slice_mask);
+		unsigned int req_ss = hweight8(context->subslice_mask);
+
+		/*
+		 * Only full subslice enablement is possible if more than one
+		 * slice is turned on.
+		 */
+		if (req_s > 1 && req_ss != hw_ss_per_s)
+			return -EINVAL;
+
+		/*
+		 * If more than four (SScount bitfield limit) subslices are
+		 * requested then the number has to be even.
+		 */
+		if (req_ss > 4 && (req_ss & 1))
+			return -EINVAL;
+
+		/*
+		 * If only one slice is enabled and subslice count is below the
+		 * device full enablement, it must be at most half of the all
+		 * available subslices.
+		 */
+		if (req_s == 1 && req_ss < hw_ss_per_s &&
+		    req_ss > (hw_ss_per_s / 2))
+			return -EINVAL;
+
+		/* ABI restriction - VME use case only. */
+
+		/* All slices or one slice only. */
+		if (req_s != 1 && req_s != hw_s)
+			return -EINVAL;
+
+		/*
+		 * Half subslices or full enablement only when one slice is
+		 * enabled.
+		 */
+		if (req_s == 1 &&
+		    (req_ss != hw_ss_per_s && req_ss != (hw_ss_per_s / 2)))
+			return -EINVAL;
+
+		/* No EU configuration changes. */
+		if ((user->min_eus_per_subslice !=
+		     device->max_eus_per_subslice) ||
+		    (user->max_eus_per_subslice !=
+		     device->max_eus_per_subslice))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int set_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_sseu sseu;
+	int ret;
+
+	if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (!IS_GEN(i915, 11))
+		return -ENODEV;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.flags || user_sseu.rsvd)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(i915,
+					  user_sseu.engine_class,
+					  user_sseu.engine_instance);
+	if (!engine)
+		return -EINVAL;
+
+	/* Only render engine supports RPCS configuration. */
+	if (engine->class != RENDER_CLASS)
+		return -ENODEV;
+
+	ret = user_to_context_sseu(i915, &user_sseu, &sseu);
+	if (ret)
+		return ret;
+
+	ret = i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
+	if (ret)
+		return ret;
+
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
 int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -958,7 +1292,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 					I915_USER_PRIORITY(priority);
 		}
 		break;
-
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = set_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 919f6f0a0f7a..92ad5272e57f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -183,6 +183,12 @@ struct i915_gem_context {
 		u64 lrc_desc;
 		int pin_count;
 
+		/**
+		 * active_tracker: Active tracker for the external rq activity
+		 * on this intel_context object.
+		 */
+		struct i915_gem_active active_tracker;
+
 		const struct intel_context_ops *ops;
 
 		/** sseu: Control eu/slice partitioning */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d99c462e2c09..5e98fd79bd9d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2498,7 +2498,9 @@ u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *req_sseu)
 	 * subslices are enabled, or a count between one and four on the first
 	 * slice.
 	 */
-	if (IS_GEN(i915, 11) && slices == 1 && subslices >= 4) {
+	if (IS_GEN(i915, 11) &&
+	    slices == 1 &&
+	    subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) {
 		GEM_BUG_ON(subslices & 1);
 
 		subslice_pg = false;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 298b2e197744..397810fa2d33 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1486,9 +1486,73 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+/**
+ * Context SSEU programming
+ *
+ * It may be necessary for either functional or performance reason to configure
+ * a context to run with a reduced number of SSEU (where SSEU stands for Slice/
+ * Sub-slice/EU).
+ *
+ * This is done by configuring SSEU configuration using the below
+ * @struct drm_i915_gem_context_param_sseu for every supported engine which
+ * userspace intends to use.
+ *
+ * Not all GPUs or engines support this functionality in which case an error
+ * code -ENODEV will be returned.
+ *
+ * Also, flexibility of possible SSEU configuration permutations varies between
+ * GPU generations and software imposed limitations. Requesting such a
+ * combination will return an error code of -EINVAL.
+ *
+ * NOTE: When perf/OA is active the context's SSEU configuration is ignored in
+ * favour of a single global setting.
+ */
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u16 engine_class;
+	__u16 engine_instance;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 flags;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u64 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u64 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be inferior or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u16 min_eus_per_subslice;
+	__u16 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
cgit v1.2.3


From c06ee6ff2cbcf7f23c6dcfe97e2be415eb3a43e4 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Tue, 5 Feb 2019 09:50:32 +0000
Subject: drm/i915/selftests: Context SSEU reconfiguration tests

Exercise the context image reconfiguration logic for idle and busy
contexts, with the resets thrown into the mix as well.

Free from the uAPI restrictions this test runs on all Gen9+ platforms
with slice power gating.

v2:
 * Rename some helpers for clarity.
 * Include subtest names in error logs.
 * Remove unnecessary function export.

v3:
 * Rebase for RUNTIME_INFO.

v4:
 * Fix incomplete unexport from v2. (Chris Wilson)

v5:
 * Rebased for runtime pm api changes.

v6:
 * Rebased for i915_reset.c.

v7:
 * Tidy checkpatch warnings.
 * Consolidate error checking and logging a bit.
 * Skip idle test phase if something failed before it.

v8:
 (Chris Wilson)
 * Fix i915_request_wait error handling.
 * No need to PIN_HIGH the VMA.
 * Remove pointless GEM_BUG_ON before pointer dereference.

v9:
 * Avoid rq leak if rpcs query fails. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> # v6
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-5-tvrtko.ursulin@linux.intel.com
---
 drivers/gpu/drm/i915/i915_gem_context.c           |  31 +-
 drivers/gpu/drm/i915/selftests/i915_gem_context.c | 467 ++++++++++++++++++++++
 2 files changed, 488 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2d3e1ce9cc76..93ab287f44b6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1052,23 +1052,19 @@ out_put:
 }
 
 static int
-i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
-				  struct intel_engine_cs *engine,
-				  struct intel_sseu sseu)
+__i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				    struct intel_engine_cs *engine,
+				    struct intel_sseu sseu)
 {
 	struct intel_context *ce = to_intel_context(ctx, engine);
-	int ret;
+	int ret = 0;
 
 	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
 	GEM_BUG_ON(engine->id != RCS);
 
-	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
-	if (ret)
-		return ret;
-
 	/* Nothing to do if unmodified. */
 	if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
-		goto out;
+		return 0;
 
 	/*
 	 * If context is not idle we have to submit an ordered request to modify
@@ -1081,7 +1077,22 @@ i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
 	if (!ret)
 		ce->sseu = sseu;
 
-out:
+	return ret;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  struct intel_sseu sseu)
+{
+	int ret;
+
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ret = __i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
+
 	mutex_unlock(&ctx->i915->drm.struct_mutex);
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index e2c1f0bc2abe..d00d0bb07784 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -24,10 +24,13 @@
 
 #include <linux/prime_numbers.h>
 
+#include "../i915_reset.h"
 #include "../i915_selftest.h"
 #include "i915_random.h"
 #include "igt_flush_test.h"
 #include "igt_live_test.h"
+#include "igt_reset.h"
+#include "igt_spinner.h"
 
 #include "mock_drm.h"
 #include "mock_gem_device.h"
@@ -576,6 +579,469 @@ out_unlock:
 	return err;
 }
 
+static struct i915_vma *rpcs_query_batch(struct i915_vma *vma)
+{
+	struct drm_i915_gem_object *obj;
+	u32 *cmd;
+	int err;
+
+	if (INTEL_GEN(vma->vm->i915) < 8)
+		return ERR_PTR(-EINVAL);
+
+	obj = i915_gem_object_create_internal(vma->vm->i915, PAGE_SIZE);
+	if (IS_ERR(obj))
+		return ERR_CAST(obj);
+
+	cmd = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(cmd)) {
+		err = PTR_ERR(cmd);
+		goto err;
+	}
+
+	*cmd++ = MI_STORE_REGISTER_MEM_GEN8;
+	*cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+	*cmd++ = lower_32_bits(vma->node.start);
+	*cmd++ = upper_32_bits(vma->node.start);
+	*cmd = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_unpin_map(obj);
+
+	err = i915_gem_object_set_to_gtt_domain(obj, false);
+	if (err)
+		goto err;
+
+	vma = i915_vma_instance(obj, vma->vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER);
+	if (err)
+		goto err;
+
+	return vma;
+
+err:
+	i915_gem_object_put(obj);
+	return ERR_PTR(err);
+}
+
+static int
+emit_rpcs_query(struct drm_i915_gem_object *obj,
+		struct i915_gem_context *ctx,
+		struct intel_engine_cs *engine,
+		struct i915_request **rq_out)
+{
+	struct i915_request *rq;
+	struct i915_vma *batch;
+	struct i915_vma *vma;
+	int err;
+
+	GEM_BUG_ON(!intel_engine_can_store_dword(engine));
+
+	vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	err = i915_gem_object_set_to_gtt_domain(obj, false);
+	if (err)
+		return err;
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER);
+	if (err)
+		return err;
+
+	batch = rpcs_query_batch(vma);
+	if (IS_ERR(batch)) {
+		err = PTR_ERR(batch);
+		goto err_vma;
+	}
+
+	rq = i915_request_alloc(engine, ctx);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto err_batch;
+	}
+
+	err = engine->emit_bb_start(rq, batch->node.start, batch->node.size, 0);
+	if (err)
+		goto err_request;
+
+	err = i915_vma_move_to_active(batch, rq, 0);
+	if (err)
+		goto skip_request;
+
+	err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
+	if (err)
+		goto skip_request;
+
+	i915_gem_object_set_active_reference(batch->obj);
+	i915_vma_unpin(batch);
+	i915_vma_close(batch);
+
+	i915_vma_unpin(vma);
+
+	*rq_out = i915_request_get(rq);
+
+	i915_request_add(rq);
+
+	return 0;
+
+skip_request:
+	i915_request_skip(rq, err);
+err_request:
+	i915_request_add(rq);
+err_batch:
+	i915_vma_unpin(batch);
+err_vma:
+	i915_vma_unpin(vma);
+
+	return err;
+}
+
+#define TEST_IDLE	BIT(0)
+#define TEST_BUSY	BIT(1)
+#define TEST_RESET	BIT(2)
+
+static int
+__sseu_prepare(struct drm_i915_private *i915,
+	       const char *name,
+	       unsigned int flags,
+	       struct i915_gem_context *ctx,
+	       struct intel_engine_cs *engine,
+	       struct igt_spinner **spin_out)
+{
+	int ret = 0;
+
+	if (flags & (TEST_BUSY | TEST_RESET)) {
+		struct igt_spinner *spin;
+		struct i915_request *rq;
+
+		spin = kzalloc(sizeof(*spin), GFP_KERNEL);
+		if (!spin) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		ret = igt_spinner_init(spin, i915);
+		if (ret)
+			return ret;
+
+		rq = igt_spinner_create_request(spin, ctx, engine, MI_NOOP);
+		if (IS_ERR(rq)) {
+			ret = PTR_ERR(rq);
+			igt_spinner_fini(spin);
+			kfree(spin);
+			goto out;
+		}
+
+		i915_request_add(rq);
+
+		if (!igt_wait_for_spinner(spin, rq)) {
+			pr_err("%s: Spinner failed to start!\n", name);
+			igt_spinner_end(spin);
+			igt_spinner_fini(spin);
+			kfree(spin);
+			ret = -ETIMEDOUT;
+			goto out;
+		}
+
+		*spin_out = spin;
+	}
+
+out:
+	return ret;
+}
+
+static int
+__read_slice_count(struct drm_i915_private *i915,
+		   struct i915_gem_context *ctx,
+		   struct intel_engine_cs *engine,
+		   struct drm_i915_gem_object *obj,
+		   struct igt_spinner *spin,
+		   u32 *rpcs)
+{
+	struct i915_request *rq = NULL;
+	u32 s_mask, s_shift;
+	unsigned int cnt;
+	u32 *buf, val;
+	long ret;
+
+	ret = emit_rpcs_query(obj, ctx, engine, &rq);
+	if (ret)
+		return ret;
+
+	if (spin)
+		igt_spinner_end(spin);
+
+	ret = i915_request_wait(rq, I915_WAIT_LOCKED, MAX_SCHEDULE_TIMEOUT);
+	i915_request_put(rq);
+	if (ret < 0)
+		return ret;
+
+	buf = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		return ret;
+	}
+
+	if (INTEL_GEN(i915) >= 11) {
+		s_mask = GEN11_RPCS_S_CNT_MASK;
+		s_shift = GEN11_RPCS_S_CNT_SHIFT;
+	} else {
+		s_mask = GEN8_RPCS_S_CNT_MASK;
+		s_shift = GEN8_RPCS_S_CNT_SHIFT;
+	}
+
+	val = *buf;
+	cnt = (val & s_mask) >> s_shift;
+	*rpcs = val;
+
+	i915_gem_object_unpin_map(obj);
+
+	return cnt;
+}
+
+static int
+__check_rpcs(const char *name, u32 rpcs, int slices, unsigned int expected,
+	     const char *prefix, const char *suffix)
+{
+	if (slices == expected)
+		return 0;
+
+	if (slices < 0) {
+		pr_err("%s: %s read slice count failed with %d%s\n",
+		       name, prefix, slices, suffix);
+		return slices;
+	}
+
+	pr_err("%s: %s slice count %d is not %u%s\n",
+	       name, prefix, slices, expected, suffix);
+
+	pr_info("RPCS=0x%x; %u%sx%u%s\n",
+		rpcs, slices,
+		(rpcs & GEN8_RPCS_S_CNT_ENABLE) ? "*" : "",
+		(rpcs & GEN8_RPCS_SS_CNT_MASK) >> GEN8_RPCS_SS_CNT_SHIFT,
+		(rpcs & GEN8_RPCS_SS_CNT_ENABLE) ? "*" : "");
+
+	return -EINVAL;
+}
+
+static int
+__sseu_finish(struct drm_i915_private *i915,
+	      const char *name,
+	      unsigned int flags,
+	      struct i915_gem_context *ctx,
+	      struct i915_gem_context *kctx,
+	      struct intel_engine_cs *engine,
+	      struct drm_i915_gem_object *obj,
+	      unsigned int expected,
+	      struct igt_spinner *spin)
+{
+	unsigned int slices =
+		hweight32(intel_device_default_sseu(i915).slice_mask);
+	u32 rpcs = 0;
+	int ret = 0;
+
+	if (flags & TEST_RESET) {
+		ret = i915_reset_engine(engine, "sseu");
+		if (ret)
+			goto out;
+	}
+
+	ret = __read_slice_count(i915, ctx, engine, obj,
+				 flags & TEST_RESET ? NULL : spin, &rpcs);
+	ret = __check_rpcs(name, rpcs, ret, expected, "Context", "!");
+	if (ret)
+		goto out;
+
+	ret = __read_slice_count(i915, kctx, engine, obj, NULL, &rpcs);
+	ret = __check_rpcs(name, rpcs, ret, slices, "Kernel context", "!");
+
+out:
+	if (spin)
+		igt_spinner_end(spin);
+
+	if ((flags & TEST_IDLE) && ret == 0) {
+		ret = i915_gem_wait_for_idle(i915,
+					     I915_WAIT_LOCKED,
+					     MAX_SCHEDULE_TIMEOUT);
+		if (ret)
+			return ret;
+
+		ret = __read_slice_count(i915, ctx, engine, obj, NULL, &rpcs);
+		ret = __check_rpcs(name, rpcs, ret, expected,
+				   "Context", " after idle!");
+	}
+
+	return ret;
+}
+
+static int
+__sseu_test(struct drm_i915_private *i915,
+	    const char *name,
+	    unsigned int flags,
+	    struct i915_gem_context *ctx,
+	    struct intel_engine_cs *engine,
+	    struct drm_i915_gem_object *obj,
+	    struct intel_sseu sseu)
+{
+	struct igt_spinner *spin = NULL;
+	struct i915_gem_context *kctx;
+	int ret;
+
+	kctx = kernel_context(i915);
+	if (IS_ERR(kctx))
+		return PTR_ERR(kctx);
+
+	ret = __sseu_prepare(i915, name, flags, ctx, engine, &spin);
+	if (ret)
+		goto out;
+
+	ret = __i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
+	if (ret)
+		goto out;
+
+	ret = __sseu_finish(i915, name, flags, ctx, kctx, engine, obj,
+			    hweight32(sseu.slice_mask), spin);
+
+out:
+	if (spin) {
+		igt_spinner_end(spin);
+		igt_spinner_fini(spin);
+		kfree(spin);
+	}
+
+	kernel_context_close(kctx);
+
+	return ret;
+}
+
+static int
+__igt_ctx_sseu(struct drm_i915_private *i915,
+	       const char *name,
+	       unsigned int flags)
+{
+	struct intel_sseu default_sseu = intel_device_default_sseu(i915);
+	struct intel_engine_cs *engine = i915->engine[RCS];
+	struct drm_i915_gem_object *obj;
+	struct i915_gem_context *ctx;
+	struct intel_sseu pg_sseu;
+	intel_wakeref_t wakeref;
+	struct drm_file *file;
+	int ret;
+
+	if (INTEL_GEN(i915) < 9)
+		return 0;
+
+	if (!RUNTIME_INFO(i915)->sseu.has_slice_pg)
+		return 0;
+
+	if (hweight32(default_sseu.slice_mask) < 2)
+		return 0;
+
+	/*
+	 * Gen11 VME friendly power-gated configuration with half enabled
+	 * sub-slices.
+	 */
+	pg_sseu = default_sseu;
+	pg_sseu.slice_mask = 1;
+	pg_sseu.subslice_mask =
+		~(~0 << (hweight32(default_sseu.subslice_mask) / 2));
+
+	pr_info("SSEU subtest '%s', flags=%x, def_slices=%u, pg_slices=%u\n",
+		name, flags, hweight32(default_sseu.slice_mask),
+		hweight32(pg_sseu.slice_mask));
+
+	file = mock_file(i915);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	if (flags & TEST_RESET)
+		igt_global_reset_lock(i915);
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	ctx = i915_gem_create_context(i915, file->driver_priv);
+	if (IS_ERR(ctx)) {
+		ret = PTR_ERR(ctx);
+		goto out_unlock;
+	}
+
+	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	if (IS_ERR(obj)) {
+		ret = PTR_ERR(obj);
+		goto out_unlock;
+	}
+
+	wakeref = intel_runtime_pm_get(i915);
+
+	/* First set the default mask. */
+	ret = __sseu_test(i915, name, flags, ctx, engine, obj, default_sseu);
+	if (ret)
+		goto out_fail;
+
+	/* Then set a power-gated configuration. */
+	ret = __sseu_test(i915, name, flags, ctx, engine, obj, pg_sseu);
+	if (ret)
+		goto out_fail;
+
+	/* Back to defaults. */
+	ret = __sseu_test(i915, name, flags, ctx, engine, obj, default_sseu);
+	if (ret)
+		goto out_fail;
+
+	/* One last power-gated configuration for the road. */
+	ret = __sseu_test(i915, name, flags, ctx, engine, obj, pg_sseu);
+	if (ret)
+		goto out_fail;
+
+out_fail:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		ret = -EIO;
+
+	i915_gem_object_put(obj);
+
+	intel_runtime_pm_put(i915, wakeref);
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	if (flags & TEST_RESET)
+		igt_global_reset_unlock(i915);
+
+	mock_file_free(i915, file);
+
+	if (ret)
+		pr_err("%s: Failed with %d!\n", name, ret);
+
+	return ret;
+}
+
+static int igt_ctx_sseu(void *arg)
+{
+	struct {
+		const char *name;
+		unsigned int flags;
+	} *phase, phases[] = {
+		{ .name = "basic", .flags = 0 },
+		{ .name = "idle", .flags = TEST_IDLE },
+		{ .name = "busy", .flags = TEST_BUSY },
+		{ .name = "busy-reset", .flags = TEST_BUSY | TEST_RESET },
+		{ .name = "busy-idle", .flags = TEST_BUSY | TEST_IDLE },
+		{ .name = "reset-idle", .flags = TEST_RESET | TEST_IDLE },
+	};
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0, phase = phases; ret == 0 && i < ARRAY_SIZE(phases);
+	     i++, phase++)
+		ret = __igt_ctx_sseu(arg, phase->name, phase->flags);
+
+	return ret;
+}
+
 static int igt_ctx_readonly(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1162,6 +1628,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(live_nop_switch),
 		SUBTEST(igt_ctx_exec),
 		SUBTEST(igt_ctx_readonly),
+		SUBTEST(igt_ctx_sseu),
 		SUBTEST(igt_vm_isolation),
 	};
 
-- 
cgit v1.2.3


From a21f453c73aa3705029a3460b456ef77e0bc2215 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue, 5 Feb 2019 12:38:35 +0000
Subject: drm/i915/selftests: Exercise some AB...BA preemption chains

Build a chain using 2 contexts (A, B) then request a preemption such
that a later A request runs before the spinner in B.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205123835.25331-1-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 103 +++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index fb35f53c9ce3..58144e024751 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -4,6 +4,8 @@
  * Copyright © 2018 Intel Corporation
  */
 
+#include <linux/prime_numbers.h>
+
 #include "../i915_reset.h"
 
 #include "../i915_selftest.h"
@@ -405,6 +407,106 @@ err_wedged:
 	goto err_client_b;
 }
 
+static int live_chain_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct preempt_client hi, lo;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+
+	/*
+	 * Build a chain AB...BA between two contexts (A, B) and request
+	 * preemption of the last request. It should then complete before
+	 * the previously submitted spinner in B.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &hi))
+		goto err_unlock;
+
+	if (preempt_client_init(i915, &lo))
+		goto err_client_hi;
+
+	for_each_engine(engine, i915, id) {
+		struct i915_sched_attr attr = {
+			.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX),
+		};
+		int count, i;
+
+		for_each_prime_number_from(count, 1, 32) { /* must fit ring! */
+			struct i915_request *rq;
+
+			rq = igt_spinner_create_request(&hi.spin,
+							hi.ctx, engine,
+							MI_ARB_CHECK);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+			if (!igt_wait_for_spinner(&hi.spin, rq))
+				goto err_wedged;
+
+			rq = igt_spinner_create_request(&lo.spin,
+							lo.ctx, engine,
+							MI_ARB_CHECK);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+
+			for (i = 0; i < count; i++) {
+				rq = i915_request_alloc(engine, lo.ctx);
+				if (IS_ERR(rq))
+					goto err_wedged;
+				i915_request_add(rq);
+			}
+
+			rq = i915_request_alloc(engine, hi.ctx);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+			engine->schedule(rq, &attr);
+
+			igt_spinner_end(&hi.spin);
+			if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+				struct drm_printer p =
+					drm_info_printer(i915->drm.dev);
+
+				pr_err("Failed to preempt over chain of %d\n",
+				       count);
+				intel_engine_dump(engine, &p,
+						  "%s\n", engine->name);
+				goto err_wedged;
+			}
+			igt_spinner_end(&lo.spin);
+		}
+	}
+
+	err = 0;
+err_client_lo:
+	preempt_client_fini(&lo);
+err_client_hi:
+	preempt_client_fini(&hi);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	igt_spinner_end(&hi.spin);
+	igt_spinner_end(&lo.spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_lo;
+}
+
 static int live_preempt_hang(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -785,6 +887,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_suppress_self_preempt),
+		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 	};
-- 
cgit v1.2.3


From 64d6c500a3843408559164223d69fb31e1a00e52 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue, 5 Feb 2019 13:00:02 +0000
Subject: drm/i915: Generalise GPU activity tracking

We currently track GPU memory usage inside VMA, such that we never
release memory used by the GPU until after it has finished accessing it.
However, we may want to track other resources aside from VMA, or we may
want to split a VMA into multiple independent regions and track each
separately. For this purpose, generalise our request tracking (akin to
struct reservation_object) so that we can embed it into other objects.

v2: Tweak error handling during selftest setup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-2-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/Makefile                      |   4 +-
 drivers/gpu/drm/i915/i915_active.c                 | 228 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_active.h                 |  69 +++++++
 drivers/gpu/drm/i915/i915_active_types.h           |  26 +++
 drivers/gpu/drm/i915/i915_gem_gtt.c                |   3 +-
 drivers/gpu/drm/i915/i915_vma.c                    | 173 +++-------------
 drivers/gpu/drm/i915/i915_vma.h                    |   9 +-
 drivers/gpu/drm/i915/selftests/i915_active.c       | 157 ++++++++++++++
 .../gpu/drm/i915/selftests/i915_live_selftests.h   |   3 +-
 9 files changed, 518 insertions(+), 154 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_active.c
 create mode 100644 drivers/gpu/drm/i915/i915_active.h
 create mode 100644 drivers/gpu/drm/i915/i915_active_types.h
 create mode 100644 drivers/gpu/drm/i915/selftests/i915_active.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 210d0e8777b6..1787e1299b1b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -57,7 +57,9 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
 i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
 
 # GEM code
-i915-y += i915_cmd_parser.o \
+i915-y += \
+	  i915_active.o \
+	  i915_cmd_parser.o \
 	  i915_gem_batch_pool.o \
 	  i915_gem_clflush.o \
 	  i915_gem_context.o \
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
new file mode 100644
index 000000000000..91950d778cab
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -0,0 +1,228 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_active.h"
+
+#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
+
+struct active_node {
+	struct i915_gem_active base;
+	struct i915_active *ref;
+	struct rb_node node;
+	u64 timeline;
+};
+
+static void
+__active_retire(struct i915_active *ref)
+{
+	GEM_BUG_ON(!ref->count);
+	if (!--ref->count)
+		ref->retire(ref);
+}
+
+static void
+node_retire(struct i915_gem_active *base, struct i915_request *rq)
+{
+	__active_retire(container_of(base, struct active_node, base)->ref);
+}
+
+static void
+last_retire(struct i915_gem_active *base, struct i915_request *rq)
+{
+	__active_retire(container_of(base, struct i915_active, last));
+}
+
+static struct i915_gem_active *
+active_instance(struct i915_active *ref, u64 idx)
+{
+	struct active_node *node;
+	struct rb_node **p, *parent;
+	struct i915_request *old;
+
+	/*
+	 * We track the most recently used timeline to skip a rbtree search
+	 * for the common case, under typical loads we never need the rbtree
+	 * at all. We can reuse the last slot if it is empty, that is
+	 * after the previous activity has been retired, or if it matches the
+	 * current timeline.
+	 *
+	 * Note that we allow the timeline to be active simultaneously in
+	 * the rbtree and the last cache. We do this to avoid having
+	 * to search and replace the rbtree element for a new timeline, with
+	 * the cost being that we must be aware that the ref may be retired
+	 * twice for the same timeline (as the older rbtree element will be
+	 * retired before the new request added to last).
+	 */
+	old = i915_gem_active_raw(&ref->last, BKL(ref));
+	if (!old || old->fence.context == idx)
+		goto out;
+
+	/* Move the currently active fence into the rbtree */
+	idx = old->fence.context;
+
+	parent = NULL;
+	p = &ref->tree.rb_node;
+	while (*p) {
+		parent = *p;
+
+		node = rb_entry(parent, struct active_node, node);
+		if (node->timeline == idx)
+			goto replace;
+
+		if (node->timeline < idx)
+			p = &parent->rb_right;
+		else
+			p = &parent->rb_left;
+	}
+
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+
+	/* kmalloc may retire the ref->last (thanks shrinker)! */
+	if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
+		kfree(node);
+		goto out;
+	}
+
+	if (unlikely(!node))
+		return ERR_PTR(-ENOMEM);
+
+	init_request_active(&node->base, node_retire);
+	node->ref = ref;
+	node->timeline = idx;
+
+	rb_link_node(&node->node, parent, p);
+	rb_insert_color(&node->node, &ref->tree);
+
+replace:
+	/*
+	 * Overwrite the previous active slot in the rbtree with last,
+	 * leaving last zeroed. If the previous slot is still active,
+	 * we must be careful as we now only expect to receive one retire
+	 * callback not two, and so much undo the active counting for the
+	 * overwritten slot.
+	 */
+	if (i915_gem_active_isset(&node->base)) {
+		/* Retire ourselves from the old rq->active_list */
+		__list_del_entry(&node->base.link);
+		ref->count--;
+		GEM_BUG_ON(!ref->count);
+	}
+	GEM_BUG_ON(list_empty(&ref->last.link));
+	list_replace_init(&ref->last.link, &node->base.link);
+	node->base.request = fetch_and_zero(&ref->last.request);
+
+out:
+	return &ref->last;
+}
+
+void i915_active_init(struct drm_i915_private *i915,
+		      struct i915_active *ref,
+		      void (*retire)(struct i915_active *ref))
+{
+	ref->i915 = i915;
+	ref->retire = retire;
+	ref->tree = RB_ROOT;
+	init_request_active(&ref->last, last_retire);
+	ref->count = 0;
+}
+
+int i915_active_ref(struct i915_active *ref,
+		    u64 timeline,
+		    struct i915_request *rq)
+{
+	struct i915_gem_active *active;
+
+	active = active_instance(ref, timeline);
+	if (IS_ERR(active))
+		return PTR_ERR(active);
+
+	if (!i915_gem_active_isset(active))
+		ref->count++;
+	i915_gem_active_set(active, rq);
+
+	GEM_BUG_ON(!ref->count);
+	return 0;
+}
+
+bool i915_active_acquire(struct i915_active *ref)
+{
+	lockdep_assert_held(BKL(ref));
+	return !ref->count++;
+}
+
+void i915_active_release(struct i915_active *ref)
+{
+	lockdep_assert_held(BKL(ref));
+	__active_retire(ref);
+}
+
+int i915_active_wait(struct i915_active *ref)
+{
+	struct active_node *it, *n;
+	int ret = 0;
+
+	if (i915_active_acquire(ref))
+		goto out_release;
+
+	ret = i915_gem_active_retire(&ref->last, BKL(ref));
+	if (ret)
+		goto out_release;
+
+	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+		ret = i915_gem_active_retire(&it->base, BKL(ref));
+		if (ret)
+			break;
+	}
+
+out_release:
+	i915_active_release(ref);
+	return ret;
+}
+
+static int __i915_request_await_active(struct i915_request *rq,
+				       struct i915_gem_active *active)
+{
+	struct i915_request *barrier =
+		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+
+	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
+}
+
+int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
+{
+	struct active_node *it, *n;
+	int ret;
+
+	ret = __i915_request_await_active(rq, &ref->last);
+	if (ret)
+		return ret;
+
+	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+		ret = __i915_request_await_active(rq, &it->base);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+void i915_active_fini(struct i915_active *ref)
+{
+	struct active_node *it, *n;
+
+	GEM_BUG_ON(i915_gem_active_isset(&ref->last));
+
+	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+		GEM_BUG_ON(i915_gem_active_isset(&it->base));
+		kfree(it);
+	}
+	ref->tree = RB_ROOT;
+}
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_active.c"
+#endif
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
new file mode 100644
index 000000000000..0aa2628ea734
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -0,0 +1,69 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_ACTIVE_H_
+#define _I915_ACTIVE_H_
+
+#include "i915_active_types.h"
+
+/*
+ * GPU activity tracking
+ *
+ * Each set of commands submitted to the GPU compromises a single request that
+ * signals a fence upon completion. struct i915_request combines the
+ * command submission, scheduling and fence signaling roles. If we want to see
+ * if a particular task is complete, we need to grab the fence (struct
+ * i915_request) for that task and check or wait for it to be signaled. More
+ * often though we want to track the status of a bunch of tasks, for example
+ * to wait for the GPU to finish accessing some memory across a variety of
+ * different command pipelines from different clients. We could choose to
+ * track every single request associated with the task, but knowing that
+ * each request belongs to an ordered timeline (later requests within a
+ * timeline must wait for earlier requests), we need only track the
+ * latest request in each timeline to determine the overall status of the
+ * task.
+ *
+ * struct i915_active provides this tracking across timelines. It builds a
+ * composite shared-fence, and is updated as new work is submitted to the task,
+ * forming a snapshot of the current status. It should be embedded into the
+ * different resources that need to track their associated GPU activity to
+ * provide a callback when that GPU activity has ceased, or otherwise to
+ * provide a serialisation point either for request submission or for CPU
+ * synchronisation.
+ */
+
+void i915_active_init(struct drm_i915_private *i915,
+		      struct i915_active *ref,
+		      void (*retire)(struct i915_active *ref));
+
+int i915_active_ref(struct i915_active *ref,
+		    u64 timeline,
+		    struct i915_request *rq);
+
+int i915_active_wait(struct i915_active *ref);
+
+int i915_request_await_active(struct i915_request *rq,
+			      struct i915_active *ref);
+
+bool i915_active_acquire(struct i915_active *ref);
+
+static inline void i915_active_cancel(struct i915_active *ref)
+{
+	GEM_BUG_ON(ref->count != 1);
+	ref->count = 0;
+}
+
+void i915_active_release(struct i915_active *ref);
+
+static inline bool
+i915_active_is_idle(const struct i915_active *ref)
+{
+	return !ref->count;
+}
+
+void i915_active_fini(struct i915_active *ref);
+
+#endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
new file mode 100644
index 000000000000..411e502ed8dd
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -0,0 +1,26 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_ACTIVE_TYPES_H_
+#define _I915_ACTIVE_TYPES_H_
+
+#include <linux/rbtree.h>
+
+#include "i915_request.h"
+
+struct drm_i915_private;
+
+struct i915_active {
+	struct drm_i915_private *i915;
+
+	struct rb_root tree;
+	struct i915_gem_active last;
+	unsigned int count;
+
+	void (*retire)(struct i915_active *ref);
+};
+
+#endif /* _I915_ACTIVE_TYPES_H_ */
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 49b00996a15e..e625659c03a2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1917,14 +1917,13 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	if (!vma)
 		return ERR_PTR(-ENOMEM);
 
+	i915_active_init(i915, &vma->active, NULL);
 	init_request_active(&vma->last_fence, NULL);
 
 	vma->vm = &ggtt->vm;
 	vma->ops = &pd_vma_ops;
 	vma->private = ppgtt;
 
-	vma->active = RB_ROOT;
-
 	vma->size = size;
 	vma->fence_size = size;
 	vma->flags = I915_VMA_GGTT;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d83b8ad5f859..d4772061e642 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -63,22 +63,23 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason)
 
 #endif
 
-struct i915_vma_active {
-	struct i915_gem_active base;
-	struct i915_vma *vma;
-	struct rb_node node;
-	u64 timeline;
-};
+static void obj_bump_mru(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
-static void
-__i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
+	spin_lock(&i915->mm.obj_lock);
+	if (obj->bind_count)
+		list_move_tail(&obj->mm.link, &i915->mm.bound_list);
+	spin_unlock(&i915->mm.obj_lock);
+
+	obj->mm.dirty = true; /* be paranoid  */
+}
+
+static void __i915_vma_retire(struct i915_active *ref)
 {
+	struct i915_vma *vma = container_of(ref, typeof(*vma), active);
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	GEM_BUG_ON(!i915_vma_is_active(vma));
-	if (--vma->active_count)
-		return;
-
 	GEM_BUG_ON(!i915_gem_object_is_active(obj));
 	if (--obj->active_count)
 		return;
@@ -90,16 +91,12 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
 		reservation_object_unlock(obj->resv);
 	}
 
-	/* Bump our place on the bound list to keep it roughly in LRU order
+	/*
+	 * Bump our place on the bound list to keep it roughly in LRU order
 	 * so that we don't steal from recently used but inactive objects
 	 * (unless we are forced to ofc!)
 	 */
-	spin_lock(&rq->i915->mm.obj_lock);
-	if (obj->bind_count)
-		list_move_tail(&obj->mm.link, &rq->i915->mm.bound_list);
-	spin_unlock(&rq->i915->mm.obj_lock);
-
-	obj->mm.dirty = true; /* be paranoid  */
+	obj_bump_mru(obj);
 
 	if (i915_gem_object_has_active_reference(obj)) {
 		i915_gem_object_clear_active_reference(obj);
@@ -107,21 +104,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
 	}
 }
 
-static void
-i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
-{
-	struct i915_vma_active *active =
-		container_of(base, typeof(*active), base);
-
-	__i915_vma_retire(active->vma, rq);
-}
-
-static void
-i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
-{
-	__i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
-}
-
 static struct i915_vma *
 vma_create(struct drm_i915_gem_object *obj,
 	   struct i915_address_space *vm,
@@ -137,10 +119,9 @@ vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	vma->active = RB_ROOT;
-
-	init_request_active(&vma->last_active, i915_vma_last_retire);
+	i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
 	init_request_active(&vma->last_fence, NULL);
+
 	vma->vm = vm;
 	vma->ops = &vm->vma_ops;
 	vma->obj = obj;
@@ -823,7 +804,6 @@ void i915_vma_reopen(struct i915_vma *vma)
 static void __i915_vma_destroy(struct i915_vma *vma)
 {
 	struct drm_i915_private *i915 = vma->vm->i915;
-	struct i915_vma_active *iter, *n;
 
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->fence);
@@ -843,10 +823,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 		spin_unlock(&obj->vma.lock);
 	}
 
-	rbtree_postorder_for_each_entry_safe(iter, n, &vma->active, node) {
-		GEM_BUG_ON(i915_gem_active_isset(&iter->base));
-		kfree(iter);
-	}
+	i915_active_fini(&vma->active);
 
 	kmem_cache_free(i915->vmas, vma);
 }
@@ -931,104 +908,15 @@ static void export_fence(struct i915_vma *vma,
 	reservation_object_unlock(resv);
 }
 
-static struct i915_gem_active *active_instance(struct i915_vma *vma, u64 idx)
-{
-	struct i915_vma_active *active;
-	struct rb_node **p, *parent;
-	struct i915_request *old;
-
-	/*
-	 * We track the most recently used timeline to skip a rbtree search
-	 * for the common case, under typical loads we never need the rbtree
-	 * at all. We can reuse the last_active slot if it is empty, that is
-	 * after the previous activity has been retired, or if the active
-	 * matches the current timeline.
-	 *
-	 * Note that we allow the timeline to be active simultaneously in
-	 * the rbtree and the last_active cache. We do this to avoid having
-	 * to search and replace the rbtree element for a new timeline, with
-	 * the cost being that we must be aware that the vma may be retired
-	 * twice for the same timeline (as the older rbtree element will be
-	 * retired before the new request added to last_active).
-	 */
-	old = i915_gem_active_raw(&vma->last_active,
-				  &vma->vm->i915->drm.struct_mutex);
-	if (!old || old->fence.context == idx)
-		goto out;
-
-	/* Move the currently active fence into the rbtree */
-	idx = old->fence.context;
-
-	parent = NULL;
-	p = &vma->active.rb_node;
-	while (*p) {
-		parent = *p;
-
-		active = rb_entry(parent, struct i915_vma_active, node);
-		if (active->timeline == idx)
-			goto replace;
-
-		if (active->timeline < idx)
-			p = &parent->rb_right;
-		else
-			p = &parent->rb_left;
-	}
-
-	active = kmalloc(sizeof(*active), GFP_KERNEL);
-
-	/* kmalloc may retire the vma->last_active request (thanks shrinker)! */
-	if (unlikely(!i915_gem_active_raw(&vma->last_active,
-					  &vma->vm->i915->drm.struct_mutex))) {
-		kfree(active);
-		goto out;
-	}
-
-	if (unlikely(!active))
-		return ERR_PTR(-ENOMEM);
-
-	init_request_active(&active->base, i915_vma_retire);
-	active->vma = vma;
-	active->timeline = idx;
-
-	rb_link_node(&active->node, parent, p);
-	rb_insert_color(&active->node, &vma->active);
-
-replace:
-	/*
-	 * Overwrite the previous active slot in the rbtree with last_active,
-	 * leaving last_active zeroed. If the previous slot is still active,
-	 * we must be careful as we now only expect to receive one retire
-	 * callback not two, and so much undo the active counting for the
-	 * overwritten slot.
-	 */
-	if (i915_gem_active_isset(&active->base)) {
-		/* Retire ourselves from the old rq->active_list */
-		__list_del_entry(&active->base.link);
-		vma->active_count--;
-		GEM_BUG_ON(!vma->active_count);
-	}
-	GEM_BUG_ON(list_empty(&vma->last_active.link));
-	list_replace_init(&vma->last_active.link, &active->base.link);
-	active->base.request = fetch_and_zero(&vma->last_active.request);
-
-out:
-	return &vma->last_active;
-}
-
 int i915_vma_move_to_active(struct i915_vma *vma,
 			    struct i915_request *rq,
 			    unsigned int flags)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
-	struct i915_gem_active *active;
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 
-	active = active_instance(vma, rq->fence.context);
-	if (IS_ERR(active))
-		return PTR_ERR(active);
-
 	/*
 	 * Add a reference if we're newly entering the active list.
 	 * The order in which we add operations to the retirement queue is
@@ -1037,9 +925,15 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!i915_gem_active_isset(active) && !vma->active_count++)
+	if (!vma->active.count)
 		obj->active_count++;
-	i915_gem_active_set(active, rq);
+
+	if (unlikely(i915_active_ref(&vma->active, rq->fence.context, rq))) {
+		if (!vma->active.count)
+			obj->active_count--;
+		return -ENOMEM;
+	}
+
 	GEM_BUG_ON(!i915_vma_is_active(vma));
 	GEM_BUG_ON(!obj->active_count);
 
@@ -1073,8 +967,6 @@ int i915_vma_unbind(struct i915_vma *vma)
 	 */
 	might_sleep();
 	if (i915_vma_is_active(vma)) {
-		struct i915_vma_active *active, *n;
-
 		/*
 		 * When a closed VMA is retired, it is unbound - eek.
 		 * In order to prevent it from being recursively closed,
@@ -1090,19 +982,10 @@ int i915_vma_unbind(struct i915_vma *vma)
 		 */
 		__i915_vma_pin(vma);
 
-		ret = i915_gem_active_retire(&vma->last_active,
-					     &vma->vm->i915->drm.struct_mutex);
+		ret = i915_active_wait(&vma->active);
 		if (ret)
 			goto unpin;
 
-		rbtree_postorder_for_each_entry_safe(active, n,
-						     &vma->active, node) {
-			ret = i915_gem_active_retire(&active->base,
-						     &vma->vm->i915->drm.struct_mutex);
-			if (ret)
-				goto unpin;
-		}
-
 		ret = i915_gem_active_retire(&vma->last_fence,
 					     &vma->vm->i915->drm.struct_mutex);
 unpin:
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 5793abe509a2..3c03d4569481 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -34,6 +34,7 @@
 #include "i915_gem_fence_reg.h"
 #include "i915_gem_object.h"
 
+#include "i915_active.h"
 #include "i915_request.h"
 
 enum i915_cache_level;
@@ -108,9 +109,7 @@ struct i915_vma {
 #define I915_VMA_USERFAULT	BIT(I915_VMA_USERFAULT_BIT)
 #define I915_VMA_GGTT_WRITE	BIT(15)
 
-	unsigned int active_count;
-	struct rb_root active;
-	struct i915_gem_active last_active;
+	struct i915_active active;
 	struct i915_gem_active last_fence;
 
 	/**
@@ -154,9 +153,9 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
 
-static inline bool i915_vma_is_active(struct i915_vma *vma)
+static inline bool i915_vma_is_active(const struct i915_vma *vma)
 {
-	return vma->active_count;
+	return !i915_active_is_idle(&vma->active);
 }
 
 int __must_check i915_vma_move_to_active(struct i915_vma *vma,
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
new file mode 100644
index 000000000000..337b1f98b923
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -0,0 +1,157 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include "../i915_selftest.h"
+
+#include "igt_flush_test.h"
+#include "lib_sw_fence.h"
+
+struct live_active {
+	struct i915_active base;
+	bool retired;
+};
+
+static void __live_active_retire(struct i915_active *base)
+{
+	struct live_active *active = container_of(base, typeof(*active), base);
+
+	active->retired = true;
+}
+
+static int __live_active_setup(struct drm_i915_private *i915,
+			       struct live_active *active)
+{
+	struct intel_engine_cs *engine;
+	struct i915_sw_fence *submit;
+	enum intel_engine_id id;
+	unsigned int count = 0;
+	int err = 0;
+
+	submit = heap_fence_create(GFP_KERNEL);
+	if (!submit)
+		return -ENOMEM;
+
+	i915_active_init(i915, &active->base, __live_active_retire);
+	active->retired = false;
+
+	if (!i915_active_acquire(&active->base)) {
+		pr_err("First i915_active_acquire should report being idle\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	for_each_engine(engine, i915, id) {
+		struct i915_request *rq;
+
+		rq = i915_request_alloc(engine, i915->kernel_context);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			break;
+		}
+
+		err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+						       submit,
+						       GFP_KERNEL);
+		if (err >= 0)
+			err = i915_active_ref(&active->base,
+					      rq->fence.context, rq);
+		i915_request_add(rq);
+		if (err) {
+			pr_err("Failed to track active ref!\n");
+			break;
+		}
+
+		count++;
+	}
+
+	i915_active_release(&active->base);
+	if (active->retired && count) {
+		pr_err("i915_active retired before submission!\n");
+		err = -EINVAL;
+	}
+	if (active->base.count != count) {
+		pr_err("i915_active not tracking all requests, found %d, expected %d\n",
+		       active->base.count, count);
+		err = -EINVAL;
+	}
+
+out:
+	i915_sw_fence_commit(submit);
+	heap_fence_put(submit);
+
+	return err;
+}
+
+static int live_active_wait(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct live_active active;
+	intel_wakeref_t wakeref;
+	int err;
+
+	/* Check that we get a callback when requests retire upon waiting */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	err = __live_active_setup(i915, &active);
+
+	i915_active_wait(&active.base);
+	if (!active.retired) {
+		pr_err("i915_active not retired after waiting!\n");
+		err = -EINVAL;
+	}
+
+	i915_active_fini(&active.base);
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
+static int live_active_retire(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct live_active active;
+	intel_wakeref_t wakeref;
+	int err;
+
+	/* Check that we get a callback when requests are indirectly retired */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	err = __live_active_setup(i915, &active);
+
+	/* waits for & retires all requests */
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	if (!active.retired) {
+		pr_err("i915_active not retired after flushing!\n");
+		err = -EINVAL;
+	}
+
+	i915_active_fini(&active.base);
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
+int i915_active_live_selftests(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(live_active_wait),
+		SUBTEST(live_active_retire),
+	};
+
+	if (i915_terminally_wedged(&i915->gpu_error))
+		return 0;
+
+	return i915_subtests(tests, i915);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index 76b4f87fc853..6d766925ad04 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -12,8 +12,9 @@
 selftest(sanitycheck, i915_live_sanitycheck) /* keep first (igt selfcheck) */
 selftest(uncore, intel_uncore_live_selftests)
 selftest(workarounds, intel_workarounds_live_selftests)
-selftest(requests, i915_request_live_selftests)
 selftest(timelines, i915_timeline_live_selftests)
+selftest(requests, i915_request_live_selftests)
+selftest(active, i915_active_live_selftests)
 selftest(objects, i915_gem_object_live_selftests)
 selftest(dmabuf, i915_gem_dmabuf_live_selftests)
 selftest(coherency, i915_gem_coherency_live_selftests)
-- 
cgit v1.2.3


From a42375af0a305e9c299cffd739a39e415930614c Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue, 5 Feb 2019 13:00:03 +0000
Subject: drm/i915: Release the active tracker tree upon idling

As soon as we detect that the active tracker is idle and we prepare to
call the retire callback, release the storage for our tree of
per-timeline nodes. We expect these to be infrequently used and quick
to allocate, so there is little benefit in keeping the tree cached and
we would prefer to return the pages back to the system in a timely
fashion.

This also means that when we finalize the struct as a whole, we know as
the activity tracker must be idle, the tree has already been released.
Indeed we can reduce i915_active_fini() just to the assertions that there
is nothing to do.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-3-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_active.c | 33 +++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_active.h |  4 ++++
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 91950d778cab..b1fefe98f9a6 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -16,12 +16,29 @@ struct active_node {
 	u64 timeline;
 };
 
+static void
+__active_park(struct i915_active *ref)
+{
+	struct active_node *it, *n;
+
+	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+		GEM_BUG_ON(i915_gem_active_isset(&it->base));
+		kfree(it);
+	}
+	ref->tree = RB_ROOT;
+}
+
 static void
 __active_retire(struct i915_active *ref)
 {
 	GEM_BUG_ON(!ref->count);
-	if (!--ref->count)
-		ref->retire(ref);
+	if (--ref->count)
+		return;
+
+	/* return the unused nodes to our slabcache */
+	__active_park(ref);
+
+	ref->retire(ref);
 }
 
 static void
@@ -210,18 +227,14 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void i915_active_fini(struct i915_active *ref)
 {
-	struct active_node *it, *n;
-
 	GEM_BUG_ON(i915_gem_active_isset(&ref->last));
-
-	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		GEM_BUG_ON(i915_gem_active_isset(&it->base));
-		kfree(it);
-	}
-	ref->tree = RB_ROOT;
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
+	GEM_BUG_ON(ref->count);
 }
+#endif
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_active.c"
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 0aa2628ea734..ec4b66efd9a7 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -64,6 +64,10 @@ i915_active_is_idle(const struct i915_active *ref)
 	return !ref->count;
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void i915_active_fini(struct i915_active *ref);
+#else
+static inline void i915_active_fini(struct i915_active *ref) { }
+#endif
 
 #endif /* _I915_ACTIVE_H_ */
-- 
cgit v1.2.3


From 5f5c139d6900b3338963bddcd8a567dcad33cf92 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue, 5 Feb 2019 13:00:04 +0000
Subject: drm/i915: Allocate active tracking nodes from a slabcache

Wrap the active tracking for a GPU references in a slabcache for faster
allocations, and hopefully better fragmentation reduction.

v3: Nothing device specific left, it's just a slabcache that we can
make global.
v4: Include i915_active.h and don't put the initfunc under DEBUG_GEM

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-4-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_active.c | 31 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_active.h |  3 +++
 drivers/gpu/drm/i915/i915_pci.c    |  4 ++++
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index b1fefe98f9a6..64661c41532b 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -9,6 +9,17 @@
 
 #define BKL(ref) (&(ref)->i915->drm.struct_mutex)
 
+/*
+ * Active refs memory management
+ *
+ * To be more economical with memory, we reap all the i915_active trees as
+ * they idle (when we know the active requests are inactive) and allocate the
+ * nodes from a local slab cache to hopefully reduce the fragmentation.
+ */
+static struct i915_global_active {
+	struct kmem_cache *slab_cache;
+} global;
+
 struct active_node {
 	struct i915_gem_active base;
 	struct i915_active *ref;
@@ -23,7 +34,7 @@ __active_park(struct i915_active *ref)
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
 		GEM_BUG_ON(i915_gem_active_isset(&it->base));
-		kfree(it);
+		kmem_cache_free(global.slab_cache, it);
 	}
 	ref->tree = RB_ROOT;
 }
@@ -96,11 +107,11 @@ active_instance(struct i915_active *ref, u64 idx)
 			p = &parent->rb_left;
 	}
 
-	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
 
 	/* kmalloc may retire the ref->last (thanks shrinker)! */
 	if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
-		kfree(node);
+		kmem_cache_free(global.slab_cache, node);
 		goto out;
 	}
 
@@ -239,3 +250,17 @@ void i915_active_fini(struct i915_active *ref)
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_active.c"
 #endif
+
+int __init i915_global_active_init(void)
+{
+	global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
+	if (!global.slab_cache)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void __exit i915_global_active_exit(void)
+{
+	kmem_cache_destroy(global.slab_cache);
+}
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index ec4b66efd9a7..179b47aeec33 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -70,4 +70,7 @@ void i915_active_fini(struct i915_active *ref);
 static inline void i915_active_fini(struct i915_active *ref) { }
 #endif
 
+int i915_global_active_init(void);
+void i915_global_active_exit(void);
+
 #endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 55c9058e91a8..c6533c57ea2f 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -26,6 +26,7 @@
 #include <linux/vgaarb.h>
 #include <linux/vga_switcheroo.h>
 
+#include "i915_active.h"
 #include "i915_drv.h"
 #include "i915_selftest.h"
 
@@ -798,6 +799,8 @@ static int __init i915_init(void)
 	bool use_kms = true;
 	int err;
 
+	i915_global_active_init();
+
 	err = i915_mock_selftests();
 	if (err)
 		return err > 0 ? 0 : err;
@@ -829,6 +832,7 @@ static void __exit i915_exit(void)
 		return;
 
 	pci_unregister_driver(&i915_pci_driver);
+	i915_global_active_exit();
 }
 
 module_init(i915_init);
-- 
cgit v1.2.3


From 21950ee7cc8f13c5350bda0cae22cdb7ac7e3058 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue, 5 Feb 2019 13:00:05 +0000
Subject: drm/i915: Pull i915_gem_active into the i915_active family

Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling, little do new users suspect that it implies a barrier
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_active.c             |  62 ++--
 drivers/gpu/drm/i915/i915_active.h             | 349 ++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_active_types.h       |  16 +-
 drivers/gpu/drm/i915/i915_debugfs.c            |   2 +-
 drivers/gpu/drm/i915/i915_gem.c                |  10 +-
 drivers/gpu/drm/i915/i915_gem_context.c        |  17 +-
 drivers/gpu/drm/i915/i915_gem_context.h        |   2 +-
 drivers/gpu/drm/i915/i915_gem_fence_reg.c      |   4 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c            |   2 +-
 drivers/gpu/drm/i915/i915_gem_object.h         |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c          |  10 +-
 drivers/gpu/drm/i915/i915_request.c            |  35 +--
 drivers/gpu/drm/i915/i915_request.h            | 383 -------------------------
 drivers/gpu/drm/i915/i915_reset.c              |   2 +-
 drivers/gpu/drm/i915/i915_timeline.c           |  25 +-
 drivers/gpu/drm/i915/i915_timeline.h           |  14 +-
 drivers/gpu/drm/i915/i915_vma.c                |  12 +-
 drivers/gpu/drm/i915/i915_vma.h                |   2 +-
 drivers/gpu/drm/i915/intel_engine_cs.c         |   2 +-
 drivers/gpu/drm/i915/intel_overlay.c           |  33 +--
 drivers/gpu/drm/i915/selftests/mock_timeline.c |   4 +-
 21 files changed, 480 insertions(+), 508 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 64661c41532b..215b6ff8aa73 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -21,7 +21,7 @@ static struct i915_global_active {
 } global;
 
 struct active_node {
-	struct i915_gem_active base;
+	struct i915_active_request base;
 	struct i915_active *ref;
 	struct rb_node node;
 	u64 timeline;
@@ -33,7 +33,7 @@ __active_park(struct i915_active *ref)
 	struct active_node *it, *n;
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		GEM_BUG_ON(i915_gem_active_isset(&it->base));
+		GEM_BUG_ON(i915_active_request_isset(&it->base));
 		kmem_cache_free(global.slab_cache, it);
 	}
 	ref->tree = RB_ROOT;
@@ -53,18 +53,18 @@ __active_retire(struct i915_active *ref)
 }
 
 static void
-node_retire(struct i915_gem_active *base, struct i915_request *rq)
+node_retire(struct i915_active_request *base, struct i915_request *rq)
 {
 	__active_retire(container_of(base, struct active_node, base)->ref);
 }
 
 static void
-last_retire(struct i915_gem_active *base, struct i915_request *rq)
+last_retire(struct i915_active_request *base, struct i915_request *rq)
 {
 	__active_retire(container_of(base, struct i915_active, last));
 }
 
-static struct i915_gem_active *
+static struct i915_active_request *
 active_instance(struct i915_active *ref, u64 idx)
 {
 	struct active_node *node;
@@ -85,7 +85,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	 * twice for the same timeline (as the older rbtree element will be
 	 * retired before the new request added to last).
 	 */
-	old = i915_gem_active_raw(&ref->last, BKL(ref));
+	old = i915_active_request_raw(&ref->last, BKL(ref));
 	if (!old || old->fence.context == idx)
 		goto out;
 
@@ -110,7 +110,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
 
 	/* kmalloc may retire the ref->last (thanks shrinker)! */
-	if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
+	if (unlikely(!i915_active_request_raw(&ref->last, BKL(ref)))) {
 		kmem_cache_free(global.slab_cache, node);
 		goto out;
 	}
@@ -118,7 +118,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	if (unlikely(!node))
 		return ERR_PTR(-ENOMEM);
 
-	init_request_active(&node->base, node_retire);
+	i915_active_request_init(&node->base, NULL, node_retire);
 	node->ref = ref;
 	node->timeline = idx;
 
@@ -133,7 +133,7 @@ replace:
 	 * callback not two, and so much undo the active counting for the
 	 * overwritten slot.
 	 */
-	if (i915_gem_active_isset(&node->base)) {
+	if (i915_active_request_isset(&node->base)) {
 		/* Retire ourselves from the old rq->active_list */
 		__list_del_entry(&node->base.link);
 		ref->count--;
@@ -154,7 +154,7 @@ void i915_active_init(struct drm_i915_private *i915,
 	ref->i915 = i915;
 	ref->retire = retire;
 	ref->tree = RB_ROOT;
-	init_request_active(&ref->last, last_retire);
+	i915_active_request_init(&ref->last, NULL, last_retire);
 	ref->count = 0;
 }
 
@@ -162,15 +162,15 @@ int i915_active_ref(struct i915_active *ref,
 		    u64 timeline,
 		    struct i915_request *rq)
 {
-	struct i915_gem_active *active;
+	struct i915_active_request *active;
 
 	active = active_instance(ref, timeline);
 	if (IS_ERR(active))
 		return PTR_ERR(active);
 
-	if (!i915_gem_active_isset(active))
+	if (!i915_active_request_isset(active))
 		ref->count++;
-	i915_gem_active_set(active, rq);
+	__i915_active_request_set(active, rq);
 
 	GEM_BUG_ON(!ref->count);
 	return 0;
@@ -196,12 +196,12 @@ int i915_active_wait(struct i915_active *ref)
 	if (i915_active_acquire(ref))
 		goto out_release;
 
-	ret = i915_gem_active_retire(&ref->last, BKL(ref));
+	ret = i915_active_request_retire(&ref->last, BKL(ref));
 	if (ret)
 		goto out_release;
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		ret = i915_gem_active_retire(&it->base, BKL(ref));
+		ret = i915_active_request_retire(&it->base, BKL(ref));
 		if (ret)
 			break;
 	}
@@ -211,11 +211,11 @@ out_release:
 	return ret;
 }
 
-static int __i915_request_await_active(struct i915_request *rq,
-				       struct i915_gem_active *active)
+int i915_request_await_active_request(struct i915_request *rq,
+				      struct i915_active_request *active)
 {
 	struct i915_request *barrier =
-		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+		i915_active_request_raw(active, &rq->i915->drm.struct_mutex);
 
 	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
 }
@@ -225,12 +225,12 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 	struct active_node *it, *n;
 	int ret;
 
-	ret = __i915_request_await_active(rq, &ref->last);
+	ret = i915_request_await_active_request(rq, &ref->last);
 	if (ret)
 		return ret;
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		ret = __i915_request_await_active(rq, &it->base);
+		ret = i915_request_await_active_request(rq, &it->base);
 		if (ret)
 			return ret;
 	}
@@ -241,12 +241,32 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void i915_active_fini(struct i915_active *ref)
 {
-	GEM_BUG_ON(i915_gem_active_isset(&ref->last));
+	GEM_BUG_ON(i915_active_request_isset(&ref->last));
 	GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
 	GEM_BUG_ON(ref->count);
 }
 #endif
 
+int i915_active_request_set(struct i915_active_request *active,
+			    struct i915_request *rq)
+{
+	int err;
+
+	/* Must maintain ordering wrt previous active requests */
+	err = i915_request_await_active_request(rq, active);
+	if (err)
+		return err;
+
+	__i915_active_request_set(active, rq);
+	return 0;
+}
+
+void i915_active_retire_noop(struct i915_active_request *active,
+			     struct i915_request *request)
+{
+	/* Space left intentionally blank */
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_active.c"
 #endif
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 179b47aeec33..12b5c1d287d1 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -7,7 +7,354 @@
 #ifndef _I915_ACTIVE_H_
 #define _I915_ACTIVE_H_
 
+#include <linux/lockdep.h>
+
 #include "i915_active_types.h"
+#include "i915_request.h"
+
+/*
+ * We treat requests as fences. This is not be to confused with our
+ * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
+ * We use the fences to synchronize access from the CPU with activity on the
+ * GPU, for example, we should not rewrite an object's PTE whilst the GPU
+ * is reading them. We also track fences at a higher level to provide
+ * implicit synchronisation around GEM objects, e.g. set-domain will wait
+ * for outstanding GPU rendering before marking the object ready for CPU
+ * access, or a pageflip will wait until the GPU is complete before showing
+ * the frame on the scanout.
+ *
+ * In order to use a fence, the object must track the fence it needs to
+ * serialise with. For example, GEM objects want to track both read and
+ * write access so that we can perform concurrent read operations between
+ * the CPU and GPU engines, as well as waiting for all rendering to
+ * complete, or waiting for the last GPU user of a "fence register". The
+ * object then embeds a #i915_active_request to track the most recent (in
+ * retirement order) request relevant for the desired mode of access.
+ * The #i915_active_request is updated with i915_active_request_set() to
+ * track the most recent fence request, typically this is done as part of
+ * i915_vma_move_to_active().
+ *
+ * When the #i915_active_request completes (is retired), it will
+ * signal its completion to the owner through a callback as well as mark
+ * itself as idle (i915_active_request.request == NULL). The owner
+ * can then perform any action, such as delayed freeing of an active
+ * resource including itself.
+ */
+
+void i915_active_retire_noop(struct i915_active_request *active,
+			     struct i915_request *request);
+
+/**
+ * i915_active_request_init - prepares the activity tracker for use
+ * @active - the active tracker
+ * @rq - initial request to track, can be NULL
+ * @func - a callback when then the tracker is retired (becomes idle),
+ *         can be NULL
+ *
+ * i915_active_request_init() prepares the embedded @active struct for use as
+ * an activity tracker, that is for tracking the last known active request
+ * associated with it. When the last request becomes idle, when it is retired
+ * after completion, the optional callback @func is invoked.
+ */
+static inline void
+i915_active_request_init(struct i915_active_request *active,
+			 struct i915_request *rq,
+			 i915_active_retire_fn retire)
+{
+	RCU_INIT_POINTER(active->request, rq);
+	INIT_LIST_HEAD(&active->link);
+	active->retire = retire ?: i915_active_retire_noop;
+}
+
+#define INIT_ACTIVE_REQUEST(name) i915_active_request_init((name), NULL, NULL)
+
+/**
+ * i915_active_request_set - updates the tracker to watch the current request
+ * @active - the active tracker
+ * @request - the request to watch
+ *
+ * __i915_active_request_set() watches the given @request for completion. Whilst
+ * that @request is busy, the @active reports busy. When that @request is
+ * retired, the @active tracker is updated to report idle.
+ */
+static inline void
+__i915_active_request_set(struct i915_active_request *active,
+			  struct i915_request *request)
+{
+	list_move(&active->link, &request->active_list);
+	rcu_assign_pointer(active->request, request);
+}
+
+int __must_check
+i915_active_request_set(struct i915_active_request *active,
+			struct i915_request *rq);
+
+/**
+ * i915_active_request_set_retire_fn - updates the retirement callback
+ * @active - the active tracker
+ * @fn - the routine called when the request is retired
+ * @mutex - struct_mutex used to guard retirements
+ *
+ * i915_active_request_set_retire_fn() updates the function pointer that
+ * is called when the final request associated with the @active tracker
+ * is retired.
+ */
+static inline void
+i915_active_request_set_retire_fn(struct i915_active_request *active,
+				  i915_active_retire_fn fn,
+				  struct mutex *mutex)
+{
+	lockdep_assert_held(mutex);
+	active->retire = fn ?: i915_active_retire_noop;
+}
+
+static inline struct i915_request *
+__i915_active_request_peek(const struct i915_active_request *active)
+{
+	/*
+	 * Inside the error capture (running with the driver in an unknown
+	 * state), we want to bend the rules slightly (a lot).
+	 *
+	 * Work is in progress to make it safer, in the meantime this keeps
+	 * the known issue from spamming the logs.
+	 */
+	return rcu_dereference_protected(active->request, 1);
+}
+
+/**
+ * i915_active_request_raw - return the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_raw() returns the current request being tracked, or NULL.
+ * It does not obtain a reference on the request for the caller, so the caller
+ * must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_raw(const struct i915_active_request *active,
+			struct mutex *mutex)
+{
+	return rcu_dereference_protected(active->request,
+					 lockdep_is_held(mutex));
+}
+
+/**
+ * i915_active_request_peek - report the active request being monitored
+ * @active - the active tracker
+ *
+ * i915_active_request_peek() returns the current request being tracked if
+ * still active, or NULL. It does not obtain a reference on the request
+ * for the caller, so the caller must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_peek(const struct i915_active_request *active,
+			 struct mutex *mutex)
+{
+	struct i915_request *request;
+
+	request = i915_active_request_raw(active, mutex);
+	if (!request || i915_request_completed(request))
+		return NULL;
+
+	return request;
+}
+
+/**
+ * i915_active_request_get - return a reference to the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_get() returns a reference to the active request, or NULL
+ * if the active tracker is idle. The caller must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_get(const struct i915_active_request *active,
+			struct mutex *mutex)
+{
+	return i915_request_get(i915_active_request_peek(active, mutex));
+}
+
+/**
+ * __i915_active_request_get_rcu - return a reference to the active request
+ * @active - the active tracker
+ *
+ * __i915_active_request_get() returns a reference to the active request,
+ * or NULL if the active tracker is idle. The caller must hold the RCU read
+ * lock, but the returned pointer is safe to use outside of RCU.
+ */
+static inline struct i915_request *
+__i915_active_request_get_rcu(const struct i915_active_request *active)
+{
+	/*
+	 * Performing a lockless retrieval of the active request is super
+	 * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
+	 * slab of request objects will not be freed whilst we hold the
+	 * RCU read lock. It does not guarantee that the request itself
+	 * will not be freed and then *reused*. Viz,
+	 *
+	 * Thread A			Thread B
+	 *
+	 * rq = active.request
+	 *				retire(rq) -> free(rq);
+	 *				(rq is now first on the slab freelist)
+	 *				active.request = NULL
+	 *
+	 *				rq = new submission on a new object
+	 * ref(rq)
+	 *
+	 * To prevent the request from being reused whilst the caller
+	 * uses it, we take a reference like normal. Whilst acquiring
+	 * the reference we check that it is not in a destroyed state
+	 * (refcnt == 0). That prevents the request being reallocated
+	 * whilst the caller holds on to it. To check that the request
+	 * was not reallocated as we acquired the reference we have to
+	 * check that our request remains the active request across
+	 * the lookup, in the same manner as a seqlock. The visibility
+	 * of the pointer versus the reference counting is controlled
+	 * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
+	 *
+	 * In the middle of all that, we inspect whether the request is
+	 * complete. Retiring is lazy so the request may be completed long
+	 * before the active tracker is updated. Querying whether the
+	 * request is complete is far cheaper (as it involves no locked
+	 * instructions setting cachelines to exclusive) than acquiring
+	 * the reference, so we do it first. The RCU read lock ensures the
+	 * pointer dereference is valid, but does not ensure that the
+	 * seqno nor HWS is the right one! However, if the request was
+	 * reallocated, that means the active tracker's request was complete.
+	 * If the new request is also complete, then both are and we can
+	 * just report the active tracker is idle. If the new request is
+	 * incomplete, then we acquire a reference on it and check that
+	 * it remained the active request.
+	 *
+	 * It is then imperative that we do not zero the request on
+	 * reallocation, so that we can chase the dangling pointers!
+	 * See i915_request_alloc().
+	 */
+	do {
+		struct i915_request *request;
+
+		request = rcu_dereference(active->request);
+		if (!request || i915_request_completed(request))
+			return NULL;
+
+		/*
+		 * An especially silly compiler could decide to recompute the
+		 * result of i915_request_completed, more specifically
+		 * re-emit the load for request->fence.seqno. A race would catch
+		 * a later seqno value, which could flip the result from true to
+		 * false. Which means part of the instructions below might not
+		 * be executed, while later on instructions are executed. Due to
+		 * barriers within the refcounting the inconsistency can't reach
+		 * past the call to i915_request_get_rcu, but not executing
+		 * that while still executing i915_request_put() creates
+		 * havoc enough.  Prevent this with a compiler barrier.
+		 */
+		barrier();
+
+		request = i915_request_get_rcu(request);
+
+		/*
+		 * What stops the following rcu_access_pointer() from occurring
+		 * before the above i915_request_get_rcu()? If we were
+		 * to read the value before pausing to get the reference to
+		 * the request, we may not notice a change in the active
+		 * tracker.
+		 *
+		 * The rcu_access_pointer() is a mere compiler barrier, which
+		 * means both the CPU and compiler are free to perform the
+		 * memory read without constraint. The compiler only has to
+		 * ensure that any operations after the rcu_access_pointer()
+		 * occur afterwards in program order. This means the read may
+		 * be performed earlier by an out-of-order CPU, or adventurous
+		 * compiler.
+		 *
+		 * The atomic operation at the heart of
+		 * i915_request_get_rcu(), see dma_fence_get_rcu(), is
+		 * atomic_inc_not_zero() which is only a full memory barrier
+		 * when successful. That is, if i915_request_get_rcu()
+		 * returns the request (and so with the reference counted
+		 * incremented) then the following read for rcu_access_pointer()
+		 * must occur after the atomic operation and so confirm
+		 * that this request is the one currently being tracked.
+		 *
+		 * The corresponding write barrier is part of
+		 * rcu_assign_pointer().
+		 */
+		if (!request || request == rcu_access_pointer(active->request))
+			return rcu_pointer_handoff(request);
+
+		i915_request_put(request);
+	} while (1);
+}
+
+/**
+ * i915_active_request_get_unlocked - return a reference to the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_get_unlocked() returns a reference to the active request,
+ * or NULL if the active tracker is idle. The reference is obtained under RCU,
+ * so no locking is required by the caller.
+ *
+ * The reference should be freed with i915_request_put().
+ */
+static inline struct i915_request *
+i915_active_request_get_unlocked(const struct i915_active_request *active)
+{
+	struct i915_request *request;
+
+	rcu_read_lock();
+	request = __i915_active_request_get_rcu(active);
+	rcu_read_unlock();
+
+	return request;
+}
+
+/**
+ * i915_active_request_isset - report whether the active tracker is assigned
+ * @active - the active tracker
+ *
+ * i915_active_request_isset() returns true if the active tracker is currently
+ * assigned to a request. Due to the lazy retiring, that request may be idle
+ * and this may report stale information.
+ */
+static inline bool
+i915_active_request_isset(const struct i915_active_request *active)
+{
+	return rcu_access_pointer(active->request);
+}
+
+/**
+ * i915_active_request_retire - waits until the request is retired
+ * @active - the active request on which to wait
+ *
+ * i915_active_request_retire() waits until the request is completed,
+ * and then ensures that at least the retirement handler for this
+ * @active tracker is called before returning. If the @active
+ * tracker is idle, the function returns immediately.
+ */
+static inline int __must_check
+i915_active_request_retire(struct i915_active_request *active,
+			   struct mutex *mutex)
+{
+	struct i915_request *request;
+	long ret;
+
+	request = i915_active_request_raw(active, mutex);
+	if (!request)
+		return 0;
+
+	ret = i915_request_wait(request,
+				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
+		return ret;
+
+	list_del_init(&active->link);
+	RCU_INIT_POINTER(active->request, NULL);
+
+	active->retire(active, request);
+
+	return 0;
+}
 
 /*
  * GPU activity tracking
@@ -47,6 +394,8 @@ int i915_active_wait(struct i915_active *ref);
 
 int i915_request_await_active(struct i915_request *rq,
 			      struct i915_active *ref);
+int i915_request_await_active_request(struct i915_request *rq,
+				      struct i915_active_request *active);
 
 bool i915_active_acquire(struct i915_active *ref);
 
diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h
index 411e502ed8dd..b679253b53a5 100644
--- a/drivers/gpu/drm/i915/i915_active_types.h
+++ b/drivers/gpu/drm/i915/i915_active_types.h
@@ -8,16 +8,26 @@
 #define _I915_ACTIVE_TYPES_H_
 
 #include <linux/rbtree.h>
-
-#include "i915_request.h"
+#include <linux/rcupdate.h>
 
 struct drm_i915_private;
+struct i915_active_request;
+struct i915_request;
+
+typedef void (*i915_active_retire_fn)(struct i915_active_request *,
+				      struct i915_request *);
+
+struct i915_active_request {
+	struct i915_request __rcu *request;
+	struct list_head link;
+	i915_active_retire_fn retire;
+};
 
 struct i915_active {
 	struct drm_i915_private *i915;
 
 	struct rb_root tree;
-	struct i915_gem_active last;
+	struct i915_active_request last;
 	unsigned int count;
 
 	void (*retire)(struct i915_active *ref);
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index bf3073e63af8..c48733a15e63 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -206,7 +206,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		if (vma->fence)
 			seq_printf(m, " , fence: %d%s",
 				   vma->fence->id,
-				   i915_gem_active_isset(&vma->last_fence) ? "*" : "");
+				   i915_active_request_isset(&vma->last_fence) ? "*" : "");
 		seq_puts(m, ")");
 	}
 	if (obj->stolen)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d92e7ab0005e..52b5a24be42b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3018,7 +3018,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 
 	GEM_BUG_ON(i915->gt.active_requests);
 	for_each_engine(engine, i915, id) {
-		GEM_BUG_ON(__i915_gem_active_peek(&engine->timeline.last_request));
+		GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
 		GEM_BUG_ON(engine->last_retired_context !=
 			   to_intel_context(i915->kernel_context, engine));
 	}
@@ -3264,7 +3264,7 @@ wait_for_timelines(struct drm_i915_private *i915,
 	list_for_each_entry(tl, &gt->active_list, link) {
 		struct i915_request *rq;
 
-		rq = i915_gem_active_get_unlocked(&tl->last_request);
+		rq = i915_active_request_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
@@ -4165,7 +4165,8 @@ out:
 }
 
 static void
-frontbuffer_retire(struct i915_gem_active *active, struct i915_request *request)
+frontbuffer_retire(struct i915_active_request *active,
+		   struct i915_request *request)
 {
 	struct drm_i915_gem_object *obj =
 		container_of(active, typeof(*obj), frontbuffer_write);
@@ -4192,7 +4193,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	obj->resv = &obj->__builtin_resv;
 
 	obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
-	init_request_active(&obj->frontbuffer_write, frontbuffer_retire);
+	i915_active_request_init(&obj->frontbuffer_write,
+				 NULL, frontbuffer_retire);
 
 	obj->mm.madv = I915_MADV_WILLNEED;
 	INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 93ab287f44b6..280813a4bf82 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -322,7 +322,7 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
-static void intel_context_retire(struct i915_gem_active *active,
+static void intel_context_retire(struct i915_active_request *active,
 				 struct i915_request *rq)
 {
 	struct intel_context *ce =
@@ -344,7 +344,8 @@ intel_context_init(struct intel_context *ce,
 	/* Use the whole device by default */
 	ce->sseu = intel_device_default_sseu(ctx->i915);
 
-	init_request_active(&ce->active_tracker, intel_context_retire);
+	i915_active_request_init(&ce->active_tracker,
+				 NULL, intel_context_retire);
 }
 
 static struct i915_gem_context *
@@ -668,8 +669,8 @@ last_request_on_engine(struct i915_timeline *timeline,
 
 	GEM_BUG_ON(timeline == &engine->timeline);
 
-	rq = i915_gem_active_raw(&timeline->last_request,
-				 &engine->i915->drm.struct_mutex);
+	rq = i915_active_request_raw(&timeline->last_request,
+				     &engine->i915->drm.struct_mutex);
 	if (rq && rq->engine == engine) {
 		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
 			  timeline->name, engine->name,
@@ -1015,8 +1016,8 @@ gen8_modify_rpcs_gpu(struct intel_context *ce,
 	}
 
 	/* Queue this switch after all other activity by this context. */
-	prev = i915_gem_active_raw(&ce->ring->timeline->last_request,
-				   &i915->drm.struct_mutex);
+	prev = i915_active_request_raw(&ce->ring->timeline->last_request,
+				       &i915->drm.struct_mutex);
 	if (prev && !i915_request_completed(prev)) {
 		ret = i915_request_await_dma_fence(rq, &prev->fence);
 		if (ret < 0)
@@ -1039,9 +1040,9 @@ gen8_modify_rpcs_gpu(struct intel_context *ce,
 	 * But we only need to take one pin on the account of it. Or in other
 	 * words transfer the pinned ce object to tracked active request.
 	 */
-	if (!i915_gem_active_isset(&ce->active_tracker))
+	if (!i915_active_request_isset(&ce->active_tracker))
 		__intel_context_pin(ce);
-	i915_gem_active_set(&ce->active_tracker, rq);
+	__i915_active_request_set(&ce->active_tracker, rq);
 
 out_add:
 	i915_request_add(rq);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 92ad5272e57f..ca150a764c24 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -187,7 +187,7 @@ struct i915_gem_context {
 		 * active_tracker: Active tracker for the external rq activity
 		 * on this intel_context object.
 		 */
-		struct i915_gem_active active_tracker;
+		struct i915_active_request active_tracker;
 
 		const struct intel_context_ops *ops;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index 46e259661294..e037e94792f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -223,7 +223,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 			 i915_gem_object_get_tiling(vma->obj)))
 			return -EINVAL;
 
-		ret = i915_gem_active_retire(&vma->last_fence,
+		ret = i915_active_request_retire(&vma->last_fence,
 					     &vma->obj->base.dev->struct_mutex);
 		if (ret)
 			return ret;
@@ -232,7 +232,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 	if (fence->vma) {
 		struct i915_vma *old = fence->vma;
 
-		ret = i915_gem_active_retire(&old->last_fence,
+		ret = i915_active_request_retire(&old->last_fence,
 					     &old->obj->base.dev->struct_mutex);
 		if (ret)
 			return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e625659c03a2..d646d37eec2f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1918,7 +1918,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 		return ERR_PTR(-ENOMEM);
 
 	i915_active_init(i915, &vma->active, NULL);
-	init_request_active(&vma->last_fence, NULL);
+	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	vma->vm = &ggtt->vm;
 	vma->ops = &pd_vma_ops;
diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/i915_gem_object.h
index 73fec917d097..fab040331cdb 100644
--- a/drivers/gpu/drm/i915/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/i915_gem_object.h
@@ -175,7 +175,7 @@ struct drm_i915_gem_object {
 
 	atomic_t frontbuffer_bits;
 	unsigned int frontbuffer_ggtt_origin; /* write once */
-	struct i915_gem_active frontbuffer_write;
+	struct i915_active_request frontbuffer_write;
 
 	/** Current tiling stride for the object, if it's tiled. */
 	unsigned int tiling_and_stride;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6e2e5ed2bd0a..9a65341fec09 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1062,23 +1062,23 @@ i915_error_object_create(struct drm_i915_private *i915,
 }
 
 /* The error capture is special as tries to run underneath the normal
- * locking rules - so we use the raw version of the i915_gem_active lookup.
+ * locking rules - so we use the raw version of the i915_active_request lookup.
  */
 static inline u32
-__active_get_seqno(struct i915_gem_active *active)
+__active_get_seqno(struct i915_active_request *active)
 {
 	struct i915_request *request;
 
-	request = __i915_gem_active_peek(active);
+	request = __i915_active_request_peek(active);
 	return request ? request->global_seqno : 0;
 }
 
 static inline int
-__active_get_engine_id(struct i915_gem_active *active)
+__active_get_engine_id(struct i915_active_request *active)
 {
 	struct i915_request *request;
 
-	request = __i915_gem_active_peek(active);
+	request = __i915_active_request_peek(active);
 	return request ? request->engine->id : -1;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 6512630b59b8..c2a5c48c7541 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -29,6 +29,7 @@
 #include <linux/sched/signal.h>
 
 #include "i915_drv.h"
+#include "i915_active.h"
 #include "i915_reset.h"
 
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
@@ -125,12 +126,6 @@ static void unreserve_gt(struct drm_i915_private *i915)
 		i915_gem_park(i915);
 }
 
-void i915_gem_retire_noop(struct i915_gem_active *active,
-			  struct i915_request *request)
-{
-	/* Space left intentionally blank */
-}
-
 static void advance_ring(struct i915_request *request)
 {
 	struct intel_ring *ring = request->ring;
@@ -244,7 +239,7 @@ static void __retire_engine_upto(struct intel_engine_cs *engine,
 
 static void i915_request_retire(struct i915_request *request)
 {
-	struct i915_gem_active *active, *next;
+	struct i915_active_request *active, *next;
 
 	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
 		  request->engine->name,
@@ -278,10 +273,10 @@ static void i915_request_retire(struct i915_request *request)
 		 * we may spend an inordinate amount of time simply handling
 		 * the retirement of requests and processing their callbacks.
 		 * Of which, this loop itself is particularly hot due to the
-		 * cache misses when jumping around the list of i915_gem_active.
-		 * So we try to keep this loop as streamlined as possible and
-		 * also prefetch the next i915_gem_active to try and hide
-		 * the likely cache miss.
+		 * cache misses when jumping around the list of
+		 * i915_active_request.  So we try to keep this loop as
+		 * streamlined as possible and also prefetch the next
+		 * i915_active_request to try and hide the likely cache miss.
 		 */
 		prefetchw(next);
 
@@ -526,17 +521,9 @@ out:
 	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
 }
 
-static int add_barrier(struct i915_request *rq, struct i915_gem_active *active)
-{
-	struct i915_request *barrier =
-		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
-
-	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
-}
-
 static int add_timeline_barrier(struct i915_request *rq)
 {
-	return add_barrier(rq, &rq->timeline->barrier);
+	return i915_request_await_active_request(rq, &rq->timeline->barrier);
 }
 
 /**
@@ -595,7 +582,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * We use RCU to look up requests in flight. The lookups may
 	 * race with the request being allocated from the slab freelist.
 	 * That is the request we are writing to here, may be in the process
-	 * of being read by __i915_gem_active_get_rcu(). As such,
+	 * of being read by __i915_active_request_get_rcu(). As such,
 	 * we have to be very careful when overwriting the contents. During
 	 * the RCU lookup, we change chase the request->engine pointer,
 	 * read the request->global_seqno and increment the reference count.
@@ -937,8 +924,8 @@ void i915_request_add(struct i915_request *request)
 	 * see a more recent value in the hws than we are tracking.
 	 */
 
-	prev = i915_gem_active_raw(&timeline->last_request,
-				   &request->i915->drm.struct_mutex);
+	prev = i915_active_request_raw(&timeline->last_request,
+				       &request->i915->drm.struct_mutex);
 	if (prev && !i915_request_completed(prev)) {
 		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
 					     &request->submitq);
@@ -954,7 +941,7 @@ void i915_request_add(struct i915_request *request)
 	spin_unlock_irq(&timeline->lock);
 
 	GEM_BUG_ON(timeline->seqno != request->fence.seqno);
-	i915_gem_active_set(&timeline->last_request, request);
+	__i915_active_request_set(&timeline->last_request, request);
 
 	list_add_tail(&request->ring_link, &ring->request_list);
 	if (list_is_first(&request->ring_link, &ring->request_list)) {
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 3cffb96203b9..40f3e8dcbdd5 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -403,387 +403,4 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
 
 void i915_retire_requests(struct drm_i915_private *i915);
 
-/*
- * We treat requests as fences. This is not be to confused with our
- * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
- * We use the fences to synchronize access from the CPU with activity on the
- * GPU, for example, we should not rewrite an object's PTE whilst the GPU
- * is reading them. We also track fences at a higher level to provide
- * implicit synchronisation around GEM objects, e.g. set-domain will wait
- * for outstanding GPU rendering before marking the object ready for CPU
- * access, or a pageflip will wait until the GPU is complete before showing
- * the frame on the scanout.
- *
- * In order to use a fence, the object must track the fence it needs to
- * serialise with. For example, GEM objects want to track both read and
- * write access so that we can perform concurrent read operations between
- * the CPU and GPU engines, as well as waiting for all rendering to
- * complete, or waiting for the last GPU user of a "fence register". The
- * object then embeds a #i915_gem_active to track the most recent (in
- * retirement order) request relevant for the desired mode of access.
- * The #i915_gem_active is updated with i915_gem_active_set() to track the
- * most recent fence request, typically this is done as part of
- * i915_vma_move_to_active().
- *
- * When the #i915_gem_active completes (is retired), it will
- * signal its completion to the owner through a callback as well as mark
- * itself as idle (i915_gem_active.request == NULL). The owner
- * can then perform any action, such as delayed freeing of an active
- * resource including itself.
- */
-struct i915_gem_active;
-
-typedef void (*i915_gem_retire_fn)(struct i915_gem_active *,
-				   struct i915_request *);
-
-struct i915_gem_active {
-	struct i915_request __rcu *request;
-	struct list_head link;
-	i915_gem_retire_fn retire;
-};
-
-void i915_gem_retire_noop(struct i915_gem_active *,
-			  struct i915_request *request);
-
-/**
- * init_request_active - prepares the activity tracker for use
- * @active - the active tracker
- * @func - a callback when then the tracker is retired (becomes idle),
- *         can be NULL
- *
- * init_request_active() prepares the embedded @active struct for use as
- * an activity tracker, that is for tracking the last known active request
- * associated with it. When the last request becomes idle, when it is retired
- * after completion, the optional callback @func is invoked.
- */
-static inline void
-init_request_active(struct i915_gem_active *active,
-		    i915_gem_retire_fn retire)
-{
-	RCU_INIT_POINTER(active->request, NULL);
-	INIT_LIST_HEAD(&active->link);
-	active->retire = retire ?: i915_gem_retire_noop;
-}
-
-/**
- * i915_gem_active_set - updates the tracker to watch the current request
- * @active - the active tracker
- * @request - the request to watch
- *
- * i915_gem_active_set() watches the given @request for completion. Whilst
- * that @request is busy, the @active reports busy. When that @request is
- * retired, the @active tracker is updated to report idle.
- */
-static inline void
-i915_gem_active_set(struct i915_gem_active *active,
-		    struct i915_request *request)
-{
-	list_move(&active->link, &request->active_list);
-	rcu_assign_pointer(active->request, request);
-}
-
-/**
- * i915_gem_active_set_retire_fn - updates the retirement callback
- * @active - the active tracker
- * @fn - the routine called when the request is retired
- * @mutex - struct_mutex used to guard retirements
- *
- * i915_gem_active_set_retire_fn() updates the function pointer that
- * is called when the final request associated with the @active tracker
- * is retired.
- */
-static inline void
-i915_gem_active_set_retire_fn(struct i915_gem_active *active,
-			      i915_gem_retire_fn fn,
-			      struct mutex *mutex)
-{
-	lockdep_assert_held(mutex);
-	active->retire = fn ?: i915_gem_retire_noop;
-}
-
-static inline struct i915_request *
-__i915_gem_active_peek(const struct i915_gem_active *active)
-{
-	/*
-	 * Inside the error capture (running with the driver in an unknown
-	 * state), we want to bend the rules slightly (a lot).
-	 *
-	 * Work is in progress to make it safer, in the meantime this keeps
-	 * the known issue from spamming the logs.
-	 */
-	return rcu_dereference_protected(active->request, 1);
-}
-
-/**
- * i915_gem_active_raw - return the active request
- * @active - the active tracker
- *
- * i915_gem_active_raw() returns the current request being tracked, or NULL.
- * It does not obtain a reference on the request for the caller, so the caller
- * must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_raw(const struct i915_gem_active *active, struct mutex *mutex)
-{
-	return rcu_dereference_protected(active->request,
-					 lockdep_is_held(mutex));
-}
-
-/**
- * i915_gem_active_peek - report the active request being monitored
- * @active - the active tracker
- *
- * i915_gem_active_peek() returns the current request being tracked if
- * still active, or NULL. It does not obtain a reference on the request
- * for the caller, so the caller must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_peek(const struct i915_gem_active *active, struct mutex *mutex)
-{
-	struct i915_request *request;
-
-	request = i915_gem_active_raw(active, mutex);
-	if (!request || i915_request_completed(request))
-		return NULL;
-
-	return request;
-}
-
-/**
- * i915_gem_active_get - return a reference to the active request
- * @active - the active tracker
- *
- * i915_gem_active_get() returns a reference to the active request, or NULL
- * if the active tracker is idle. The caller must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_get(const struct i915_gem_active *active, struct mutex *mutex)
-{
-	return i915_request_get(i915_gem_active_peek(active, mutex));
-}
-
-/**
- * __i915_gem_active_get_rcu - return a reference to the active request
- * @active - the active tracker
- *
- * __i915_gem_active_get() returns a reference to the active request, or NULL
- * if the active tracker is idle. The caller must hold the RCU read lock, but
- * the returned pointer is safe to use outside of RCU.
- */
-static inline struct i915_request *
-__i915_gem_active_get_rcu(const struct i915_gem_active *active)
-{
-	/*
-	 * Performing a lockless retrieval of the active request is super
-	 * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
-	 * slab of request objects will not be freed whilst we hold the
-	 * RCU read lock. It does not guarantee that the request itself
-	 * will not be freed and then *reused*. Viz,
-	 *
-	 * Thread A			Thread B
-	 *
-	 * rq = active.request
-	 *				retire(rq) -> free(rq);
-	 *				(rq is now first on the slab freelist)
-	 *				active.request = NULL
-	 *
-	 *				rq = new submission on a new object
-	 * ref(rq)
-	 *
-	 * To prevent the request from being reused whilst the caller
-	 * uses it, we take a reference like normal. Whilst acquiring
-	 * the reference we check that it is not in a destroyed state
-	 * (refcnt == 0). That prevents the request being reallocated
-	 * whilst the caller holds on to it. To check that the request
-	 * was not reallocated as we acquired the reference we have to
-	 * check that our request remains the active request across
-	 * the lookup, in the same manner as a seqlock. The visibility
-	 * of the pointer versus the reference counting is controlled
-	 * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
-	 *
-	 * In the middle of all that, we inspect whether the request is
-	 * complete. Retiring is lazy so the request may be completed long
-	 * before the active tracker is updated. Querying whether the
-	 * request is complete is far cheaper (as it involves no locked
-	 * instructions setting cachelines to exclusive) than acquiring
-	 * the reference, so we do it first. The RCU read lock ensures the
-	 * pointer dereference is valid, but does not ensure that the
-	 * seqno nor HWS is the right one! However, if the request was
-	 * reallocated, that means the active tracker's request was complete.
-	 * If the new request is also complete, then both are and we can
-	 * just report the active tracker is idle. If the new request is
-	 * incomplete, then we acquire a reference on it and check that
-	 * it remained the active request.
-	 *
-	 * It is then imperative that we do not zero the request on
-	 * reallocation, so that we can chase the dangling pointers!
-	 * See i915_request_alloc().
-	 */
-	do {
-		struct i915_request *request;
-
-		request = rcu_dereference(active->request);
-		if (!request || i915_request_completed(request))
-			return NULL;
-
-		/*
-		 * An especially silly compiler could decide to recompute the
-		 * result of i915_request_completed, more specifically
-		 * re-emit the load for request->fence.seqno. A race would catch
-		 * a later seqno value, which could flip the result from true to
-		 * false. Which means part of the instructions below might not
-		 * be executed, while later on instructions are executed. Due to
-		 * barriers within the refcounting the inconsistency can't reach
-		 * past the call to i915_request_get_rcu, but not executing
-		 * that while still executing i915_request_put() creates
-		 * havoc enough.  Prevent this with a compiler barrier.
-		 */
-		barrier();
-
-		request = i915_request_get_rcu(request);
-
-		/*
-		 * What stops the following rcu_access_pointer() from occurring
-		 * before the above i915_request_get_rcu()? If we were
-		 * to read the value before pausing to get the reference to
-		 * the request, we may not notice a change in the active
-		 * tracker.
-		 *
-		 * The rcu_access_pointer() is a mere compiler barrier, which
-		 * means both the CPU and compiler are free to perform the
-		 * memory read without constraint. The compiler only has to
-		 * ensure that any operations after the rcu_access_pointer()
-		 * occur afterwards in program order. This means the read may
-		 * be performed earlier by an out-of-order CPU, or adventurous
-		 * compiler.
-		 *
-		 * The atomic operation at the heart of
-		 * i915_request_get_rcu(), see dma_fence_get_rcu(), is
-		 * atomic_inc_not_zero() which is only a full memory barrier
-		 * when successful. That is, if i915_request_get_rcu()
-		 * returns the request (and so with the reference counted
-		 * incremented) then the following read for rcu_access_pointer()
-		 * must occur after the atomic operation and so confirm
-		 * that this request is the one currently being tracked.
-		 *
-		 * The corresponding write barrier is part of
-		 * rcu_assign_pointer().
-		 */
-		if (!request || request == rcu_access_pointer(active->request))
-			return rcu_pointer_handoff(request);
-
-		i915_request_put(request);
-	} while (1);
-}
-
-/**
- * i915_gem_active_get_unlocked - return a reference to the active request
- * @active - the active tracker
- *
- * i915_gem_active_get_unlocked() returns a reference to the active request,
- * or NULL if the active tracker is idle. The reference is obtained under RCU,
- * so no locking is required by the caller.
- *
- * The reference should be freed with i915_request_put().
- */
-static inline struct i915_request *
-i915_gem_active_get_unlocked(const struct i915_gem_active *active)
-{
-	struct i915_request *request;
-
-	rcu_read_lock();
-	request = __i915_gem_active_get_rcu(active);
-	rcu_read_unlock();
-
-	return request;
-}
-
-/**
- * i915_gem_active_isset - report whether the active tracker is assigned
- * @active - the active tracker
- *
- * i915_gem_active_isset() returns true if the active tracker is currently
- * assigned to a request. Due to the lazy retiring, that request may be idle
- * and this may report stale information.
- */
-static inline bool
-i915_gem_active_isset(const struct i915_gem_active *active)
-{
-	return rcu_access_pointer(active->request);
-}
-
-/**
- * i915_gem_active_wait - waits until the request is completed
- * @active - the active request on which to wait
- * @flags - how to wait
- * @timeout - how long to wait at most
- * @rps - userspace client to charge for a waitboost
- *
- * i915_gem_active_wait() waits until the request is completed before
- * returning, without requiring any locks to be held. Note that it does not
- * retire any requests before returning.
- *
- * This function relies on RCU in order to acquire the reference to the active
- * request without holding any locks. See __i915_gem_active_get_rcu() for the
- * glory details on how that is managed. Once the reference is acquired, we
- * can then wait upon the request, and afterwards release our reference,
- * free of any locking.
- *
- * This function wraps i915_request_wait(), see it for the full details on
- * the arguments.
- *
- * Returns 0 if successful, or a negative error code.
- */
-static inline int
-i915_gem_active_wait(const struct i915_gem_active *active, unsigned int flags)
-{
-	struct i915_request *request;
-	long ret = 0;
-
-	request = i915_gem_active_get_unlocked(active);
-	if (request) {
-		ret = i915_request_wait(request, flags, MAX_SCHEDULE_TIMEOUT);
-		i915_request_put(request);
-	}
-
-	return ret < 0 ? ret : 0;
-}
-
-/**
- * i915_gem_active_retire - waits until the request is retired
- * @active - the active request on which to wait
- *
- * i915_gem_active_retire() waits until the request is completed,
- * and then ensures that at least the retirement handler for this
- * @active tracker is called before returning. If the @active
- * tracker is idle, the function returns immediately.
- */
-static inline int __must_check
-i915_gem_active_retire(struct i915_gem_active *active,
-		       struct mutex *mutex)
-{
-	struct i915_request *request;
-	long ret;
-
-	request = i915_gem_active_raw(active, mutex);
-	if (!request)
-		return 0;
-
-	ret = i915_request_wait(request,
-				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				MAX_SCHEDULE_TIMEOUT);
-	if (ret < 0)
-		return ret;
-
-	list_del_init(&active->link);
-	RCU_INIT_POINTER(active->request, NULL);
-
-	active->retire(active, request);
-
-	return 0;
-}
-
-#define for_each_active(mask, idx) \
-	for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
-
 #endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4462007a681c..0e0ddf2e6815 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -862,7 +862,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 		struct i915_request *rq;
 		long timeout;
 
-		rq = i915_gem_active_get_unlocked(&tl->last_request);
+		rq = i915_active_request_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index dcff3ae96683..b2202d2e58a2 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -163,8 +163,8 @@ int i915_timeline_init(struct drm_i915_private *i915,
 
 	spin_lock_init(&timeline->lock);
 
-	init_request_active(&timeline->barrier, NULL);
-	init_request_active(&timeline->last_request, NULL);
+	INIT_ACTIVE_REQUEST(&timeline->barrier);
+	INIT_ACTIVE_REQUEST(&timeline->last_request);
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
@@ -236,7 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 {
 	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
-	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
+	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
 	hwsp_free(timeline);
@@ -311,25 +311,6 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 	__i915_vma_unpin(tl->hwsp_ggtt);
 }
 
-int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
-{
-	struct i915_request *old;
-	int err;
-
-	lockdep_assert_held(&rq->i915->drm.struct_mutex);
-
-	/* Must maintain ordering wrt existing barriers */
-	old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex);
-	if (old) {
-		err = i915_request_await_dma_fence(rq, &old->fence);
-		if (err)
-			return err;
-	}
-
-	i915_gem_active_set(&tl->barrier, rq);
-	return 0;
-}
-
 void __i915_timeline_free(struct kref *kref)
 {
 	struct i915_timeline *timeline =
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index d167e04073c5..7bec7d2e45bf 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -28,6 +28,7 @@
 #include <linux/list.h>
 #include <linux/kref.h>
 
+#include "i915_active.h"
 #include "i915_request.h"
 #include "i915_syncmap.h"
 #include "i915_utils.h"
@@ -58,10 +59,10 @@ struct i915_timeline {
 
 	/* Contains an RCU guarded pointer to the last request. No reference is
 	 * held to the request, users must carefully acquire a reference to
-	 * the request using i915_gem_active_get_request_rcu(), or hold the
+	 * the request using i915_active_request_get_request_rcu(), or hold the
 	 * struct_mutex.
 	 */
-	struct i915_gem_active last_request;
+	struct i915_active_request last_request;
 
 	/**
 	 * We track the most recent seqno that we wait on in every context so
@@ -82,7 +83,7 @@ struct i915_timeline {
 	 * subsequent submissions to this timeline be executed only after the
 	 * barrier has been completed.
 	 */
-	struct i915_gem_active barrier;
+	struct i915_active_request barrier;
 
 	struct list_head link;
 	const char *name;
@@ -174,7 +175,10 @@ void i915_timelines_fini(struct drm_i915_private *i915);
  * submissions on @timeline. Subsequent requests will not be submitted to GPU
  * until the barrier has been completed.
  */
-int i915_timeline_set_barrier(struct i915_timeline *timeline,
-			      struct i915_request *rq);
+static inline int
+i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
+{
+	return i915_active_request_set(&tl->barrier, rq);
+}
 
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d4772061e642..b713bed20c38 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -120,7 +120,7 @@ vma_create(struct drm_i915_gem_object *obj,
 		return ERR_PTR(-ENOMEM);
 
 	i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
-	init_request_active(&vma->last_fence, NULL);
+	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	vma->vm = vm;
 	vma->ops = &vm->vma_ops;
@@ -808,7 +808,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->fence);
 
-	GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
+	GEM_BUG_ON(i915_active_request_isset(&vma->last_fence));
 
 	mutex_lock(&vma->vm->mutex);
 	list_del(&vma->vm_link);
@@ -942,14 +942,14 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 		obj->write_domain = I915_GEM_DOMAIN_RENDER;
 
 		if (intel_fb_obj_invalidate(obj, ORIGIN_CS))
-			i915_gem_active_set(&obj->frontbuffer_write, rq);
+			__i915_active_request_set(&obj->frontbuffer_write, rq);
 
 		obj->read_domains = 0;
 	}
 	obj->read_domains |= I915_GEM_GPU_DOMAINS;
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
-		i915_gem_active_set(&vma->last_fence, rq);
+		__i915_active_request_set(&vma->last_fence, rq);
 
 	export_fence(vma, rq, flags);
 	return 0;
@@ -986,8 +986,8 @@ int i915_vma_unbind(struct i915_vma *vma)
 		if (ret)
 			goto unpin;
 
-		ret = i915_gem_active_retire(&vma->last_fence,
-					     &vma->vm->i915->drm.struct_mutex);
+		ret = i915_active_request_retire(&vma->last_fence,
+					      &vma->vm->i915->drm.struct_mutex);
 unpin:
 		__i915_vma_unpin(vma);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 3c03d4569481..7c742027f866 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -110,7 +110,7 @@ struct i915_vma {
 #define I915_VMA_GGTT_WRITE	BIT(15)
 
 	struct i915_active active;
-	struct i915_gem_active last_fence;
+	struct i915_active_request last_fence;
 
 	/**
 	 * Support different GGTT views into the same object.
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 71c01eb13af1..49fa43ff02ba 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1086,7 +1086,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 	 * the last request that remains in the timeline. When idle, it is
 	 * the last executed context as tracked by retirement.
 	 */
-	rq = __i915_gem_active_peek(&engine->timeline.last_request);
+	rq = __i915_active_request_peek(&engine->timeline.last_request);
 	if (rq)
 		return rq->hw_context == kernel_context;
 	else
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index f68c7975006c..fc2e283d326b 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -184,7 +184,7 @@ struct intel_overlay {
 	struct overlay_registers __iomem *regs;
 	u32 flip_addr;
 	/* flip handling */
-	struct i915_gem_active last_flip;
+	struct i915_active_request last_flip;
 };
 
 static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
@@ -212,23 +212,23 @@ static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
 
 static void intel_overlay_submit_request(struct intel_overlay *overlay,
 					 struct i915_request *rq,
-					 i915_gem_retire_fn retire)
+					 i915_active_retire_fn retire)
 {
-	GEM_BUG_ON(i915_gem_active_peek(&overlay->last_flip,
-					&overlay->i915->drm.struct_mutex));
-	i915_gem_active_set_retire_fn(&overlay->last_flip, retire,
-				      &overlay->i915->drm.struct_mutex);
-	i915_gem_active_set(&overlay->last_flip, rq);
+	GEM_BUG_ON(i915_active_request_peek(&overlay->last_flip,
+					    &overlay->i915->drm.struct_mutex));
+	i915_active_request_set_retire_fn(&overlay->last_flip, retire,
+					  &overlay->i915->drm.struct_mutex);
+	__i915_active_request_set(&overlay->last_flip, rq);
 	i915_request_add(rq);
 }
 
 static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
 					 struct i915_request *rq,
-					 i915_gem_retire_fn retire)
+					 i915_active_retire_fn retire)
 {
 	intel_overlay_submit_request(overlay, rq, retire);
-	return i915_gem_active_retire(&overlay->last_flip,
-				      &overlay->i915->drm.struct_mutex);
+	return i915_active_request_retire(&overlay->last_flip,
+					  &overlay->i915->drm.struct_mutex);
 }
 
 static struct i915_request *alloc_request(struct intel_overlay *overlay)
@@ -349,8 +349,9 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay)
 	i915_vma_put(vma);
 }
 
-static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
-					       struct i915_request *rq)
+static void
+intel_overlay_release_old_vid_tail(struct i915_active_request *active,
+				   struct i915_request *rq)
 {
 	struct intel_overlay *overlay =
 		container_of(active, typeof(*overlay), last_flip);
@@ -358,7 +359,7 @@ static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
 	intel_overlay_release_old_vma(overlay);
 }
 
-static void intel_overlay_off_tail(struct i915_gem_active *active,
+static void intel_overlay_off_tail(struct i915_active_request *active,
 				   struct i915_request *rq)
 {
 	struct intel_overlay *overlay =
@@ -421,8 +422,8 @@ static int intel_overlay_off(struct intel_overlay *overlay)
  * We have to be careful not to repeat work forever an make forward progess. */
 static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
 {
-	return i915_gem_active_retire(&overlay->last_flip,
-				      &overlay->i915->drm.struct_mutex);
+	return i915_active_request_retire(&overlay->last_flip,
+					  &overlay->i915->drm.struct_mutex);
 }
 
 /* Wait for pending overlay flip and release old frame.
@@ -1355,7 +1356,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	overlay->contrast = 75;
 	overlay->saturation = 146;
 
-	init_request_active(&overlay->last_flip, NULL);
+	INIT_ACTIVE_REQUEST(&overlay->last_flip);
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index e5659aaa856d..d2de9ece2118 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -15,8 +15,8 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 	spin_lock_init(&timeline->lock);
 
-	init_request_active(&timeline->barrier, NULL);
-	init_request_active(&timeline->last_request, NULL);
+	INIT_ACTIVE_REQUEST(&timeline->barrier);
+	INIT_ACTIVE_REQUEST(&timeline->last_request);
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
-- 
cgit v1.2.3


From b2ae318acdcaf1c9562329c9cc1bd2f479157ad2 Mon Sep 17 00:00:00 2001
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Mon, 4 Feb 2019 14:25:38 -0800
Subject: drm/i915: Rename HAS_GMCH
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First of all GMCH can be considered a feature by itself
since it is a chip present in some platforms that connects
the IA processor to memory and other components in PC.

Also with the introduction of display block at device info,
we got a redundant definition:

.display.has_gmch_display = 1,

So, let's clean up things a bit and use the standardized
way of has_feature on displays side.

No functional change and no manual interaction to generate
this patch.

It is only:

sed -si -e 's/has_gmch_display/has_gmch/g' \
    	-e 's/HAS_GMCH_DISPLAY/HAS_GMCH/g' drivers/gpu/drm/i915/*{c,h}

Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204222538.15842-1-rodrigo.vivi@intel.com
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h            |  4 ++--
 drivers/gpu/drm/i915/i915_pci.c            | 10 +++++-----
 drivers/gpu/drm/i915/i915_suspend.c        |  4 ++--
 drivers/gpu/drm/i915/intel_color.c         |  6 +++---
 drivers/gpu/drm/i915/intel_device_info.h   |  2 +-
 drivers/gpu/drm/i915/intel_display.c       | 28 ++++++++++++++--------------
 drivers/gpu/drm/i915/intel_dp.c            | 12 ++++++------
 drivers/gpu/drm/i915/intel_fifo_underrun.c |  6 +++---
 drivers/gpu/drm/i915/intel_hdmi.c          |  6 +++---
 drivers/gpu/drm/i915/intel_hotplug.c       |  2 +-
 drivers/gpu/drm/i915/intel_i2c.c           |  2 +-
 drivers/gpu/drm/i915/vlv_dsi.c             |  4 ++--
 13 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c48733a15e63..e6b5bd845245 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3727,7 +3727,7 @@ static int spr_wm_latency_open(struct inode *inode, struct file *file)
 {
 	struct drm_i915_private *dev_priv = inode->i_private;
 
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		return -ENODEV;
 
 	return single_open(file, spr_wm_latency_show, dev_priv);
@@ -3737,7 +3737,7 @@ static int cur_wm_latency_open(struct inode *inode, struct file *file)
 {
 	struct drm_i915_private *dev_priv = inode->i_private;
 
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		return -ENODEV;
 
 	return single_open(file, cur_wm_latency_show, dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 45b837bc8f9e..a2293152cb6a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2490,7 +2490,7 @@ static inline unsigned int i915_sg_segment_size(void)
 
 #define HAS_FW_BLC(dev_priv) 	(INTEL_GEN(dev_priv) > 2)
 #define HAS_FBC(dev_priv)	(INTEL_INFO(dev_priv)->display.has_fbc)
-#define HAS_CUR_FBC(dev_priv)	(!HAS_GMCH_DISPLAY(dev_priv) && INTEL_GEN(dev_priv) >= 7)
+#define HAS_CUR_FBC(dev_priv)	(!HAS_GMCH(dev_priv) && INTEL_GEN(dev_priv) >= 7)
 
 #define HAS_IPS(dev_priv)	(IS_HSW_ULT(dev_priv) || IS_BROADWELL(dev_priv))
 
@@ -2570,7 +2570,7 @@ static inline unsigned int i915_sg_segment_size(void)
 #define HAS_PCH_NOP(dev_priv) (INTEL_PCH_TYPE(dev_priv) == PCH_NOP)
 #define HAS_PCH_SPLIT(dev_priv) (INTEL_PCH_TYPE(dev_priv) != PCH_NONE)
 
-#define HAS_GMCH_DISPLAY(dev_priv) (INTEL_INFO(dev_priv)->display.has_gmch_display)
+#define HAS_GMCH(dev_priv) (INTEL_INFO(dev_priv)->display.has_gmch)
 
 #define HAS_LSPCON(dev_priv) (INTEL_GEN(dev_priv) >= 9)
 
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index c6533c57ea2f..36fa6f1905fc 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -88,7 +88,7 @@
 	.num_pipes = 1, \
 	.display.has_overlay = 1, \
 	.display.overlay_needs_physical = 1, \
-	.display.has_gmch_display = 1, \
+	.display.has_gmch = 1, \
 	.gpu_reset_clobbers_display = true, \
 	.hws_needs_physical = 1, \
 	.unfenced_needs_alignment = 1, \
@@ -129,7 +129,7 @@ static const struct intel_device_info intel_i865g_info = {
 #define GEN3_FEATURES \
 	GEN(3), \
 	.num_pipes = 2, \
-	.display.has_gmch_display = 1, \
+	.display.has_gmch = 1, \
 	.gpu_reset_clobbers_display = true, \
 	.ring_mask = RENDER_RING, \
 	.has_snoop = true, \
@@ -206,7 +206,7 @@ static const struct intel_device_info intel_pineview_info = {
 	GEN(4), \
 	.num_pipes = 2, \
 	.display.has_hotplug = 1, \
-	.display.has_gmch_display = 1, \
+	.display.has_gmch = 1, \
 	.gpu_reset_clobbers_display = true, \
 	.ring_mask = RENDER_RING, \
 	.has_snoop = true, \
@@ -382,7 +382,7 @@ static const struct intel_device_info intel_valleyview_info = {
 	.num_pipes = 2,
 	.has_runtime_pm = 1,
 	.has_rc6 = 1,
-	.display.has_gmch_display = 1,
+	.display.has_gmch = 1,
 	.display.has_hotplug = 1,
 	.ppgtt = INTEL_PPGTT_FULL,
 	.has_snoop = true,
@@ -474,7 +474,7 @@ static const struct intel_device_info intel_cherryview_info = {
 	.has_runtime_pm = 1,
 	.has_rc6 = 1,
 	.has_logical_ring_contexts = 1,
-	.display.has_gmch_display = 1,
+	.display.has_gmch = 1,
 	.ppgtt = INTEL_PPGTT_FULL,
 	.has_reset_engine = 1,
 	.has_snoop = true,
diff --git a/drivers/gpu/drm/i915/i915_suspend.c b/drivers/gpu/drm/i915/i915_suspend.c
index f18afa2bac8d..d2f2a9c2fabd 100644
--- a/drivers/gpu/drm/i915/i915_suspend.c
+++ b/drivers/gpu/drm/i915/i915_suspend.c
@@ -86,7 +86,7 @@ int i915_save_state(struct drm_i915_private *dev_priv)
 	} else if (IS_GEN(dev_priv, 2)) {
 		for (i = 0; i < 7; i++)
 			dev_priv->regfile.saveSWF1[i] = I915_READ(SWF1(i));
-	} else if (HAS_GMCH_DISPLAY(dev_priv)) {
+	} else if (HAS_GMCH(dev_priv)) {
 		for (i = 0; i < 16; i++) {
 			dev_priv->regfile.saveSWF0[i] = I915_READ(SWF0(i));
 			dev_priv->regfile.saveSWF1[i] = I915_READ(SWF1(i));
@@ -131,7 +131,7 @@ int i915_restore_state(struct drm_i915_private *dev_priv)
 	} else if (IS_GEN(dev_priv, 2)) {
 		for (i = 0; i < 7; i++)
 			I915_WRITE(SWF1(i), dev_priv->regfile.saveSWF1[i]);
-	} else if (HAS_GMCH_DISPLAY(dev_priv)) {
+	} else if (HAS_GMCH(dev_priv)) {
 		for (i = 0; i < 16; i++) {
 			I915_WRITE(SWF0(i), dev_priv->regfile.saveSWF0[i]);
 			I915_WRITE(SWF1(i), dev_priv->regfile.saveSWF1[i]);
diff --git a/drivers/gpu/drm/i915/intel_color.c b/drivers/gpu/drm/i915/intel_color.c
index 4b0044cdcf1a..31fa599b0d61 100644
--- a/drivers/gpu/drm/i915/intel_color.c
+++ b/drivers/gpu/drm/i915/intel_color.c
@@ -321,7 +321,7 @@ static void i9xx_load_luts_internal(struct intel_crtc_state *crtc_state,
 	enum pipe pipe = crtc->pipe;
 	int i;
 
-	if (HAS_GMCH_DISPLAY(dev_priv)) {
+	if (HAS_GMCH(dev_priv)) {
 		if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_DSI))
 			assert_dsi_pll_enabled(dev_priv);
 		else
@@ -336,7 +336,7 @@ static void i9xx_load_luts_internal(struct intel_crtc_state *crtc_state,
 				(drm_color_lut_extract(lut[i].green, 8) << 8) |
 				drm_color_lut_extract(lut[i].blue, 8);
 
-			if (HAS_GMCH_DISPLAY(dev_priv))
+			if (HAS_GMCH(dev_priv))
 				I915_WRITE(PALETTE(pipe, i), word);
 			else
 				I915_WRITE(LGC_PALETTE(pipe, i), word);
@@ -345,7 +345,7 @@ static void i9xx_load_luts_internal(struct intel_crtc_state *crtc_state,
 		for (i = 0; i < 256; i++) {
 			u32 word = (i << 16) | (i << 8) | i;
 
-			if (HAS_GMCH_DISPLAY(dev_priv))
+			if (HAS_GMCH(dev_priv))
 				I915_WRITE(PALETTE(pipe, i), word);
 			else
 				I915_WRITE(LGC_PALETTE(pipe, i), word);
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 7bf09cef591a..e8b8661df746 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -115,7 +115,7 @@ enum intel_ppgtt {
 	func(has_ddi); \
 	func(has_dp_mst); \
 	func(has_fbc); \
-	func(has_gmch_display); \
+	func(has_gmch); \
 	func(has_hotplug); \
 	func(has_ipc); \
 	func(has_overlay); \
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 8b36ee183c7e..d324b816350b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1805,7 +1805,7 @@ static void intel_enable_pipe(const struct intel_crtc_state *new_crtc_state)
 	 * a plane.  On ILK+ the pipe PLLs are integrated, so we don't
 	 * need the check.
 	 */
-	if (HAS_GMCH_DISPLAY(dev_priv)) {
+	if (HAS_GMCH(dev_priv)) {
 		if (intel_crtc_has_type(new_crtc_state, INTEL_OUTPUT_DSI))
 			assert_dsi_pll_enabled(dev_priv);
 		else
@@ -2094,7 +2094,7 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
 	 * complicated than this. For example, Cherryview appears quite
 	 * happy to scanout from anywhere within its global aperture.
 	 */
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		pinctl |= PIN_MAPPABLE;
 
 	vma = i915_gem_object_pin_to_display_plane(obj,
@@ -3195,7 +3195,7 @@ i9xx_plane_max_stride(struct intel_plane *plane,
 {
 	struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
 
-	if (!HAS_GMCH_DISPLAY(dev_priv)) {
+	if (!HAS_GMCH(dev_priv)) {
 		return 32*1024;
 	} else if (INTEL_GEN(dev_priv) >= 4) {
 		if (modifier == I915_FORMAT_MOD_X_TILED)
@@ -3771,7 +3771,7 @@ __intel_display_resume(struct drm_device *dev,
 	}
 
 	/* ignore any reset values/BIOS leftovers in the WM registers */
-	if (!HAS_GMCH_DISPLAY(to_i915(dev)))
+	if (!HAS_GMCH(to_i915(dev)))
 		to_intel_atomic_state(state)->skip_intermediate_wm = true;
 
 	ret = drm_atomic_helper_commit_duplicated_state(state, ctx);
@@ -5294,7 +5294,7 @@ intel_pre_disable_primary_noatomic(struct drm_crtc *crtc)
 	 * event which is after the vblank start event, so we need to have a
 	 * wait-for-vblank between disabling the plane and the pipe.
 	 */
-	if (HAS_GMCH_DISPLAY(dev_priv) &&
+	if (HAS_GMCH(dev_priv) &&
 	    intel_set_memory_cxsr(dev_priv, false))
 		intel_wait_for_vblank(dev_priv, pipe);
 }
@@ -5431,7 +5431,7 @@ static void intel_pre_plane_update(struct intel_crtc_state *old_crtc_state,
 	 * event which is after the vblank start event, so we need to have a
 	 * wait-for-vblank between disabling the plane and the pipe.
 	 */
-	if (HAS_GMCH_DISPLAY(dev_priv) && old_crtc_state->base.active &&
+	if (HAS_GMCH(dev_priv) && old_crtc_state->base.active &&
 	    pipe_config->disable_cxsr && intel_set_memory_cxsr(dev_priv, false))
 		intel_wait_for_vblank(dev_priv, crtc->pipe);
 
@@ -6705,7 +6705,7 @@ static void intel_crtc_compute_pixel_rate(struct intel_crtc_state *crtc_state)
 {
 	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
 
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		/* FIXME calculate proper pipe pixel rate for GMCH pfit */
 		crtc_state->pixel_rate =
 			crtc_state->base.adjusted_mode.crtc_clock;
@@ -9814,7 +9814,7 @@ static u32 intel_cursor_base(const struct intel_plane_state *plane_state)
 	base += plane_state->color_plane[0].offset;
 
 	/* ILK+ do this automagically */
-	if (HAS_GMCH_DISPLAY(dev_priv) &&
+	if (HAS_GMCH(dev_priv) &&
 	    plane_state->base.rotation & DRM_MODE_ROTATE_180)
 		base += (plane_state->base.crtc_h *
 			 plane_state->base.crtc_w - 1) * fb->format->cpp[0];
@@ -11356,7 +11356,7 @@ static void intel_dump_pipe_config(struct intel_crtc *crtc,
 			      pipe_config->scaler_state.scaler_users,
 		              pipe_config->scaler_state.scaler_id);
 
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		DRM_DEBUG_KMS("gmch pfit: control: 0x%08x, ratios: 0x%08x, lvds border: 0x%08x\n",
 			      pipe_config->gmch_pfit.control,
 			      pipe_config->gmch_pfit.pgm_ratios,
@@ -13096,7 +13096,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 
 			/* FIXME unify this for all platforms */
 			if (!new_crtc_state->active &&
-			    !HAS_GMCH_DISPLAY(dev_priv) &&
+			    !HAS_GMCH(dev_priv) &&
 			    dev_priv->display.initial_watermarks)
 				dev_priv->display.initial_watermarks(intel_state,
 								     new_intel_crtc_state);
@@ -15074,7 +15074,7 @@ retry:
 	 * intermediate watermarks (since we don't trust the current
 	 * watermarks).
 	 */
-	if (!HAS_GMCH_DISPLAY(dev_priv))
+	if (!HAS_GMCH(dev_priv))
 		intel_state->skip_intermediate_wm = true;
 
 	ret = intel_atomic_check(dev, state);
@@ -15315,7 +15315,7 @@ int intel_modeset_init(struct drm_device *dev)
 	 * Note that we need to do this after reconstructing the BIOS fb's
 	 * since the watermark calculation done here will use pstate->fb.
 	 */
-	if (!HAS_GMCH_DISPLAY(dev_priv))
+	if (!HAS_GMCH(dev_priv))
 		sanitize_watermarks(dev);
 
 	/*
@@ -15524,7 +15524,7 @@ static void intel_sanitize_crtc(struct intel_crtc *crtc,
 	if (crtc_state->base.active && !intel_crtc_has_encoders(crtc))
 		intel_crtc_disable_noatomic(&crtc->base, ctx);
 
-	if (crtc_state->base.active || HAS_GMCH_DISPLAY(dev_priv)) {
+	if (crtc_state->base.active || HAS_GMCH(dev_priv)) {
 		/*
 		 * We start out with underrun reporting disabled to avoid races.
 		 * For correct bookkeeping mark this on active crtcs.
@@ -16271,7 +16271,7 @@ intel_display_capture_error_state(struct drm_i915_private *dev_priv)
 
 		error->pipe[i].source = I915_READ(PIPESRC(i));
 
-		if (HAS_GMCH_DISPLAY(dev_priv))
+		if (HAS_GMCH(dev_priv))
 			error->pipe[i].stat = I915_READ(PIPESTAT(i));
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 2e994b5ba40b..d9bc8ee52300 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -2141,7 +2141,7 @@ intel_dp_compute_config(struct intel_encoder *encoder,
 				return ret;
 		}
 
-		if (HAS_GMCH_DISPLAY(dev_priv))
+		if (HAS_GMCH(dev_priv))
 			intel_gmch_panel_fitting(intel_crtc, pipe_config,
 						 conn_state->scaling_mode);
 		else
@@ -2152,7 +2152,7 @@ intel_dp_compute_config(struct intel_encoder *encoder,
 	if (adjusted_mode->flags & DRM_MODE_FLAG_DBLSCAN)
 		return -EINVAL;
 
-	if (HAS_GMCH_DISPLAY(dev_priv) &&
+	if (HAS_GMCH(dev_priv) &&
 	    adjusted_mode->flags & DRM_MODE_FLAG_INTERLACE)
 		return -EINVAL;
 
@@ -5300,7 +5300,7 @@ bool intel_digital_port_connected(struct intel_encoder *encoder)
 {
 	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
 
-	if (HAS_GMCH_DISPLAY(dev_priv)) {
+	if (HAS_GMCH(dev_priv)) {
 		if (IS_GM45(dev_priv))
 			return gm45_digital_port_connected(encoder);
 		else
@@ -6038,7 +6038,7 @@ intel_dp_add_properties(struct intel_dp *intel_dp, struct drm_connector *connect
 		intel_attach_force_audio_property(connector);
 
 	intel_attach_broadcast_rgb_property(connector);
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		drm_connector_attach_max_bpc_property(connector, 6, 10);
 	else if (INTEL_GEN(dev_priv) >= 5)
 		drm_connector_attach_max_bpc_property(connector, 6, 12);
@@ -6047,7 +6047,7 @@ intel_dp_add_properties(struct intel_dp *intel_dp, struct drm_connector *connect
 		u32 allowed_scalers;
 
 		allowed_scalers = BIT(DRM_MODE_SCALE_ASPECT) | BIT(DRM_MODE_SCALE_FULLSCREEN);
-		if (!HAS_GMCH_DISPLAY(dev_priv))
+		if (!HAS_GMCH(dev_priv))
 			allowed_scalers |= BIT(DRM_MODE_SCALE_CENTER);
 
 		drm_connector_attach_scaling_mode_property(connector, allowed_scalers);
@@ -6919,7 +6919,7 @@ intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
 	drm_connector_init(dev, connector, &intel_dp_connector_funcs, type);
 	drm_connector_helper_add(connector, &intel_dp_connector_helper_funcs);
 
-	if (!HAS_GMCH_DISPLAY(dev_priv))
+	if (!HAS_GMCH(dev_priv))
 		connector->interlace_allowed = true;
 	connector->doublescan_allowed = 0;
 
diff --git a/drivers/gpu/drm/i915/intel_fifo_underrun.c b/drivers/gpu/drm/i915/intel_fifo_underrun.c
index 3b9285130ef5..f33de4be4b89 100644
--- a/drivers/gpu/drm/i915/intel_fifo_underrun.c
+++ b/drivers/gpu/drm/i915/intel_fifo_underrun.c
@@ -258,7 +258,7 @@ static bool __intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
 	old = !crtc->cpu_fifo_underrun_disabled;
 	crtc->cpu_fifo_underrun_disabled = !enable;
 
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		i9xx_set_fifo_underrun_reporting(dev, pipe, enable, old);
 	else if (IS_GEN_RANGE(dev_priv, 5, 6))
 		ironlake_set_fifo_underrun_reporting(dev, pipe, enable);
@@ -369,7 +369,7 @@ void intel_cpu_fifo_underrun_irq_handler(struct drm_i915_private *dev_priv,
 		return;
 
 	/* GMCH can't disable fifo underruns, filter them. */
-	if (HAS_GMCH_DISPLAY(dev_priv) &&
+	if (HAS_GMCH(dev_priv) &&
 	    crtc->cpu_fifo_underrun_disabled)
 		return;
 
@@ -421,7 +421,7 @@ void intel_check_cpu_fifo_underruns(struct drm_i915_private *dev_priv)
 		if (crtc->cpu_fifo_underrun_disabled)
 			continue;
 
-		if (HAS_GMCH_DISPLAY(dev_priv))
+		if (HAS_GMCH(dev_priv))
 			i9xx_check_fifo_underruns(crtc);
 		else if (IS_GEN(dev_priv, 7))
 			ivybridge_check_fifo_underruns(crtc);
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
index 97a98e1bea56..f125a62eba8c 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -1588,7 +1588,7 @@ intel_hdmi_mode_valid(struct drm_connector *connector,
 
 	if (hdmi->has_hdmi_sink && !force_dvi) {
 		/* if we can't do 8bpc we may still be able to do 12bpc */
-		if (status != MODE_OK && !HAS_GMCH_DISPLAY(dev_priv))
+		if (status != MODE_OK && !HAS_GMCH(dev_priv))
 			status = hdmi_port_clock_valid(hdmi, clock * 3 / 2,
 						       true, force_dvi);
 
@@ -1613,7 +1613,7 @@ static bool hdmi_deep_color_possible(const struct intel_crtc_state *crtc_state,
 		&crtc_state->base.adjusted_mode;
 	int i;
 
-	if (HAS_GMCH_DISPLAY(dev_priv))
+	if (HAS_GMCH(dev_priv))
 		return false;
 
 	if (bpc == 10 && INTEL_GEN(dev_priv) < 11)
@@ -2150,7 +2150,7 @@ intel_hdmi_add_properties(struct intel_hdmi *intel_hdmi, struct drm_connector *c
 	drm_connector_attach_content_type_property(connector);
 	connector->state->picture_aspect_ratio = HDMI_PICTURE_ASPECT_NONE;
 
-	if (!HAS_GMCH_DISPLAY(dev_priv))
+	if (!HAS_GMCH(dev_priv))
 		drm_connector_attach_max_bpc_property(connector, 8, 12);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index e027d2b4abe5..b8937c788f03 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -470,7 +470,7 @@ void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
 			 * hotplug bits itself. So only WARN about unexpected
 			 * interrupts on saner platforms.
 			 */
-			WARN_ONCE(!HAS_GMCH_DISPLAY(dev_priv),
+			WARN_ONCE(!HAS_GMCH(dev_priv),
 				  "Received HPD interrupt on pin %d although disabled\n", pin);
 			continue;
 		}
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index 4f6dc8c94634..5a733e711355 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -823,7 +823,7 @@ int intel_setup_gmbus(struct drm_i915_private *dev_priv)
 
 	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
 		dev_priv->gpio_mmio_base = VLV_DISPLAY_BASE;
-	else if (!HAS_GMCH_DISPLAY(dev_priv))
+	else if (!HAS_GMCH(dev_priv))
 		/*
 		 * Broxton uses the same PCH offsets for South Display Engine,
 		 * even though it doesn't have a PCH.
diff --git a/drivers/gpu/drm/i915/vlv_dsi.c b/drivers/gpu/drm/i915/vlv_dsi.c
index 696b750acd1d..6403728fe778 100644
--- a/drivers/gpu/drm/i915/vlv_dsi.c
+++ b/drivers/gpu/drm/i915/vlv_dsi.c
@@ -275,7 +275,7 @@ static int intel_dsi_compute_config(struct intel_encoder *encoder,
 	if (fixed_mode) {
 		intel_fixed_panel_mode(fixed_mode, adjusted_mode);
 
-		if (HAS_GMCH_DISPLAY(dev_priv))
+		if (HAS_GMCH(dev_priv))
 			intel_gmch_panel_fitting(crtc, pipe_config,
 						 conn_state->scaling_mode);
 		else
@@ -1633,7 +1633,7 @@ static void intel_dsi_add_properties(struct intel_connector *connector)
 		u32 allowed_scalers;
 
 		allowed_scalers = BIT(DRM_MODE_SCALE_ASPECT) | BIT(DRM_MODE_SCALE_FULLSCREEN);
-		if (!HAS_GMCH_DISPLAY(dev_priv))
+		if (!HAS_GMCH(dev_priv))
 			allowed_scalers |= BIT(DRM_MODE_SCALE_CENTER);
 
 		drm_connector_attach_scaling_mode_property(&connector->base,
-- 
cgit v1.2.3


From ab1ab0eb0cb64d1c225bdf42bee067f81c87e2a4 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Thu, 31 Jan 2019 14:15:07 +0100
Subject: drm/i915: do not return invalid pointers as a *dentry

When calling debugfs functions, they can now return error values if
something went wrong.  If that happens, return a NULL as a *dentry to
the relay core instead of passing it an illegal pointer.

The relay core should be able to handle an illegal pointer, but add this
check to be safe.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190131131507.GA19807@kroah.com
---
 drivers/gpu/drm/i915/intel_guc_log.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_guc_log.c b/drivers/gpu/drm/i915/intel_guc_log.c
index b53582c0c6c1..806fdfd7c78a 100644
--- a/drivers/gpu/drm/i915/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/intel_guc_log.c
@@ -140,6 +140,9 @@ static struct dentry *create_buf_file_callback(const char *filename,
 
 	buf_file = debugfs_create_file(filename, mode,
 				       parent, buf, &relay_file_operations);
+	if (IS_ERR(buf_file))
+		return NULL;
+
 	return buf_file;
 }
 
-- 
cgit v1.2.3


From 39806c3f11e206d03e76521c808a96aedfcc58d9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Mon, 4 Feb 2019 23:16:44 +0200
Subject: drm/i915: Include register polling in reg_rw traces
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We generally omit register polling from the i915_reg_rw tracepoint.
Understandable since polling could generate a lot of noise in the
trace. The downside is that the trace is incomplete. As a compromise
let's trace the final register value observed while polling. That
should be generally sufficient to observe what the code should be
doing next.

I suppose in some cases it might make sense to also trace the initial
register value, and maybe the number of times we polled. But that
would require a separate tracepoint so let's leave it for the future.

The other users of _NOTRACE() are i915_pmu and i2c bitbanging,
which I decided to leave alone.

Next we should do something to claw back the tracepoints for
planes and whatnot which were switched to _FW() a while back.
I guess just new macros for raw_rw+trace. The question is
what to call it?

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204211644.21967-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c     | 12 ++++++++++--
 drivers/gpu/drm/i915/intel_dp.c     |  6 ++++++
 drivers/gpu/drm/i915/intel_uncore.c |  3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f462a4d28af4..349ba9e23a5b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2543,6 +2543,10 @@ static void vlv_restore_gunit_s0ix_state(struct drm_i915_private *dev_priv)
 static int vlv_wait_for_pw_status(struct drm_i915_private *dev_priv,
 				  u32 mask, u32 val)
 {
+	i915_reg_t reg = VLV_GTLC_PW_STATUS;
+	u32 reg_value;
+	int ret;
+
 	/* The HW does not like us polling for PW_STATUS frequently, so
 	 * use the sleeping loop rather than risk the busy spin within
 	 * intel_wait_for_register().
@@ -2550,8 +2554,12 @@ static int vlv_wait_for_pw_status(struct drm_i915_private *dev_priv,
 	 * Transitioning between RC6 states should be at most 2ms (see
 	 * valleyview_enable_rps) so use a 3ms timeout.
 	 */
-	return wait_for((I915_READ_NOTRACE(VLV_GTLC_PW_STATUS) & mask) == val,
-			3);
+	ret = wait_for(((reg_value = I915_READ_NOTRACE(reg)) & mask) == val, 3);
+
+	/* just trace the final value */
+	trace_i915_reg_rw(false, reg, reg_value, sizeof(reg_value), true);
+
+	return ret;
 }
 
 int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool force_on)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index d9bc8ee52300..a15d9fd80641 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1061,6 +1061,10 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp)
 #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0)
 	done = wait_event_timeout(dev_priv->gmbus_wait_queue, C,
 				  msecs_to_jiffies_timeout(10));
+
+	/* just trace the final value */
+	trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true);
+
 	if (!done)
 		DRM_ERROR("dp aux hw did not signal timeout!\n");
 #undef C
@@ -1227,6 +1231,8 @@ intel_dp_aux_xfer(struct intel_dp *intel_dp,
 			break;
 		msleep(1);
 	}
+	/* just trace the final value */
+	trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true);
 
 	if (try == 3) {
 		static u32 last_status = -1;
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index e88f0252d77e..75646a1e0051 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1819,6 +1819,9 @@ int __intel_wait_for_register(struct drm_i915_private *dev_priv,
 				 (reg_value & mask) == value,
 				 slow_timeout_ms * 1000, 10, 1000);
 
+	/* just trace the final value */
+	trace_i915_reg_rw(false, reg, reg_value, sizeof(reg_value), true);
+
 	if (out_value)
 		*out_value = reg_value;
 
-- 
cgit v1.2.3


From f81b845f72c31e7cbba4e8a91dc30252bc1589d0 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue, 5 Feb 2019 09:27:59 +0000
Subject: drm/i915: Push clear_intel_crtc_state() onto the heap
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

clear_intel_crtc_state() uses the stack for saving a temporary copy of
certain bits of the inherited crtc_state before clearing the unwanted
bits. This pushes it over the stack limit for my little 32b Pineview,
so move the temporary allocation to the heap instead. As we now use a
zeroed struct, we can copy the whole extended state back to both
preserve what bits need to be preserved and zero the rest.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205092759.16018-1-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/intel_display.c | 40 ++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index d324b816350b..113ebd196775 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11468,44 +11468,38 @@ static bool check_digital_port_conflicts(struct drm_atomic_state *state)
 	return ret;
 }
 
-static void
+static int
 clear_intel_crtc_state(struct intel_crtc_state *crtc_state)
 {
 	struct drm_i915_private *dev_priv =
 		to_i915(crtc_state->base.crtc->dev);
-	struct intel_crtc_scaler_state scaler_state;
-	struct intel_dpll_hw_state dpll_hw_state;
-	struct intel_shared_dpll *shared_dpll;
-	struct intel_crtc_wm_state wm_state;
-	bool force_thru, ips_force_disable;
+	struct intel_crtc_state *saved_state;
+
+	saved_state = kzalloc(sizeof(*saved_state), GFP_KERNEL);
+	if (!saved_state)
+		return -ENOMEM;
 
 	/* FIXME: before the switch to atomic started, a new pipe_config was
 	 * kzalloc'd. Code that depends on any field being zero should be
 	 * fixed, so that the crtc_state can be safely duplicated. For now,
 	 * only fields that are know to not cause problems are preserved. */
 
-	scaler_state = crtc_state->scaler_state;
-	shared_dpll = crtc_state->shared_dpll;
-	dpll_hw_state = crtc_state->dpll_hw_state;
-	force_thru = crtc_state->pch_pfit.force_thru;
-	ips_force_disable = crtc_state->ips_force_disable;
+	saved_state->scaler_state = crtc_state->scaler_state;
+	saved_state->shared_dpll = crtc_state->shared_dpll;
+	saved_state->dpll_hw_state = crtc_state->dpll_hw_state;
+	saved_state->pch_pfit.force_thru = crtc_state->pch_pfit.force_thru;
+	saved_state->ips_force_disable = crtc_state->ips_force_disable;
 	if (IS_G4X(dev_priv) ||
 	    IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
-		wm_state = crtc_state->wm;
+		saved_state->wm = crtc_state->wm;
 
 	/* Keep base drm_crtc_state intact, only clear our extended struct */
 	BUILD_BUG_ON(offsetof(struct intel_crtc_state, base));
-	memset(&crtc_state->base + 1, 0,
+	memcpy(&crtc_state->base + 1, &saved_state->base + 1,
 	       sizeof(*crtc_state) - sizeof(crtc_state->base));
 
-	crtc_state->scaler_state = scaler_state;
-	crtc_state->shared_dpll = shared_dpll;
-	crtc_state->dpll_hw_state = dpll_hw_state;
-	crtc_state->pch_pfit.force_thru = force_thru;
-	crtc_state->ips_force_disable = ips_force_disable;
-	if (IS_G4X(dev_priv) ||
-	    IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
-		crtc_state->wm = wm_state;
+	kfree(saved_state);
+	return 0;
 }
 
 static int
@@ -11520,7 +11514,9 @@ intel_modeset_pipe_config(struct drm_crtc *crtc,
 	int i;
 	bool retry = true;
 
-	clear_intel_crtc_state(pipe_config);
+	ret = clear_intel_crtc_state(pipe_config);
+	if (ret)
+		return ret;
 
 	pipe_config->cpu_transcoder =
 		(enum transcoder) to_intel_crtc(crtc)->pipe;
-- 
cgit v1.2.3


From 0aded171e20e438015a95fbfe57d679dbf228090 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 17:50:53 +0200
Subject: drm/i915: Fix wm latency==0 disable on skl+
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When adding the early latency==0 check back I neglected to
realize that we no longer have a way to return a failure
from the wm computation like we had in the past (since we
now calculate wms before ddb allocations). Also plane_en
being false doesn't actually indicate that the level is
invalid as it wil also happen when the plane is not
enabled.

skl_allocate_pipe_ddb() starts scanning from the maximum
watermark level and it stops as soon as it finds a level
that is deemed viable. The assumption being that if level
n+1 is valid then level n is valid as well. Thus if we
now disable any watermark level by zeroing its latency
the code will think that level to be actually valid
and won't confirm whether the actually enabled lower
watermark level(s) actually fit into the allotted ddb
space. This results in hilarious watermark values that
exceed the ddb allocation of the plane.

The way we must now indicate a failure is to assign an
unreasoanbly big value to min_ddb_alloc which will then
make skl_allocate_pipe_ddb() reject the entire level.

v2: Also do the same for the lines>31 case (Matt)
v3: Make 'blocks' u32 (Matt)

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205155053.10081-1-ville.syrjala@linux.intel.com
---
 drivers/gpu/drm/i915/intel_pm.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index ed9786241307..737005bf6816 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4319,7 +4319,7 @@ skl_allocate_pipe_ddb(struct intel_crtc_state *cstate,
 	int num_active;
 	u64 plane_data_rate[I915_MAX_PLANES] = {};
 	u64 uv_plane_data_rate[I915_MAX_PLANES] = {};
-	u16 blocks = 0;
+	u32 blocks;
 	int level;
 
 	/* Clear the partitioning for disabled planes. */
@@ -4694,8 +4694,11 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *cstate,
 	uint_fixed_16_16_t selected_result;
 	u32 res_blocks, res_lines, min_ddb_alloc = 0;
 
-	if (latency == 0)
+	if (latency == 0) {
+		/* reject it */
+		result->min_ddb_alloc = U16_MAX;
 		return;
+	}
 
 	/* Display WA #1141: kbl,cfl */
 	if ((IS_KABYLAKE(dev_priv) || IS_COFFEELAKE(dev_priv) ||
@@ -4783,8 +4786,11 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *cstate,
 	if (!skl_wm_has_lines(dev_priv, level))
 		res_lines = 0;
 
-	if (res_lines > 31)
+	if (res_lines > 31) {
+		/* reject it */
+		result->min_ddb_alloc = U16_MAX;
 		return;
+	}
 
 	/*
 	 * If res_lines is valid, assume we can use this watermark level
-- 
cgit v1.2.3


From d16221195ae255beae1fa62f961a9b8e020a6c22 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Mon, 4 Feb 2019 22:21:39 +0200
Subject: drm/i915: Extract icl_set_pipe_chicken()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We need configure PIPE_CHICKEN during fastboot as well. Let's extract
it to a helper.

v2: Apparently PIPE_CHICKEN is icl+ only

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204202139.26884-1-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 113ebd196775..af848fb72ed6 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3896,6 +3896,24 @@ unlock:
 	clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
 }
 
+static void icl_set_pipe_chicken(struct intel_crtc *crtc)
+{
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	enum pipe pipe = crtc->pipe;
+	u32 tmp;
+
+	tmp = I915_READ(PIPE_CHICKEN(pipe));
+
+	/*
+	 * Display WA #1153: icl
+	 * enable hardware to bypass the alpha math
+	 * and rounding for per-pixel values 00 and 0xff
+	 */
+	tmp |= PER_PIXEL_ALPHA_BYPASS_EN;
+
+	I915_WRITE(PIPE_CHICKEN(pipe), tmp);
+}
+
 static void intel_update_pipe_config(const struct intel_crtc_state *old_crtc_state,
 				     const struct intel_crtc_state *new_crtc_state)
 {
@@ -5782,7 +5800,6 @@ static void haswell_crtc_enable(struct intel_crtc_state *pipe_config,
 	struct intel_atomic_state *old_intel_state =
 		to_intel_atomic_state(old_state);
 	bool psl_clkgate_wa;
-	u32 pipe_chicken;
 
 	if (WARN_ON(intel_crtc->active))
 		return;
@@ -5839,16 +5856,8 @@ static void haswell_crtc_enable(struct intel_crtc_state *pipe_config,
 	 */
 	intel_color_load_luts(pipe_config);
 
-	/*
-	 * Display WA #1153: enable hardware to bypass the alpha math
-	 * and rounding for per-pixel values 00 and 0xff
-	 */
-	if (INTEL_GEN(dev_priv) >= 11) {
-		pipe_chicken = I915_READ(PIPE_CHICKEN(pipe));
-		if (!(pipe_chicken & PER_PIXEL_ALPHA_BYPASS_EN))
-			I915_WRITE_FW(PIPE_CHICKEN(pipe),
-				      pipe_chicken | PER_PIXEL_ALPHA_BYPASS_EN);
-	}
+	if (INTEL_GEN(dev_priv) >= 11)
+		icl_set_pipe_chicken(intel_crtc);
 
 	intel_ddi_set_pipe_settings(pipe_config);
 	if (!transcoder_is_dsi(cpu_transcoder))
-- 
cgit v1.2.3


From 108d14bdaef67617412af3f40256810281464425 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Mon, 4 Feb 2019 22:22:14 +0200
Subject: drm/i915: Setup PIPE_CHICKEN for fastsets too
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Configure PIPE_CHICKEN during intel_update_pipe_config() to make
sure we have our chickens in a row with fastboot too.

v2: Apparently PIPE_CHICKEN is icl+ only

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204202214.27051-1-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index af848fb72ed6..dc7a2aa62874 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3958,6 +3958,9 @@ static void intel_update_pipe_config(const struct intel_crtc_state *old_crtc_sta
 		I915_WRITE(SKL_BOTTOM_COLOR(crtc->pipe),
 			   SKL_BOTTOM_COLOR_GAMMA_ENABLE |
 			   SKL_BOTTOM_COLOR_CSC_ENABLE);
+
+	if (INTEL_GEN(dev_priv) >= 11)
+		icl_set_pipe_chicken(crtc);
 }
 
 static void intel_fdi_normal_train(struct intel_crtc *crtc)
-- 
cgit v1.2.3


From bf002c100740f4ae01d0d86b44f65a712ee14031 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Mon, 4 Feb 2019 22:22:32 +0200
Subject: drm/i915: W/A for underruns with WM1+ disabled on icl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Disabling WM1+ on ICL causes tons of underruns with
linear/X-tiled framebuffers. We can avoid this by flipping
on a chicken bit affecting the way the hw fill the FIFO.
This may not be the final solution but should hopefully
avoid some underruns in the meantime.

v2: Apparently PIPE_CHICKEN is icl+ only

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204202232.27153-1-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h      | 1 +
 drivers/gpu/drm/i915/intel_display.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ede54fdc1676..12964b0fbc54 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7618,6 +7618,7 @@ enum {
 #define _PIPEB_CHICKEN			0x71038
 #define _PIPEC_CHICKEN			0x72038
 #define  PER_PIXEL_ALPHA_BYPASS_EN	(1 << 7)
+#define  PM_FILL_MAINTAIN_DBUF_FULLNESS	(1 << 0)
 #define PIPE_CHICKEN(pipe)		_MMIO_PIPE(pipe, _PIPEA_CHICKEN,\
 						   _PIPEB_CHICKEN)
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index dc7a2aa62874..9db931a450f6 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3911,6 +3911,12 @@ static void icl_set_pipe_chicken(struct intel_crtc *crtc)
 	 */
 	tmp |= PER_PIXEL_ALPHA_BYPASS_EN;
 
+	/*
+	 * W/A for underruns with linear/X-tiled with
+	 * WM1+ disabled.
+	 */
+	tmp |= PM_FILL_MAINTAIN_DBUF_FULLNESS;
+
 	I915_WRITE(PIPE_CHICKEN(pipe), tmp);
 }
 
-- 
cgit v1.2.3


From 26a11deea685b41a43edb513194718aa1f461c9a Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Tue, 5 Feb 2019 13:03:53 +0000
Subject: drm/i915/pmu: Fix enable count array size and bounds checking

Enable count array is supposed to have one counter for each possible
engine sampler. As such, array sizing and bounds checking is not correct
and would blow up the asserts if more samplers were added.

No ill-effect in the current code base but lets fix it for correctness.

At the same time tidy the assert for readability and robustness.

v2:
 * One check per assert. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130353.21105-1-tvrtko.ursulin@linux.intel.com
---
 drivers/gpu/drm/i915/i915_pmu.c         | 22 +++++++++++++++-------
 drivers/gpu/drm/i915/i915_pmu.h         |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 +++++----
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b1cb2d3cae16..13d70b90dd0f 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -599,7 +599,8 @@ static void i915_pmu_enable(struct perf_event *event)
 	 * Update the bitmask of enabled events and increment
 	 * the event reference counter.
 	 */
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	BUILD_BUG_ON(ARRAY_SIZE(i915->pmu.enable_count) != I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == ~0);
 	i915->pmu.enable |= BIT_ULL(bit);
 	i915->pmu.enable_count[bit]++;
@@ -620,11 +621,16 @@ static void i915_pmu_enable(struct perf_event *event)
 		engine = intel_engine_lookup_user(i915,
 						  engine_event_class(event),
 						  engine_event_instance(event));
-		GEM_BUG_ON(!engine);
-		engine->pmu.enable |= BIT(sample);
 
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
+			     I915_ENGINE_SAMPLE_COUNT);
+		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
+			     I915_ENGINE_SAMPLE_COUNT);
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
+
+		engine->pmu.enable |= BIT(sample);
 		engine->pmu.enable_count[sample]++;
 	}
 
@@ -654,9 +660,11 @@ static void i915_pmu_disable(struct perf_event *event)
 		engine = intel_engine_lookup_user(i915,
 						  engine_event_class(event),
 						  engine_event_instance(event));
-		GEM_BUG_ON(!engine);
-		GEM_BUG_ON(sample >= I915_PMU_SAMPLE_BITS);
+
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
 		GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
+
 		/*
 		 * Decrement the reference count and clear the enabled
 		 * bitmask when the last listener on an event goes away.
@@ -665,7 +673,7 @@ static void i915_pmu_disable(struct perf_event *event)
 			engine->pmu.enable &= ~BIT(sample);
 	}
 
-	GEM_BUG_ON(bit >= I915_PMU_MASK_BITS);
+	GEM_BUG_ON(bit >= ARRAY_SIZE(i915->pmu.enable_count));
 	GEM_BUG_ON(i915->pmu.enable_count[bit] == 0);
 	/*
 	 * Decrement the reference count and clear the enabled
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index 7f164ca3db12..b3728c5f13e7 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -31,6 +31,8 @@ enum {
 	((1 << I915_PMU_SAMPLE_BITS) + \
 	 (I915_PMU_LAST + 1 - __I915_PMU_OTHER(0)))
 
+#define I915_ENGINE_SAMPLE_COUNT (I915_SAMPLE_SEMA + 1)
+
 struct i915_pmu_sample {
 	u64 cur;
 };
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1398eb81dee6..4d4ea6963a72 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -403,16 +403,17 @@ struct intel_engine_cs {
 		/**
 		 * @enable_count: Reference count for the enabled samplers.
 		 *
-		 * Index number corresponds to the bit number from @enable.
+		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
 		 */
-		unsigned int enable_count[I915_PMU_SAMPLE_BITS];
+		unsigned int enable_count[I915_ENGINE_SAMPLE_COUNT];
 		/**
 		 * @sample: Counter values for sampling events.
 		 *
 		 * Our internal timer stores the current counters in this field.
+		 *
+		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
-		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
+		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_COUNT];
 	} pmu;
 
 	/*
-- 
cgit v1.2.3


From c7e716b8617e5f468d6f8623532b9cc2cc7d3d0d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 22:50:55 +0200
Subject: drm/i915: Bump skl+ wm blocks to 11 bits
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On icl the plane watermark blocks field is 11 bits. Bump our define to
match so that readout won't ignore the extra bit. We can safely do this
for older platforms too since the unused bits are hardwired to zero.

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205205056.30081-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 12964b0fbc54..2be34e13af78 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6000,7 +6000,7 @@ enum {
 #define   PLANE_WM_EN		(1 << 31)
 #define   PLANE_WM_LINES_SHIFT	14
 #define   PLANE_WM_LINES_MASK	0x1f
-#define   PLANE_WM_BLOCKS_MASK	0x3ff
+#define   PLANE_WM_BLOCKS_MASK	0x7ff /* skl+: 10 bits, icl+ 11 bits */
 
 #define _CUR_WM_0(pipe) _PIPE(pipe, _CUR_WM_A_0, _CUR_WM_B_0)
 #define CUR_WM(pipe, level) _MMIO(_CUR_WM_0(pipe) + ((4) * (level)))
-- 
cgit v1.2.3


From d7e449a858ec280f3b90355246d8df680e40b6dd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 22:50:56 +0200
Subject: drm/i915: Just use icl+ definition for PLANE_WM blocks field
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The unused bits on PLANE_WM & co. are hardwired to zero. So no
need to worry about reading the extra bit on pre-icl.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205205056.30081-2-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h | 3 +--
 drivers/gpu/drm/i915/intel_pm.c | 9 ++-------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2be34e13af78..638a586469f9 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6784,8 +6784,7 @@ enum {
 
 #define _PLANE_BUF_CFG_1_B			0x7127c
 #define _PLANE_BUF_CFG_2_B			0x7137c
-#define  SKL_DDB_ENTRY_MASK			0x3FF
-#define  ICL_DDB_ENTRY_MASK			0x7FF
+#define  DDB_ENTRY_MASK				0x7FF /* skl+: 10 bits, icl+ 11 bits */
 #define  DDB_ENTRY_END_SHIFT			16
 #define _PLANE_BUF_CFG_1(pipe)	\
 	_PIPE(pipe, _PLANE_BUF_CFG_1_A, _PLANE_BUF_CFG_1_B)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 737005bf6816..0f15685529a0 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3927,14 +3927,9 @@ static unsigned int skl_cursor_allocation(int num_active)
 static void skl_ddb_entry_init_from_hw(struct drm_i915_private *dev_priv,
 				       struct skl_ddb_entry *entry, u32 reg)
 {
-	u16 mask;
 
-	if (INTEL_GEN(dev_priv) >= 11)
-		mask = ICL_DDB_ENTRY_MASK;
-	else
-		mask = SKL_DDB_ENTRY_MASK;
-	entry->start = reg & mask;
-	entry->end = (reg >> DDB_ENTRY_END_SHIFT) & mask;
+	entry->start = reg & DDB_ENTRY_MASK;
+	entry->end = (reg >> DDB_ENTRY_END_SHIFT) & DDB_ENTRY_MASK;
 
 	if (entry->end)
 		entry->end += 1;
-- 
cgit v1.2.3


From fe5ec65668cdaa4348631d8ce1766eed43b33c10 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude@redhat.com>
Date: Tue, 29 Jan 2019 14:09:59 -0500
Subject: drm/i915: Block fbdev HPD processing during suspend

When resuming, we check whether or not any previously connected
MST topologies are still present and if so, attempt to resume them. If
this fails, we disable said MST topologies and fire off a hotplug event
so that userspace knows to reprobe.

However, sending a hotplug event involves calling
drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
connector reprobe in the caller's thread - something we can't do at the
point in which i915 calls drm_dp_mst_topology_mgr_resume() since
hotplugging hasn't been fully initialized yet.

This currently causes some rather subtle but fatal issues. For example,
on my T480s the laptop dock connected to it usually disappears during a
suspend cycle, and comes back up a short while after the system has been
resumed. This guarantees pretty much every suspend and resume cycle,
drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn,
a connector hotplug will occur. Now it's Rute Goldberg time: when the
connector hotplug occurs, i915 reprobes /all/ of the connectors,
including eDP. However, eDP probing requires that we power on the panel
VDD which in turn, grabs a wakeref to the appropriate power domain on
the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where
things start breaking, since this all happens before
intel_power_domains_enable() is called we end up leaking the wakeref
that was acquired and never releasing it later. Come next suspend/resume
cycle, this causes us to fail to shut down the GPU properly, which
causes it not to resume properly and die a horrible complicated death.

(as a note: this only happens when there's both an eDP panel and MST
topology connected which is removed mid-suspend. One or the other seems
to always be OK).

We could try to fix the VDD wakeref leak, but this doesn't seem like
it's worth it at all since we aren't able to handle hotplug detection
while resuming anyway. So, let's go with a more robust solution inspired
by nouveau: block fbdev from handling hotplug events until we resume
fbdev. This allows us to still send sysfs hotplug events to be handled
later by user space while we're resuming, while also preventing us from
actually processing any hotplug events we receive until it's safe.

This fixes the wakeref leak observed on the T480s and as such, also
fixes suspend/resume with MST topologies connected on this machine.

Changes since v2:
* Don't call drm_fb_helper_hotplug_event() under lock, do it after lock
  (Chris Wilson)
* Don't call drm_fb_helper_hotplug_event() in
  intel_fbdev_output_poll_changed() under lock (Chris Wilson)
* Always set ifbdev->hpd_waiting (Chris Wilson)

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v3.17+
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129191001.442-2-lyude@redhat.com
---
 drivers/gpu/drm/i915/intel_drv.h   | 10 ++++++++++
 drivers/gpu/drm/i915/intel_fbdev.c | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 35fbc527c19e..79f102dfe37c 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -212,6 +212,16 @@ struct intel_fbdev {
 	unsigned long vma_flags;
 	async_cookie_t cookie;
 	int preferred_bpp;
+
+	/* Whether or not fbdev hpd processing is temporarily suspended */
+	bool hpd_suspended : 1;
+	/* Set when a hotplug was received while HPD processing was
+	 * suspended
+	 */
+	bool hpd_waiting : 1;
+
+	/* Protects hpd_suspended */
+	struct mutex hpd_lock;
 };
 
 struct intel_encoder {
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 3036d835bc2b..39b314272fe9 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -679,6 +679,7 @@ int intel_fbdev_init(struct drm_device *dev)
 	if (ifbdev == NULL)
 		return -ENOMEM;
 
+	mutex_init(&ifbdev->hpd_lock);
 	drm_fb_helper_prepare(dev, &ifbdev->helper, &intel_fb_helper_funcs);
 
 	if (!intel_fbdev_init_bios(dev, ifbdev))
@@ -752,6 +753,26 @@ void intel_fbdev_fini(struct drm_i915_private *dev_priv)
 	intel_fbdev_destroy(ifbdev);
 }
 
+/* Suspends/resumes fbdev processing of incoming HPD events. When resuming HPD
+ * processing, fbdev will perform a full connector reprobe if a hotplug event
+ * was received while HPD was suspended.
+ */
+static void intel_fbdev_hpd_set_suspend(struct intel_fbdev *ifbdev, int state)
+{
+	bool send_hpd = false;
+
+	mutex_lock(&ifbdev->hpd_lock);
+	ifbdev->hpd_suspended = state == FBINFO_STATE_SUSPENDED;
+	send_hpd = !ifbdev->hpd_suspended && ifbdev->hpd_waiting;
+	ifbdev->hpd_waiting = false;
+	mutex_unlock(&ifbdev->hpd_lock);
+
+	if (send_hpd) {
+		DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
+		drm_fb_helper_hotplug_event(&ifbdev->helper);
+	}
+}
+
 void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -773,6 +794,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 		 */
 		if (state != FBINFO_STATE_RUNNING)
 			flush_work(&dev_priv->fbdev_suspend_work);
+
 		console_lock();
 	} else {
 		/*
@@ -800,17 +822,26 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 
 	drm_fb_helper_set_suspend(&ifbdev->helper, state);
 	console_unlock();
+
+	intel_fbdev_hpd_set_suspend(ifbdev, state);
 }
 
 void intel_fbdev_output_poll_changed(struct drm_device *dev)
 {
 	struct intel_fbdev *ifbdev = to_i915(dev)->fbdev;
+	bool send_hpd;
 
 	if (!ifbdev)
 		return;
 
 	intel_fbdev_sync(ifbdev);
-	if (ifbdev->vma || ifbdev->helper.deferred_setup)
+
+	mutex_lock(&ifbdev->hpd_lock);
+	send_hpd = !ifbdev->hpd_suspended;
+	ifbdev->hpd_waiting = true;
+	mutex_unlock(&ifbdev->hpd_lock);
+
+	if (send_hpd && (ifbdev->vma || ifbdev->helper.deferred_setup))
 		drm_fb_helper_hotplug_event(&ifbdev->helper);
 }
 
-- 
cgit v1.2.3


From 6be1cf96bb79e3150b27a490d9709eec6b523224 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude@redhat.com>
Date: Tue, 29 Jan 2019 14:10:00 -0500
Subject: drm/i915: Don't send MST hotplugs during resume
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We have a bad habit of calling drm_fb_helper_hotplug_event() far more
then we actually need to. MST appears to be one of these cases, where we
call drm_fb_helper_hotplug_event() if we fail to resume a connected MST
topology in intel_dp_mst_resume(). We don't actually need to do this at
all though since hotplug events are already sent from
drm_dp_connector_destroy_work() every time connectors are unregistered
from userspace's PoV. Additionally, extra calls to
drm_fb_helper_hotplug_event() also just mean more of a chance of doing a
connector probe somewhere we shouldn't.

So, don't send any hotplug events during resume if the MST topology
fails to come up. Just rely on the DP MST helpers to send them for us.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129191001.442-3-lyude@redhat.com
---
 drivers/gpu/drm/i915/intel_dp.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index a15d9fd80641..26c8c0c747e7 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -7102,7 +7102,10 @@ void intel_dp_mst_resume(struct drm_i915_private *dev_priv)
 			continue;
 
 		ret = drm_dp_mst_topology_mgr_resume(&intel_dp->mst_mgr);
-		if (ret)
-			intel_dp_check_mst_status(intel_dp);
+		if (ret) {
+			intel_dp->is_mst = false;
+			drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr,
+							false);
+		}
 	}
 }
-- 
cgit v1.2.3


From 6cbb55c086b4f23ad100f34cdeab496c4e99e0a6 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude@redhat.com>
Date: Tue, 29 Jan 2019 14:10:01 -0500
Subject: drm/i915: Don't send hotplug in intel_dp_check_mst_status()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This hotplug also isn't needed: drm_dp_mst_topology_mgr_set_mst()
already sends a hotplug on its own from drm_dp_destroy_connector_work()
after destroying connectors in the MST topology.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129191001.442-4-lyude@redhat.com
---
 drivers/gpu/drm/i915/intel_dp.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 26c8c0c747e7..8f38d932f2d1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -4614,12 +4614,10 @@ go_again:
 
 			return ret;
 		} else {
-			struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
 			DRM_DEBUG_KMS("failed to get ESI - device may have failed\n");
 			intel_dp->is_mst = false;
-			drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr, intel_dp->is_mst);
-			/* send a hotplug event */
-			drm_kms_helper_hotplug_event(intel_dig_port->base.base.dev);
+			drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr,
+							intel_dp->is_mst);
 		}
 	}
 	return -EINVAL;
-- 
cgit v1.2.3


From 5c4604e757ba9b193b09768d75a7d2105a5b883f Mon Sep 17 00:00:00 2001
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date: Thu, 7 Feb 2019 10:54:53 +0200
Subject: drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set

Make sure the underlying VMA in the process address space is the
same as it was during vm_mmap to avoid applying WC to wrong VMA.

A more long-term solution would be to have vm_mmap_locked variant
in linux/mmap.h for when caller wants to hold mmap_sem for an
extended duration.

v2:
- Refactor the compare function

Fixes: 1816f9236303 ("drm/i915: Support creation of unbound wc user mappings for objects")
Reported-by: Adam Zabrocki <adamza@microsoft.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Adam Zabrocki <adamza@microsoft.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-1-joonas.lahtinen@linux.intel.com
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 52b5a24be42b..2ea9b423de49 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1679,6 +1679,16 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static inline bool
+__vma_matches(struct vm_area_struct *vma, struct file *filp,
+	      unsigned long addr, unsigned long size)
+{
+	if (vma->vm_file != filp)
+		return false;
+
+	return vma->vm_start == addr && (vma->vm_end - vma->vm_start) == size;
+}
+
 /**
  * i915_gem_mmap_ioctl - Maps the contents of an object, returning the address
  *			 it is mapped to.
@@ -1737,7 +1747,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 			return -EINTR;
 		}
 		vma = find_vma(mm, addr);
-		if (vma)
+		if (vma && __vma_matches(vma, obj->base.filp, addr, args->size))
 			vma->vm_page_prot =
 				pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
 		else
-- 
cgit v1.2.3


From ebfb6977801da521d8d5d752d373a187e2a2b9b3 Mon Sep 17 00:00:00 2001
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date: Thu, 7 Feb 2019 10:54:54 +0200
Subject: drm/i915: Handle vm_mmap error during I915_GEM_MMAP ioctl with WC set

Add err goto label and use it when VMA can't be established or changes
underneath.

v2:
- Dropping Fixes: as it's indeed impossible to race an object to the
  error address. (Chris)
v3:
- Use IS_ERR_VALUE (Chris)

Reported-by: Adam Zabrocki <adamza@microsoft.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Adam Zabrocki <adamza@microsoft.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v2
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-2-joonas.lahtinen@linux.intel.com
---
 drivers/gpu/drm/i915/i915_gem.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2ea9b423de49..1d94bb61a779 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1738,6 +1738,9 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 	addr = vm_mmap(obj->base.filp, 0, args->size,
 		       PROT_READ | PROT_WRITE, MAP_SHARED,
 		       args->offset);
+	if (IS_ERR_VALUE(addr))
+		goto err;
+
 	if (args->flags & I915_MMAP_WC) {
 		struct mm_struct *mm = current->mm;
 		struct vm_area_struct *vma;
@@ -1753,17 +1756,22 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 		else
 			addr = -ENOMEM;
 		up_write(&mm->mmap_sem);
+		if (IS_ERR_VALUE(addr))
+			goto err;
 
 		/* This may race, but that's ok, it only gets set */
 		WRITE_ONCE(obj->frontbuffer_ggtt_origin, ORIGIN_CPU);
 	}
 	i915_gem_object_put(obj);
-	if (IS_ERR((void *)addr))
-		return addr;
 
 	args->addr_ptr = (u64)addr;
 
 	return 0;
+
+err:
+	i915_gem_object_put(obj);
+
+	return addr;
 }
 
 static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
-- 
cgit v1.2.3


From d6f328bfeb0b0a9304166c52ec15c08076ca5246 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu, 7 Feb 2019 07:18:22 +0000
Subject: drm/i915: Hack and slash, throttle execbuffer hogs

Apply backpressure to hogs that emit requests faster than the GPU can
process them by waiting for their ring to be less than half-full before
proceeding with taking the struct_mutex.

This is a gross hack to apply throttling backpressure, the long term
goal is to remove the struct_mutex contention so that each client
naturally waits, preferably in an asynchronous, nonblocking fashion
(pipelined operations for the win), for their own resources and never
blocks another client within the driver at least. (Realtime priority
goals would extend to ensuring that resource contention favours high
priority clients as well.)

This patch only limits excessive request production and does not attempt
to throttle clients that block waiting for eviction (either global GTT or
system memory) or any other global resources, see above for the long term
goal.

No microbenchmarks are harmed (to the best of my knowledge).

Testcase: igt/gem_exec_schedule/pi-ringfull-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 67 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 13 ------
 drivers/gpu/drm/i915/intel_ringbuffer.h    | 12 ++++++
 3 files changed, 79 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8eedf7cac493..02adcaf6ebea 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -753,6 +753,68 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	return 0;
 }
 
+static struct i915_request *__eb_wait_for_ring(struct intel_ring *ring)
+{
+	struct i915_request *rq;
+
+	/*
+	 * Completely unscientific finger-in-the-air estimates for suitable
+	 * maximum user request size (to avoid blocking) and then backoff.
+	 */
+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Find a request that after waiting upon, there will be at least half
+	 * the ring available. The hysteresis allows us to compete for the
+	 * shared ring and should mean that we sleep less often prior to
+	 * claiming our resources, but not so long that the ring completely
+	 * drains before we can submit our next request.
+	 */
+	list_for_each_entry(rq, &ring->request_list, ring_link) {
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit, ring->size) > ring->size / 2)
+			break;
+	}
+	if (&rq->ring_link == &ring->request_list)
+		return NULL; /* weird, we will check again later for real */
+
+	return i915_request_get(rq);
+}
+
+static int eb_wait_for_ring(const struct i915_execbuffer *eb)
+{
+	const struct intel_context *ce;
+	struct i915_request *rq;
+	int ret = 0;
+
+	/*
+	 * Apply a light amount of backpressure to prevent excessive hogs
+	 * from blocking waiting for space whilst holding struct_mutex and
+	 * keeping all of their resources pinned.
+	 */
+
+	ce = to_intel_context(eb->ctx, eb->engine);
+	if (!ce->ring) /* first use, assume empty! */
+		return 0;
+
+	rq = __eb_wait_for_ring(ce->ring);
+	if (rq) {
+		mutex_unlock(&eb->i915->drm.struct_mutex);
+
+		if (i915_request_wait(rq,
+				      I915_WAIT_INTERRUPTIBLE,
+				      MAX_SCHEDULE_TIMEOUT) < 0)
+			ret = -EINTR;
+
+		i915_request_put(rq);
+
+		mutex_lock(&eb->i915->drm.struct_mutex);
+	}
+
+	return ret;
+}
+
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct radix_tree_root *handles_vma = &eb->ctx->handles_vma;
@@ -2291,6 +2353,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_rpm;
 
+	err = eb_wait_for_ring(&eb); /* may temporarily drop struct_mutex */
+	if (unlikely(err))
+		goto err_unlock;
+
 	err = eb_relocate(&eb);
 	if (err) {
 		/*
@@ -2435,6 +2501,7 @@ err_batch_unpin:
 err_vma:
 	if (eb.exec)
 		eb_release_vmas(&eb);
+err_unlock:
 	mutex_unlock(&dev->struct_mutex);
 err_rpm:
 	intel_runtime_pm_put(eb.i915, wakeref);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b889b27f8aeb..7f841dba87b3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -49,19 +49,6 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 		I915_GEM_HWS_INDEX_ADDR);
 }
 
-static unsigned int __intel_ring_space(unsigned int head,
-				       unsigned int tail,
-				       unsigned int size)
-{
-	/*
-	 * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
-	 * same cacheline, the Head Pointer must not be greater than the Tail
-	 * Pointer."
-	 */
-	GEM_BUG_ON(!is_power_of_2(size));
-	return (head - tail - CACHELINE_BYTES) & (size - 1);
-}
-
 unsigned int intel_ring_update_space(struct intel_ring *ring)
 {
 	unsigned int space;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4d4ea6963a72..710ffb221775 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -832,6 +832,18 @@ intel_ring_set_tail(struct intel_ring *ring, unsigned int tail)
 	return tail;
 }
 
+static inline unsigned int
+__intel_ring_space(unsigned int head, unsigned int tail, unsigned int size)
+{
+	/*
+	 * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
+	 * same cacheline, the Head Pointer must not be greater than the Tail
+	 * Pointer."
+	 */
+	GEM_BUG_ON(!is_power_of_2(size));
+	return (head - tail - CACHELINE_BYTES) & (size - 1);
+}
+
 void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno);
 
 int intel_engine_setup_common(struct intel_engine_cs *engine);
-- 
cgit v1.2.3


From 440e84a52ae965c3ceb3153ad59e8ed6e8df9ef4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Wed, 6 Feb 2019 20:54:33 +0200
Subject: drm/i915: Don't set update_wm_post on g4x+
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

update_wm_post is meant for pre-g4x only. Don't ever set
it on g4x+.

The only effect of a bogus update_wm_post on g4x+ could
be that we clear the legacy_cursor_update flag in
intel_atomic_commit(). Since legacy_cursor_update is
only set for legacy cursor updates (as the name suggests)
and we only set update_wm_post for a modeset the two
cases should never occur at the same time. But let's
be consistent in setting update_wm_post so we don't
end up confusing so many people.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190206185433.8116-1-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 9db931a450f6..bf564a638dca 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11063,7 +11063,8 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
 	int ret;
 	bool mode_changed = needs_modeset(crtc_state);
 
-	if (mode_changed && !crtc_state->active)
+	if (INTEL_GEN(dev_priv) < 5 && !IS_G4X(dev_priv) &&
+	    mode_changed && !crtc_state->active)
 		pipe_config->update_wm_post = true;
 
 	if (mode_changed && crtc_state->enable &&
-- 
cgit v1.2.3


From 7eb31a0bb2c1df7328692c875a6423b406b09ad1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 18:08:36 +0200
Subject: drm/i915: Split the gamma/csc enable bits from the plane_ctl()
 function
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On g4x+ the pipe gamma enable bit for the primary plane affects
the pipe bottom color as well. The same for the pipe csc enable
bit on ilk+. Thus we must configure those bits correctly even
when the primary plane is disabled.

To make the feasible let's split those settings from the
plane_ctl() function into a seprate funciton that we can
call from the ->disable_plane() hook as well.

For consistency we'll do that on all the plane types. While
that has no real benefits at this time, it'll become useful
when we start to control the pipe gamma/csc enable bits
dynamically when we overhaul the color management code.

On pre-g4x there doesn't appear to be any way to gamma
correct the pipe bottom color, but sticking to the same
pattern doesn't hurt. And it'll still help us to do
crtc state readout correctly for the pipe gamma enable
bit for the color management overhaul.

An alternative apporach would be to still precompute these
bits into plane_state->ctl, but that would require that we
run through the plane check even when the plane isn't logically
enabled on any crtc. Currently that condition causes us to
short circuit the entire thing and not call ->check_plane().
There would also be some chicken and egg problems with
->check_plane() vs. crtc color state check that would
requite splitting certain things into multiple steps.
So all in all this seems like the easier route.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-2-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 128 +++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_drv.h     |   3 +-
 drivers/gpu/drm/i915/intel_sprite.c  |  54 ++++++++++++---
 3 files changed, 139 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index bf564a638dca..be5a73b6805a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3215,28 +3215,38 @@ i9xx_plane_max_stride(struct intel_plane *plane,
 	}
 }
 
+static u32 i9xx_plane_ctl_crtc(const struct intel_crtc_state *crtc_state)
+{
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	u32 dspcntr = 0;
+
+	dspcntr |= DISPPLANE_GAMMA_ENABLE;
+
+	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
+		dspcntr |= DISPPLANE_PIPE_CSC_ENABLE;
+
+	if (INTEL_GEN(dev_priv) < 5)
+		dspcntr |= DISPPLANE_SEL_PIPE(crtc->pipe);
+
+	return dspcntr;
+}
+
 static u32 i9xx_plane_ctl(const struct intel_crtc_state *crtc_state,
 			  const struct intel_plane_state *plane_state)
 {
 	struct drm_i915_private *dev_priv =
 		to_i915(plane_state->base.plane->dev);
-	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
 	const struct drm_framebuffer *fb = plane_state->base.fb;
 	unsigned int rotation = plane_state->base.rotation;
 	u32 dspcntr;
 
-	dspcntr = DISPLAY_PLANE_ENABLE | DISPPLANE_GAMMA_ENABLE;
+	dspcntr = DISPLAY_PLANE_ENABLE;
 
 	if (IS_G4X(dev_priv) || IS_GEN(dev_priv, 5) ||
 	    IS_GEN(dev_priv, 6) || IS_IVYBRIDGE(dev_priv))
 		dspcntr |= DISPPLANE_TRICKLE_FEED_DISABLE;
 
-	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
-		dspcntr |= DISPPLANE_PIPE_CSC_ENABLE;
-
-	if (INTEL_GEN(dev_priv) < 5)
-		dspcntr |= DISPPLANE_SEL_PIPE(crtc->pipe);
-
 	switch (fb->format->format) {
 	case DRM_FORMAT_C8:
 		dspcntr |= DISPPLANE_8BPP;
@@ -3364,11 +3374,13 @@ static void i9xx_update_plane(struct intel_plane *plane,
 	struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
 	enum i9xx_plane_id i9xx_plane = plane->i9xx_plane;
 	u32 linear_offset;
-	u32 dspcntr = plane_state->ctl;
 	int x = plane_state->color_plane[0].x;
 	int y = plane_state->color_plane[0].y;
 	unsigned long irqflags;
 	u32 dspaddr_offset;
+	u32 dspcntr;
+
+	dspcntr = plane_state->ctl | i9xx_plane_ctl_crtc(crtc_state);
 
 	linear_offset = intel_fb_xy_to_linear(x, y, plane_state, 0);
 
@@ -3428,10 +3440,23 @@ static void i9xx_disable_plane(struct intel_plane *plane,
 	struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
 	enum i9xx_plane_id i9xx_plane = plane->i9xx_plane;
 	unsigned long irqflags;
+	u32 dspcntr;
+
+	/*
+	 * DSPCNTR pipe gamma enable on g4x+ and pipe csc
+	 * enable on ilk+ affect the pipe bottom color as
+	 * well, so we must configure them even if the plane
+	 * is disabled.
+	 *
+	 * On pre-g4x there is no way to gamma correct the
+	 * pipe bottom color but we'll keep on doing this
+	 * anyway.
+	 */
+	dspcntr = i9xx_plane_ctl_crtc(crtc_state);
 
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-	I915_WRITE_FW(DSPCNTR(i9xx_plane), 0);
+	I915_WRITE_FW(DSPCNTR(i9xx_plane), dspcntr);
 	if (INTEL_GEN(dev_priv) >= 4)
 		I915_WRITE_FW(DSPSURF(i9xx_plane), 0);
 	else
@@ -3668,6 +3693,20 @@ static u32 cnl_plane_ctl_flip(unsigned int reflect)
 	return 0;
 }
 
+u32 skl_plane_ctl_crtc(const struct intel_crtc_state *crtc_state)
+{
+	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
+	u32 plane_ctl = 0;
+
+	if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
+		return plane_ctl;
+
+	plane_ctl |= PLANE_CTL_PIPE_GAMMA_ENABLE;
+	plane_ctl |= PLANE_CTL_PIPE_CSC_ENABLE;
+
+	return plane_ctl;
+}
+
 u32 skl_plane_ctl(const struct intel_crtc_state *crtc_state,
 		  const struct intel_plane_state *plane_state)
 {
@@ -3682,10 +3721,7 @@ u32 skl_plane_ctl(const struct intel_crtc_state *crtc_state,
 
 	if (INTEL_GEN(dev_priv) < 10 && !IS_GEMINILAKE(dev_priv)) {
 		plane_ctl |= skl_plane_ctl_alpha(plane_state);
-		plane_ctl |=
-			PLANE_CTL_PIPE_GAMMA_ENABLE |
-			PLANE_CTL_PIPE_CSC_ENABLE |
-			PLANE_CTL_PLANE_GAMMA_DISABLE;
+		plane_ctl |= PLANE_CTL_PLANE_GAMMA_DISABLE;
 
 		if (plane_state->base.color_encoding == DRM_COLOR_YCBCR_BT709)
 			plane_ctl |= PLANE_CTL_YUV_TO_RGB_CSC_FORMAT_BT709;
@@ -3710,19 +3746,27 @@ u32 skl_plane_ctl(const struct intel_crtc_state *crtc_state,
 	return plane_ctl;
 }
 
+u32 glk_plane_color_ctl_crtc(const struct intel_crtc_state *crtc_state)
+{
+	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
+	u32 plane_color_ctl = 0;
+
+	if (INTEL_GEN(dev_priv) >= 11)
+		return plane_color_ctl;
+
+	plane_color_ctl |= PLANE_COLOR_PIPE_GAMMA_ENABLE;
+	plane_color_ctl |= PLANE_COLOR_PIPE_CSC_ENABLE;
+
+	return plane_color_ctl;
+}
+
 u32 glk_plane_color_ctl(const struct intel_crtc_state *crtc_state,
 			const struct intel_plane_state *plane_state)
 {
-	struct drm_i915_private *dev_priv =
-		to_i915(plane_state->base.plane->dev);
 	const struct drm_framebuffer *fb = plane_state->base.fb;
 	struct intel_plane *plane = to_intel_plane(plane_state->base.plane);
 	u32 plane_color_ctl = 0;
 
-	if (INTEL_GEN(dev_priv) < 11) {
-		plane_color_ctl |= PLANE_COLOR_PIPE_GAMMA_ENABLE;
-		plane_color_ctl |= PLANE_COLOR_PIPE_CSC_ENABLE;
-	}
 	plane_color_ctl |= PLANE_COLOR_PLANE_GAMMA_DISABLE;
 	plane_color_ctl |= glk_plane_color_ctl_alpha(plane_state);
 
@@ -9945,11 +9989,15 @@ i845_cursor_max_stride(struct intel_plane *plane,
 	return 2048;
 }
 
+static u32 i845_cursor_ctl_crtc(const struct intel_crtc_state *crtc_state)
+{
+	return CURSOR_GAMMA_ENABLE;
+}
+
 static u32 i845_cursor_ctl(const struct intel_crtc_state *crtc_state,
 			   const struct intel_plane_state *plane_state)
 {
 	return CURSOR_ENABLE |
-		CURSOR_GAMMA_ENABLE |
 		CURSOR_FORMAT_ARGB |
 		CURSOR_STRIDE(plane_state->color_plane[0].stride);
 }
@@ -10019,7 +10067,9 @@ static void i845_update_cursor(struct intel_plane *plane,
 		unsigned int width = plane_state->base.crtc_w;
 		unsigned int height = plane_state->base.crtc_h;
 
-		cntl = plane_state->ctl;
+		cntl = plane_state->ctl |
+			i845_cursor_ctl_crtc(crtc_state);
+
 		size = (height << 12) | width;
 
 		base = intel_cursor_base(plane_state);
@@ -10086,27 +10136,36 @@ i9xx_cursor_max_stride(struct intel_plane *plane,
 	return plane->base.dev->mode_config.cursor_width * 4;
 }
 
-static u32 i9xx_cursor_ctl(const struct intel_crtc_state *crtc_state,
-			   const struct intel_plane_state *plane_state)
+static u32 i9xx_cursor_ctl_crtc(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_i915_private *dev_priv =
-		to_i915(plane_state->base.plane->dev);
 	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
 	u32 cntl = 0;
 
-	if (IS_GEN(dev_priv, 6) || IS_IVYBRIDGE(dev_priv))
-		cntl |= MCURSOR_TRICKLE_FEED_DISABLE;
+	if (INTEL_GEN(dev_priv) >= 11)
+		return cntl;
 
-	if (INTEL_GEN(dev_priv) <= 10) {
-		cntl |= MCURSOR_GAMMA_ENABLE;
+	cntl |= MCURSOR_GAMMA_ENABLE;
 
-		if (HAS_DDI(dev_priv))
-			cntl |= MCURSOR_PIPE_CSC_ENABLE;
-	}
+	if (HAS_DDI(dev_priv))
+		cntl |= MCURSOR_PIPE_CSC_ENABLE;
 
 	if (INTEL_GEN(dev_priv) < 5 && !IS_G4X(dev_priv))
 		cntl |= MCURSOR_PIPE_SELECT(crtc->pipe);
 
+	return cntl;
+}
+
+static u32 i9xx_cursor_ctl(const struct intel_crtc_state *crtc_state,
+			   const struct intel_plane_state *plane_state)
+{
+	struct drm_i915_private *dev_priv =
+		to_i915(plane_state->base.plane->dev);
+	u32 cntl = 0;
+
+	if (IS_GEN(dev_priv, 6) || IS_IVYBRIDGE(dev_priv))
+		cntl |= MCURSOR_TRICKLE_FEED_DISABLE;
+
 	switch (plane_state->base.crtc_w) {
 	case 64:
 		cntl |= MCURSOR_MODE_64_ARGB_AX;
@@ -10231,7 +10290,8 @@ static void i9xx_update_cursor(struct intel_plane *plane,
 	unsigned long irqflags;
 
 	if (plane_state && plane_state->base.visible) {
-		cntl = plane_state->ctl;
+		cntl = plane_state->ctl |
+			i9xx_cursor_ctl_crtc(crtc_state);
 
 		if (plane_state->base.crtc_h != plane_state->base.crtc_w)
 			fbc_ctl = CUR_FBC_CTL_EN | (plane_state->base.crtc_h - 1);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 79f102dfe37c..9717b1250e6f 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1765,9 +1765,10 @@ static inline u32 intel_plane_ggtt_offset(const struct intel_plane_state *state)
 
 u32 glk_plane_color_ctl(const struct intel_crtc_state *crtc_state,
 			const struct intel_plane_state *plane_state);
+u32 glk_plane_color_ctl_crtc(const struct intel_crtc_state *crtc_state);
 u32 skl_plane_ctl(const struct intel_crtc_state *crtc_state,
 		  const struct intel_plane_state *plane_state);
-u32 glk_color_ctl(const struct intel_plane_state *plane_state);
+u32 skl_plane_ctl_crtc(const struct intel_crtc_state *crtc_state);
 u32 skl_plane_stride(const struct intel_plane_state *plane_state,
 		     int plane);
 int skl_check_plane_surface(struct intel_plane_state *plane_state);
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index cd42e81f8a90..b56a1a9ad01d 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -484,9 +484,16 @@ skl_program_plane(struct intel_plane *plane,
 	struct intel_plane *linked = plane_state->linked_plane;
 	const struct drm_framebuffer *fb = plane_state->base.fb;
 	u8 alpha = plane_state->base.alpha >> 8;
+	u32 plane_color_ctl = 0;
 	unsigned long irqflags;
 	u32 keymsk, keymax;
 
+	plane_ctl |= skl_plane_ctl_crtc(crtc_state);
+
+	if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
+		plane_color_ctl = plane_state->color_ctl |
+			glk_plane_color_ctl_crtc(crtc_state);
+
 	/* Sizes are 0 based */
 	src_w--;
 	src_h--;
@@ -533,8 +540,7 @@ skl_program_plane(struct intel_plane *plane,
 	}
 
 	if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
-		I915_WRITE_FW(PLANE_COLOR_CTL(pipe, plane_id),
-			      plane_state->color_ctl);
+		I915_WRITE_FW(PLANE_COLOR_CTL(pipe, plane_id), plane_color_ctl);
 
 	if (fb->format->is_yuv && icl_is_hdr_plane(plane))
 		icl_program_input_csc(plane, crtc_state, plane_state);
@@ -733,6 +739,11 @@ vlv_update_clrc(const struct intel_plane_state *plane_state)
 		      SP_SH_SIN(sh_sin) | SP_SH_COS(sh_cos));
 }
 
+static u32 vlv_sprite_ctl_crtc(const struct intel_crtc_state *crtc_state)
+{
+	return SP_GAMMA_ENABLE;
+}
+
 static u32 vlv_sprite_ctl(const struct intel_crtc_state *crtc_state,
 			  const struct intel_plane_state *plane_state)
 {
@@ -741,7 +752,7 @@ static u32 vlv_sprite_ctl(const struct intel_crtc_state *crtc_state,
 	const struct drm_intel_sprite_colorkey *key = &plane_state->ckey;
 	u32 sprctl;
 
-	sprctl = SP_ENABLE | SP_GAMMA_ENABLE;
+	sprctl = SP_ENABLE;
 
 	switch (fb->format->format) {
 	case DRM_FORMAT_YUYV:
@@ -808,7 +819,6 @@ vlv_update_plane(struct intel_plane *plane,
 	struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
 	enum pipe pipe = plane->pipe;
 	enum plane_id plane_id = plane->id;
-	u32 sprctl = plane_state->ctl;
 	u32 sprsurf_offset = plane_state->color_plane[0].offset;
 	u32 linear_offset;
 	const struct drm_intel_sprite_colorkey *key = &plane_state->ckey;
@@ -819,6 +829,9 @@ vlv_update_plane(struct intel_plane *plane,
 	u32 x = plane_state->color_plane[0].x;
 	u32 y = plane_state->color_plane[0].y;
 	unsigned long irqflags;
+	u32 sprctl;
+
+	sprctl = plane_state->ctl | vlv_sprite_ctl_crtc(crtc_state);
 
 	/* Sizes are 0 based */
 	crtc_w--;
@@ -901,6 +914,19 @@ vlv_plane_get_hw_state(struct intel_plane *plane,
 	return ret;
 }
 
+static u32 ivb_sprite_ctl_crtc(const struct intel_crtc_state *crtc_state)
+{
+	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
+	u32 sprctl = 0;
+
+	sprctl |= SPRITE_GAMMA_ENABLE;
+
+	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
+		sprctl |= SPRITE_PIPE_CSC_ENABLE;
+
+	return sprctl;
+}
+
 static u32 ivb_sprite_ctl(const struct intel_crtc_state *crtc_state,
 			  const struct intel_plane_state *plane_state)
 {
@@ -911,14 +937,11 @@ static u32 ivb_sprite_ctl(const struct intel_crtc_state *crtc_state,
 	const struct drm_intel_sprite_colorkey *key = &plane_state->ckey;
 	u32 sprctl;
 
-	sprctl = SPRITE_ENABLE | SPRITE_GAMMA_ENABLE;
+	sprctl = SPRITE_ENABLE;
 
 	if (IS_IVYBRIDGE(dev_priv))
 		sprctl |= SPRITE_TRICKLE_FEED_DISABLE;
 
-	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
-		sprctl |= SPRITE_PIPE_CSC_ENABLE;
-
 	switch (fb->format->format) {
 	case DRM_FORMAT_XBGR8888:
 		sprctl |= SPRITE_FORMAT_RGBX888 | SPRITE_RGB_ORDER_RGBX;
@@ -970,7 +993,6 @@ ivb_update_plane(struct intel_plane *plane,
 {
 	struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
 	enum pipe pipe = plane->pipe;
-	u32 sprctl = plane_state->ctl, sprscale = 0;
 	u32 sprsurf_offset = plane_state->color_plane[0].offset;
 	u32 linear_offset;
 	const struct drm_intel_sprite_colorkey *key = &plane_state->ckey;
@@ -982,8 +1004,11 @@ ivb_update_plane(struct intel_plane *plane,
 	u32 y = plane_state->color_plane[0].y;
 	u32 src_w = drm_rect_width(&plane_state->base.src) >> 16;
 	u32 src_h = drm_rect_height(&plane_state->base.src) >> 16;
+	u32 sprctl, sprscale = 0;
 	unsigned long irqflags;
 
+	sprctl = plane_state->ctl | ivb_sprite_ctl_crtc(crtc_state);
+
 	/* Sizes are 0 based */
 	src_w--;
 	src_h--;
@@ -1080,6 +1105,11 @@ g4x_sprite_max_stride(struct intel_plane *plane,
 	return 16384;
 }
 
+static u32 g4x_sprite_ctl_crtc(const struct intel_crtc_state *crtc_state)
+{
+	return DVS_GAMMA_ENABLE;
+}
+
 static u32 g4x_sprite_ctl(const struct intel_crtc_state *crtc_state,
 			  const struct intel_plane_state *plane_state)
 {
@@ -1090,7 +1120,7 @@ static u32 g4x_sprite_ctl(const struct intel_crtc_state *crtc_state,
 	const struct drm_intel_sprite_colorkey *key = &plane_state->ckey;
 	u32 dvscntr;
 
-	dvscntr = DVS_ENABLE | DVS_GAMMA_ENABLE;
+	dvscntr = DVS_ENABLE;
 
 	if (IS_GEN(dev_priv, 6))
 		dvscntr |= DVS_TRICKLE_FEED_DISABLE;
@@ -1146,7 +1176,6 @@ g4x_update_plane(struct intel_plane *plane,
 {
 	struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
 	enum pipe pipe = plane->pipe;
-	u32 dvscntr = plane_state->ctl, dvsscale = 0;
 	u32 dvssurf_offset = plane_state->color_plane[0].offset;
 	u32 linear_offset;
 	const struct drm_intel_sprite_colorkey *key = &plane_state->ckey;
@@ -1158,8 +1187,11 @@ g4x_update_plane(struct intel_plane *plane,
 	u32 y = plane_state->color_plane[0].y;
 	u32 src_w = drm_rect_width(&plane_state->base.src) >> 16;
 	u32 src_h = drm_rect_height(&plane_state->base.src) >> 16;
+	u32 dvscntr, dvsscale = 0;
 	unsigned long irqflags;
 
+	dvscntr = plane_state->ctl | g4x_sprite_ctl_crtc(crtc_state);
+
 	/* Sizes are 0 based */
 	src_w--;
 	src_h--;
-- 
cgit v1.2.3


From 5f4f3e386b36952b2fa3d3935342a25d6e1c6d28 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 18:08:37 +0200
Subject: drm/i915: Precompute gamma_mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We shouldn't be computing gamma mode during the commit phase.
Move it to the check phase.

v2: Reword comments a bit (Matt)
    Rebase

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-3-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_color.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_color.c b/drivers/gpu/drm/i915/intel_color.c
index 31fa599b0d61..057f0600840a 100644
--- a/drivers/gpu/drm/i915/intel_color.c
+++ b/drivers/gpu/drm/i915/intel_color.c
@@ -375,8 +375,7 @@ static void haswell_load_luts(struct intel_crtc_state *crtc_state)
 		reenable_ips = true;
 	}
 
-	crtc_state->gamma_mode = GAMMA_MODE_MODE_8BIT;
-	I915_WRITE(GAMMA_MODE(crtc->pipe), GAMMA_MODE_MODE_8BIT);
+	I915_WRITE(GAMMA_MODE(crtc->pipe), crtc_state->gamma_mode);
 
 	i9xx_load_luts(crtc_state);
 
@@ -476,9 +475,7 @@ static void broadwell_load_luts(struct intel_crtc_state *crtc_state)
 	bdw_load_gamma_lut(crtc_state,
 			   INTEL_INFO(dev_priv)->color.degamma_lut_size);
 
-	crtc_state->gamma_mode = GAMMA_MODE_MODE_SPLIT;
-	I915_WRITE(GAMMA_MODE(pipe), GAMMA_MODE_MODE_SPLIT);
-	POSTING_READ(GAMMA_MODE(pipe));
+	I915_WRITE(GAMMA_MODE(pipe), crtc_state->gamma_mode);
 
 	/*
 	 * Reset the index, otherwise it prevents the legacy palette to be
@@ -532,9 +529,7 @@ static void glk_load_luts(struct intel_crtc_state *crtc_state)
 
 	bdw_load_gamma_lut(crtc_state, 0);
 
-	crtc_state->gamma_mode = GAMMA_MODE_MODE_10BIT;
-	I915_WRITE(GAMMA_MODE(pipe), GAMMA_MODE_MODE_10BIT);
-	POSTING_READ(GAMMA_MODE(pipe));
+	I915_WRITE(GAMMA_MODE(pipe), crtc_state->gamma_mode);
 }
 
 /* Loads the palette/gamma unit for the CRTC on CherryView. */
@@ -634,8 +629,10 @@ int intel_color_check(struct intel_crtc_state *crtc_state)
 	gamma_tests = INTEL_INFO(dev_priv)->color.gamma_lut_tests;
 
 	/* Always allow legacy gamma LUT with no further checking. */
-	if (crtc_state_is_legacy_gamma(crtc_state))
+	if (crtc_state_is_legacy_gamma(crtc_state)) {
+		crtc_state->gamma_mode = GAMMA_MODE_MODE_8BIT;
 		return 0;
+	}
 
 	if (check_lut_size(crtc_state->base.degamma_lut, degamma_length) ||
 	    check_lut_size(crtc_state->base.gamma_lut, gamma_length))
@@ -645,6 +642,12 @@ int intel_color_check(struct intel_crtc_state *crtc_state)
 	    drm_color_lut_check(crtc_state->base.gamma_lut, gamma_tests))
 		return -EINVAL;
 
+	if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
+		crtc_state->gamma_mode = GAMMA_MODE_MODE_10BIT;
+	else if (INTEL_GEN(dev_priv) >= 9 || IS_BROADWELL(dev_priv))
+		crtc_state->gamma_mode = GAMMA_MODE_MODE_SPLIT;
+	else
+		crtc_state->gamma_mode = GAMMA_MODE_MODE_8BIT;
 
 	return 0;
 }
-- 
cgit v1.2.3


From 23b03a272c2beed43fa7468971b94a4b267cebbd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 18:08:38 +0200
Subject: drm/i915: Constify the state arguments to the color management stuff
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pass the crtc state etc. as const to the color management commit
functions. And while at it polish some of the local variables.

v2: Rebase

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-4-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h    |   4 +-
 drivers/gpu/drm/i915/intel_color.c | 140 ++++++++++++++++++++-----------------
 drivers/gpu/drm/i915/intel_drv.h   |   4 +-
 3 files changed, 80 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a2293152cb6a..1ca96376651b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -323,8 +323,8 @@ struct drm_i915_display_funcs {
 	/* display clock increase/decrease */
 	/* pll clock increase/decrease */
 
-	void (*load_csc_matrix)(struct intel_crtc_state *crtc_state);
-	void (*load_luts)(struct intel_crtc_state *crtc_state);
+	void (*load_csc_matrix)(const struct intel_crtc_state *crtc_state);
+	void (*load_luts)(const struct intel_crtc_state *crtc_state);
 };
 
 #define CSR_VERSION(major, minor)	((major) << 16 | (minor))
diff --git a/drivers/gpu/drm/i915/intel_color.c b/drivers/gpu/drm/i915/intel_color.c
index 057f0600840a..96d4ece4870c 100644
--- a/drivers/gpu/drm/i915/intel_color.c
+++ b/drivers/gpu/drm/i915/intel_color.c
@@ -74,12 +74,12 @@
 #define ILK_CSC_COEFF_1_0		\
 	((7 << 12) | ILK_CSC_COEFF_FP(CTM_COEFF_1_0, 8))
 
-static bool lut_is_legacy(struct drm_property_blob *lut)
+static bool lut_is_legacy(const struct drm_property_blob *lut)
 {
 	return drm_color_lut_size(lut) == LEGACY_LUT_LENGTH;
 }
 
-static bool crtc_state_is_legacy_gamma(struct intel_crtc_state *crtc_state)
+static bool crtc_state_is_legacy_gamma(const struct intel_crtc_state *crtc_state)
 {
 	return !crtc_state->base.degamma_lut &&
 		!crtc_state->base.ctm &&
@@ -115,8 +115,8 @@ static u64 *ctm_mult_by_limited(u64 *result, const u64 *input)
 
 static void ilk_load_ycbcr_conversion_matrix(struct intel_crtc *crtc)
 {
-	int pipe = crtc->pipe;
 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	enum pipe pipe = crtc->pipe;
 
 	I915_WRITE(PIPE_CSC_PREOFF_HI(pipe), 0);
 	I915_WRITE(PIPE_CSC_PREOFF_ME(pipe), 0);
@@ -137,13 +137,14 @@ static void ilk_load_ycbcr_conversion_matrix(struct intel_crtc *crtc)
 	I915_WRITE(PIPE_CSC_MODE(pipe), 0);
 }
 
-static void ilk_load_csc_matrix(struct intel_crtc_state *crtc_state)
+static void ilk_load_csc_matrix(const struct intel_crtc_state *crtc_state)
 {
 	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
-	int i, pipe = crtc->pipe;
-	u16 coeffs[9] = { 0, };
 	bool limited_color_range = false;
+	enum pipe pipe = crtc->pipe;
+	u16 coeffs[9] = {};
+	int i;
 
 	/*
 	 * FIXME if there's a gamma LUT after the CSC, we should
@@ -256,16 +257,16 @@ static void ilk_load_csc_matrix(struct intel_crtc_state *crtc_state)
 /*
  * Set up the pipe CSC unit on CherryView.
  */
-static void cherryview_load_csc_matrix(struct intel_crtc_state *crtc_state)
+static void cherryview_load_csc_matrix(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_device *dev = crtc_state->base.crtc->dev;
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	int pipe = to_intel_crtc(crtc_state->base.crtc)->pipe;
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	enum pipe pipe = crtc->pipe;
 	u32 mode;
 
 	if (crtc_state->base.ctm) {
-		struct drm_color_ctm *ctm = crtc_state->base.ctm->data;
-		u16 coeffs[9] = { 0, };
+		const struct drm_color_ctm *ctm = crtc_state->base.ctm->data;
+		u16 coeffs[9] = {};
 		int i;
 
 		for (i = 0; i < ARRAY_SIZE(coeffs); i++) {
@@ -303,18 +304,17 @@ static void cherryview_load_csc_matrix(struct intel_crtc_state *crtc_state)
 	I915_WRITE(CGM_PIPE_MODE(pipe), mode);
 }
 
-void intel_color_set_csc(struct intel_crtc_state *crtc_state)
+void intel_color_set_csc(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_device *dev = crtc_state->base.crtc->dev;
-	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
 
 	if (dev_priv->display.load_csc_matrix)
 		dev_priv->display.load_csc_matrix(crtc_state);
 }
 
 /* Loads the legacy palette/gamma unit for the CRTC. */
-static void i9xx_load_luts_internal(struct intel_crtc_state *crtc_state,
-				    struct drm_property_blob *blob)
+static void i9xx_load_luts_internal(const struct intel_crtc_state *crtc_state,
+				    const struct drm_property_blob *blob)
 {
 	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
@@ -329,7 +329,8 @@ static void i9xx_load_luts_internal(struct intel_crtc_state *crtc_state,
 	}
 
 	if (blob) {
-		struct drm_color_lut *lut = blob->data;
+		const struct drm_color_lut *lut = blob->data;
+
 		for (i = 0; i < 256; i++) {
 			u32 word =
 				(drm_color_lut_extract(lut[i].red, 8) << 16) |
@@ -353,13 +354,13 @@ static void i9xx_load_luts_internal(struct intel_crtc_state *crtc_state,
 	}
 }
 
-static void i9xx_load_luts(struct intel_crtc_state *crtc_state)
+static void i9xx_load_luts(const struct intel_crtc_state *crtc_state)
 {
 	i9xx_load_luts_internal(crtc_state, crtc_state->base.gamma_lut);
 }
 
 /* Loads the legacy palette/gamma unit for the CRTC on Haswell. */
-static void haswell_load_luts(struct intel_crtc_state *crtc_state)
+static void haswell_load_luts(const struct intel_crtc_state *crtc_state)
 {
 	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
@@ -383,17 +384,19 @@ static void haswell_load_luts(struct intel_crtc_state *crtc_state)
 		hsw_enable_ips(crtc_state);
 }
 
-static void bdw_load_degamma_lut(struct intel_crtc_state *crtc_state)
+static void bdw_load_degamma_lut(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
-	enum pipe pipe = to_intel_crtc(crtc_state->base.crtc)->pipe;
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	const struct drm_property_blob *degamma_lut = crtc_state->base.degamma_lut;
 	u32 i, lut_size = INTEL_INFO(dev_priv)->color.degamma_lut_size;
+	enum pipe pipe = crtc->pipe;
 
 	I915_WRITE(PREC_PAL_INDEX(pipe),
 		   PAL_PREC_SPLIT_MODE | PAL_PREC_AUTO_INCREMENT);
 
-	if (crtc_state->base.degamma_lut) {
-		struct drm_color_lut *lut = crtc_state->base.degamma_lut->data;
+	if (degamma_lut) {
+		const struct drm_color_lut *lut = degamma_lut->data;
 
 		for (i = 0; i < lut_size; i++) {
 			u32 word =
@@ -413,11 +416,13 @@ static void bdw_load_degamma_lut(struct intel_crtc_state *crtc_state)
 	}
 }
 
-static void bdw_load_gamma_lut(struct intel_crtc_state *crtc_state, u32 offset)
+static void bdw_load_gamma_lut(const struct intel_crtc_state *crtc_state, u32 offset)
 {
-	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
-	enum pipe pipe = to_intel_crtc(crtc_state->base.crtc)->pipe;
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	const struct drm_property_blob *gamma_lut = crtc_state->base.gamma_lut;
 	u32 i, lut_size = INTEL_INFO(dev_priv)->color.gamma_lut_size;
+	enum pipe pipe = crtc->pipe;
 
 	WARN_ON(offset & ~PAL_PREC_INDEX_VALUE_MASK);
 
@@ -426,8 +431,8 @@ static void bdw_load_gamma_lut(struct intel_crtc_state *crtc_state, u32 offset)
 		   PAL_PREC_AUTO_INCREMENT |
 		   offset);
 
-	if (crtc_state->base.gamma_lut) {
-		struct drm_color_lut *lut = crtc_state->base.gamma_lut->data;
+	if (gamma_lut) {
+		const struct drm_color_lut *lut = gamma_lut->data;
 
 		for (i = 0; i < lut_size; i++) {
 			u32 word =
@@ -461,10 +466,11 @@ static void bdw_load_gamma_lut(struct intel_crtc_state *crtc_state, u32 offset)
 }
 
 /* Loads the palette/gamma unit for the CRTC on Broadwell+. */
-static void broadwell_load_luts(struct intel_crtc_state *crtc_state)
+static void broadwell_load_luts(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
-	enum pipe pipe = to_intel_crtc(crtc_state->base.crtc)->pipe;
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	enum pipe pipe = crtc->pipe;
 
 	if (crtc_state_is_legacy_gamma(crtc_state)) {
 		haswell_load_luts(crtc_state);
@@ -484,10 +490,11 @@ static void broadwell_load_luts(struct intel_crtc_state *crtc_state)
 	I915_WRITE(PREC_PAL_INDEX(pipe), 0);
 }
 
-static void glk_load_degamma_lut(struct intel_crtc_state *crtc_state)
+static void glk_load_degamma_lut(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
-	enum pipe pipe = to_intel_crtc(crtc_state->base.crtc)->pipe;
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	enum pipe pipe = crtc->pipe;
 	const u32 lut_size = 33;
 	u32 i;
 
@@ -514,11 +521,11 @@ static void glk_load_degamma_lut(struct intel_crtc_state *crtc_state)
 		I915_WRITE(PRE_CSC_GAMC_DATA(pipe), (1 << 16));
 }
 
-static void glk_load_luts(struct intel_crtc_state *crtc_state)
+static void glk_load_luts(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_device *dev = crtc_state->base.crtc->dev;
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	enum pipe pipe = to_intel_crtc(crtc_state->base.crtc)->pipe;
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	enum pipe pipe = crtc->pipe;
 
 	glk_load_degamma_lut(crtc_state);
 
@@ -532,28 +539,29 @@ static void glk_load_luts(struct intel_crtc_state *crtc_state)
 	I915_WRITE(GAMMA_MODE(pipe), crtc_state->gamma_mode);
 }
 
-/* Loads the palette/gamma unit for the CRTC on CherryView. */
-static void cherryview_load_luts(struct intel_crtc_state *crtc_state)
+static void cherryview_load_luts(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_crtc *crtc = crtc_state->base.crtc;
-	struct drm_i915_private *dev_priv = to_i915(crtc->dev);
-	enum pipe pipe = to_intel_crtc(crtc)->pipe;
-	struct drm_color_lut *lut;
-	u32 i, lut_size;
-	u32 word0, word1;
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+	const struct drm_property_blob *gamma_lut = crtc_state->base.gamma_lut;
+	const struct drm_property_blob *degamma_lut = crtc_state->base.degamma_lut;
+	enum pipe pipe = crtc->pipe;
 
 	if (crtc_state_is_legacy_gamma(crtc_state)) {
 		/* Turn off degamma/gamma on CGM block. */
 		I915_WRITE(CGM_PIPE_MODE(pipe),
 			   (crtc_state->base.ctm ? CGM_PIPE_MODE_CSC : 0));
-		i9xx_load_luts_internal(crtc_state, crtc_state->base.gamma_lut);
+		i9xx_load_luts_internal(crtc_state, gamma_lut);
 		return;
 	}
 
-	if (crtc_state->base.degamma_lut) {
-		lut = crtc_state->base.degamma_lut->data;
-		lut_size = INTEL_INFO(dev_priv)->color.degamma_lut_size;
+	if (degamma_lut) {
+		const struct drm_color_lut *lut = degamma_lut->data;
+		int i, lut_size = INTEL_INFO(dev_priv)->color.degamma_lut_size;
+
 		for (i = 0; i < lut_size; i++) {
+			u32 word0, word1;
+
 			/* Write LUT in U0.14 format. */
 			word0 =
 			(drm_color_lut_extract(lut[i].green, 14) << 16) |
@@ -565,10 +573,13 @@ static void cherryview_load_luts(struct intel_crtc_state *crtc_state)
 		}
 	}
 
-	if (crtc_state->base.gamma_lut) {
-		lut = crtc_state->base.gamma_lut->data;
-		lut_size = INTEL_INFO(dev_priv)->color.gamma_lut_size;
+	if (gamma_lut) {
+		const struct drm_color_lut *lut = gamma_lut->data;
+		int i, lut_size = INTEL_INFO(dev_priv)->color.gamma_lut_size;
+
 		for (i = 0; i < lut_size; i++) {
+			u32 word0, word1;
+
 			/* Write LUT in U0.10 format. */
 			word0 =
 			(drm_color_lut_extract(lut[i].green, 10) << 16) |
@@ -582,8 +593,8 @@ static void cherryview_load_luts(struct intel_crtc_state *crtc_state)
 
 	I915_WRITE(CGM_PIPE_MODE(pipe),
 		   (crtc_state->base.ctm ? CGM_PIPE_MODE_CSC : 0) |
-		   (crtc_state->base.degamma_lut ? CGM_PIPE_MODE_DEGAMMA : 0) |
-		   (crtc_state->base.gamma_lut ? CGM_PIPE_MODE_GAMMA : 0));
+		   (degamma_lut ? CGM_PIPE_MODE_DEGAMMA : 0) |
+		   (gamma_lut ? CGM_PIPE_MODE_GAMMA : 0));
 
 	/*
 	 * Also program a linear LUT in the legacy block (behind the
@@ -592,10 +603,9 @@ static void cherryview_load_luts(struct intel_crtc_state *crtc_state)
 	i9xx_load_luts_internal(crtc_state, NULL);
 }
 
-void intel_color_load_luts(struct intel_crtc_state *crtc_state)
+void intel_color_load_luts(const struct intel_crtc_state *crtc_state)
 {
-	struct drm_device *dev = crtc_state->base.crtc->dev;
-	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
 
 	dev_priv->display.load_luts(crtc_state);
 }
@@ -620,6 +630,8 @@ static int check_lut_size(const struct drm_property_blob *lut, int expected)
 int intel_color_check(struct intel_crtc_state *crtc_state)
 {
 	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
+	const struct drm_property_blob *gamma_lut = crtc_state->base.gamma_lut;
+	const struct drm_property_blob *degamma_lut = crtc_state->base.degamma_lut;
 	int gamma_length, degamma_length;
 	u32 gamma_tests, degamma_tests;
 
@@ -634,12 +646,12 @@ int intel_color_check(struct intel_crtc_state *crtc_state)
 		return 0;
 	}
 
-	if (check_lut_size(crtc_state->base.degamma_lut, degamma_length) ||
-	    check_lut_size(crtc_state->base.gamma_lut, gamma_length))
+	if (check_lut_size(degamma_lut, degamma_length) ||
+	    check_lut_size(gamma_lut, gamma_length))
 		return -EINVAL;
 
-	if (drm_color_lut_check(crtc_state->base.degamma_lut, degamma_tests) ||
-	    drm_color_lut_check(crtc_state->base.gamma_lut, gamma_tests))
+	if (drm_color_lut_check(degamma_lut, degamma_tests) ||
+	    drm_color_lut_check(gamma_lut, gamma_tests))
 		return -EINVAL;
 
 	if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 9717b1250e6f..2c299456fdbd 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -2389,8 +2389,8 @@ int intel_plane_atomic_check_with_state(const struct intel_crtc_state *old_crtc_
 /* intel_color.c */
 void intel_color_init(struct intel_crtc *crtc);
 int intel_color_check(struct intel_crtc_state *crtc_state);
-void intel_color_set_csc(struct intel_crtc_state *crtc_state);
-void intel_color_load_luts(struct intel_crtc_state *crtc_state);
+void intel_color_set_csc(const struct intel_crtc_state *crtc_state);
+void intel_color_load_luts(const struct intel_crtc_state *crtc_state);
 
 /* intel_lspcon.c */
 bool lspcon_init(struct intel_digital_port *intel_dig_port);
-- 
cgit v1.2.3


From 87cefd57c88aecc877b35c89375fe688a8eb0868 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 18:08:39 +0200
Subject: drm/i915: Pull GAMMA_MODE write out from haswell_load_luts()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For bdw+ let's move the GAMMA_MODE write for the legacy LUT
mode into the .load_luts() funciton directly, rather than
relying on haswell_load_luts(). We'll be getting rid of
haswell_load_luts() entirely soon, and it's anyway cleaner
to have the GAMMA_MODE write in a single place.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-5-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_color.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_color.c b/drivers/gpu/drm/i915/intel_color.c
index 96d4ece4870c..171dff60d62f 100644
--- a/drivers/gpu/drm/i915/intel_color.c
+++ b/drivers/gpu/drm/i915/intel_color.c
@@ -473,21 +473,20 @@ static void broadwell_load_luts(const struct intel_crtc_state *crtc_state)
 	enum pipe pipe = crtc->pipe;
 
 	if (crtc_state_is_legacy_gamma(crtc_state)) {
-		haswell_load_luts(crtc_state);
-		return;
-	}
+		i9xx_load_luts(crtc_state);
+	} else {
+		bdw_load_degamma_lut(crtc_state);
+		bdw_load_gamma_lut(crtc_state,
+				   INTEL_INFO(dev_priv)->color.degamma_lut_size);
 
-	bdw_load_degamma_lut(crtc_state);
-	bdw_load_gamma_lut(crtc_state,
-			   INTEL_INFO(dev_priv)->color.degamma_lut_size);
+		/*
+		 * Reset the index, otherwise it prevents the legacy palette to be
+		 * written properly.
+		 */
+		I915_WRITE(PREC_PAL_INDEX(pipe), 0);
+	}
 
 	I915_WRITE(GAMMA_MODE(pipe), crtc_state->gamma_mode);
-
-	/*
-	 * Reset the index, otherwise it prevents the legacy palette to be
-	 * written properly.
-	 */
-	I915_WRITE(PREC_PAL_INDEX(pipe), 0);
 }
 
 static void glk_load_degamma_lut(const struct intel_crtc_state *crtc_state)
@@ -530,11 +529,16 @@ static void glk_load_luts(const struct intel_crtc_state *crtc_state)
 	glk_load_degamma_lut(crtc_state);
 
 	if (crtc_state_is_legacy_gamma(crtc_state)) {
-		haswell_load_luts(crtc_state);
-		return;
-	}
+		i9xx_load_luts(crtc_state);
+	} else {
+		bdw_load_gamma_lut(crtc_state, 0);
 
-	bdw_load_gamma_lut(crtc_state, 0);
+		/*
+		 * Reset the index, otherwise it prevents the legacy palette to be
+		 * written properly.
+		 */
+		I915_WRITE(PREC_PAL_INDEX(pipe), 0);
+	}
 
 	I915_WRITE(GAMMA_MODE(pipe), crtc_state->gamma_mode);
 }
-- 
cgit v1.2.3


From 4d8ed54c044703f96b1df9ef7ac689b18899a470 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 18:08:40 +0200
Subject: drm/i915: Split color mgmt based on single vs. double buffered
 registers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Split the color management hooks along the single vs. double
buffered registers line. Of the currently programmed registers
GAMMA_MODE and the ilk+ pipe CSC are double buffered, the
LUTS and CHV CGM block are single buffered.

The double buffered register will be programmed during the
normal pipe update with evasion, and also during pipe enable
so that the settings will already be correct when the pipe
starts up before the planes are enabled.

The single buffered registers are currently programmed before
the vblank evade. Which is totally wrong, but we'll correct
that later.

v2: Add some docs to explain the two vfuncs (Matt,Uma)
    Rebase

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-6-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      | 14 ++++++++++-
 drivers/gpu/drm/i915/intel_color.c   | 49 +++++++++++++++++-------------------
 drivers/gpu/drm/i915/intel_display.c | 16 ++++++------
 drivers/gpu/drm/i915/intel_drv.h     |  2 +-
 4 files changed, 46 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1ca96376651b..08d75d57b34e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -323,7 +323,19 @@ struct drm_i915_display_funcs {
 	/* display clock increase/decrease */
 	/* pll clock increase/decrease */
 
-	void (*load_csc_matrix)(const struct intel_crtc_state *crtc_state);
+	/*
+	 * Program double buffered color management registers during
+	 * vblank evasion. The registers should then latch during the
+	 * next vblank start, alongside any other double buffered registers
+	 * involved with the same commit.
+	 */
+	void (*color_commit)(const struct intel_crtc_state *crtc_state);
+	/*
+	 * Load LUTs (and other single buffered color management
+	 * registers). Will (hopefully) be called during the vblank
+	 * following the latching of any double buffered registers
+	 * involved with the same commit.
+	 */
 	void (*load_luts)(const struct intel_crtc_state *crtc_state);
 };
 
diff --git a/drivers/gpu/drm/i915/intel_color.c b/drivers/gpu/drm/i915/intel_color.c
index 171dff60d62f..c364cb7cb14c 100644
--- a/drivers/gpu/drm/i915/intel_color.c
+++ b/drivers/gpu/drm/i915/intel_color.c
@@ -304,14 +304,6 @@ static void cherryview_load_csc_matrix(const struct intel_crtc_state *crtc_state
 	I915_WRITE(CGM_PIPE_MODE(pipe), mode);
 }
 
-void intel_color_set_csc(const struct intel_crtc_state *crtc_state)
-{
-	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
-
-	if (dev_priv->display.load_csc_matrix)
-		dev_priv->display.load_csc_matrix(crtc_state);
-}
-
 /* Loads the legacy palette/gamma unit for the CRTC. */
 static void i9xx_load_luts_internal(const struct intel_crtc_state *crtc_state,
 				    const struct drm_property_blob *blob)
@@ -359,6 +351,16 @@ static void i9xx_load_luts(const struct intel_crtc_state *crtc_state)
 	i9xx_load_luts_internal(crtc_state, crtc_state->base.gamma_lut);
 }
 
+static void hsw_color_commit(const struct intel_crtc_state *crtc_state)
+{
+	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+
+	I915_WRITE(GAMMA_MODE(crtc->pipe), crtc_state->gamma_mode);
+
+	ilk_load_csc_matrix(crtc_state);
+}
+
 /* Loads the legacy palette/gamma unit for the CRTC on Haswell. */
 static void haswell_load_luts(const struct intel_crtc_state *crtc_state)
 {
@@ -376,8 +378,6 @@ static void haswell_load_luts(const struct intel_crtc_state *crtc_state)
 		reenable_ips = true;
 	}
 
-	I915_WRITE(GAMMA_MODE(crtc->pipe), crtc_state->gamma_mode);
-
 	i9xx_load_luts(crtc_state);
 
 	if (reenable_ips)
@@ -485,8 +485,6 @@ static void broadwell_load_luts(const struct intel_crtc_state *crtc_state)
 		 */
 		I915_WRITE(PREC_PAL_INDEX(pipe), 0);
 	}
-
-	I915_WRITE(GAMMA_MODE(pipe), crtc_state->gamma_mode);
 }
 
 static void glk_load_degamma_lut(const struct intel_crtc_state *crtc_state)
@@ -539,8 +537,6 @@ static void glk_load_luts(const struct intel_crtc_state *crtc_state)
 		 */
 		I915_WRITE(PREC_PAL_INDEX(pipe), 0);
 	}
-
-	I915_WRITE(GAMMA_MODE(pipe), crtc_state->gamma_mode);
 }
 
 static void cherryview_load_luts(const struct intel_crtc_state *crtc_state)
@@ -551,10 +547,9 @@ static void cherryview_load_luts(const struct intel_crtc_state *crtc_state)
 	const struct drm_property_blob *degamma_lut = crtc_state->base.degamma_lut;
 	enum pipe pipe = crtc->pipe;
 
+	cherryview_load_csc_matrix(crtc_state);
+
 	if (crtc_state_is_legacy_gamma(crtc_state)) {
-		/* Turn off degamma/gamma on CGM block. */
-		I915_WRITE(CGM_PIPE_MODE(pipe),
-			   (crtc_state->base.ctm ? CGM_PIPE_MODE_CSC : 0));
 		i9xx_load_luts_internal(crtc_state, gamma_lut);
 		return;
 	}
@@ -595,11 +590,6 @@ static void cherryview_load_luts(const struct intel_crtc_state *crtc_state)
 		}
 	}
 
-	I915_WRITE(CGM_PIPE_MODE(pipe),
-		   (crtc_state->base.ctm ? CGM_PIPE_MODE_CSC : 0) |
-		   (degamma_lut ? CGM_PIPE_MODE_DEGAMMA : 0) |
-		   (gamma_lut ? CGM_PIPE_MODE_GAMMA : 0));
-
 	/*
 	 * Also program a linear LUT in the legacy block (behind the
 	 * CGM block).
@@ -614,6 +604,14 @@ void intel_color_load_luts(const struct intel_crtc_state *crtc_state)
 	dev_priv->display.load_luts(crtc_state);
 }
 
+void intel_color_commit(const struct intel_crtc_state *crtc_state)
+{
+	struct drm_i915_private *dev_priv = to_i915(crtc_state->base.crtc->dev);
+
+	if (dev_priv->display.color_commit)
+		dev_priv->display.color_commit(crtc_state);
+}
+
 static int check_lut_size(const struct drm_property_blob *lut, int expected)
 {
 	int len;
@@ -675,18 +673,17 @@ void intel_color_init(struct intel_crtc *crtc)
 	drm_mode_crtc_set_gamma_size(&crtc->base, 256);
 
 	if (IS_CHERRYVIEW(dev_priv)) {
-		dev_priv->display.load_csc_matrix = cherryview_load_csc_matrix;
 		dev_priv->display.load_luts = cherryview_load_luts;
 	} else if (IS_HASWELL(dev_priv)) {
-		dev_priv->display.load_csc_matrix = ilk_load_csc_matrix;
 		dev_priv->display.load_luts = haswell_load_luts;
+		dev_priv->display.color_commit = hsw_color_commit;
 	} else if (IS_BROADWELL(dev_priv) || IS_GEN9_BC(dev_priv) ||
 		   IS_BROXTON(dev_priv)) {
-		dev_priv->display.load_csc_matrix = ilk_load_csc_matrix;
 		dev_priv->display.load_luts = broadwell_load_luts;
+		dev_priv->display.color_commit = hsw_color_commit;
 	} else if (IS_GEMINILAKE(dev_priv) || IS_CANNONLAKE(dev_priv)) {
-		dev_priv->display.load_csc_matrix = ilk_load_csc_matrix;
 		dev_priv->display.load_luts = glk_load_luts;
+		dev_priv->display.color_commit = hsw_color_commit;
 	} else {
 		dev_priv->display.load_luts = i9xx_load_luts;
 	}
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index be5a73b6805a..63b61f61eccc 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -5779,6 +5779,7 @@ static void ironlake_crtc_enable(struct intel_crtc_state *pipe_config,
 	 * clocks enabled
 	 */
 	intel_color_load_luts(pipe_config);
+	intel_color_commit(pipe_config);
 
 	if (dev_priv->display.initial_watermarks != NULL)
 		dev_priv->display.initial_watermarks(old_intel_state, pipe_config);
@@ -5888,8 +5889,6 @@ static void haswell_crtc_enable(struct intel_crtc_state *pipe_config,
 
 	haswell_set_pipemisc(pipe_config);
 
-	intel_color_set_csc(pipe_config);
-
 	intel_crtc->active = true;
 
 	/* Display WA #1180: WaDisableScalarClockGating: glk, cnl */
@@ -5908,6 +5907,7 @@ static void haswell_crtc_enable(struct intel_crtc_state *pipe_config,
 	 * clocks enabled
 	 */
 	intel_color_load_luts(pipe_config);
+	intel_color_commit(pipe_config);
 
 	if (INTEL_GEN(dev_priv) >= 11)
 		icl_set_pipe_chicken(intel_crtc);
@@ -6245,8 +6245,6 @@ static void valleyview_crtc_enable(struct intel_crtc_state *pipe_config,
 
 	i9xx_set_pipeconf(pipe_config);
 
-	intel_color_set_csc(pipe_config);
-
 	intel_crtc->active = true;
 
 	intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, true);
@@ -6266,6 +6264,7 @@ static void valleyview_crtc_enable(struct intel_crtc_state *pipe_config,
 	i9xx_pfit_enable(pipe_config);
 
 	intel_color_load_luts(pipe_config);
+	intel_color_commit(pipe_config);
 
 	dev_priv->display.initial_watermarks(old_intel_state,
 					     pipe_config);
@@ -6322,6 +6321,7 @@ static void i9xx_crtc_enable(struct intel_crtc_state *pipe_config,
 	i9xx_pfit_enable(pipe_config);
 
 	intel_color_load_luts(pipe_config);
+	intel_color_commit(pipe_config);
 
 	if (dev_priv->display.initial_watermarks != NULL)
 		dev_priv->display.initial_watermarks(old_intel_state,
@@ -13742,10 +13742,8 @@ static void intel_begin_crtc_commit(struct drm_crtc *crtc,
 
 	if (!modeset &&
 	    (intel_cstate->base.color_mgmt_changed ||
-	     intel_cstate->update_pipe)) {
-		intel_color_set_csc(intel_cstate);
+	     intel_cstate->update_pipe))
 		intel_color_load_luts(intel_cstate);
-	}
 
 	/* Perform vblank evasion around commit operation */
 	intel_pipe_update_start(intel_cstate);
@@ -13753,6 +13751,10 @@ static void intel_begin_crtc_commit(struct drm_crtc *crtc,
 	if (modeset)
 		goto out;
 
+	if (intel_cstate->base.color_mgmt_changed ||
+	    intel_cstate->update_pipe)
+		intel_color_commit(intel_cstate);
+
 	if (intel_cstate->update_pipe)
 		intel_update_pipe_config(old_intel_cstate, intel_cstate);
 	else if (INTEL_GEN(dev_priv) >= 9)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 2c299456fdbd..aa003ab8c4aa 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -2389,7 +2389,7 @@ int intel_plane_atomic_check_with_state(const struct intel_crtc_state *old_crtc_
 /* intel_color.c */
 void intel_color_init(struct intel_crtc *crtc);
 int intel_color_check(struct intel_crtc_state *crtc_state);
-void intel_color_set_csc(const struct intel_crtc_state *crtc_state);
+void intel_color_commit(const struct intel_crtc_state *crtc_state);
 void intel_color_load_luts(const struct intel_crtc_state *crtc_state);
 
 /* intel_lspcon.c */
-- 
cgit v1.2.3


From 051a6d8d3ca09da2cfcbf15ca3dbe6afe330d51f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 5 Feb 2019 18:08:41 +0200
Subject: drm/i915: Move LUT programming to happen after vblank waits
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The LUTs are single buffered so we should program them after
the double buffered pipe updates have been latched by the
hardware.

We'll also fix up the IPS vs. split gamma w/a to do the IPS
disable like everyone else. Note that this is currently dead
code as we don't use the split gamma mode on HSW, but that
will be fixed up shortly.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-7-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_color.c   | 25 +------------------
 drivers/gpu/drm/i915/intel_display.c | 47 +++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_color.c b/drivers/gpu/drm/i915/intel_color.c
index c364cb7cb14c..71a1f12c6b2a 100644
--- a/drivers/gpu/drm/i915/intel_color.c
+++ b/drivers/gpu/drm/i915/intel_color.c
@@ -361,29 +361,6 @@ static void hsw_color_commit(const struct intel_crtc_state *crtc_state)
 	ilk_load_csc_matrix(crtc_state);
 }
 
-/* Loads the legacy palette/gamma unit for the CRTC on Haswell. */
-static void haswell_load_luts(const struct intel_crtc_state *crtc_state)
-{
-	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
-	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
-	bool reenable_ips = false;
-
-	/*
-	 * Workaround : Do not read or write the pipe palette/gamma data while
-	 * GAMMA_MODE is configured for split gamma and IPS_CTL has IPS enabled.
-	 */
-	if (IS_HASWELL(dev_priv) && crtc_state->ips_enabled &&
-	    (crtc_state->gamma_mode == GAMMA_MODE_MODE_SPLIT)) {
-		hsw_disable_ips(crtc_state);
-		reenable_ips = true;
-	}
-
-	i9xx_load_luts(crtc_state);
-
-	if (reenable_ips)
-		hsw_enable_ips(crtc_state);
-}
-
 static void bdw_load_degamma_lut(const struct intel_crtc_state *crtc_state)
 {
 	struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
@@ -675,7 +652,7 @@ void intel_color_init(struct intel_crtc *crtc)
 	if (IS_CHERRYVIEW(dev_priv)) {
 		dev_priv->display.load_luts = cherryview_load_luts;
 	} else if (IS_HASWELL(dev_priv)) {
-		dev_priv->display.load_luts = haswell_load_luts;
+		dev_priv->display.load_luts = i9xx_load_luts;
 		dev_priv->display.color_commit = hsw_color_commit;
 	} else if (IS_BROADWELL(dev_priv) || IS_GEN9_BC(dev_priv) ||
 		   IS_BROXTON(dev_priv)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 63b61f61eccc..2802674c94ef 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -5373,24 +5373,54 @@ intel_pre_disable_primary_noatomic(struct drm_crtc *crtc)
 static bool hsw_pre_update_disable_ips(const struct intel_crtc_state *old_crtc_state,
 				       const struct intel_crtc_state *new_crtc_state)
 {
+	struct intel_crtc *crtc = to_intel_crtc(new_crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+
 	if (!old_crtc_state->ips_enabled)
 		return false;
 
 	if (needs_modeset(&new_crtc_state->base))
 		return true;
 
+	/*
+	 * Workaround : Do not read or write the pipe palette/gamma data while
+	 * GAMMA_MODE is configured for split gamma and IPS_CTL has IPS enabled.
+	 *
+	 * Disable IPS before we program the LUT.
+	 */
+	if (IS_HASWELL(dev_priv) &&
+	    (new_crtc_state->base.color_mgmt_changed ||
+	     new_crtc_state->update_pipe) &&
+	    new_crtc_state->gamma_mode == GAMMA_MODE_MODE_SPLIT)
+		return true;
+
 	return !new_crtc_state->ips_enabled;
 }
 
 static bool hsw_post_update_enable_ips(const struct intel_crtc_state *old_crtc_state,
 				       const struct intel_crtc_state *new_crtc_state)
 {
+	struct intel_crtc *crtc = to_intel_crtc(new_crtc_state->base.crtc);
+	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+
 	if (!new_crtc_state->ips_enabled)
 		return false;
 
 	if (needs_modeset(&new_crtc_state->base))
 		return true;
 
+	/*
+	 * Workaround : Do not read or write the pipe palette/gamma data while
+	 * GAMMA_MODE is configured for split gamma and IPS_CTL has IPS enabled.
+	 *
+	 * Re-enable IPS after the LUT has been programmed.
+	 */
+	if (IS_HASWELL(dev_priv) &&
+	    (new_crtc_state->base.color_mgmt_changed ||
+	     new_crtc_state->update_pipe) &&
+	    new_crtc_state->gamma_mode == GAMMA_MODE_MODE_SPLIT)
+		return true;
+
 	/*
 	 * We can't read out IPS on broadwell, assume the worst and
 	 * forcibly enable IPS on the first fastset.
@@ -11136,7 +11166,7 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
 			return ret;
 	}
 
-	if (crtc_state->color_mgmt_changed) {
+	if (mode_changed || crtc_state->color_mgmt_changed) {
 		ret = intel_color_check(pipe_config);
 		if (ret)
 			return ret;
@@ -13225,6 +13255,16 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 	 */
 	drm_atomic_helper_wait_for_flip_done(dev, state);
 
+	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
+		new_intel_crtc_state = to_intel_crtc_state(new_crtc_state);
+
+		if (new_crtc_state->active &&
+		    !needs_modeset(new_crtc_state) &&
+		    (new_intel_crtc_state->base.color_mgmt_changed ||
+		     new_intel_crtc_state->update_pipe))
+			intel_color_load_luts(new_intel_crtc_state);
+	}
+
 	/*
 	 * Now that the vblank has passed, we can go ahead and program the
 	 * optimal watermarks on platforms that need two-step watermark
@@ -13740,11 +13780,6 @@ static void intel_begin_crtc_commit(struct drm_crtc *crtc,
 		intel_atomic_get_new_crtc_state(old_intel_state, intel_crtc);
 	bool modeset = needs_modeset(&intel_cstate->base);
 
-	if (!modeset &&
-	    (intel_cstate->base.color_mgmt_changed ||
-	     intel_cstate->update_pipe))
-		intel_color_load_luts(intel_cstate);
-
 	/* Perform vblank evasion around commit operation */
 	intel_pipe_update_start(intel_cstate);
 
-- 
cgit v1.2.3


From c09d39166d8a3f3788680b32dbb0a40a70de32e2 Mon Sep 17 00:00:00 2001
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Thu, 7 Feb 2019 12:45:32 -0800
Subject: drm/i915: Update DRIVER_DATE to 20190207

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 08d75d57b34e..9adc7bb9e69c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -91,8 +91,8 @@
 
 #define DRIVER_NAME		"i915"
 #define DRIVER_DESC		"Intel Graphics"
-#define DRIVER_DATE		"20190202"
-#define DRIVER_TIMESTAMP	1549095268
+#define DRIVER_DATE		"20190207"
+#define DRIVER_TIMESTAMP	1549572331
 
 /* Use I915_STATE_WARN(x) and I915_STATE_WARN_ON() (rather than WARN() and
  * WARN_ON()) for hw state sanity checks to check for unexpected conditions
-- 
cgit v1.2.3