author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-09-01 11:26:46 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-09-01 11:26:46 -0700 |
commit | 477f70cd2a67904e04c2c2b9bd0fa2e95222f2f6 | |
tree | 1897dd1de49e1ea24897163533e2d8ead5dad0ad /drivers/gpu/drm/i915/gt | |
parent | 835d31d319d9c8c4eb6cac074643360ba0ecab10 | |
parent | 8f0284f190e6a0aa09015090568c03f18288231a | |
Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:

"Highlights:
- i915 has seen a lot of refactoring and uAPI cleanups due to a
  change in the upstream direction going forward

  This has all been audited with known userspace, but there may be
  some pitfalls that were missed.
- i915 now uses common TTM to enable discrete memory on DG1/2 GPUs
- i915 enables Jasper and Elkhart Lake by default and has preliminary
  XeHP/DG2 support
- amdgpu adds support for Cyan Skillfish
- lots of implicit fencing rules documented and fixed up in drivers
- msm now uses the core scheduler
- the irq midlayer has been removed for non-legacy drivers
- the sysfb code now works on more than x86.

Otherwise the usual smattering of stuff everywhere, panels, bridges,
refactorings.

Detailed summary:
core:
- extract i915 eDP backlight into core
- DP aux bus support
- drm_device.irq_enabled removed
- port drivers to native irq interfaces
- export gem shadow plane handling for vgem
- print proper driver name in framebuffer registration
- driver fixes for implicit fencing rules
- ARM fixed rate compression modifier added
- updated fb damage handling
- rmfb ioctl logging/docs
- drop drm_gem_object_put_locked
- define DRM_FORMAT_MAX_PLANES
- add gem fb vmap/vunmap helpers
- add lockdep_assert(once) helpers
- mark drm irq midlayer as legacy
- use offset adjusted bo mapping conversion
vgaarb:
- cleanups
fbdev:
- extend efifb handling to all arches
- div by 0 fixes for multiple drivers
udmabuf:
- add hugepage mapping support
dma-buf:
- non-dynamic exporter fixups
- document implicit fencing rules
amdgpu:
- Initial Cyan Skillfish support
- switch virtual DCE over to vkms based atomic
- VCN/JPEG power down fixes
- NAVI PCIE link handling fixes
- AMD HDMI freesync fixes
- Yellow Carp + Beige Goby fixes
- Clockgating/S0ix/SMU/EEPROM fixes
- embed hw fence in job
- rework dma-resv handling
- ensure eviction to system ram
amdkfd:
- uapi: SVM address range query added
- sysfs leak fix
- GPUVM TLB optimizations
- vmfault/migration counters
i915:
- Enable JSL and EHL by default
- preliminary XeHP/DG2 support
- remove all CNL support (never shipped)
- move to TTM for discrete memory support
- allow mixed object mmap handling
- GEM uAPI spring cleaning
- add I915_MMAP_OBJECT_FIXED
- reinstate ADL-P mmap ioctls
- drop a bunch of unused by userspace features
- disable and remove GPU relocations
- revert some i915 misfeatures
- major refactoring of GuC for Gen11+
- execbuffer object locking separate step
- reject caching/set-domain on discrete
- Enable pipe DMC loading on XE-LPD and ADL-P
- add PSF GV point support
- Refactor and fix DDI buffer translations
- Clean up FBC CFB allocation code
- Finish INTEL_GEN() and friends macro conversions
nouveau:
- add eDP backlight support
- implicit fence fix
msm:
- a680/7c3 support
- drm/scheduler conversion
panfrost:
- rework GPU reset
virtio:
- fix fencing for planes
ast:
- add detect support
bochs:
- move to tiny GPU driver
vc4:
- use hotplug irqs
- HDMI codec support
vmwgfx:
- use internal vmware device headers
ingenic:
- demidlayering irq
rcar-du:
- shutdown fixes
- convert to bridge connector helpers
zynqmp-dsub:
- misc fixes
mgag200:
- convert PLL handling to atomic
mediatek:
- MT8133 AAL support
- gem mmap object support
- MT8167 support
etnaviv:
- NXP Layerscape LS1028A SoC support
- GEM mmap cleanups
tegra:
- new user API
exynos:
- missing unlock fix
- build warning fix
- use refcount_t"
* tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits)
drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box
drm/amd/display: Remove duplicate dml init
drm/amd/display: Update bounding box states (v2)
drm/amd/display: Update number of DCN3 clock states
drm/amdgpu: disable GFX CGCG in aldebaran
drm/amdgpu: Clear RAS interrupt status on aldebaran
drm/amdgpu: Add support for RAS XGMI err query
drm/amdkfd: Account for SH/SE count when setting up cu masks.
drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain
drm/amdgpu: drop redundant cancel_delayed_work_sync call
drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend
drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend
drm/amdkfd: map SVM range with correct access permission
drm/amdkfd: check access permisson to restore retry fault
drm/amdgpu: Update RAS XGMI Error Query
drm/amdgpu: Add driver infrastructure for MCA RAS
drm/amd/display: Add Logging for HDMI color depth information
drm/amd/amdgpu: consolidate PSP TA init shared buf functions
drm/amd/amdgpu: add name field back to ras_common_if
drm/amdgpu: Fix build with missing pm_suspend_target_state module export
...
Diffstat (limited to 'drivers/gpu/drm/i915/gt')
90 files changed, 9846 insertions, 2437 deletions
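One of the subtler changes in the diff is in the intel_context.h hunk, where `intel_context_unpin()` gains a path for contexts whose ops provide a `sched_disable` hook (added alongside the GuC work in this merge). When the caller holds the last pin, the pin is not dropped. Instead, the count is raised from 1 to 2 and an asynchronous scheduling disable is started. According to the patch comment, its completion calls `intel_context_sched_disable_unpin()`, which subtracts both pins and so potentially unpins the context. The following is a minimal userspace model of that handoff using C11 atomics. It is a sketch under stated assumptions: `struct ctx`, `add_unless()`, `do_unpin()`, `sched_disable()` and `sched_disable_done()` are hypothetical stand-ins, not i915 or kernel API, and the asynchronous completion is modelled as a plain function call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_context: only what unpin touches. */
struct ctx {
	atomic_int pin_count;
	int sched_disable_calls;	/* times the async disable was started */
	bool released;			/* set once the last pin is dropped */
};

/*
 * Userspace model of the kernel's atomic_add_unless(v, a, u): add a to *v
 * unless *v == u. Returns true if the addition was performed.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Models __intel_context_do_unpin(): drop 'sub' pins, release at zero. */
static void do_unpin(struct ctx *ce, int sub)
{
	int old = atomic_fetch_sub(&ce->pin_count, sub);

	assert(old >= sub);	/* an underflow here would be a refcount bug */
	if (old - sub == 0)
		ce->released = true;
}

/* Hypothetical: "starts" the asynchronous scheduling disable. */
static void sched_disable(struct ctx *ce)
{
	ce->sched_disable_calls++;
}

/* Hypothetical completion, playing the role of sched_disable_unpin(). */
static void sched_disable_done(struct ctx *ce)
{
	do_unpin(ce, 2);
}

/* Models the sched_disable branch of intel_context_unpin(). */
static void unpin(struct ctx *ce)
{
	/* Fast path: drop one pin as long as it is not the last one. */
	while (!add_unless(&ce->pin_count, -1, 1)) {
		int expected = 1;

		/*
		 * Last pin: take an extra reference (1 -> 2) and hand both
		 * to the async disable. If another thread re-pinned in
		 * between, the exchange fails and the fast path is retried.
		 */
		if (atomic_compare_exchange_strong(&ce->pin_count, &expected, 2)) {
			sched_disable(ce);
			break;
		}
	}
}
```

Starting from two pins, the first `unpin()` simply decrements the count to 1. The second finds the last pin, bumps the count to 2 and calls `sched_disable()`. `sched_disable_done()` then subtracts 2 and marks the context released. If the context were re-pinned while the disable was in flight, the count would stay above zero after completion, which is why the patch comment says the completion only potentially unpins the context.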
diff --git a/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c b/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c
index 4270b5a34a83..d6f5836396f8 100644
--- a/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c
@@ -437,20 +437,20 @@ static int frequency_show(struct seq_file *m, void *unused)
 		max_freq = (IS_GEN9_LP(i915) ?
 			    rp_state_cap >> 0 : rp_state_cap >> 16) & 0xff;
 		max_freq *= (IS_GEN9_BC(i915) ||
-			     GRAPHICS_VER(i915) >= 10 ? GEN9_FREQ_SCALER : 1);
+			     GRAPHICS_VER(i915) >= 11 ? GEN9_FREQ_SCALER : 1);
 		seq_printf(m, "Lowest (RPN) frequency: %dMHz\n",
 			   intel_gpu_freq(rps, max_freq));
 
 		max_freq = (rp_state_cap & 0xff00) >> 8;
 		max_freq *= (IS_GEN9_BC(i915) ||
-			     GRAPHICS_VER(i915) >= 10 ? GEN9_FREQ_SCALER : 1);
+			     GRAPHICS_VER(i915) >= 11 ? GEN9_FREQ_SCALER : 1);
 		seq_printf(m, "Nominal (RP1) frequency: %dMHz\n",
 			   intel_gpu_freq(rps, max_freq));
 
 		max_freq = (IS_GEN9_LP(i915) ?
 			    rp_state_cap >> 16 : rp_state_cap >> 0) & 0xff;
 		max_freq *= (IS_GEN9_BC(i915) ||
-			     GRAPHICS_VER(i915) >= 10 ? GEN9_FREQ_SCALER : 1);
+			     GRAPHICS_VER(i915) >= 11 ? GEN9_FREQ_SCALER : 1);
 		seq_printf(m, "Max non-overclocked (RP0) frequency: %dMHz\n",
 			   intel_gpu_freq(rps, max_freq));
 		seq_printf(m, "Max overclocked frequency: %dMHz\n",
@@ -500,7 +500,7 @@ static int llc_show(struct seq_file *m, void *data)
 
 	min_gpu_freq = rps->min_freq;
 	max_gpu_freq = rps->max_freq;
-	if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 10) {
+	if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 11) {
 		/* Convert GT frequency to 50 HZ units */
 		min_gpu_freq /= GEN9_FREQ_SCALER;
 		max_gpu_freq /= GEN9_FREQ_SCALER;
@@ -518,7 +518,7 @@ static int llc_show(struct seq_file *m, void *data)
 			   intel_gpu_freq(rps,
 					  (gpu_freq *
 					   (IS_GEN9_BC(i915) ||
-					    GRAPHICS_VER(i915) >= 10 ?
+					    GRAPHICS_VER(i915) >= 11 ?
 					    GEN9_FREQ_SCALER : 1))),
 			   ((ia_freq >> 0) & 0xff) * 100,
 			   ((ia_freq >> 8) & 0xff) * 100);
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 94e0a5669f90..461844dffd7e 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -42,7 +42,7 @@ int gen8_emit_flush_rcs(struct i915_request *rq, u32 mode)
 			vf_flush_wa = true;
 
 		/* WaForGAMHang:kbl */
-		if (IS_KBL_GT_STEP(rq->engine->i915, 0, STEP_B0))
+		if (IS_KBL_GT_STEP(rq->engine->i915, 0, STEP_C0))
 			dc_flush_wa = true;
 	}
 
@@ -208,7 +208,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode)
 		flags |= PIPE_CONTROL_FLUSH_L3;
 		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
 		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
-		/* Wa_1409600907:tgl */
+		/* Wa_1409600907:tgl,adl-p */
 		flags |= PIPE_CONTROL_DEPTH_STALL;
 		flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
 		flags |= PIPE_CONTROL_FLUSH_ENABLE;
@@ -279,7 +279,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
 	if (mode & EMIT_INVALIDATE)
 		aux_inv = rq->engine->mask & ~BIT(BCS0);
 	if (aux_inv)
-		cmd += 2 * hweight8(aux_inv) + 2;
+		cmd += 2 * hweight32(aux_inv) + 2;
 
 	cs = intel_ring_begin(rq, cmd);
 	if (IS_ERR(cs))
@@ -313,9 +313,8 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
 		struct intel_engine_cs *engine;
 		unsigned int tmp;
 
-		*cs++ = MI_LOAD_REGISTER_IMM(hweight8(aux_inv));
-		for_each_engine_masked(engine, rq->engine->gt,
-				       aux_inv, tmp) {
+		*cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
+		for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
 			*cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
 			*cs++ = AUX_INV;
 		}
@@ -506,7 +505,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
@@ -598,7 +598,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = gen12_emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index da4f5eb43ac2..6e0e52eeb87a 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -358,6 +358,54 @@ static void gen8_ppgtt_alloc(struct i915_address_space *vm,
 			   &start, start + length, vm->top);
 }
 
+static void __gen8_ppgtt_foreach(struct i915_address_space *vm,
+				 struct i915_page_directory *pd,
+				 u64 *start, u64 end, int lvl,
+				 void (*fn)(struct i915_address_space *vm,
+					    struct i915_page_table *pt,
+					    void *data),
+				 void *data)
+{
+	unsigned int idx, len;
+
+	len = gen8_pd_range(*start, end, lvl--, &idx);
+
+	spin_lock(&pd->lock);
+	do {
+		struct i915_page_table *pt = pd->entry[idx];
+
+		atomic_inc(&pt->used);
+		spin_unlock(&pd->lock);
+
+		if (lvl) {
+			__gen8_ppgtt_foreach(vm, as_pd(pt), start, end, lvl,
+					     fn, data);
+		} else {
+			fn(vm, pt, data);
+			*start += gen8_pt_count(*start, end);
+		}
+
+		spin_lock(&pd->lock);
+		atomic_dec(&pt->used);
+	} while (idx++, --len);
+	spin_unlock(&pd->lock);
+}
+
+static void gen8_ppgtt_foreach(struct i915_address_space *vm,
+			       u64 start, u64 length,
+			       void (*fn)(struct i915_address_space *vm,
+					  struct i915_page_table *pt,
+					  void *data),
+			       void *data)
+{
+	start >>= GEN8_PTE_SHIFT;
+	length >>= GEN8_PTE_SHIFT;
+
+	__gen8_ppgtt_foreach(vm, i915_vm_to_ppgtt(vm)->pd,
+			     &start, start + length, vm->top,
+			     fn, data);
+}
+
 static __always_inline u64
 gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 		      struct i915_page_directory *pdp,
@@ -552,6 +600,24 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	}
 }
 
+static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
+				    dma_addr_t addr,
+				    u64 offset,
+				    enum i915_cache_level level,
+				    u32 flags)
+{
+	u64 idx = offset >> GEN8_PTE_SHIFT;
+	struct i915_page_directory * const pdp =
+		gen8_pdp_for_page_index(vm, idx);
+	struct i915_page_directory *pd =
+		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
+	gen8_pte_t *vaddr;
+
+	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
+	clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
+}
+
 static int gen8_init_scratch(struct i915_address_space *vm)
 {
 	u32 pte_flags;
@@ -731,8 +797,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
 	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
 
 	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
+	ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
 	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
 	ppgtt->vm.clear_range = gen8_ppgtt_clear;
+	ppgtt->vm.foreach = gen8_ppgtt_foreach;
 
 	ppgtt->vm.pte_encode = gen8_pte_encode;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..209cf265bf74 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -15,28 +15,14 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 
-static bool irq_enable(struct intel_engine_cs *engine)
+static bool irq_enable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_enable)
-		return false;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_enable(engine);
-	spin_unlock(&engine->gt->irq_lock);
-
-	return true;
+	return intel_engine_irq_enable(b->irq_engine);
 }
 
-static void irq_disable(struct intel_engine_cs *engine)
+static void irq_disable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_disable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_disable(engine);
-	spin_unlock(&engine->gt->irq_lock);
+	intel_engine_irq_disable(b->irq_engine);
 }
 
 static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	WRITE_ONCE(b->irq_armed, true);
 
 	/* Requests may have completed before we could enable the interrupt. */
-	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+	if (!b->irq_enabled++ && b->irq_enable(b))
 		irq_work_queue(&b->irq_work);
 }
 
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 {
 	GEM_BUG_ON(!b->irq_enabled);
 	if (!--b->irq_enabled)
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	WRITE_ONCE(b->irq_armed, false);
 	intel_gt_pm_put_async(b->irq_engine->gt);
@@ -259,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
 			llist_entry(signal, typeof(*rq), signal_node);
 		struct list_head cb_list;
 
+		if (rq->engine->sched_engine->retire_inflight_request_prio)
+			rq->engine->sched_engine->retire_inflight_request_prio(rq);
+
 		spin_lock(&rq->lock);
 		list_replace(&rq->fence.cb_list, &cb_list);
 		__dma_fence_signal__timestamp(&rq->fence, timestamp);
@@ -281,7 +270,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	if (!b)
 		return NULL;
 
-	b->irq_engine = irq_engine;
+	kref_init(&b->ref);
 
 	spin_lock_init(&b->signalers_lock);
 	INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +279,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	spin_lock_init(&b->irq_lock);
 	init_irq_work(&b->irq_work, signal_irq_work);
 
+	b->irq_engine = irq_engine;
+	b->irq_enable = irq_enable;
+	b->irq_disable = irq_disable;
+
 	return b;
 }
 
@@ -303,9 +296,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 	spin_lock_irqsave(&b->irq_lock, flags);
 
 	if (b->irq_enabled)
-		irq_enable(b->irq_engine);
+		b->irq_enable(b);
 	else
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
@@ -325,11 +318,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	}
 }
 
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
+void intel_breadcrumbs_free(struct kref *kref)
 {
+	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
 	irq_work_sync(&b->irq_work);
 	GEM_BUG_ON(!list_empty(&b->signalers));
 	GEM_BUG_ON(b->irq_armed);
+
 	kfree(b);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
index 3ce5ce270b04..be0d4f379a85 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -9,7 +9,7 @@
 #include <linux/atomic.h>
 #include <linux/irq_work.h>
 
-#include "intel_engine_types.h"
+#include "intel_breadcrumbs_types.h"
 
 struct drm_printer;
 struct i915_request;
@@ -17,7 +17,7 @@ struct intel_breadcrumbs;
 
 struct intel_breadcrumbs *
 intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
+void intel_breadcrumbs_free(struct kref *kref);
 
 void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
 void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
@@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
 void intel_context_remove_breadcrumbs(struct intel_context *ce,
 				      struct intel_breadcrumbs *b);
 
+static inline struct intel_breadcrumbs *
+intel_breadcrumbs_get(struct intel_breadcrumbs *b)
+{
+	kref_get(&b->ref);
+	return b;
+}
+
+static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
+{
+	kref_put(&b->ref, intel_breadcrumbs_free);
+}
+
 #endif /* __INTEL_BREADCRUMBS__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 3a084ce8ff5e..72dfd3748c4c 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -7,10 +7,13 @@
 #define __INTEL_BREADCRUMBS_TYPES__
 
 #include <linux/irq_work.h>
+#include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
+#include "intel_engine_types.h"
+
 /*
  * Rather than have every client wait upon all user interrupts,
  * with the herd waking after every interrupt and each doing the
@@ -29,6 +32,7 @@
  * the overhead of waking that client is much preferred.
  */
 struct intel_breadcrumbs {
+	struct kref ref;
 	atomic_t active;
 
 	spinlock_t signalers_lock; /* protects the list of signalers */
@@ -42,7 +46,10 @@ struct intel_breadcrumbs {
 	bool irq_armed;
 
 	/* Not all breadcrumbs are attached to physical HW */
+	intel_engine_mask_t engine_mask;
 	struct intel_engine_cs *irq_engine;
+	bool (*irq_enable)(struct intel_breadcrumbs *b);
+	void (*irq_disable)(struct intel_breadcrumbs *b);
 };
 
 #endif /* __INTEL_BREADCRUMBS_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4033184f13b9..745e84c72c90 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -7,28 +7,26 @@
 #include "gem/i915_gem_pm.h"
 
 #include "i915_drv.h"
-#include "i915_globals.h"
+#include "i915_trace.h"
 
 #include "intel_context.h"
 #include "intel_engine.h"
 #include "intel_engine_pm.h"
 #include "intel_ring.h"
 
-static struct i915_global_context {
-	struct i915_global base;
-	struct kmem_cache *slab_ce;
-} global;
+static struct kmem_cache *slab_ce;
 
 static struct intel_context *intel_context_alloc(void)
 {
-	return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
+	return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
 }
 
 static void rcu_context_free(struct rcu_head *rcu)
 {
 	struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
 
-	kmem_cache_free(global.slab_ce, ce);
+	trace_intel_context_free(ce);
+	kmem_cache_free(slab_ce, ce);
 }
 
 void intel_context_free(struct intel_context *ce)
@@ -46,6 +44,7 @@ intel_context_create(struct intel_engine_cs *engine)
 		return ERR_PTR(-ENOMEM);
 
 	intel_context_init(ce, engine);
+	trace_intel_context_create(ce);
 	return ce;
 }
 
@@ -80,7 +79,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
 
 	__i915_active_acquire(&ce->active);
 
-	if (intel_context_is_barrier(ce))
+	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
 		return 0;
 
 	/* Preallocate tracking nodes */
@@ -268,6 +267,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 
 	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
 
+	trace_intel_context_do_pin(ce);
+
 err_unlock:
 	mutex_unlock(&ce->pin_mutex);
 err_post_unpin:
@@ -306,9 +307,9 @@ retry:
 	return err;
 }
 
-void intel_context_unpin(struct intel_context *ce)
+void __intel_context_do_unpin(struct intel_context *ce, int sub)
 {
-	if (!atomic_dec_and_test(&ce->pin_count))
+	if (!atomic_sub_and_test(sub, &ce->pin_count))
 		return;
 
 	CE_TRACE(ce, "unpin\n");
@@ -323,6 +324,7 @@ void intel_context_unpin(struct intel_context *ce)
 	 */
 	intel_context_get(ce);
 	intel_context_active_release(ce);
+	trace_intel_context_do_unpin(ce);
 	intel_context_put(ce);
 }
 
@@ -360,6 +362,12 @@ static int __intel_context_active(struct i915_active *active)
 	return 0;
 }
 
+static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
+				 enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
 void
 intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 {
@@ -371,7 +379,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->engine = engine;
 	ce->ops = engine->cops;
 	ce->sseu = engine->sseu;
-	ce->ring = __intel_context_ring_size(SZ_4K);
+	ce->ring = NULL;
+	ce->ring_size = SZ_4K;
 
 	ewma_runtime_init(&ce->runtime.avg);
 
@@ -383,6 +392,22 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 
 	mutex_init(&ce->pin_mutex);
 
+	spin_lock_init(&ce->guc_state.lock);
+	INIT_LIST_HEAD(&ce->guc_state.fences);
+
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
+	ce->guc_id = GUC_INVALID_LRC_ID;
+	INIT_LIST_HEAD(&ce->guc_id_link);
+
+	/*
+	 * Initialize fence to be complete as this is expected to be complete
+	 * unless there is a pending schedule disable outstanding.
+	 */
+	i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
+	i915_sw_fence_commit(&ce->guc_blocked);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
@@ -397,28 +422,17 @@ void intel_context_fini(struct intel_context *ce)
 	i915_active_fini(&ce->active);
 }
 
-static void i915_global_context_shrink(void)
-{
-	kmem_cache_shrink(global.slab_ce);
-}
-
-static void i915_global_context_exit(void)
+void i915_context_module_exit(void)
 {
-	kmem_cache_destroy(global.slab_ce);
+	kmem_cache_destroy(slab_ce);
 }
 
-static struct i915_global_context global = { {
-	.shrink = i915_global_context_shrink,
-	.exit = i915_global_context_exit,
-} };
-
-int __init i915_global_context_init(void)
+int __init i915_context_module_init(void)
 {
-	global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
-	if (!global.slab_ce)
+	slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
+	if (!slab_ce)
 		return -ENOMEM;
 
-	i915_global_register(&global.base);
 	return 0;
 }
 
@@ -499,6 +513,26 @@ retry:
 	return rq;
 }
 
+struct i915_request *intel_context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..c41098950746 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
 #include "intel_engine_types.h"
 #include "intel_ring_types.h"
 #include "intel_timeline_types.h"
+#include "i915_trace.h"
 
 #define CE_TRACE(ce, fmt, ...) do {					\
 	const struct intel_context *ce__ = (ce);			\
@@ -30,6 +31,9 @@ void intel_context_init(struct intel_context *ce,
 			struct intel_engine_cs *engine);
 void intel_context_fini(struct intel_context *ce);
 
+void i915_context_module_exit(void);
+int i915_context_module_init(void);
+
 struct intel_context *
 intel_context_create(struct intel_engine_cs *engine);
 
@@ -69,6 +73,13 @@ intel_context_is_pinned(struct intel_context *ce)
 	return atomic_read(&ce->pin_count);
 }
 
+static inline void intel_context_cancel_request(struct intel_context *ce,
+						struct i915_request *rq)
+{
+	GEM_BUG_ON(!ce->ops->cancel_request);
+	return ce->ops->cancel_request(ce, rq);
+}
+
 /**
  * intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status
  * @ce - the context
@@ -113,7 +124,32 @@ static inline void __intel_context_pin(struct intel_context *ce)
 	atomic_inc(&ce->pin_count);
 }
 
-void intel_context_unpin(struct intel_context *ce);
+void __intel_context_do_unpin(struct intel_context *ce, int sub);
+
+static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
+{
+	__intel_context_do_unpin(ce, 2);
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+	if (!ce->ops->sched_disable) {
+		__intel_context_do_unpin(ce, 1);
+	} else {
+		/*
+		 * Move ownership of this pin to the scheduling disable which is
+		 * an async operation. When that operation completes the above
+		 * intel_context_sched_disable_unpin is called potentially
+		 * unpinning the context.
+		 */
+		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
+			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
+				ce->ops->sched_disable(ce);
+				break;
+			}
+		}
+	}
+}
 
 void intel_context_enter_engine(struct intel_context *ce);
 void intel_context_exit_engine(struct intel_context *ce);
@@ -175,10 +211,8 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
 
 struct i915_request *intel_context_create_request(struct intel_context *ce);
 
-static inline struct intel_ring *__intel_context_ring_size(u64 sz)
-{
-	return u64_to_ptr(struct intel_ring, sz);
-}
+struct i915_request *
+intel_context_find_active_request(struct intel_context *ce);
 
 static inline bool intel_context_is_barrier(const struct intel_context *ce)
 {
@@ -220,6 +254,18 @@ static inline bool intel_context_set_banned(struct intel_context *ce)
 	return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
 }
 
+static inline bool intel_context_ban(struct intel_context *ce,
+				     struct i915_request *rq)
+{
+	bool ret = intel_context_set_banned(ce);
+
+	trace_intel_context_ban(ce);
+	if (ce->ops->ban)
+		ce->ops->ban(ce, rq);
+
+	return ret;
+}
+
 static inline bool
 intel_context_force_single_submission(const struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.c b/drivers/gpu/drm/i915/gt/intel_context_param.c
deleted file mode 100644
index 65dcd090245d..000000000000
--- a/drivers/gpu/drm/i915/gt/intel_context_param.c
+++ /dev/null
@@ -1,63 +0,0 @@
-// SPDX-License-Identifier: MIT
-/*
- * Copyright © 2019 Intel Corporation
- */
-
-#include "i915_active.h"
-#include "intel_context.h"
-#include "intel_context_param.h"
-#include "intel_ring.h"
-
-int intel_context_set_ring_size(struct intel_context *ce, long sz)
-{
-	int err;
-
-	if (intel_context_lock_pinned(ce))
-		return -EINTR;
-
-	err = i915_active_wait(&ce->active);
-	if (err < 0)
-		goto unlock;
-
-	if (intel_context_is_pinned(ce)) {
-		err = -EBUSY; /* In active use, come back later! */
-		goto unlock;
-	}
-
-	if (test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
-		struct intel_ring *ring;
-
-		/* Replace the existing ringbuffer */
-		ring = intel_engine_create_ring(ce->engine, sz);
-		if (IS_ERR(ring)) {
-			err = PTR_ERR(ring);
-			goto unlock;
-		}
-
-		intel_ring_put(ce->ring);
-		ce->ring = ring;
-
-		/* Context image will be updated on next pin */
-	} else {
-		ce->ring = __intel_context_ring_size(sz);
-	}
-
-unlock:
-	intel_context_unlock_pinned(ce);
-	return err;
-}
-
-long intel_context_get_ring_size(struct intel_context *ce)
-{
-	long sz = (unsigned long)READ_ONCE(ce->ring);
-
-	if (test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
-		if (intel_context_lock_pinned(ce))
-			return -EINTR;
-
-		sz = ce->ring->size;
-		intel_context_unlock_pinned(ce);
-	}
-
-	return sz;
-}
diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.h b/drivers/gpu/drm/i915/gt/intel_context_param.h
index 3ecacc675f41..0c69cb42d075 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_param.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_param.h
@@ -10,14 +10,10 @@
 
 #include "intel_context.h"
 
-int intel_context_set_ring_size(struct intel_context *ce, long sz);
-long intel_context_get_ring_size(struct intel_context *ce);
-
-static inline int
+static inline void
 intel_context_set_watchdog_us(struct intel_context *ce, u64 timeout_us)
 {
 	ce->watchdog.timeout_us = timeout_us;
-	return 0;
 }
 
 #endif /* INTEL_CONTEXT_PARAM_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..e54351a170e2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -13,12 +13,14 @@
 #include <linux/types.h>
 
 #include "i915_active_types.h"
+#include "i915_sw_fence.h"
 #include "i915_utils.h"
 #include "intel_engine_types.h"
 #include "intel_sseu.h"
 
-#define CONTEXT_REDZONE POISON_INUSE
+#include "uc/intel_guc_fwif.h"
 
+#define CONTEXT_REDZONE POISON_INUSE
 DECLARE_EWMA(runtime, 3, 8);
 
 struct i915_gem_context;
@@ -35,16 +37,29 @@ struct intel_context_ops {
 
 	int (*alloc)(struct intel_context *ce);
 
+	void (*ban)(struct intel_context *ce, struct i915_request *rq);
+
 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
 	int (*pin)(struct intel_context *ce, void *vaddr);
 	void (*unpin)(struct intel_context *ce);
 	void (*post_unpin)(struct intel_context *ce);
 
+	void (*cancel_request)(struct intel_context *ce,
+			       struct i915_request *rq);
+
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);
 
+	void (*sched_disable)(struct intel_context *ce);
+
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
+
+	/* virtual engine/context interface */
+	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
+						unsigned int count);
+	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
+					       unsigned int sibling);
 };
 
 struct intel_context {
@@ -82,6 +97,7 @@ struct intel_context {
 	spinlock_t signal_lock; /* protects signals, the list of requests */
 
 	struct i915_vma *state;
+	u32 ring_size;
 	struct intel_ring *ring;
 	struct intel_timeline *timeline;
 
@@ -95,6 +111,7 @@ struct intel_context {
 #define CONTEXT_BANNED			6
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
+#define CONTEXT_LRCA_DIRTY		9
 
 	struct {
 		u64 timeout_us;
@@ -136,6 +153,51 @@ struct intel_context {
 	struct intel_sseu sseu;
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
+
+	struct {
+		/** lock: protects everything in guc_state */
+		spinlock_t lock;
+		/**
+		 * sched_state: scheduling state of this context using GuC
+		 * submission
+		 */
+		u16 sched_state;
+		/*
+		 * fences: maintains of list of requests that have a submit
+		 * fence related to GuC submission
+		 */
+		struct list_head fences;
+	} guc_state;
+
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
+	/* GuC scheduling state flags that do not require a lock. */
+	atomic_t guc_sched_state_no_lock;
+
+	/* GuC LRC descriptor ID */
+	u16 guc_id;
+
+	/* GuC LRC descriptor reference count */
+	atomic_t guc_id_ref;
+
+	/*
+	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
+	 */
+	struct list_head guc_id_link;
+
+	/* GuC context blocked fence */
+	struct i915_sw_fence guc_blocked;
+
+	/*
+	 * GuC priority management
+	 */
+	u8 guc_prio;
+	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 8d9184920c51..87579affb952 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -19,7 +19,9 @@
 #include "intel_workarounds.h"
 
 struct drm_printer;
+struct intel_context;
 struct intel_gt;
+struct lock_class_key;
 
 /* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
  * but keeps the logic simple. Indeed, the whole purpose of this macro is just
@@ -123,20 +125,6 @@ execlists_active(const struct intel_engine_execlists *execlists)
 	return active;
 }
 
-static inline void
-execlists_active_lock_bh(struct intel_engine_execlists *execlists)
-{
-	local_bh_disable(); /* prevent local softirq and lock recursion */
-	tasklet_lock(&execlists->tasklet);
-}
-
-static inline void
-execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
-{
-	tasklet_unlock(&execlists->tasklet);
-	local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
-}
-
 struct i915_request *
 execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
 
@@ -186,11 +174,12 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_PREEMPT_ADDR	(I915_GEM_HWS_PREEMPT * sizeof(u32))
 #define I915_GEM_HWS_SEQNO		0x40
 #define I915_GEM_HWS_SEQNO_ADDR		(I915_GEM_HWS_SEQNO * sizeof(u32))
+#define I915_GEM_HWS_MIGRATE		(0x42 * sizeof(u32))
 #define I915_GEM_HWS_SCRATCH		0x80
 
 #define I915_HWS_CSB_BUF0_INDEX		0x10
 #define I915_HWS_CSB_WRITE_INDEX	0x1f
-#define CNL_HWS_CSB_WRITE_INDEX		0x2f
+#define ICL_HWS_CSB_WRITE_INDEX		0x2f
 
 void intel_engine_stop(struct intel_engine_cs *engine);
 void intel_engine_cleanup(struct intel_engine_cs *engine);
@@ -223,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
 
 void intel_engine_init_execlists(struct intel_engine_cs *engine);
 
+bool intel_engine_irq_enable(struct intel_engine_cs *engine);
+void intel_engine_irq_disable(struct intel_engine_cs *engine);
+
 static inline void __intel_engine_reset(struct intel_engine_cs *engine,
 					bool stalled)
 {
@@ -248,17 +240,27 @@ __printf(3, 4)
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...);
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m);
 
 ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine);
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
 
 u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
+struct intel_context *
+intel_engine_create_pinned_context(struct intel_engine_cs *engine,
+				   struct i915_address_space *vm,
+				   unsigned int ring_size,
+				   unsigned int hwsp,
+				   struct lock_class_key *key,
+				   const char *name);
+
+void intel_engine_destroy_pinned_context(struct intel_context *ce);
 
-void intel_engine_init_active(struct intel_engine_cs *engine,
-			      unsigned int subclass);
 #define ENGINE_PHYSICAL	0
 #define ENGINE_MOCK	1
 #define ENGINE_VIRTUAL	2
@@ -277,13 +279,60 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 	return intel_engine_has_preemption(engine);
 }
 
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count);
+
+static inline bool
+intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
+{
+	/*
+	 * For non-GuC submission we expect the back-end to look at the
+	 * heartbeat status of the actual physical engine that the work
+	 * has been (or is being) scheduled on, so we should only reach
+	 * here with GuC submission enabled.
+	 */
+	GEM_BUG_ON(!intel_engine_uses_guc(engine));
+
+	return intel_guc_virtual_engine_has_heartbeat(engine);
+}
+
 static inline bool
 intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
 {
 	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
 		return false;
 
-	return READ_ONCE(engine->props.heartbeat_interval_ms);
+	if (intel_engine_is_virtual(engine))
+		return intel_virtual_engine_has_heartbeat(engine);
+	else
+		return READ_ONCE(engine->props.heartbeat_interval_ms);
+}
+
+static inline struct intel_engine_cs *
+intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(engine));
+	return engine->cops->get_sibling(engine, sibling);
+}
+
+static inline void
+intel_engine_set_hung_context(struct intel_engine_cs *engine,
+			      struct intel_context *ce)
+{
+	engine->hung_ce = ce;
+}
+
+static inline void
+intel_engine_clear_hung_context(struct intel_engine_cs *engine)
+{
+	intel_engine_set_hung_context(engine, NULL);
+}
+
+static inline struct intel_context *
+intel_engine_get_hung_context(struct intel_engine_cs *engine)
+{
+	return engine->hung_ce;
 }
 
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 7f03df236613..0d9105a31d84 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -35,14 +35,12 @@ #define DEFAULT_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE) #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE) #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE) -#define GEN10_LR_CONTEXT_RENDER_SIZE (18 * PAGE_SIZE) #define GEN11_LR_CONTEXT_RENDER_SIZE (14 * PAGE_SIZE) #define GEN8_LR_CONTEXT_OTHER_SIZE ( 2 * PAGE_SIZE) #define MAX_MMIO_BASES 3 struct engine_info { - unsigned int hw_id; u8 class; u8 instance; /* mmio bases table *must* be sorted in reverse graphics_ver order */ @@ -54,7 +52,6 @@ struct engine_info { static const struct engine_info intel_engines[] = { [RCS0] = { - .hw_id = RCS0_HW, .class = RENDER_CLASS, .instance = 0, .mmio_bases = { @@ -62,7 +59,6 @@ static const struct engine_info intel_engines[] = { }, }, [BCS0] = { - .hw_id = BCS0_HW, .class = COPY_ENGINE_CLASS, .instance = 0, .mmio_bases = { @@ -70,7 +66,6 @@ static const struct engine_info intel_engines[] = { }, }, [VCS0] = { - .hw_id = VCS0_HW, .class = VIDEO_DECODE_CLASS, .instance = 0, .mmio_bases = { @@ -80,7 +75,6 @@ static const struct engine_info intel_engines[] = { }, }, [VCS1] = { - .hw_id = VCS1_HW, .class = VIDEO_DECODE_CLASS, .instance = 1, .mmio_bases = { @@ -89,7 +83,6 @@ static const struct engine_info intel_engines[] = { }, }, [VCS2] = { - .hw_id = VCS2_HW, .class = VIDEO_DECODE_CLASS, .instance = 2, .mmio_bases = { @@ -97,15 +90,41 @@ static const struct engine_info intel_engines[] = { }, }, [VCS3] = { - .hw_id = VCS3_HW, .class = VIDEO_DECODE_CLASS, .instance = 3, .mmio_bases = { { .graphics_ver = 11, .base = GEN11_BSD4_RING_BASE } }, }, + [VCS4] = { + .class = VIDEO_DECODE_CLASS, + .instance = 4, + .mmio_bases = { + { .graphics_ver = 12, .base = XEHP_BSD5_RING_BASE } + }, + }, + [VCS5] = { + .class = VIDEO_DECODE_CLASS, + .instance = 5, + .mmio_bases = { + { .graphics_ver = 
12, .base = XEHP_BSD6_RING_BASE } + }, + }, + [VCS6] = { + .class = VIDEO_DECODE_CLASS, + .instance = 6, + .mmio_bases = { + { .graphics_ver = 12, .base = XEHP_BSD7_RING_BASE } + }, + }, + [VCS7] = { + .class = VIDEO_DECODE_CLASS, + .instance = 7, + .mmio_bases = { + { .graphics_ver = 12, .base = XEHP_BSD8_RING_BASE } + }, + }, [VECS0] = { - .hw_id = VECS0_HW, .class = VIDEO_ENHANCEMENT_CLASS, .instance = 0, .mmio_bases = { @@ -114,13 +133,26 @@ static const struct engine_info intel_engines[] = { }, }, [VECS1] = { - .hw_id = VECS1_HW, .class = VIDEO_ENHANCEMENT_CLASS, .instance = 1, .mmio_bases = { { .graphics_ver = 11, .base = GEN11_VEBOX2_RING_BASE } }, }, + [VECS2] = { + .class = VIDEO_ENHANCEMENT_CLASS, + .instance = 2, + .mmio_bases = { + { .graphics_ver = 12, .base = XEHP_VEBOX3_RING_BASE } + }, + }, + [VECS3] = { + .class = VIDEO_ENHANCEMENT_CLASS, + .instance = 3, + .mmio_bases = { + { .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE } + }, + }, }; /** @@ -153,8 +185,6 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class) case 12: case 11: return GEN11_LR_CONTEXT_RENDER_SIZE; - case 10: - return GEN10_LR_CONTEXT_RENDER_SIZE; case 9: return GEN9_LR_CONTEXT_RENDER_SIZE; case 8: @@ -269,6 +299,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) BUILD_BUG_ON(MAX_ENGINE_CLASS >= BIT(GEN11_ENGINE_CLASS_WIDTH)); BUILD_BUG_ON(MAX_ENGINE_INSTANCE >= BIT(GEN11_ENGINE_INSTANCE_WIDTH)); + BUILD_BUG_ON(I915_MAX_VCS > (MAX_ENGINE_INSTANCE + 1)); + BUILD_BUG_ON(I915_MAX_VECS > (MAX_ENGINE_INSTANCE + 1)); if (GEM_DEBUG_WARN_ON(id >= ARRAY_SIZE(gt->engine))) return -EINVAL; @@ -294,7 +326,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) engine->i915 = i915; engine->gt = gt; engine->uncore = gt->uncore; - engine->hw_id = info->hw_id; guc_class = engine_class_to_guc_class(info->class); engine->guc_id = MAKE_GUC_ID(guc_class, info->instance); engine->mmio_base = __engine_mmio_base(i915, info->mmio_bases); 
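The two new BUILD_BUG_ON() checks in intel_engine_setup() tie the per-class engine counts to MAX_ENGINE_INSTANCE. The following is a minimal userspace sketch of the same guard using C11 static_assert(). It assumes the values this patch sets in intel_engine_types.h. SKETCH_INSTANCE_WIDTH is a hypothetical stand-in for GEN11_ENGINE_INSTANCE_WIDTH, and its value is assumed:

```c
#include <assert.h>   /* C11 static_assert() */
#include <stdbool.h>

/* Values this patch sets in intel_engine_types.h. */
#define MAX_ENGINE_INSTANCE 7
#define I915_MAX_VCS        8
#define I915_MAX_VECS       4

/* Hypothetical stand-in for GEN11_ENGINE_INSTANCE_WIDTH; value assumed. */
#define SKETCH_INSTANCE_WIDTH 6
#define SKETCH_BIT(n) (1u << (n))

/*
 * BUILD_BUG_ON(cond) breaks the build when cond is true, so each check
 * below asserts the negation of the corresponding kernel condition.
 */
static_assert(MAX_ENGINE_INSTANCE < SKETCH_BIT(SKETCH_INSTANCE_WIDTH),
              "instance number must fit the hardware field");
static_assert(I915_MAX_VCS <= MAX_ENGINE_INSTANCE + 1,
              "VCS0..VCS7 need instance numbers 0..7");
static_assert(I915_MAX_VECS <= MAX_ENGINE_INSTANCE + 1,
              "VECS0..VECS3 need instance numbers 0..3");

/* Runtime twin of the compile-time bound: is this instance addressable? */
static bool sketch_instance_valid(unsigned int instance)
{
    return instance <= MAX_ENGINE_INSTANCE;
}
```

With the pre-patch MAX_ENGINE_INSTANCE of 3, the I915_MAX_VCS check would fail to compile. That is why both limits move together in this merge.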
@@ -328,9 +359,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) if (engine->context_size) DRIVER_CAPS(i915)->has_logical_contexts = true; - /* Nothing to do here, execute in order of dependencies */ - engine->schedule = NULL; - ewma__engine_latency_init(&engine->latency); seqcount_init(&engine->stats.lock); @@ -445,6 +473,28 @@ void intel_engines_free(struct intel_gt *gt) } } +static +bool gen11_vdbox_has_sfc(struct drm_i915_private *i915, + unsigned int physical_vdbox, + unsigned int logical_vdbox, u16 vdbox_mask) +{ + /* + * In Gen11, only even numbered logical VDBOXes are hooked + * up to an SFC (Scaler & Format Converter) unit. + * In Gen12, Even numbered physical instance always are connected + * to an SFC. Odd numbered physical instances have SFC only if + * previous even instance is fused off. + */ + if (GRAPHICS_VER(i915) == 12) + return (physical_vdbox % 2 == 0) || + !(BIT(physical_vdbox - 1) & vdbox_mask); + else if (GRAPHICS_VER(i915) == 11) + return logical_vdbox % 2 == 0; + + MISSING_CASE(GRAPHICS_VER(i915)); + return false; +} + /* * Determine which engines are fused off in our particular hardware. * Note that we have a catch-22 situation where we need to be able to access @@ -471,7 +521,14 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt) if (GRAPHICS_VER(i915) < 11) return info->engine_mask; - media_fuse = ~intel_uncore_read(uncore, GEN11_GT_VEBOX_VDBOX_DISABLE); + /* + * On newer platforms the fusing register is called 'enable' and has + * enable semantics, while on older platforms it is called 'disable' + * and bits have disable semantices. 
+ */ + media_fuse = intel_uncore_read(uncore, GEN11_GT_VEBOX_VDBOX_DISABLE); + if (GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) + media_fuse = ~media_fuse; vdbox_mask = media_fuse & GEN11_GT_VDBOX_DISABLE_MASK; vebox_mask = (media_fuse & GEN11_GT_VEBOX_DISABLE_MASK) >> @@ -489,13 +546,9 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt) continue; } - /* - * In Gen11, only even numbered logical VDBOXes are - * hooked up to an SFC (Scaler & Format Converter) unit. - * In TGL each VDBOX has access to an SFC. - */ - if (GRAPHICS_VER(i915) >= 12 || logical_vdbox++ % 2 == 0) + if (gen11_vdbox_has_sfc(i915, i, logical_vdbox, vdbox_mask)) gt->info.vdbox_sfc_access |= BIT(i); + logical_vdbox++; } drm_dbg(&i915->drm, "vdbox enable: %04x, instances: %04lx\n", vdbox_mask, VDBOX_MASK(gt)); @@ -585,9 +638,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine) memset(execlists->pending, 0, sizeof(execlists->pending)); execlists->active = memset(execlists->inflight, 0, sizeof(execlists->inflight)); - - execlists->queue_priority_hint = INT_MIN; - execlists->queue = RB_ROOT_CACHED; } static void cleanup_status_page(struct intel_engine_cs *engine) @@ -714,11 +764,17 @@ static int engine_setup_common(struct intel_engine_cs *engine) goto err_status; } + engine->sched_engine = i915_sched_engine_create(ENGINE_PHYSICAL); + if (!engine->sched_engine) { + err = -ENOMEM; + goto err_sched_engine; + } + engine->sched_engine->private_data = engine; + err = intel_engine_init_cmd_parser(engine); if (err) goto err_cmd_parser; - intel_engine_init_active(engine, ENGINE_PHYSICAL); intel_engine_init_execlists(engine); intel_engine_init__pm(engine); intel_engine_init_retire(engine); @@ -737,7 +793,9 @@ static int engine_setup_common(struct intel_engine_cs *engine) return 0; err_cmd_parser: - intel_breadcrumbs_free(engine->breadcrumbs); + i915_sched_engine_put(engine->sched_engine); +err_sched_engine: + intel_breadcrumbs_put(engine->breadcrumbs); err_status: 
cleanup_status_page(engine); return err; @@ -775,11 +833,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce) frame->rq.ring = &frame->ring; mutex_lock(&ce->timeline->mutex); - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&engine->sched_engine->lock); dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs; - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&engine->sched_engine->lock); mutex_unlock(&ce->timeline->mutex); GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */ @@ -788,33 +846,13 @@ static int measure_breadcrumb_dw(struct intel_context *ce) return dw; } -void -intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass) -{ - INIT_LIST_HEAD(&engine->active.requests); - INIT_LIST_HEAD(&engine->active.hold); - - spin_lock_init(&engine->active.lock); - lockdep_set_subclass(&engine->active.lock, subclass); - - /* - * Due to an interesting quirk in lockdep's internal debug tracking, - * after setting a subclass we must ensure the lock is used. Otherwise, - * nr_unused_locks is incremented once too often. 
- */ -#ifdef CONFIG_DEBUG_LOCK_ALLOC - local_irq_disable(); - lock_map_acquire(&engine->active.lock.dep_map); - lock_map_release(&engine->active.lock.dep_map); - local_irq_enable(); -#endif -} - -static struct intel_context * -create_pinned_context(struct intel_engine_cs *engine, - unsigned int hwsp, - struct lock_class_key *key, - const char *name) +struct intel_context * +intel_engine_create_pinned_context(struct intel_engine_cs *engine, + struct i915_address_space *vm, + unsigned int ring_size, + unsigned int hwsp, + struct lock_class_key *key, + const char *name) { struct intel_context *ce; int err; @@ -825,6 +863,11 @@ create_pinned_context(struct intel_engine_cs *engine, __set_bit(CONTEXT_BARRIER_BIT, &ce->flags); ce->timeline = page_pack_bits(NULL, hwsp); + ce->ring = NULL; + ce->ring_size = ring_size; + + i915_vm_put(ce->vm); + ce->vm = i915_vm_get(vm); err = intel_context_pin(ce); /* perma-pin so it is always available */ if (err) { @@ -843,7 +886,7 @@ create_pinned_context(struct intel_engine_cs *engine, return ce; } -static void destroy_pinned_context(struct intel_context *ce) +void intel_engine_destroy_pinned_context(struct intel_context *ce) { struct intel_engine_cs *engine = ce->engine; struct i915_vma *hwsp = engine->status_page.vma; @@ -863,8 +906,9 @@ create_kernel_context(struct intel_engine_cs *engine) { static struct lock_class_key kernel; - return create_pinned_context(engine, I915_GEM_HWS_SEQNO_ADDR, - &kernel, "kernel_context"); + return intel_engine_create_pinned_context(engine, engine->gt->vm, SZ_4K, + I915_GEM_HWS_SEQNO_ADDR, + &kernel, "kernel_context"); } /** @@ -907,7 +951,7 @@ static int engine_init_common(struct intel_engine_cs *engine) return 0; err_context: - destroy_pinned_context(ce); + intel_engine_destroy_pinned_context(ce); return ret; } @@ -957,10 +1001,10 @@ int intel_engines_init(struct intel_gt *gt) */ void intel_engine_cleanup_common(struct intel_engine_cs *engine) { - GEM_BUG_ON(!list_empty(&engine->active.requests)); - 
tasklet_kill(&engine->execlists.tasklet); /* flush the callback */ + GEM_BUG_ON(!list_empty(&engine->sched_engine->requests)); - intel_breadcrumbs_free(engine->breadcrumbs); + i915_sched_engine_put(engine->sched_engine); + intel_breadcrumbs_put(engine->breadcrumbs); intel_engine_fini_retire(engine); intel_engine_cleanup_cmd_parser(engine); @@ -969,7 +1013,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) fput(engine->default_state); if (engine->kernel_context) - destroy_pinned_context(engine->kernel_context); + intel_engine_destroy_pinned_context(engine->kernel_context); GEM_BUG_ON(!llist_empty(&engine->barrier_tasks)); cleanup_status_page(engine); @@ -1105,45 +1149,8 @@ static u32 read_subslice_reg(const struct intel_engine_cs *engine, int slice, int subslice, i915_reg_t reg) { - struct drm_i915_private *i915 = engine->i915; - struct intel_uncore *uncore = engine->uncore; - u32 mcr_mask, mcr_ss, mcr, old_mcr, val; - enum forcewake_domains fw_domains; - - if (GRAPHICS_VER(i915) >= 11) { - mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK; - mcr_ss = GEN11_MCR_SLICE(slice) | GEN11_MCR_SUBSLICE(subslice); - } else { - mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK; - mcr_ss = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice); - } - - fw_domains = intel_uncore_forcewake_for_reg(uncore, reg, - FW_REG_READ); - fw_domains |= intel_uncore_forcewake_for_reg(uncore, - GEN8_MCR_SELECTOR, - FW_REG_READ | FW_REG_WRITE); - - spin_lock_irq(&uncore->lock); - intel_uncore_forcewake_get__locked(uncore, fw_domains); - - old_mcr = mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR); - - mcr &= ~mcr_mask; - mcr |= mcr_ss; - intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr); - - val = intel_uncore_read_fw(uncore, reg); - - mcr &= ~mcr_mask; - mcr |= old_mcr & mcr_mask; - - intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr); - - intel_uncore_forcewake_put__locked(uncore, fw_domains); - spin_unlock_irq(&uncore->lock); - - return val; + 
return intel_uncore_read_with_mcr_steering(engine->uncore, reg, + slice, subslice); } /* NB: please notice the memset */ @@ -1243,7 +1250,7 @@ static bool ring_is_idle(struct intel_engine_cs *engine) void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync) { - struct tasklet_struct *t = &engine->execlists.tasklet; + struct tasklet_struct *t = &engine->sched_engine->tasklet; if (!t->callback) return; @@ -1283,7 +1290,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) intel_engine_flush_submission(engine); /* ELSP is empty, but there are ready requests? E.g. after reset */ - if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) + if (!i915_sched_engine_is_empty(engine->sched_engine)) return false; /* Ring stopped? */ @@ -1314,6 +1321,30 @@ bool intel_engines_are_idle(struct intel_gt *gt) return true; } +bool intel_engine_irq_enable(struct intel_engine_cs *engine) +{ + if (!engine->irq_enable) + return false; + + /* Caller disables interrupts */ + spin_lock(&engine->gt->irq_lock); + engine->irq_enable(engine); + spin_unlock(&engine->gt->irq_lock); + + return true; +} + +void intel_engine_irq_disable(struct intel_engine_cs *engine) +{ + if (!engine->irq_disable) + return; + + /* Caller disables interrupts */ + spin_lock(&engine->gt->irq_lock); + engine->irq_disable(engine); + spin_unlock(&engine->gt->irq_lock); +} + void intel_engines_reset_default_submission(struct intel_gt *gt) { struct intel_engine_cs *engine; @@ -1349,7 +1380,7 @@ static struct intel_timeline *get_timeline(struct i915_request *rq) struct intel_timeline *tl; /* - * Even though we are holding the engine->active.lock here, there + * Even though we are holding the engine->sched_engine->lock here, there * is no control over the submission queue per-se and we are * inspecting the active state at a random point in time, with an * unknown queue. Play safe and make sure the timeline remains valid. 
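gen11_vdbox_has_sfc() replaces the old "every Gen12 VDBOX has an SFC" shortcut with a fuse-aware rule. In addition, init_engine_mask() now inverts the media fuse value only before IP 12.50. The following is a rough userspace sketch of both changes. It makes three assumptions: plain integers stand in for GRAPHICS_VER()/GRAPHICS_VER_FULL(), the loop is simplified and ignores HAS_ENGINE(), and all sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BIT(n) (1u << (n))

/*
 * Pre-12.50 parts use disable semantics (bit set = fused off); newer parts
 * use enable semantics. The patch reads the same register on both and only
 * inverts on older parts.
 */
static unsigned int sketch_media_fuse(unsigned int raw, bool enable_semantics)
{
    return enable_semantics ? raw : ~raw;
}

/*
 * Mirrors gen11_vdbox_has_sfc(): on Gen12 an even physical VDBOX always has
 * an SFC, and an odd one has it only if the preceding even instance is fused
 * off. On Gen11 only even *logical* VDBOXes have one.
 */
static bool sketch_vdbox_has_sfc(int graphics_ver, unsigned int physical,
                                 unsigned int logical, unsigned int vdbox_mask)
{
    if (graphics_ver == 12)
        return (physical % 2 == 0) ||
               !(SKETCH_BIT(physical - 1) & vdbox_mask); /* physical >= 1 here */
    if (graphics_ver == 11)
        return logical % 2 == 0;
    return false; /* the kernel also reports MISSING_CASE() here */
}

/* Walk enabled VDBOXes, roughly as init_engine_mask() does, collecting SFC access. */
static unsigned int sketch_sfc_access(int graphics_ver, unsigned int vdbox_mask,
                                      unsigned int max_vcs)
{
    unsigned int logical = 0, access = 0;

    assert(max_vcs <= 32);
    for (unsigned int i = 0; i < max_vcs; i++) {
        if (!(vdbox_mask & SKETCH_BIT(i)))
            continue; /* fused off: no logical number consumed */
        if (sketch_vdbox_has_sfc(graphics_ver, i, logical, vdbox_mask))
            access |= SKETCH_BIT(i);
        logical++;
    }
    return access;
}
```

For example, consider a Gen12 part with VDBOX0 fused off (mask 0xe). VDBOX1 takes over the SFC that VDBOX0 would otherwise own, so the access bits are 0x6 rather than the 0x5 of a fully enabled part.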
@@ -1504,8 +1535,8 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n", yesno(test_bit(TASKLET_STATE_SCHED, - &engine->execlists.tasklet.state)), - enableddisabled(!atomic_read(&engine->execlists.tasklet.count)), + &engine->sched_engine->tasklet.state)), + enableddisabled(!atomic_read(&engine->sched_engine->tasklet.count)), repr_timer(&engine->execlists.preempt), repr_timer(&engine->execlists.timer)); @@ -1529,7 +1560,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, idx, hws[idx * 2], hws[idx * 2 + 1]); } - execlists_active_lock_bh(execlists); + i915_sched_engine_active_lock_bh(engine->sched_engine); rcu_read_lock(); for (port = execlists->active; (rq = *port); port++) { char hdr[160]; @@ -1560,7 +1591,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, i915_request_show(m, rq, hdr, 0); } rcu_read_unlock(); - execlists_active_unlock_bh(execlists); + i915_sched_engine_active_unlock_bh(engine->sched_engine); } else if (GRAPHICS_VER(dev_priv) > 6) { drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n", ENGINE_READ(engine, RING_PP_DIR_BASE)); @@ -1650,6 +1681,98 @@ static void print_properties(struct intel_engine_cs *engine, read_ul(&engine->defaults, p->offset)); } +static void engine_dump_request(struct i915_request *rq, struct drm_printer *m, const char *msg) +{ + struct intel_timeline *tl = get_timeline(rq); + + i915_request_show(m, rq, msg, 0); + + drm_printf(m, "\t\tring->start: 0x%08x\n", + i915_ggtt_offset(rq->ring->vma)); + drm_printf(m, "\t\tring->head: 0x%08x\n", + rq->ring->head); + drm_printf(m, "\t\tring->tail: 0x%08x\n", + rq->ring->tail); + drm_printf(m, "\t\tring->emit: 0x%08x\n", + rq->ring->emit); + drm_printf(m, "\t\tring->space: 0x%08x\n", + rq->ring->space); + + if (tl) { + drm_printf(m, "\t\tring->hwsp: 0x%08x\n", + tl->hwsp_offset); + intel_timeline_put(tl); + } + + print_request_ring(m, rq); + + if 
(rq->context->lrc_reg_state) { + drm_printf(m, "Logical Ring Context:\n"); + hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE); + } +} + +void intel_engine_dump_active_requests(struct list_head *requests, + struct i915_request *hung_rq, + struct drm_printer *m) +{ + struct i915_request *rq; + const char *msg; + enum i915_request_state state; + + list_for_each_entry(rq, requests, sched.link) { + if (rq == hung_rq) + continue; + + state = i915_test_request_state(rq); + if (state < I915_REQUEST_QUEUED) + continue; + + if (state == I915_REQUEST_ACTIVE) + msg = "\t\tactive on engine"; + else + msg = "\t\tactive in queue"; + + engine_dump_request(rq, m, msg); + } +} + +static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m) +{ + struct i915_request *hung_rq = NULL; + struct intel_context *ce; + bool guc; + + /* + * No need for an engine->irq_seqno_barrier() before the seqno reads. + * The GPU is still running so requests are still executing and any + * hardware reads will be out of date by the time they are reported. + * But the intention here is just to report an instantaneous snapshot + * so that's fine. + */ + lockdep_assert_held(&engine->sched_engine->lock); + + drm_printf(m, "\tRequests:\n"); + + guc = intel_uc_uses_guc_submission(&engine->gt->uc); + if (guc) { + ce = intel_engine_get_hung_context(engine); + if (ce) + hung_rq = intel_context_find_active_request(ce); + } else { + hung_rq = intel_engine_execlist_find_hung_request(engine); + } + + if (hung_rq) + engine_dump_request(hung_rq, m, "\t\thung"); + + if (guc) + intel_guc_dump_active_requests(engine, hung_rq, m); + else + intel_engine_dump_active_requests(&engine->sched_engine->requests, + hung_rq, m); +} + void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m, const char *header, ...) 
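intel_engine_dump_active_requests() prints the hung request first. It then walks the request list, skips that request and anything not yet queued, and labels the rest by whether they are running. The following is a minimal sketch of that selection step over an array. It assumes a simplified state enum whose ordering mirrors the "state < I915_REQUEST_QUEUED" comparison. The types and names are hypothetical, not i915's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; order matters because the filter uses "<". */
enum sketch_state { S_UNKNOWN, S_COMPLETE, S_PENDING, S_QUEUED, S_ACTIVE };
enum sketch_label { L_ON_ENGINE, L_IN_QUEUE };

/* Same label strings the patch passes to engine_dump_request(). */
static const char *const sketch_label_text[] = {
    [L_ON_ENGINE] = "\t\tactive on engine",
    [L_IN_QUEUE]  = "\t\tactive in queue",
};

struct sketch_rq {
    int id;
    enum sketch_state state;
};

struct sketch_entry {
    int id;
    enum sketch_label label;
};

/*
 * Select what would be printed: skip the hung request (dumped separately
 * beforehand) and anything not yet queued, then label each remaining
 * request by state. Returns the number of entries written to out[].
 */
static size_t sketch_select(const struct sketch_rq *rqs, size_t n,
                            const struct sketch_rq *hung,
                            struct sketch_entry *out, size_t out_len)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        const struct sketch_rq *rq = &rqs[i];

        if (rq == hung || rq->state < S_QUEUED)
            continue;
        assert(count < out_len);
        out[count].id = rq->id;
        out[count].label = rq->state == S_ACTIVE ? L_ON_ENGINE : L_IN_QUEUE;
        count++;
    }
    return count;
}
```

Printing the hung request separately is what lets the GuC path (intel_guc_dump_active_requests()) and the execlists path share the same "hung first, then the rest" layout.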
@@ -1694,41 +1817,12 @@ void intel_engine_dump(struct intel_engine_cs *engine, i915_reset_count(error)); print_properties(engine, m); - drm_printf(m, "\tRequests:\n"); + spin_lock_irqsave(&engine->sched_engine->lock, flags); + engine_dump_active_requests(engine, m); - spin_lock_irqsave(&engine->active.lock, flags); - rq = intel_engine_find_active_request(engine); - if (rq) { - struct intel_timeline *tl = get_timeline(rq); - - i915_request_show(m, rq, "\t\tactive ", 0); - - drm_printf(m, "\t\tring->start: 0x%08x\n", - i915_ggtt_offset(rq->ring->vma)); - drm_printf(m, "\t\tring->head: 0x%08x\n", - rq->ring->head); - drm_printf(m, "\t\tring->tail: 0x%08x\n", - rq->ring->tail); - drm_printf(m, "\t\tring->emit: 0x%08x\n", - rq->ring->emit); - drm_printf(m, "\t\tring->space: 0x%08x\n", - rq->ring->space); - - if (tl) { - drm_printf(m, "\t\tring->hwsp: 0x%08x\n", - tl->hwsp_offset); - intel_timeline_put(tl); - } - - print_request_ring(m, rq); - - if (rq->context->lrc_reg_state) { - drm_printf(m, "Logical Ring Context:\n"); - hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE); - } - } - drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->active.hold)); - spin_unlock_irqrestore(&engine->active.lock, flags); + drm_printf(m, "\tOn hold?: %lu\n", + list_count(&engine->sched_engine->hold)); + spin_unlock_irqrestore(&engine->sched_engine->lock, flags); drm_printf(m, "\tMMIO base: 0x%08x\n", engine->mmio_base); wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm); @@ -1785,19 +1879,33 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now) return total; } -static bool match_ring(struct i915_request *rq) +struct intel_context * +intel_engine_create_virtual(struct intel_engine_cs **siblings, + unsigned int count) { - u32 ring = ENGINE_READ(rq->engine, RING_START); + if (count == 0) + return ERR_PTR(-EINVAL); + + if (count == 1) + return intel_context_create(siblings[0]); - return ring == i915_ggtt_offset(rq->ring->vma); + 
GEM_BUG_ON(!siblings[0]->cops->create_virtual); + return siblings[0]->cops->create_virtual(siblings, count); } struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine) +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine) { struct i915_request *request, *active = NULL; /* + * This search does not work in GuC submission mode. However, the GuC + * will report the hanging context directly to the driver itself. So + * the driver should never get here when in GuC mode. + */ + GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc)); + + /* * We are called by the error capture, reset and to dump engine * state at random points in time. In particular, note that neither is * crucially ordered with an interrupt. After a hang, the GPU is dead @@ -1808,7 +1916,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine) * At all other times, we must assume the GPU is still running, but * we only care about the snapshot of this moment. */ - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&engine->sched_engine->lock); rcu_read_lock(); request = execlists_active(&engine->execlists); @@ -1826,15 +1934,9 @@ intel_engine_find_active_request(struct intel_engine_cs *engine) if (active) return active; - list_for_each_entry(request, &engine->active.requests, sched.link) { - if (__i915_request_is_complete(request)) - continue; - - if (!__i915_request_has_started(request)) - continue; - - /* More than one preemptible request may match! 
*/ - if (!match_ring(request)) + list_for_each_entry(request, &engine->sched_engine->requests, + sched.link) { + if (i915_test_request_state(request) != I915_REQUEST_ACTIVE) continue; active = request; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index b99ac41695f3..74775ae961b2 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -70,12 +70,38 @@ static void show_heartbeat(const struct i915_request *rq, { struct drm_printer p = drm_debug_printer("heartbeat"); - intel_engine_dump(engine, &p, - "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n", - engine->name, - rq->fence.context, - rq->fence.seqno, - rq->sched.attr.priority); + if (!rq) { + intel_engine_dump(engine, &p, + "%s heartbeat not ticking\n", + engine->name); + } else { + intel_engine_dump(engine, &p, + "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n", + engine->name, + rq->fence.context, + rq->fence.seqno, + rq->sched.attr.priority); + } +} + +static void +reset_engine(struct intel_engine_cs *engine, struct i915_request *rq) +{ + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) + show_heartbeat(rq, engine); + + if (intel_engine_uses_guc(engine)) + /* + * GuC itself is toast or GuC's hang detection + * is disabled. Either way, need to find the + * hang culprit manually. 
+ */ + intel_guc_find_hung_context(engine); + + intel_gt_handle_error(engine->gt, engine->mask, + I915_ERROR_CAPTURE, + "stopped heartbeat on %s", + engine->name); } static void heartbeat(struct work_struct *wrk) @@ -102,6 +128,11 @@ static void heartbeat(struct work_struct *wrk) if (intel_gt_is_wedged(engine->gt)) goto out; + if (i915_sched_engine_disabled(engine->sched_engine)) { + reset_engine(engine, engine->heartbeat.systole); + goto out; + } + if (engine->heartbeat.systole) { long delay = READ_ONCE(engine->props.heartbeat_interval_ms); @@ -121,7 +152,7 @@ static void heartbeat(struct work_struct *wrk) * but all other contexts, including the kernel * context are stuck waiting for the signal. */ - } else if (engine->schedule && + } else if (engine->sched_engine->schedule && rq->sched.attr.priority < I915_PRIORITY_BARRIER) { /* * Gradually raise the priority of the heartbeat to @@ -136,16 +167,10 @@ static void heartbeat(struct work_struct *wrk) attr.priority = I915_PRIORITY_BARRIER; local_bh_disable(); - engine->schedule(rq, &attr); + engine->sched_engine->schedule(rq, &attr); local_bh_enable(); } else { - if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) - show_heartbeat(rq, engine); - - intel_gt_handle_error(engine->gt, engine->mask, - I915_ERROR_CAPTURE, - "stopped heartbeat on %s", - engine->name); + reset_engine(engine, rq); } rq->emitted_jiffies = jiffies; @@ -194,6 +219,25 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine) i915_request_put(fetch_and_zero(&engine->heartbeat.systole)); } +void intel_gt_unpark_heartbeats(struct intel_gt *gt) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) + if (intel_engine_pm_is_awake(engine)) + intel_engine_unpark_heartbeat(engine); +} + +void intel_gt_park_heartbeats(struct intel_gt *gt) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) + intel_engine_park_heartbeat(engine); +} + void 
intel_engine_init_heartbeat(struct intel_engine_cs *engine) { INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h index a488ea3e84a3..5da6d809a87a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h @@ -7,6 +7,7 @@ #define INTEL_ENGINE_HEARTBEAT_H struct intel_engine_cs; +struct intel_gt; void intel_engine_init_heartbeat(struct intel_engine_cs *engine); @@ -16,6 +17,9 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine, void intel_engine_park_heartbeat(struct intel_engine_cs *engine); void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine); +void intel_gt_park_heartbeats(struct intel_gt *gt); +void intel_gt_unpark_heartbeats(struct intel_gt *gt); + int intel_engine_pulse(struct intel_engine_cs *engine); int intel_engine_flush_barriers(struct intel_engine_cs *engine); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 47f4397095e5..1f07ac4e0672 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -275,13 +275,11 @@ static int __engine_park(struct intel_wakeref *wf) intel_breadcrumbs_park(engine->breadcrumbs); /* Must be reset upon idling, or we may miss the busy wakeup. 
*/ - GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN); + GEM_BUG_ON(engine->sched_engine->queue_priority_hint != INT_MIN); if (engine->park) engine->park(engine); - engine->execlists.no_priolist = false; - /* While gt calls i915_vma_parked(), we have to break the lock cycle */ intel_gt_pm_put_async(engine->gt); return 0; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index e113f93b3274..ed91bcff20eb 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -21,32 +21,20 @@ #include "i915_pmu.h" #include "i915_priolist_types.h" #include "i915_selftest.h" -#include "intel_breadcrumbs_types.h" #include "intel_sseu.h" #include "intel_timeline_types.h" #include "intel_uncore.h" #include "intel_wakeref.h" #include "intel_workarounds_types.h" -/* Legacy HW Engine ID */ - -#define RCS0_HW 0 -#define VCS0_HW 1 -#define BCS0_HW 2 -#define VECS0_HW 3 -#define VCS1_HW 4 -#define VCS2_HW 6 -#define VCS3_HW 7 -#define VECS1_HW 12 - -/* Gen11+ HW Engine class + instance */ +/* HW Engine class + instance */ #define RENDER_CLASS 0 #define VIDEO_DECODE_CLASS 1 #define VIDEO_ENHANCEMENT_CLASS 2 #define COPY_ENGINE_CLASS 3 #define OTHER_CLASS 4 #define MAX_ENGINE_CLASS 4 -#define MAX_ENGINE_INSTANCE 3 +#define MAX_ENGINE_INSTANCE 7 #define I915_MAX_SLICES 3 #define I915_MAX_SUBSLICES 8 @@ -59,11 +47,13 @@ struct drm_i915_reg_table; struct i915_gem_context; struct i915_request; struct i915_sched_attr; +struct i915_sched_engine; struct intel_gt; struct intel_ring; struct intel_uncore; +struct intel_breadcrumbs; -typedef u8 intel_engine_mask_t; +typedef u32 intel_engine_mask_t; #define ALL_ENGINES ((intel_engine_mask_t)~0ul) struct intel_hw_status_page { @@ -100,8 +90,8 @@ struct i915_ctx_workarounds { struct i915_vma *vma; }; -#define I915_MAX_VCS 4 -#define I915_MAX_VECS 2 +#define I915_MAX_VCS 8 +#define I915_MAX_VECS 4 /* * Engine IDs definitions. 
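intel_engine_mask_t grows from u8 to u32 because enum intel_engine_id, which the next hunk extends with VCS4 to VCS7 and VECS2 to VECS3, now needs 14 bits. The following is a small sketch of the overflow. It assumes that RCS0 and BCS0 still precede VCS0, as in the unchanged context. The sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Engine ids in the order enum intel_engine_id uses after this patch. */
enum sketch_engine_id {
    RCS0 = 0, BCS0,
    VCS0, VCS1, VCS2, VCS3, VCS4, VCS5, VCS6, VCS7,
    VECS0, VECS1, VECS2, VECS3,
    SKETCH_NUM_ENGINES
};

typedef uint8_t  old_mask_t; /* intel_engine_mask_t before the patch */
typedef uint32_t new_mask_t; /* intel_engine_mask_t after the patch */

/* One bit per engine id must fit in the mask type. */
static_assert(SKETCH_NUM_ENGINES <= 8 * sizeof(new_mask_t),
              "u32 holds a bit for every engine");

/* Mask with a bit set for every engine id (0x3fff for 14 engines). */
static new_mask_t sketch_all_engine_bits(void)
{
    return (new_mask_t)((1ull << SKETCH_NUM_ENGINES) - 1);
}
```

Under the old u8 type, BIT(VECS3) would silently truncate to zero. The engine would then vanish from any mask built from it.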
@@ -114,9 +104,15 @@ enum intel_engine_id { VCS1, VCS2, VCS3, + VCS4, + VCS5, + VCS6, + VCS7, #define _VCS(n) (VCS0 + (n)) VECS0, VECS1, + VECS2, + VECS3, #define _VECS(n) (VECS0 + (n)) I915_NUM_ENGINES #define INVALID_ENGINE ((enum intel_engine_id)-1) @@ -138,11 +134,6 @@ struct st_preempt_hang { */ struct intel_engine_execlists { /** - * @tasklet: softirq tasklet for bottom handler - */ - struct tasklet_struct tasklet; - - /** * @timer: kick the current context if its timeslice expires */ struct timer_list timer; @@ -153,11 +144,6 @@ struct intel_engine_execlists { struct timer_list preempt; /** - * @default_priolist: priority list for I915_PRIORITY_NORMAL - */ - struct i915_priolist default_priolist; - - /** * @ccid: identifier for contexts submitted to this engine */ u32 ccid; @@ -192,11 +178,6 @@ struct intel_engine_execlists { u32 reset_ccid; /** - * @no_priolist: priority lists disabled - */ - bool no_priolist; - - /** * @submit_reg: gen-specific execlist submission register * set to the ExecList Submission Port (elsp) register pre-Gen11 and to * the ExecList Submission Queue Contents register array for Gen11+ @@ -238,23 +219,10 @@ struct intel_engine_execlists { unsigned int port_mask; /** - * @queue_priority_hint: Highest pending priority. - * - * When we add requests into the queue, or adjust the priority of - * executing requests, we compute the maximum priority of those - * pending requests. We can then use this value to determine if - * we need to preempt the executing requests to service the queue. - * However, since the we may have recorded the priority of an inflight - * request we wanted to preempt but since completed, at the time of - * dequeuing the priority hint may no longer may match the highest - * available request priority. + * @virtual: Queue of requets on a virtual engine, sorted by priority. + * Each RB entry is a struct i915_priolist containing a list of requests + * of the same priority. 
*/ - int queue_priority_hint; - - /** - * @queue: queue of requests, in priority lists - */ - struct rb_root_cached queue; struct rb_root_cached virtual; /** @@ -295,7 +263,6 @@ struct intel_engine_cs { enum intel_engine_id id; enum intel_engine_id legacy_idx; - unsigned int hw_id; unsigned int guc_id; intel_engine_mask_t mask; @@ -326,15 +293,13 @@ struct intel_engine_cs { struct intel_sseu sseu; - struct { - spinlock_t lock; - struct list_head requests; - struct list_head hold; /* ready requests, but on hold */ - } active; + struct i915_sched_engine *sched_engine; /* keep a request in reserve for a [pm] barrier under oom */ struct i915_request *request_pool; + struct intel_context *hung_ce; + struct llist_head barrier_tasks; struct intel_context *kernel_context; /* pinned */ @@ -419,6 +384,8 @@ struct intel_engine_cs { void (*park)(struct intel_engine_cs *engine); void (*unpark)(struct intel_engine_cs *engine); + void (*bump_serial)(struct intel_engine_cs *engine); + void (*set_default_submission)(struct intel_engine_cs *engine); const struct intel_context_ops *cops; @@ -447,22 +414,13 @@ struct intel_engine_cs { */ void (*submit_request)(struct i915_request *rq); - /* - * Called on signaling of a SUBMIT_FENCE, passing along the signaling - * request down to the bonded pairs. - */ - void (*bond_execute)(struct i915_request *rq, - struct dma_fence *signal); + void (*release)(struct intel_engine_cs *engine); /* - * Call when the priority on a request has changed and it and its - * dependencies may need rescheduling. Note the request itself may - * not be ready to run! 
+ * Add / remove request from engine active tracking */ - void (*schedule)(struct i915_request *request, - const struct i915_sched_attr *attr); - - void (*release)(struct intel_engine_cs *engine); + void (*add_active_request)(struct i915_request *rq); + void (*remove_active_request)(struct i915_request *rq); struct intel_engine_execlists execlists; @@ -485,6 +443,7 @@ struct intel_engine_cs { #define I915_ENGINE_IS_VIRTUAL BIT(5) #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6) #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) +#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8) unsigned int flags; /* diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c index 3cca7ea2d6ea..8f8bea08e734 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c @@ -11,6 +11,7 @@ #include "intel_engine.h" #include "intel_engine_user.h" #include "intel_gt.h" +#include "uc/intel_guc_submission.h" struct intel_engine_cs * intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance) @@ -108,13 +109,16 @@ static void set_scheduler_caps(struct drm_i915_private *i915) for_each_uabi_engine(engine, i915) { /* all engines must agree! 
*/ int i; - if (engine->schedule) + if (engine->sched_engine->schedule) enabled |= (I915_SCHEDULER_CAP_ENABLED | I915_SCHEDULER_CAP_PRIORITY); else disabled |= (I915_SCHEDULER_CAP_ENABLED | I915_SCHEDULER_CAP_PRIORITY); + if (intel_uc_uses_guc_submission(&i915->gt.uc)) + enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP; + for (i = 0; i < ARRAY_SIZE(map); i++) { if (engine->flags & BIT(map[i].engine)) enabled |= BIT(map[i].sched); diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index fc77592d88a9..de5f9c86b9a4 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -114,6 +114,7 @@ #include "gen8_engine_cs.h" #include "intel_breadcrumbs.h" #include "intel_context.h" +#include "intel_engine_heartbeat.h" #include "intel_engine_pm.h" #include "intel_engine_stats.h" #include "intel_execlists_submission.h" @@ -153,6 +154,12 @@ #define GEN12_CSB_CTX_VALID(csb_dw) \ (FIELD_GET(GEN12_CSB_SW_CTX_ID_MASK, csb_dw) != GEN12_IDLE_CTX_ID) +#define XEHP_CTX_STATUS_SWITCHED_TO_NEW_QUEUE BIT(1) /* upper csb dword */ +#define XEHP_CSB_SW_CTX_ID_MASK GENMASK(31, 10) +#define XEHP_IDLE_CTX_ID 0xFFFF +#define XEHP_CSB_CTX_VALID(csb_dw) \ + (FIELD_GET(XEHP_CSB_SW_CTX_ID_MASK, csb_dw) != XEHP_IDLE_CTX_ID) + /* Typical size of the average request (2 pipecontrols and a MI_BB) */ #define EXECLISTS_REQUEST_SIZE 64 /* bytes */ @@ -182,18 +189,6 @@ struct virtual_engine { int prio; } nodes[I915_NUM_ENGINES]; - /* - * Keep track of bonded pairs -- restrictions upon on our selection - * of physical engines any particular request may be submitted to. - * If we receive a submit-fence from a master engine, we will only - * use one of sibling_mask physical engines. 
- */ - struct ve_bond { - const struct intel_engine_cs *master; - intel_engine_mask_t sibling_mask; - } *bonds; - unsigned int num_bonds; - /* And finally, which physical engines this virtual engine maps onto. */ unsigned int num_siblings; struct intel_engine_cs *siblings[]; @@ -205,6 +200,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) return container_of(engine, struct virtual_engine, base); } +static struct intel_context * +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count); + static struct i915_request * __active_request(const struct intel_timeline * const tl, struct i915_request *rq, @@ -273,11 +271,11 @@ static int effective_prio(const struct i915_request *rq) return prio; } -static int queue_prio(const struct intel_engine_execlists *execlists) +static int queue_prio(const struct i915_sched_engine *sched_engine) { struct rb_node *rb; - rb = rb_first_cached(&execlists->queue); + rb = rb_first_cached(&sched_engine->queue); if (!rb) return INT_MIN; @@ -318,14 +316,14 @@ static bool need_preempt(const struct intel_engine_cs *engine, * to preserve FIFO ordering of dependencies. */ last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1); - if (engine->execlists.queue_priority_hint <= last_prio) + if (engine->sched_engine->queue_priority_hint <= last_prio) return false; /* * Check against the first request in ELSP[1], it will, thanks to the * power of PI, be the highest priority of that context. */ - if (!list_is_last(&rq->sched.link, &engine->active.requests) && + if (!list_is_last(&rq->sched.link, &engine->sched_engine->requests) && rq_prio(list_next_entry(rq, sched.link)) > last_prio) return true; @@ -340,7 +338,7 @@ static bool need_preempt(const struct intel_engine_cs *engine, * context, it's priority would not exceed ELSP[0] aka last_prio. 
*/ return max(virtual_prio(&engine->execlists), - queue_prio(&engine->execlists)) > last_prio; + queue_prio(engine->sched_engine)) > last_prio; } __maybe_unused static bool @@ -367,10 +365,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) struct list_head *pl; int prio = I915_PRIORITY_INVALID; - lockdep_assert_held(&engine->active.lock); + lockdep_assert_held(&engine->sched_engine->lock); list_for_each_entry_safe_reverse(rq, rn, - &engine->active.requests, + &engine->sched_engine->requests, sched.link) { if (__i915_request_is_complete(rq)) { list_del_init(&rq->sched.link); @@ -382,9 +380,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); if (rq_prio(rq) != prio) { prio = rq_prio(rq); - pl = i915_sched_lookup_priolist(engine, prio); + pl = i915_sched_lookup_priolist(engine->sched_engine, + prio); } - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); + GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine)); list_move(&rq->sched.link, pl); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); @@ -489,6 +488,16 @@ __execlists_schedule_in(struct i915_request *rq) /* Use a fixed tag for OA and friends */ GEM_BUG_ON(ce->tag <= BITS_PER_LONG); ce->lrc.ccid = ce->tag; + } else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) { + /* We don't need a strict matching tag, just different values */ + unsigned int tag = ffs(READ_ONCE(engine->context_tag)); + + GEM_BUG_ON(tag == 0 || tag >= BITS_PER_LONG); + clear_bit(tag - 1, &engine->context_tag); + ce->lrc.ccid = tag << (XEHP_SW_CTX_ID_SHIFT - 32); + + BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); + } else { /* We don't need a strict matching tag, just different values */ unsigned int tag = __ffs(engine->context_tag); @@ -534,13 +543,13 @@ resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) { struct intel_engine_cs *engine = rq->engine; - spin_lock_irq(&engine->active.lock); + 
spin_lock_irq(&engine->sched_engine->lock); clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); WRITE_ONCE(rq->engine, &ve->base); ve->base.submit_request(rq); - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&engine->sched_engine->lock); } static void kick_siblings(struct i915_request *rq, struct intel_context *ce) @@ -569,7 +578,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) resubmit_virtual_request(rq, ve); if (READ_ONCE(ve->request)) - tasklet_hi_schedule(&ve->base.execlists.tasklet); + tasklet_hi_schedule(&ve->base.sched_engine->tasklet); } static void __execlists_schedule_out(struct i915_request * const rq, @@ -579,7 +588,7 @@ static void __execlists_schedule_out(struct i915_request * const rq, unsigned int ccid; /* - * NB process_csb() is not under the engine->active.lock and hence + * NB process_csb() is not under the engine->sched_engine->lock and hence * schedule_out can race with schedule_in meaning that we should * refrain from doing non-trivial work here. 
*/ @@ -599,8 +608,14 @@ static void __execlists_schedule_out(struct i915_request * const rq, intel_engine_add_retire(engine, ce->timeline); ccid = ce->lrc.ccid; - ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; - ccid &= GEN12_MAX_CONTEXT_HW_ID; + if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) { + ccid >>= XEHP_SW_CTX_ID_SHIFT - 32; + ccid &= XEHP_MAX_CONTEXT_HW_ID; + } else { + ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; + ccid &= GEN12_MAX_CONTEXT_HW_ID; + } + if (ccid < BITS_PER_LONG) { GEM_BUG_ON(ccid == 0); GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag)); @@ -738,9 +753,9 @@ trace_ports(const struct intel_engine_execlists *execlists, } static bool -reset_in_progress(const struct intel_engine_execlists *execlists) +reset_in_progress(const struct intel_engine_cs *engine) { - return unlikely(!__tasklet_is_enabled(&execlists->tasklet)); + return unlikely(!__tasklet_is_enabled(&engine->sched_engine->tasklet)); } static __maybe_unused noinline bool @@ -756,7 +771,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists, trace_ports(execlists, msg, execlists->pending); /* We may be messing around with the lists during reset, lalala */ - if (reset_in_progress(execlists)) + if (reset_in_progress(engine)) return true; if (!execlists->pending[0]) { @@ -1096,7 +1111,8 @@ static void defer_active(struct intel_engine_cs *engine) if (!rq) return; - defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq))); + defer_request(rq, i915_sched_lookup_priolist(engine->sched_engine, + rq_prio(rq))); } static bool @@ -1133,13 +1149,14 @@ static bool needs_timeslice(const struct intel_engine_cs *engine, return false; /* If ELSP[1] is occupied, always check to see if worth slicing */ - if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests)) { + if (!list_is_last_rcu(&rq->sched.link, + &engine->sched_engine->requests)) { ENGINE_TRACE(engine, "timeslice required for second inflight context\n"); return true; } /* Otherwise, ELSP[0] is by itself, but may be 
waiting in the queue */ - if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) { + if (!i915_sched_engine_is_empty(engine->sched_engine)) { ENGINE_TRACE(engine, "timeslice required for queue\n"); return true; } @@ -1187,7 +1204,7 @@ static void start_timeslice(struct intel_engine_cs *engine) * its timeslice, so recheck. */ if (!timer_pending(&el->timer)) - tasklet_hi_schedule(&el->tasklet); + tasklet_hi_schedule(&engine->sched_engine->tasklet); return; } @@ -1236,6 +1253,7 @@ static bool completed(const struct i915_request *rq) static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched_engine * const sched_engine = engine->sched_engine; struct i915_request **port = execlists->pending; struct i915_request ** const last_port = port + execlists->port_mask; struct i915_request *last, * const *active; @@ -1265,7 +1283,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * and context switches) submission. */ - spin_lock(&engine->active.lock); + spin_lock(&sched_engine->lock); /* * If the queue is higher priority than the last @@ -1287,7 +1305,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) last->fence.context, last->fence.seqno, last->sched.attr.priority, - execlists->queue_priority_hint); + sched_engine->queue_priority_hint); record_preemption(execlists); /* @@ -1313,7 +1331,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) yesno(timer_expired(&execlists->timer)), last->fence.context, last->fence.seqno, rq_prio(last), - execlists->queue_priority_hint, + sched_engine->queue_priority_hint, yesno(timeslice_yield(execlists, last))); /* @@ -1365,7 +1383,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * Even if ELSP[1] is occupied and not worthy * of timeslices, our queue might be. 
*/ - spin_unlock(&engine->active.lock); + spin_unlock(&sched_engine->lock); return; } } @@ -1375,7 +1393,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) while ((ve = first_virtual_engine(engine))) { struct i915_request *rq; - spin_lock(&ve->base.active.lock); + spin_lock(&ve->base.sched_engine->lock); rq = ve->request; if (unlikely(!virtual_matches(ve, rq, engine))) @@ -1384,14 +1402,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine) GEM_BUG_ON(rq->engine != &ve->base); GEM_BUG_ON(rq->context != &ve->context); - if (unlikely(rq_prio(rq) < queue_prio(execlists))) { - spin_unlock(&ve->base.active.lock); + if (unlikely(rq_prio(rq) < queue_prio(sched_engine))) { + spin_unlock(&ve->base.sched_engine->lock); break; } if (last && !can_merge_rq(last, rq)) { - spin_unlock(&ve->base.active.lock); - spin_unlock(&engine->active.lock); + spin_unlock(&ve->base.sched_engine->lock); + spin_unlock(&engine->sched_engine->lock); return; /* leave this for another sibling */ } @@ -1405,7 +1423,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) yesno(engine != ve->siblings[0])); WRITE_ONCE(ve->request, NULL); - WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN); + WRITE_ONCE(ve->base.sched_engine->queue_priority_hint, INT_MIN); rb = &ve->nodes[engine->id].rb; rb_erase_cached(rb, &execlists->virtual); @@ -1437,7 +1455,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) i915_request_put(rq); unlock: - spin_unlock(&ve->base.active.lock); + spin_unlock(&ve->base.sched_engine->lock); /* * Hmm, we have a bunch of virtual engine requests, @@ -1450,7 +1468,7 @@ unlock: break; } - while ((rb = rb_first_cached(&execlists->queue))) { + while ((rb = rb_first_cached(&sched_engine->queue))) { struct i915_priolist *p = to_priolist(rb); struct i915_request *rq, *rn; @@ -1529,7 +1547,7 @@ unlock: } } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &sched_engine->queue); i915_priolist_free(p); } done: 
@@ -1551,8 +1569,9 @@ done: * request triggering preemption on the next dequeue (or subsequent * interrupt for secondary ports). */ - execlists->queue_priority_hint = queue_prio(execlists); - spin_unlock(&engine->active.lock); + sched_engine->queue_priority_hint = queue_prio(sched_engine); + i915_sched_engine_reset_on_empty(sched_engine); + spin_unlock(&sched_engine->lock); /* * We can skip poking the HW if we ended up with exactly the same set @@ -1655,13 +1674,24 @@ static void invalidate_csb_entries(const u64 *first, const u64 *last) * bits 44-46: reserved * bits 47-57: sw context id of the lrc the GT switched away from * bits 58-63: sw counter of the lrc the GT switched away from + * + * Xe_HP csb shuffles things around compared to TGL: + * + * bits 0-3: context switch detail (same possible values as TGL) + * bits 4-9: engine instance + * bits 10-25: sw context id of the lrc the GT switched to + * bits 26-31: sw counter of the lrc the GT switched to + * bit 32: semaphore wait mode (poll or signal), Only valid when + * switch detail is set to "wait on semaphore" + * bit 33: switched to new queue + * bits 34-41: wait detail (for switch detail 1 to 4) + * bits 42-57: sw context id of the lrc the GT switched away from + * bits 58-63: sw counter of the lrc the GT switched away from */ -static bool gen12_csb_parse(const u64 csb) +static inline bool +__gen12_csb_parse(bool ctx_to_valid, bool ctx_away_valid, bool new_queue, + u8 switch_detail) { - bool ctx_away_valid = GEN12_CSB_CTX_VALID(upper_32_bits(csb)); - bool new_queue = - lower_32_bits(csb) & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE; - /* * The context switch detail is not guaranteed to be 5 when a preemption * occurs, so we can't just check for that. The check below works for @@ -1670,7 +1700,7 @@ static bool gen12_csb_parse(const u64 csb) * would require some extra handling, but we don't support that. 
*/ if (!ctx_away_valid || new_queue) { - GEM_BUG_ON(!GEN12_CSB_CTX_VALID(lower_32_bits(csb))); + GEM_BUG_ON(!ctx_to_valid); return true; } @@ -1679,10 +1709,26 @@ static bool gen12_csb_parse(const u64 csb) * context switch on an unsuccessful wait instruction since we always * use polling mode. */ - GEM_BUG_ON(GEN12_CTX_SWITCH_DETAIL(upper_32_bits(csb))); + GEM_BUG_ON(switch_detail); return false; } +static bool xehp_csb_parse(const u64 csb) +{ + return __gen12_csb_parse(XEHP_CSB_CTX_VALID(lower_32_bits(csb)), /* cxt to */ + XEHP_CSB_CTX_VALID(upper_32_bits(csb)), /* cxt away */ + upper_32_bits(csb) & XEHP_CTX_STATUS_SWITCHED_TO_NEW_QUEUE, + GEN12_CTX_SWITCH_DETAIL(lower_32_bits(csb))); +} + +static bool gen12_csb_parse(const u64 csb) +{ + return __gen12_csb_parse(GEN12_CSB_CTX_VALID(lower_32_bits(csb)), /* cxt to */ + GEN12_CSB_CTX_VALID(upper_32_bits(csb)), /* cxt away */ + lower_32_bits(csb) & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE, + GEN12_CTX_SWITCH_DETAIL(upper_32_bits(csb))); +} + static bool gen8_csb_parse(const u64 csb) { return csb & (GEN8_CTX_STATUS_IDLE_ACTIVE | GEN8_CTX_STATUS_PREEMPTED); @@ -1767,8 +1813,8 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) * access. Either we are inside the tasklet, or the tasklet is disabled * and we assume that is only inside the reset paths and so serialised. */ - GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) && - !reset_in_progress(execlists)); + GEM_BUG_ON(!tasklet_is_locked(&engine->sched_engine->tasklet) && + !reset_in_progress(engine)); /* * Note that csb_write, csb_status may be either in HWSP or mmio. 
@@ -1847,7 +1893,9 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) ENGINE_TRACE(engine, "csb[%d]: status=0x%08x:0x%08x\n", head, upper_32_bits(csb), lower_32_bits(csb)); - if (GRAPHICS_VER(engine->i915) >= 12) + if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) + promote = xehp_csb_parse(csb); + else if (GRAPHICS_VER(engine->i915) >= 12) promote = gen12_csb_parse(csb); else promote = gen8_csb_parse(csb); @@ -1979,7 +2027,8 @@ static void __execlists_hold(struct i915_request *rq) __i915_request_unsubmit(rq); clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - list_move_tail(&rq->sched.link, &rq->engine->active.hold); + list_move_tail(&rq->sched.link, + &rq->engine->sched_engine->hold); i915_request_set_hold(rq); RQ_TRACE(rq, "on hold\n"); @@ -2016,7 +2065,7 @@ static bool execlists_hold(struct intel_engine_cs *engine, if (i915_request_on_hold(rq)) return false; - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&engine->sched_engine->lock); if (__i915_request_is_complete(rq)) { /* too late! 
*/ rq = NULL; @@ -2032,10 +2081,10 @@ static bool execlists_hold(struct intel_engine_cs *engine, GEM_BUG_ON(i915_request_on_hold(rq)); GEM_BUG_ON(rq->engine != engine); __execlists_hold(rq); - GEM_BUG_ON(list_empty(&engine->active.hold)); + GEM_BUG_ON(list_empty(&engine->sched_engine->hold)); unlock: - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&engine->sched_engine->lock); return rq; } @@ -2079,7 +2128,7 @@ static void __execlists_unhold(struct i915_request *rq) i915_request_clear_hold(rq); list_move_tail(&rq->sched.link, - i915_sched_lookup_priolist(rq->engine, + i915_sched_lookup_priolist(rq->engine->sched_engine, rq_prio(rq))); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); @@ -2115,7 +2164,7 @@ static void __execlists_unhold(struct i915_request *rq) static void execlists_unhold(struct intel_engine_cs *engine, struct i915_request *rq) { - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&engine->sched_engine->lock); /* * Move this request back to the priority queue, and all of its @@ -2123,12 +2172,12 @@ static void execlists_unhold(struct intel_engine_cs *engine, */ __execlists_unhold(rq); - if (rq_prio(rq) > engine->execlists.queue_priority_hint) { - engine->execlists.queue_priority_hint = rq_prio(rq); - tasklet_hi_schedule(&engine->execlists.tasklet); + if (rq_prio(rq) > engine->sched_engine->queue_priority_hint) { + engine->sched_engine->queue_priority_hint = rq_prio(rq); + tasklet_hi_schedule(&engine->sched_engine->tasklet); } - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&engine->sched_engine->lock); } struct execlists_capture { @@ -2258,13 +2307,13 @@ static void execlists_capture(struct intel_engine_cs *engine) if (!cap) return; - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&engine->sched_engine->lock); cap->rq = active_context(engine, active_ccid(engine)); if (cap->rq) { cap->rq = active_request(cap->rq->context->timeline, cap->rq); cap->rq = i915_request_get_rcu(cap->rq); } - 
spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&engine->sched_engine->lock); if (!cap->rq) goto err_free; @@ -2316,13 +2365,13 @@ static void execlists_reset(struct intel_engine_cs *engine, const char *msg) ENGINE_TRACE(engine, "reset for %s\n", msg); /* Mark this tasklet as disabled to avoid waiting for it to complete */ - tasklet_disable_nosync(&engine->execlists.tasklet); + tasklet_disable_nosync(&engine->sched_engine->tasklet); ring_set_paused(engine, 1); /* Freeze the current request in place */ execlists_capture(engine); intel_engine_reset(engine, msg); - tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&engine->sched_engine->tasklet); clear_and_wake_up_bit(bit, lock); } @@ -2345,8 +2394,9 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) */ static void execlists_submission_tasklet(struct tasklet_struct *t) { - struct intel_engine_cs * const engine = - from_tasklet(engine, t, execlists.tasklet); + struct i915_sched_engine *sched_engine = + from_tasklet(sched_engine, t, tasklet); + struct intel_engine_cs * const engine = sched_engine->private_data; struct i915_request *post[2 * EXECLIST_MAX_PORTS]; struct i915_request **inactive; @@ -2421,13 +2471,16 @@ static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir) intel_engine_signal_breadcrumbs(engine); if (tasklet) - tasklet_hi_schedule(&engine->execlists.tasklet); + tasklet_hi_schedule(&engine->sched_engine->tasklet); } static void __execlists_kick(struct intel_engine_execlists *execlists) { + struct intel_engine_cs *engine = + container_of(execlists, typeof(*engine), execlists); + /* Kick the tasklet for some interrupt coalescing and reset handling */ - tasklet_hi_schedule(&execlists->tasklet); + tasklet_hi_schedule(&engine->sched_engine->tasklet); } #define execlists_kick(t, member) \ @@ -2448,19 +2501,20 @@ static void queue_request(struct intel_engine_cs *engine, { GEM_BUG_ON(!list_empty(&rq->sched.link)); list_add_tail(&rq->sched.link, - 
i915_sched_lookup_priolist(engine, rq_prio(rq))); + i915_sched_lookup_priolist(engine->sched_engine, + rq_prio(rq))); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } static bool submit_queue(struct intel_engine_cs *engine, const struct i915_request *rq) { - struct intel_engine_execlists *execlists = &engine->execlists; + struct i915_sched_engine *sched_engine = engine->sched_engine; - if (rq_prio(rq) <= execlists->queue_priority_hint) + if (rq_prio(rq) <= sched_engine->queue_priority_hint) return false; - execlists->queue_priority_hint = rq_prio(rq); + sched_engine->queue_priority_hint = rq_prio(rq); return true; } @@ -2468,7 +2522,7 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine, const struct i915_request *rq) { GEM_BUG_ON(i915_request_on_hold(rq)); - return !list_empty(&engine->active.hold) && hold_request(rq); + return !list_empty(&engine->sched_engine->hold) && hold_request(rq); } static void execlists_submit_request(struct i915_request *request) @@ -2477,23 +2531,24 @@ static void execlists_submit_request(struct i915_request *request) unsigned long flags; /* Will be called from irq-context when using foreign fences. 
*/ - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&engine->sched_engine->lock, flags); if (unlikely(ancestor_on_hold(engine, request))) { RQ_TRACE(request, "ancestor on hold\n"); - list_add_tail(&request->sched.link, &engine->active.hold); + list_add_tail(&request->sched.link, + &engine->sched_engine->hold); i915_request_set_hold(request); } else { queue_request(engine, request); - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); + GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine)); GEM_BUG_ON(list_empty(&request->sched.link)); if (submit_queue(engine, request)) __execlists_kick(&engine->execlists); } - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&engine->sched_engine->lock, flags); } static int @@ -2533,11 +2588,26 @@ static int execlists_context_alloc(struct intel_context *ce) return lrc_alloc(ce, ce->engine); } +static void execlists_context_cancel_request(struct intel_context *ce, + struct i915_request *rq) +{ + struct intel_engine_cs *engine = NULL; + + i915_request_active_engine(rq, &engine); + + if (engine && intel_engine_pulse(engine)) + intel_gt_handle_error(engine->gt, engine->mask, 0, + "request cancellation by %s", + current->comm); +} + static const struct intel_context_ops execlists_context_ops = { .flags = COPS_HAS_INFLIGHT, .alloc = execlists_context_alloc, + .cancel_request = execlists_context_cancel_request, + .pre_pin = execlists_context_pre_pin, .pin = execlists_context_pin, .unpin = lrc_unpin, @@ -2548,6 +2618,8 @@ static const struct intel_context_ops execlists_context_ops = { .reset = lrc_reset, .destroy = lrc_destroy, + + .create_virtual = execlists_create_virtual, }; static int emit_pdps(struct i915_request *rq) @@ -2800,10 +2872,8 @@ static int execlists_resume(struct intel_engine_cs *engine) static void execlists_reset_prepare(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; - ENGINE_TRACE(engine, 
"depth<-%d\n", - atomic_read(&execlists->tasklet.count)); + atomic_read(&engine->sched_engine->tasklet.count)); /* * Prevent request submission to the hardware until we have @@ -2814,8 +2884,8 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) * Turning off the execlists->tasklet until the reset is over * prevents the race. */ - __tasklet_disable_sync_once(&execlists->tasklet); - GEM_BUG_ON(!reset_in_progress(execlists)); + __tasklet_disable_sync_once(&engine->sched_engine->tasklet); + GEM_BUG_ON(!reset_in_progress(engine)); /* * We stop engines, otherwise we might get failed reset and a @@ -2957,24 +3027,26 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) /* Push back any incomplete requests for replay after the reset. */ rcu_read_lock(); - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&engine->sched_engine->lock, flags); __unwind_incomplete_requests(engine); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&engine->sched_engine->lock, flags); rcu_read_unlock(); } static void nop_submission_tasklet(struct tasklet_struct *t) { - struct intel_engine_cs * const engine = - from_tasklet(engine, t, execlists.tasklet); + struct i915_sched_engine *sched_engine = + from_tasklet(sched_engine, t, tasklet); + struct intel_engine_cs * const engine = sched_engine->private_data; /* The driver is wedged; don't process any more events. 
*/ - WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN); + WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN); } static void execlists_reset_cancel(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched_engine * const sched_engine = engine->sched_engine; struct i915_request *rq, *rn; struct rb_node *rb; unsigned long flags; @@ -2998,15 +3070,15 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) execlists_reset_csb(engine, true); rcu_read_lock(); - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&engine->sched_engine->lock, flags); /* Mark all executing requests as skipped. */ - list_for_each_entry(rq, &engine->active.requests, sched.link) + list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) i915_request_put(i915_request_mark_eio(rq)); intel_engine_signal_breadcrumbs(engine); /* Flush the queued requests to the timeline list (for retiring). */ - while ((rb = rb_first_cached(&execlists->queue))) { + while ((rb = rb_first_cached(&sched_engine->queue))) { struct i915_priolist *p = to_priolist(rb); priolist_for_each_request_consume(rq, rn, p) { @@ -3016,12 +3088,12 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) } } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &sched_engine->queue); i915_priolist_free(p); } /* On-hold requests will be flushed to timeline upon their release */ - list_for_each_entry(rq, &engine->active.hold, sched.link) + list_for_each_entry(rq, &sched_engine->hold, sched.link) i915_request_put(i915_request_mark_eio(rq)); /* Cancel all attached virtual engines */ @@ -3032,7 +3104,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) rb_erase_cached(rb, &execlists->virtual); RB_CLEAR_NODE(rb); - spin_lock(&ve->base.active.lock); + spin_lock(&ve->base.sched_engine->lock); rq = fetch_and_zero(&ve->request); if (rq) { if 
(i915_request_mark_eio(rq)) { @@ -3042,20 +3114,20 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) } i915_request_put(rq); - ve->base.execlists.queue_priority_hint = INT_MIN; + ve->base.sched_engine->queue_priority_hint = INT_MIN; } - spin_unlock(&ve->base.active.lock); + spin_unlock(&ve->base.sched_engine->lock); } /* Remaining _unready_ requests will be nop'ed when submitted */ - execlists->queue_priority_hint = INT_MIN; - execlists->queue = RB_ROOT_CACHED; + sched_engine->queue_priority_hint = INT_MIN; + sched_engine->queue = RB_ROOT_CACHED; - GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); - execlists->tasklet.callback = nop_submission_tasklet; + GEM_BUG_ON(__tasklet_is_enabled(&engine->sched_engine->tasklet)); + engine->sched_engine->tasklet.callback = nop_submission_tasklet; - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&engine->sched_engine->lock, flags); rcu_read_unlock(); } @@ -3073,14 +3145,14 @@ static void execlists_reset_finish(struct intel_engine_cs *engine) * reset as the next level of recovery, and as a final resort we * will declare the device wedged. */ - GEM_BUG_ON(!reset_in_progress(execlists)); + GEM_BUG_ON(!reset_in_progress(engine)); /* And kick in case we missed a new request submission. 
*/ - if (__tasklet_enable(&execlists->tasklet)) + if (__tasklet_enable(&engine->sched_engine->tasklet)) __execlists_kick(execlists); ENGINE_TRACE(engine, "depth->%d\n", - atomic_read(&execlists->tasklet.count)); + atomic_read(&engine->sched_engine->tasklet.count)); } static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine) @@ -3101,6 +3173,42 @@ static void execlists_park(struct intel_engine_cs *engine) cancel_timer(&engine->execlists.preempt); } +static void add_to_engine(struct i915_request *rq) +{ + lockdep_assert_held(&rq->engine->sched_engine->lock); + list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests); +} + +static void remove_from_engine(struct i915_request *rq) +{ + struct intel_engine_cs *engine, *locked; + + /* + * Virtual engines complicate acquiring the engine timeline lock, + * as their rq->engine pointer is not stable until under that + * engine lock. The simple ploy we use is to take the lock then + * check that the rq still belongs to the newly locked engine. 
+ */ + locked = READ_ONCE(rq->engine); + spin_lock_irq(&locked->sched_engine->lock); + while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { + spin_unlock(&locked->sched_engine->lock); + spin_lock(&engine->sched_engine->lock); + locked = engine; + } + list_del_init(&rq->sched.link); + + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags); + + /* Prevent further __await_execution() registering a cb, then flush */ + set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); + + spin_unlock_irq(&locked->sched_engine->lock); + + i915_request_notify_execute_cb_imm(rq); +} + static bool can_preempt(struct intel_engine_cs *engine) { if (GRAPHICS_VER(engine->i915) > 8) @@ -3110,11 +3218,62 @@ static bool can_preempt(struct intel_engine_cs *engine) return engine->class != RENDER_CLASS; } +static void kick_execlists(const struct i915_request *rq, int prio) +{ + struct intel_engine_cs *engine = rq->engine; + struct i915_sched_engine *sched_engine = engine->sched_engine; + const struct i915_request *inflight; + + /* + * We only need to kick the tasklet once for the high priority + * new context we add into the queue. + */ + if (prio <= sched_engine->queue_priority_hint) + return; + + rcu_read_lock(); + + /* Nothing currently active? We're overdue for a submission! */ + inflight = execlists_active(&engine->execlists); + if (!inflight) + goto unlock; + + /* + * If we are already the currently executing context, don't + * bother evaluating if we should preempt ourselves. 
+ */ + if (inflight->context == rq->context) + goto unlock; + + ENGINE_TRACE(engine, + "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n", + prio, + rq->fence.context, rq->fence.seqno, + inflight->fence.context, inflight->fence.seqno, + inflight->sched.attr.priority); + + sched_engine->queue_priority_hint = prio; + + /* + * Allow preemption of low -> normal -> high, but we do + * not allow low priority tasks to preempt other low priority + * tasks under the impression that latency for low priority + * tasks does not matter (as much as background throughput), + * so kiss. + */ + if (prio >= max(I915_PRIORITY_NORMAL, rq_prio(inflight))) + tasklet_hi_schedule(&sched_engine->tasklet); + +unlock: + rcu_read_unlock(); +} + static void execlists_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = execlists_submit_request; - engine->schedule = i915_schedule; - engine->execlists.tasklet.callback = execlists_submission_tasklet; + engine->sched_engine->schedule = i915_schedule; + engine->sched_engine->kick_backend = kick_execlists; + engine->sched_engine->tasklet.callback = execlists_submission_tasklet; } static void execlists_shutdown(struct intel_engine_cs *engine) @@ -3122,7 +3281,7 @@ static void execlists_shutdown(struct intel_engine_cs *engine) /* Synchronise with residual timers and any softirq they raise */ del_timer_sync(&engine->execlists.timer); del_timer_sync(&engine->execlists.preempt); - tasklet_kill(&engine->execlists.tasklet); + tasklet_kill(&engine->sched_engine->tasklet); } static void execlists_release(struct intel_engine_cs *engine) @@ -3144,6 +3303,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc; + engine->add_active_request = add_to_engine; + engine->remove_active_request = remove_from_engine; engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind; @@ 
-3238,7 +3399,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) struct intel_uncore *uncore = engine->uncore; u32 base = engine->mmio_base; - tasklet_setup(&engine->execlists.tasklet, execlists_submission_tasklet); + tasklet_setup(&engine->sched_engine->tasklet, execlists_submission_tasklet); timer_setup(&engine->execlists.timer, execlists_timeslice, 0); timer_setup(&engine->execlists.preempt, execlists_preempt, 0); @@ -3255,6 +3416,10 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(base)); execlists->ctrl_reg = uncore->regs + i915_mmio_reg_offset(RING_EXECLIST_CONTROL(base)); + + engine->fw_domain = intel_uncore_forcewake_for_reg(engine->uncore, + RING_EXECLIST_CONTROL(engine->mmio_base), + FW_REG_WRITE); } else { execlists->submit_reg = uncore->regs + i915_mmio_reg_offset(RING_ELSP(base)); @@ -3272,7 +3437,8 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) execlists->csb_size = GEN11_CSB_ENTRIES; engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); - if (GRAPHICS_VER(engine->i915) >= 11) { + if (GRAPHICS_VER(engine->i915) >= 11 && + GRAPHICS_VER_FULL(engine->i915) < IP_VER(12, 50)) { execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32); execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32); } @@ -3286,7 +3452,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) static struct list_head *virtual_queue(struct virtual_engine *ve) { - return &ve->base.execlists.default_priolist.requests; + return &ve->base.sched_engine->default_priolist.requests; } static void rcu_virtual_context_destroy(struct work_struct *wrk) @@ -3301,7 +3467,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) if (unlikely(ve->request)) { struct i915_request *old; - spin_lock_irq(&ve->base.active.lock); + spin_lock_irq(&ve->base.sched_engine->lock); old = fetch_and_zero(&ve->request); if (old) { @@ -3310,7 
+3476,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) i915_request_put(old); } - spin_unlock_irq(&ve->base.active.lock); + spin_unlock_irq(&ve->base.sched_engine->lock); } /* @@ -3320,7 +3486,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) * rbtrees as in the case it is running in parallel, it may reinsert * the rb_node into a sibling. */ - tasklet_kill(&ve->base.execlists.tasklet); + tasklet_kill(&ve->base.sched_engine->tasklet); /* Decouple ourselves from the siblings, no more access allowed. */ for (n = 0; n < ve->num_siblings; n++) { @@ -3330,24 +3496,26 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) if (RB_EMPTY_NODE(node)) continue; - spin_lock_irq(&sibling->active.lock); + spin_lock_irq(&sibling->sched_engine->lock); - /* Detachment is lazily performed in the execlists tasklet */ + /* Detachment is lazily performed in the sched_engine->tasklet */ if (!RB_EMPTY_NODE(node)) rb_erase_cached(node, &sibling->execlists.virtual); - spin_unlock_irq(&sibling->active.lock); + spin_unlock_irq(&sibling->sched_engine->lock); } - GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet)); + GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.sched_engine->tasklet)); GEM_BUG_ON(!list_empty(virtual_queue(ve))); lrc_fini(&ve->context); intel_context_fini(&ve->context); - intel_breadcrumbs_free(ve->base.breadcrumbs); + if (ve->base.breadcrumbs) + intel_breadcrumbs_put(ve->base.breadcrumbs); + if (ve->base.sched_engine) + i915_sched_engine_put(ve->base.sched_engine); intel_engine_free_request_pool(&ve->base); - kfree(ve->bonds); kfree(ve); } @@ -3440,11 +3608,24 @@ static void virtual_context_exit(struct intel_context *ce) intel_engine_pm_put(ve->siblings[n]); } +static struct intel_engine_cs * +virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) +{ + struct virtual_engine *ve = to_virtual_engine(engine); + + if (sibling >= ve->num_siblings) + return NULL; + + return ve->siblings[sibling]; +} + 
static const struct intel_context_ops virtual_context_ops = { .flags = COPS_HAS_INFLIGHT, .alloc = virtual_context_alloc, + .cancel_request = execlists_context_cancel_request, + .pre_pin = virtual_context_pre_pin, .pin = virtual_context_pin, .unpin = lrc_unpin, @@ -3454,6 +3635,8 @@ static const struct intel_context_ops virtual_context_ops = { .exit = virtual_context_exit, .destroy = virtual_context_destroy, + + .get_sibling = virtual_get_sibling, }; static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) @@ -3475,16 +3658,18 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n", rq->fence.context, rq->fence.seqno, - mask, ve->base.execlists.queue_priority_hint); + mask, ve->base.sched_engine->queue_priority_hint); return mask; } static void virtual_submission_tasklet(struct tasklet_struct *t) { + struct i915_sched_engine *sched_engine = + from_tasklet(sched_engine, t, tasklet); struct virtual_engine * const ve = - from_tasklet(ve, t, base.execlists.tasklet); - const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint); + (struct virtual_engine *)sched_engine->private_data; + const int prio = READ_ONCE(sched_engine->queue_priority_hint); intel_engine_mask_t mask; unsigned int n; @@ -3503,7 +3688,7 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) if (!READ_ONCE(ve->request)) break; /* already handled by a sibling's tasklet */ - spin_lock_irq(&sibling->active.lock); + spin_lock_irq(&sibling->sched_engine->lock); if (unlikely(!(mask & sibling->mask))) { if (!RB_EMPTY_NODE(&node->rb)) { @@ -3552,11 +3737,11 @@ static void virtual_submission_tasklet(struct tasklet_struct *t) submit_engine: GEM_BUG_ON(RB_EMPTY_NODE(&node->rb)); node->prio = prio; - if (first && prio > sibling->execlists.queue_priority_hint) - tasklet_hi_schedule(&sibling->execlists.tasklet); + if (first && prio > sibling->sched_engine->queue_priority_hint) + 
tasklet_hi_schedule(&sibling->sched_engine->tasklet); unlock_engine: - spin_unlock_irq(&sibling->active.lock); + spin_unlock_irq(&sibling->sched_engine->lock); if (intel_context_inflight(&ve->context)) break; @@ -3574,7 +3759,7 @@ static void virtual_submit_request(struct i915_request *rq) GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); - spin_lock_irqsave(&ve->base.active.lock, flags); + spin_lock_irqsave(&ve->base.sched_engine->lock, flags); /* By the time we resubmit a request, it may be completed */ if (__i915_request_is_complete(rq)) { @@ -3588,68 +3773,25 @@ static void virtual_submit_request(struct i915_request *rq) i915_request_put(ve->request); } - ve->base.execlists.queue_priority_hint = rq_prio(rq); + ve->base.sched_engine->queue_priority_hint = rq_prio(rq); ve->request = i915_request_get(rq); GEM_BUG_ON(!list_empty(virtual_queue(ve))); list_move_tail(&rq->sched.link, virtual_queue(ve)); - tasklet_hi_schedule(&ve->base.execlists.tasklet); + tasklet_hi_schedule(&ve->base.sched_engine->tasklet); unlock: - spin_unlock_irqrestore(&ve->base.active.lock, flags); + spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags); } -static struct ve_bond * -virtual_find_bond(struct virtual_engine *ve, - const struct intel_engine_cs *master) -{ - int i; - - for (i = 0; i < ve->num_bonds; i++) { - if (ve->bonds[i].master == master) - return &ve->bonds[i]; - } - - return NULL; -} - -static void -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) -{ - struct virtual_engine *ve = to_virtual_engine(rq->engine); - intel_engine_mask_t allowed, exec; - struct ve_bond *bond; - - allowed = ~to_request(signal)->engine->mask; - - bond = virtual_find_bond(ve, to_request(signal)->engine); - if (bond) - allowed &= bond->sibling_mask; - - /* Restrict the bonded request to run on only the available engines */ - exec = READ_ONCE(rq->execution_mask); - while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed)) - ; - - /* Prevent the master from 
being re-run on the bonded engines */ - to_request(signal)->execution_mask &= ~allowed; -} - -struct intel_context * -intel_execlists_create_virtual(struct intel_engine_cs **siblings, - unsigned int count) +static struct intel_context * +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count) { struct virtual_engine *ve; unsigned int n; int err; - if (count == 0) - return ERR_PTR(-EINVAL); - - if (count == 1) - return intel_context_create(siblings[0]); - ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL); if (!ve) return ERR_PTR(-ENOMEM); @@ -3681,19 +3823,24 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); - intel_engine_init_active(&ve->base, ENGINE_VIRTUAL); intel_engine_init_execlists(&ve->base); + ve->base.sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL); + if (!ve->base.sched_engine) { + err = -ENOMEM; + goto err_put; + } + ve->base.sched_engine->private_data = &ve->base; + ve->base.cops = &virtual_context_ops; ve->base.request_alloc = execlists_request_alloc; - ve->base.schedule = i915_schedule; + ve->base.sched_engine->schedule = i915_schedule; + ve->base.sched_engine->kick_backend = kick_execlists; ve->base.submit_request = virtual_submit_request; - ve->base.bond_execute = virtual_bond_execute; INIT_LIST_HEAD(virtual_queue(ve)); - ve->base.execlists.queue_priority_hint = INT_MIN; - tasklet_setup(&ve->base.execlists.tasklet, virtual_submission_tasklet); + tasklet_setup(&ve->base.sched_engine->tasklet, virtual_submission_tasklet); intel_context_init(&ve->context, &ve->base); @@ -3721,7 +3868,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, * layering if we handle cloning of the requests and * submitting a copy into each backend. 
*/ - if (sibling->execlists.tasklet.callback != + if (sibling->sched_engine->tasklet.callback != execlists_submission_tasklet) { err = -ENODEV; goto err_put; @@ -3756,6 +3903,8 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, "v%dx%d", ve->base.class, count); ve->base.context_size = sibling->context_size; + ve->base.add_active_request = sibling->add_active_request; + ve->base.remove_active_request = sibling->remove_active_request; ve->base.emit_bb_start = sibling->emit_bb_start; ve->base.emit_flush = sibling->emit_flush; ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb; @@ -3776,70 +3925,6 @@ err_put: return ERR_PTR(err); } -struct intel_context * -intel_execlists_clone_virtual(struct intel_engine_cs *src) -{ - struct virtual_engine *se = to_virtual_engine(src); - struct intel_context *dst; - - dst = intel_execlists_create_virtual(se->siblings, - se->num_siblings); - if (IS_ERR(dst)) - return dst; - - if (se->num_bonds) { - struct virtual_engine *de = to_virtual_engine(dst->engine); - - de->bonds = kmemdup(se->bonds, - sizeof(*se->bonds) * se->num_bonds, - GFP_KERNEL); - if (!de->bonds) { - intel_context_put(dst); - return ERR_PTR(-ENOMEM); - } - - de->num_bonds = se->num_bonds; - } - - return dst; -} - -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, - const struct intel_engine_cs *master, - const struct intel_engine_cs *sibling) -{ - struct virtual_engine *ve = to_virtual_engine(engine); - struct ve_bond *bond; - int n; - - /* Sanity check the sibling is part of the virtual engine */ - for (n = 0; n < ve->num_siblings; n++) - if (sibling == ve->siblings[n]) - break; - if (n == ve->num_siblings) - return -EINVAL; - - bond = virtual_find_bond(ve, master); - if (bond) { - bond->sibling_mask |= sibling->mask; - return 0; - } - - bond = krealloc(ve->bonds, - sizeof(*bond) * (ve->num_bonds + 1), - GFP_KERNEL); - if (!bond) - return -ENOMEM; - - bond[ve->num_bonds].master = master; - bond[ve->num_bonds].sibling_mask 
= sibling->mask; - - ve->bonds = bond; - ve->num_bonds++; - - return 0; -} - void intel_execlists_show_requests(struct intel_engine_cs *engine, struct drm_printer *m, void (*show_request)(struct drm_printer *m, @@ -3849,16 +3934,17 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, unsigned int max) { const struct intel_engine_execlists *execlists = &engine->execlists; + struct i915_sched_engine *sched_engine = engine->sched_engine; struct i915_request *rq, *last; unsigned long flags; unsigned int count; struct rb_node *rb; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&sched_engine->lock, flags); last = NULL; count = 0; - list_for_each_entry(rq, &engine->active.requests, sched.link) { + list_for_each_entry(rq, &sched_engine->requests, sched.link) { if (count++ < max - 1) show_request(m, rq, "\t\t", 0); else @@ -3873,13 +3959,13 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, show_request(m, last, "\t\t", 0); } - if (execlists->queue_priority_hint != INT_MIN) + if (sched_engine->queue_priority_hint != INT_MIN) drm_printf(m, "\t\tQueue priority hint: %d\n", - READ_ONCE(execlists->queue_priority_hint)); + READ_ONCE(sched_engine->queue_priority_hint)); last = NULL; count = 0; - for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) { + for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) { struct i915_priolist *p = rb_entry(rb, typeof(*p), node); priolist_for_each_request(rq, p) { @@ -3921,7 +4007,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, show_request(m, last, "\t\t", 0); } - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&sched_engine->lock, flags); } #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h index 4ca9b475e252..a1aa92c983a5 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h +++ 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h @@ -32,15 +32,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, int indent), unsigned int max); -struct intel_context * -intel_execlists_create_virtual(struct intel_engine_cs **siblings, - unsigned int count); - -struct intel_context * -intel_execlists_clone_virtual(struct intel_engine_cs *src); - -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, - const struct intel_engine_cs *master, - const struct intel_engine_cs *sibling); +bool +intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine); #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 20e46b843324..de3ac58fceec 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -826,13 +826,13 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) phys_addr = pci_resource_start(pdev, 0) + pci_resource_len(pdev, 0) / 2; /* - * On BXT+/CNL+ writes larger than 64 bit to the GTT pagetable range + * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range * will be dropped. For WC mappings in general we have 64 byte burst * writes when the WC buffer is flushed, so we can't use it, but have to * resort to an uncached mapping. The WC issue is easily caught by the * readback check when writing GTT PTE entries. 
*/ - if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 10) + if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 11) ggtt->gsm = ioremap(phys_addr, size); else ggtt->gsm = ioremap_wc(phys_addr, size); @@ -1494,7 +1494,7 @@ intel_partial_pages(const struct i915_ggtt_view *view, if (ret) goto err_sg_alloc; - iter = i915_gem_object_get_sg_dma(obj, view->partial.offset, &offset, true); + iter = i915_gem_object_get_sg_dma(obj, view->partial.offset, &offset); GEM_BUG_ON(!iter); sg = st->sgl; diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index 2694dbb9967e..1c3af0fc0456 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -123,8 +123,10 @@ #define MI_SEMAPHORE_SAD_NEQ_SDD (5 << 12) #define MI_SEMAPHORE_TOKEN_MASK REG_GENMASK(9, 5) #define MI_SEMAPHORE_TOKEN_SHIFT 5 +#define MI_STORE_DATA_IMM MI_INSTR(0x20, 0) #define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1) #define MI_STORE_DWORD_IMM_GEN4 MI_INSTR(0x20, 2) +#define MI_STORE_QWORD_IMM_GEN8 (MI_INSTR(0x20, 3) | REG_BIT(21)) #define MI_MEM_VIRTUAL (1 << 22) /* 945,g33,965 */ #define MI_USE_GGTT (1 << 22) /* g4x+ */ #define MI_STORE_DWORD_INDEX MI_INSTR(0x21, 1) diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 2161bf01ef8b..62d40c986642 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -13,6 +13,7 @@ #include "intel_gt_clock_utils.h" #include "intel_gt_pm.h" #include "intel_gt_requests.h" +#include "intel_migrate.h" #include "intel_mocs.h" #include "intel_rc6.h" #include "intel_renderstate.h" @@ -40,8 +41,8 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915) intel_gt_init_timelines(gt); intel_gt_pm_init_early(gt); - intel_rps_init_early(&gt->rps); intel_uc_init_early(&gt->uc); + intel_rps_init_early(&gt->rps); } int intel_gt_probe_lmem(struct intel_gt *gt) @@ -83,13 +84,73 @@ void intel_gt_init_hw_early(struct intel_gt
*gt, struct i915_ggtt *ggtt) gt->ggtt = ggtt; } +static const struct intel_mmio_range icl_l3bank_steering_table[] = { + { 0x00B100, 0x00B3FF }, + {}, +}; + +static const struct intel_mmio_range xehpsdv_mslice_steering_table[] = { + { 0x004000, 0x004AFF }, + { 0x00C800, 0x00CFFF }, + { 0x00DD00, 0x00DDFF }, + { 0x00E900, 0x00FFFF }, /* 0xEA00 - OxEFFF is unused */ + {}, +}; + +static const struct intel_mmio_range xehpsdv_lncf_steering_table[] = { + { 0x00B000, 0x00B0FF }, + { 0x00D800, 0x00D8FF }, + {}, +}; + +static const struct intel_mmio_range dg2_lncf_steering_table[] = { + { 0x00B000, 0x00B0FF }, + { 0x00D880, 0x00D8FF }, + {}, +}; + +static u16 slicemask(struct intel_gt *gt, int count) +{ + u64 dss_mask = intel_sseu_get_subslices(&gt->info.sseu, 0); + + return intel_slicemask_from_dssmask(dss_mask, count); +} + int intel_gt_init_mmio(struct intel_gt *gt) { + struct drm_i915_private *i915 = gt->i915; + intel_gt_init_clock_frequency(gt); intel_uc_init_mmio(&gt->uc); intel_sseu_info_init(gt); + /* + * An mslice is unavailable only if both the meml3 for the slice is + * disabled *and* all of the DSS in the slice (quadrant) are disabled.
+ */ + if (HAS_MSLICES(i915)) + gt->info.mslice_mask = + slicemask(gt, GEN_DSS_PER_MSLICE) | + (intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) & + GEN12_MEML3_EN_MASK); + + if (IS_DG2(i915)) { + gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table; + gt->steering_table[LNCF] = dg2_lncf_steering_table; + } else if (IS_XEHPSDV(i915)) { + gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table; + gt->steering_table[LNCF] = xehpsdv_lncf_steering_table; + } else if (GRAPHICS_VER(i915) >= 11 && + GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) { + gt->steering_table[L3BANK] = icl_l3bank_steering_table; + gt->info.l3bank_mask = + ~intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) & + GEN10_L3BANK_MASK; + } else if (HAS_MSLICES(i915)) { + MISSING_CASE(INTEL_INFO(i915)->platform); + } + return intel_engines_init_mmio(gt); } @@ -192,7 +253,7 @@ static void clear_register(struct intel_uncore *uncore, i915_reg_t reg) intel_uncore_rmw(uncore, reg, 0, 0); } -static void gen8_clear_engine_error_register(struct intel_engine_cs *engine) +static void gen6_clear_engine_error_register(struct intel_engine_cs *engine) { GEN6_RING_FAULT_REG_RMW(engine, RING_FAULT_VALID, 0); GEN6_RING_FAULT_REG_POSTING_READ(engine); @@ -238,7 +299,7 @@ intel_gt_clear_error_registers(struct intel_gt *gt, enum intel_engine_id id; for_each_engine_masked(engine, gt, engine_mask, id) - gen8_clear_engine_error_register(engine); + gen6_clear_engine_error_register(engine); } } @@ -572,6 +633,25 @@ static void __intel_gt_disable(struct intel_gt *gt) GEM_BUG_ON(intel_gt_pm_is_awake(gt)); } +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout) +{ + long remaining_timeout; + + /* If the device is asleep, we have no requests outstanding */ + if (!intel_gt_pm_is_awake(gt)) + return 0; + + while ((timeout = intel_gt_retire_requests_timeout(gt, timeout, + &remaining_timeout)) > 0) { + cond_resched(); + if (signal_pending(current)) + return -EINTR; + } + + return timeout ? 
timeout : intel_uc_wait_for_idle(&gt->uc, + remaining_timeout); +} + int intel_gt_init(struct intel_gt *gt) { int err; @@ -622,10 +702,14 @@ int intel_gt_init(struct intel_gt *gt) if (err) goto err_gt; + intel_uc_init_late(&gt->uc); + err = i915_inject_probe_error(gt->i915, -EIO); if (err) goto err_gt; + intel_migrate_init(&gt->migrate, gt); + goto out_fw; err_gt: __intel_gt_disable(gt); @@ -649,6 +733,7 @@ void intel_gt_driver_remove(struct intel_gt *gt) { __intel_gt_disable(gt); + intel_migrate_fini(&gt->migrate); intel_uc_driver_remove(&gt->uc); intel_engines_release(gt); @@ -697,6 +782,112 @@ void intel_gt_driver_late_release(struct intel_gt *gt) intel_engines_free(gt); } +/** + * intel_gt_reg_needs_read_steering - determine whether a register read + * requires explicit steering + * @gt: GT structure + * @reg: the register to check steering requirements for + * @type: type of multicast steering to check + * + * Determines whether @reg needs explicit steering of a specific type for + * reads. + * + * Returns false if @reg does not belong to a register range of the given + * steering type, or if the default (subslice-based) steering IDs are suitable + * for @type steering too. + */ +static bool intel_gt_reg_needs_read_steering(struct intel_gt *gt, + i915_reg_t reg, + enum intel_steering_type type) +{ + const u32 offset = i915_mmio_reg_offset(reg); + const struct intel_mmio_range *entry; + + if (likely(!intel_gt_needs_read_steering(gt, type))) + return false; + + for (entry = gt->steering_table[type]; entry->end; entry++) { + if (offset >= entry->start && offset <= entry->end) + return true; + } + + return false; +} + +/** + * intel_gt_get_valid_steering - determines valid IDs for a class of MCR steering + * @gt: GT structure + * @type: multicast register type + * @sliceid: Slice ID returned + * @subsliceid: Subslice ID returned + * + * Determines sliceid and subsliceid values that will steer reads + * of a specific multicast register class to a valid value.
+ */ +static void intel_gt_get_valid_steering(struct intel_gt *gt, + enum intel_steering_type type, + u8 *sliceid, u8 *subsliceid) +{ + switch (type) { + case L3BANK: + GEM_DEBUG_WARN_ON(!gt->info.l3bank_mask); /* should be impossible! */ + + *sliceid = 0; /* unused */ + *subsliceid = __ffs(gt->info.l3bank_mask); + break; + case MSLICE: + GEM_DEBUG_WARN_ON(!gt->info.mslice_mask); /* should be impossible! */ + + *sliceid = __ffs(gt->info.mslice_mask); + *subsliceid = 0; /* unused */ + break; + case LNCF: + GEM_DEBUG_WARN_ON(!gt->info.mslice_mask); /* should be impossible! */ + + /* + * An LNCF is always present if its mslice is present, so we + * can safely just steer to LNCF 0 in all cases. + */ + *sliceid = __ffs(gt->info.mslice_mask) << 1; + *subsliceid = 0; /* unused */ + break; + default: + MISSING_CASE(type); + *sliceid = 0; + *subsliceid = 0; + } +} + +/** + * intel_gt_read_register_fw - reads a GT register with support for multicast + * @gt: GT structure + * @reg: register to read + * + * This function will read a GT register. If the register is a multicast + * register, the read will be steered to a valid instance (i.e., one that + * isn't fused off or powered down by power gating). + * + * Returns the value from a valid instance of @reg. 
+ */ +u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg) +{ + int type; + u8 sliceid, subsliceid; + + for (type = 0; type < NUM_STEERING_TYPES; type++) { + if (intel_gt_reg_needs_read_steering(gt, reg, type)) { + intel_gt_get_valid_steering(gt, type, &sliceid, + &subsliceid); + return intel_uncore_read_with_mcr_steering_fw(gt->uncore, + reg, + sliceid, + subsliceid); + } + } + + return intel_uncore_read_fw(gt->uncore, reg); +} + void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 7ec395cace69..74e771871a9b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt); void intel_gt_driver_late_release(struct intel_gt *gt); +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout); + void intel_gt_check_and_clear_faults(struct intel_gt *gt); void intel_gt_clear_error_registers(struct intel_gt *gt, intel_engine_mask_t engine_mask); @@ -75,6 +77,14 @@ static inline bool intel_gt_is_wedged(const struct intel_gt *gt) return unlikely(test_bit(I915_WEDGED, &gt->reset.flags)); } +static inline bool intel_gt_needs_read_steering(struct intel_gt *gt, + enum intel_steering_type type) +{ + return gt->steering_table[type]; +} + +u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg); + void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c index 9f0e729d2d15..3513d6f90747 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c @@ -24,8 +24,8 @@ static u32 read_reference_ts_freq(struct intel_uncore *uncore) return base_freq + frac_freq; } -static u32 gen10_get_crystal_clock_freq(struct intel_uncore *uncore, - u32 rpm_config_reg) +static u32
gen9_get_crystal_clock_freq(struct intel_uncore *uncore, + u32 rpm_config_reg) { u32 f19_2_mhz = 19200000; u32 f24_mhz = 24000000; @@ -128,10 +128,10 @@ static u32 read_clock_frequency(struct intel_uncore *uncore) } else { u32 c0 = intel_uncore_read(uncore, RPM_CONFIG0); - if (GRAPHICS_VER(uncore->i915) <= 10) - freq = gen10_get_crystal_clock_freq(uncore, c0); - else + if (GRAPHICS_VER(uncore->i915) >= 11) freq = gen11_get_crystal_clock_freq(uncore, c0); + else + freq = gen9_get_crystal_clock_freq(uncore, c0); /* * Now figure out how the command stream's timestamp diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c index c13462274fe8..b2de83be4d97 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c @@ -184,7 +184,13 @@ void gen11_gt_irq_reset(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_BCS_RSVD_INTR_MASK, ~0); intel_uncore_write(uncore, GEN11_VCS0_VCS1_INTR_MASK, ~0); intel_uncore_write(uncore, GEN11_VCS2_VCS3_INTR_MASK, ~0); + if (HAS_ENGINE(gt, VCS4) || HAS_ENGINE(gt, VCS5)) + intel_uncore_write(uncore, GEN12_VCS4_VCS5_INTR_MASK, ~0); + if (HAS_ENGINE(gt, VCS6) || HAS_ENGINE(gt, VCS7)) + intel_uncore_write(uncore, GEN12_VCS6_VCS7_INTR_MASK, ~0); intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~0); + if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) + intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~0); intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0); intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK, ~0); @@ -218,8 +224,13 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_BCS_RSVD_INTR_MASK, ~smask); intel_uncore_write(uncore, GEN11_VCS0_VCS1_INTR_MASK, ~dmask); intel_uncore_write(uncore, GEN11_VCS2_VCS3_INTR_MASK, ~dmask); + if (HAS_ENGINE(gt, VCS4) || HAS_ENGINE(gt, VCS5)) + intel_uncore_write(uncore, GEN12_VCS4_VCS5_INTR_MASK, ~dmask); + if (HAS_ENGINE(gt, VCS6) || HAS_ENGINE(gt, VCS7)) + 
intel_uncore_write(uncore, GEN12_VCS6_VCS7_INTR_MASK, ~dmask); intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~dmask); - + if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) + intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~dmask); /* * RPS interrupts will get enabled/disabled on demand when RPS itself * is enabled/disabled. diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index aef3084e8b16..dea8e2479897 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -6,7 +6,6 @@ #include <linux/suspend.h> #include "i915_drv.h" -#include "i915_globals.h" #include "i915_params.h" #include "intel_context.h" #include "intel_engine_pm.h" @@ -67,8 +66,6 @@ static int __gt_unpark(struct intel_wakeref *wf) GT_TRACE(gt, "\n"); - i915_globals_unpark(); - /* * It seems that the DMC likes to transition between the DC states a lot * when there are no connected displays (no active power domains) during @@ -116,8 +113,6 @@ static int __gt_park(struct intel_wakeref *wf) GEM_BUG_ON(!wakeref); intel_display_power_put_async(i915, POWER_DOMAIN_GT_IRQ, wakeref); - i915_globals_park(); - return 0; } @@ -174,8 +169,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force) if (intel_gt_is_wedged(gt)) intel_gt_unset_wedged(gt); - intel_uc_sanitize(>->uc); - for_each_engine(engine, gt, id) if (engine->reset.prepare) engine->reset.prepare(engine); @@ -191,6 +184,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force) __intel_engine_reset(engine, false); } + intel_uc_reset(>->uc, false); + for_each_engine(engine, gt, id) if (engine->reset.finish) engine->reset.finish(engine); @@ -243,6 +238,8 @@ int intel_gt_resume(struct intel_gt *gt) goto err_wedged; } + intel_uc_reset_finish(>->uc); + intel_rps_enable(>->rps); intel_llc_enable(>->llc); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 647eca9d867a..edb881d75630 100644 --- 
a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine) GEM_BUG_ON(engine->retire); } -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout) +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout, + long *remaining_timeout) { struct intel_gt_timelines *timelines = >->timelines; struct intel_timeline *tl, *tn; @@ -195,22 +196,10 @@ out_active: spin_lock(&timelines->lock); if (flush_submission(gt, timeout)) /* Wait, there's more! */ active_count++; - return active_count ? timeout : 0; -} - -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout) -{ - /* If the device is asleep, we have no requests outstanding */ - if (!intel_gt_pm_is_awake(gt)) - return 0; - - while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) { - cond_resched(); - if (signal_pending(current)) - return -EINTR; - } + if (remaining_timeout) + *remaining_timeout = timeout; - return timeout; + return active_count ? 
timeout : 0; } static void retire_work_handler(struct work_struct *work) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h index fcc30a6e4fe9..51dbe0e3294e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h @@ -6,14 +6,17 @@ #ifndef INTEL_GT_REQUESTS_H #define INTEL_GT_REQUESTS_H +#include <stddef.h> + struct intel_engine_cs; struct intel_gt; struct intel_timeline; -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout); +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout, + long *remaining_timeout); static inline void intel_gt_retire_requests(struct intel_gt *gt) { - intel_gt_retire_requests_timeout(gt, 0); + intel_gt_retire_requests_timeout(gt, 0, NULL); } void intel_engine_init_retire(struct intel_engine_cs *engine); @@ -21,8 +24,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine, struct intel_timeline *tl); void intel_engine_fini_retire(struct intel_engine_cs *engine); -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout); - void intel_gt_init_requests(struct intel_gt *gt); void intel_gt_park_requests(struct intel_gt *gt); void intel_gt_unpark_requests(struct intel_gt *gt); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index fecfacf551d5..a81e21bf1bd1 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -24,6 +24,7 @@ #include "intel_reset_types.h" #include "intel_rc6_types.h" #include "intel_rps_types.h" +#include "intel_migrate_types.h" #include "intel_wakeref.h" struct drm_i915_private; @@ -31,6 +32,33 @@ struct i915_ggtt; struct intel_engine_cs; struct intel_uncore; +struct intel_mmio_range { + u32 start; + u32 end; +}; + +/* + * The hardware has multiple kinds of multicast register ranges that need + * special register steering (and future platforms are expected to add + * additional 
types). + * + * During driver startup, we initialize the steering control register to + * direct reads to a slice/subslice that are valid for the 'subslice' class + * of multicast registers. If another type of steering does not have any + * overlap in valid steering targets with 'subslice' style registers, we will + * need to explicitly re-steer reads of registers of the other type. + * + * Only the replication types that may need additional non-default steering + * are listed here. + */ +enum intel_steering_type { + L3BANK, + MSLICE, + LNCF, + + NUM_STEERING_TYPES +}; + enum intel_submission_method { INTEL_SUBMISSION_RING, INTEL_SUBMISSION_ELSP, @@ -145,8 +173,15 @@ struct intel_gt { struct i915_vma *scratch; + struct intel_migrate migrate; + + const struct intel_mmio_range *steering_table[NUM_STEERING_TYPES]; + struct intel_gt_info { intel_engine_mask_t engine_mask; + + u32 l3bank_mask; + u8 num_engines; /* Media engine access to SFC per instance */ @@ -154,6 +189,8 @@ struct intel_gt { /* Slice/subslice/EU info */ struct sseu_dev_info sseu; + + unsigned long mslice_mask; } info; }; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 084ea65d59c0..e137dd32b5b8 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -16,7 +16,19 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz) { struct drm_i915_gem_object *obj; - obj = i915_gem_object_create_lmem(vm->i915, sz, 0); + /* + * To avoid severe over-allocation when dealing with min_page_size + * restrictions, we override that behaviour here by allowing an object + * size and page layout which can be smaller. In practice this should be + * totally fine, since GTT paging structures are not typically inserted + * into the GTT. 
+	 *
+	 * Note that we also hit this path for the scratch page, and for this
+	 * case it might need to be 64K, but that should work fine here since we
+	 * used the passed in size for the page size, which should ensure it
+	 * also has the same alignment.
+	 */
+	obj = __i915_gem_object_create_lmem_with_ps(vm->i915, sz, sz, 0);
 	/*
 	 * Ensure all paging structures for this vm share the same dma-resv
 	 * object underneath, with the idea that one object_lock() will lock
@@ -414,7 +426,7 @@ static void tgl_setup_private_ppat(struct intel_uncore *uncore)
 	intel_uncore_write(uncore, GEN12_PAT_INDEX(7), GEN8_PPAT_WB);
 }
 
-static void cnl_setup_private_ppat(struct intel_uncore *uncore)
+static void icl_setup_private_ppat(struct intel_uncore *uncore)
 {
 	intel_uncore_write(uncore,
 			   GEN10_PAT_INDEX(0),
@@ -514,8 +526,8 @@ void setup_private_pat(struct intel_uncore *uncore)
 
 	if (GRAPHICS_VER(i915) >= 12)
 		tgl_setup_private_ppat(uncore);
-	else if (GRAPHICS_VER(i915) >= 10)
-		cnl_setup_private_ppat(uncore);
+	else if (GRAPHICS_VER(i915) >= 11)
+		icl_setup_private_ppat(uncore);
 	else if (IS_CHERRYVIEW(i915) || IS_GEN9_LP(i915))
 		chv_setup_private_ppat(uncore);
 	else
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index edea95b97c36..bc7153018ebd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -140,7 +140,6 @@ typedef u64 gen8_pte_t;
 
 enum i915_cache_level;
 
-struct drm_i915_file_private;
 struct drm_i915_gem_object;
 struct i915_fence_reg;
 struct i915_vma;
@@ -220,16 +219,6 @@ struct i915_address_space {
 	struct intel_gt *gt;
 	struct drm_i915_private *i915;
 	struct device *dma;
-	/*
-	 * Every address space belongs to a struct file - except for the global
-	 * GTT that is owned by the driver (and so @file is set to NULL). In
-	 * principle, no information should leak from one context to another
-	 * (or between files/processes etc) unless explicitly shared by the
-	 * owner. Tracking the owner is important in order to free up per-file
-	 * objects along with the file, to aide resource tracking, and to
-	 * assign blame.
-	 */
-	struct drm_i915_file_private *file;
 	u64 total; /* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved; /* size addr space reserved */
 
@@ -296,6 +285,13 @@ struct i915_address_space {
 			  u32 flags);
 	void (*cleanup)(struct i915_address_space *vm);
 
+	void (*foreach)(struct i915_address_space *vm,
+			u64 start, u64 length,
+			void (*fn)(struct i915_address_space *vm,
+				   struct i915_page_table *pt,
+				   void *data),
+			void *data);
+
 	struct i915_vma_ops vma_ops;
 
 	I915_SELFTEST_DECLARE(struct fault_attr fault_attr);
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a27bac0a4bfb..bb4af4977920 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -70,7 +70,7 @@ static void set_offsets(u32 *regs,
 	if (close) {
 		/* Close the batch; used mainly by live_lrc_layout() */
 		*regs = MI_BATCH_BUFFER_END;
-		if (GRAPHICS_VER(engine->i915) >= 10)
+		if (GRAPHICS_VER(engine->i915) >= 11)
 			*regs |= BIT(0);
 	}
 }
@@ -484,6 +484,47 @@ static const u8 gen12_rcs_offsets[] = {
 	END
 };
 
+static const u8 xehp_rcs_offsets[] = {
+	NOP(1),
+	LRI(13, POSTED),
+	REG16(0x244),
+	REG(0x034),
+	REG(0x030),
+	REG(0x038),
+	REG(0x03c),
+	REG(0x168),
+	REG(0x140),
+	REG(0x110),
+	REG(0x1c0),
+	REG(0x1c4),
+	REG(0x1c8),
+	REG(0x180),
+	REG16(0x2b4),
+
+	NOP(5),
+	LRI(9, POSTED),
+	REG16(0x3a8),
+	REG16(0x28c),
+	REG16(0x288),
+	REG16(0x284),
+	REG16(0x280),
+	REG16(0x27c),
+	REG16(0x278),
+	REG16(0x274),
+	REG16(0x270),
+
+	LRI(3, POSTED),
+	REG(0x1b0),
+	REG16(0x5a8),
+	REG16(0x5ac),
+
+	NOP(6),
+	LRI(1, 0),
+	REG(0x0c8),
+
+	END
+};
+
 #undef END
 #undef REG16
 #undef REG
@@ -502,7 +543,9 @@ static const u8 *reg_offsets(const struct intel_engine_cs *engine)
 		   !intel_engine_has_relative_mmio(engine));
 
 	if (engine->class == RENDER_CLASS) {
-		if (GRAPHICS_VER(engine->i915) >= 12)
+		if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
+			return xehp_rcs_offsets;
+		else if (GRAPHICS_VER(engine->i915) >= 12)
 			return gen12_rcs_offsets;
 		else if (GRAPHICS_VER(engine->i915) >= 11)
 			return gen11_rcs_offsets;
@@ -522,7 +565,9 @@ static const u8 *reg_offsets(const struct intel_engine_cs *engine)
 
 static int lrc_ring_mi_mode(const struct intel_engine_cs *engine)
 {
-	if (GRAPHICS_VER(engine->i915) >= 12)
+	if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
+		return 0x70;
+	else if (GRAPHICS_VER(engine->i915) >= 12)
 		return 0x60;
 	else if (GRAPHICS_VER(engine->i915) >= 9)
 		return 0x54;
@@ -534,7 +579,9 @@ static int lrc_ring_mi_mode(const struct intel_engine_cs *engine)
 
 static int lrc_ring_gpr0(const struct intel_engine_cs *engine)
 {
-	if (GRAPHICS_VER(engine->i915) >= 12)
+	if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
+		return 0x84;
+	else if (GRAPHICS_VER(engine->i915) >= 12)
 		return 0x74;
 	else if (GRAPHICS_VER(engine->i915) >= 9)
 		return 0x68;
@@ -578,10 +625,16 @@ static int lrc_ring_indirect_offset(const struct intel_engine_cs *engine)
 
 static int lrc_ring_cmd_buf_cctl(const struct intel_engine_cs *engine)
 {
-	if (engine->class != RENDER_CLASS)
-		return -1;
 
-	if (GRAPHICS_VER(engine->i915) >= 12)
+	if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
+		/*
+		 * Note that the CSFE context has a dummy slot for CMD_BUF_CCTL
+		 * simply to match the RCS context image layout.
+		 */
+		return 0xc6;
+	else if (engine->class != RENDER_CLASS)
+		return -1;
+	else if (GRAPHICS_VER(engine->i915) >= 12)
 		return 0xb6;
 	else if (GRAPHICS_VER(engine->i915) >= 11)
 		return 0xaa;
@@ -600,8 +653,6 @@ lrc_ring_indirect_offset_default(const struct intel_engine_cs *engine)
 		return GEN12_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
 	case 11:
 		return GEN11_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
-	case 10:
-		return GEN10_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
 	case 9:
 		return GEN9_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT;
 	case 8:
@@ -845,7 +896,7 @@ int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine)
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
-	ring = intel_engine_create_ring(engine, (unsigned long)ce->ring);
+	ring = intel_engine_create_ring(engine, ce->ring_size);
 	if (IS_ERR(ring)) {
 		err = PTR_ERR(ring);
 		goto err_vma;
@@ -1101,6 +1152,14 @@ setup_indirect_ctx_bb(const struct intel_context *ce,
  * bits 55-60: SW counter
  * bits 61-63: engine class
  *
+ * On Xe_HP, the upper dword of the descriptor has a new format:
+ *
+ * bits 32-37: virtual function number
+ * bit 38: mbz, reserved for use by hardware
+ * bits 39-54: SW context ID
+ * bits 55-57: reserved
+ * bits 58-63: SW counter
+ *
  * engine info, SW context ID and SW counter need to form a unique number
  * (Context ID) per lrc.
  */
@@ -1387,40 +1446,6 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	return batch;
 }
 
-static u32 *
-gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
-{
-	int i;
-
-	/*
-	 * WaPipeControlBefore3DStateSamplePattern: cnl
-	 *
-	 * Ensure the engine is idle prior to programming a
-	 * 3DSTATE_SAMPLE_PATTERN during a context restore.
-	 */
-	batch = gen8_emit_pipe_control(batch,
-				       PIPE_CONTROL_CS_STALL,
-				       0);
-	/*
-	 * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for
-	 * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in
-	 * total. However, a PIPE_CONTROL is 6 dwords long, not 4, which is
-	 * confusing. Since gen8_emit_pipe_control() already advances the
-	 * batch by 6 dwords, we advance the other 10 here, completing a
-	 * cacheline. It's not clear if the workaround requires this padding
-	 * before other commands, or if it's just the regular padding we would
-	 * already have for the workaround bb, so leave it here for now.
-	 */
-	for (i = 0; i < 10; i++)
-		*batch++ = MI_NOOP;
-
-	/* Pad to end of cacheline */
-	while ((unsigned long)batch % CACHELINE_BYTES)
-		*batch++ = MI_NOOP;
-
-	return batch;
-}
-
 #define CTX_WA_BB_SIZE (PAGE_SIZE)
 
 static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
@@ -1473,10 +1498,6 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
 	case 12:
 	case 11:
 		return;
-	case 10:
-		wa_bb_fn[0] = gen10_init_indirectctx_bb;
-		wa_bb_fn[1] = NULL;
-		break;
 	case 9:
 		wa_bb_fn[0] = gen9_init_indirectctx_bb;
 		wa_bb_fn[1] = NULL;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..f785d0ed238f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,9 +87,10 @@
 #define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
 
 #define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */
-#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */
 #define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */
 /* in Gen12 ID 0x7FF is reserved to indicate idle */
 #define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1)
+/* in Xe_HP ID 0xFFFF is reserved to indicate "invalid context" */
+#define XEHP_MAX_CONTEXT_HW_ID 0xFFFF
 
 #endif /* _INTEL_LRC_REG_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
new file mode 100644
index 000000000000..1dac21aa7e5c
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -0,0 +1,688 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_context.h"
+#include "intel_gpu_commands.h"
+#include "intel_gt.h"
+#include "intel_gtt.h"
+#include "intel_migrate.h"
+#include "intel_ring.h"
+
+struct insert_pte_data {
+	u64 offset;
+	bool is_lmem;
+};
+
+#define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
+
+static bool engine_supports_migration(struct intel_engine_cs *engine)
+{
+	if (!engine)
+		return false;
+
+	/*
+	 * We need the ability to prevent aribtration (MI_ARB_ON_OFF),
+	 * the ability to write PTE using inline data (MI_STORE_DATA)
+	 * and of course the ability to do the block transfer (blits).
+	 */
+	GEM_BUG_ON(engine->class != COPY_ENGINE_CLASS);
+
+	return true;
+}
+
+static void insert_pte(struct i915_address_space *vm,
+		       struct i915_page_table *pt,
+		       void *data)
+{
+	struct insert_pte_data *d = data;
+
+	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
+			d->is_lmem ? PTE_LM : 0);
+	d->offset += PAGE_SIZE;
+}
+
+static struct i915_address_space *migrate_vm(struct intel_gt *gt)
+{
+	struct i915_vm_pt_stash stash = {};
+	struct i915_ppgtt *vm;
+	int err;
+	int i;
+
+	/*
+	 * We construct a very special VM for use by all migration contexts,
+	 * it is kept pinned so that it can be used at any time. As we need
+	 * to pre-allocate the page directories for the migration VM, this
+	 * limits us to only using a small number of prepared vma.
+	 *
+	 * To be able to pipeline and reschedule migration operations while
+	 * avoiding unnecessary contention on the vm itself, the PTE updates
+	 * are inline with the blits. All the blits use the same fixed
+	 * addresses, with the backing store redirection being updated on the
+	 * fly. Only 2 implicit vma are used for all migration operations.
+	 *
+	 * We lay the ppGTT out as:
+	 *
+	 * [0, CHUNK_SZ) -> first object
+	 * [CHUNK_SZ, 2 * CHUNK_SZ) -> second object
+	 * [2 * CHUNK_SZ, 2 * CHUNK_SZ + 2 * CHUNK_SZ >> 9] -> PTE
+	 *
+	 * By exposing the dma addresses of the page directories themselves
+	 * within the ppGTT, we are then able to rewrite the PTE prior to use.
+	 * But the PTE update and subsequent migration operation must be atomic,
+	 * i.e. within the same non-preemptible window so that we do not switch
+	 * to another migration context that overwrites the PTE.
+	 *
+	 * TODO: Add support for huge LMEM PTEs
+	 */
+
+	vm = i915_ppgtt_create(gt);
+	if (IS_ERR(vm))
+		return ERR_CAST(vm);
+
+	if (!vm->vm.allocate_va_range || !vm->vm.foreach) {
+		err = -ENODEV;
+		goto err_vm;
+	}
+
+	/*
+	 * Each engine instance is assigned its own chunk in the VM, so
+	 * that we can run multiple instances concurrently
+	 */
+	for (i = 0; i < ARRAY_SIZE(gt->engine_class[COPY_ENGINE_CLASS]); i++) {
+		struct intel_engine_cs *engine;
+		u64 base = (u64)i << 32;
+		struct insert_pte_data d = {};
+		struct i915_gem_ww_ctx ww;
+		u64 sz;
+
+		engine = gt->engine_class[COPY_ENGINE_CLASS][i];
+		if (!engine_supports_migration(engine))
+			continue;
+
+		/*
+		 * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need
+		 * 4x2 page directories for source/destination.
+		 */
+		sz = 2 * CHUNK_SZ;
+		d.offset = base + sz;
+
+		/*
+		 * We need another page directory setup so that we can write
+		 * the 8x512 PTE in each chunk.
+		 */
+		sz += (sz >> 12) * sizeof(u64);
+
+		err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz);
+		if (err)
+			goto err_vm;
+
+		for_i915_gem_ww(&ww, err, true) {
+			err = i915_vm_lock_objects(&vm->vm, &ww);
+			if (err)
+				continue;
+			err = i915_vm_map_pt_stash(&vm->vm, &stash);
+			if (err)
+				continue;
+
+			vm->vm.allocate_va_range(&vm->vm, &stash, base, sz);
+		}
+		i915_vm_free_pt_stash(&vm->vm, &stash);
+		if (err)
+			goto err_vm;
+
+		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
+		d.is_lmem = i915_gem_object_is_lmem(vm->vm.scratch[0]);
+		vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d);
+	}
+
+	return &vm->vm;
+
+err_vm:
+	i915_vm_put(&vm->vm);
+	return ERR_PTR(err);
+}
+
+static struct intel_engine_cs *first_copy_engine(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(gt->engine_class[COPY_ENGINE_CLASS]); i++) {
+		engine = gt->engine_class[COPY_ENGINE_CLASS][i];
+		if (engine_supports_migration(engine))
+			return engine;
+	}
+
+	return NULL;
+}
+
+static struct intel_context *pinned_context(struct intel_gt *gt)
+{
+	static struct lock_class_key key;
+	struct intel_engine_cs *engine;
+	struct i915_address_space *vm;
+	struct intel_context *ce;
+
+	engine = first_copy_engine(gt);
+	if (!engine)
+		return ERR_PTR(-ENODEV);
+
+	vm = migrate_vm(gt);
+	if (IS_ERR(vm))
+		return ERR_CAST(vm);
+
+	ce = intel_engine_create_pinned_context(engine, vm, SZ_512K,
+						I915_GEM_HWS_MIGRATE,
+						&key, "migrate");
+	i915_vm_put(vm);
+	return ce;
+}
+
+int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt)
+{
+	struct intel_context *ce;
+
+	memset(m, 0, sizeof(*m));
+
+	ce = pinned_context(gt);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	m->context = ce;
+	return 0;
+}
+
+static int random_index(unsigned int max)
+{
+	return upper_32_bits(mul_u32_u32(get_random_u32(), max));
+}
+
+static struct intel_context *__migrate_engines(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engines[MAX_ENGINE_INSTANCE];
+	struct intel_engine_cs *engine;
+	unsigned int count, i;
+
+	count = 0;
+	for (i = 0; i < ARRAY_SIZE(gt->engine_class[COPY_ENGINE_CLASS]); i++) {
+		engine = gt->engine_class[COPY_ENGINE_CLASS][i];
+		if (engine_supports_migration(engine))
+			engines[count++] = engine;
+	}
+
+	return intel_context_create(engines[random_index(count)]);
+}
+
+struct intel_context *intel_migrate_create_context(struct intel_migrate *m)
+{
+	struct intel_context *ce;
+
+	/*
+	 * We randomly distribute contexts across the engines upon constrction,
+	 * as they all share the same pinned vm, and so in order to allow
+	 * multiple blits to run in parallel, we must construct each blit
+	 * to use a different range of the vm for its GTT. This has to be
+	 * known at construction, so we can not use the late greedy load
+	 * balancing of the virtual-engine.
+	 */
+	ce = __migrate_engines(m->context->engine->gt);
+	if (IS_ERR(ce))
+		return ce;
+
+	ce->ring = NULL;
+	ce->ring_size = SZ_256K;
+
+	i915_vm_put(ce->vm);
+	ce->vm = i915_vm_get(m->context->vm);
+
+	return ce;
+}
+
+static inline struct sgt_dma sg_sgt(struct scatterlist *sg)
+{
+	dma_addr_t addr = sg_dma_address(sg);
+
+	return (struct sgt_dma){ sg, addr, addr + sg_dma_len(sg) };
+}
+
+static int emit_no_arbitration(struct i915_request *rq)
+{
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/* Explicitly disable preemption for this request. */
+	*cs++ = MI_ARB_ON_OFF;
+	*cs++ = MI_NOOP;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int emit_pte(struct i915_request *rq,
+		    struct sgt_dma *it,
+		    enum i915_cache_level cache_level,
+		    bool is_lmem,
+		    u64 offset,
+		    int length)
+{
+	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
+						       is_lmem ? PTE_LM : 0);
+	struct intel_ring *ring = rq->ring;
+	int total = 0;
+	u32 *hdr, *cs;
+	int pkt;
+
+	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
+
+	/* Compute the page directory offset for the target address range */
+	offset += (u64)rq->engine->instance << 32;
+	offset >>= 12;
+	offset *= sizeof(u64);
+	offset += 2 * CHUNK_SZ;
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/* Pack as many PTE updates as possible into a single MI command */
+	pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+	pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
+
+	hdr = cs;
+	*cs++ = MI_STORE_DATA_IMM | REG_BIT(21); /* as qword elements */
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+
+	do {
+		if (cs - hdr >= pkt) {
+			*hdr += cs - hdr - 2;
+			*cs++ = MI_NOOP;
+
+			ring->emit = (void *)cs - ring->vaddr;
+			intel_ring_advance(rq, cs);
+			intel_ring_update_space(ring);
+
+			cs = intel_ring_begin(rq, 6);
+			if (IS_ERR(cs))
+				return PTR_ERR(cs);
+
+			pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+			pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
+
+			hdr = cs;
+			*cs++ = MI_STORE_DATA_IMM | REG_BIT(21);
+			*cs++ = lower_32_bits(offset);
+			*cs++ = upper_32_bits(offset);
+		}
+
+		*cs++ = lower_32_bits(encode | it->dma);
+		*cs++ = upper_32_bits(encode | it->dma);
+
+		offset += 8;
+		total += I915_GTT_PAGE_SIZE;
+
+		it->dma += I915_GTT_PAGE_SIZE;
+		if (it->dma >= it->max) {
+			it->sg = __sg_next(it->sg);
+			if (!it->sg || sg_dma_len(it->sg) == 0)
+				break;
+
+			it->dma = sg_dma_address(it->sg);
+			it->max = it->dma + sg_dma_len(it->sg);
+		}
+	} while (total < length);
+
+	*hdr += cs - hdr - 2;
+	*cs++ = MI_NOOP;
+
+	ring->emit = (void *)cs - ring->vaddr;
+	intel_ring_advance(rq, cs);
+	intel_ring_update_space(ring);
+
+	return total;
+}
+
+static bool wa_1209644611_applies(int ver, u32 size)
+{
+	u32 height = size >> PAGE_SHIFT;
+
+	if (ver != 11)
+		return false;
+
+	return height % 4 == 3 && height <= 8;
+}
+
+static int emit_copy(struct i915_request *rq, int size)
+{
+	const int ver = GRAPHICS_VER(rq->engine->i915);
+	u32 instance = rq->engine->instance;
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	if (ver >= 9 && !wa_1209644611_applies(ver, size)) {
+		*cs++ = GEN9_XY_FAST_COPY_BLT_CMD | (10 - 2);
+		*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
+		*cs++ = 0;
+		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
+		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = instance;
+		*cs++ = 0;
+		*cs++ = PAGE_SIZE;
+		*cs++ = 0; /* src offset */
+		*cs++ = instance;
+	} else if (ver >= 8) {
+		*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
+		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
+		*cs++ = 0;
+		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
+		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = instance;
+		*cs++ = 0;
+		*cs++ = PAGE_SIZE;
+		*cs++ = 0; /* src offset */
+		*cs++ = instance;
+	} else {
+		GEM_BUG_ON(instance);
+		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
+		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
+		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
+		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = PAGE_SIZE;
+		*cs++ = 0; /* src offset */
+	}
+
+	intel_ring_advance(rq, cs);
+	return 0;
+}
+
+int
+intel_context_migrate_copy(struct intel_context *ce,
+			   struct dma_fence *await,
+			   struct scatterlist *src,
+			   enum i915_cache_level src_cache_level,
+			   bool src_is_lmem,
+			   struct scatterlist *dst,
+			   enum i915_cache_level dst_cache_level,
+			   bool dst_is_lmem,
+			   struct i915_request **out)
+{
+	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
+	struct i915_request *rq;
+	int err;
+
+	GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
+	*out = NULL;
+
+	GEM_BUG_ON(ce->ring->size < SZ_64K);
+
+	do {
+		int len;
+
+		rq = i915_request_create(ce);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out_ce;
+		}
+
+		if (await) {
+			err = i915_request_await_dma_fence(rq, await);
+			if (err)
+				goto out_rq;
+
+			if (rq->engine->emit_init_breadcrumb) {
+				err = rq->engine->emit_init_breadcrumb(rq);
+				if (err)
+					goto out_rq;
+			}
+
+			await = NULL;
+		}
+
+		/* The PTE updates + copy must not be interrupted. */
+		err = emit_no_arbitration(rq);
+		if (err)
+			goto out_rq;
+
+		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
+			       CHUNK_SZ);
+		if (len <= 0) {
+			err = len;
+			goto out_rq;
+		}
+
+		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
+			       CHUNK_SZ, len);
+		if (err < 0)
+			goto out_rq;
+		if (err < len) {
+			err = -EINVAL;
+			goto out_rq;
+		}
+
+		err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+		if (err)
+			goto out_rq;
+
+		err = emit_copy(rq, len);
+
+		/* Arbitration is re-enabled between requests. */
+out_rq:
+		if (*out)
+			i915_request_put(*out);
+		*out = i915_request_get(rq);
+		i915_request_add(rq);
+		if (err || !it_src.sg || !sg_dma_len(it_src.sg))
+			break;
+
+		cond_resched();
+	} while (1);
+
+out_ce:
+	return err;
+}
+
+static int emit_clear(struct i915_request *rq, int size, u32 value)
+{
+	const int ver = GRAPHICS_VER(rq->engine->i915);
+	u32 instance = rq->engine->instance;
+	u32 *cs;
+
+	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
+
+	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	if (ver >= 8) {
+		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2);
+		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
+		*cs++ = 0;
+		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
+		*cs++ = 0; /* offset */
+		*cs++ = instance;
+		*cs++ = value;
+		*cs++ = MI_NOOP;
+	} else {
+		GEM_BUG_ON(instance);
+		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
+		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
+		*cs++ = 0;
+		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
+		*cs++ = 0;
+		*cs++ = value;
+	}
+
+	intel_ring_advance(rq, cs);
+	return 0;
+}
+
+int
+intel_context_migrate_clear(struct intel_context *ce,
+			    struct dma_fence *await,
+			    struct scatterlist *sg,
+			    enum i915_cache_level cache_level,
+			    bool is_lmem,
+			    u32 value,
+			    struct i915_request **out)
+{
+	struct sgt_dma it = sg_sgt(sg);
+	struct i915_request *rq;
+	int err;
+
+	GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
+	*out = NULL;
+
+	GEM_BUG_ON(ce->ring->size < SZ_64K);
+
+	do {
+		int len;
+
+		rq = i915_request_create(ce);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out_ce;
+		}
+
+		if (await) {
+			err = i915_request_await_dma_fence(rq, await);
+			if (err)
+				goto out_rq;
+
+			if (rq->engine->emit_init_breadcrumb) {
+				err = rq->engine->emit_init_breadcrumb(rq);
+				if (err)
+					goto out_rq;
+			}
+
+			await = NULL;
+		}
+
+		/* The PTE updates + clear must not be interrupted. */
+		err = emit_no_arbitration(rq);
+		if (err)
+			goto out_rq;
+
+		len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
+		if (len <= 0) {
+			err = len;
+			goto out_rq;
+		}
+
+		err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+		if (err)
+			goto out_rq;
+
+		err = emit_clear(rq, len, value);
+
+		/* Arbitration is re-enabled between requests. */
+out_rq:
+		if (*out)
+			i915_request_put(*out);
+		*out = i915_request_get(rq);
+		i915_request_add(rq);
+		if (err || !it.sg || !sg_dma_len(it.sg))
+			break;
+
+		cond_resched();
+	} while (1);
+
+out_ce:
+	return err;
+}
+
+int intel_migrate_copy(struct intel_migrate *m,
+		       struct i915_gem_ww_ctx *ww,
+		       struct dma_fence *await,
+		       struct scatterlist *src,
+		       enum i915_cache_level src_cache_level,
+		       bool src_is_lmem,
+		       struct scatterlist *dst,
+		       enum i915_cache_level dst_cache_level,
+		       bool dst_is_lmem,
+		       struct i915_request **out)
+{
+	struct intel_context *ce;
+	int err;
+
+	*out = NULL;
+	if (!m->context)
+		return -ENODEV;
+
+	ce = intel_migrate_create_context(m);
+	if (IS_ERR(ce))
+		ce = intel_context_get(m->context);
+	GEM_BUG_ON(IS_ERR(ce));
+
+	err = intel_context_pin_ww(ce, ww);
+	if (err)
+		goto out;
+
+	err = intel_context_migrate_copy(ce, await,
+					 src, src_cache_level, src_is_lmem,
+					 dst, dst_cache_level, dst_is_lmem,
+					 out);
+
+	intel_context_unpin(ce);
+out:
+	intel_context_put(ce);
+	return err;
+}
+
+int
+intel_migrate_clear(struct intel_migrate *m,
+		    struct i915_gem_ww_ctx *ww,
+		    struct dma_fence *await,
+		    struct scatterlist *sg,
+		    enum i915_cache_level cache_level,
+		    bool is_lmem,
+		    u32 value,
+		    struct i915_request **out)
+{
+	struct intel_context *ce;
+	int err;
+
+	*out = NULL;
+	if (!m->context)
+		return -ENODEV;
+
+	ce = intel_migrate_create_context(m);
+	if (IS_ERR(ce))
+		ce = intel_context_get(m->context);
+	GEM_BUG_ON(IS_ERR(ce));
+
+	err = intel_context_pin_ww(ce, ww);
+	if (err)
+		goto out;
+
+	err = intel_context_migrate_clear(ce, await, sg, cache_level,
+					  is_lmem, value, out);
+
+	intel_context_unpin(ce);
+out:
+	intel_context_put(ce);
+	return err;
+}
+
+void intel_migrate_fini(struct intel_migrate *m)
+{
+	struct intel_context *ce;
+
+	ce = fetch_and_zero(&m->context);
+	if (!ce)
+		return;
+
+	intel_engine_destroy_pinned_context(ce);
+}
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftest_migrate.c"
+#endif
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
new file mode 100644
index 000000000000..4e18e755a00b
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __INTEL_MIGRATE__
+#define __INTEL_MIGRATE__
+
+#include <linux/types.h>
+
+#include "intel_migrate_types.h"
+
+struct dma_fence;
+struct i915_request;
+struct i915_gem_ww_ctx;
+struct intel_gt;
+struct scatterlist;
+enum i915_cache_level;
+
+int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt);
+
+struct intel_context *intel_migrate_create_context(struct intel_migrate *m);
+
+int intel_migrate_copy(struct intel_migrate *m,
+		       struct i915_gem_ww_ctx *ww,
+		       struct dma_fence *await,
+		       struct scatterlist *src,
+		       enum i915_cache_level src_cache_level,
+		       bool src_is_lmem,
+		       struct scatterlist *dst,
+		       enum i915_cache_level dst_cache_level,
+		       bool dst_is_lmem,
+		       struct i915_request **out);
+
+int intel_context_migrate_copy(struct intel_context *ce,
+			       struct dma_fence *await,
+			       struct scatterlist *src,
+			       enum i915_cache_level src_cache_level,
+			       bool src_is_lmem,
+			       struct scatterlist *dst,
+			       enum i915_cache_level dst_cache_level,
+			       bool dst_is_lmem,
+			       struct i915_request **out);
+
+int
+intel_migrate_clear(struct intel_migrate *m,
+		    struct i915_gem_ww_ctx *ww,
+		    struct dma_fence *await,
+		    struct scatterlist *sg,
+		    enum i915_cache_level cache_level,
+		    bool is_lmem,
+		    u32 value,
+		    struct i915_request **out);
+int
+intel_context_migrate_clear(struct intel_context *ce,
+			    struct dma_fence *await,
+			    struct scatterlist *sg,
+			    enum i915_cache_level cache_level,
+			    bool is_lmem,
+			    u32 value,
+			    struct i915_request **out);
+
+void intel_migrate_fini(struct intel_migrate *m);
+
+#endif /* __INTEL_MIGRATE__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate_types.h b/drivers/gpu/drm/i915/gt/intel_migrate_types.h
new file mode 100644
index 000000000000..d98230597f42
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_migrate_types.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __INTEL_MIGRATE_TYPES__
+#define __INTEL_MIGRATE_TYPES__
+
+struct intel_context;
+
+struct intel_migrate {
+	struct intel_context *context;
+};
+
+#endif /* __INTEL_MIGRATE_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index 17848807f111..582c4423b95d 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -352,7 +352,7 @@ static unsigned int get_mocs_settings(const struct drm_i915_private *i915,
 		table->size = ARRAY_SIZE(icl_mocs_table);
 		table->table = icl_mocs_table;
 		table->n_entries = GEN9_NUM_MOCS_ENTRIES;
-	} else if (IS_GEN9_BC(i915) || IS_CANNONLAKE(i915)) {
+	} else if (IS_GEN9_BC(i915)) {
 		table->size = ARRAY_SIZE(skl_mocs_table);
 		table->n_entries = GEN9_NUM_MOCS_ENTRIES;
 		table->table = skl_mocs_table;
diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
index 259d7eb4e165..799d382eea79 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6.c
+++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
@@ -62,20 +62,25 @@ static void gen11_rc6_enable(struct intel_rc6 *rc6)
 	u32 pg_enable;
 	int i;
 
-	/* 2b: Program RC6 thresholds.*/
-	set(uncore, GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
-	set(uncore, GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
+	/*
+	 * With GuCRC, these parameters are set by GuC
+	 */
+	if (!intel_uc_uses_guc_rc(&gt->uc)) {
+		/* 2b: Program RC6 thresholds.*/
+		set(uncore, GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
+		set(uncore, GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
 
-	set(uncore, GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
-	set(uncore, GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
-	for_each_engine(engine, rc6_to_gt(rc6), id)
-		set(uncore, RING_MAX_IDLE(engine->mmio_base), 10);
+		set(uncore, GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
+		set(uncore, GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
+		for_each_engine(engine, rc6_to_gt(rc6), id)
+			set(uncore, RING_MAX_IDLE(engine->mmio_base), 10);
 
-	set(uncore, GUC_MAX_IDLE_COUNT, 0xA);
+		set(uncore, GUC_MAX_IDLE_COUNT, 0xA);
 
-	set(uncore, GEN6_RC_SLEEP, 0);
+		set(uncore, GEN6_RC_SLEEP, 0);
 
-	set(uncore, GEN6_RC6_THRESHOLD, 50000); /* 50/125ms per EI */
+		set(uncore, GEN6_RC6_THRESHOLD, 50000); /* 50/125ms per EI */
+	}
 
 	/*
 	 * 2c: Program Coarse Power Gating Policies.
@@ -98,11 +103,19 @@ static void gen11_rc6_enable(struct intel_rc6 *rc6)
 	set(uncore, GEN9_MEDIA_PG_IDLE_HYSTERESIS, 60);
 	set(uncore, GEN9_RENDER_PG_IDLE_HYSTERESIS, 60);
 
-	/* 3a: Enable RC6 */
-	rc6->ctl_enable =
-		GEN6_RC_CTL_HW_ENABLE |
-		GEN6_RC_CTL_RC6_ENABLE |
-		GEN6_RC_CTL_EI_MODE(1);
+	/* 3a: Enable RC6
+	 *
+	 * With GuCRC, we do not enable bit 31 of RC_CTL,
+	 * thus allowing GuC to control RC6 entry/exit fully instead.
+	 * We will not set the HW ENABLE and EI bits
+	 */
+	if (!intel_guc_rc_enable(&gt->uc.guc))
+		rc6->ctl_enable = GEN6_RC_CTL_RC6_ENABLE;
+	else
+		rc6->ctl_enable =
+			GEN6_RC_CTL_HW_ENABLE |
+			GEN6_RC_CTL_RC6_ENABLE |
+			GEN6_RC_CTL_EI_MODE(1);
 
 	pg_enable =
 		GEN9_RENDER_PG_ENABLE |
@@ -126,7 +139,7 @@ static void gen9_rc6_enable(struct intel_rc6 *rc6)
 	enum intel_engine_id id;
 
 	/* 2b: Program RC6 thresholds.*/
-	if (GRAPHICS_VER(rc6_to_i915(rc6)) >= 10) {
+	if (GRAPHICS_VER(rc6_to_i915(rc6)) >= 11) {
 		set(uncore, GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
 		set(uncore, GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
 	} else if (IS_SKYLAKE(rc6_to_i915(rc6))) {
@@ -513,6 +526,10 @@ static void __intel_rc6_disable(struct intel_rc6 *rc6)
 {
 	struct drm_i915_private *i915 = rc6_to_i915(rc6);
 	struct intel_uncore *uncore = rc6_to_uncore(rc6);
+	struct intel_gt *gt = rc6_to_gt(rc6);
+
+	/* Take control of RC6 back from GuC */
+	intel_guc_rc_disable(&gt->uc.guc);
 
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 	if (GRAPHICS_VER(i915) >= 9)
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index f7366b054f8e..a74b72f50cc9 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -9,7 +9,8 @@
 #include "intel_region_ttm.h"
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
-#include "intel_region_lmem.h"
+#include "gem/i915_gem_ttm.h"
+#include "gt/intel_gt.h"
 
 static int init_fake_lmem_bar(struct intel_memory_region *mem)
 {
@@ -107,7 +108,7 @@ out_no_io:
 static const struct intel_memory_region_ops intel_region_lmem_ops = {
 	.init = region_lmem_init,
 	.release = region_lmem_release,
-	.init_object = __i915_gem_lmem_object_init,
+	.init_object = __i915_gem_ttm_object_init,
 };
 
 struct intel_memory_region *
@@ -157,7 +158,7 @@ intel_gt_setup_fake_lmem(struct intel_gt *gt)
 static bool get_legacy_lowmem_region(struct intel_uncore *uncore,
 				     u64 *start, u32 *size)
 {
-	if (!IS_DG1_REVID(uncore->i915, DG1_REVID_A0, DG1_REVID_B0))
+	if (!IS_DG1_GT_STEP(uncore->i915, STEP_A0, STEP_C0))
 		return false;
 
 	*start = 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_renderstate.h b/drivers/gpu/drm/i915/gt/intel_renderstate.h
index 48f009203917..4da4c5234ef0 100644
--- a/drivers/gpu/drm/i915/gt/intel_renderstate.h
+++ b/drivers/gpu/drm/i915/gt/intel_renderstate.h
@@ -8,6 +8,7 @@
 #include <linux/types.h>
 
 #include "i915_gem.h"
+#include "i915_gem_ww.h"
 
 struct i915_request;
 struct intel_context;
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..91200c43951f 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -22,7 +22,6 @@
 #include "intel_reset.h"
 
 #include "uc/intel_guc.h"
-#include "uc/intel_guc_submission.h"
 
 #define RESET_MAX_RETRIES 3
 
@@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)
 	intel_uncore_rmw_fw(uncore, reg, clr, 0);
 }
 
-static void skip_context(struct i915_request *rq)
-{
-	struct intel_context *hung_ctx = rq->context;
-
-	list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
-		if (!i915_request_is_active(rq))
-			return;
-
-		if (rq->context == hung_ctx) {
-			i915_request_set_error_once(rq, -EIO);
-			__i915_request_skip(rq);
-		}
-	}
-}
-
 static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
 {
 	struct drm_i915_file_private *file_priv = ctx->file_priv;
@@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq)
 	bool banned;
 	int i;
 
-	if (intel_context_is_closed(rq->context)) {
-		intel_context_set_banned(rq->context);
+	if (intel_context_is_closed(rq->context))
 		return true;
-	}
 
 	rcu_read_lock();
 	ctx = rcu_dereference(rq->context->gem_context);
@@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq)
 	banned = !i915_gem_context_is_recoverable(ctx);
 	if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
 		banned = true;
-	if (banned) {
+	if (banned)
 		drm_dbg(&ctx->i915->drm, "context %s: guilty %d, banned\n",
 			ctx->name, atomic_read(&ctx->guilty_count));
-		intel_context_set_banned(rq->context);
-	}
 
 	client_mark_guilty(ctx, banned);
 
@@ -149,6 +129,8 @@ static void mark_innocent(struct i915_request *rq)
 
 void __i915_request_reset(struct i915_request *rq, bool guilty)
 {
+	bool banned = false;
+
 	RQ_TRACE(rq, "guilty? %s\n", yesno(guilty));
 	GEM_BUG_ON(__i915_request_is_complete(rq));
 
@@ -156,13 +138,15 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 	if (guilty) {
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
-		if (mark_guilty(rq))
-			skip_context(rq);
+		banned = mark_guilty(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
 		mark_innocent(rq);
 	}
 	rcu_read_unlock();
+
+	if (banned)
+		intel_context_ban(rq->context, rq);
 }
 
 static bool i915_in_reset(struct pci_dev *pdev)
@@ -515,8 +499,14 @@ static int gen11_reset_engines(struct intel_gt *gt,
 		[VCS1] = GEN11_GRDOM_MEDIA2,
 		[VCS2] = GEN11_GRDOM_MEDIA3,
 		[VCS3] = GEN11_GRDOM_MEDIA4,
+		[VCS4] = GEN11_GRDOM_MEDIA5,
+		[VCS5] = GEN11_GRDOM_MEDIA6,
+		[VCS6] = GEN11_GRDOM_MEDIA7,
+		[VCS7] = GEN11_GRDOM_MEDIA8,
 		[VECS0] = GEN11_GRDOM_VECS,
 		[VECS1] = GEN11_GRDOM_VECS2,
+		[VECS2] = GEN11_GRDOM_VECS3,
+		[VECS3] = GEN11_GRDOM_VECS4,
 	};
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t tmp;
@@ -826,6 +816,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +842,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +897,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1186,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1199,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
 		goto out;
 	}
 
@@ -1341,7 +1336,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.h b/drivers/gpu/drm/i915/gt/intel_ring.h
index dbf5f14a136f..1b32dadfb8c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.h
+++ b/drivers/gpu/drm/i915/gt/intel_ring.h
@@ -49,6 +49,7 @@ static inline void intel_ring_advance(struct i915_request *rq, u32 *cs)
 	 * intel_ring_begin()).
 	 */
 	GEM_BUG_ON((rq->ring->vaddr + rq->ring->emit) != cs);
+	GEM_BUG_ON(!IS_ALIGNED(rq->ring->emit, 8)); /* RING_TAIL qword align */
 }
 
 static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 37d74d4ed59b..2958e2fae380 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -16,6 +16,7 @@
 #include "intel_reset.h"
 #include "intel_ring.h"
 #include "shmem_utils.h"
+#include "intel_engine_heartbeat.h"
 
 /* Rough estimate of the typical request size, performing a flush,
  * set-context and then emitting the batch.
@@ -342,9 +343,9 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
 	u32 head;
 
 	rq = NULL;
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 	rcu_read_lock();
-	list_for_each_entry(pos, &engine->active.requests, sched.link) {
+	list_for_each_entry(pos, &engine->sched_engine->requests, sched.link) {
 		if (!__i915_request_is_complete(pos)) {
 			rq = pos;
 			break;
@@ -399,7 +400,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
 	}
 	engine->legacy.ring->head = intel_ring_wrap(engine->legacy.ring, head);
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void reset_finish(struct intel_engine_cs *engine)
@@ -411,16 +412,16 @@ static void reset_cancel(struct intel_engine_cs *engine)
 	struct i915_request *request;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_lock_irqsave(&engine->sched_engine->lock, flags);
 
 	/* Mark all submitted requests as skipped. */
-	list_for_each_entry(request, &engine->active.requests, sched.link)
+	list_for_each_entry(request, &engine->sched_engine->requests, sched.link)
 		i915_request_put(i915_request_mark_eio(request));
 	intel_engine_signal_breadcrumbs(engine);
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	spin_unlock_irqrestore(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
 static void i9xx_submit_request(struct i915_request *request)
@@ -586,9 +587,44 @@ static void ring_context_reset(struct intel_context *ce)
 	clear_bit(CONTEXT_VALID_BIT, &ce->flags);
 }
 
+static void ring_context_ban(struct intel_context *ce,
+			     struct i915_request *rq)
+{
+	struct intel_engine_cs *engine;
+
+	if (!rq || !i915_request_is_active(rq))
+		return;
+
+	engine = rq->engine;
+	lockdep_assert_held(&engine->sched_engine->lock);
+	list_for_each_entry_continue(rq, &engine->sched_engine->requests,
+				     sched.link)
+		if (rq->context == ce) {
+			i915_request_set_error_once(rq, -EIO);
+			__i915_request_skip(rq);
+		}
+}
+
+static void ring_context_cancel_request(struct intel_context *ce,
+					struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = NULL;
+
+	i915_request_active_engine(rq, &engine);
+
+	if (engine && intel_engine_pulse(engine))
+		intel_gt_handle_error(engine->gt, engine->mask, 0,
+				      "request cancellation by %s",
+				      current->comm);
+}
+
 static const struct intel_context_ops ring_context_ops = {
 	.alloc = ring_context_alloc,
 
+	.cancel_request = ring_context_cancel_request,
+
+	.ban = ring_context_ban,
+
 	.pre_pin = ring_context_pre_pin,
 	.pin = ring_context_pin,
 	.unpin = ring_context_unpin,
@@ -1047,6 +1083,25 @@ static void setup_irq(struct intel_engine_cs *engine)
 	}
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1064,6 +1119,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 06e9a8ed4e03..d812b27835f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -37,6 +37,20 @@ static struct intel_uncore *rps_to_uncore(struct intel_rps *rps)
 	return rps_to_gt(rps)->uncore;
 }
 
+static struct intel_guc_slpc *rps_to_slpc(struct intel_rps *rps)
+{
+	struct intel_gt *gt = rps_to_gt(rps);
+
+	return &gt->uc.guc.slpc;
+}
+
+static bool rps_uses_slpc(struct intel_rps *rps)
+{
+	struct intel_gt *gt = rps_to_gt(rps);
+
+	return intel_uc_uses_guc_slpc(&gt->uc);
+}
+
 static u32 rps_pm_sanitize_mask(struct intel_rps *rps, u32 mask)
 {
 	return mask & ~rps->pm_intrmsk_mbz;
@@ -167,6 +181,8 @@ static void rps_enable_interrupts(struct intel_rps *rps)
 {
 	struct intel_gt *gt = rps_to_gt(rps);
 
+	GEM_BUG_ON(rps_uses_slpc(rps));
+
 	GT_TRACE(gt, "interrupts:on rps->pm_events: %x, rps_pm_mask:%x\n",
 		 rps->pm_events, rps_pm_mask(rps, rps->last_freq));
 
@@ -771,6 +787,8 @@ static int gen6_rps_set(struct intel_rps *rps, u8 val)
 	struct drm_i915_private *i915 = rps_to_i915(rps);
 	u32 swreq;
 
+	GEM_BUG_ON(rps_uses_slpc(rps));
+
 	if (GRAPHICS_VER(i915) >= 9)
 		swreq = GEN9_FREQUENCY(val);
 	else if (IS_HASWELL(i915) || IS_BROADWELL(i915))
@@ -861,6 +879,9 @@ void intel_rps_park(struct intel_rps *rps)
 {
 	int adj;
 
+	if (!intel_rps_is_enabled(rps))
+		return;
+
 	GEM_BUG_ON(atomic_read(&rps->num_waiters));
 
 	if (!intel_rps_clear_active(rps))
@@ -999,7 +1020,7 @@ static void gen6_rps_init(struct intel_rps *rps)
 
 	rps->efficient_freq = rps->rp1_freq;
 	if (IS_HASWELL(i915) || IS_BROADWELL(i915) ||
-	    IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 10) {
+	    IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 11) {
 		u32 ddcc_status = 0;
 
 		if (sandybridge_pcode_read(i915,
@@ -1012,7 +1033,7 @@ static void gen6_rps_init(struct intel_rps *rps)
 					rps->max_freq);
 	}
 
-	if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 10) {
+	if (IS_GEN9_BC(i915) || GRAPHICS_VER(i915) >= 11) {
 		/* Store the frequency values in 16.66 MHZ units, which is
 		 * the natural hardware unit for SKL
 		 */
@@ -1356,6 +1377,9 @@ void intel_rps_enable(struct intel_rps *rps)
 	if (!HAS_RPS(i915))
 		return;
 
+	if (rps_uses_slpc(rps))
+		return;
+
 	intel_gt_check_clock_frequency(rps_to_gt(rps));
 
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
@@ -1829,6 +1853,9 @@ void intel_rps_init(struct intel_rps *rps)
 {
 	struct drm_i915_private *i915 = rps_to_i915(rps);
 
+	if (rps_uses_slpc(rps))
+		return;
+
 	if (IS_CHERRYVIEW(i915))
 		chv_rps_init(rps);
 	else if (IS_VALLEYVIEW(i915))
@@ -1877,10 +1904,17 @@ void intel_rps_init(struct intel_rps *rps)
 
 	if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11)
 		rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
+
+	/* GuC needs ARAT expired interrupt unmasked */
+	if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc))
+		rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK;
 }
 
 void intel_rps_sanitize(struct intel_rps *rps)
 {
+	if (rps_uses_slpc(rps))
+		return;
+
 	if (GRAPHICS_VER(rps_to_i915(rps)) >= 6)
 		rps_disable_interrupts(rps);
 }
@@ -1936,6 +1970,176 @@ u32 intel_rps_read_actual_frequency(struct intel_rps *rps)
 	return freq;
 }
 
+u32 intel_rps_read_punit_req(struct intel_rps *rps)
+{
+	struct intel_uncore *uncore = rps_to_uncore(rps);
+
+	return intel_uncore_read(uncore, GEN6_RPNSWREQ);
+}
+
+static u32 intel_rps_get_req(u32 pureq)
+{
+	u32 req = pureq >> GEN9_SW_REQ_UNSLICE_RATIO_SHIFT;
+
+	return req;
+}
+
+u32 intel_rps_read_punit_req_frequency(struct intel_rps *rps)
+{
+	u32 freq = intel_rps_get_req(intel_rps_read_punit_req(rps));
+
+	return intel_gpu_freq(rps, freq);
+}
+
+u32 intel_rps_get_requested_frequency(struct intel_rps *rps)
+{
+	if (rps_uses_slpc(rps))
+		return intel_rps_read_punit_req_frequency(rps);
+	else
+		return intel_gpu_freq(rps, rps->cur_freq);
+}
+
+u32 intel_rps_get_max_frequency(struct intel_rps *rps)
+{
+	struct intel_guc_slpc *slpc = rps_to_slpc(rps);
+
+	if (rps_uses_slpc(rps))
+		return slpc->max_freq_softlimit;
+	else
+		return intel_gpu_freq(rps, rps->max_freq_softlimit);
+}
+
+u32 intel_rps_get_rp0_frequency(struct intel_rps *rps)
+{
+	struct intel_guc_slpc *slpc = rps_to_slpc(rps);
+
+	if (rps_uses_slpc(rps))
+		return slpc->rp0_freq;
+	else
+		return intel_gpu_freq(rps, rps->rp0_freq);
+}
+
+u32 intel_rps_get_rp1_frequency(struct intel_rps *rps)
+{
+	struct intel_guc_slpc *slpc = rps_to_slpc(rps);
+
+	if (rps_uses_slpc(rps))
+		return slpc->rp1_freq;
+	else
+		return intel_gpu_freq(rps, rps->rp1_freq);
+}
+
+u32 intel_rps_get_rpn_frequency(struct intel_rps *rps)
+{
+	struct intel_guc_slpc *slpc = rps_to_slpc(rps);
+
+	if (rps_uses_slpc(rps))
+		return slpc->min_freq;
+	else
+		return intel_gpu_freq(rps, rps->min_freq);
+}
+
+static int set_max_freq(struct intel_rps *rps, u32 val)
+{
+	struct drm_i915_private *i915 = rps_to_i915(rps);
+	int ret = 0;
+
+	mutex_lock(&rps->lock);
+
+	val = intel_freq_opcode(rps, val);
+	if (val < rps->min_freq ||
+	    val > rps->max_freq ||
+	    val < rps->min_freq_softlimit) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	if (val > rps->rp0_freq)
+		drm_dbg(&i915->drm, "User requested overclocking to %d\n",
+			intel_gpu_freq(rps, val));
+
+	rps->max_freq_softlimit = val;
+
+	val = clamp_t(int, rps->cur_freq,
+		      rps->min_freq_softlimit,
+		      rps->max_freq_softlimit);
+
+	/*
+	 * We still need *_set_rps to process the new max_delay and
+	 * update the interrupt limits and PMINTRMSK even though
+	 * frequency request may be unchanged.
+	 */
+	intel_rps_set(rps, val);
+
+unlock:
+	mutex_unlock(&rps->lock);
+
+	return ret;
+}
+
+int intel_rps_set_max_frequency(struct intel_rps *rps, u32 val)
+{
+	struct intel_guc_slpc *slpc = rps_to_slpc(rps);
+
+	if (rps_uses_slpc(rps))
+		return intel_guc_slpc_set_max_freq(slpc, val);
+	else
+		return set_max_freq(rps, val);
+}
+
+u32 intel_rps_get_min_frequency(struct intel_rps *rps)
+{
+	struct intel_guc_slpc *slpc = rps_to_slpc(rps);
+
+	if (rps_uses_slpc(rps))
+		return slpc->min_freq_softlimit;
+	else
+		return intel_gpu_freq(rps, rps->min_freq_softlimit);
+}
+
+static int set_min_freq(struct intel_rps *rps, u32 val)
+{
+	int ret = 0;
+
+	mutex_lock(&rps->lock);
+
+	val = intel_freq_opcode(rps, val);
+	if (val < rps->min_freq ||
+	    val > rps->max_freq ||
+	    val > rps->max_freq_softlimit) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	rps->min_freq_softlimit = val;
+
+	val = clamp_t(int, rps->cur_freq,
+		      rps->min_freq_softlimit,
+		      rps->max_freq_softlimit);
+
+	/*
+	 * We still need *_set_rps to process the new min_delay and
+	 * update the interrupt limits and PMINTRMSK even though
+	 * frequency request may be unchanged.
+	 */
+	intel_rps_set(rps, val);
+
+unlock:
+	mutex_unlock(&rps->lock);
+
+	return ret;
+}
+
+int intel_rps_set_min_frequency(struct intel_rps *rps, u32 val)
+{
+	struct intel_guc_slpc *slpc = rps_to_slpc(rps);
+
+	if (rps_uses_slpc(rps))
+		return intel_guc_slpc_set_min_freq(slpc, val);
+	else
+		return set_min_freq(rps, val);
+}
+
 /* External interface for intel_ips.ko */
 
 static struct drm_i915_private __rcu *ips_mchdev;
@@ -2129,4 +2333,5 @@ EXPORT_SYMBOL_GPL(i915_gpu_turbo_disable);
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_rps.c"
+#include "selftest_slpc.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.h b/drivers/gpu/drm/i915/gt/intel_rps.h
index 1d2cfc98b510..4213bcce1667 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.h
+++ b/drivers/gpu/drm/i915/gt/intel_rps.h
@@ -31,6 +31,16 @@ int intel_gpu_freq(struct intel_rps *rps, int val);
 int intel_freq_opcode(struct intel_rps *rps, int val);
 u32 intel_rps_get_cagf(struct intel_rps *rps, u32 rpstat1);
 u32 intel_rps_read_actual_frequency(struct intel_rps *rps);
+u32 intel_rps_get_requested_frequency(struct intel_rps *rps);
+u32 intel_rps_get_min_frequency(struct intel_rps *rps);
+int intel_rps_set_min_frequency(struct intel_rps *rps, u32 val);
+u32 intel_rps_get_max_frequency(struct intel_rps *rps);
+int intel_rps_set_max_frequency(struct intel_rps *rps, u32 val);
+u32 intel_rps_get_rp0_frequency(struct intel_rps *rps);
+u32 intel_rps_get_rp1_frequency(struct intel_rps *rps);
+u32 intel_rps_get_rpn_frequency(struct intel_rps *rps);
+u32 intel_rps_read_punit_req(struct intel_rps *rps);
+u32 intel_rps_read_punit_req_frequency(struct intel_rps *rps);
 void gen5_rps_irq_handler(struct intel_rps *rps);
 void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c
index 367fd44b81c8..bbd272943c3f 100644
--- a/drivers/gpu/drm/i915/gt/intel_sseu.c
+++ b/drivers/gpu/drm/i915/gt/intel_sseu.c
@@ -139,17 +139,36 @@ static void gen12_sseu_info_init(struct intel_gt *gt)
 	 * Gen12 has Dual-Subslices, which behave similarly to 2 gen11 SS.
 	 * Instead of splitting these, provide userspace with an array
 	 * of DSS to more closely represent the hardware resource.
+	 *
+	 * In addition, the concept of slice has been removed in Xe_HP.
+	 * To be compatible with prior generations, assume a single slice
+	 * across the entire device. Then calculate out the DSS for each
+	 * workload type within that software slice.
 	 */
-	intel_sseu_set_info(sseu, 1, 6, 16);
+	if (IS_DG2(gt->i915) || IS_XEHPSDV(gt->i915))
+		intel_sseu_set_info(sseu, 1, 32, 16);
+	else
+		intel_sseu_set_info(sseu, 1, 6, 16);
 
-	s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) &
-		GEN11_GT_S_ENA_MASK;
+	/*
+	 * As mentioned above, Xe_HP does not have the concept of a slice.
+	 * Enable one for software backwards compatibility.
+	 */
+	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50))
+		s_en = 0x1;
+	else
+		s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) &
+		       GEN11_GT_S_ENA_MASK;
 
 	dss_en = intel_uncore_read(uncore, GEN12_GT_DSS_ENABLE);
 
 	/* one bit per pair of EUs */
-	eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) &
-		       GEN11_EU_DIS_MASK);
+	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50))
+		eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK;
+	else
+		eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) &
+			       GEN11_EU_DIS_MASK);
+
 	for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++)
 		if (eu_en_fuse & BIT(eu))
 			eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1);
@@ -188,83 +207,6 @@ static void gen11_sseu_info_init(struct intel_gt *gt)
 	sseu->has_eu_pg = 1;
 }
 
-static void gen10_sseu_info_init(struct intel_gt *gt)
-{
-	struct intel_uncore *uncore = gt->uncore;
-	struct sseu_dev_info *sseu = &gt->info.sseu;
-	const u32 fuse2 = intel_uncore_read(uncore, GEN8_FUSE2);
-	const int eu_mask = 0xff;
-	u32 subslice_mask, eu_en;
-	int s, ss;
-
-	intel_sseu_set_info(sseu, 6, 4, 8);
-
-	sseu->slice_mask = (fuse2 & GEN10_F2_S_ENA_MASK) >>
-			    GEN10_F2_S_ENA_SHIFT;
-
-	/* Slice0 */
-	eu_en = ~intel_uncore_read(uncore, GEN8_EU_DISABLE0);
-	for (ss = 0; ss < sseu->max_subslices; ss++)
-		sseu_set_eus(sseu, 0, ss, (eu_en >> (8 * ss)) & eu_mask);
-	/* Slice1 */
-	sseu_set_eus(sseu, 1, 0, (eu_en >> 24) & eu_mask);
-	eu_en = ~intel_uncore_read(uncore, GEN8_EU_DISABLE1);
-	sseu_set_eus(sseu, 1, 1, eu_en & eu_mask);
-	/* Slice2 */
-	sseu_set_eus(sseu, 2, 0, (eu_en >> 8) & eu_mask);
-	sseu_set_eus(sseu, 2, 1, (eu_en >> 16) & eu_mask);
-	/* Slice3 */
-	sseu_set_eus(sseu, 3, 0, (eu_en >> 24) & eu_mask);
-	eu_en = ~intel_uncore_read(uncore, GEN8_EU_DISABLE2);
-	sseu_set_eus(sseu, 3, 1, eu_en & eu_mask);
-	/* Slice4 */
-	sseu_set_eus(sseu, 4, 0, (eu_en >> 8) & eu_mask);
-	sseu_set_eus(sseu, 4, 1, (eu_en >> 16) & eu_mask);
-	/* Slice5 */
-	sseu_set_eus(sseu, 5, 0, (eu_en >> 24) & eu_mask);
-	eu_en = ~intel_uncore_read(uncore, GEN10_EU_DISABLE3);
-	sseu_set_eus(sseu, 5, 1, eu_en & eu_mask);
-
-	subslice_mask = (1 << 4) - 1;
-	subslice_mask &= ~((fuse2 & GEN10_F2_SS_DIS_MASK) >>
-			   GEN10_F2_SS_DIS_SHIFT);
-
-	for (s = 0; s < sseu->max_slices; s++) {
-		u32 subslice_mask_with_eus = subslice_mask;
-
-		for (ss = 0; ss < sseu->max_subslices; ss++) {
-			if (sseu_get_eus(sseu, s, ss) == 0)
-				subslice_mask_with_eus &= ~BIT(ss);
-		}
-
-		/*
-		 * Slice0 can have up to 3 subslices, but there are only 2 in
-		 * slice1/2.
-		 */
-		intel_sseu_set_subslices(sseu, s, s == 0 ?
-					 subslice_mask_with_eus :
-					 subslice_mask_with_eus & 0x3);
-	}
-
-	sseu->eu_total = compute_eu_total(sseu);
-
-	/*
-	 * CNL is expected to always have a uniform distribution
-	 * of EU across subslices with the exception that any one
-	 * EU in any one subslice may be fused off for die
-	 * recovery.
-	 */
-	sseu->eu_per_subslice =
-		intel_sseu_subslice_total(sseu) ?
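-		DIV_ROUND_UP(sseu->eu_total, intel_sseu_subslice_total(sseu)) :
-		0;
-
-	/* No restrictions on Power Gating */
-	sseu->has_slice_pg = 1;
-	sseu->has_subslice_pg = 1;
-	sseu->has_eu_pg = 1;
-}
-
 static void cherryview_sseu_info_init(struct intel_gt *gt)
 {
 	struct sseu_dev_info *sseu = &gt->info.sseu;
@@ -592,8 +534,6 @@ void intel_sseu_info_init(struct intel_gt *gt)
 		bdw_sseu_info_init(gt);
 	else if (GRAPHICS_VER(i915) == 9)
 		gen9_sseu_info_init(gt);
-	else if (GRAPHICS_VER(i915) == 10)
-		gen10_sseu_info_init(gt);
 	else if (GRAPHICS_VER(i915) == 11)
 		gen11_sseu_info_init(gt);
 	else if (GRAPHICS_VER(i915) >= 12)
@@ -759,3 +699,21 @@ void intel_sseu_print_topology(const struct sseu_dev_info *sseu,
 		}
 	}
 }
+
+u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice)
+{
+	u16 slice_mask = 0;
+	int i;
+
+	WARN_ON(sizeof(dss_mask) * 8 / dss_per_slice > 8 * sizeof(slice_mask));
+
+	for (i = 0; dss_mask; i++) {
+		if (dss_mask & GENMASK(dss_per_slice - 1, 0))
+			slice_mask |= BIT(i);
+
+		dss_mask >>= dss_per_slice;
+	}
+
+	return slice_mask;
+}
+
diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.h b/drivers/gpu/drm/i915/gt/intel_sseu.h
index 4cd1a8a7298a..22fef98887c0 100644
--- a/drivers/gpu/drm/i915/gt/intel_sseu.h
+++ b/drivers/gpu/drm/i915/gt/intel_sseu.h
@@ -15,13 +15,17 @@ struct drm_i915_private;
 struct intel_gt;
 struct drm_printer;
 
-#define GEN_MAX_SLICES (6) /* CNL upper bound */
-#define GEN_MAX_SUBSLICES (8) /* ICL upper bound */
+#define GEN_MAX_SLICES (3) /* SKL upper bound */
+#define GEN_MAX_SUBSLICES (32) /* XEHPSDV upper bound */
 #define GEN_SSEU_STRIDE(max_entries) DIV_ROUND_UP(max_entries, BITS_PER_BYTE)
 #define GEN_MAX_SUBSLICE_STRIDE GEN_SSEU_STRIDE(GEN_MAX_SUBSLICES)
 #define GEN_MAX_EUS (16) /* TGL upper bound */
 #define GEN_MAX_EU_STRIDE GEN_SSEU_STRIDE(GEN_MAX_EUS)
 
+#define GEN_DSS_PER_GSLICE 4
+#define GEN_DSS_PER_CSLICE 8
+#define GEN_DSS_PER_MSLICE 8
+
 struct sseu_dev_info {
 	u8 slice_mask;
 	u8 subslice_mask[GEN_MAX_SLICES * GEN_MAX_SUBSLICE_STRIDE];
@@ -104,4 +108,6 @@ void intel_sseu_dump(const struct sseu_dev_info *sseu, struct drm_printer *p);
 void intel_sseu_print_topology(const struct sseu_dev_info *sseu,
 			       struct drm_printer *p);
 
+u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice);
+
 #endif /* __INTEL_SSEU_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c b/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c
index 714fe8495775..1ba8b7da9d37 100644
--- a/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c
@@ -50,10 +50,10 @@ static void cherryview_sseu_device_status(struct intel_gt *gt,
 #undef SS_MAX
 }
 
-static void gen10_sseu_device_status(struct intel_gt *gt,
+static void gen11_sseu_device_status(struct intel_gt *gt,
 				     struct sseu_dev_info *sseu)
 {
-#define SS_MAX 6
+#define SS_MAX 8
 	struct intel_uncore *uncore = gt->uncore;
 	const struct intel_gt_info *info = &gt->info;
 	u32 s_reg[SS_MAX], eu_reg[2 * SS_MAX], eu_mask[2];
@@ -267,8 +267,8 @@ int intel_sseu_status(struct seq_file *m, struct intel_gt *gt)
 		bdw_sseu_device_status(gt, &sseu);
 	else if (GRAPHICS_VER(i915) == 9)
 		gen9_sseu_device_status(gt, &sseu);
-	else if (GRAPHICS_VER(i915) >= 10)
-		gen10_sseu_device_status(gt, &sseu);
+	else if (GRAPHICS_VER(i915) >= 11)
+		gen11_sseu_device_status(gt, &sseu);
 	}
 
 	i915_print_sseu_info(m, false, HAS_POOLED_EU(i915), &sseu);
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index b62d1e31a645..aae609d7d85d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 }
 
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
-		   u32 clear, u32 set, u32 read_mask)
+		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
 	struct i915_wa wa = {
 		.reg = reg,
 		.clr = clear,
 		.set = set,
 		.read = read_mask,
+		.masked_reg = masked_reg,
 	};
 
 	_wa_add(wal, &wa);
@@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 static void
 wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
 {
-	wa_add(wal, reg, clear, set, clear);
+	wa_add(wal, reg, clear, set, clear, false);
 }
 
 static void
@@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
 static void
 wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
 }
 
 static void
 wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
 }
 
 static void
 wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
 		    u32 mask, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
+	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
 }
 
 static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -514,53 +515,15 @@ static void cfl_ctx_workarounds_init(struct intel_engine_cs *engine,
 		     GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE);
 }
 
-static void cnl_ctx_workarounds_init(struct intel_engine_cs *engine,
-				     struct i915_wa_list *wal)
-{
-	/* WaForceContextSaveRestoreNonCoherent:cnl */
-	wa_masked_en(wal, CNL_HDC_CHICKEN0,
-		     HDC_FORCE_CONTEXT_SAVE_RESTORE_NON_COHERENT);
-
-	/* WaDisableReplayBufferBankArbitrationOptimization:cnl */
-	wa_masked_en(wal, COMMON_SLICE_CHICKEN2,
-		     GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION);
-
-	/* WaPushConstantDereferenceHoldDisable:cnl */
-	wa_masked_en(wal, GEN7_ROW_CHICKEN2, PUSH_CONSTANT_DEREF_DISABLE);
-
-	/* FtrEnableFastAnisoL1BankingFix:cnl */
-	wa_masked_en(wal, HALF_SLICE_CHICKEN3, CNL_FAST_ANISO_L1_BANKING_FIX);
-
-	/* WaDisable3DMidCmdPreemption:cnl */
-	wa_masked_dis(wal, GEN8_CS_CHICKEN1, GEN9_PREEMPT_3D_OBJECT_LEVEL);
-
-	/* WaDisableGPGPUMidCmdPreemption:cnl */
-	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
-			    GEN9_PREEMPT_GPGPU_LEVEL_MASK,
-			    GEN9_PREEMPT_GPGPU_COMMAND_LEVEL);
-
-	/* WaDisableEarlyEOT:cnl */
-	wa_masked_en(wal, GEN8_ROW_CHICKEN, DISABLE_EARLY_EOT);
-}
-
 static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
 				     struct i915_wa_list *wal)
 {
-	struct drm_i915_private *i915 = engine->i915;
-
-	/* WaDisableBankHangMode:icl */
+	/* Wa_1406697149 (WaDisableBankHangMode:icl) */
 	wa_write(wal,
 		 GEN8_L3CNTLREG,
 		 intel_uncore_read(engine->uncore, GEN8_L3CNTLREG) |
 		 GEN8_ERRDETBCTRL);
 
-	/* Wa_1604370585:icl (pre-prod)
-	 * Formerly known as WaPushConstantDereferenceHoldDisable
-	 */
-	if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_B0))
-		wa_masked_en(wal, GEN7_ROW_CHICKEN2,
-			     PUSH_CONSTANT_DEREF_DISABLE);
-
 	/* WaForceEnableNonCoherent:icl
 	 * This is not the same workaround as in early Gen9 platforms, where
 	 * lacking this could cause system hangs, but coherency performance
@@ -570,23 +533,11 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
 	 */
 	wa_masked_en(wal, ICL_HDC_MODE, HDC_FORCE_NON_COHERENT);
 
-	/* Wa_2006611047:icl (pre-prod)
-	 * Formerly known as WaDisableImprovedTdlClkGating
-	 */
-	if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0))
-		wa_masked_en(wal, GEN7_ROW_CHICKEN2,
-			     GEN11_TDL_CLOCK_GATING_FIX_DISABLE);
-
-	/* Wa_2006665173:icl (pre-prod) */
-	if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0))
-		wa_masked_en(wal, GEN11_COMMON_SLICE_CHICKEN3,
-			     GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
-
 	/* WaEnableFloatBlendOptimization:icl */
-	wa_write_clr_set(wal,
-			 GEN10_CACHE_MODE_SS,
-			 0, /* write-only, so skip validation */
-			 _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
+	wa_add(wal, GEN10_CACHE_MODE_SS, 0,
+	       _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
+	       0 /* write-only, so skip validation */,
+	       true);
 
 	/* WaDisableGPGPUMidThreadPreemption:icl */
 	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
@@ -631,7 +582,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_TDS_TIMER_MASK,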
static void wa_add(struct i915_wa_list *wal, i915_reg_t reg, static void wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set) { - wa_add(wal, reg, clear, set, clear); + wa_add(wal, reg, clear, set, clear, false); } static void @@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr) static void wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val) { - wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val); + wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true); } static void wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val) { - wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val); + wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true); } static void wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg, u32 mask, u32 val) { - wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask); + wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true); } static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine, @@ -514,53 +515,15 @@ static void cfl_ctx_workarounds_init(struct intel_engine_cs *engine, GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE); } -static void cnl_ctx_workarounds_init(struct intel_engine_cs *engine, - struct i915_wa_list *wal) -{ - /* WaForceContextSaveRestoreNonCoherent:cnl */ - wa_masked_en(wal, CNL_HDC_CHICKEN0, - HDC_FORCE_CONTEXT_SAVE_RESTORE_NON_COHERENT); - - /* WaDisableReplayBufferBankArbitrationOptimization:cnl */ - wa_masked_en(wal, COMMON_SLICE_CHICKEN2, - GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION); - - /* WaPushConstantDereferenceHoldDisable:cnl */ - wa_masked_en(wal, GEN7_ROW_CHICKEN2, PUSH_CONSTANT_DEREF_DISABLE); - - /* FtrEnableFastAnisoL1BankingFix:cnl */ - wa_masked_en(wal, HALF_SLICE_CHICKEN3, CNL_FAST_ANISO_L1_BANKING_FIX); - - /* WaDisable3DMidCmdPreemption:cnl */ - wa_masked_dis(wal, GEN8_CS_CHICKEN1, GEN9_PREEMPT_3D_OBJECT_LEVEL); - - /* WaDisableGPGPUMidCmdPreemption:cnl */ - wa_masked_field_set(wal, GEN8_CS_CHICKEN1, - 
GEN9_PREEMPT_GPGPU_LEVEL_MASK, - GEN9_PREEMPT_GPGPU_COMMAND_LEVEL); - - /* WaDisableEarlyEOT:cnl */ - wa_masked_en(wal, GEN8_ROW_CHICKEN, DISABLE_EARLY_EOT); -} - static void icl_ctx_workarounds_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) { - struct drm_i915_private *i915 = engine->i915; - - /* WaDisableBankHangMode:icl */ + /* Wa_1406697149 (WaDisableBankHangMode:icl) */ wa_write(wal, GEN8_L3CNTLREG, intel_uncore_read(engine->uncore, GEN8_L3CNTLREG) | GEN8_ERRDETBCTRL); - /* Wa_1604370585:icl (pre-prod) - * Formerly known as WaPushConstantDereferenceHoldDisable - */ - if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_B0)) - wa_masked_en(wal, GEN7_ROW_CHICKEN2, - PUSH_CONSTANT_DEREF_DISABLE); - /* WaForceEnableNonCoherent:icl * This is not the same workaround as in early Gen9 platforms, where * lacking this could cause system hangs, but coherency performance @@ -570,23 +533,11 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine, */ wa_masked_en(wal, ICL_HDC_MODE, HDC_FORCE_NON_COHERENT); - /* Wa_2006611047:icl (pre-prod) - * Formerly known as WaDisableImprovedTdlClkGating - */ - if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0)) - wa_masked_en(wal, GEN7_ROW_CHICKEN2, - GEN11_TDL_CLOCK_GATING_FIX_DISABLE); - - /* Wa_2006665173:icl (pre-prod) */ - if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0)) - wa_masked_en(wal, GEN11_COMMON_SLICE_CHICKEN3, - GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC); - /* WaEnableFloatBlendOptimization:icl */ - wa_write_clr_set(wal, - GEN10_CACHE_MODE_SS, - 0, /* write-only, so skip validation */ - _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE)); + wa_add(wal, GEN10_CACHE_MODE_SS, 0, + _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE), + 0 /* write-only, so skip validation */, + true); /* WaDisableGPGPUMidThreadPreemption:icl */ wa_masked_field_set(wal, GEN8_CS_CHICKEN1, @@ -631,7 +582,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine, FF_MODE2, FF_MODE2_TDS_TIMER_MASK, 
FF_MODE2_TDS_TIMER_128, - 0); + 0, false); } static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine, @@ -640,15 +591,16 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine, gen12_ctx_gt_tuning_init(engine, wal); /* - * Wa_1409142259:tgl - * Wa_1409347922:tgl - * Wa_1409252684:tgl - * Wa_1409217633:tgl - * Wa_1409207793:tgl - * Wa_1409178076:tgl - * Wa_1408979724:tgl - * Wa_14010443199:rkl - * Wa_14010698770:rkl + * Wa_1409142259:tgl,dg1,adl-p + * Wa_1409347922:tgl,dg1,adl-p + * Wa_1409252684:tgl,dg1,adl-p + * Wa_1409217633:tgl,dg1,adl-p + * Wa_1409207793:tgl,dg1,adl-p + * Wa_1409178076:tgl,dg1,adl-p + * Wa_1408979724:tgl,dg1,adl-p + * Wa_14010443199:tgl,rkl,dg1,adl-p + * Wa_14010698770:tgl,rkl,dg1,adl-s,adl-p + * Wa_1409342910:tgl,rkl,dg1,adl-s,adl-p */ wa_masked_en(wal, GEN11_COMMON_SLICE_CHICKEN3, GEN12_DISABLE_CPS_AWARE_COLOR_PIPE); @@ -668,7 +620,14 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine, FF_MODE2, FF_MODE2_GS_TIMER_MASK, FF_MODE2_GS_TIMER_224, - 0); + 0, false); + + /* + * Wa_14012131227:dg1 + * Wa_1508744258:tgl,rkl,dg1,adl-s,adl-p + */ + wa_masked_en(wal, GEN7_COMMON_SLICE_CHICKEN1, + GEN9_RHWO_OPTIMIZATION_DISABLE); } static void dg1_ctx_workarounds_init(struct intel_engine_cs *engine, @@ -703,8 +662,6 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine, gen12_ctx_workarounds_init(engine, wal); else if (GRAPHICS_VER(i915) == 11) icl_ctx_workarounds_init(engine, wal); - else if (IS_CANNONLAKE(i915)) - cnl_ctx_workarounds_init(engine, wal); else if (IS_COFFEELAKE(i915) || IS_COMETLAKE(i915)) cfl_ctx_workarounds_init(engine, wal); else if (IS_GEMINILAKE(i915)) @@ -839,7 +796,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) wa_add(wal, HSW_ROW_CHICKEN3, 0, _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE), - 0 /* XXX does this reg exist? */); + 0 /* XXX does this reg exist? 
*/, true); /* WaVSRefCountFullforceMissDisable:hsw */ wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME); @@ -882,30 +839,19 @@ skl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) GEN8_EU_GAUNIT_CLOCK_GATE_DISABLE); /* WaInPlaceDecompressionHang:skl */ - if (IS_SKL_REVID(i915, SKL_REVID_H0, REVID_FOREVER)) + if (IS_SKL_GT_STEP(i915, STEP_A0, STEP_H0)) wa_write_or(wal, GEN9_GAMT_ECO_REG_RW_IA, GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS); } static void -bxt_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) -{ - gen9_gt_workarounds_init(i915, wal); - - /* WaInPlaceDecompressionHang:bxt */ - wa_write_or(wal, - GEN9_GAMT_ECO_REG_RW_IA, - GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS); -} - -static void kbl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) { gen9_gt_workarounds_init(i915, wal); /* WaDisableDynamicCreditSharing:kbl */ - if (IS_KBL_GT_STEP(i915, 0, STEP_B0)) + if (IS_KBL_GT_STEP(i915, 0, STEP_C0)) wa_write_or(wal, GAMT_CHKN_BIT_REG, GAMT_CHKN_DISABLE_DYNAMIC_CREDIT_SHARING); @@ -943,98 +889,144 @@ cfl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS); } +static void __set_mcr_steering(struct i915_wa_list *wal, + i915_reg_t steering_reg, + unsigned int slice, unsigned int subslice) +{ + u32 mcr, mcr_mask; + + mcr = GEN11_MCR_SLICE(slice) | GEN11_MCR_SUBSLICE(subslice); + mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK; + + wa_write_clr_set(wal, steering_reg, mcr_mask, mcr); +} + +static void __add_mcr_wa(struct drm_i915_private *i915, struct i915_wa_list *wal, + unsigned int slice, unsigned int subslice) +{ + drm_dbg(&i915->drm, "MCR slice=0x%x, subslice=0x%x\n", slice, subslice); + + __set_mcr_steering(wal, GEN8_MCR_SELECTOR, slice, subslice); +} + static void -wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal) +icl_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal) { 
const struct sseu_dev_info *sseu = &i915->gt.info.sseu; unsigned int slice, subslice; - u32 l3_en, mcr, mcr_mask; - GEM_BUG_ON(GRAPHICS_VER(i915) < 10); + GEM_BUG_ON(GRAPHICS_VER(i915) < 11); + GEM_BUG_ON(hweight8(sseu->slice_mask) > 1); + slice = 0; /* - * WaProgramMgsrForL3BankSpecificMmioReads: cnl,icl - * L3Banks could be fused off in single slice scenario. If that is - * the case, we might need to program MCR select to a valid L3Bank - * by default, to make sure we correctly read certain registers - * later on (in the range 0xB100 - 0xB3FF). + * Although a platform may have subslices, we need to always steer + * reads to the lowest instance that isn't fused off. When Render + * Power Gating is enabled, grabbing forcewake will only power up a + * single subslice (the "minconfig") if there isn't a real workload + * that needs to be run; this means that if we steer register reads to + * one of the higher subslices, we run the risk of reading back 0's or + * random garbage. + */ + subslice = __ffs(intel_sseu_get_subslices(sseu, slice)); + + /* + * If the subslice we picked above also steers us to a valid L3 bank, + * then we can just rely on the default steering and won't need to + * worry about explicitly re-steering L3BANK reads later. + */ + if (i915->gt.info.l3bank_mask & BIT(subslice)) + i915->gt.steering_table[L3BANK] = NULL; + + __add_mcr_wa(i915, wal, slice, subslice); +} + +static void +xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal) +{ + struct drm_i915_private *i915 = gt->i915; + const struct sseu_dev_info *sseu = &gt->info.sseu; + unsigned long slice, subslice = 0, slice_mask = 0; + u64 dss_mask = 0; + u32 lncf_mask = 0; + int i; + + /* + * On Xe_HP the steering increases in complexity. There are now several + * more units that require steering and we're not guaranteed to be able + * to find a common setting for all of them.
These are: + * - GSLICE (fusable) + * - DSS (sub-unit within gslice; fusable) + * - L3 Bank (fusable) + * - MSLICE (fusable) + * - LNCF (sub-unit within mslice; always present if mslice is present) * - * WaProgramMgsrForCorrectSliceSpecificMmioReads:cnl,icl - * Before any MMIO read into slice/subslice specific registers, MCR - * packet control register needs to be programmed to point to any - * enabled s/ss pair. Otherwise, incorrect values will be returned. - * This means each subsequent MMIO read will be forwarded to an - * specific s/ss combination, but this is OK since these registers - * are consistent across s/ss in almost all cases. In the rare - * occasions, such as INSTDONE, where this value is dependent - * on s/ss combo, the read should be done with read_subslice_reg. + * We'll do our default/implicit steering based on GSLICE (in the + * sliceid field) and DSS (in the subsliceid field). If we can + * find overlap between the valid MSLICE and/or LNCF values with + * a suitable GSLICE, then we can just re-use the default value and + * skip and explicit steering at runtime. * - * Since GEN8_MCR_SELECTOR contains dual-purpose bits which select both - * to which subslice, or to which L3 bank, the respective mmio reads - * will go, we have to find a common index which works for both - * accesses. + * We only need to look for overlap between GSLICE/MSLICE/LNCF to find + * a valid sliceid value. DSS steering is the only type of steering + * that utilizes the 'subsliceid' bits. * - * Case where we cannot find a common index fortunately should not - * happen in production hardware, so we only emit a warning instead of - * implementing something more complex that requires checking the range - * of every MMIO read. + * Also note that, even though the steering domain is called "GSlice" + * and it is encoded in the register using the gslice format, the spec + * says that the combined (geometry | compute) fuse should be used to + * select the steering. 
*/ - if (GRAPHICS_VER(i915) >= 10 && is_power_of_2(sseu->slice_mask)) { - u32 l3_fuse = - intel_uncore_read(&i915->uncore, GEN10_MIRROR_FUSE3) & - GEN10_L3BANK_MASK; + /* Find the potential gslice candidates */ + dss_mask = intel_sseu_get_subslices(sseu, 0); + slice_mask = intel_slicemask_from_dssmask(dss_mask, GEN_DSS_PER_GSLICE); - drm_dbg(&i915->drm, "L3 fuse = %x\n", l3_fuse); - l3_en = ~(l3_fuse << GEN10_L3BANK_PAIR_COUNT | l3_fuse); - } else { - l3_en = ~0; - } + /* + * Find the potential LNCF candidates. Either LNCF within a valid + * mslice is fine. + */ + for_each_set_bit(i, &gt->info.mslice_mask, GEN12_MAX_MSLICES) + lncf_mask |= (0x3 << (i * 2)); - slice = fls(sseu->slice_mask) - 1; - subslice = fls(l3_en & intel_sseu_get_subslices(sseu, slice)); - if (!subslice) { - drm_warn(&i915->drm, - "No common index found between subslice mask %x and L3 bank mask %x!\n", - intel_sseu_get_subslices(sseu, slice), l3_en); - subslice = fls(l3_en); - drm_WARN_ON(&i915->drm, !subslice); - } - subslice--; - - if (GRAPHICS_VER(i915) >= 11) { - mcr = GEN11_MCR_SLICE(slice) | GEN11_MCR_SUBSLICE(subslice); - mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK; - } else { - mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice); - mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK; + /* + * Are there any sliceid values that work for both GSLICE and LNCF + * steering? + */ + if (slice_mask & lncf_mask) { + slice_mask &= lncf_mask; + gt->steering_table[LNCF] = NULL; } - drm_dbg(&i915->drm, "MCR slice/subslice = %x\n", mcr); + /* How about sliceid values that also work for MSLICE steering?
*/ + if (slice_mask & gt->info.mslice_mask) { + slice_mask &= gt->info.mslice_mask; + gt->steering_table[MSLICE] = NULL; + } - wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr); -} + slice = __ffs(slice_mask); + subslice = __ffs(dss_mask >> (slice * GEN_DSS_PER_GSLICE)); + WARN_ON(subslice > GEN_DSS_PER_GSLICE); + WARN_ON(dss_mask >> (slice * GEN_DSS_PER_GSLICE) == 0); -static void -cnl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) -{ - wa_init_mcr(i915, wal); + __add_mcr_wa(i915, wal, slice, subslice); - /* WaInPlaceDecompressionHang:cnl */ - wa_write_or(wal, - GEN9_GAMT_ECO_REG_RW_IA, - GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS); + /* + * SQIDI ranges are special because they use different steering + * registers than everything else we work with. On XeHP SDV and + * DG2-G10, any value in the steering registers will work fine since + * all instances are present, but DG2-G11 only has SQIDI instances at + * ID's 2 and 3, so we need to steer to one of those. For simplicity + * we'll just steer to a hardcoded "2" since that value will work + * everywhere. 
+ */ + __set_mcr_steering(wal, MCFG_MCR_SELECTOR, 0, 2); + __set_mcr_steering(wal, SF_MCR_SELECTOR, 0, 2); } static void icl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) { - wa_init_mcr(i915, wal); - - /* WaInPlaceDecompressionHang:icl */ - wa_write_or(wal, - GEN9_GAMT_ECO_REG_RW_IA, - GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS); + icl_wa_init_mcr(i915, wal); /* WaModifyGamTlbPartitioning:icl */ wa_write_clr_set(wal, @@ -1057,18 +1049,6 @@ icl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) GEN8_GAMW_ECO_DEV_RW_IA, GAMW_ECO_DEV_CTX_RELOAD_DISABLE); - /* Wa_1405779004:icl (pre-prod) */ - if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_A0)) - wa_write_or(wal, - SLICE_UNIT_LEVEL_CLKGATE, - MSCUNIT_CLKGATE_DIS); - - /* Wa_1406838659:icl (pre-prod) */ - if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_B0)) - wa_write_or(wal, - INF_UNIT_LEVEL_CLKGATE, - CGPSF_CLKGATE_DIS); - /* Wa_1406463099:icl * Formerly known as WaGamTlbPendError */ @@ -1078,10 +1058,16 @@ icl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) /* Wa_1607087056:icl,ehl,jsl */ if (IS_ICELAKE(i915) || - IS_JSL_EHL_REVID(i915, EHL_REVID_A0, EHL_REVID_A0)) + IS_JSL_EHL_GT_STEP(i915, STEP_A0, STEP_B0)) wa_write_or(wal, SLICE_UNIT_LEVEL_CLKGATE, L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS); + + /* + * This is not a documented workaround, but rather an optimization + * to reduce sampler power. 
+ */ + wa_write_clr(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE); } /* @@ -1111,10 +1097,13 @@ static void gen12_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) { - wa_init_mcr(i915, wal); + icl_wa_init_mcr(i915, wal); - /* Wa_14011060649:tgl,rkl,dg1,adls */ + /* Wa_14011060649:tgl,rkl,dg1,adl-s,adl-p */ wa_14011060649(i915, wal); + + /* Wa_14011059788:tgl,rkl,adl-s,dg1,adl-p */ + wa_write_or(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE); } static void @@ -1123,19 +1112,19 @@ tgl_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) gen12_gt_workarounds_init(i915, wal); /* Wa_1409420604:tgl */ - if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0)) + if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0)) wa_write_or(wal, SUBSLICE_UNIT_LEVEL_CLKGATE2, CPSSUNIT_CLKGATE_DIS); /* Wa_1607087056:tgl also know as BUG:1409180338 */ - if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0)) + if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0)) wa_write_or(wal, SLICE_UNIT_LEVEL_CLKGATE, L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS); /* Wa_1408615072:tgl[a0] */ - if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0)) + if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0)) wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE2, VSUNIT_CLKGATE_DIS_TGL); } @@ -1146,7 +1135,7 @@ dg1_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) gen12_gt_workarounds_init(i915, wal); /* Wa_1607087056:dg1 */ - if (IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0)) + if (IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0)) wa_write_or(wal, SLICE_UNIT_LEVEL_CLKGATE, L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS); @@ -1165,9 +1154,17 @@ dg1_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) } static void +xehpsdv_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) +{ + xehp_init_mcr(&i915->gt, wal); +} + +static void gt_init_workarounds(struct drm_i915_private *i915, struct i915_wa_list *wal) { - if (IS_DG1(i915)) + if (IS_XEHPSDV(i915)) + 
xehpsdv_gt_workarounds_init(i915, wal); + else if (IS_DG1(i915)) dg1_gt_workarounds_init(i915, wal); else if (IS_TIGERLAKE(i915)) tgl_gt_workarounds_init(i915, wal); @@ -1175,8 +1172,6 @@ gt_init_workarounds(struct drm_i915_private *i915, struct i915_wa_list *wal) gen12_gt_workarounds_init(i915, wal); else if (GRAPHICS_VER(i915) == 11) icl_gt_workarounds_init(i915, wal); - else if (IS_CANNONLAKE(i915)) - cnl_gt_workarounds_init(i915, wal); else if (IS_COFFEELAKE(i915) || IS_COMETLAKE(i915)) cfl_gt_workarounds_init(i915, wal); else if (IS_GEMINILAKE(i915)) @@ -1184,7 +1179,7 @@ gt_init_workarounds(struct drm_i915_private *i915, struct i915_wa_list *wal) else if (IS_KABYLAKE(i915)) kbl_gt_workarounds_init(i915, wal); else if (IS_BROXTON(i915)) - bxt_gt_workarounds_init(i915, wal); + gen9_gt_workarounds_init(i915, wal); else if (IS_SKYLAKE(i915)) skl_gt_workarounds_init(i915, wal); else if (IS_HASWELL(i915)) @@ -1247,8 +1242,9 @@ wa_verify(const struct i915_wa *wa, u32 cur, const char *name, const char *from) } static void -wa_list_apply(struct intel_uncore *uncore, const struct i915_wa_list *wal) +wa_list_apply(struct intel_gt *gt, const struct i915_wa_list *wal) { + struct intel_uncore *uncore = gt->uncore; enum forcewake_domains fw; unsigned long flags; struct i915_wa *wa; @@ -1263,13 +1259,16 @@ wa_list_apply(struct intel_uncore *uncore, const struct i915_wa_list *wal) intel_uncore_forcewake_get__locked(uncore, fw); for (i = 0, wa = wal->list; i < wal->count; i++, wa++) { - if (wa->clr) - intel_uncore_rmw_fw(uncore, wa->reg, wa->clr, wa->set); - else - intel_uncore_write_fw(uncore, wa->reg, wa->set); + u32 val, old = 0; + + /* open-coded rmw due to steering */ + old = wa->clr ? 
intel_gt_read_register_fw(gt, wa->reg) : 0; + val = (old & ~wa->clr) | wa->set; + if (val != old || !wa->clr) + intel_uncore_write_fw(uncore, wa->reg, val); + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) - wa_verify(wa, - intel_uncore_read_fw(uncore, wa->reg), + wa_verify(wa, intel_gt_read_register_fw(gt, wa->reg), wal->name, "application"); } @@ -1279,28 +1278,39 @@ wa_list_apply(struct intel_uncore *uncore, const struct i915_wa_list *wal) void intel_gt_apply_workarounds(struct intel_gt *gt) { - wa_list_apply(gt->uncore, &gt->i915->gt_wa_list); + wa_list_apply(gt, &gt->i915->gt_wa_list); } -static bool wa_list_verify(struct intel_uncore *uncore, +static bool wa_list_verify(struct intel_gt *gt, const struct i915_wa_list *wal, const char *from) { + struct intel_uncore *uncore = gt->uncore; struct i915_wa *wa; + enum forcewake_domains fw; + unsigned long flags; unsigned int i; bool ok = true; + fw = wal_get_fw_for_rmw(uncore, wal); + + spin_lock_irqsave(&uncore->lock, flags); + intel_uncore_forcewake_get__locked(uncore, fw); + for (i = 0, wa = wal->list; i < wal->count; i++, wa++) ok &= wa_verify(wa, - intel_uncore_read(uncore, wa->reg), + intel_gt_read_register_fw(gt, wa->reg), wal->name, from); + intel_uncore_forcewake_put__locked(uncore, fw); + spin_unlock_irqrestore(&uncore->lock, flags); + return ok; } bool intel_gt_verify_workarounds(struct intel_gt *gt, const char *from) { - return wa_list_verify(gt->uncore, &gt->i915->gt_wa_list, from); + return wa_list_verify(gt, &gt->i915->gt_wa_list, from); } __maybe_unused @@ -1438,17 +1448,6 @@ static void cml_whitelist_build(struct intel_engine_cs *engine) cfl_whitelist_build(engine); } -static void cnl_whitelist_build(struct intel_engine_cs *engine) -{ - struct i915_wa_list *w = &engine->whitelist; - - if (engine->class != RENDER_CLASS) - return; - - /* WaEnablePreemptionGranularityControlByUMD:cnl */ - whitelist_reg(w, GEN8_CS_CHICKEN1); -} - static void icl_whitelist_build(struct intel_engine_cs *engine) { struct i915_wa_list
*w = &engine->whitelist; @@ -1542,7 +1541,7 @@ static void dg1_whitelist_build(struct intel_engine_cs *engine) tgl_whitelist_build(engine); /* GEN:BUG:1409280441:dg1 */ - if (IS_DG1_REVID(engine->i915, DG1_REVID_A0, DG1_REVID_A0) && + if (IS_DG1_GT_STEP(engine->i915, STEP_A0, STEP_B0) && (engine->class == RENDER_CLASS || engine->class == COPY_ENGINE_CLASS)) whitelist_reg_ext(w, RING_ID(engine->mmio_base), @@ -1562,8 +1561,6 @@ void intel_engine_init_whitelist(struct intel_engine_cs *engine) tgl_whitelist_build(engine); else if (GRAPHICS_VER(i915) == 11) icl_whitelist_build(engine); - else if (IS_CANNONLAKE(i915)) - cnl_whitelist_build(engine); else if (IS_COMETLAKE(i915)) cml_whitelist_build(engine); else if (IS_COFFEELAKE(i915)) @@ -1612,8 +1609,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) { struct drm_i915_private *i915 = engine->i915; - if (IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0) || - IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0)) { + if (IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0) || + IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0)) { /* * Wa_1607138336:tgl[a0],dg1[a0] * Wa_1607063988:tgl[a0],dg1[a0] @@ -1623,7 +1620,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) GEN12_DISABLE_POSH_BUSY_FF_DOP_CG); } - if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_A0)) { + if (IS_TGL_UY_GT_STEP(i915, STEP_A0, STEP_B0)) { /* * Wa_1606679103:tgl * (see also Wa_1606682166:icl) @@ -1633,44 +1630,46 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) GEN7_DISABLE_SAMPLER_PREFETCH); } - if (IS_ALDERLAKE_S(i915) || IS_DG1(i915) || + if (IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) || IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { - /* Wa_1606931601:tgl,rkl,dg1,adl-s */ + /* Wa_1606931601:tgl,rkl,dg1,adl-s,adl-p */ wa_masked_en(wal, GEN7_ROW_CHICKEN2, GEN12_DISABLE_EARLY_READ); /* * Wa_1407928979:tgl A* * Wa_18011464164:tgl[B0+],dg1[B0+] * Wa_22010931296:tgl[B0+],dg1[B0+] - * 
Wa_14010919138:rkl,dg1,adl-s + * Wa_14010919138:rkl,dg1,adl-s,adl-p */ wa_write_or(wal, GEN7_FF_THREAD_MODE, GEN12_FF_TESSELATION_DOP_GATE_DISABLE); /* - * Wa_1606700617:tgl,dg1 - * Wa_22010271021:tgl,rkl,dg1, adl-s + * Wa_1606700617:tgl,dg1,adl-p + * Wa_22010271021:tgl,rkl,dg1,adl-s,adl-p + * Wa_14010826681:tgl,dg1,rkl,adl-p */ wa_masked_en(wal, GEN9_CS_DEBUG_MODE1, FF_DOP_CLOCK_GATE_DISABLE); } - if (IS_ALDERLAKE_S(i915) || IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0) || + if (IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) || + IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { - /* Wa_1409804808:tgl,rkl,dg1[a0],adl-s */ + /* Wa_1409804808:tgl,rkl,dg1[a0],adl-s,adl-p */ wa_masked_en(wal, GEN7_ROW_CHICKEN2, GEN12_PUSH_CONST_DEREF_HOLD_DIS); /* * Wa_1409085225:tgl - * Wa_14010229206:tgl,rkl,dg1[a0],adl-s + * Wa_14010229206:tgl,rkl,dg1[a0],adl-s,adl-p */ wa_masked_en(wal, GEN9_ROW_CHICKEN4, GEN12_DISABLE_TDL_PUSH); } - if (IS_DG1_REVID(i915, DG1_REVID_A0, DG1_REVID_A0) || + if (IS_DG1_GT_STEP(i915, STEP_A0, STEP_B0) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { /* * Wa_1607030317:tgl @@ -1688,8 +1687,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) GEN8_RC_SEMA_IDLE_MSG_DISABLE); } - if (IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { - /* Wa_1406941453:tgl,rkl,dg1 */ + if (IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915) || + IS_ALDERLAKE_S(i915) || IS_ALDERLAKE_P(i915)) { + /* Wa_1406941453:tgl,rkl,dg1,adl-s,adl-p */ wa_masked_en(wal, GEN10_SAMPLER_MODE, ENABLE_SMALLPL); @@ -1701,11 +1701,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) _3D_CHICKEN3, _3D_CHICKEN3_AA_LINE_QUALITY_FIX_ENABLE); - /* WaPipelineFlushCoherentLines:icl */ - wa_write_or(wal, - GEN8_L3SQCREG4, - GEN8_LQSC_FLUSH_COHERENT_LINES); - /* * Wa_1405543622:icl * Formerly known as WaGAPZPriorityScheme @@ -1735,19 +1730,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, 
struct i915_wa_list *wal) GEN8_L3SQCREG4, GEN11_LQSC_CLEAN_EVICT_DISABLE); - /* WaForwardProgressSoftReset:icl */ - wa_write_or(wal, - GEN10_SCRATCH_LNCF2, - PMFLUSHDONE_LNICRSDROP | - PMFLUSH_GAPL3UNBLOCK | - PMFLUSHDONE_LNEBLK); - - /* Wa_1406609255:icl (pre-prod) */ - if (IS_ICL_REVID(i915, ICL_REVID_A0, ICL_REVID_B0)) - wa_write_or(wal, - GEN7_SARCHKMD, - GEN7_DISABLE_DEMAND_PREFETCH); - /* Wa_1606682166:icl */ wa_write_or(wal, GEN7_SARCHKMD, @@ -1947,10 +1929,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) * disable bit, which we don't touch here, but it's good * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). */ - wa_add(wal, GEN7_GT_MODE, 0, - _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, - GEN6_WIZ_HASHING_16x4), - GEN6_WIZ_HASHING_16x4); + wa_masked_field_set(wal, + GEN7_GT_MODE, + GEN6_WIZ_HASHING_MASK, + GEN6_WIZ_HASHING_16x4); } if (IS_GRAPHICS_VER(i915, 6, 7)) @@ -2000,10 +1982,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) * disable bit, which we don't touch here, but it's good * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). */ - wa_add(wal, - GEN6_GT_MODE, 0, - _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4), - GEN6_WIZ_HASHING_16x4); + wa_masked_field_set(wal, + GEN6_GT_MODE, + GEN6_WIZ_HASHING_MASK, + GEN6_WIZ_HASHING_16x4); /* WaDisable_RenderCache_OperationalFlush:snb */ wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE); @@ -2024,7 +2006,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) wa_add(wal, MI_MODE, 0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH), /* XXX bit doesn't stick on Broadwater */ - IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH); + IS_I965G(i915) ? 
0 : VS_TIMER_DISPATCH, true); if (GRAPHICS_VER(i915) == 4) /* @@ -2039,7 +2021,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) */ wa_add(wal, ECOSKPD, 0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE), - 0 /* XXX bit doesn't stick on Broadwater */); + 0 /* XXX bit doesn't stick on Broadwater */, + true); } static void @@ -2048,7 +2031,7 @@ xcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) struct drm_i915_private *i915 = engine->i915; /* WaKBLVECSSemaphoreWaitPoll:kbl */ - if (IS_KBL_GT_STEP(i915, STEP_A0, STEP_E0)) { + if (IS_KBL_GT_STEP(i915, STEP_A0, STEP_F0)) { wa_write(wal, RING_SEMA_WAIT_POLL(engine->mmio_base), 1); @@ -2081,7 +2064,7 @@ void intel_engine_init_workarounds(struct intel_engine_cs *engine) void intel_engine_apply_workarounds(struct intel_engine_cs *engine) { - wa_list_apply(engine->uncore, &engine->wa_list); + wa_list_apply(engine->gt, &engine->wa_list); } struct mcr_range { @@ -2107,12 +2090,31 @@ static const struct mcr_range mcr_ranges_gen12[] = { {}, }; +static const struct mcr_range mcr_ranges_xehp[] = { + { .start = 0x4000, .end = 0x4aff }, + { .start = 0x5200, .end = 0x52ff }, + { .start = 0x5400, .end = 0x7fff }, + { .start = 0x8140, .end = 0x815f }, + { .start = 0x8c80, .end = 0x8dff }, + { .start = 0x94d0, .end = 0x955f }, + { .start = 0x9680, .end = 0x96ff }, + { .start = 0xb000, .end = 0xb3ff }, + { .start = 0xc800, .end = 0xcfff }, + { .start = 0xd800, .end = 0xd8ff }, + { .start = 0xdc00, .end = 0xffff }, + { .start = 0x17000, .end = 0x17fff }, + { .start = 0x24a00, .end = 0x24a7f }, + {}, +}; + static bool mcr_range(struct drm_i915_private *i915, u32 offset) { const struct mcr_range *mcr_ranges; int i; - if (GRAPHICS_VER(i915) >= 12) + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) + mcr_ranges = mcr_ranges_xehp; + else if (GRAPHICS_VER(i915) >= 12) mcr_ranges = mcr_ranges_gen12; else if (GRAPHICS_VER(i915) >= 8) mcr_ranges = mcr_ranges_gen8; diff --git 
a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h index c214111ea367..1e873681795d 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h +++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h @@ -15,6 +15,7 @@ struct i915_wa { u32 clr; u32 set; u32 read; + bool masked_reg; }; struct i915_wa_list { diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 32589c6625e1..2c1af030310c 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -235,6 +235,34 @@ static void mock_submit_request(struct i915_request *request) spin_unlock_irqrestore(&engine->hw_lock, flags); } +static void mock_add_to_engine(struct i915_request *rq) +{ + lockdep_assert_held(&rq->engine->sched_engine->lock); + list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests); +} + +static void mock_remove_from_engine(struct i915_request *rq) +{ + struct intel_engine_cs *engine, *locked; + + /* + * Virtual engines complicate acquiring the engine timeline lock, + * as their rq->engine pointer is not stable until under that + * engine lock. The simple ploy we use is to take the lock then + * check that the rq still belongs to the newly locked engine. + */ + + locked = READ_ONCE(rq->engine); + spin_lock_irq(&locked->sched_engine->lock); + while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { + spin_unlock(&locked->sched_engine->lock); + spin_lock(&engine->sched_engine->lock); + locked = engine; + } + list_del_init(&rq->sched.link); + spin_unlock_irq(&locked->sched_engine->lock); +} + static void mock_reset_prepare(struct intel_engine_cs *engine) { } @@ -253,10 +281,10 @@ static void mock_reset_cancel(struct intel_engine_cs *engine) del_timer_sync(&mock->hw_delay); - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&engine->sched_engine->lock, flags); /* Mark all submitted requests as skipped. 
*/ - list_for_each_entry(rq, &engine->active.requests, sched.link) + list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) i915_request_put(i915_request_mark_eio(rq)); intel_engine_signal_breadcrumbs(engine); @@ -269,7 +297,7 @@ static void mock_reset_cancel(struct intel_engine_cs *engine) } INIT_LIST_HEAD(&mock->hw_queue); - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&engine->sched_engine->lock, flags); } static void mock_reset_finish(struct intel_engine_cs *engine) @@ -283,7 +311,8 @@ static void mock_engine_release(struct intel_engine_cs *engine) GEM_BUG_ON(timer_pending(&mock->hw_delay)); - intel_breadcrumbs_free(engine->breadcrumbs); + i915_sched_engine_put(engine->sched_engine); + intel_breadcrumbs_put(engine->breadcrumbs); intel_context_unpin(engine->kernel_context); intel_context_put(engine->kernel_context); @@ -320,6 +349,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; engine->base.submit_request = mock_submit_request; + engine->base.add_active_request = mock_add_to_engine; + engine->base.remove_active_request = mock_remove_from_engine; engine->base.reset.prepare = mock_reset_prepare; engine->base.reset.rewind = mock_reset_rewind; @@ -345,14 +376,18 @@ int mock_engine_init(struct intel_engine_cs *engine) { struct intel_context *ce; - intel_engine_init_active(engine, ENGINE_MOCK); + engine->sched_engine = i915_sched_engine_create(ENGINE_MOCK); + if (!engine->sched_engine) + return -ENOMEM; + engine->sched_engine->private_data = engine; + intel_engine_init_execlists(engine); intel_engine_init__pm(engine); intel_engine_init_retire(engine); engine->breadcrumbs = intel_breadcrumbs_create(NULL); if (!engine->breadcrumbs) - return -ENOMEM; + goto err_schedule; ce = create_kernel_context(engine); if (IS_ERR(ce)) @@ -365,7 +400,9 @@ int mock_engine_init(struct intel_engine_cs *engine) return 
0; err_breadcrumbs: - intel_breadcrumbs_free(engine->breadcrumbs); + intel_breadcrumbs_put(engine->breadcrumbs); +err_schedule: + i915_sched_engine_put(engine->sched_engine); return -ENOMEM; } diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index 26685b927169..fa7b99a671dd 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine) * This test makes sure that the context is kept alive until a * subsequent idle-barrier (emitted when the engine wakeref hits 0 * with no more outstanding requests). + * + * In GuC submission mode we don't use idle barriers and we instead + * get a message from the GuC to signal that it is safe to unpin the + * context from memory. */ + if (intel_engine_uses_guc(engine)) + return 0; if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n", @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine) * on the context image remotely (intel_context_prepare_remote_request), * which inserts foreign fences into intel_context.active, does not * clobber the idle-barrier. + * + * In GuC submission mode we don't use idle barriers. 
*/ + if (intel_engine_uses_guc(engine)) + return 0; if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n", diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c index 4896e4ccad50..317eebf086c3 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c @@ -405,3 +405,25 @@ void st_engine_heartbeat_enable(struct intel_engine_cs *engine) engine->props.heartbeat_interval_ms = engine->defaults.heartbeat_interval_ms; } + +void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine) +{ + engine->props.heartbeat_interval_ms = 0; + + /* + * Park the heartbeat but without holding the PM lock as that + * makes the engines appear not-idle. Note that if/when unpark + * is called due to the PM lock being acquired later the + * heartbeat still won't be enabled because of the above = 0. + */ + if (intel_engine_pm_get_if_awake(engine)) { + intel_engine_park_heartbeat(engine); + intel_engine_pm_put(engine); + } +} + +void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine) +{ + engine->props.heartbeat_interval_ms = + engine->defaults.heartbeat_interval_ms; +} diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h index cd27113d5400..81da2cd8e406 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h @@ -9,6 +9,8 @@ struct intel_engine_cs; void st_engine_heartbeat_disable(struct intel_engine_cs *engine); +void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine); void st_engine_heartbeat_enable(struct intel_engine_cs *engine); +void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine); #endif /* SELFTEST_ENGINE_HEARTBEAT_H */ diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c index 
72cca3f0da21..75569666105d 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c @@ -173,8 +173,8 @@ static int __live_engine_timestamps(struct intel_engine_cs *engine) d_ctx = trifilter(s_ctx); d_ctx *= engine->gt->clock_frequency; - if (IS_ICELAKE(engine->i915)) - d_ring *= 12500000; /* Fixed 80ns for icl ctx timestamp? */ + if (GRAPHICS_VER(engine->i915) == 11) + d_ring *= 12500000; /* Fixed 80ns for GEN11 ctx timestamp? */ else d_ring *= engine->gt->clock_frequency; diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 1c8108d30b85..f12ffe797639 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -43,7 +43,7 @@ static int wait_for_submit(struct intel_engine_cs *engine, unsigned long timeout) { /* Ignore our own attempts to suppress excess tasklets */ - tasklet_hi_schedule(&engine->execlists.tasklet); + tasklet_hi_schedule(&engine->sched_engine->tasklet); timeout += jiffies; do { @@ -273,7 +273,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio) }; /* Alternatively preempt the spinner with ce[1] */ - engine->schedule(rq[1], &attr); + engine->sched_engine->schedule(rq[1], &attr); } /* And switch back to ce[0] for good measure */ @@ -553,13 +553,13 @@ static int live_pin_rewind(void *arg) static int engine_lock_reset_tasklet(struct intel_engine_cs *engine) { - tasklet_disable(&engine->execlists.tasklet); + tasklet_disable(&engine->sched_engine->tasklet); local_bh_disable(); if (test_and_set_bit(I915_RESET_ENGINE + engine->id, &engine->gt->reset.flags)) { local_bh_enable(); - tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&engine->sched_engine->tasklet); intel_gt_set_wedged(engine->gt); return -EBUSY; @@ -574,7 +574,7 @@ static void engine_unlock_reset_tasklet(struct intel_engine_cs *engine) &engine->gt->reset.flags); local_bh_enable(); - 
tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&engine->sched_engine->tasklet); } static int live_hold_reset(void *arg) @@ -628,7 +628,7 @@ static int live_hold_reset(void *arg) if (err) goto out; - engine->execlists.tasklet.callback(&engine->execlists.tasklet); + engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet); GEM_BUG_ON(execlists_active(&engine->execlists) != rq); i915_request_get(rq); @@ -917,7 +917,7 @@ release_queue(struct intel_engine_cs *engine, i915_request_add(rq); local_bh_disable(); - engine->schedule(rq, &attr); + engine->sched_engine->schedule(rq, &attr); local_bh_enable(); /* kick tasklet */ i915_request_put(rq); @@ -1200,7 +1200,7 @@ static int live_timeslice_rewind(void *arg) while (i915_request_is_active(rq[A2])) { /* semaphore yield! */ /* Wait for the timeslice to kick in */ del_timer(&engine->execlists.timer); - tasklet_hi_schedule(&engine->execlists.tasklet); + tasklet_hi_schedule(&engine->sched_engine->tasklet); intel_engine_flush_submission(engine); } /* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */ @@ -1342,7 +1342,7 @@ static int live_timeslice_queue(void *arg) err = PTR_ERR(rq); goto err_heartbeat; } - engine->schedule(rq, &attr); + engine->sched_engine->schedule(rq, &attr); err = wait_for_submit(engine, rq, HZ / 2); if (err) { pr_err("%s: Timed out trying to submit semaphores\n", @@ -1539,12 +1539,12 @@ static int live_busywait_preempt(void *arg) * preempt the busywaits used to synchronise between rings. 
*/ - ctx_hi = kernel_context(gt->i915); + ctx_hi = kernel_context(gt->i915, NULL); if (!ctx_hi) return -ENOMEM; ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY; - ctx_lo = kernel_context(gt->i915); + ctx_lo = kernel_context(gt->i915, NULL); if (!ctx_lo) goto err_ctx_hi; ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY; @@ -1741,12 +1741,12 @@ static int live_preempt(void *arg) if (igt_spinner_init(&spin_lo, gt)) goto err_spin_hi; - ctx_hi = kernel_context(gt->i915); + ctx_hi = kernel_context(gt->i915, NULL); if (!ctx_hi) goto err_spin_lo; ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY; - ctx_lo = kernel_context(gt->i915); + ctx_lo = kernel_context(gt->i915, NULL); if (!ctx_lo) goto err_ctx_hi; ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY; @@ -1833,11 +1833,11 @@ static int live_late_preempt(void *arg) if (igt_spinner_init(&spin_lo, gt)) goto err_spin_hi; - ctx_hi = kernel_context(gt->i915); + ctx_hi = kernel_context(gt->i915, NULL); if (!ctx_hi) goto err_spin_lo; - ctx_lo = kernel_context(gt->i915); + ctx_lo = kernel_context(gt->i915, NULL); if (!ctx_lo) goto err_ctx_hi; @@ -1884,7 +1884,7 @@ static int live_late_preempt(void *arg) } attr.priority = I915_PRIORITY_MAX; - engine->schedule(rq, &attr); + engine->sched_engine->schedule(rq, &attr); if (!igt_wait_for_spinner(&spin_hi, rq)) { pr_err("High priority context failed to preempt the low priority context\n"); @@ -1927,7 +1927,7 @@ struct preempt_client { static int preempt_client_init(struct intel_gt *gt, struct preempt_client *c) { - c->ctx = kernel_context(gt->i915); + c->ctx = kernel_context(gt->i915, NULL); if (!c->ctx) return -ENOMEM; @@ -2497,7 +2497,7 @@ static int live_suppress_self_preempt(void *arg) i915_request_add(rq_b); GEM_BUG_ON(i915_request_completed(rq_a)); - engine->schedule(rq_a, &attr); + engine->sched_engine->schedule(rq_a, &attr); igt_spinner_end(&a.spin); if (!igt_wait_for_spinner(&b.spin, rq_b)) { @@ -2629,7 +2629,7 @@ static int live_chain_preempt(void 
*arg) i915_request_get(rq); i915_request_add(rq); - engine->schedule(rq, &attr); + engine->sched_engine->schedule(rq, &attr); igt_spinner_end(&hi.spin); if (i915_request_wait(rq, 0, HZ / 5) < 0) { @@ -2810,7 +2810,7 @@ static int __live_preempt_ring(struct intel_engine_cs *engine, goto err_ce; } - tmp->ring = __intel_context_ring_size(ring_sz); + tmp->ring_size = ring_sz; err = intel_context_pin(tmp); if (err) { @@ -2988,7 +2988,7 @@ static int live_preempt_gang(void *arg) break; /* Submit each spinner at increasing priority */ - engine->schedule(rq, &attr); + engine->sched_engine->schedule(rq, &attr); } while (prio <= I915_PRIORITY_MAX && !__igt_timeout(end_time, NULL)); pr_debug("%s: Preempt chain of %d requests\n", @@ -3236,7 +3236,7 @@ static int preempt_user(struct intel_engine_cs *engine, i915_request_get(rq); i915_request_add(rq); - engine->schedule(rq, &attr); + engine->sched_engine->schedule(rq, &attr); if (i915_request_wait(rq, 0, HZ / 2) < 0) err = -ETIME; @@ -3384,12 +3384,12 @@ static int live_preempt_timeout(void *arg) if (igt_spinner_init(&spin_lo, gt)) return -ENOMEM; - ctx_hi = kernel_context(gt->i915); + ctx_hi = kernel_context(gt->i915, NULL); if (!ctx_hi) goto err_spin_lo; ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY; - ctx_lo = kernel_context(gt->i915); + ctx_lo = kernel_context(gt->i915, NULL); if (!ctx_lo) goto err_ctx_hi; ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY; @@ -3561,12 +3561,16 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags) #define BATCH BIT(0) { struct task_struct *tsk[I915_NUM_ENGINES] = {}; - struct preempt_smoke arg[I915_NUM_ENGINES]; + struct preempt_smoke *arg; struct intel_engine_cs *engine; enum intel_engine_id id; unsigned long count; int err = 0; + arg = kmalloc_array(I915_NUM_ENGINES, sizeof(*arg), GFP_KERNEL); + if (!arg) + return -ENOMEM; + for_each_engine(engine, smoke->gt, id) { arg[id] = *smoke; arg[id].engine = engine; @@ -3574,7 +3578,7 @@ static int 
smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags) arg[id].batch = NULL; arg[id].count = 0; - tsk[id] = kthread_run(smoke_crescendo_thread, &arg, + tsk[id] = kthread_run(smoke_crescendo_thread, arg, "igt/smoke:%d", id); if (IS_ERR(tsk[id])) { err = PTR_ERR(tsk[id]); @@ -3603,6 +3607,8 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags) pr_info("Submitted %lu crescendo:%x requests across %d engines and %d contexts\n", count, flags, smoke->gt->info.num_engines, smoke->ncontext); + + kfree(arg); return 0; } @@ -3676,7 +3682,7 @@ static int live_preempt_smoke(void *arg) } for (n = 0; n < smoke.ncontext; n++) { - smoke.contexts[n] = kernel_context(smoke.gt->i915); + smoke.contexts[n] = kernel_context(smoke.gt->i915, NULL); if (!smoke.contexts[n]) goto err_ctx; } @@ -3727,7 +3733,7 @@ static int nop_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve)); for (n = 0; n < nctx; n++) { - ve[n] = intel_execlists_create_virtual(siblings, nsibling); + ve[n] = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve[n])) { err = PTR_ERR(ve[n]); nctx = n; @@ -3923,7 +3929,7 @@ static int mask_virtual_engine(struct intel_gt *gt, * restrict it to our desired engine within the virtual engine. 
*/ - ve = intel_execlists_create_virtual(siblings, nsibling); + ve = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve)) { err = PTR_ERR(ve); goto out_close; @@ -4054,7 +4060,7 @@ static int slicein_virtual_engine(struct intel_gt *gt, i915_request_add(rq); } - ce = intel_execlists_create_virtual(siblings, nsibling); + ce = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ce)) { err = PTR_ERR(ce); goto out; @@ -4106,7 +4112,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt, /* XXX We do not handle oversubscription and fairness with normal rq */ for (n = 0; n < nsibling; n++) { - ce = intel_execlists_create_virtual(siblings, nsibling); + ce = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ce)) { err = PTR_ERR(ce); goto out; @@ -4208,7 +4214,7 @@ static int preserved_virtual_engine(struct intel_gt *gt, if (err) goto out_scratch; - ve = intel_execlists_create_virtual(siblings, nsibling); + ve = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve)) { err = PTR_ERR(ve); goto out_scratch; @@ -4328,234 +4334,6 @@ static int live_virtual_preserved(void *arg) return 0; } -static int bond_virtual_engine(struct intel_gt *gt, - unsigned int class, - struct intel_engine_cs **siblings, - unsigned int nsibling, - unsigned int flags) -#define BOND_SCHEDULE BIT(0) -{ - struct intel_engine_cs *master; - struct i915_request *rq[16]; - enum intel_engine_id id; - struct igt_spinner spin; - unsigned long n; - int err; - - /* - * A set of bonded requests is intended to be run concurrently - * across a number of engines. We use one request per-engine - * and a magic fence to schedule each of the bonded requests - * at the same time. A consequence of our current scheduler is that - * we only move requests to the HW ready queue when the request - * becomes ready, that is when all of its prerequisite fences have - * been signaled. 
As one of those fences is the master submit fence, - * there is a delay on all secondary fences as the HW may be - * currently busy. Equally, as all the requests are independent, - * they may have other fences that delay individual request - * submission to HW. Ergo, we do not guarantee that all requests are - * immediately submitted to HW at the same time, just that if the - * rules are abided by, they are ready at the same time as the - * first is submitted. Userspace can embed semaphores in its batch - * to ensure parallel execution of its phases as it requires. - * Though naturally it gets requested that perhaps the scheduler should - * take care of parallel execution, even across preemption events on - * different HW. (The proper answer is of course "lalalala".) - * - * With the submit-fence, we have identified three possible phases - * of synchronisation depending on the master fence: queued (not - * ready), executing, and signaled. The first two are quite simple - * and checked below. However, the signaled master fence handling is - * contentious. Currently we do not distinguish between a signaled - * fence and an expired fence, as once signaled it does not convey - * any information about the previous execution. It may even be freed - * and hence checking later it may not exist at all. Ergo we currently - * do not apply the bonding constraint for an already signaled fence, - * as our expectation is that it should not constrain the secondaries - * and is outside of the scope of the bonded request API (i.e. all - * userspace requests are meant to be running in parallel). As - * it imposes no constraint, and is effectively a no-op, we do not - * check below as normal execution flows are checked extensively above. - * - * XXX Is the degenerate handling of signaled submit fences the - * expected behaviour for userpace? 
- */ - - GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1); - - if (igt_spinner_init(&spin, gt)) - return -ENOMEM; - - err = 0; - rq[0] = ERR_PTR(-ENOMEM); - for_each_engine(master, gt, id) { - struct i915_sw_fence fence = {}; - struct intel_context *ce; - - if (master->class == class) - continue; - - ce = intel_context_create(master); - if (IS_ERR(ce)) { - err = PTR_ERR(ce); - goto out; - } - - memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq)); - - rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP); - intel_context_put(ce); - if (IS_ERR(rq[0])) { - err = PTR_ERR(rq[0]); - goto out; - } - i915_request_get(rq[0]); - - if (flags & BOND_SCHEDULE) { - onstack_fence_init(&fence); - err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit, - &fence, - GFP_KERNEL); - } - - i915_request_add(rq[0]); - if (err < 0) - goto out; - - if (!(flags & BOND_SCHEDULE) && - !igt_wait_for_spinner(&spin, rq[0])) { - err = -EIO; - goto out; - } - - for (n = 0; n < nsibling; n++) { - struct intel_context *ve; - - ve = intel_execlists_create_virtual(siblings, nsibling); - if (IS_ERR(ve)) { - err = PTR_ERR(ve); - onstack_fence_fini(&fence); - goto out; - } - - err = intel_virtual_engine_attach_bond(ve->engine, - master, - siblings[n]); - if (err) { - intel_context_put(ve); - onstack_fence_fini(&fence); - goto out; - } - - err = intel_context_pin(ve); - intel_context_put(ve); - if (err) { - onstack_fence_fini(&fence); - goto out; - } - - rq[n + 1] = i915_request_create(ve); - intel_context_unpin(ve); - if (IS_ERR(rq[n + 1])) { - err = PTR_ERR(rq[n + 1]); - onstack_fence_fini(&fence); - goto out; - } - i915_request_get(rq[n + 1]); - - err = i915_request_await_execution(rq[n + 1], - &rq[0]->fence, - ve->engine->bond_execute); - i915_request_add(rq[n + 1]); - if (err < 0) { - onstack_fence_fini(&fence); - goto out; - } - } - onstack_fence_fini(&fence); - intel_engine_flush_submission(master); - igt_spinner_end(&spin); - - if (i915_request_wait(rq[0], 0, HZ / 10) < 0) { - pr_err("Master request 
did not execute (on %s)!\n", - rq[0]->engine->name); - err = -EIO; - goto out; - } - - for (n = 0; n < nsibling; n++) { - if (i915_request_wait(rq[n + 1], 0, - MAX_SCHEDULE_TIMEOUT) < 0) { - err = -EIO; - goto out; - } - - if (rq[n + 1]->engine != siblings[n]) { - pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n", - siblings[n]->name, - rq[n + 1]->engine->name, - rq[0]->engine->name); - err = -EINVAL; - goto out; - } - } - - for (n = 0; !IS_ERR(rq[n]); n++) - i915_request_put(rq[n]); - rq[0] = ERR_PTR(-ENOMEM); - } - -out: - for (n = 0; !IS_ERR(rq[n]); n++) - i915_request_put(rq[n]); - if (igt_flush_test(gt->i915)) - err = -EIO; - - igt_spinner_fini(&spin); - return err; -} - -static int live_virtual_bond(void *arg) -{ - static const struct phase { - const char *name; - unsigned int flags; - } phases[] = { - { "", 0 }, - { "schedule", BOND_SCHEDULE }, - { }, - }; - struct intel_gt *gt = arg; - struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1]; - unsigned int class; - int err; - - if (intel_uc_uses_guc_submission(&gt->uc)) - return 0; - - for (class = 0; class <= MAX_ENGINE_CLASS; class++) { - const struct phase *p; - int nsibling; - - nsibling = select_siblings(gt, class, siblings); - if (nsibling < 2) - continue; - - for (p = phases; p->name; p++) { - err = bond_virtual_engine(gt, - class, siblings, nsibling, - p->flags); - if (err) { - pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n", - __func__, p->name, class, nsibling, err); - return err; - } - } - } - - return 0; -} - static int reset_virtual_engine(struct intel_gt *gt, struct intel_engine_cs **siblings, unsigned int nsibling) @@ -4576,7 +4354,7 @@ static int reset_virtual_engine(struct intel_gt *gt, if (igt_spinner_init(&spin, gt)) return -ENOMEM; - ve = intel_execlists_create_virtual(siblings, nsibling); + ve = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve)) { err = PTR_ERR(ve); goto out_spin; @@ -4606,13 +4384,13 @@ static int
reset_virtual_engine(struct intel_gt *gt, if (err) goto out_heartbeat; - engine->execlists.tasklet.callback(&engine->execlists.tasklet); + engine->sched_engine->tasklet.callback(&engine->sched_engine->tasklet); GEM_BUG_ON(execlists_active(&engine->execlists) != rq); /* Fake a preemption event; failed of course */ - spin_lock_irq(&engine->active.lock); + spin_lock_irq(&engine->sched_engine->lock); __unwind_incomplete_requests(engine); - spin_unlock_irq(&engine->active.lock); + spin_unlock_irq(&engine->sched_engine->lock); GEM_BUG_ON(rq->engine != engine); /* Reset the engine while keeping our active request on hold */ @@ -4721,7 +4499,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915) SUBTEST(live_virtual_mask), SUBTEST(live_virtual_preserved), SUBTEST(live_virtual_slice), - SUBTEST(live_virtual_bond), SUBTEST(live_virtual_reset), }; diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 853246fad05f..2c1ed32ca5ac 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -17,6 +17,8 @@ #include "selftests/igt_flush_test.h" #include "selftests/igt_reset.h" #include "selftests/igt_atomic.h" +#include "selftests/igt_spinner.h" +#include "selftests/intel_scheduler_helpers.h" #include "selftests/mock_drm.h" @@ -42,7 +44,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) memset(h, 0, sizeof(*h)); h->gt = gt; - h->ctx = kernel_context(gt->i915); + h->ctx = kernel_context(gt->i915, NULL); if (IS_ERR(h->ctx)) return PTR_ERR(h->ctx); @@ -378,6 +380,7 @@ static int igt_reset_nop(void *arg) ce = intel_context_create(engine); if (IS_ERR(ce)) { err = PTR_ERR(ce); + pr_err("[%s] Create context failed: %d!\n", engine->name, err); break; } @@ -387,6 +390,8 @@ static int igt_reset_nop(void *arg) rq = intel_context_create_request(ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); + pr_err("[%s] Create request failed: %d!\n", + engine->name, err); break; 
} @@ -401,24 +406,31 @@ static int igt_reset_nop(void *arg) igt_global_reset_unlock(gt); if (intel_gt_is_wedged(gt)) { + pr_err("[%s] GT is wedged!\n", engine->name); err = -EIO; break; } if (i915_reset_count(global) != reset_count + ++count) { - pr_err("Full GPU reset not recorded!\n"); + pr_err("[%s] Reset not recorded: %d vs %d + %d!\n", + engine->name, i915_reset_count(global), reset_count, count); err = -EINVAL; break; } err = igt_flush_test(gt->i915); - if (err) + if (err) { + pr_err("[%s] Flush failed: %d!\n", engine->name, err); break; + } } while (time_before(jiffies, end_time)); pr_info("%s: %d resets\n", __func__, count); - if (igt_flush_test(gt->i915)) + if (igt_flush_test(gt->i915)) { + pr_err("Post flush failed: %d!\n", err); err = -EIO; + } + return err; } @@ -440,9 +452,19 @@ static int igt_reset_nop_engine(void *arg) IGT_TIMEOUT(end_time); int err; + if (intel_engine_uses_guc(engine)) { + /* Engine level resets are triggered by GuC when a hang + * is detected. They can't be triggered by the KMD any + * more. 
Thus a nop batch cannot be used as a reset test + */ + continue; + } + ce = intel_context_create(engine); - if (IS_ERR(ce)) + if (IS_ERR(ce)) { + pr_err("[%s] Create context failed: %pe!\n", engine->name, ce); return PTR_ERR(ce); + } reset_count = i915_reset_count(global); reset_engine_count = i915_reset_engine_count(global, engine); @@ -549,9 +571,15 @@ static int igt_reset_fail_engine(void *arg) IGT_TIMEOUT(end_time); int err; + /* Can't manually break the reset if i915 doesn't perform it */ + if (intel_engine_uses_guc(engine)) + continue; + ce = intel_context_create(engine); - if (IS_ERR(ce)) + if (IS_ERR(ce)) { + pr_err("[%s] Create context failed: %pe!\n", engine->name, ce); return PTR_ERR(ce); + } st_engine_heartbeat_disable(engine); set_bit(I915_RESET_ENGINE + id, &gt->reset.flags); @@ -686,8 +714,12 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active) for_each_engine(engine, gt, id) { unsigned int reset_count, reset_engine_count; unsigned long count; + bool using_guc = intel_engine_uses_guc(engine); IGT_TIMEOUT(end_time); + if (using_guc && !active) + continue; + if (active && !intel_engine_can_store_dword(engine)) continue; @@ -705,13 +737,24 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active) set_bit(I915_RESET_ENGINE + id, &gt->reset.flags); count = 0; do { - if (active) { - struct i915_request *rq; + struct i915_request *rq = NULL; + struct intel_selftest_saved_policy saved; + int err2; + + err = intel_selftest_modify_policy(engine, &saved, + SELFTEST_SCHEDULER_MODIFY_FAST_RESET); + if (err) { + pr_err("[%s] Modify policy failed: %d!\n", engine->name, err); + break; + } + if (active) { rq = hang_create_request(&h, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); - break; + pr_err("[%s] Create hang request failed: %d!\n", + engine->name, err); + goto restore; } i915_request_get(rq); @@ -727,34 +770,59 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active) i915_request_put(rq); err = -EIO; - break; + goto restore; } + } -
i915_request_put(rq); + if (!using_guc) { + err = intel_engine_reset(engine, NULL); + if (err) { + pr_err("intel_engine_reset(%s) failed, err:%d\n", + engine->name, err); + goto skip; + } } - err = intel_engine_reset(engine, NULL); - if (err) { - pr_err("intel_engine_reset(%s) failed, err:%d\n", - engine->name, err); - break; + if (rq) { + /* Ensure the reset happens and kills the engine */ + err = intel_selftest_wait_for_rq(rq); + if (err) + pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n", + engine->name, rq->fence.context, + rq->fence.seqno, rq->context->guc_id, err); } +skip: + if (rq) + i915_request_put(rq); + if (i915_reset_count(global) != reset_count) { pr_err("Full GPU reset recorded! (engine reset expected)\n"); err = -EINVAL; - break; + goto restore; } - if (i915_reset_engine_count(global, engine) != - ++reset_engine_count) { - pr_err("%s engine reset not recorded!\n", - engine->name); - err = -EINVAL; - break; + /* GuC based resets are not logged per engine */ + if (!using_guc) { + if (i915_reset_engine_count(global, engine) != + ++reset_engine_count) { + pr_err("%s engine reset not recorded!\n", + engine->name); + err = -EINVAL; + goto restore; + } } count++; + +restore: + err2 = intel_selftest_restore_policy(engine, &saved); + if (err2) + pr_err("[%s] Restore policy failed: %d!\n", engine->name, err); + if (err == 0) + err = err2; + if (err) + break; } while (time_before(jiffies, end_time)); clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags); st_engine_heartbeat_enable(engine); @@ -765,12 +833,16 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active) break; err = igt_flush_test(gt->i915); - if (err) + if (err) { + pr_err("[%s] Flush failed: %d!\n", engine->name, err); break; + } } - if (intel_gt_is_wedged(gt)) + if (intel_gt_is_wedged(gt)) { + pr_err("GT is wedged!\n"); err = -EIO; + } if (active) hang_fini(&h); @@ -807,7 +879,7 @@ static int active_request_put(struct i915_request *rq) if (!rq) return 0; - if
(i915_request_wait(rq, 0, 5 * HZ) < 0) { + if (i915_request_wait(rq, 0, 10 * HZ) < 0) { GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld\n", rq->engine->name, rq->fence.context, @@ -837,6 +909,7 @@ static int active_engine(void *data) ce[count] = intel_context_create(engine); if (IS_ERR(ce[count])) { err = PTR_ERR(ce[count]); + pr_err("[%s] Create context #%ld failed: %d!\n", engine->name, count, err); while (--count) intel_context_put(ce[count]); return err; @@ -852,23 +925,26 @@ static int active_engine(void *data) new = intel_context_create_request(ce[idx]); if (IS_ERR(new)) { err = PTR_ERR(new); + pr_err("[%s] Create request #%d failed: %d!\n", engine->name, idx, err); break; } rq[idx] = i915_request_get(new); i915_request_add(new); - if (engine->schedule && arg->flags & TEST_PRIORITY) { + if (engine->sched_engine->schedule && arg->flags & TEST_PRIORITY) { struct i915_sched_attr attr = { .priority = i915_prandom_u32_max_state(512, &prng), }; - engine->schedule(rq[idx], &attr); + engine->sched_engine->schedule(rq[idx], &attr); } err = active_request_put(old); - if (err) + if (err) { + pr_err("[%s] Request put failed: %d!\n", engine->name, err); break; + } cond_resched(); } @@ -876,6 +952,9 @@ static int active_engine(void *data) for (count = 0; count < ARRAY_SIZE(rq); count++) { int err__ = active_request_put(rq[count]); + if (err) + pr_err("[%s] Request put #%ld failed: %d!\n", engine->name, count, err); + /* Keep the first error */ if (!err) err = err__; @@ -916,10 +995,13 @@ static int __igt_reset_engines(struct intel_gt *gt, struct active_engine threads[I915_NUM_ENGINES] = {}; unsigned long device = i915_reset_count(global); unsigned long count = 0, reported; + bool using_guc = intel_engine_uses_guc(engine); IGT_TIMEOUT(end_time); - if (flags & TEST_ACTIVE && - !intel_engine_can_store_dword(engine)) + if (flags & TEST_ACTIVE) { + if (!intel_engine_can_store_dword(engine)) + continue; + } else if (using_guc) continue; if 
(!wait_for_idle(engine)) { @@ -949,6 +1031,7 @@ static int __igt_reset_engines(struct intel_gt *gt, "igt/%s", other->name); if (IS_ERR(tsk)) { err = PTR_ERR(tsk); + pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err); goto unwind; } @@ -958,16 +1041,27 @@ static int __igt_reset_engines(struct intel_gt *gt, yield(); /* start all threads before we begin */ - st_engine_heartbeat_disable(engine); + st_engine_heartbeat_disable_no_pm(engine); set_bit(I915_RESET_ENGINE + id, &gt->reset.flags); do { struct i915_request *rq = NULL; + struct intel_selftest_saved_policy saved; + int err2; + + err = intel_selftest_modify_policy(engine, &saved, + SELFTEST_SCHEDULER_MODIFY_FAST_RESET); + if (err) { + pr_err("[%s] Modify policy failed: %d!\n", engine->name, err); + break; + } if (flags & TEST_ACTIVE) { rq = hang_create_request(&h, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); - break; + pr_err("[%s] Create hang request failed: %d!\n", + engine->name, err); + goto restore; } i915_request_get(rq); @@ -983,32 +1077,44 @@ static int __igt_reset_engines(struct intel_gt *gt, i915_request_put(rq); err = -EIO; - break; + goto restore; } + } else { + intel_engine_pm_get(engine); } - err = intel_engine_reset(engine, NULL); - if (err) { - pr_err("i915_reset_engine(%s:%s): failed, err=%d\n", - engine->name, test_name, err); - break; + if (!using_guc) { + err = intel_engine_reset(engine, NULL); + if (err) { + pr_err("i915_reset_engine(%s:%s): failed, err=%d\n", + engine->name, test_name, err); + goto restore; + } + } + + if (rq) { + /* Ensure the reset happens and kills the engine */ + err = intel_selftest_wait_for_rq(rq); + if (err) + pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n", + engine->name, rq->fence.context, + rq->fence.seqno, rq->context->guc_id, err); } count++; if (rq) { if (rq->fence.error != -EIO) { - pr_err("i915_reset_engine(%s:%s):" - " failed to reset request %llx:%lld\n", + pr_err("i915_reset_engine(%s:%s): failed to reset request %lld:%lld
[0x%04X]\n", engine->name, test_name, rq->fence.context, - rq->fence.seqno); + rq->fence.seqno, rq->context->guc_id); i915_request_put(rq); GEM_TRACE_DUMP(); intel_gt_set_wedged(gt); err = -EIO; - break; + goto restore; } if (i915_request_wait(rq, 0, HZ / 5) < 0) { @@ -1027,12 +1133,15 @@ static int __igt_reset_engines(struct intel_gt *gt, GEM_TRACE_DUMP(); intel_gt_set_wedged(gt); err = -EIO; - break; + goto restore; } i915_request_put(rq); } + if (!(flags & TEST_ACTIVE)) + intel_engine_pm_put(engine); + if (!(flags & TEST_SELF) && !wait_for_idle(engine)) { struct drm_printer p = drm_info_printer(gt->i915->drm.dev); @@ -1044,22 +1153,34 @@ static int __igt_reset_engines(struct intel_gt *gt, "%s\n", engine->name); err = -EIO; - break; + goto restore; } + +restore: + err2 = intel_selftest_restore_policy(engine, &saved); + if (err2) + pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2); + if (err == 0) + err = err2; + if (err) + break; } while (time_before(jiffies, end_time)); clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags); - st_engine_heartbeat_enable(engine); + st_engine_heartbeat_enable_no_pm(engine); pr_info("i915_reset_engine(%s:%s): %lu resets\n", engine->name, test_name, count); - reported = i915_reset_engine_count(global, engine); - reported -= threads[engine->id].resets; - if (reported != count) { - pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n", - engine->name, test_name, count, reported); - if (!err) - err = -EINVAL; + /* GuC based resets are not logged per engine */ + if (!using_guc) { + reported = i915_reset_engine_count(global, engine); + reported -= threads[engine->id].resets; + if (reported != count) { + pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n", + engine->name, test_name, count, reported); + if (!err) + err = -EINVAL; + } } unwind: @@ -1078,15 +1199,18 @@ unwind: } put_task_struct(threads[tmp].task); - if (other->uabi_class != engine->uabi_class && - threads[tmp].resets != -
i915_reset_engine_count(global, other)) { - pr_err("Innocent engine %s was reset (count=%ld)\n", - other->name, - i915_reset_engine_count(global, other) - - threads[tmp].resets); - if (!err) - err = -EINVAL; + /* GuC based resets are not logged per engine */ + if (!using_guc) { + if (other->uabi_class != engine->uabi_class && + threads[tmp].resets != + i915_reset_engine_count(global, other)) { + pr_err("Innocent engine %s was reset (count=%ld)\n", + other->name, + i915_reset_engine_count(global, other) - + threads[tmp].resets); + if (!err) + err = -EINVAL; + } } } @@ -1101,8 +1225,10 @@ unwind: break; err = igt_flush_test(gt->i915); - if (err) + if (err) { + pr_err("[%s] Flush failed: %d!\n", engine->name, err); break; + } } if (intel_gt_is_wedged(gt)) @@ -1180,12 +1306,15 @@ static int igt_reset_wait(void *arg) igt_global_reset_lock(gt); err = hang_init(&h, gt); - if (err) + if (err) { + pr_err("[%s] Hang init failed: %d!\n", engine->name, err); goto unlock; + } rq = hang_create_request(&h, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); + pr_err("[%s] Create hang request failed: %d!\n", engine->name, err); goto fini; } @@ -1310,12 +1439,15 @@ static int __igt_reset_evict_vma(struct intel_gt *gt, /* Check that we can recover an unbind stuck on a hanging request */ err = hang_init(&h, gt); - if (err) + if (err) { + pr_err("[%s] Hang init failed: %d!\n", engine->name, err); return err; + } obj = i915_gem_object_create_internal(gt->i915, SZ_1M); if (IS_ERR(obj)) { err = PTR_ERR(obj); + pr_err("[%s] Create object failed: %d!\n", engine->name, err); goto fini; } @@ -1330,12 +1462,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt, arg.vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(arg.vma)) { err = PTR_ERR(arg.vma); + pr_err("[%s] VMA instance failed: %d!\n", engine->name, err); goto out_obj; } rq = hang_create_request(&h, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); + pr_err("[%s] Create hang request failed: %d!\n", engine->name, err); goto out_obj; } @@ 
-1347,6 +1481,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt, err = i915_vma_pin(arg.vma, 0, 0, pin_flags); if (err) { i915_request_add(rq); + pr_err("[%s] VMA pin failed: %d!\n", engine->name, err); goto out_obj; } @@ -1363,8 +1498,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt, i915_vma_lock(arg.vma); err = i915_request_await_object(rq, arg.vma->obj, flags & EXEC_OBJECT_WRITE); - if (err == 0) + if (err == 0) { err = i915_vma_move_to_active(arg.vma, rq, flags); + if (err) + pr_err("[%s] Move to active failed: %d!\n", engine->name, err); + } else { + pr_err("[%s] Request await failed: %d!\n", engine->name, err); + } + i915_vma_unlock(arg.vma); if (flags & EXEC_OBJECT_NEEDS_FENCE) @@ -1392,6 +1533,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt, tsk = kthread_run(fn, &arg, "igt/evict_vma"); if (IS_ERR(tsk)) { err = PTR_ERR(tsk); + pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err); tsk = NULL; goto out_reset; } @@ -1508,17 +1650,29 @@ static int igt_reset_queue(void *arg) goto unlock; for_each_engine(engine, gt, id) { + struct intel_selftest_saved_policy saved; struct i915_request *prev; IGT_TIMEOUT(end_time); unsigned int count; + bool using_guc = intel_engine_uses_guc(engine); if (!intel_engine_can_store_dword(engine)) continue; + if (using_guc) { + err = intel_selftest_modify_policy(engine, &saved, + SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK); + if (err) { + pr_err("[%s] Modify policy failed: %d!\n", engine->name, err); + goto fini; + } + } + prev = hang_create_request(&h, engine); if (IS_ERR(prev)) { err = PTR_ERR(prev); - goto fini; + pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err); + goto restore; } i915_request_get(prev); @@ -1532,7 +1686,8 @@ static int igt_reset_queue(void *arg) rq = hang_create_request(&h, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto fini; + pr_err("[%s] Create hang request failed: %d!\n", engine->name, err); + goto restore; } i915_request_get(rq); @@ -1557,7 
+1712,7 @@ static int igt_reset_queue(void *arg) GEM_TRACE_DUMP(); intel_gt_set_wedged(gt); - goto fini; + goto restore; } if (!wait_until_running(&h, prev)) { @@ -1575,7 +1730,7 @@ static int igt_reset_queue(void *arg) intel_gt_set_wedged(gt); err = -EIO; - goto fini; + goto restore; } reset_count = fake_hangcheck(gt, BIT(id)); @@ -1586,7 +1741,7 @@ static int igt_reset_queue(void *arg) i915_request_put(rq); i915_request_put(prev); err = -EINVAL; - goto fini; + goto restore; } if (rq->fence.error) { @@ -1595,7 +1750,7 @@ static int igt_reset_queue(void *arg) i915_request_put(rq); i915_request_put(prev); err = -EINVAL; - goto fini; + goto restore; } if (i915_reset_count(global) == reset_count) { @@ -1603,7 +1758,7 @@ static int igt_reset_queue(void *arg) i915_request_put(rq); i915_request_put(prev); err = -EINVAL; - goto fini; + goto restore; } i915_request_put(prev); @@ -1618,9 +1773,24 @@ static int igt_reset_queue(void *arg) i915_request_put(prev); - err = igt_flush_test(gt->i915); +restore: + if (using_guc) { + int err2 = intel_selftest_restore_policy(engine, &saved); + + if (err2) + pr_err("%s:%d> [%s] Restore policy failed: %d!\n", + __func__, __LINE__, engine->name, err2); + if (err == 0) + err = err2; + } if (err) + goto fini; + + err = igt_flush_test(gt->i915); + if (err) { + pr_err("[%s] Flush failed: %d!\n", engine->name, err); break; + } } fini: @@ -1653,12 +1823,15 @@ static int igt_handle_error(void *arg) return 0; err = hang_init(&h, gt); - if (err) + if (err) { + pr_err("[%s] Hang init failed: %d!\n", engine->name, err); return err; + } rq = hang_create_request(&h, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); + pr_err("[%s] Create hang request failed: %d!\n", engine->name, err); goto err_fini; } @@ -1702,7 +1875,7 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine, const struct igt_atomic_section *p, const char *mode) { - struct tasklet_struct * const t = &engine->execlists.tasklet; + struct tasklet_struct * const t = 
&engine->sched_engine->tasklet; int err; GEM_TRACE("i915_reset_engine(%s:%s) under %s\n", @@ -1743,12 +1916,15 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine, return err; err = hang_init(&h, engine->gt); - if (err) + if (err) { + pr_err("[%s] Hang init failed: %d!\n", engine->name, err); return err; + } rq = hang_create_request(&h, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); + pr_err("[%s] Create hang request failed: %d!\n", engine->name, err); goto out; } diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 3119016d9910..b0977a3b699b 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -49,7 +49,7 @@ static int wait_for_submit(struct intel_engine_cs *engine, unsigned long timeout) { /* Ignore our own attempts to suppress excess tasklets */ - tasklet_hi_schedule(&engine->execlists.tasklet); + tasklet_hi_schedule(&engine->sched_engine->tasklet); timeout += jiffies; do { @@ -1613,12 +1613,12 @@ static void garbage_reset(struct intel_engine_cs *engine, local_bh_disable(); if (!test_and_set_bit(bit, lock)) { - tasklet_disable(&engine->execlists.tasklet); + tasklet_disable(&engine->sched_engine->tasklet); if (!rq->fence.error) __intel_engine_reset_bh(engine, NULL); - tasklet_enable(&engine->execlists.tasklet); + tasklet_enable(&engine->sched_engine->tasklet); clear_and_wake_up_bit(bit, lock); } local_bh_enable(); diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c new file mode 100644 index 000000000000..12ef2837c89b --- /dev/null +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c @@ -0,0 +1,669 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +#include <linux/sort.h> + +#include "selftests/i915_random.h" + +static const unsigned int sizes[] = { + SZ_4K, + SZ_64K, + SZ_2M, + CHUNK_SZ - SZ_4K, + CHUNK_SZ, + CHUNK_SZ + SZ_4K, + SZ_64M, +}; + +static struct 
drm_i915_gem_object * +create_lmem_or_internal(struct drm_i915_private *i915, size_t size) +{ + struct drm_i915_gem_object *obj; + + obj = i915_gem_object_create_lmem(i915, size, 0); + if (!IS_ERR(obj)) + return obj; + + return i915_gem_object_create_internal(i915, size); +} + +static int copy(struct intel_migrate *migrate, + int (*fn)(struct intel_migrate *migrate, + struct i915_gem_ww_ctx *ww, + struct drm_i915_gem_object *src, + struct drm_i915_gem_object *dst, + struct i915_request **out), + u32 sz, struct rnd_state *prng) +{ + struct drm_i915_private *i915 = migrate->context->engine->i915; + struct drm_i915_gem_object *src, *dst; + struct i915_request *rq; + struct i915_gem_ww_ctx ww; + u32 *vaddr; + int err = 0; + int i; + + src = create_lmem_or_internal(i915, sz); + if (IS_ERR(src)) + return 0; + + dst = i915_gem_object_create_internal(i915, sz); + if (IS_ERR(dst)) + goto err_free_src; + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(src, &ww); + if (err) + continue; + + err = i915_gem_object_lock(dst, &ww); + if (err) + continue; + + vaddr = i915_gem_object_pin_map(src, I915_MAP_WC); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + continue; + } + + for (i = 0; i < sz / sizeof(u32); i++) + vaddr[i] = i; + i915_gem_object_flush_map(src); + + vaddr = i915_gem_object_pin_map(dst, I915_MAP_WC); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + goto unpin_src; + } + + for (i = 0; i < sz / sizeof(u32); i++) + vaddr[i] = ~i; + i915_gem_object_flush_map(dst); + + err = fn(migrate, &ww, src, dst, &rq); + if (!err) + continue; + + if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS) + pr_err("%ps failed, size: %u\n", fn, sz); + if (rq) { + i915_request_wait(rq, 0, HZ); + i915_request_put(rq); + } + i915_gem_object_unpin_map(dst); +unpin_src: + i915_gem_object_unpin_map(src); + } + if (err) + goto err_out; + + if (rq) { + if (i915_request_wait(rq, 0, HZ) < 0) { + pr_err("%ps timed out, size: %u\n", fn, sz); + err = -ETIME; + } + 
i915_request_put(rq); + } + + for (i = 0; !err && i < sz / PAGE_SIZE; i++) { + int x = i * 1024 + i915_prandom_u32_max_state(1024, prng); + + if (vaddr[x] != x) { + pr_err("%ps failed, size: %u, offset: %zu\n", + fn, sz, x * sizeof(u32)); + igt_hexdump(vaddr + i * 1024, 4096); + err = -EINVAL; + } + } + + i915_gem_object_unpin_map(dst); + i915_gem_object_unpin_map(src); + +err_out: + i915_gem_object_put(dst); +err_free_src: + i915_gem_object_put(src); + + return err; +} + +static int clear(struct intel_migrate *migrate, + int (*fn)(struct intel_migrate *migrate, + struct i915_gem_ww_ctx *ww, + struct drm_i915_gem_object *obj, + u32 value, + struct i915_request **out), + u32 sz, struct rnd_state *prng) +{ + struct drm_i915_private *i915 = migrate->context->engine->i915; + struct drm_i915_gem_object *obj; + struct i915_request *rq; + struct i915_gem_ww_ctx ww; + u32 *vaddr; + int err = 0; + int i; + + obj = create_lmem_or_internal(i915, sz); + if (IS_ERR(obj)) + return 0; + + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(obj, &ww); + if (err) + continue; + + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + continue; + } + + for (i = 0; i < sz / sizeof(u32); i++) + vaddr[i] = ~i; + i915_gem_object_flush_map(obj); + + err = fn(migrate, &ww, obj, sz, &rq); + if (!err) + continue; + + if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS) + pr_err("%ps failed, size: %u\n", fn, sz); + if (rq) { + i915_request_wait(rq, 0, HZ); + i915_request_put(rq); + } + i915_gem_object_unpin_map(obj); + } + if (err) + goto err_out; + + if (rq) { + if (i915_request_wait(rq, 0, HZ) < 0) { + pr_err("%ps timed out, size: %u\n", fn, sz); + err = -ETIME; + } + i915_request_put(rq); + } + + for (i = 0; !err && i < sz / PAGE_SIZE; i++) { + int x = i * 1024 + i915_prandom_u32_max_state(1024, prng); + + if (vaddr[x] != sz) { + pr_err("%ps failed, size: %u, offset: %zu\n", + fn, sz, x * sizeof(u32)); + igt_hexdump(vaddr + i 
* 1024, 4096); + err = -EINVAL; + } + } + + i915_gem_object_unpin_map(obj); +err_out: + i915_gem_object_put(obj); + + return err; +} + +static int __migrate_copy(struct intel_migrate *migrate, + struct i915_gem_ww_ctx *ww, + struct drm_i915_gem_object *src, + struct drm_i915_gem_object *dst, + struct i915_request **out) +{ + return intel_migrate_copy(migrate, ww, NULL, + src->mm.pages->sgl, src->cache_level, + i915_gem_object_is_lmem(src), + dst->mm.pages->sgl, dst->cache_level, + i915_gem_object_is_lmem(dst), + out); +} + +static int __global_copy(struct intel_migrate *migrate, + struct i915_gem_ww_ctx *ww, + struct drm_i915_gem_object *src, + struct drm_i915_gem_object *dst, + struct i915_request **out) +{ + return intel_context_migrate_copy(migrate->context, NULL, + src->mm.pages->sgl, src->cache_level, + i915_gem_object_is_lmem(src), + dst->mm.pages->sgl, dst->cache_level, + i915_gem_object_is_lmem(dst), + out); +} + +static int +migrate_copy(struct intel_migrate *migrate, u32 sz, struct rnd_state *prng) +{ + return copy(migrate, __migrate_copy, sz, prng); +} + +static int +global_copy(struct intel_migrate *migrate, u32 sz, struct rnd_state *prng) +{ + return copy(migrate, __global_copy, sz, prng); +} + +static int __migrate_clear(struct intel_migrate *migrate, + struct i915_gem_ww_ctx *ww, + struct drm_i915_gem_object *obj, + u32 value, + struct i915_request **out) +{ + return intel_migrate_clear(migrate, ww, NULL, + obj->mm.pages->sgl, + obj->cache_level, + i915_gem_object_is_lmem(obj), + value, out); +} + +static int __global_clear(struct intel_migrate *migrate, + struct i915_gem_ww_ctx *ww, + struct drm_i915_gem_object *obj, + u32 value, + struct i915_request **out) +{ + return intel_context_migrate_clear(migrate->context, NULL, + obj->mm.pages->sgl, + obj->cache_level, + i915_gem_object_is_lmem(obj), + value, out); +} + +static int +migrate_clear(struct intel_migrate *migrate, u32 sz, struct rnd_state *prng) +{ + return clear(migrate, __migrate_clear, sz, 
prng); +} + +static int +global_clear(struct intel_migrate *migrate, u32 sz, struct rnd_state *prng) +{ + return clear(migrate, __global_clear, sz, prng); +} + +static int live_migrate_copy(void *arg) +{ + struct intel_migrate *migrate = arg; + struct drm_i915_private *i915 = migrate->context->engine->i915; + I915_RND_STATE(prng); + int i; + + for (i = 0; i < ARRAY_SIZE(sizes); i++) { + int err; + + err = migrate_copy(migrate, sizes[i], &prng); + if (err == 0) + err = global_copy(migrate, sizes[i], &prng); + i915_gem_drain_freed_objects(i915); + if (err) + return err; + } + + return 0; +} + +static int live_migrate_clear(void *arg) +{ + struct intel_migrate *migrate = arg; + struct drm_i915_private *i915 = migrate->context->engine->i915; + I915_RND_STATE(prng); + int i; + + for (i = 0; i < ARRAY_SIZE(sizes); i++) { + int err; + + err = migrate_clear(migrate, sizes[i], &prng); + if (err == 0) + err = global_clear(migrate, sizes[i], &prng); + + i915_gem_drain_freed_objects(i915); + if (err) + return err; + } + + return 0; +} + +struct threaded_migrate { + struct intel_migrate *migrate; + struct task_struct *tsk; + struct rnd_state prng; +}; + +static int threaded_migrate(struct intel_migrate *migrate, + int (*fn)(void *arg), + unsigned int flags) +{ + const unsigned int n_cpus = num_online_cpus() + 1; + struct threaded_migrate *thread; + I915_RND_STATE(prng); + unsigned int i; + int err = 0; + + thread = kcalloc(n_cpus, sizeof(*thread), GFP_KERNEL); + if (!thread) + return 0; + + for (i = 0; i < n_cpus; ++i) { + struct task_struct *tsk; + + thread[i].migrate = migrate; + thread[i].prng = + I915_RND_STATE_INITIALIZER(prandom_u32_state(&prng)); + + tsk = kthread_run(fn, &thread[i], "igt-%d", i); + if (IS_ERR(tsk)) { + err = PTR_ERR(tsk); + break; + } + + get_task_struct(tsk); + thread[i].tsk = tsk; + } + + msleep(10); /* start all threads before we kthread_stop() */ + + for (i = 0; i < n_cpus; ++i) { + struct task_struct *tsk = thread[i].tsk; + int status; + + if 
(IS_ERR_OR_NULL(tsk)) + continue; + + status = kthread_stop(tsk); + if (status && !err) + err = status; + + put_task_struct(tsk); + } + + kfree(thread); + return err; +} + +static int __thread_migrate_copy(void *arg) +{ + struct threaded_migrate *tm = arg; + + return migrate_copy(tm->migrate, 2 * CHUNK_SZ, &tm->prng); +} + +static int thread_migrate_copy(void *arg) +{ + return threaded_migrate(arg, __thread_migrate_copy, 0); +} + +static int __thread_global_copy(void *arg) +{ + struct threaded_migrate *tm = arg; + + return global_copy(tm->migrate, 2 * CHUNK_SZ, &tm->prng); +} + +static int thread_global_copy(void *arg) +{ + return threaded_migrate(arg, __thread_global_copy, 0); +} + +static int __thread_migrate_clear(void *arg) +{ + struct threaded_migrate *tm = arg; + + return migrate_clear(tm->migrate, 2 * CHUNK_SZ, &tm->prng); +} + +static int __thread_global_clear(void *arg) +{ + struct threaded_migrate *tm = arg; + + return global_clear(tm->migrate, 2 * CHUNK_SZ, &tm->prng); +} + +static int thread_migrate_clear(void *arg) +{ + return threaded_migrate(arg, __thread_migrate_clear, 0); +} + +static int thread_global_clear(void *arg) +{ + return threaded_migrate(arg, __thread_global_clear, 0); +} + +int intel_migrate_live_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(live_migrate_copy), + SUBTEST(live_migrate_clear), + SUBTEST(thread_migrate_copy), + SUBTEST(thread_migrate_clear), + SUBTEST(thread_global_copy), + SUBTEST(thread_global_clear), + }; + struct intel_gt *gt = &i915->gt; + + if (!gt->migrate.context) + return 0; + + return i915_subtests(tests, &gt->migrate); +} + +static struct drm_i915_gem_object * +create_init_lmem_internal(struct intel_gt *gt, size_t sz, bool try_lmem) +{ + struct drm_i915_gem_object *obj = NULL; + int err; + + if (try_lmem) + obj = i915_gem_object_create_lmem(gt->i915, sz, 0); + + if (IS_ERR_OR_NULL(obj)) { + obj = i915_gem_object_create_internal(gt->i915, sz); + if (IS_ERR(obj))
+ return obj; + } + + i915_gem_object_trylock(obj); + err = i915_gem_object_pin_pages(obj); + if (err) { + i915_gem_object_unlock(obj); + i915_gem_object_put(obj); + return ERR_PTR(err); + } + + return obj; +} + +static int wrap_ktime_compare(const void *A, const void *B) +{ + const ktime_t *a = A, *b = B; + + return ktime_compare(*a, *b); +} + +static int __perf_clear_blt(struct intel_context *ce, + struct scatterlist *sg, + enum i915_cache_level cache_level, + bool is_lmem, + size_t sz) +{ + ktime_t t[5]; + int pass; + int err = 0; + + for (pass = 0; pass < ARRAY_SIZE(t); pass++) { + struct i915_request *rq; + ktime_t t0, t1; + + t0 = ktime_get(); + + err = intel_context_migrate_clear(ce, NULL, sg, cache_level, + is_lmem, 0, &rq); + if (rq) { + if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0) + err = -EIO; + i915_request_put(rq); + } + if (err) + break; + + t1 = ktime_get(); + t[pass] = ktime_sub(t1, t0); + } + if (err) + return err; + + sort(t, ARRAY_SIZE(t), sizeof(*t), wrap_ktime_compare, NULL); + pr_info("%s: %zd KiB fill: %lld MiB/s\n", + ce->engine->name, sz >> 10, + div64_u64(mul_u32_u32(4 * sz, + 1000 * 1000 * 1000), + t[1] + 2 * t[2] + t[3]) >> 20); + return 0; +} + +static int perf_clear_blt(void *arg) +{ + struct intel_gt *gt = arg; + static const unsigned long sizes[] = { + SZ_4K, + SZ_64K, + SZ_2M, + SZ_64M + }; + int i; + + for (i = 0; i < ARRAY_SIZE(sizes); i++) { + struct drm_i915_gem_object *dst; + int err; + + dst = create_init_lmem_internal(gt, sizes[i], true); + if (IS_ERR(dst)) + return PTR_ERR(dst); + + err = __perf_clear_blt(gt->migrate.context, + dst->mm.pages->sgl, + I915_CACHE_NONE, + i915_gem_object_is_lmem(dst), + sizes[i]); + + i915_gem_object_unlock(dst); + i915_gem_object_put(dst); + if (err) + return err; + } + + return 0; +} + +static int __perf_copy_blt(struct intel_context *ce, + struct scatterlist *src, + enum i915_cache_level src_cache_level, + bool src_is_lmem, + struct scatterlist *dst, + enum i915_cache_level 
dst_cache_level, + bool dst_is_lmem, + size_t sz) +{ + ktime_t t[5]; + int pass; + int err = 0; + + for (pass = 0; pass < ARRAY_SIZE(t); pass++) { + struct i915_request *rq; + ktime_t t0, t1; + + t0 = ktime_get(); + + err = intel_context_migrate_copy(ce, NULL, + src, src_cache_level, + src_is_lmem, + dst, dst_cache_level, + dst_is_lmem, + &rq); + if (rq) { + if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0) + err = -EIO; + i915_request_put(rq); + } + if (err) + break; + + t1 = ktime_get(); + t[pass] = ktime_sub(t1, t0); + } + if (err) + return err; + + sort(t, ARRAY_SIZE(t), sizeof(*t), wrap_ktime_compare, NULL); + pr_info("%s: %zd KiB copy: %lld MiB/s\n", + ce->engine->name, sz >> 10, + div64_u64(mul_u32_u32(4 * sz, + 1000 * 1000 * 1000), + t[1] + 2 * t[2] + t[3]) >> 20); + return 0; +} + +static int perf_copy_blt(void *arg) +{ + struct intel_gt *gt = arg; + static const unsigned long sizes[] = { + SZ_4K, + SZ_64K, + SZ_2M, + SZ_64M + }; + int i; + + for (i = 0; i < ARRAY_SIZE(sizes); i++) { + struct drm_i915_gem_object *src, *dst; + int err; + + src = create_init_lmem_internal(gt, sizes[i], true); + if (IS_ERR(src)) + return PTR_ERR(src); + + dst = create_init_lmem_internal(gt, sizes[i], false); + if (IS_ERR(dst)) { + err = PTR_ERR(dst); + goto err_src; + } + + err = __perf_copy_blt(gt->migrate.context, + src->mm.pages->sgl, + I915_CACHE_NONE, + i915_gem_object_is_lmem(src), + dst->mm.pages->sgl, + I915_CACHE_NONE, + i915_gem_object_is_lmem(dst), + sizes[i]); + + i915_gem_object_unlock(dst); + i915_gem_object_put(dst); +err_src: + i915_gem_object_unlock(src); + i915_gem_object_put(src); + if (err) + return err; + } + + return 0; +} + +int intel_migrate_perf_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(perf_clear_blt), + SUBTEST(perf_copy_blt), + }; + struct intel_gt *gt = &i915->gt; + + if (intel_gt_is_wedged(gt)) + return 0; + + if (!gt->migrate.context) + return 0; + + return 
intel_gt_live_subtests(tests, gt); +} diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c index b9bb0e6e97f7..13d25bf2a94a 100644 --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c @@ -10,6 +10,7 @@ #include "gem/selftests/mock_context.h" #include "selftests/igt_reset.h" #include "selftests/igt_spinner.h" +#include "selftests/intel_scheduler_helpers.h" struct live_mocs { struct drm_i915_mocs_table table; @@ -28,7 +29,7 @@ static struct intel_context *mocs_context_create(struct intel_engine_cs *engine) return ce; /* We build large requests to read the registers from the ring */ - ce->ring = __intel_context_ring_size(SZ_16K); + ce->ring_size = SZ_16K; return ce; } @@ -318,7 +319,8 @@ static int live_mocs_clean(void *arg) } static int active_engine_reset(struct intel_context *ce, - const char *reason) + const char *reason, + bool using_guc) { struct igt_spinner spin; struct i915_request *rq; @@ -335,9 +337,13 @@ static int active_engine_reset(struct intel_context *ce, } err = request_add_spin(rq, &spin); - if (err == 0) + if (err == 0 && !using_guc) err = intel_engine_reset(ce->engine, reason); + /* Ensure the reset happens and kills the engine */ + if (err == 0) + err = intel_selftest_wait_for_rq(rq); + igt_spinner_end(&spin); igt_spinner_fini(&spin); @@ -345,21 +351,23 @@ static int active_engine_reset(struct intel_context *ce, } static int __live_mocs_reset(struct live_mocs *mocs, - struct intel_context *ce) + struct intel_context *ce, bool using_guc) { struct intel_gt *gt = ce->engine->gt; int err; if (intel_has_reset_engine(gt)) { - err = intel_engine_reset(ce->engine, "mocs"); - if (err) - return err; - - err = check_mocs_engine(mocs, ce); - if (err) - return err; + if (!using_guc) { + err = intel_engine_reset(ce->engine, "mocs"); + if (err) + return err; + + err = check_mocs_engine(mocs, ce); + if (err) + return err; + } - err = active_engine_reset(ce, "mocs"); + err = 
active_engine_reset(ce, "mocs", using_guc); if (err) return err; @@ -395,19 +403,33 @@ static int live_mocs_reset(void *arg) igt_global_reset_lock(gt); for_each_engine(engine, gt, id) { + bool using_guc = intel_engine_uses_guc(engine); + struct intel_selftest_saved_policy saved; struct intel_context *ce; + int err2; + + err = intel_selftest_modify_policy(engine, &saved, + SELFTEST_SCHEDULER_MODIFY_FAST_RESET); + if (err) + break; ce = mocs_context_create(engine); if (IS_ERR(ce)) { err = PTR_ERR(ce); - break; + goto restore; } intel_engine_pm_get(engine); - err = __live_mocs_reset(&mocs, ce); - intel_engine_pm_put(engine); + err = __live_mocs_reset(&mocs, ce, using_guc); + + intel_engine_pm_put(engine); intel_context_put(ce); + +restore: + err2 = intel_selftest_restore_policy(engine, &saved); + if (err == 0) + err = err2; if (err) break; } diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c index 8784257ec808..7a50c9f4071b 100644 --- a/drivers/gpu/drm/i915/gt/selftest_reset.c +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c @@ -321,7 +321,7 @@ static int igt_atomic_engine_reset(void *arg) goto out_unlock; for_each_engine(engine, gt, id) { - struct tasklet_struct *t = &engine->execlists.tasklet; + struct tasklet_struct *t = &engine->sched_engine->tasklet; if (t->func) tasklet_disable(t); diff --git a/drivers/gpu/drm/i915/gt/selftest_slpc.c b/drivers/gpu/drm/i915/gt/selftest_slpc.c new file mode 100644 index 000000000000..9334bad131a2 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/selftest_slpc.c @@ -0,0 +1,311 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2021 Intel Corporation + */ + +#define NUM_STEPS 5 +#define H2G_DELAY 50000 +#define delay_for_h2g() usleep_range(H2G_DELAY, H2G_DELAY + 10000) +#define FREQUENCY_REQ_UNIT DIV_ROUND_CLOSEST(GT_FREQUENCY_MULTIPLIER, \ + GEN9_FREQ_SCALER) + +static int slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 freq) +{ + int ret; + + ret = intel_guc_slpc_set_min_freq(slpc, 
freq); + if (ret) + pr_err("Could not set min frequency to [%u]\n", freq); + else /* Delay to ensure h2g completes */ + delay_for_h2g(); + + return ret; +} + +static int slpc_set_max_freq(struct intel_guc_slpc *slpc, u32 freq) +{ + int ret; + + ret = intel_guc_slpc_set_max_freq(slpc, freq); + if (ret) + pr_err("Could not set maximum frequency [%u]\n", + freq); + else /* Delay to ensure h2g completes */ + delay_for_h2g(); + + return ret; +} + +static int live_slpc_clamp_min(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct intel_gt *gt = &i915->gt; + struct intel_guc_slpc *slpc = &gt->uc.guc.slpc; + struct intel_rps *rps = &gt->rps; + struct intel_engine_cs *engine; + enum intel_engine_id id; + struct igt_spinner spin; + u32 slpc_min_freq, slpc_max_freq; + int err = 0; + + if (!intel_uc_uses_guc_slpc(&gt->uc)) + return 0; + + if (igt_spinner_init(&spin, gt)) + return -ENOMEM; + + if (intel_guc_slpc_get_max_freq(slpc, &slpc_max_freq)) { + pr_err("Could not get SLPC max freq\n"); + return -EIO; + } + + if (intel_guc_slpc_get_min_freq(slpc, &slpc_min_freq)) { + pr_err("Could not get SLPC min freq\n"); + return -EIO; + } + + if (slpc_min_freq == slpc_max_freq) { + pr_err("Min/Max are fused to the same value\n"); + return -EINVAL; + } + + intel_gt_pm_wait_for_idle(gt); + intel_gt_pm_get(gt); + for_each_engine(engine, gt, id) { + struct i915_request *rq; + u32 step, min_freq, req_freq; + u32 act_freq, max_act_freq; + + if (!intel_engine_can_store_dword(engine)) + continue; + + /* Go from min to max in 5 steps */ + step = (slpc_max_freq - slpc_min_freq) / NUM_STEPS; + max_act_freq = slpc_min_freq; + for (min_freq = slpc_min_freq; min_freq < slpc_max_freq; + min_freq += step) { + err = slpc_set_min_freq(slpc, min_freq); + if (err) + break; + + st_engine_heartbeat_disable(engine); + + rq = igt_spinner_create_request(&spin, + engine->kernel_context, + MI_NOOP); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + st_engine_heartbeat_enable(engine); + break; + } +
i915_request_add(rq); + + if (!igt_wait_for_spinner(&spin, rq)) { + pr_err("%s: Spinner did not start\n", + engine->name); + igt_spinner_end(&spin); + st_engine_heartbeat_enable(engine); + intel_gt_set_wedged(engine->gt); + err = -EIO; + break; + } + + /* Wait for GuC to detect business and raise + * requested frequency if necessary. + */ + delay_for_h2g(); + + req_freq = intel_rps_read_punit_req_frequency(rps); + + /* GuC requests freq in multiples of 50/3 MHz */ + if (req_freq < (min_freq - FREQUENCY_REQ_UNIT)) { + pr_err("SWReq is %d, should be at least %d\n", req_freq, + min_freq - FREQUENCY_REQ_UNIT); + igt_spinner_end(&spin); + st_engine_heartbeat_enable(engine); + err = -EINVAL; + break; + } + + act_freq = intel_rps_read_actual_frequency(rps); + if (act_freq > max_act_freq) + max_act_freq = act_freq; + + igt_spinner_end(&spin); + st_engine_heartbeat_enable(engine); + } + + pr_info("Max actual frequency for %s was %d\n", + engine->name, max_act_freq); + + /* Actual frequency should rise above min */ + if (max_act_freq == slpc_min_freq) { + pr_err("Actual freq did not rise above min\n"); + err = -EINVAL; + } + + if (err) + break; + } + + /* Restore min/max frequencies */ + slpc_set_max_freq(slpc, slpc_max_freq); + slpc_set_min_freq(slpc, slpc_min_freq); + + if (igt_flush_test(gt->i915)) + err = -EIO; + + intel_gt_pm_put(gt); + igt_spinner_fini(&spin); + intel_gt_pm_wait_for_idle(gt); + + return err; +} + +static int live_slpc_clamp_max(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct intel_gt *gt = &i915->gt; + struct intel_guc_slpc *slpc; + struct intel_rps *rps; + struct intel_engine_cs *engine; + enum intel_engine_id id; + struct igt_spinner spin; + int err = 0; + u32 slpc_min_freq, slpc_max_freq; + + slpc = &gt->uc.guc.slpc; + rps = &gt->rps; + + if (!intel_uc_uses_guc_slpc(&gt->uc)) + return 0; + + if (igt_spinner_init(&spin, gt)) + return -ENOMEM; + + if (intel_guc_slpc_get_max_freq(slpc, &slpc_max_freq)) { + pr_err("Could not get SLPC max
freq\n"); + return -EIO; + } + + if (intel_guc_slpc_get_min_freq(slpc, &slpc_min_freq)) { + pr_err("Could not get SLPC min freq\n"); + return -EIO; + } + + if (slpc_min_freq == slpc_max_freq) { + pr_err("Min/Max are fused to the same value\n"); + return -EINVAL; + } + + intel_gt_pm_wait_for_idle(gt); + intel_gt_pm_get(gt); + for_each_engine(engine, gt, id) { + struct i915_request *rq; + u32 max_freq, req_freq; + u32 act_freq, max_act_freq; + u32 step; + + if (!intel_engine_can_store_dword(engine)) + continue; + + /* Go from max to min in 5 steps */ + step = (slpc_max_freq - slpc_min_freq) / NUM_STEPS; + max_act_freq = slpc_min_freq; + for (max_freq = slpc_max_freq; max_freq > slpc_min_freq; + max_freq -= step) { + err = slpc_set_max_freq(slpc, max_freq); + if (err) + break; + + st_engine_heartbeat_disable(engine); + + rq = igt_spinner_create_request(&spin, + engine->kernel_context, + MI_NOOP); + if (IS_ERR(rq)) { + st_engine_heartbeat_enable(engine); + err = PTR_ERR(rq); + break; + } + + i915_request_add(rq); + + if (!igt_wait_for_spinner(&spin, rq)) { + pr_err("%s: SLPC spinner did not start\n", + engine->name); + igt_spinner_end(&spin); + st_engine_heartbeat_enable(engine); + intel_gt_set_wedged(engine->gt); + err = -EIO; + break; + } + + delay_for_h2g(); + + /* Verify that SWREQ indeed was set to specific value */ + req_freq = intel_rps_read_punit_req_frequency(rps); + + /* GuC requests freq in multiples of 50/3 MHz */ + if (req_freq > (max_freq + FREQUENCY_REQ_UNIT)) { + pr_err("SWReq is %d, should be at most %d\n", req_freq, + max_freq + FREQUENCY_REQ_UNIT); + igt_spinner_end(&spin); + st_engine_heartbeat_enable(engine); + err = -EINVAL; + break; + } + + act_freq = intel_rps_read_actual_frequency(rps); + if (act_freq > max_act_freq) + max_act_freq = act_freq; + + st_engine_heartbeat_enable(engine); + igt_spinner_end(&spin); + + if (err) + break; + } + + pr_info("Max actual frequency for %s was %d\n", + engine->name, max_act_freq); + + /* Actual frequency 
should rise above min */ + if (max_act_freq == slpc_min_freq) { + pr_err("Actual freq did not rise above min\n"); + err = -EINVAL; + } + + if (igt_flush_test(gt->i915)) { + err = -EIO; + break; + } + + if (err) + break; + } + + /* Restore min/max freq */ + slpc_set_max_freq(slpc, slpc_max_freq); + slpc_set_min_freq(slpc, slpc_min_freq); + + intel_gt_pm_put(gt); + igt_spinner_fini(&spin); + intel_gt_pm_wait_for_idle(gt); + + return err; +} + +int intel_slpc_live_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(live_slpc_clamp_max), + SUBTEST(live_slpc_clamp_min), + }; + + if (intel_gt_is_wedged(&i915->gt)) + return 0; + + return i915_live_subtests(tests, i915); +} diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c index 64da0c91dec1..d0b6a3afcf44 100644 --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c @@ -874,7 +874,7 @@ static int create_watcher(struct hwsp_watcher *w, if (IS_ERR(ce)) return PTR_ERR(ce); - ce->ring = __intel_context_ring_size(ringsz); + ce->ring_size = ringsz; w->rq = intel_context_create_request(ce); intel_context_put(ce); if (IS_ERR(w->rq)) diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c index c30754daf4b1..e623ac45f4aa 100644 --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c @@ -12,6 +12,7 @@ #include "selftests/igt_flush_test.h" #include "selftests/igt_reset.h" #include "selftests/igt_spinner.h" +#include "selftests/intel_scheduler_helpers.h" #include "selftests/mock_drm.h" #include "gem/selftests/igt_gem_utils.h" @@ -261,28 +262,34 @@ static int do_engine_reset(struct intel_engine_cs *engine) return intel_engine_reset(engine, "live_workarounds"); } +static int do_guc_reset(struct intel_engine_cs *engine) +{ + /* Currently a no-op as the reset is handled by GuC */ + return 0; 
+}
+
 static int switch_to_scratch_context(struct intel_engine_cs *engine,
-				     struct igt_spinner *spin)
+				     struct igt_spinner *spin,
+				     struct i915_request **rq)
 {
 	struct intel_context *ce;
-	struct i915_request *rq;
 	int err = 0;
 	ce = intel_context_create(engine);
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
-	rq = igt_spinner_create_request(spin, ce, MI_NOOP);
+	*rq = igt_spinner_create_request(spin, ce, MI_NOOP);
 	intel_context_put(ce);
-	if (IS_ERR(rq)) {
+	if (IS_ERR(*rq)) {
 		spin = NULL;
-		err = PTR_ERR(rq);
+		err = PTR_ERR(*rq);
 		goto err;
 	}
-	err = request_add_spin(rq, spin);
+	err = request_add_spin(*rq, spin);
 err:
 	if (err && spin)
 		igt_spinner_end(spin);
@@ -296,6 +303,7 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
 {
 	struct intel_context *ce, *tmp;
 	struct igt_spinner spin;
+	struct i915_request *rq;
 	intel_wakeref_t wakeref;
 	int err;
@@ -316,13 +324,24 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
 		goto out_spin;
 	}
-	err = switch_to_scratch_context(engine, &spin);
+	err = switch_to_scratch_context(engine, &spin, &rq);
 	if (err)
 		goto out_spin;
+	/* Ensure the spinner hasn't aborted */
+	if (i915_request_completed(rq)) {
+		pr_err("%s spinner failed to start\n", name);
+		err = -ETIMEDOUT;
+		goto out_spin;
+	}
+
 	with_intel_runtime_pm(engine->uncore->rpm, wakeref)
 		err = reset(engine);
+	/* Ensure the reset happens and kills the engine */
+	if (err == 0)
+		err = intel_selftest_wait_for_rq(rq);
+
 	igt_spinner_end(&spin);
 	if (err) {
@@ -787,9 +806,28 @@ static int live_reset_whitelist(void *arg)
 			continue;
 		if (intel_has_reset_engine(gt)) {
-			err = check_whitelist_across_reset(engine,
-							   do_engine_reset,
-							   "engine");
+			if (intel_engine_uses_guc(engine)) {
+				struct intel_selftest_saved_policy saved;
+				int err2;
+
+				err = intel_selftest_modify_policy(engine, &saved,
+								   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
+				if (err)
+					goto out;
+
+				err = check_whitelist_across_reset(engine,
+								   do_guc_reset,
+								   "guc");
+
+				err2 = intel_selftest_restore_policy(engine, &saved);
+				if (err == 0)
+					err = err2;
+			} else {
+				err = check_whitelist_across_reset(engine,
+								   do_engine_reset,
+								   "engine");
+			}
+
 			if (err)
 				goto out;
 		}
@@ -1147,7 +1185,7 @@ verify_wa_lists(struct intel_gt *gt, struct wa_lists *lists,
 	enum intel_engine_id id;
 	bool ok = true;
-	ok &= wa_list_verify(gt->uncore, &lists->gt_wa_list, str);
+	ok &= wa_list_verify(gt, &lists->gt_wa_list, str);
 	for_each_engine(engine, gt, id) {
 		struct intel_context *ce;
@@ -1175,31 +1213,36 @@ live_gpu_reset_workarounds(void *arg)
 {
 	struct intel_gt *gt = arg;
 	intel_wakeref_t wakeref;
-	struct wa_lists lists;
+	struct wa_lists *lists;
 	bool ok;
 	if (!intel_has_gpu_reset(gt))
 		return 0;
+	lists = kzalloc(sizeof(*lists), GFP_KERNEL);
+	if (!lists)
+		return -ENOMEM;
+
 	pr_info("Verifying after GPU reset...\n");
 	igt_global_reset_lock(gt);
 	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
-	reference_lists_init(gt, &lists);
+	reference_lists_init(gt, lists);
-	ok = verify_wa_lists(gt, &lists, "before reset");
+	ok = verify_wa_lists(gt, lists, "before reset");
 	if (!ok)
 		goto out;
 	intel_gt_reset(gt, ALL_ENGINES, "live_workarounds");
-	ok = verify_wa_lists(gt, &lists, "after reset");
+	ok = verify_wa_lists(gt, lists, "after reset");
 out:
-	reference_lists_fini(gt, &lists);
+	reference_lists_fini(gt, lists);
 	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
 	igt_global_reset_unlock(gt);
+	kfree(lists);
 	return ok ? 0 : -ESRCH;
 }
@@ -1214,43 +1257,57 @@ live_engine_reset_workarounds(void *arg)
 	struct igt_spinner spin;
 	struct i915_request *rq;
 	intel_wakeref_t wakeref;
-	struct wa_lists lists;
+	struct wa_lists *lists;
 	int ret = 0;
 	if (!intel_has_reset_engine(gt))
 		return 0;
+	lists = kzalloc(sizeof(*lists), GFP_KERNEL);
+	if (!lists)
+		return -ENOMEM;
+
 	igt_global_reset_lock(gt);
 	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
-	reference_lists_init(gt, &lists);
+	reference_lists_init(gt, lists);
 	for_each_engine(engine, gt, id) {
+		struct intel_selftest_saved_policy saved;
+		bool using_guc = intel_engine_uses_guc(engine);
 		bool ok;
+		int ret2;
 		pr_info("Verifying after %s reset...\n", engine->name);
+		ret = intel_selftest_modify_policy(engine, &saved,
+						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
+		if (ret)
+			break;
+
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
 			ret = PTR_ERR(ce);
-			break;
+			goto restore;
 		}
-		ok = verify_wa_lists(gt, &lists, "before reset");
-		if (!ok) {
-			ret = -ESRCH;
-			goto err;
-		}
+		if (!using_guc) {
+			ok = verify_wa_lists(gt, lists, "before reset");
+			if (!ok) {
+				ret = -ESRCH;
+				goto err;
+			}
-		ret = intel_engine_reset(engine, "live_workarounds:idle");
-		if (ret) {
-			pr_err("%s: Reset failed while idle\n", engine->name);
-			goto err;
-		}
+			ret = intel_engine_reset(engine, "live_workarounds:idle");
+			if (ret) {
+				pr_err("%s: Reset failed while idle\n", engine->name);
+				goto err;
+			}
-		ok = verify_wa_lists(gt, &lists, "after idle reset");
-		if (!ok) {
-			ret = -ESRCH;
-			goto err;
+			ok = verify_wa_lists(gt, lists, "after idle reset");
+			if (!ok) {
+				ret = -ESRCH;
+				goto err;
+			}
+		}
 		ret = igt_spinner_init(&spin, engine->gt);
@@ -1271,32 +1328,49 @@ live_engine_reset_workarounds(void *arg)
 			goto err;
 		}
-		ret = intel_engine_reset(engine, "live_workarounds:active");
-		if (ret) {
-			pr_err("%s: Reset failed on an active spinner\n",
-			       engine->name);
-			igt_spinner_fini(&spin);
-			goto err;
+		/* Ensure the spinner hasn't aborted */
+		if (i915_request_completed(rq)) {
+			ret = -ETIMEDOUT;
+			goto skip;
 		}
+		if (!using_guc) {
+			ret = intel_engine_reset(engine, "live_workarounds:active");
+			if (ret) {
+				pr_err("%s: Reset failed on an active spinner\n",
+				       engine->name);
+				igt_spinner_fini(&spin);
+				goto err;
+			}
+		}
+
+		/* Ensure the reset happens and kills the engine */
+		if (ret == 0)
+			ret = intel_selftest_wait_for_rq(rq);
+
+skip:
 		igt_spinner_end(&spin);
 		igt_spinner_fini(&spin);
-		ok = verify_wa_lists(gt, &lists, "after busy reset");
-		if (!ok) {
+		ok = verify_wa_lists(gt, lists, "after busy reset");
+		if (!ok)
 			ret = -ESRCH;
-			goto err;
-		}
 err:
 		intel_context_put(ce);
+
+restore:
+		ret2 = intel_selftest_restore_policy(engine, &saved);
+		if (ret == 0)
+			ret = ret2;
 		if (ret)
 			break;
 	}
-	reference_lists_fini(gt, &lists);
+	reference_lists_fini(gt, lists);
 	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
 	igt_global_reset_unlock(gt);
+	kfree(lists);
 	igt_flush_test(gt->i915);
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 90efef8a73e4..8ff582222aff 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -6,6 +6,113 @@
 #ifndef _ABI_GUC_ACTIONS_ABI_H
 #define _ABI_GUC_ACTIONS_ABI_H
+/**
+ * DOC: HOST2GUC_REGISTER_CTB
+ *
+ * This message is used as part of the `CTB based communication`_ setup.
+ *
+ * This message must be sent as `MMIO HXG Message`_.
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 27:16 | DATA0 = MBZ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 15:0 | ACTION = _`GUC_ACTION_HOST2GUC_REGISTER_CTB` = 0x4505 |
+ * +---+-------+--------------------------------------------------------------+
+ * | 1 | 31:12 | RESERVED = MBZ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 11:8 | **TYPE** - type for the `CT Buffer`_ |
+ * | | | |
+ * | | | - _`GUC_CTB_TYPE_HOST2GUC` = 0 |
+ * | | | - _`GUC_CTB_TYPE_GUC2HOST` = 1 |
+ * | +-------+--------------------------------------------------------------+
+ * | | 7:0 | **SIZE** - size of the `CT Buffer`_ in 4K units minus 1 |
+ * +---+-------+--------------------------------------------------------------+
+ * | 2 | 31:0 | **DESC_ADDR** - GGTT address of the `CTB Descriptor`_ |
+ * +---+-------+--------------------------------------------------------------+
+ * | 3 | 31:0 | **BUFF_ADDF** - GGTT address of the `CT Buffer`_ |
+ * +---+-------+--------------------------------------------------------------+
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 27:0 | DATA0 = MBZ |
+ * +---+-------+--------------------------------------------------------------+
+ */
+#define GUC_ACTION_HOST2GUC_REGISTER_CTB 0x4505
+
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_LEN (GUC_HXG_REQUEST_MSG_MIN_LEN + 3u)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_MBZ (0xfffff << 12)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_TYPE (0xf << 8)
+#define GUC_CTB_TYPE_HOST2GUC 0u
+#define GUC_CTB_TYPE_GUC2HOST 1u
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_SIZE (0xff << 0)
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_2_DESC_ADDR GUC_HXG_REQUEST_MSG_n_DATAn
+#define HOST2GUC_REGISTER_CTB_REQUEST_MSG_3_BUFF_ADDR GUC_HXG_REQUEST_MSG_n_DATAn
+
+#define HOST2GUC_REGISTER_CTB_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN
+#define HOST2GUC_REGISTER_CTB_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0
+
+/**
+ * DOC: HOST2GUC_DEREGISTER_CTB
+ *
+ * This message is used as part of the `CTB based communication`_ teardown.
+ *
+ * This message must be sent as `MMIO HXG Message`_.
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 27:16 | DATA0 = MBZ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 15:0 | ACTION = _`GUC_ACTION_HOST2GUC_DEREGISTER_CTB` = 0x4506 |
+ * +---+-------+--------------------------------------------------------------+
+ * | 1 | 31:12 | RESERVED = MBZ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 11:8 | **TYPE** - type of the `CT Buffer`_ |
+ * | | | |
+ * | | | see `GUC_ACTION_HOST2GUC_REGISTER_CTB`_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 7:0 | RESERVED = MBZ |
+ * +---+-------+--------------------------------------------------------------+
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 27:0 | DATA0 = MBZ |
+ * +---+-------+--------------------------------------------------------------+
+ */
+#define GUC_ACTION_HOST2GUC_DEREGISTER_CTB 0x4506
+
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_LEN (GUC_HXG_REQUEST_MSG_MIN_LEN + 1u)
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_MBZ (0xfffff << 12)
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_TYPE (0xf << 8)
+#define HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_MBZ2 (0xff << 0)
+
+#define HOST2GUC_DEREGISTER_CTB_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN
+#define HOST2GUC_DEREGISTER_CTB_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0
+
+/* legacy definitions */
+
 enum intel_guc_action {
 	INTEL_GUC_ACTION_DEFAULT = 0x0,
 	INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
@@ -17,13 +124,33 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
 	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
 	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
-	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
+	INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
+	INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
+	INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
+	INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
+	INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
+	INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
+	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
+	INTEL_GUC_ACTION_SETUP_PC_GUCRC = 0x3004,
 	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
+	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+	INTEL_GUC_ACTION_RESET_CLIENT = 0x5507,
 	INTEL_GUC_ACTION_LIMIT
 };
+enum intel_guc_rc_options {
+	INTEL_GUCRC_HOST_CONTROL,
+	INTEL_GUCRC_FIRMWARE_CONTROL,
+};
+
 enum intel_guc_preempt_options {
 	INTEL_GUC_PREEMPT_OPTION_DROP_WORK_Q = 0x4,
 	INTEL_GUC_PREEMPT_OPTION_DROP_SUBMIT_Q = 0x8,
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
new file mode 100644
index 000000000000..7a8d4bfc5f6a
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
@@ -0,0 +1,235 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#ifndef _GUC_ACTIONS_SLPC_ABI_H_
+#define _GUC_ACTIONS_SLPC_ABI_H_
+
+#include <linux/types.h>
+#include "i915_reg.h"
+
+/**
+ * DOC: SLPC SHARED DATA STRUCTURE
+ *
+ * +----+------+--------------------------------------------------------------+
+ * | CL | Bytes| Description |
+ * +====+======+==============================================================+
+ * | 1 | 0-3 | SHARED DATA SIZE |
+ * | +------+--------------------------------------------------------------+
+ * | | 4-7 | GLOBAL STATE |
+ * | +------+--------------------------------------------------------------+
+ * | | 8-11 | DISPLAY DATA ADDRESS |
+ * | +------+--------------------------------------------------------------+
+ * | | 12:63| PADDING |
+ * +----+------+--------------------------------------------------------------+
+ * | | 0:63 | PADDING(PLATFORM INFO) |
+ * +----+------+--------------------------------------------------------------+
+ * | 3 | 0-3 | TASK STATE DATA |
+ * + +------+--------------------------------------------------------------+
+ * | | 4:63 | PADDING |
+ * +----+------+--------------------------------------------------------------+
+ * |4-21|0:1087| OVERRIDE PARAMS AND BIT FIELDS |
+ * +----+------+--------------------------------------------------------------+
+ * | | | PADDING + EXTRA RESERVED PAGE |
+ * +----+------+--------------------------------------------------------------+
+ */
+
+/*
+ * SLPC exposes certain parameters for global configuration by the host.
+ * These are referred to as override parameters, because in most cases
+ * the host will not need to modify the default values used by SLPC.
+ * SLPC remembers the default values which allows the host to easily restore
+ * them by simply unsetting the override. The host can set or unset override
+ * parameters during SLPC (re-)initialization using the SLPC Reset event.
+ * The host can also set or unset override parameters on the fly using the
+ * Parameter Set and Parameter Unset events
+ */
+
+#define SLPC_MAX_OVERRIDE_PARAMETERS 256
+#define SLPC_OVERRIDE_BITFIELD_SIZE \
+		(SLPC_MAX_OVERRIDE_PARAMETERS / 32)
+
+#define SLPC_PAGE_SIZE_BYTES 4096
+#define SLPC_CACHELINE_SIZE_BYTES 64
+#define SLPC_SHARED_DATA_SIZE_BYTE_HEADER SLPC_CACHELINE_SIZE_BYTES
+#define SLPC_SHARED_DATA_SIZE_BYTE_PLATFORM_INFO SLPC_CACHELINE_SIZE_BYTES
+#define SLPC_SHARED_DATA_SIZE_BYTE_TASK_STATE SLPC_CACHELINE_SIZE_BYTES
+#define SLPC_SHARED_DATA_MODE_DEFN_TABLE_SIZE SLPC_PAGE_SIZE_BYTES
+#define SLPC_SHARED_DATA_SIZE_BYTE_MAX (2 * SLPC_PAGE_SIZE_BYTES)
+
+/*
+ * Cacheline size aligned (Total size needed for
+ * SLPM_KMD_MAX_OVERRIDE_PARAMETERS=256 is 1088 bytes)
+ */
+#define SLPC_OVERRIDE_PARAMS_TOTAL_BYTES (((((SLPC_MAX_OVERRIDE_PARAMETERS * 4) \
+					+ ((SLPC_MAX_OVERRIDE_PARAMETERS / 32) * 4)) \
+		+ (SLPC_CACHELINE_SIZE_BYTES - 1)) / SLPC_CACHELINE_SIZE_BYTES) * \
+		SLPC_CACHELINE_SIZE_BYTES)
+
+#define SLPC_SHARED_DATA_SIZE_BYTE_OTHER (SLPC_SHARED_DATA_SIZE_BYTE_MAX - \
+					(SLPC_SHARED_DATA_SIZE_BYTE_HEADER \
+					+ SLPC_SHARED_DATA_SIZE_BYTE_PLATFORM_INFO \
+					+ SLPC_SHARED_DATA_SIZE_BYTE_TASK_STATE \
+					+ SLPC_OVERRIDE_PARAMS_TOTAL_BYTES \
+					+ SLPC_SHARED_DATA_MODE_DEFN_TABLE_SIZE))
+
+enum slpc_task_enable {
+	SLPC_PARAM_TASK_DEFAULT = 0,
+	SLPC_PARAM_TASK_ENABLED,
+	SLPC_PARAM_TASK_DISABLED,
+	SLPC_PARAM_TASK_UNKNOWN
+};
+
+enum slpc_global_state {
+	SLPC_GLOBAL_STATE_NOT_RUNNING = 0,
+	SLPC_GLOBAL_STATE_INITIALIZING = 1,
+	SLPC_GLOBAL_STATE_RESETTING = 2,
+	SLPC_GLOBAL_STATE_RUNNING = 3,
+	SLPC_GLOBAL_STATE_SHUTTING_DOWN = 4,
+	SLPC_GLOBAL_STATE_ERROR = 5
+};
+
+enum slpc_param_id {
+	SLPC_PARAM_TASK_ENABLE_GTPERF = 0,
+	SLPC_PARAM_TASK_DISABLE_GTPERF = 1,
+	SLPC_PARAM_TASK_ENABLE_BALANCER = 2,
+	SLPC_PARAM_TASK_DISABLE_BALANCER = 3,
+	SLPC_PARAM_TASK_ENABLE_DCC = 4,
+	SLPC_PARAM_TASK_DISABLE_DCC = 5,
+	SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ = 6,
+	SLPC_PARAM_GLOBAL_MAX_GT_UNSLICE_FREQ_MHZ = 7,
+	SLPC_PARAM_GLOBAL_MIN_GT_SLICE_FREQ_MHZ = 8,
+	SLPC_PARAM_GLOBAL_MAX_GT_SLICE_FREQ_MHZ = 9,
+	SLPC_PARAM_GTPERF_THRESHOLD_MAX_FPS = 10,
+	SLPC_PARAM_GLOBAL_DISABLE_GT_FREQ_MANAGEMENT = 11,
+	SLPC_PARAM_GTPERF_ENABLE_FRAMERATE_STALLING = 12,
+	SLPC_PARAM_GLOBAL_DISABLE_RC6_MODE_CHANGE = 13,
+	SLPC_PARAM_GLOBAL_OC_UNSLICE_FREQ_MHZ = 14,
+	SLPC_PARAM_GLOBAL_OC_SLICE_FREQ_MHZ = 15,
+	SLPC_PARAM_GLOBAL_ENABLE_IA_GT_BALANCING = 16,
+	SLPC_PARAM_GLOBAL_ENABLE_ADAPTIVE_BURST_TURBO = 17,
+	SLPC_PARAM_GLOBAL_ENABLE_EVAL_MODE = 18,
+	SLPC_PARAM_GLOBAL_ENABLE_BALANCER_IN_NON_GAMING_MODE = 19,
+	SLPC_PARAM_GLOBAL_RT_MODE_TURBO_FREQ_DELTA_MHZ = 20,
+	SLPC_PARAM_PWRGATE_RC_MODE = 21,
+	SLPC_PARAM_EDR_MODE_COMPUTE_TIMEOUT_MS = 22,
+	SLPC_PARAM_EDR_QOS_FREQ_MHZ = 23,
+	SLPC_PARAM_MEDIA_FF_RATIO_MODE = 24,
+	SLPC_PARAM_ENABLE_IA_FREQ_LIMITING = 25,
+	SLPC_PARAM_STRATEGIES = 26,
+	SLPC_PARAM_POWER_PROFILE = 27,
+	SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY = 28,
+	SLPC_MAX_PARAM = 32,
+};
+
+enum slpc_event_id {
+	SLPC_EVENT_RESET = 0,
+	SLPC_EVENT_SHUTDOWN = 1,
+	SLPC_EVENT_PLATFORM_INFO_CHANGE = 2,
+	SLPC_EVENT_DISPLAY_MODE_CHANGE = 3,
+	SLPC_EVENT_FLIP_COMPLETE = 4,
+	SLPC_EVENT_QUERY_TASK_STATE = 5,
+	SLPC_EVENT_PARAMETER_SET = 6,
+	SLPC_EVENT_PARAMETER_UNSET = 7,
+};
+
+struct slpc_task_state_data {
+	union {
+		u32 task_status_padding;
+		struct {
+			u32 status;
+#define SLPC_GTPERF_TASK_ENABLED REG_BIT(0)
+#define SLPC_DCC_TASK_ENABLED REG_BIT(11)
+#define SLPC_IN_DCC REG_BIT(12)
+#define SLPC_BALANCER_ENABLED REG_BIT(15)
+#define SLPC_IBC_TASK_ENABLED REG_BIT(16)
+#define SLPC_BALANCER_IA_LMT_ENABLED REG_BIT(17)
+#define SLPC_BALANCER_IA_LMT_ACTIVE REG_BIT(18)
+		};
+	};
+	union {
+		u32 freq_padding;
+		struct {
+#define SLPC_MAX_UNSLICE_FREQ_MASK REG_GENMASK(7, 0)
+#define SLPC_MIN_UNSLICE_FREQ_MASK REG_GENMASK(15, 8)
+#define SLPC_MAX_SLICE_FREQ_MASK REG_GENMASK(23, 16)
+#define SLPC_MIN_SLICE_FREQ_MASK REG_GENMASK(31, 24)
+			u32 freq;
+		};
+	};
+} __packed;
+
+struct slpc_shared_data_header {
+	/* Total size in bytes of this shared buffer. */
+	u32 size;
+	u32 global_state;
+	u32 display_data_addr;
+} __packed;
+
+struct slpc_override_params {
+	u32 bits[SLPC_OVERRIDE_BITFIELD_SIZE];
+	u32 values[SLPC_MAX_OVERRIDE_PARAMETERS];
+} __packed;
+
+struct slpc_shared_data {
+	struct slpc_shared_data_header header;
+	u8 shared_data_header_pad[SLPC_SHARED_DATA_SIZE_BYTE_HEADER -
+				sizeof(struct slpc_shared_data_header)];
+
+	u8 platform_info_pad[SLPC_SHARED_DATA_SIZE_BYTE_PLATFORM_INFO];
+
+	struct slpc_task_state_data task_state_data;
+	u8 task_state_data_pad[SLPC_SHARED_DATA_SIZE_BYTE_TASK_STATE -
+				sizeof(struct slpc_task_state_data)];
+
+	struct slpc_override_params override_params;
+	u8 override_params_pad[SLPC_OVERRIDE_PARAMS_TOTAL_BYTES -
+				sizeof(struct slpc_override_params)];
+
+	u8 shared_data_pad[SLPC_SHARED_DATA_SIZE_BYTE_OTHER];
+
+	/* PAGE 2 (4096 bytes), mode based parameter will be removed soon */
+	u8 reserved_mode_definition[4096];
+} __packed;
+
+/**
+ * DOC: SLPC H2G MESSAGE FORMAT
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 27:16 | DATA0 = MBZ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 15:0 | ACTION = _`GUC_ACTION_HOST2GUC_PC_SLPM_REQUEST` = 0x3003 |
+ * +---+-------+--------------------------------------------------------------+
+ * | 1 | 31:8 | **EVENT_ID** |
+ * +-------+--------------------------------------------------------------+
+ * | | 7:0 | **EVENT_ARGC** - number of data arguments |
+ * +---+-------+--------------------------------------------------------------+
+ * | 2 | 31:0 | **EVENT_DATA1** |
+ * +---+-------+--------------------------------------------------------------+
+ * |...| 31:0 | ... |
+ * +---+-------+--------------------------------------------------------------+
+ * |2+n| 31:0 | **EVENT_DATAn** |
+ * +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST 0x3003
+
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_MIN_LEN \
+				(GUC_HXG_REQUEST_MSG_MIN_LEN + 1u)
+#define HOST2GUC_PC_SLPC_EVENT_MAX_INPUT_ARGS 9
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_MAX_LEN \
+		(HOST2GUC_PC_SLPC_REQUEST_REQUEST_MSG_MIN_LEN + \
+			HOST2GUC_PC_SLPC_EVENT_MAX_INPUT_ARGS)
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ID (0xff << 8)
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ARGC (0xff << 0)
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_N_EVENT_DATA_N GUC_HXG_REQUEST_MSG_n_DATAn
+
+#endif
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
index d38935f47ecf..99e1fad5ca20 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
@@ -7,6 +7,111 @@
 #define _ABI_GUC_COMMUNICATION_CTB_ABI_H
 #include <linux/types.h>
+#include <linux/build_bug.h>
+
+#include "guc_messages_abi.h"
+
+/**
+ * DOC: CT Buffer
+ *
+ * Circular buffer used to send `CTB Message`_
+ */
+
+/**
+ * DOC: CTB Descriptor
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31:0 | **HEAD** - offset (in dwords) to the last dword that was |
+ * | | | read from the `CT Buffer`_. |
+ * | | | It can only be updated by the receiver. |
+ * +---+-------+--------------------------------------------------------------+
+ * | 1 | 31:0 | **TAIL** - offset (in dwords) to the last dword that was |
+ * | | | written to the `CT Buffer`_. |
+ * | | | It can only be updated by the sender. |
+ * +---+-------+--------------------------------------------------------------+
+ * | 2 | 31:0 | **STATUS** - status of the CTB |
+ * | | | |
+ * | | | - _`GUC_CTB_STATUS_NO_ERROR` = 0 (normal operation) |
+ * | | | - _`GUC_CTB_STATUS_OVERFLOW` = 1 (head/tail too large) |
+ * | | | - _`GUC_CTB_STATUS_UNDERFLOW` = 2 (truncated message) |
+ * | | | - _`GUC_CTB_STATUS_MISMATCH` = 4 (head/tail modified) |
+ * +---+-------+--------------------------------------------------------------+
+ * |...| | RESERVED = MBZ |
+ * +---+-------+--------------------------------------------------------------+
+ * | 15| 31:0 | RESERVED = MBZ |
+ * +---+-------+--------------------------------------------------------------+
+ */
+
+struct guc_ct_buffer_desc {
+	u32 head;
+	u32 tail;
+	u32 status;
+#define GUC_CTB_STATUS_NO_ERROR 0
+#define GUC_CTB_STATUS_OVERFLOW (1 << 0)
+#define GUC_CTB_STATUS_UNDERFLOW (1 << 1)
+#define GUC_CTB_STATUS_MISMATCH (1 << 2)
+	u32 reserved[13];
+} __packed;
+static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
+
+/**
+ * DOC: CTB Message
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31:16 | **FENCE** - message identifier |
+ * | +-------+--------------------------------------------------------------+
+ * | | 15:12 | **FORMAT** - format of the CTB message |
+ * | | | - _`GUC_CTB_FORMAT_HXG` = 0 - see `CTB HXG Message`_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 11:8 | **RESERVED** |
+ * | +-------+--------------------------------------------------------------+
+ * | | 7:0 | **NUM_DWORDS** - length of the CTB message (w/o header) |
+ * +---+-------+--------------------------------------------------------------+
+ * | 1 | 31:0 | optional (depends on FORMAT) |
+ * +---+-------+ |
+ * |...| | |
+ * +---+-------+ |
+ * | n | 31:0 | |
+ * +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_CTB_HDR_LEN 1u
+#define GUC_CTB_MSG_MIN_LEN GUC_CTB_HDR_LEN
+#define GUC_CTB_MSG_MAX_LEN 256u
+#define GUC_CTB_MSG_0_FENCE (0xffff << 16)
+#define GUC_CTB_MSG_0_FORMAT (0xf << 12)
+#define GUC_CTB_FORMAT_HXG 0u
+#define GUC_CTB_MSG_0_RESERVED (0xf << 8)
+#define GUC_CTB_MSG_0_NUM_DWORDS (0xff << 0)
+
+/**
+ * DOC: CTB HXG Message
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31:16 | FENCE |
+ * | +-------+--------------------------------------------------------------+
+ * | | 15:12 | FORMAT = GUC_CTB_FORMAT_HXG_ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 11:8 | RESERVED = MBZ |
+ * | +-------+--------------------------------------------------------------+
+ * | | 7:0 | NUM_DWORDS = length (in dwords) of the embedded HXG message |
+ * +---+-------+--------------------------------------------------------------+
+ * | 1 | 31:0 | +--------------------------------------------------------+ |
+ * +---+-------+ | | |
+ * |...| | | Embedded `HXG Message`_ | |
+ * +---+-------+ | | |
+ * | n | 31:0 | +--------------------------------------------------------+ |
+ * +---+-------+--------------------------------------------------------------+
+ */
+
+#define GUC_CTB_HXG_MSG_MIN_LEN (GUC_CTB_MSG_MIN_LEN + GUC_HXG_MSG_MIN_LEN)
+#define GUC_CTB_HXG_MSG_MAX_LEN GUC_CTB_MSG_MAX_LEN
 /**
  * DOC: CTB based communication
@@ -61,28 +166,6 @@
  */
 /*
- * Describes single command transport buffer.
- * Used by both guc-master and clients.
- */
-struct guc_ct_buffer_desc {
-	u32 addr; /* gfx address */
-	u64 host_private; /* host private data */
-	u32 size; /* size in bytes */
-	u32 head; /* offset updated by GuC*/
-	u32 tail; /* offset updated by owner */
-	u32 is_in_error; /* error indicator */
-	u32 reserved1;
-	u32 reserved2;
-	u32 owner; /* id of the channel owner */
-	u32 owner_sub_id; /* owner-defined field for extra tracking */
-	u32 reserved[5];
-} __packed;
-
-/* Type of command transport buffer */
-#define INTEL_GUC_CT_BUFFER_TYPE_SEND 0x0u
-#define INTEL_GUC_CT_BUFFER_TYPE_RECV 0x1u
-
-/*
  * Definition of the command transport message header (DW0)
  *
  * bit[4..0] message len (in dwords)
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
index be066a62e9e0..bbf1ddb77434 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_mmio_abi.h
@@ -7,46 +7,43 @@
 #define _ABI_GUC_COMMUNICATION_MMIO_ABI_H
 /**
- * DOC: MMIO based communication
+ * DOC: GuC MMIO based communication
  *
- * The MMIO based communication between Host and GuC uses software scratch
- * registers, where first register holds data treated as message header,
- * and other registers are used to hold message payload.
+ * The MMIO based communication between Host and GuC relies on special
+ * hardware registers which format could be defined by the software
+ * (so called scratch registers).
  *
- * For Gen9+, GuC uses software scratch registers 0xC180-0xC1B8,
- * but no H2G command takes more than 8 parameters and the GuC FW
- * itself uses an 8-element array to store the H2G message.
+ * Each MMIO based message, both Host to GuC (H2G) and GuC to Host (G2H)
+ * messages, which maximum length depends on number of available scratch
+ * registers, is directly written into those scratch registers.
  *
- * +-----------+---------+---------+---------+
- * | MMIO[0] | MMIO[1] | ... | MMIO[n] |
- * +-----------+---------+---------+---------+
- * | header | optional payload |
- * +======+====+=========+=========+=========+
- * | 31:28|type| | | |
- * +------+----+ | | |
- * | 27:16|data| | | |
- * +------+----+ | | |
- * | 15:0|code| | | |
- * +------+----+---------+---------+---------+
+ * For Gen9+, there are 16 software scratch registers 0xC180-0xC1B8,
+ * but no H2G command takes more than 4 parameters and the GuC firmware
+ * itself uses an 4-element array to store the H2G message.
  *
- * The message header consists of:
+ * For Gen11+, there are additional 4 registers 0x190240-0x19024C, which
+ * are, regardless on lower count, preferred over legacy ones.
  *
- * - **type**, indicates message type
- * - **code**, indicates message code, is specific for **type**
- * - **data**, indicates message data, optional, depends on **code**
- *
- * The following message **types** are supported:
- *
- * - **REQUEST**, indicates Host-to-GuC request, requested GuC action code
- * must be priovided in **code** field. Optional action specific parameters
- * can be provided in remaining payload registers or **data** field.
- *
- * - **RESPONSE**, indicates GuC-to-Host response from earlier GuC request,
- * action response status will be provided in **code** field. Optional
- * response data can be returned in remaining payload registers or **data**
- * field.
+ *
+ * The MMIO based communication is mainly used during driver initialization
+ * phase to setup the `CTB based communication`_ that will be used afterwards.
  */
-#define GUC_MAX_MMIO_MSG_LEN 8
+#define GUC_MAX_MMIO_MSG_LEN 4
+
+/**
+ * DOC: MMIO HXG Message
+ *
+ * Format of the MMIO messages follows definitions of `HXG Message`_.
+ *
+ * +---+-------+--------------------------------------------------------------+
+ * | | Bits | Description |
+ * +===+=======+==============================================================+
+ * | 0 | 31:0 | +--------------------------------------------------------+ |
+ * +---+-------+ | | |
+ * |...| | | Embedded `HXG Message`_ | |
+ * +---+-------+ | | |
+ * | n | 31:0 | +--------------------------------------------------------+ |
+ * +---+-------+--------------------------------------------------------------+
+ */
 #endif /* _ABI_GUC_COMMUNICATION_MMIO_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
index 775e21f3058c..29ac823acd4c 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
@@ -6,6 +6,219 @@
 #ifndef _ABI_GUC_MESSAGES_ABI_H
 #define _ABI_GUC_MESSAGES_ABI_H
+/**
+ * DOC: HXG Message
+ *
+ * All messages exchanged with GuC are defined using 32 bit dwords.
+ * First dword is treated as a message header. Remaining dwords are optional.
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | | | | + * | 0 | 31 | **ORIGIN** - originator of the message | + * | | | - _`GUC_HXG_ORIGIN_HOST` = 0 | + * | | | - _`GUC_HXG_ORIGIN_GUC` = 1 | + * | | | | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | **TYPE** - message type | + * | | | - _`GUC_HXG_TYPE_REQUEST` = 0 | + * | | | - _`GUC_HXG_TYPE_EVENT` = 1 | + * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3 | + * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5 | + * | | | - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6 | + * | | | - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7 | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | **AUX** - auxiliary data (depends on TYPE) | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | | + * +---+-------+ | + * |...| | **PAYLOAD** - optional payload (depends on TYPE) | + * +---+-------+ | + * | n | 31:0 | | + * +---+-------+--------------------------------------------------------------+ + */ + +#define GUC_HXG_MSG_MIN_LEN 1u +#define GUC_HXG_MSG_0_ORIGIN (0x1 << 31) +#define GUC_HXG_ORIGIN_HOST 0u +#define GUC_HXG_ORIGIN_GUC 1u +#define GUC_HXG_MSG_0_TYPE (0x7 << 28) +#define GUC_HXG_TYPE_REQUEST 0u +#define GUC_HXG_TYPE_EVENT 1u +#define GUC_HXG_TYPE_NO_RESPONSE_BUSY 3u +#define GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u +#define GUC_HXG_TYPE_RESPONSE_FAILURE 6u +#define GUC_HXG_TYPE_RESPONSE_SUCCESS 7u +#define GUC_HXG_MSG_0_AUX (0xfffffff << 0) +#define GUC_HXG_MSG_n_PAYLOAD (0xffffffff << 0) + +/** + * DOC: HXG Request + * + * The `HXG Request`_ message should be used to initiate synchronous activity + * for which confirmation or return data is expected. 
+ * + * The recipient of this message shall use `HXG Response`_, `HXG Failure`_ + * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_ + * message as a intermediate reply. + * + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code. + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | **DATA0** - request data (depends on ACTION) | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | **ACTION** - requested action code | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | | + * +---+-------+ | + * |...| | **DATAn** - optional data (depends on ACTION) | + * +---+-------+ | + * | n | 31:0 | | + * +---+-------+--------------------------------------------------------------+ + */ + +#define GUC_HXG_REQUEST_MSG_MIN_LEN GUC_HXG_MSG_MIN_LEN +#define GUC_HXG_REQUEST_MSG_0_DATA0 (0xfff << 16) +#define GUC_HXG_REQUEST_MSG_0_ACTION (0xffff << 0) +#define GUC_HXG_REQUEST_MSG_n_DATAn GUC_HXG_MSG_n_PAYLOAD + +/** + * DOC: HXG Event + * + * The `HXG Event`_ message should be used to initiate asynchronous activity + * that does not involves immediate confirmation nor data. + * + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code. 
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_EVENT_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | **DATA0** - event data (depends on ACTION) | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | **ACTION** - event action code | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | | + * +---+-------+ | + * |...| | **DATAn** - optional event data (depends on ACTION) | + * +---+-------+ | + * | n | 31:0 | | + * +---+-------+--------------------------------------------------------------+ + */ + +#define GUC_HXG_EVENT_MSG_MIN_LEN GUC_HXG_MSG_MIN_LEN +#define GUC_HXG_EVENT_MSG_0_DATA0 (0xfff << 16) +#define GUC_HXG_EVENT_MSG_0_ACTION (0xffff << 0) +#define GUC_HXG_EVENT_MSG_n_DATAn GUC_HXG_MSG_n_PAYLOAD + +/** + * DOC: HXG Busy + * + * The `HXG Busy`_ message may be used to acknowledge reception of the `HXG Request`_ + * message if the recipient expects that it processing will be longer than default + * timeout. + * + * The @COUNTER field may be used as a progress indicator. 
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_BUSY_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | **COUNTER** - progress indicator | + * +---+-------+--------------------------------------------------------------+ + */ + +#define GUC_HXG_BUSY_MSG_LEN GUC_HXG_MSG_MIN_LEN +#define GUC_HXG_BUSY_MSG_0_COUNTER GUC_HXG_MSG_0_AUX + +/** + * DOC: HXG Retry + * + * The `HXG Retry`_ message should be used by recipient to indicate that the + * `HXG Request`_ message was dropped and it should be resent again. + * + * The @REASON field may be used to provide additional information. + * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_NO_RESPONSE_RETRY_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | **REASON** - reason for retry | + * | | | - _`GUC_HXG_RETRY_REASON_UNSPECIFIED` = 0 | + * +---+-------+--------------------------------------------------------------+ + */ + +#define GUC_HXG_RETRY_MSG_LEN GUC_HXG_MSG_MIN_LEN +#define GUC_HXG_RETRY_MSG_0_REASON GUC_HXG_MSG_0_AUX +#define GUC_HXG_RETRY_REASON_UNSPECIFIED 0u + +/** + * DOC: HXG Failure + * + * The `HXG Failure`_ message shall be used as a reply to the `HXG Request`_ + * message that could not be processed due to an error. 
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_FAILURE_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:16 | **HINT** - additional error hint | + * | +-------+--------------------------------------------------------------+ + * | | 15:0 | **ERROR** - error/result code | + * +---+-------+--------------------------------------------------------------+ + */ + +#define GUC_HXG_FAILURE_MSG_LEN GUC_HXG_MSG_MIN_LEN +#define GUC_HXG_FAILURE_MSG_0_HINT (0xfff << 16) +#define GUC_HXG_FAILURE_MSG_0_ERROR (0xffff << 0) + +/** + * DOC: HXG Response + * + * The `HXG Response`_ message shall be used as a reply to the `HXG Request`_ + * message that was successfully processed without an error. 
+ * + * +---+-------+--------------------------------------------------------------+ + * | | Bits | Description | + * +===+=======+==============================================================+ + * | 0 | 31 | ORIGIN | + * | +-------+--------------------------------------------------------------+ + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | + * | +-------+--------------------------------------------------------------+ + * | | 27:0 | **DATA0** - data (depends on ACTION from `HXG Request`_) | + * +---+-------+--------------------------------------------------------------+ + * | 1 | 31:0 | | + * +---+-------+ | + * |...| | **DATAn** - data (depends on ACTION from `HXG Request`_) | + * +---+-------+ | + * | n | 31:0 | | + * +---+-------+--------------------------------------------------------------+ + */ + +#define GUC_HXG_RESPONSE_MSG_MIN_LEN GUC_HXG_MSG_MIN_LEN +#define GUC_HXG_RESPONSE_MSG_0_DATA0 GUC_HXG_MSG_0_AUX +#define GUC_HXG_RESPONSE_MSG_n_DATAn GUC_HXG_MSG_n_PAYLOAD + +/* deprecated */ #define INTEL_GUC_MSG_TYPE_SHIFT 28 #define INTEL_GUC_MSG_TYPE_MASK (0xF << INTEL_GUC_MSG_TYPE_SHIFT) #define INTEL_GUC_MSG_DATA_SHIFT 16 diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index f147cb389a20..fbfcae727d7f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -7,6 +7,7 @@ #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm_irq.h" #include "intel_guc.h" +#include "intel_guc_slpc.h" #include "intel_guc_ads.h" #include "intel_guc_submission.h" #include "i915_drv.h" @@ -157,6 +158,8 @@ void intel_guc_init_early(struct intel_guc *guc) intel_guc_ct_init_early(&guc->ct); intel_guc_log_init_early(&guc->log); intel_guc_submission_init_early(guc); + intel_guc_slpc_init_early(&guc->slpc); + intel_guc_rc_init_early(guc); mutex_init(&guc->send_mutex); spin_lock_init(&guc->irq_lock); @@ -180,6 +183,11 @@ void intel_guc_init_early(struct intel_guc *guc) } } +void 
intel_guc_init_late(struct intel_guc *guc) +{ + intel_guc_ads_init_late(guc); +} + static u32 guc_ctl_debug_flags(struct intel_guc *guc) { u32 level = intel_guc_log_get_level(&guc->log); @@ -201,6 +209,9 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc) if (!intel_guc_submission_is_used(guc)) flags |= GUC_CTL_DISABLE_SCHEDULER; + if (intel_guc_slpc_is_used(guc)) + flags |= GUC_CTL_ENABLE_SLPC; + return flags; } @@ -219,24 +230,19 @@ static u32 guc_ctl_log_params_flags(struct intel_guc *guc) BUILD_BUG_ON(!CRASH_BUFFER_SIZE); BUILD_BUG_ON(!IS_ALIGNED(CRASH_BUFFER_SIZE, UNIT)); - BUILD_BUG_ON(!DPC_BUFFER_SIZE); - BUILD_BUG_ON(!IS_ALIGNED(DPC_BUFFER_SIZE, UNIT)); - BUILD_BUG_ON(!ISR_BUFFER_SIZE); - BUILD_BUG_ON(!IS_ALIGNED(ISR_BUFFER_SIZE, UNIT)); + BUILD_BUG_ON(!DEBUG_BUFFER_SIZE); + BUILD_BUG_ON(!IS_ALIGNED(DEBUG_BUFFER_SIZE, UNIT)); BUILD_BUG_ON((CRASH_BUFFER_SIZE / UNIT - 1) > (GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT)); - BUILD_BUG_ON((DPC_BUFFER_SIZE / UNIT - 1) > - (GUC_LOG_DPC_MASK >> GUC_LOG_DPC_SHIFT)); - BUILD_BUG_ON((ISR_BUFFER_SIZE / UNIT - 1) > - (GUC_LOG_ISR_MASK >> GUC_LOG_ISR_SHIFT)); + BUILD_BUG_ON((DEBUG_BUFFER_SIZE / UNIT - 1) > + (GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT)); flags = GUC_LOG_VALID | GUC_LOG_NOTIFY_ON_HALF_FULL | FLAG | ((CRASH_BUFFER_SIZE / UNIT - 1) << GUC_LOG_CRASH_SHIFT) | - ((DPC_BUFFER_SIZE / UNIT - 1) << GUC_LOG_DPC_SHIFT) | - ((ISR_BUFFER_SIZE / UNIT - 1) << GUC_LOG_ISR_SHIFT) | + ((DEBUG_BUFFER_SIZE / UNIT - 1) << GUC_LOG_DEBUG_SHIFT) | (offset << GUC_LOG_BUF_ADDR_SHIFT); #undef UNIT @@ -331,6 +337,12 @@ int intel_guc_init(struct intel_guc *guc) goto err_ct; } + if (intel_guc_slpc_is_used(guc)) { + ret = intel_guc_slpc_init(&guc->slpc); + if (ret) + goto err_submission; + } + /* now that everything is perma-pinned, initialize the parameters */ guc_init_params(guc); @@ -341,6 +353,8 @@ int intel_guc_init(struct intel_guc *guc) return 0; +err_submission: + intel_guc_submission_fini(guc); err_ct: 
intel_guc_ct_fini(&guc->ct); err_ads: @@ -363,6 +377,9 @@ void intel_guc_fini(struct intel_guc *guc) i915_ggtt_disable_guc(gt->ggtt); + if (intel_guc_slpc_is_used(guc)) + intel_guc_slpc_fini(&guc->slpc); + if (intel_guc_submission_is_used(guc)) intel_guc_submission_fini(guc); @@ -376,29 +393,27 @@ void intel_guc_fini(struct intel_guc *guc) /* * This function implements the MMIO based host to GuC interface. */ -int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len, +int intel_guc_send_mmio(struct intel_guc *guc, const u32 *request, u32 len, u32 *response_buf, u32 response_buf_size) { + struct drm_i915_private *i915 = guc_to_gt(guc)->i915; struct intel_uncore *uncore = guc_to_gt(guc)->uncore; - u32 status; + u32 header; int i; int ret; GEM_BUG_ON(!len); GEM_BUG_ON(len > guc->send_regs.count); - /* We expect only action code */ - GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK); - - /* If CT is available, we expect to use MMIO only during init/fini */ - GEM_BUG_ON(*action != INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER && - *action != INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER); + GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, request[0]) != GUC_HXG_ORIGIN_HOST); + GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, request[0]) != GUC_HXG_TYPE_REQUEST); mutex_lock(&guc->send_mutex); intel_uncore_forcewake_get(uncore, guc->send_regs.fw_domains); +retry: for (i = 0; i < len; i++) - intel_uncore_write(uncore, guc_send_reg(guc, i), action[i]); + intel_uncore_write(uncore, guc_send_reg(guc, i), request[i]); intel_uncore_posting_read(uncore, guc_send_reg(guc, i - 1)); @@ -410,30 +425,74 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action, u32 len, */ ret = __intel_wait_for_register_fw(uncore, guc_send_reg(guc, 0), - INTEL_GUC_MSG_TYPE_MASK, - INTEL_GUC_MSG_TYPE_RESPONSE << - INTEL_GUC_MSG_TYPE_SHIFT, - 10, 10, &status); - /* If GuC explicitly returned an error, convert it to -EIO */ - if (!ret && !INTEL_GUC_MSG_IS_RESPONSE_SUCCESS(status)) 
- ret = -EIO; + GUC_HXG_MSG_0_ORIGIN, + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, + GUC_HXG_ORIGIN_GUC), + 10, 10, &header); + if (unlikely(ret)) { +timeout: + drm_err(&i915->drm, "mmio request %#x: no reply %x\n", + request[0], header); + goto out; + } - if (ret) { - DRM_ERROR("MMIO: GuC action %#x failed with error %d %#x\n", - action[0], ret, status); + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == GUC_HXG_TYPE_NO_RESPONSE_BUSY) { +#define done ({ header = intel_uncore_read(uncore, guc_send_reg(guc, 0)); \ + FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) != GUC_HXG_ORIGIN_GUC || \ + FIELD_GET(GUC_HXG_MSG_0_TYPE, header) != GUC_HXG_TYPE_NO_RESPONSE_BUSY; }) + + ret = wait_for(done, 1000); + if (unlikely(ret)) + goto timeout; + if (unlikely(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, header) != + GUC_HXG_ORIGIN_GUC)) + goto proto; +#undef done + } + + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == GUC_HXG_TYPE_NO_RESPONSE_RETRY) { + u32 reason = FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, header); + + drm_dbg(&i915->drm, "mmio request %#x: retrying, reason %u\n", + request[0], reason); + goto retry; + } + + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == GUC_HXG_TYPE_RESPONSE_FAILURE) { + u32 hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, header); + u32 error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, header); + + drm_err(&i915->drm, "mmio request %#x: failure %x/%u\n", + request[0], error, hint); + ret = -ENXIO; + goto out; + } + + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) != GUC_HXG_TYPE_RESPONSE_SUCCESS) { +proto: + drm_err(&i915->drm, "mmio request %#x: unexpected reply %#x\n", + request[0], header); + ret = -EPROTO; goto out; } if (response_buf) { - int count = min(response_buf_size, guc->send_regs.count - 1); + int count = min(response_buf_size, guc->send_regs.count); + + GEM_BUG_ON(!count); - for (i = 0; i < count; i++) + response_buf[0] = header; + + for (i = 1; i < count; i++) response_buf[i] = intel_uncore_read(uncore, - guc_send_reg(guc, i + 1)); - } + guc_send_reg(guc, i)); - /* Use data from 
the GuC response as our return value */ - ret = INTEL_GUC_MSG_TO_DATA(status); + /* Use number of copied dwords as our return value */ + ret = count; + } else { + /* Use data from the GuC response as our return value */ + ret = FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, header); + } out: intel_uncore_forcewake_put(uncore, guc->send_regs.fw_domains); @@ -487,65 +546,35 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset) */ int intel_guc_suspend(struct intel_guc *guc) { - struct intel_uncore *uncore = guc_to_gt(guc)->uncore; int ret; - u32 status; u32 action[] = { - INTEL_GUC_ACTION_ENTER_S_STATE, - GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */ + INTEL_GUC_ACTION_RESET_CLIENT, }; - /* - * If GuC communication is enabled but submission is not supported, - * we do not need to suspend the GuC. - */ - if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc)) + if (!intel_guc_is_ready(guc)) return 0; - /* - * The ENTER_S_STATE action queues the save/restore operation in GuC FW - * and then returns, so waiting on the H2G is not enough to guarantee - * GuC is done. When all the processing is done, GuC writes - * INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll - * on that. Note that GuC does not ensure that the value in the register - * is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is - * in progress so we need to take care of that ourselves as well. - */ - - intel_uncore_write(uncore, SOFT_SCRATCH(14), - INTEL_GUC_SLEEP_STATE_INVALID_MASK); - - ret = intel_guc_send(guc, action, ARRAY_SIZE(action)); - if (ret) - return ret; - - ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14), - INTEL_GUC_SLEEP_STATE_INVALID_MASK, - 0, 0, 10, &status); - if (ret) - return ret; - - if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) { - DRM_ERROR("GuC failed to change sleep state. 
" - "action=0x%x, err=%u\n", - action[0], status); - return -EIO; + if (intel_guc_submission_is_used(guc)) { + /* + * This H2G MMIO command tears down the GuC in two steps. First it will + * generate a G2H CTB for every active context indicating a reset. In + * practice the i915 shouldn't ever get a G2H as suspend should only be + * called when the GPU is idle. Next, it tears down the CTBs and this + * H2G MMIO command completes. + * + * Don't abort on a failure code from the GuC. Keep going and do the + * clean up in santize() and re-initialisation on resume and hopefully + * the error here won't be problematic. + */ + ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0); + if (ret) + DRM_ERROR("GuC suspend: RESET_CLIENT action failed with error %d!\n", ret); } - return 0; -} - -/** - * intel_guc_reset_engine() - ask GuC to reset an engine - * @guc: intel_guc structure - * @engine: engine to be reset - */ -int intel_guc_reset_engine(struct intel_guc *guc, - struct intel_engine_cs *engine) -{ - /* XXX: to be implemented with submission interface rework */ + /* Signal that the GuC isn't running. */ + intel_guc_sanitize(guc); - return -ENODEV; + return 0; } /** @@ -554,7 +583,12 @@ int intel_guc_reset_engine(struct intel_guc *guc, */ int intel_guc_resume(struct intel_guc *guc) { - /* XXX: to be implemented with submission interface rework */ + /* + * NB: This function can still be called even if GuC submission is + * disabled, e.g. if GuC is enabled for HuC authentication only. Thus, + * if any code is later added here, it must be support doing nothing + * if submission is disabled (as per intel_guc_suspend). 
+ */ return 0; } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..2e27fe59786b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -6,12 +6,16 @@ #ifndef _INTEL_GUC_H_ #define _INTEL_GUC_H_ +#include <linux/xarray.h> +#include <linux/delay.h> + #include "intel_uncore.h" #include "intel_guc_fw.h" #include "intel_guc_fwif.h" #include "intel_guc_ct.h" #include "intel_guc_log.h" #include "intel_guc_reg.h" +#include "intel_guc_slpc_types.h" #include "intel_uc_fw.h" #include "i915_utils.h" #include "i915_vma.h" @@ -27,24 +31,47 @@ struct intel_guc { struct intel_uc_fw fw; struct intel_guc_log log; struct intel_guc_ct ct; + struct intel_guc_slpc slpc; + + /* Global engine used to submit requests to GuC */ + struct i915_sched_engine *sched_engine; + struct i915_request *stalled_request; /* intel_guc_recv interrupt related state */ spinlock_t irq_lock; unsigned int msg_enabled_mask; + atomic_t outstanding_submission_g2h; + struct { void (*reset)(struct intel_guc *guc); void (*enable)(struct intel_guc *guc); void (*disable)(struct intel_guc *guc); } interrupts; + /* + * contexts_lock protects the pool of free guc ids and a linked list of + * guc ids available to be stolen + */ + spinlock_t contexts_lock; + struct ida guc_ids; + struct list_head guc_id_list; + + bool submission_supported; bool submission_selected; + bool rc_supported; + bool rc_selected; struct i915_vma *ads_vma; struct __guc_ads_blob *ads_blob; + u32 ads_regset_size; + u32 ads_golden_ctxt_size; - struct i915_vma *stage_desc_pool; - void *stage_desc_pool_vaddr; + struct i915_vma *lrc_desc_pool; + void *lrc_desc_pool_vaddr; + + /* guc_id to intel_context lookup */ + struct xarray context_lookup; /* Control params for fw initialization */ u32 params[GUC_CTL_MAX_DWORDS]; @@ -74,7 +101,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct 
intel_guc *guc, const u32 *action, u32 len) { - return intel_guc_ct_send(&guc->ct, action, len, NULL, 0); + return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0); +} + +static +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len, + u32 g2h_len_dw) +{ + return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, + MAKE_SEND_FLAGS(g2h_len_dw)); } static inline int @@ -82,7 +117,43 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len, - response_buf, response_buf_size); + response_buf, response_buf_size, 0); +} + +static inline int intel_guc_send_busy_loop(struct intel_guc *guc, + const u32 *action, + u32 len, + u32 g2h_len_dw, + bool loop) +{ + int err; + unsigned int sleep_period_ms = 1; + bool not_atomic = !in_atomic() && !irqs_disabled(); + + /* + * FIXME: Have caller pass in if we are in an atomic context to avoid + * using in_atomic(). It is likely safe here as we check for irqs + * disabled which basically all the spin locks in the i915 do but + * regardless this should be cleaned up. 
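The retry policy of intel_guc_send_busy_loop() is: resend on -EBUSY, sleeping 1 ms, then 2 ms, 4 ms and so on when sleeping is allowed, spinning otherwise, and giving up with -EINTR if a sleep is interrupted. A minimal userspace sketch of that loop follows, assuming caller-supplied `send` and `sleep_ms` callbacks in place of intel_guc_send_nb() and msleep_interruptible(); the callback types, `send_busy_loop()` and the fake GuC that drives it are all hypothetical. Like the kernel helper, it has no retry cap, so it relies on the sender eventually returning something other than -EBUSY.

```c
#include <assert.h>
#include <errno.h>

typedef int (*send_fn)(void *ctx);
/* Returns nonzero if the sleep was interrupted, like msleep_interruptible(). */
typedef int (*sleep_fn)(void *ctx, unsigned int ms);

static int send_busy_loop(send_fn send, sleep_fn sleep_ms, void *ctx,
			  int loop, int can_sleep)
{
	unsigned int sleep_period_ms = 1;
	int err;

	for (;;) {
		err = send(ctx);
		if (err != -EBUSY || !loop)
			return err;

		if (can_sleep) {
			if (sleep_ms(ctx, sleep_period_ms))
				return -EINTR;
			sleep_period_ms <<= 1;	/* exponential backoff */
		}
		/* Otherwise retry at once (the kernel spins with cpu_relax()). */
	}
}

/* Hypothetical fake GuC: reports -EBUSY busy_left times, then succeeds. */
struct fake_guc {
	int busy_left;
	int interrupt;			/* every sleep reports an interruption */
	unsigned int slept[16];		/* requested sleep periods, in ms */
	int nsleeps;
};

static int fake_send(void *ctx)
{
	struct fake_guc *f = ctx;

	if (f->busy_left > 0) {
		f->busy_left--;
		return -EBUSY;
	}
	return 0;
}

static int fake_sleep(void *ctx, unsigned int ms)
{
	struct fake_guc *f = ctx;

	if (f->nsleeps < 16)
		f->slept[f->nsleeps] = ms;
	f->nsleeps++;
	return f->interrupt;
}
```

With three busy replies the sender sleeps for 1, 2 and 4 ms before the fourth attempt succeeds; with `loop` false the first -EBUSY is returned to the caller unchanged.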
+ */ + + /* No sleeping with spin locks, just busy loop */ + might_sleep_if(loop && not_atomic); + +retry: + err = intel_guc_send_nb(guc, action, len, g2h_len_dw); + if (unlikely(err == -EBUSY && loop)) { + if (likely(not_atomic)) { + if (msleep_interruptible(sleep_period_ms)) + return -EINTR; + sleep_period_ms = sleep_period_ms << 1; + } else { + cpu_relax(); + } + goto retry; + } + + return err; } static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) @@ -118,6 +189,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc, } void intel_guc_init_early(struct intel_guc *guc); +void intel_guc_init_late(struct intel_guc *guc); void intel_guc_init_send_regs(struct intel_guc *guc); void intel_guc_write_params(struct intel_guc *guc); int intel_guc_init(struct intel_guc *guc); @@ -160,9 +232,25 @@ static inline bool intel_guc_is_ready(struct intel_guc *guc) return intel_guc_is_fw_running(guc) && intel_guc_ct_enabled(&guc->ct); } +static inline void intel_guc_reset_interrupts(struct intel_guc *guc) +{ + guc->interrupts.reset(guc); +} + +static inline void intel_guc_enable_interrupts(struct intel_guc *guc) +{ + guc->interrupts.enable(guc); +} + +static inline void intel_guc_disable_interrupts(struct intel_guc *guc) +{ + guc->interrupts.disable(guc); +} + static inline int intel_guc_sanitize(struct intel_guc *guc) { intel_uc_fw_sanitize(&guc->fw); + intel_guc_disable_interrupts(guc); intel_guc_ct_sanitize(&guc->ct); guc->mmio_msg = 0; @@ -183,8 +271,27 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask) spin_unlock_irq(&guc->irq_lock); } -int intel_guc_reset_engine(struct intel_guc *guc, - struct intel_engine_cs *engine); +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout); + +int intel_guc_deregister_done_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len); +int intel_guc_sched_done_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len); +int intel_guc_context_reset_process_msg(struct 
intel_guc *guc, + const u32 *msg, u32 len); +int intel_guc_engine_failure_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len); + +void intel_guc_find_hung_context(struct intel_engine_cs *engine); + +int intel_guc_global_policies_update(struct intel_guc *guc); + +void intel_guc_context_ban(struct intel_context *ce, struct i915_request *rq); + +void intel_guc_submission_reset_prepare(struct intel_guc *guc); +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); +void intel_guc_submission_reset_finish(struct intel_guc *guc); +void intel_guc_submission_cancel_requests(struct intel_guc *guc); void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c index 9abfbc6edbd6..6926919bcac6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c @@ -3,8 +3,11 @@ * Copyright © 2014-2019 Intel Corporation */ +#include <linux/bsearch.h> + #include "gt/intel_gt.h" #include "gt/intel_lrc.h" +#include "gt/shmem_utils.h" #include "intel_guc_ads.h" #include "intel_guc_fwif.h" #include "intel_uc.h" @@ -23,10 +26,15 @@ * | guc_policies | * +---------------------------------------+ * | guc_gt_system_info | - * +---------------------------------------+ - * | guc_clients_info | - * +---------------------------------------+ - * | guc_ct_pool_entry[size] | + * +---------------------------------------+ <== static + * | guc_mmio_reg[countA] (engine 0.0) | + * | guc_mmio_reg[countB] (engine 0.1) | + * | guc_mmio_reg[countC] (engine 1.0) | + * | ... 
| + * +---------------------------------------+ <== dynamic + * | padding | + * +---------------------------------------+ <== 4K aligned + * | golden contexts | * +---------------------------------------+ * | padding | * +---------------------------------------+ <== 4K aligned @@ -39,18 +47,49 @@ struct __guc_ads_blob { struct guc_ads ads; struct guc_policies policies; struct guc_gt_system_info system_info; - struct guc_clients_info clients_info; - struct guc_ct_pool_entry ct_pool[GUC_CT_POOL_SIZE]; + /* From here on, location is dynamic! Refer to above diagram. */ + struct guc_mmio_reg regset[0]; } __packed; +static u32 guc_ads_regset_size(struct intel_guc *guc) +{ + GEM_BUG_ON(!guc->ads_regset_size); + return guc->ads_regset_size; +} + +static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc) +{ + return PAGE_ALIGN(guc->ads_golden_ctxt_size); +} + static u32 guc_ads_private_data_size(struct intel_guc *guc) { return PAGE_ALIGN(guc->fw.private_data_size); } +static u32 guc_ads_regset_offset(struct intel_guc *guc) +{ + return offsetof(struct __guc_ads_blob, regset); +} + +static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc) +{ + u32 offset; + + offset = guc_ads_regset_offset(guc) + + guc_ads_regset_size(guc); + + return PAGE_ALIGN(offset); +} + static u32 guc_ads_private_data_offset(struct intel_guc *guc) { - return PAGE_ALIGN(sizeof(struct __guc_ads_blob)); + u32 offset; + + offset = guc_ads_golden_ctxt_offset(guc) + + guc_ads_golden_ctxt_size(guc); + + return PAGE_ALIGN(offset); } static u32 guc_ads_blob_size(struct intel_guc *guc) @@ -59,36 +98,66 @@ static u32 guc_ads_blob_size(struct intel_guc *guc) guc_ads_private_data_size(guc); } -static void guc_policy_init(struct guc_policy *policy) +static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies) { - policy->execution_quantum = POLICY_DEFAULT_EXECUTION_QUANTUM_US; - policy->preemption_time = POLICY_DEFAULT_PREEMPTION_TIME_US; - policy->fault_time = 
POLICY_DEFAULT_FAULT_TIME_US; - policy->policy_flags = 0; + struct intel_gt *gt = guc_to_gt(guc); + struct drm_i915_private *i915 = gt->i915; + + policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US; + policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI; + + policies->global_flags = 0; + if (i915->params.reset < 2) + policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET; + + policies->is_valid = 1; } -static void guc_policies_init(struct guc_policies *policies) +void intel_guc_ads_print_policy_info(struct intel_guc *guc, + struct drm_printer *dp) { - struct guc_policy *policy; - u32 p, i; + struct __guc_ads_blob *blob = guc->ads_blob; - policies->dpc_promote_time = POLICY_DEFAULT_DPC_PROMOTE_TIME_US; - policies->max_num_work_items = POLICY_MAX_NUM_WI; + if (unlikely(!blob)) + return; - for (p = 0; p < GUC_CLIENT_PRIORITY_NUM; p++) { - for (i = 0; i < GUC_MAX_ENGINE_CLASSES; i++) { - policy = &policies->policy[p][i]; + drm_printf(dp, "Global scheduling policies:\n"); + drm_printf(dp, " DPC promote time = %u\n", blob->policies.dpc_promote_time); + drm_printf(dp, " Max num work items = %u\n", blob->policies.max_num_work_items); + drm_printf(dp, " Flags = %u\n", blob->policies.global_flags); +} - guc_policy_init(policy); - } - } +static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset) +{ + u32 action[] = { + INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE, + policy_offset + }; - policies->is_valid = 1; + return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true); } -static void guc_ct_pool_entries_init(struct guc_ct_pool_entry *pool, u32 num) +int intel_guc_global_policies_update(struct intel_guc *guc) { - memset(pool, 0, num * sizeof(*pool)); + struct __guc_ads_blob *blob = guc->ads_blob; + struct intel_gt *gt = guc_to_gt(guc); + intel_wakeref_t wakeref; + int ret; + + if (!blob) + return -EOPNOTSUPP; + + GEM_BUG_ON(!blob->ads.scheduler_policies); + + guc_policies_init(guc, &blob->policies); + + if 
(!intel_guc_is_ready(guc)) + return 0; + + with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref) + ret = guc_action_policies_update(guc, blob->ads.scheduler_policies); + + return ret; } static void guc_mapping_table_init(struct intel_gt *gt, @@ -113,53 +182,324 @@ static void guc_mapping_table_init(struct intel_gt *gt, } /* - * The first 80 dwords of the register state context, containing the - * execlists and ppgtt registers. + * The save/restore register list must be pre-calculated to a temporary + * buffer of driver defined size before it can be generated in place + * inside the ADS. */ -#define LR_HW_CONTEXT_SIZE (80 * sizeof(u32)) +#define MAX_MMIO_REGS 128 /* Arbitrary size, increase as needed */ +struct temp_regset { + struct guc_mmio_reg *registers; + u32 used; + u32 size; +}; -static void __guc_ads_init(struct intel_guc *guc) +static int guc_mmio_reg_cmp(const void *a, const void *b) +{ + const struct guc_mmio_reg *ra = a; + const struct guc_mmio_reg *rb = b; + + return (int)ra->offset - (int)rb->offset; +} + +static void guc_mmio_reg_add(struct temp_regset *regset, + u32 offset, u32 flags) +{ + u32 count = regset->used; + struct guc_mmio_reg reg = { + .offset = offset, + .flags = flags, + }; + struct guc_mmio_reg *slot; + + GEM_BUG_ON(count >= regset->size); + + /* + * The mmio list is built using separate lists within the driver. + * It's possible that at some point we may attempt to add the same + * register more than once. Do not consider this an error; silently + * move on if the register is already in the list.
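The insertion strategy of guc_mmio_reg_add() keeps the temporary list sorted by offset, so bsearch() can reject a duplicate before anything is appended, and the new entry is then swapped down into place. A self-contained userspace sketch of the same idea follows; `struct mmio_reg`, `struct regset` and `regset_add()` are hypothetical stand-ins for the driver's guc_mmio_reg and temp_regset, and a full list returns -1 here instead of triggering GEM_BUG_ON(). The comparator uses a three-way comparison rather than the kernel's `(int)ra->offset - (int)rb->offset`; the two agree whenever that subtraction cannot overflow, which holds for register offsets of this size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct mmio_reg {
	uint32_t offset;
	uint32_t flags;
};

struct regset {
	struct mmio_reg *registers;	/* kept sorted by offset */
	uint32_t used;
	uint32_t size;
};

static int mmio_reg_cmp(const void *a, const void *b)
{
	const struct mmio_reg *ra = a, *rb = b;

	return (ra->offset > rb->offset) - (ra->offset < rb->offset);
}

/* Returns 1 if added, 0 if the offset was already present, -1 if full. */
static int regset_add(struct regset *set, uint32_t offset, uint32_t flags)
{
	struct mmio_reg reg = { offset, flags };
	struct mmio_reg *slot;

	/* The list is sorted, so a binary search finds duplicates. */
	if (bsearch(&reg, set->registers, set->used, sizeof(reg), mmio_reg_cmp))
		return 0;
	if (set->used >= set->size)
		return -1;

	slot = &set->registers[set->used++];
	*slot = reg;

	/* Swap the new entry down until the array is sorted again. */
	while (slot > set->registers && slot[-1].offset > slot[0].offset) {
		struct mmio_reg tmp = slot[-1];

		slot[-1] = slot[0];
		slot[0] = tmp;
		slot--;
	}
	return 1;
}
```

As in the driver, a duplicate keeps the flags of its first insertion; each insert costs O(log n) for the lookup plus O(n) swaps in the worst case, which is cheap for a list capped at a small fixed size.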
+ */ + if (bsearch(&reg, regset->registers, count, + sizeof(reg), guc_mmio_reg_cmp)) + return; + + slot = &regset->registers[count]; + regset->used++; + *slot = reg; + + while (slot-- > regset->registers) { + GEM_BUG_ON(slot[0].offset == slot[1].offset); + if (slot[1].offset > slot[0].offset) + break; + + swap(slot[1], slot[0]); + } +} + +#define GUC_MMIO_REG_ADD(regset, reg, masked) \ + guc_mmio_reg_add(regset, \ + i915_mmio_reg_offset((reg)), \ + (masked) ? GUC_REGSET_MASKED : 0) + +static void guc_mmio_regset_init(struct temp_regset *regset, + struct intel_engine_cs *engine) +{ + const u32 base = engine->mmio_base; + struct i915_wa_list *wal = &engine->wa_list; + struct i915_wa *wa; + unsigned int i; + + regset->used = 0; + + GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true); + GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false); + GUC_MMIO_REG_ADD(regset, RING_IMR(base), false); + + for (i = 0, wa = wal->list; i < wal->count; i++, wa++) + GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg); + + /* Be extra paranoid and include all whitelist registers. */ + for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++) + GUC_MMIO_REG_ADD(regset, + RING_FORCE_TO_NONPRIV(base, i), + false); + + /* add in local MOCS registers */ + for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++) + GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false); +} + +static int guc_mmio_reg_state_query(struct intel_guc *guc) { struct intel_gt *gt = guc_to_gt(guc); - struct drm_i915_private *i915 = gt->i915; + struct intel_engine_cs *engine; + enum intel_engine_id id; + struct temp_regset temp_set; + u32 total; + + /* + * Need to actually build the list in order to filter out + * duplicates and other such data dependent constructions.
+ */ + temp_set.size = MAX_MMIO_REGS; + temp_set.registers = kmalloc_array(temp_set.size, + sizeof(*temp_set.registers), + GFP_KERNEL); + if (!temp_set.registers) + return -ENOMEM; + + total = 0; + for_each_engine(engine, gt, id) { + guc_mmio_regset_init(&temp_set, engine); + total += temp_set.used; + } + + kfree(temp_set.registers); + + return total * sizeof(struct guc_mmio_reg); +} + +static void guc_mmio_reg_state_init(struct intel_guc *guc, + struct __guc_ads_blob *blob) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct intel_engine_cs *engine; + enum intel_engine_id id; + struct temp_regset temp_set; + struct guc_mmio_reg_set *ads_reg_set; + u32 addr_ggtt, offset; + u8 guc_class; + + offset = guc_ads_regset_offset(guc); + addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset; + temp_set.registers = (struct guc_mmio_reg *)(((u8 *)blob) + offset); + temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]); + + for_each_engine(engine, gt, id) { + /* Class index is checked in class converter */ + GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS); + + guc_class = engine_class_to_guc_class(engine->class); + ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance]; + + guc_mmio_regset_init(&temp_set, engine); + if (!temp_set.used) { + ads_reg_set->address = 0; + ads_reg_set->count = 0; + continue; + } + + ads_reg_set->address = addr_ggtt; + ads_reg_set->count = temp_set.used; + + temp_set.size -= temp_set.used; + temp_set.registers += temp_set.used; + addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg); + } + + GEM_BUG_ON(temp_set.size); +} + +static void fill_engine_enable_masks(struct intel_gt *gt, + struct guc_gt_system_info *info) +{ + info->engine_enabled_masks[GUC_RENDER_CLASS] = 1; + info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1; + info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt); + info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt); +} + +static int 
guc_prep_golden_context(struct intel_guc *guc, + struct __guc_ads_blob *blob) +{ + struct intel_gt *gt = guc_to_gt(guc); + u32 addr_ggtt, offset; + u32 total_size = 0, alloc_size, real_size; + u8 engine_class, guc_class; + struct guc_gt_system_info *info, local_info; + + /* + * Reserve the memory for the golden contexts and point GuC at it but + * leave it empty for now. The context data will be filled in later + * once there is something available to put there. + * + * Note that the HWSP and ring context are not included. + * + * Note also that the storage must be pinned in the GGTT, so that the + * address won't change after GuC has been told where to find it. The + * GuC will also validate that the LRC base + size fall within the + * allowed GGTT range. + */ + if (blob) { + offset = guc_ads_golden_ctxt_offset(guc); + addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset; + info = &blob->system_info; + } else { + memset(&local_info, 0, sizeof(local_info)); + info = &local_info; + fill_engine_enable_masks(gt, info); + } + + for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) { + if (engine_class == OTHER_CLASS) + continue; + + guc_class = engine_class_to_guc_class(engine_class); + + if (!info->engine_enabled_masks[guc_class]) + continue; + + real_size = intel_engine_context_size(gt, engine_class); + alloc_size = PAGE_ALIGN(real_size); + total_size += alloc_size; + + if (!blob) + continue; + + blob->ads.eng_state_size[guc_class] = real_size; + blob->ads.golden_context_lrca[guc_class] = addr_ggtt; + addr_ggtt += alloc_size; + } + + if (!blob) + return total_size; + + GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size); + return total_size; +} + +static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) { + if (engine->class != engine_class) + continue; + + if (!engine->default_state) + continue; + + return 
engine; + } + + return NULL; +} + +static void guc_init_golden_context(struct intel_guc *guc) +{ struct __guc_ads_blob *blob = guc->ads_blob; - const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE; - u32 base; + struct intel_engine_cs *engine; + struct intel_gt *gt = guc_to_gt(guc); + u32 addr_ggtt, offset; + u32 total_size = 0, alloc_size, real_size; u8 engine_class, guc_class; + u8 *ptr; - /* GuC scheduling policies */ - guc_policies_init(&blob->policies); + /* Skip execlist and PPGTT registers + HWSP */ + const u32 lr_hw_context_size = 80 * sizeof(u32); + const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE + + lr_hw_context_size; + + if (!intel_uc_uses_guc_submission(&gt->uc)) + return; + + GEM_BUG_ON(!blob); /* - * GuC expects a per-engine-class context image and size - * (minus hwsp and ring context). The context image will be - * used to reinitialize engines after a reset. It must exist - * and be pinned in the GGTT, so that the address won't change after - * we have told GuC where to find it. The context size will be used - * to validate that the LRC base + size fall within allowed GGTT. + * Go back and fill in the golden context data now that it is + * available. */ + offset = guc_ads_golden_ctxt_offset(guc); + addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset; + ptr = ((u8 *)blob) + offset; + for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) { if (engine_class == OTHER_CLASS) continue; guc_class = engine_class_to_guc_class(engine_class); - /* - * TODO: Set context pointer to default state to allow - * GuC to re-init guilty contexts after internal reset.
- */ - blob->ads.golden_context_lrca[guc_class] = 0; - blob->ads.eng_state_size[guc_class] = - intel_engine_context_size(guc_to_gt(guc), - engine_class) - - skipped_size; + if (!blob->system_info.engine_enabled_masks[guc_class]) + continue; + + real_size = intel_engine_context_size(gt, engine_class); + alloc_size = PAGE_ALIGN(real_size); + total_size += alloc_size; + + engine = find_engine_state(gt, engine_class); + if (!engine) { + drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", + engine_class); + blob->ads.eng_state_size[guc_class] = 0; + blob->ads.golden_context_lrca[guc_class] = 0; + continue; + } + + GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size); + GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt); + addr_ggtt += alloc_size; + + shmem_read(engine->default_state, skip_size, ptr + skip_size, + real_size - skip_size); + ptr += alloc_size; } + GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size); +} + +static void __guc_ads_init(struct intel_guc *guc) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct drm_i915_private *i915 = gt->i915; + struct __guc_ads_blob *blob = guc->ads_blob; + u32 base; + + /* GuC scheduling policies */ + guc_policies_init(guc, &blob->policies); + /* System info */ - blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1; - blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1; - blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt); - blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt); + fill_engine_enable_masks(gt, &blob->system_info); blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] = hweight8(gt->info.sseu.slice_mask); @@ -174,21 +514,19 @@ static void __guc_ads_init(struct intel_guc *guc) GEN12_DOORBELLS_PER_SQIDI) + 1; } + /* Golden contexts for re-initialising after a watchdog reset */ + guc_prep_golden_context(guc, blob); + guc_mapping_table_init(guc_to_gt(guc), &blob->system_info); base =
intel_guc_ggtt_offset(guc, guc->ads_vma); - /* Clients info */ - guc_ct_pool_entries_init(blob->ct_pool, ARRAY_SIZE(blob->ct_pool)); - - blob->clients_info.clients_num = 1; - blob->clients_info.ct_pool_addr = base + ptr_offset(blob, ct_pool); - blob->clients_info.ct_pool_count = ARRAY_SIZE(blob->ct_pool); - /* ADS */ blob->ads.scheduler_policies = base + ptr_offset(blob, policies); blob->ads.gt_system_info = base + ptr_offset(blob, system_info); - blob->ads.clients_info = base + ptr_offset(blob, clients_info); + + /* MMIO save/restore list */ + guc_mmio_reg_state_init(guc, blob); /* Private Data */ blob->ads.private_data = base + guc_ads_private_data_offset(guc); @@ -210,6 +548,19 @@ int intel_guc_ads_create(struct intel_guc *guc) GEM_BUG_ON(guc->ads_vma); + /* Need to calculate the reg state size dynamically: */ + ret = guc_mmio_reg_state_query(guc); + if (ret < 0) + return ret; + guc->ads_regset_size = ret; + + /* Likewise the golden contexts: */ + ret = guc_prep_golden_context(guc, NULL); + if (ret < 0) + return ret; + guc->ads_golden_ctxt_size = ret; + + /* Now the total size can be determined: */ size = guc_ads_blob_size(guc); ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma, @@ -222,6 +573,18 @@ int intel_guc_ads_create(struct intel_guc *guc) return 0; } +void intel_guc_ads_init_late(struct intel_guc *guc) +{ + /* + * The golden context setup requires the saved engine state from + * __engines_record_defaults(). However, that requires engines to be + * operational which means the ADS must already have been configured. + * Fortunately, the golden context state is not needed until a hang + * occurs, so it can be filled in during this late init phase. 
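`intel_guc_ads_create()` now sizes the ADS in two passes. It first calls `guc_prep_golden_context()` with a `NULL` blob only to total the page-aligned per-class context sizes. `__guc_ads_init()` later calls it again with the real blob to record each class's size and GGTT address at the same offsets. The sketch below shows that pattern under three assumptions: a 4 KiB page, a hypothetical array of per-class context sizes in which 0 means the class is disabled, and a hypothetical `struct layout` in place of the ADS fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
/* Round up to the next page boundary, like the kernel's PAGE_ALIGN(). */
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))
#define NUM_CLASSES 4

/* Hypothetical stand-in for eng_state_size[] / golden_context_lrca[]. */
struct layout {
	uint32_t state_size[NUM_CLASSES];
	uint32_t lrca[NUM_CLASSES];
};

/*
 * Pass 1 (out == NULL): only compute the total page-aligned footprint.
 * Pass 2 (out != NULL): also record each class's size and address.
 * A real_size of 0 marks a disabled class, which is skipped.
 */
static uint32_t prep_contexts(const uint32_t real_size[NUM_CLASSES],
			      uint32_t base_addr, struct layout *out)
{
	uint32_t total = 0;
	uint32_t addr = base_addr;

	assert(real_size != NULL);

	for (int c = 0; c < NUM_CLASSES; c++) {
		uint32_t alloc;

		if (!real_size[c])
			continue;

		alloc = SKETCH_PAGE_ALIGN(real_size[c]);
		total += alloc;

		if (!out)
			continue;

		out->state_size[c] = real_size[c];
		out->lrca[c] = addr;
		addr += alloc;
	}

	return total;
}
```

Both passes run the same loop, so the total from the sizing pass matches the space consumed by the fill pass. The patch checks this invariant with `GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size)`.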
+ */ + guc_init_golden_context(guc); +} + void intel_guc_ads_destroy(struct intel_guc *guc) { i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h index b00d3ae1113a..3d85051d57e4 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h @@ -7,9 +7,13 @@ #define _INTEL_GUC_ADS_H_ struct intel_guc; +struct drm_printer; int intel_guc_ads_create(struct intel_guc *guc); void intel_guc_ads_destroy(struct intel_guc *guc); +void intel_guc_ads_init_late(struct intel_guc *guc); void intel_guc_ads_reset(struct intel_guc *guc); +void intel_guc_ads_print_policy_info(struct intel_guc *guc, + struct drm_printer *p); #endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 8f7b148fef58..22b4733b55e2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,11 @@ * Copyright © 2016-2019 Intel Corporation */ +#include <linux/circ_buf.h> +#include <linux/ktime.h> +#include <linux/time64.h> +#include <linux/timekeeping.h> + #include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -58,11 +63,17 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct) * +--------+-----------------------------------------------+------+ * * Size of each `CT Buffer`_ must be multiple of 4K. - * As we don't expect too many messages, for now use minimum sizes. + * We don't expect too many messages in flight at any time, unless we are + * using the GuC submission. In that case each request requires a minimum + * 2 dwords which gives us a maximum 256 queued requests. Hopefully this + * is enough space to avoid backpressure on the driver. We increase the size + * of the receive buffer (relative to the send) to ensure a G2H response + * CTB has a landing spot.
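The patch sets `CTB_H2G_BUFFER_SIZE` to `SZ_4K` and `CTB_G2H_BUFFER_SIZE` to four times that, and `G2H_ROOM_BUFFER_SIZE` holds back a quarter of the receive buffer. The patch stores sizes in dwords and seeds `ctb->space` with `CIRC_SPACE()` from `<linux/circ_buf.h>`. The sketch below copies the `CIRC_CNT()`/`CIRC_SPACE()` definitions into user space to work through the numbers; these macros require a power-of-two size. The byte-size macros and `initial_space_dw()` are local to this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as <linux/circ_buf.h>; size must be a power of two. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Byte sizes mirroring the CTB_* / G2H_ROOM_* defines in the patch. */
#define H2G_BYTES 4096u
#define G2H_BYTES (4 * H2G_BYTES)
#define G2H_ROOM_BYTES (G2H_BYTES / 4)

/*
 * Free space, in dwords, of a freshly reset buffer (tail == head == 0),
 * mirroring guc_ct_buffer_reset(): one slot always stays unused, and
 * resv_bytes are held back from the available space.
 */
static uint32_t initial_space_dw(uint32_t size_bytes, uint32_t resv_bytes)
{
	uint32_t size_dw = size_bytes / 4;
	uint32_t resv_dw = resv_bytes / 4;

	assert(size_dw && (size_dw & (size_dw - 1)) == 0); /* power of two */
	return CIRC_SPACE(0u, 0u, size_dw) - resv_dw;
}
```

With these definitions, a freshly reset send buffer starts with 1023 free dwords and the receive buffer starts with 3071. One slot is always left empty, so `tail == head` can only mean an empty ring.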
*/ #define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K) #define CTB_H2G_BUFFER_SIZE (SZ_4K) -#define CTB_G2H_BUFFER_SIZE (SZ_4K) +#define CTB_G2H_BUFFER_SIZE (4 * CTB_H2G_BUFFER_SIZE) +#define G2H_ROOM_BUFFER_SIZE (CTB_G2H_BUFFER_SIZE / 4) struct ct_request { struct list_head link; @@ -98,66 +109,84 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct) INIT_LIST_HEAD(&ct->requests.incoming); INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func); tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func); + init_waitqueue_head(&ct->wq); } static inline const char *guc_ct_buffer_type_to_str(u32 type) { switch (type) { - case INTEL_GUC_CT_BUFFER_TYPE_SEND: + case GUC_CTB_TYPE_HOST2GUC: return "SEND"; - case INTEL_GUC_CT_BUFFER_TYPE_RECV: + case GUC_CTB_TYPE_GUC2HOST: return "RECV"; default: return "<invalid>"; } } -static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc, - u32 cmds_addr, u32 size) +static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) { memset(desc, 0, sizeof(*desc)); - desc->addr = cmds_addr; - desc->size = size; - desc->owner = CTB_OWNER_HOST; } -static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb, u32 cmds_addr) +static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { - guc_ct_buffer_desc_init(ctb->desc, cmds_addr, ctb->size); + u32 space; + + ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space; + atomic_set(&ctb->space, space); + + guc_ct_buffer_desc_init(ctb->desc); } static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb, struct guc_ct_buffer_desc *desc, - u32 *cmds, u32 size) + u32 *cmds, u32 size_in_bytes, u32 resv_space) { - GEM_BUG_ON(size % 4); + GEM_BUG_ON(size_in_bytes % 4); ctb->desc = desc; ctb->cmds = cmds; - ctb->size = size; + ctb->size = size_in_bytes / 4; + ctb->resv_space = resv_space / 4; - guc_ct_buffer_reset(ctb, 0); + guc_ct_buffer_reset(ctb); } -static int 
guc_action_register_ct_buffer(struct intel_guc *guc, - u32 desc_addr, - u32 type) +static int guc_action_register_ct_buffer(struct intel_guc *guc, u32 type, + u32 desc_addr, u32 buff_addr, u32 size) { - u32 action[] = { - INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER, - desc_addr, - sizeof(struct guc_ct_buffer_desc), - type + u32 request[HOST2GUC_REGISTER_CTB_REQUEST_MSG_LEN] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_HOST2GUC_REGISTER_CTB), + FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_SIZE, size / SZ_4K - 1) | + FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_1_TYPE, type), + FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_2_DESC_ADDR, desc_addr), + FIELD_PREP(HOST2GUC_REGISTER_CTB_REQUEST_MSG_3_BUFF_ADDR, buff_addr), }; - /* Can't use generic send(), CT registration must go over MMIO */ - return intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0); + GEM_BUG_ON(type != GUC_CTB_TYPE_HOST2GUC && type != GUC_CTB_TYPE_GUC2HOST); + GEM_BUG_ON(size % SZ_4K); + + /* CT registration must go over MMIO */ + return intel_guc_send_mmio(guc, request, ARRAY_SIZE(request), NULL, 0); } -static int ct_register_buffer(struct intel_guc_ct *ct, u32 desc_addr, u32 type) +static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, + u32 desc_addr, u32 buff_addr, u32 size) { - int err = guc_action_register_ct_buffer(ct_to_guc(ct), desc_addr, type); + int err; + + err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO); + if (unlikely(err)) + return err; + err = guc_action_register_ct_buffer(ct_to_guc(ct), type, + desc_addr, buff_addr, size); if (unlikely(err)) CT_ERROR(ct, "Failed to register %s buffer (err=%d)\n", guc_ct_buffer_type_to_str(type), err); @@ -166,14 +195,17 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 desc_addr, u32 type) static int guc_action_deregister_ct_buffer(struct intel_guc *guc, u32 type) 
{ - u32 action[] = { - INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER, - CTB_OWNER_HOST, - type + u32 request[HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_LEN] = { + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_HOST2GUC_DEREGISTER_CTB), + FIELD_PREP(HOST2GUC_DEREGISTER_CTB_REQUEST_MSG_1_TYPE, type), }; - /* Can't use generic send(), CT deregistration must go over MMIO */ - return intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0); + GEM_BUG_ON(type != GUC_CTB_TYPE_HOST2GUC && type != GUC_CTB_TYPE_GUC2HOST); + + /* CT deregistration must go over MMIO */ + return intel_guc_send_mmio(guc, request, ARRAY_SIZE(request), NULL, 0); } static int ct_deregister_buffer(struct intel_guc_ct *ct, u32 type) @@ -200,10 +232,15 @@ int intel_guc_ct_init(struct intel_guc_ct *ct) struct guc_ct_buffer_desc *desc; u32 blob_size; u32 cmds_size; + u32 resv_space; void *blob; u32 *cmds; int err; + err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO); + if (err) + return err; + GEM_BUG_ON(ct->vma); blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE; @@ -220,19 +257,23 @@ int intel_guc_ct_init(struct intel_guc_ct *ct) desc = blob; cmds = blob + 2 * CTB_DESC_SIZE; cmds_size = CTB_H2G_BUFFER_SIZE; - CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "send", - ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size); + resv_space = 0; + CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "send", + ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size, + resv_space); - guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size); + guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size, resv_space); /* store pointers to desc and cmds for recv ctb */ desc = blob + CTB_DESC_SIZE; cmds = blob + 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE; cmds_size = CTB_G2H_BUFFER_SIZE; - CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "recv", - ptrdiff(desc, blob), 
ptrdiff(cmds, blob), cmds_size); + resv_space = G2H_ROOM_BUFFER_SIZE; + CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "recv", + ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size, + resv_space); - guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size); + guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size, resv_space); return 0; } @@ -261,7 +302,7 @@ void intel_guc_ct_fini(struct intel_guc_ct *ct) int intel_guc_ct_enable(struct intel_guc_ct *ct) { struct intel_guc *guc = ct_to_guc(ct); - u32 base, cmds; + u32 base, desc, cmds; void *blob; int err; @@ -277,32 +318,36 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct) GEM_BUG_ON(blob != ct->ctbs.send.desc); /* (re)initialize descriptors */ - cmds = base + ptrdiff(ct->ctbs.send.cmds, blob); - guc_ct_buffer_reset(&ct->ctbs.send, cmds); - - cmds = base + ptrdiff(ct->ctbs.recv.cmds, blob); - guc_ct_buffer_reset(&ct->ctbs.recv, cmds); + guc_ct_buffer_reset(&ct->ctbs.send); + guc_ct_buffer_reset(&ct->ctbs.recv); /* * Register both CT buffers starting with RECV buffer. * Descriptors are in first half of the blob. 
*/ - err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.recv.desc, blob), - INTEL_GUC_CT_BUFFER_TYPE_RECV); + desc = base + ptrdiff(ct->ctbs.recv.desc, blob); + cmds = base + ptrdiff(ct->ctbs.recv.cmds, blob); + err = ct_register_buffer(ct, GUC_CTB_TYPE_GUC2HOST, + desc, cmds, ct->ctbs.recv.size * 4); + if (unlikely(err)) goto err_out; - err = ct_register_buffer(ct, base + ptrdiff(ct->ctbs.send.desc, blob), - INTEL_GUC_CT_BUFFER_TYPE_SEND); + desc = base + ptrdiff(ct->ctbs.send.desc, blob); + cmds = base + ptrdiff(ct->ctbs.send.cmds, blob); + err = ct_register_buffer(ct, GUC_CTB_TYPE_HOST2GUC, + desc, cmds, ct->ctbs.send.size * 4); + if (unlikely(err)) goto err_deregister; ct->enabled = true; + ct->stall_time = KTIME_MAX; return 0; err_deregister: - ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV); + ct_deregister_buffer(ct, GUC_CTB_TYPE_GUC2HOST); err_out: CT_PROBE_ERROR(ct, "Failed to enable CTB (%pe)\n", ERR_PTR(err)); return err; @@ -321,8 +366,8 @@ void intel_guc_ct_disable(struct intel_guc_ct *ct) ct->enabled = false; if (intel_guc_is_fw_running(guc)) { - ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_SEND); - ct_deregister_buffer(ct, INTEL_GUC_CT_BUFFER_TYPE_RECV); + ct_deregister_buffer(ct, GUC_CTB_TYPE_HOST2GUC); + ct_deregister_buffer(ct, GUC_CTB_TYPE_GUC2HOST); } } @@ -354,81 +399,63 @@ static void write_barrier(struct intel_guc_ct *ct) } } -/** - * DOC: CTB Host to GuC request - * - * Format of the CTB Host to GuC request message is as follows:: - * - * +------------+---------+---------+---------+---------+ - * | msg[0] | [1] | [2] | ... | [n-1] | - * +------------+---------+---------+---------+---------+ - * | MESSAGE | MESSAGE PAYLOAD | - * + HEADER +---------+---------+---------+---------+ - * | | 0 | 1 | ... 
| n | - * +============+=========+=========+=========+=========+ - * | len >= 1 | FENCE | request specific data | - * +------+-----+---------+---------+---------+---------+ - * - * ^-----------------len-------------------^ - */ - static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */, - u32 fence) + u32 fence, u32 flags) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 tail = ctb->tail; u32 size = ctb->size; - u32 used; u32 header; + u32 hxg; + u32 type; u32 *cmds = ctb->cmds; unsigned int i; - if (unlikely(desc->is_in_error)) - return -EPIPE; - - if (unlikely(!IS_ALIGNED(head | tail, 4) || - (tail | head) >= size)) + if (unlikely(desc->status)) goto corrupted; - /* later calculations will be done in dwords */ - head /= 4; - tail /= 4; - size /= 4; - - /* - * tail == head condition indicates empty. GuC FW does not support - * using up the entire buffer to get tail == head meaning full. - */ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; + GEM_BUG_ON(tail > size); - /* make sure there is a space including extra dw for the fence */ - if (unlikely(used + len + 1 >= size)) - return -ENOSPC; +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", + desc->tail, tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely(READ_ONCE(desc->head) >= size)) { + CT_ERROR(ct, "Invalid head offset %u >= %u)\n", + desc->head, size); + desc->status |= GUC_CTB_STATUS_OVERFLOW; + goto corrupted; + } +#endif /* - * Write the message. 
The format is the following: - * DW0: header (including action code) - * DW1: fence - * DW2+: action data + * dw0: CT header (including fence) + * dw1: HXG header (including action code) + * dw2+: action data */ - header = (len << GUC_CT_MSG_LEN_SHIFT) | - GUC_CT_MSG_SEND_STATUS | - (action[0] << GUC_CT_MSG_ACTION_SHIFT); + header = FIELD_PREP(GUC_CTB_MSG_0_FORMAT, GUC_CTB_FORMAT_HXG) | + FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | + FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence); - CT_DEBUG(ct, "writing %*ph %*ph %*ph\n", - 4, &header, 4, &fence, 4 * (len - 1), &action[1]); + type = (flags & INTEL_GUC_CT_SEND_NB) ? GUC_HXG_TYPE_EVENT : + GUC_HXG_TYPE_REQUEST; + hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, type) | + FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION | + GUC_HXG_EVENT_MSG_0_DATA0, action[0]); + + CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n", + tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]); cmds[tail] = header; tail = (tail + 1) % size; - cmds[tail] = fence; + cmds[tail] = hxg; tail = (tail + 1) % size; for (i = 1; i < len; i++) { @@ -443,14 +470,20 @@ static int ct_write(struct intel_guc_ct *ct, */ write_barrier(ct); - /* now update desc tail (back in bytes) */ - desc->tail = tail * 4; + /* update local copies */ + ctb->tail = tail; + GEM_BUG_ON(atomic_read(&ctb->space) < len + GUC_CTB_HDR_LEN); + atomic_sub(len + GUC_CTB_HDR_LEN, &ctb->space); + + /* now update descriptor */ + WRITE_ONCE(desc->tail, tail); + return 0; corrupted: - CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n", - desc->addr, desc->head, desc->tail, desc->size); - desc->is_in_error = 1; + CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n", + desc->head, desc->tail, desc->status); + ctb->broken = true; return -EPIPE; } @@ -459,7 +492,7 @@ corrupted: * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. 
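`ct_write()` now emits two header dwords built with `FIELD_PREP()`. The first is a CT header carrying the format, payload length and fence. The second is an HXG header carrying the message type and action. The kernel macros live in `<linux/bitfield.h>` and require compile-time constant masks. The user-space sketch below approximates them with the GCC/Clang builtin `__builtin_ctz()`. Its three `SKETCH_MSG_0_*` masks are hypothetical placeholders, not the real `GUC_CTB_MSG_0_*` values from the GuC ABI headers.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified FIELD_PREP/FIELD_GET: shift by the mask's lowest set bit. */
#define SKETCH_FIELD_PREP(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))
#define SKETCH_FIELD_GET(mask, reg) \
	(((uint32_t)(reg) & (mask)) >> __builtin_ctz(mask))

/* Hypothetical CT header layout, for illustration only. */
#define SKETCH_MSG_0_FENCE      0xffff0000u
#define SKETCH_MSG_0_FORMAT     0x0000f000u
#define SKETCH_MSG_0_NUM_DWORDS 0x000000ffu

/* Pack one field, rejecting values the mask would silently truncate. */
static uint32_t prep_checked(uint32_t mask, uint32_t val)
{
	uint32_t field = SKETCH_FIELD_PREP(mask, val);

	assert(SKETCH_FIELD_GET(mask, field) == val);
	return field;
}

/* Build a CT header dword from its three fields. */
static uint32_t pack_ct_header(uint32_t format, uint32_t len, uint32_t fence)
{
	return prep_checked(SKETCH_MSG_0_FORMAT, format) |
	       prep_checked(SKETCH_MSG_0_NUM_DWORDS, len) |
	       prep_checked(SKETCH_MSG_0_FENCE, fence);
}
```

The receive path undoes this packing. `ct_read()`, for example, recovers the payload length with `FIELD_GET(GUC_CTB_MSG_0_NUM_DWORDS, header)`.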
* Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. @@ -475,12 +508,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) /* * Fast commands should complete in less than 10us, so sample quickly * up to that length of time, then switch to a slower sleep-wait loop. - * No GuC command should ever take longer than 10ms. + * No GuC command should ever take longer than 10ms but many GuC + * commands can be in flight at a time, so use a 1s timeout on the slower + * sleep-wait loop. */ -#define done INTEL_GUC_MSG_IS_RESPONSE(READ_ONCE(req->status)) - err = wait_for_us(done, 10); +#define GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS 10 +#define GUC_CTB_RESPONSE_TIMEOUT_LONG_MS 1000 +#define done \ + (FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \ + GUC_HXG_ORIGIN_GUC) + err = wait_for_us(done, GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS); if (err) - err = wait_for(done, 10); + err = wait_for(done, GUC_CTB_RESPONSE_TIMEOUT_LONG_MS); #undef done if (unlikely(err)) @@ -490,6 +529,131 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; } +#define GUC_CTB_TIMEOUT_MS 1500 +static inline bool ct_deadlocked(struct intel_guc_ct *ct) +{ + long timeout = GUC_CTB_TIMEOUT_MS; + bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout; + + if (unlikely(ret)) { + struct guc_ct_buffer_desc *send = ct->ctbs.send.desc; + struct guc_ct_buffer_desc *recv = ct->ctbs.send.desc; + + CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n", + ktime_ms_delta(ktime_get(), ct->stall_time), + send->status, recv->status); + ct->ctbs.send.broken = true; + } + + return ret; +} + +static inline bool g2h_has_room(struct intel_guc_ct *ct, u32 g2h_len_dw) +{ + struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; + + /* + * We leave a certain amount of space in the G2H CTB buffer for + * unexpected G2H CTBs (e.g.
logging, engine hang, etc...) + */ + return !g2h_len_dw || atomic_read(&ctb->space) >= g2h_len_dw; +} + +static inline void g2h_reserve_space(struct intel_guc_ct *ct, u32 g2h_len_dw) +{ + lockdep_assert_held(&ct->ctbs.send.lock); + + GEM_BUG_ON(!g2h_has_room(ct, g2h_len_dw)); + + if (g2h_len_dw) + atomic_sub(g2h_len_dw, &ct->ctbs.recv.space); +} + +static inline void g2h_release_space(struct intel_guc_ct *ct, u32 g2h_len_dw) +{ + atomic_add(g2h_len_dw, &ct->ctbs.recv.space); +} + +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) +{ + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + struct guc_ct_buffer_desc *desc = ctb->desc; + u32 head; + u32 space; + + if (atomic_read(&ctb->space) >= len_dw) + return true; + + head = READ_ONCE(desc->head); + if (unlikely(head > ctb->size)) { + CT_ERROR(ct, "Invalid head offset %u >= %u)\n", + head, ctb->size); + desc->status |= GUC_CTB_STATUS_OVERFLOW; + ctb->broken = true; + return false; + } + + space = CIRC_SPACE(ctb->tail, head, ctb->size); + atomic_set(&ctb->space, space); + + return space >= len_dw; +} + +static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw) +{ + lockdep_assert_held(&ct->ctbs.send.lock); + + if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) { + if (ct->stall_time == KTIME_MAX) + ct->stall_time = ktime_get(); + + if (unlikely(ct_deadlocked(ct))) + return -EPIPE; + else + return -EBUSY; + } + + ct->stall_time = KTIME_MAX; + return 0; +} + +#define G2H_LEN_DW(f) ({ \ + typeof(f) f_ = (f); \ + FIELD_GET(INTEL_GUC_CT_SEND_G2H_DW_MASK, f_) ? 
\ + FIELD_GET(INTEL_GUC_CT_SEND_G2H_DW_MASK, f_) + \ + GUC_CTB_HXG_MSG_MIN_LEN : 0; \ +}) +static int ct_send_nb(struct intel_guc_ct *ct, + const u32 *action, + u32 len, + u32 flags) +{ + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + unsigned long spin_flags; + u32 g2h_len_dw = G2H_LEN_DW(flags); + u32 fence; + int ret; + + spin_lock_irqsave(&ctb->lock, spin_flags); + + ret = has_room_nb(ct, len + GUC_CTB_HDR_LEN, g2h_len_dw); + if (unlikely(ret)) + goto out; + + fence = ct_get_next_fence(ct); + ret = ct_write(ct, action, len, fence, flags); + if (unlikely(ret)) + goto out; + + g2h_reserve_space(ct, g2h_len_dw); + intel_guc_notify(ct_to_guc(ct)); + +out: + spin_unlock_irqrestore(&ctb->lock, spin_flags); + + return ret; +} + static int ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, @@ -497,8 +661,10 @@ static int ct_send(struct intel_guc_ct *ct, u32 response_buf_size, u32 *status) { + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct ct_request request; unsigned long flags; + unsigned int sleep_period_ms = 1; u32 fence; int err; @@ -506,8 +672,33 @@ static int ct_send(struct intel_guc_ct *ct, GEM_BUG_ON(!len); GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK); GEM_BUG_ON(!response_buf && response_buf_size); + might_sleep(); + + /* + * We use a lazy spin wait loop here as we believe that if the CT + * buffers are sized correctly the flow control condition should be + * rare. We reserve the maximum size in the G2H credits as we don't know + * how big the response is going to be.
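The new `h2g_has_room()` helper treats `ctb->space` as a cached credit count. If the cached value already covers the request, it returns immediately. Otherwise it re-reads the GuC-owned `head` and recomputes the free space with `CIRC_SPACE()`. `g2h_reserve_space()` and `g2h_release_space()` apply the same counter idea to the receive side with `atomic_sub()` and `atomic_add()`. Below is a simplified single-threaded sketch of the send-side logic using C11 atomics. `struct sketch_ctb`, `has_room()` and `produce()` are hypothetical, and the driver's locking, descriptor validation and error reporting are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definitions as <linux/circ_buf.h>; size must be a power of two. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical stand-in for struct intel_guc_ct_buffer. */
struct sketch_ctb {
	uint32_t size;     /* in dwords, power of two */
	uint32_t tail;     /* producer position, owned by the host */
	uint32_t head;     /* consumer position, advanced by the GuC */
	atomic_uint space; /* cached free dwords (credits) */
};

/* Fast path trusts the cached credits; slow path recomputes from head. */
static bool has_room(struct sketch_ctb *ctb, uint32_t len_dw)
{
	uint32_t space;

	if (atomic_load(&ctb->space) >= len_dw)
		return true;

	space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
	atomic_store(&ctb->space, space);
	return space >= len_dw;
}

/* After writing len_dw dwords at the tail, consume that many credits. */
static void produce(struct sketch_ctb *ctb, uint32_t len_dw)
{
	assert(atomic_load(&ctb->space) >= len_dw);
	ctb->tail = (ctb->tail + len_dw) % ctb->size;
	atomic_fetch_sub(&ctb->space, len_dw);
}
```

The cached counter lets the common case skip reading the shared descriptor. The only cost is an occasional recomputation when the cache looks too small.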
+ */ +retry: + spin_lock_irqsave(&ctb->lock, flags); + if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN) || + !g2h_has_room(ct, GUC_CTB_HXG_MSG_MAX_LEN))) { + if (ct->stall_time == KTIME_MAX) + ct->stall_time = ktime_get(); + spin_unlock_irqrestore(&ctb->lock, flags); + + if (unlikely(ct_deadlocked(ct))) + return -EPIPE; + + if (msleep_interruptible(sleep_period_ms)) + return -EINTR; + sleep_period_ms = sleep_period_ms << 1; + + goto retry; + } - spin_lock_irqsave(&ct->ctbs.send.lock, flags); + ct->stall_time = KTIME_MAX; fence = ct_get_next_fence(ct); request.fence = fence; @@ -519,9 +710,10 @@ static int ct_send(struct intel_guc_ct *ct, list_add_tail(&request.link, &ct->requests.pending); spin_unlock(&ct->requests.lock); - err = ct_write(ct, action, len, fence); + err = ct_write(ct, action, len, fence, 0); + g2h_reserve_space(ct, GUC_CTB_HXG_MSG_MAX_LEN); - spin_unlock_irqrestore(&ct->ctbs.send.lock, flags); + spin_unlock_irqrestore(&ctb->lock, flags); if (unlikely(err)) goto unlink; @@ -529,24 +721,25 @@ static int ct_send(struct intel_guc_ct *ct, intel_guc_notify(ct_to_guc(ct)); err = wait_for_ct_request_update(&request, status); + g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN); if (unlikely(err)) goto unlink; - if (!INTEL_GUC_MSG_IS_RESPONSE_SUCCESS(*status)) { + if (FIELD_GET(GUC_HXG_MSG_0_TYPE, *status) != GUC_HXG_TYPE_RESPONSE_SUCCESS) { err = -EIO; goto unlink; } if (response_buf) { /* There shall be no data in the status */ - WARN_ON(INTEL_GUC_MSG_TO_DATA(request.status)); + WARN_ON(FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, request.status)); /* Return actual response len */ err = request.response_len; } else { /* There shall be no response payload */ WARN_ON(request.response_len); /* Return data decoded from the status dword */ - err = INTEL_GUC_MSG_TO_DATA(*status); + err = FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, *status); } unlink: @@ -561,16 +754,25 @@ unlink: * Command Transport (CT) buffer based GuC send function. 
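When either buffer is full, the blocking `ct_send()` path now drops the lock and sleeps with a doubling period (1 ms, 2 ms, 4 ms, and so on). It gives up with `-EPIPE` once `ct_deadlocked()` finds that the stall has lasted longer than `GUC_CTB_TIMEOUT_MS` (1500 ms). Below is a user-space sketch of that retry policy. `send_with_backoff()`, the `try_reserve_fn` callback and `ready_at()` are hypothetical. Time is simulated by advancing a counter instead of sleeping, so the schedule is easy to check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_TIMEOUT_MS 1500

/* Hypothetical callback: returns true once the buffers have room. */
typedef bool (*try_reserve_fn)(void *ctx, long now_ms);

/*
 * Retry with exponential backoff until try_reserve() succeeds or the
 * stall has lasted longer than SKETCH_TIMEOUT_MS. Time is simulated:
 * *now_ms is advanced instead of calling a real sleep function.
 */
static int send_with_backoff(try_reserve_fn try_reserve, void *ctx,
			     long *now_ms)
{
	long stall_start = -1; /* -1 plays the role of KTIME_MAX: no stall */
	long sleep_ms = 1;

	assert(try_reserve && now_ms);

	while (!try_reserve(ctx, *now_ms)) {
		if (stall_start < 0)
			stall_start = *now_ms;

		if (*now_ms - stall_start > SKETCH_TIMEOUT_MS)
			return -EPIPE; /* treat as deadlocked */

		*now_ms += sleep_ms; /* stand-in for msleep_interruptible() */
		sleep_ms <<= 1;
	}

	return 0;
}

/* Example callback: room becomes available at time *(long *)ctx. */
static bool ready_at(void *ctx, long now_ms)
{
	return now_ms >= *(const long *)ctx;
}
```

In the driver, the stall start is stored in `ct->stall_time` and shared with the non-blocking `ct_send_nb()` path. `msleep_interruptible()` can also end the wait early with `-EINTR`. The sketch leaves out both details.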
 */
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size)
+		      u32 *response_buf, u32 response_buf_size, u32 flags)
 {
 	u32 status = ~0; /* undefined */
 	int ret;
 	if (unlikely(!ct->enabled)) {
-		WARN(1, "Unexpected send: action=%#x\n", *action);
+		struct intel_guc *guc = ct_to_guc(ct);
+		struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
+
+		WARN(!uc->reset_in_progress, "Unexpected send: action=%#x\n", *action);
 		return -ENODEV;
 	}
+	if (unlikely(ct->ctbs.send.broken))
+		return -EPIPE;
+
+	if (flags & INTEL_GUC_CT_SEND_NB)
+		return ct_send_nb(ct, action, len, flags);
+
 	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
 	if (unlikely(ret < 0)) {
 		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
@@ -583,21 +785,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 	return ret;
 }
-static inline unsigned int ct_header_get_len(u32 header)
-{
-	return (header >> GUC_CT_MSG_LEN_SHIFT) & GUC_CT_MSG_LEN_MASK;
-}
-
-static inline unsigned int ct_header_get_action(u32 header)
-{
-	return (header >> GUC_CT_MSG_ACTION_SHIFT) & GUC_CT_MSG_ACTION_MASK;
-}
-
-static inline bool ct_header_is_response(u32 header)
-{
-	return !!(header & GUC_CT_MSG_IS_RESPONSE);
-}
-
 static struct ct_incoming_msg *ct_alloc_msg(u32 num_dwords)
 {
 	struct ct_incoming_msg *msg;
@@ -621,8 +808,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 head = ctb->head;
+	u32 tail = READ_ONCE(desc->tail);
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
 	s32 available;
@@ -630,17 +817,28 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	unsigned int i;
 	u32 header;
-	if (unlikely(desc->is_in_error))
+	if (unlikely(ctb->broken))
 		return -EPIPE;
-	if (unlikely(!IS_ALIGNED(head | tail, 4) ||
-		     (tail | head) >= size))
+	if (unlikely(desc->status))
 		goto corrupted;
-	/* later calculations will be done in dwords */
-	head /= 4;
-	tail /= 4;
-	size /= 4;
+	GEM_BUG_ON(head > size);
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely(head != READ_ONCE(desc->head))) {
+		CT_ERROR(ct, "Head was modified %u != %u\n",
+			 desc->head, head);
+		desc->status |= GUC_CTB_STATUS_MISMATCH;
+		goto corrupted;
+	}
+#endif
+	if (unlikely(tail >= size)) {
+		CT_ERROR(ct, "Invalid tail offset %u >= %u)\n",
+			 tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
 	/* tail == head condition indicates empty */
 	available = tail - head;
@@ -652,14 +850,14 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
 	GEM_BUG_ON(available < 0);
 	header = cmds[head];
 	head = (head + 1) % size;
 	/* message len with header */
-	len = ct_header_get_len(header) + 1;
+	len = FIELD_GET(GUC_CTB_MSG_0_NUM_DWORDS, header) + GUC_CTB_MSG_MIN_LEN;
 	if (unlikely(len > (u32)available)) {
 		CT_ERROR(ct, "Incomplete message %*ph %*ph %*ph\n",
 			 4, &header,
@@ -667,6 +865,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 			 size - head : available - 1), &cmds[head],
 			 4 * (head + available - 1 > size ?
 			      available - 1 - size + head : 0), &cmds[0]);
+		desc->status |= GUC_CTB_STATUS_UNDERFLOW;
 		goto corrupted;
 	}
@@ -689,65 +888,39 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	}
 	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
-	desc->head = head * 4;
+	/* update local copies */
+	ctb->head = head;
+
+	/* now update descriptor */
+	WRITE_ONCE(desc->head, head);
+
 	return available - len;
 corrupted:
-	CT_ERROR(ct, "Corrupted descriptor addr=%#x head=%u tail=%u size=%u\n",
-		 desc->addr, desc->head, desc->tail, desc->size);
-	desc->is_in_error = 1;
+	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
+		 desc->head, desc->tail, desc->status);
+	ctb->broken = true;
 	return -EPIPE;
 }
-/**
- * DOC: CTB GuC to Host response
- *
- * Format of the CTB GuC to Host response message is as follows::
- *
- *      +------------+---------+---------+---------+---------+---------+
- *      |   msg[0]   |   [1]   |   [2]   |   [3]   |   ...   |  [n-1]  |
- *      +------------+---------+---------+---------+---------+---------+
- *      |  MESSAGE   |       MESSAGE PAYLOAD                           |
- *      +   HEADER   +---------+---------+---------+---------+---------+
- *      |            |    0    |    1    |    2    |   ...   |    n    |
- *      +============+=========+=========+=========+=========+=========+
- *      |  len >= 2  |  FENCE  |  STATUS |   response specific data    |
- *      +------+-----+---------+---------+---------+---------+---------+
- *
- *                   ^-----------------------len-----------------------^
- */
-
 static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *response)
 {
-	u32 header = response->msg[0];
-	u32 len = ct_header_get_len(header);
-	u32 fence;
-	u32 status;
-	u32 datalen;
+	u32 len = FIELD_GET(GUC_CTB_MSG_0_NUM_DWORDS, response->msg[0]);
+	u32 fence = FIELD_GET(GUC_CTB_MSG_0_FENCE, response->msg[0]);
+	const u32 *hxg = &response->msg[GUC_CTB_MSG_MIN_LEN];
+	const u32 *data = &hxg[GUC_HXG_MSG_MIN_LEN];
+	u32 datalen = len - GUC_HXG_MSG_MIN_LEN;
 	struct ct_request *req;
 	unsigned long flags;
 	bool found = false;
 	int err = 0;
-	GEM_BUG_ON(!ct_header_is_response(header));
-
-	/* Response payload shall at least include fence and status */
-	if (unlikely(len < 2)) {
-		CT_ERROR(ct, "Corrupted response (len %u)\n", len);
-		return -EPROTO;
-	}
-
-	fence = response->msg[1];
-	status = response->msg[2];
-	datalen = len - 2;
-
-	/* Format of the status follows RESPONSE message */
-	if (unlikely(!INTEL_GUC_MSG_IS_RESPONSE(status))) {
-		CT_ERROR(ct, "Corrupted response (status %#x)\n", status);
-		return -EPROTO;
-	}
+	GEM_BUG_ON(len < GUC_HXG_MSG_MIN_LEN);
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, hxg[0]) != GUC_HXG_ORIGIN_GUC);
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_RESPONSE_SUCCESS &&
+		   FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_RESPONSE_FAILURE);
-	CT_DEBUG(ct, "response fence %u status %#x\n", fence, status);
+	CT_DEBUG(ct, "response fence %u status %#x\n", fence, hxg[0]);
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_for_each_entry(req, &ct->requests.pending, link) {
@@ -763,18 +936,22 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 			err = -EMSGSIZE;
 		}
 		if (datalen)
-			memcpy(req->response_buf, response->msg + 3, 4 * datalen);
+			memcpy(req->response_buf, data, 4 * datalen);
 		req->response_len = datalen;
-		WRITE_ONCE(req->status, status);
+		WRITE_ONCE(req->status, hxg[0]);
 		found = true;
 		break;
 	}
-	spin_unlock_irqrestore(&ct->requests.lock, flags);
-
 	if (!found) {
 		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
-		return -ENOKEY;
+		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
+			 ct->requests.last_fence);
+		list_for_each_entry(req, &ct->requests.pending, link)
+			CT_ERROR(ct, "request %u awaits response\n",
+				 req->fence);
+		err = -ENOKEY;
 	}
+	spin_unlock_irqrestore(&ct->requests.lock, flags);
 	if (unlikely(err))
 		return err;
@@ -786,14 +963,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
 	struct intel_guc *guc = ct_to_guc(ct);
-	u32 header, action, len;
+	const u32 *hxg;
 	const u32 *payload;
+	u32 hxg_len, action, len;
 	int ret;
-	header = request->msg[0];
-	payload = &request->msg[1];
-	action = ct_header_get_action(header);
-	len = ct_header_get_len(header);
+	hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
+	hxg_len = request->size - GUC_CTB_MSG_MIN_LEN;
+	payload = &hxg[GUC_HXG_MSG_MIN_LEN];
+	action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]);
+	len = hxg_len - GUC_HXG_MSG_MIN_LEN;
 	CT_DEBUG(ct, "request %x %*ph\n", action, 4 * len, payload);
@@ -801,6 +980,19 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_DEFAULT:
 		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		ret = intel_guc_deregister_done_process_msg(guc, payload,
+							    len);
+		break;
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+		ret = intel_guc_sched_done_process_msg(guc, payload, len);
+		break;
+	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+		ret = intel_guc_context_reset_process_msg(guc, payload, len);
+		break;
+	case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION:
+		ret = intel_guc_engine_failure_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
@@ -855,29 +1047,24 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
 		queue_work(system_unbound_wq, &ct->requests.worker);
 }
-/**
- * DOC: CTB GuC to Host request
- *
- * Format of the CTB GuC to Host request message is as follows::
- *
- *      +------------+---------+---------+---------+---------+---------+
- *      |   msg[0]   |   [1]   |   [2]   |   [3]   |   ...   |  [n-1]  |
- *      +------------+---------+---------+---------+---------+---------+
- *      |  MESSAGE   |       MESSAGE PAYLOAD                           |
- *      +   HEADER   +---------+---------+---------+---------+---------+
- *      |            |    0    |    1    |    2    |   ...   |    n    |
- *      +============+=========+=========+=========+=========+=========+
- *      |     len    |            request specific data                |
- *      +------+-----+---------+---------+---------+---------+---------+
- *
- *                   ^-----------------------len-----------------------^
- */
-
-static int ct_handle_request(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
+static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
+	const u32 *hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
+	u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]);
 	unsigned long flags;
-	GEM_BUG_ON(ct_header_is_response(request->msg[0]));
+	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT);
+
+	/*
+	 * Adjusting the space must be done in IRQ or deadlock can occur as the
+	 * CTB processing in the below workqueue can send CTBs which creates a
+	 * circular dependency if the space was returned there.
+	 */
+	switch (action) {
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		g2h_release_space(ct, request->size);
+	}
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
@@ -887,15 +1074,53 @@ static int ct_handle_request(struct intel_guc_ct *ct, struct ct_incoming_msg *re
 	return 0;
 }
-static void ct_handle_msg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
+static int ct_handle_hxg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
 {
-	u32 header = msg->msg[0];
+	u32 origin, type;
+	u32 *hxg;
 	int err;
-	if (ct_header_is_response(header))
+	if (unlikely(msg->size < GUC_CTB_HXG_MSG_MIN_LEN))
+		return -EBADMSG;
+
+	hxg = &msg->msg[GUC_CTB_MSG_MIN_LEN];
+
+	origin = FIELD_GET(GUC_HXG_MSG_0_ORIGIN, hxg[0]);
+	if (unlikely(origin != GUC_HXG_ORIGIN_GUC)) {
+		err = -EPROTO;
+		goto failed;
+	}
+
+	type = FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]);
+	switch (type) {
+	case GUC_HXG_TYPE_EVENT:
+		err = ct_handle_event(ct, msg);
+		break;
+	case GUC_HXG_TYPE_RESPONSE_SUCCESS:
+	case GUC_HXG_TYPE_RESPONSE_FAILURE:
 		err = ct_handle_response(ct, msg);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+	}
+
+	if (unlikely(err)) {
+failed:
+		CT_ERROR(ct, "Failed to handle HXG message (%pe) %*ph\n",
+			 ERR_PTR(err), 4 * GUC_HXG_MSG_MIN_LEN, hxg);
+	}
+	return err;
+}
+
+static void ct_handle_msg(struct intel_guc_ct *ct, struct ct_incoming_msg *msg)
+{
+	u32 format = FIELD_GET(GUC_CTB_MSG_0_FORMAT, msg->msg[0]);
+	int err;
+
+	if (format == GUC_CTB_FORMAT_HXG)
+		err = ct_handle_hxg(ct, msg);
 	else
-		err = ct_handle_request(ct, msg);
+		err = -EOPNOTSUPP;
 	if (unlikely(err)) {
 		CT_ERROR(ct, "Failed to process CT message (%pe) %*ph\n",
@@ -958,3 +1183,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 	ct_try_receive_message(ct);
 }
+
+void intel_guc_ct_print_info(struct intel_guc_ct *ct,
+			     struct drm_printer *p)
+{
+	drm_printf(p, "CT %s\n", enableddisabled(ct->enabled));
+
+	if (!ct->enabled)
+		return;
+
+	drm_printf(p, "H2G Space: %u\n",
+		   atomic_read(&ct->ctbs.send.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.send.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.send.desc->tail);
+	drm_printf(p, "G2H Space: %u\n",
+		   atomic_read(&ct->ctbs.recv.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.recv.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index cb222f202301..f709a19c7e21 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -9,11 +9,14 @@
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <linux/ktime.h>
+#include <linux/wait.h>
 #include "intel_guc_fwif.h"
 struct i915_vma;
 struct intel_guc;
+struct drm_printer;
 /**
  * DOC: Command Transport (CT).
@@ -31,16 +34,25 @@ struct intel_guc;
  * @lock: protects access to the commands buffer and buffer descriptor
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
- * @size: size of the commands buffer
+ * @size: size of the commands buffer in dwords
+ * @resv_space: reserved space in buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
+ * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
 	spinlock_t lock;
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 resv_space;
+	u32 tail;
+	u32 head;
+	atomic_t space;
+	bool broken;
 };
-
 /** Top-level structure for Command Transport related data
  *
  * Includes a pair of CT buffers for bi-directional communication and tracking
@@ -58,8 +70,11 @@ struct intel_guc_ct {
 	struct tasklet_struct receive_tasklet;
+	/** @wq: wait queue for g2h chanenl */
+	wait_queue_head_t wq;
+
 	struct {
-		u32 last_fence; /* last fence used to send request */
+		u16 last_fence; /* last fence used to send request */
 		spinlock_t lock; /* protects pending requests list */
 		struct list_head pending; /* requests waiting for response */
@@ -67,6 +82,9 @@ struct intel_guc_ct {
 		struct list_head incoming; /* incoming requests */
 		struct work_struct worker; /* handler for incoming requests */
 	} requests;
+
+	/** @stall_time: time of first time a CTB submission is stalled */
+	ktime_t stall_time;
 };
 void intel_guc_ct_init_early(struct intel_guc_ct *ct);
@@ -85,8 +103,18 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
 	return ct->enabled;
 }
+#define INTEL_GUC_CT_SEND_NB BIT(31)
+#define INTEL_GUC_CT_SEND_G2H_DW_SHIFT 0
+#define INTEL_GUC_CT_SEND_G2H_DW_MASK (0xff << INTEL_GUC_CT_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) ({ \
+	typeof(len) len_ = (len); \
+	GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_CT_SEND_G2H_DW_MASK, len_)); \
+	(FIELD_PREP(INTEL_GUC_CT_SEND_G2H_DW_MASK, len_) | INTEL_GUC_CT_SEND_NB); \
+})
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size);
+		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
+void intel_guc_ct_print_info(struct intel_guc_ct *ct, struct drm_printer *p);
+
 #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..887c8c8f35db 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,10 @@
 #include "intel_guc.h"
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_ads.h"
+#include "gt/uc/intel_guc_submission.h"
+#include "gt/uc/intel_guc_slpc.h"
 static int guc_info_show(struct seq_file *m, void *data)
 {
@@ -22,16 +26,57 @@ static int guc_info_show(struct seq_file *m, void *data)
 	drm_puts(&p, "\n");
 	intel_guc_log_info(&guc->log, &p);
-	/* Add more as required ... */
+	if (!intel_guc_submission_is_used(guc))
+		return 0;
+
+	intel_guc_ct_print_info(&guc->ct, &p);
+	intel_guc_submission_print_info(guc, &p);
+	intel_guc_ads_print_policy_info(guc, &p);
 	return 0;
 }
 DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
+static int guc_registered_contexts_show(struct seq_file *m, void *data)
+{
+	struct intel_guc *guc = m->private;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	intel_guc_submission_print_context_info(guc, &p);
+
+	return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
+static int guc_slpc_info_show(struct seq_file *m, void *unused)
+{
+	struct intel_guc *guc = m->private;
+	struct intel_guc_slpc *slpc = &guc->slpc;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	if (!intel_guc_slpc_is_used(guc))
+		return -ENODEV;
+
+	return intel_guc_slpc_print_info(slpc, &p);
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_slpc_info);
+
+static bool intel_eval_slpc_support(void *data)
+{
+	struct intel_guc *guc = (struct intel_guc *)data;
+
+	return intel_guc_slpc_is_used(guc);
+}
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
+		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
+		{ "guc_slpc_info", &guc_slpc_info_fops, &intel_eval_slpc_support},
 	};
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index e9a9d85e2aa3..fa4be13c8854 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -12,19 +12,27 @@
 #include "gt/intel_engine_types.h"
 #include "abi/guc_actions_abi.h"
+#include "abi/guc_actions_slpc_abi.h"
 #include "abi/guc_errors_abi.h"
 #include "abi/guc_communication_mmio_abi.h"
 #include "abi/guc_communication_ctb_abi.h"
 #include "abi/guc_messages_abi.h"
+/* Payload length only i.e. don't include G2H header length */
+#define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET 2
+#define G2H_LEN_DW_DEREGISTER_CONTEXT 1
+
+#define GUC_CONTEXT_DISABLE 0
+#define GUC_CONTEXT_ENABLE 1
+
 #define GUC_CLIENT_PRIORITY_KMD_HIGH 0
 #define GUC_CLIENT_PRIORITY_HIGH 1
 #define GUC_CLIENT_PRIORITY_KMD_NORMAL 2
 #define GUC_CLIENT_PRIORITY_NORMAL 3
 #define GUC_CLIENT_PRIORITY_NUM 4
-#define GUC_MAX_STAGE_DESCRIPTORS 1024
-#define GUC_INVALID_STAGE_ID GUC_MAX_STAGE_DESCRIPTORS
+#define GUC_MAX_LRC_DESCRIPTORS 65535
+#define GUC_INVALID_LRC_ID GUC_MAX_LRC_DESCRIPTORS
 #define GUC_RENDER_ENGINE 0
 #define GUC_VIDEO_ENGINE 1
@@ -81,15 +89,14 @@
 #define GUC_LOG_ALLOC_IN_MEGABYTE (1 << 3)
 #define GUC_LOG_CRASH_SHIFT 4
 #define GUC_LOG_CRASH_MASK (0x3 << GUC_LOG_CRASH_SHIFT)
-#define GUC_LOG_DPC_SHIFT 6
-#define GUC_LOG_DPC_MASK (0x7 << GUC_LOG_DPC_SHIFT)
-#define GUC_LOG_ISR_SHIFT 9
-#define GUC_LOG_ISR_MASK (0x7 << GUC_LOG_ISR_SHIFT)
+#define GUC_LOG_DEBUG_SHIFT 6
+#define GUC_LOG_DEBUG_MASK (0xF << GUC_LOG_DEBUG_SHIFT)
 #define GUC_LOG_BUF_ADDR_SHIFT 12
 #define GUC_CTL_WA 1
 #define GUC_CTL_FEATURE 2
 #define GUC_CTL_DISABLE_SCHEDULER (1 << 14)
+#define GUC_CTL_ENABLE_SLPC BIT(2)
 #define GUC_CTL_DEBUG 3
 #define GUC_LOG_VERBOSITY_SHIFT 0
@@ -136,6 +143,11 @@
 #define GUC_ID_TO_ENGINE_INSTANCE(guc_id) \
 	(((guc_id) & GUC_ENGINE_INSTANCE_MASK) >> GUC_ENGINE_INSTANCE_SHIFT)
+#define SLPC_EVENT(id, c) (\
+FIELD_PREP(HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ID, id) | \
+FIELD_PREP(HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ARGC, c) \
+)
+
 static inline u8 engine_class_to_guc_class(u8 class)
 {
 	BUILD_BUG_ON(GUC_RENDER_CLASS != RENDER_CLASS);
@@ -177,66 +189,40 @@ struct guc_process_desc {
 	u32 reserved[30];
 } __packed;
-/* engine id and context id is packed into guc_execlist_context.context_id*/
-#define GUC_ELC_CTXID_OFFSET 0
-#define GUC_ELC_ENGINE_OFFSET 29
+#define CONTEXT_REGISTRATION_FLAG_KMD BIT(0)
-/* The execlist context including software and HW information */
-struct guc_execlist_context {
-	u32 context_desc;
-	u32 context_id;
-	u32 ring_status;
-	u32 ring_lrca;
-	u32 ring_begin;
-	u32 ring_end;
-	u32 ring_next_free_location;
-	u32 ring_current_tail_pointer_value;
-	u8 engine_state_submit_value;
-	u8 engine_state_wait_value;
-	u16 pagefault_count;
-	u16 engine_submit_queue_count;
-} __packed;
+#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
+#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
+
+/* Preempt to idle on quantum expiry */
+#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE BIT(0)
 /*
- * This structure describes a stage set arranged for a particular communication
- * between uKernel (GuC) and Driver (KMD). Technically, this is known as a
- * "GuC Context descriptor" in the specs, but we use the term "stage descriptor"
- * to avoid confusion with all the other things already named "context" in the
- * driver. A static pool of these descriptors are stored inside a GEM object
- * (stage_desc_pool) which is held for the entire lifetime of our interaction
- * with the GuC, being allocated before the GuC is loaded with its firmware.
+ * GuC Context registration descriptor.
+ * FIXME: This is only required to exist during context registration.
+ * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
+ * is not required.
  */
-struct guc_stage_desc {
-	u32 sched_common_area;
-	u32 stage_id;
-	u32 pas_id;
-	u8 engines_used;
-	u64 db_trigger_cpu;
-	u32 db_trigger_uk;
-	u64 db_trigger_phy;
-	u16 db_id;
-
-	struct guc_execlist_context lrc[GUC_MAX_ENGINES_NUM];
-
-	u8 attribute;
-
+struct guc_lrc_desc {
+	u32 hw_context_desc;
+	u32 slpm_perf_mode_hint;	/* SPLC v1 only */
+	u32 slpm_freq_hint;
+	u32 engine_submit_mask;	/* In logical space */
+	u8 engine_class;
+	u8 reserved0[3];
 	u32 priority;
-
-	u32 wq_sampled_tail_offset;
-	u32 wq_total_submit_enqueues;
-
 	u32 process_desc;
 	u32 wq_addr;
 	u32 wq_size;
-
-	u32 engine_presence;
-
-	u8 engine_suspended;
-
-	u8 reserved0[3];
-	u64 reserved1[1];
-
-	u64 desc_private;
+	u32 context_flags;	/* CONTEXT_REGISTRATION_* */
+	/* Time for one workload to execute. (in micro seconds) */
+	u32 execution_quantum;
+	/* Time to wait for a preemption request to complete before issuing a
+	 * reset. (in micro seconds).
+	 */
+	u32 preemption_timeout;
+	u32 policy_flags;	/* CONTEXT_POLICY_* */
+	u32 reserved1[19];
 } __packed;
 #define GUC_POWER_UNSPECIFIED 0
@@ -247,32 +233,14 @@ struct guc_stage_desc {
 /* Scheduling policy settings */
-/* Reset engine upon preempt failure */
-#define POLICY_RESET_ENGINE (1<<0)
-/* Preempt to idle on quantum expiry */
-#define POLICY_PREEMPT_TO_IDLE (1<<1)
+#define GLOBAL_POLICY_MAX_NUM_WI 15
-#define POLICY_MAX_NUM_WI 15
-#define POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000
-#define POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
-#define POLICY_DEFAULT_PREEMPTION_TIME_US 500000
-#define POLICY_DEFAULT_FAULT_TIME_US 250000
+/* Don't reset an engine upon preemption failure */
+#define GLOBAL_POLICY_DISABLE_ENGINE_RESET BIT(0)
-struct guc_policy {
-	/* Time for one workload to execute. (in micro seconds) */
-	u32 execution_quantum;
-	/* Time to wait for a preemption request to completed before issuing a
-	 * reset. (in micro seconds). */
-	u32 preemption_time;
-	/* How much time to allow to run after the first fault is observed.
-	 * Then preempt afterwards. (in micro seconds) */
-	u32 fault_time;
-	u32 policy_flags;
-	u32 reserved[8];
-} __packed;
+#define GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000
 struct guc_policies {
-	struct guc_policy policy[GUC_CLIENT_PRIORITY_NUM][GUC_MAX_ENGINE_CLASSES];
 	u32 submission_queue_depth[GUC_MAX_ENGINE_CLASSES];
 	/* In micro seconds. How much time to allow before DPC processing is
 	 * called back via interrupt (to prevent DPC queue drain starving).
@@ -286,6 +254,7 @@ struct guc_policies {
 	 * idle.
 	 */
 	u32 max_num_work_items;
+	u32 global_flags;
 	u32 reserved[4];
 } __packed;
@@ -311,29 +280,13 @@ struct guc_gt_system_info {
 	u32 generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_MAX];
 } __packed;
-/* Clients info */
-struct guc_ct_pool_entry {
-	struct guc_ct_buffer_desc desc;
-	u32 reserved[7];
-} __packed;
-
-#define GUC_CT_POOL_SIZE 2
-
-struct guc_clients_info {
-	u32 clients_num;
-	u32 reserved0[13];
-	u32 ct_pool_addr;
-	u32 ct_pool_count;
-	u32 reserved[4];
-} __packed;
-
 /* GuC Additional Data Struct */
 struct guc_ads {
 	struct guc_mmio_reg_set reg_state_list[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
 	u32 reserved0;
 	u32 scheduler_policies;
 	u32 gt_system_info;
-	u32 clients_info;
+	u32 reserved1;
 	u32 control_data;
 	u32 golden_context_lrca[GUC_MAX_ENGINE_CLASSES];
 	u32 eng_state_size[GUC_MAX_ENGINE_CLASSES];
@@ -344,8 +297,7 @@ struct guc_ads {
 /* GuC logging structures */
 enum guc_log_buffer_type {
-	GUC_ISR_LOG_BUFFER,
-	GUC_DPC_LOG_BUFFER,
+	GUC_DEBUG_LOG_BUFFER,
 	GUC_CRASH_DUMP_LOG_BUFFER,
 	GUC_MAX_LOG_BUFFER
 };
@@ -414,23 +366,6 @@ struct guc_shared_ctx_data {
 	struct guc_ctx_report preempt_ctx_report[GUC_MAX_ENGINES_NUM];
 } __packed;
-#define __INTEL_GUC_MSG_GET(T, m) \
-	(((m) & INTEL_GUC_MSG_ ## T ## _MASK) >> INTEL_GUC_MSG_ ## T ## _SHIFT)
-#define INTEL_GUC_MSG_TO_TYPE(m) __INTEL_GUC_MSG_GET(TYPE, m)
-#define INTEL_GUC_MSG_TO_DATA(m) __INTEL_GUC_MSG_GET(DATA, m)
-#define INTEL_GUC_MSG_TO_CODE(m) __INTEL_GUC_MSG_GET(CODE, m)
-
-#define __INTEL_GUC_MSG_TYPE_IS(T, m) \
-	(INTEL_GUC_MSG_TO_TYPE(m) == INTEL_GUC_MSG_TYPE_ ## T)
-#define INTEL_GUC_MSG_IS_REQUEST(m) __INTEL_GUC_MSG_TYPE_IS(REQUEST, m)
-#define INTEL_GUC_MSG_IS_RESPONSE(m) __INTEL_GUC_MSG_TYPE_IS(RESPONSE, m)
-
-#define INTEL_GUC_MSG_IS_RESPONSE_SUCCESS(m) \
-	 (typecheck(u32, (m)) && \
-	  ((m) & (INTEL_GUC_MSG_TYPE_MASK | INTEL_GUC_MSG_CODE_MASK)) == \
-	  ((INTEL_GUC_MSG_TYPE_RESPONSE << INTEL_GUC_MSG_TYPE_SHIFT) | \
-	   (INTEL_GUC_RESPONSE_STATUS_SUCCESS << INTEL_GUC_MSG_CODE_SHIFT)))
-
 /* This action will be programmed in C1BC - SOFT_SCRATCH_15_REG */
 enum intel_guc_recv_message {
 	INTEL_GUC_RECV_MSG_CRASH_DUMP_POSTED = BIT(1),
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index c36d5eb5bbb9..ac0931f0374b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -197,10 +197,8 @@ static bool guc_check_log_buf_overflow(struct intel_guc_log *log,
 static unsigned int guc_get_log_buffer_size(enum guc_log_buffer_type type)
 {
 	switch (type) {
-	case GUC_ISR_LOG_BUFFER:
-		return ISR_BUFFER_SIZE;
-	case GUC_DPC_LOG_BUFFER:
-		return DPC_BUFFER_SIZE;
+	case GUC_DEBUG_LOG_BUFFER:
+		return DEBUG_BUFFER_SIZE;
 	case GUC_CRASH_DUMP_LOG_BUFFER:
 		return CRASH_BUFFER_SIZE;
 	default:
@@ -245,7 +243,7 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 	src_data += PAGE_SIZE;
 	dst_data += PAGE_SIZE;
-	for (type = GUC_ISR_LOG_BUFFER; type < GUC_MAX_LOG_BUFFER; type++) {
+	for (type = GUC_DEBUG_LOG_BUFFER; type < GUC_MAX_LOG_BUFFER; type++) {
 		/*
 		 * Make a copy of the state structure, inside GuC log buffer
 		 * (which is uncached mapped), on the stack to avoid reading
@@ -463,21 +461,16 @@ int intel_guc_log_create(struct intel_guc_log *log)
 	 *  +===============================+ 00B
 	 *  |    Crash dump state header    |
 	 *  +-------------------------------+ 32B
-	 *  |       DPC state header        |
+	 *  |      Debug state header       |
 	 *  +-------------------------------+ 64B
-	 *  |       ISR state header        |
-	 *  +-------------------------------+ 96B
 	 *  |                               |
 	 *  +===============================+ PAGE_SIZE (4KB)
 	 *  |        Crash Dump logs        |
 	 *  +===============================+ + CRASH_SIZE
-	 *  |           DPC logs            |
-	 *  +===============================+ + DPC_SIZE
-	 *  |           ISR logs            |
-	 *  +===============================+ + ISR_SIZE
+	 *  |          Debug logs           |
+	 *  +===============================+ + DEBUG_SIZE
 	 */
-	guc_log_size = PAGE_SIZE + CRASH_BUFFER_SIZE + DPC_BUFFER_SIZE +
-			ISR_BUFFER_SIZE;
+	guc_log_size = PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE;
 	vma = intel_guc_allocate_vma(guc, guc_log_size);
 	if (IS_ERR(vma)) {
@@ -675,10 +668,8 @@ static const char *
 stringify_guc_log_type(enum guc_log_buffer_type type)
 {
 	switch (type) {
-	case GUC_ISR_LOG_BUFFER:
-		return "ISR";
-	case GUC_DPC_LOG_BUFFER:
-		return "DPC";
+	case GUC_DEBUG_LOG_BUFFER:
+		return "DEBUG";
 	case GUC_CRASH_DUMP_LOG_BUFFER:
 		return "CRASH";
 	default:
@@ -708,7 +699,7 @@ void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p)
 	drm_printf(p, "\tRelay full count: %u\n", log->relay.full_count);
-	for (type = GUC_ISR_LOG_BUFFER; type < GUC_MAX_LOG_BUFFER; type++) {
+	for (type = GUC_DEBUG_LOG_BUFFER; type < GUC_MAX_LOG_BUFFER; type++) {
 		drm_printf(p, "\t%s:\tflush count %10u, overflow count %10u\n",
 			   stringify_guc_log_type(type),
 			   log->stats[type].flush,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
index 11fccd0b2294..ac1ee1d5ce10 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
@@ -17,12 +17,10 @@ struct intel_guc;
 #ifdef CONFIG_DRM_I915_DEBUG_GUC
 #define CRASH_BUFFER_SIZE SZ_2M
-#define DPC_BUFFER_SIZE SZ_8M
-#define ISR_BUFFER_SIZE SZ_8M
+#define DEBUG_BUFFER_SIZE SZ_16M
 #else
 #define CRASH_BUFFER_SIZE SZ_8K
-#define DPC_BUFFER_SIZE SZ_32K
-#define ISR_BUFFER_SIZE SZ_32K
+#define DEBUG_BUFFER_SIZE SZ_64K
 #endif
 /*
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c
new file mode 100644
index 000000000000..fc805d466d99
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "intel_guc_rc.h"
+#include "gt/intel_gt.h"
+#include "i915_drv.h"
+
+static bool __guc_rc_supported(struct intel_guc *guc)
+{
+	/* GuC RC is unavailable for pre-Gen12 */
+	return guc->submission_supported &&
+		GRAPHICS_VER(guc_to_gt(guc)->i915) >= 12;
+}
+
+static bool __guc_rc_selected(struct intel_guc *guc)
+{
+	if (!intel_guc_rc_is_supported(guc))
+		return false;
+
+	return guc->submission_selected;
+}
+
+void intel_guc_rc_init_early(struct intel_guc *guc)
+{
+	guc->rc_supported = __guc_rc_supported(guc);
+	guc->rc_selected = __guc_rc_selected(guc);
+}
+
+static int guc_action_control_gucrc(struct intel_guc *guc, bool enable)
+{
+	u32 rc_mode = enable ? INTEL_GUCRC_FIRMWARE_CONTROL :
+				INTEL_GUCRC_HOST_CONTROL;
+	u32 action[] = {
+		INTEL_GUC_ACTION_SETUP_PC_GUCRC,
+		rc_mode
+	};
+	int ret;
+
+	ret = intel_guc_send(guc, action, ARRAY_SIZE(action));
+	ret = ret > 0 ? -EPROTO : ret;
+
+	return ret;
+}
+
+static int __guc_rc_control(struct intel_guc *guc, bool enable)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_device *drm = &guc_to_gt(guc)->i915->drm;
+	int ret;
+
+	if (!intel_uc_uses_guc_rc(&gt->uc))
+		return -EOPNOTSUPP;
+
+	if (!intel_guc_is_ready(guc))
+		return -EINVAL;
+
+	ret = guc_action_control_gucrc(guc, enable);
+	if (ret) {
+		drm_err(drm, "Failed to %s GuC RC (%pe)\n",
+			enabledisable(enable), ERR_PTR(ret));
+		return ret;
+	}
+
+	drm_info(&gt->i915->drm, "GuC RC: %s\n",
+		 enableddisabled(enable));
+
+	return 0;
+}
+
+int intel_guc_rc_enable(struct intel_guc *guc)
+{
+	return __guc_rc_control(guc, true);
+}
+
+int intel_guc_rc_disable(struct intel_guc *guc)
+{
+	return __guc_rc_control(guc, false);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.h
new file mode 100644
index 000000000000..57e86c337838
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#ifndef _INTEL_GUC_RC_H_
+#define _INTEL_GUC_RC_H_
+
+#include "intel_guc_submission.h"
+
+void intel_guc_rc_init_early(struct intel_guc *guc);
+
+static inline bool intel_guc_rc_is_supported(struct intel_guc *guc)
+{
+	return guc->rc_supported;
+}
+
+static inline bool intel_guc_rc_is_wanted(struct intel_guc *guc)
+{
+	return guc->submission_selected && intel_guc_rc_is_supported(guc);
+}
+
+static inline bool intel_guc_rc_is_used(struct intel_guc *guc)
+{
+	return intel_guc_submission_is_used(guc) && intel_guc_rc_is_wanted(guc);
+}
+
+int intel_guc_rc_enable(struct intel_guc *guc);
+int intel_guc_rc_disable(struct intel_guc *guc);
+
+#endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
new file mode 100644
index 000000000000..65a3e7fdb2b2
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -0,0 +1,626 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_guc_slpc.h"
+#include "gt/intel_gt.h"
+
+static inline struct intel_guc *slpc_to_guc(struct intel_guc_slpc *slpc)
+{
+	return container_of(slpc, struct intel_guc, slpc);
+}
+
+static inline struct intel_gt *slpc_to_gt(struct intel_guc_slpc *slpc)
+{
+	return guc_to_gt(slpc_to_guc(slpc));
+}
+
+static inline struct drm_i915_private *slpc_to_i915(struct intel_guc_slpc *slpc)
+{
+	return slpc_to_gt(slpc)->i915;
+}
+
+static bool __detect_slpc_supported(struct intel_guc *guc)
+{
+	/* GuC SLPC is unavailable for pre-Gen12 */
+	return guc->submission_supported &&
+		GRAPHICS_VER(guc_to_gt(guc)->i915) >= 12;
+}
+
+static bool __guc_slpc_selected(struct intel_guc *guc)
+{
+	if (!intel_guc_slpc_is_supported(guc))
+		return false;
+
+	return guc->submission_selected;
+}
+
+void intel_guc_slpc_init_early(struct intel_guc_slpc *slpc)
+{
+	struct intel_guc *guc = slpc_to_guc(slpc);
+
+	slpc->supported = __detect_slpc_supported(guc);
+	slpc->selected = __guc_slpc_selected(guc);
+}
+
+static void slpc_mem_set_param(struct slpc_shared_data *data,
+			       u32 id, u32 value)
+{
+	GEM_BUG_ON(id >= SLPC_MAX_OVERRIDE_PARAMETERS);
+	/*
+	 * When the flag bit is set, corresponding value will be read
+	 * and applied by SLPC.
+	 */
+	data->override_params.bits[id >> 5] |= (1 << (id % 32));
+	data->override_params.values[id] = value;
+}
+
+static void slpc_mem_set_enabled(struct slpc_shared_data *data,
+				 u8 enable_id, u8 disable_id)
+{
+	/*
+	 * Enabling a param involves setting the enable_id
+	 * to 1 and disable_id to 0.
+	 */
+	slpc_mem_set_param(data, enable_id, 1);
+	slpc_mem_set_param(data, disable_id, 0);
+}
+
+static void slpc_mem_set_disabled(struct slpc_shared_data *data,
+				  u8 enable_id, u8 disable_id)
+{
+	/*
+	 * Disabling a param involves setting the enable_id
+	 * to 0 and disable_id to 1.
+ */ + slpc_mem_set_param(data, disable_id, 1); + slpc_mem_set_param(data, enable_id, 0); +} + +int intel_guc_slpc_init(struct intel_guc_slpc *slpc) +{ + struct intel_guc *guc = slpc_to_guc(slpc); + struct drm_i915_private *i915 = slpc_to_i915(slpc); + u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data)); + int err; + + GEM_BUG_ON(slpc->vma); + + err = intel_guc_allocate_and_map_vma(guc, size, &slpc->vma, (void **)&slpc->vaddr); + if (unlikely(err)) { + drm_err(&i915->drm, + "Failed to allocate SLPC struct (err=%pe)\n", + ERR_PTR(err)); + return err; + } + + slpc->max_freq_softlimit = 0; + slpc->min_freq_softlimit = 0; + + return err; +} + +static u32 slpc_get_state(struct intel_guc_slpc *slpc) +{ + struct slpc_shared_data *data; + + GEM_BUG_ON(!slpc->vma); + + drm_clflush_virt_range(slpc->vaddr, sizeof(u32)); + data = slpc->vaddr; + + return data->header.global_state; +} + +static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id, u32 value) +{ + u32 request[] = { + GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST, + SLPC_EVENT(SLPC_EVENT_PARAMETER_SET, 2), + id, + value, + }; + int ret; + + ret = intel_guc_send(guc, request, ARRAY_SIZE(request)); + + return ret > 0 ? -EPROTO : ret; +} + +static int guc_action_slpc_unset_param(struct intel_guc *guc, u8 id) +{ + u32 request[] = { + GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST, + SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 2), + id, + }; + + return intel_guc_send(guc, request, ARRAY_SIZE(request)); +} + +static bool slpc_is_running(struct intel_guc_slpc *slpc) +{ + return slpc_get_state(slpc) == SLPC_GLOBAL_STATE_RUNNING; +} + +static int guc_action_slpc_query(struct intel_guc *guc, u32 offset) +{ + u32 request[] = { + GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST, + SLPC_EVENT(SLPC_EVENT_QUERY_TASK_STATE, 2), + offset, + 0, + }; + int ret; + + ret = intel_guc_send(guc, request, ARRAY_SIZE(request)); + + return ret > 0 ? 
-EPROTO : ret; +} + +static int slpc_query_task_state(struct intel_guc_slpc *slpc) +{ + struct intel_guc *guc = slpc_to_guc(slpc); + struct drm_i915_private *i915 = slpc_to_i915(slpc); + u32 offset = intel_guc_ggtt_offset(guc, slpc->vma); + int ret; + + ret = guc_action_slpc_query(guc, offset); + if (unlikely(ret)) + drm_err(&i915->drm, "Failed to query task state (%pe)\n", + ERR_PTR(ret)); + + drm_clflush_virt_range(slpc->vaddr, SLPC_PAGE_SIZE_BYTES); + + return ret; +} + +static int slpc_set_param(struct intel_guc_slpc *slpc, u8 id, u32 value) +{ + struct intel_guc *guc = slpc_to_guc(slpc); + struct drm_i915_private *i915 = slpc_to_i915(slpc); + int ret; + + GEM_BUG_ON(id >= SLPC_MAX_PARAM); + + ret = guc_action_slpc_set_param(guc, id, value); + if (ret) + drm_err(&i915->drm, "Failed to set param %d to %u (%pe)\n", + id, value, ERR_PTR(ret)); + + return ret; +} + +static int slpc_unset_param(struct intel_guc_slpc *slpc, + u8 id) +{ + struct intel_guc *guc = slpc_to_guc(slpc); + + GEM_BUG_ON(id >= SLPC_MAX_PARAM); + + return guc_action_slpc_unset_param(guc, id); +} + +static const char *slpc_global_state_to_string(enum slpc_global_state state) +{ + switch (state) { + case SLPC_GLOBAL_STATE_NOT_RUNNING: + return "not running"; + case SLPC_GLOBAL_STATE_INITIALIZING: + return "initializing"; + case SLPC_GLOBAL_STATE_RESETTING: + return "resetting"; + case SLPC_GLOBAL_STATE_RUNNING: + return "running"; + case SLPC_GLOBAL_STATE_SHUTTING_DOWN: + return "shutting down"; + case SLPC_GLOBAL_STATE_ERROR: + return "error"; + default: + return "unknown"; + } +} + +static const char *slpc_get_state_string(struct intel_guc_slpc *slpc) +{ + return slpc_global_state_to_string(slpc_get_state(slpc)); +} + +static int guc_action_slpc_reset(struct intel_guc *guc, u32 offset) +{ + u32 request[] = { + GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST, + SLPC_EVENT(SLPC_EVENT_RESET, 2), + offset, + 0, + }; + int ret; + + ret = intel_guc_send(guc, request, ARRAY_SIZE(request)); + + return ret > 0 ? 
-EPROTO : ret; +} + +static int slpc_reset(struct intel_guc_slpc *slpc) +{ + struct drm_i915_private *i915 = slpc_to_i915(slpc); + struct intel_guc *guc = slpc_to_guc(slpc); + u32 offset = intel_guc_ggtt_offset(guc, slpc->vma); + int ret; + + ret = guc_action_slpc_reset(guc, offset); + + if (unlikely(ret < 0)) { + drm_err(&i915->drm, "SLPC reset action failed (%pe)\n", + ERR_PTR(ret)); + return ret; + } + + if (!ret) { + if (wait_for(slpc_is_running(slpc), SLPC_RESET_TIMEOUT_MS)) { + drm_err(&i915->drm, "SLPC not enabled! State = %s\n", + slpc_get_state_string(slpc)); + return -EIO; + } + } + + return 0; +} + +static u32 slpc_decode_min_freq(struct intel_guc_slpc *slpc) +{ + struct slpc_shared_data *data = slpc->vaddr; + + GEM_BUG_ON(!slpc->vma); + + return DIV_ROUND_CLOSEST(REG_FIELD_GET(SLPC_MIN_UNSLICE_FREQ_MASK, + data->task_state_data.freq) * + GT_FREQUENCY_MULTIPLIER, GEN9_FREQ_SCALER); +} + +static u32 slpc_decode_max_freq(struct intel_guc_slpc *slpc) +{ + struct slpc_shared_data *data = slpc->vaddr; + + GEM_BUG_ON(!slpc->vma); + + return DIV_ROUND_CLOSEST(REG_FIELD_GET(SLPC_MAX_UNSLICE_FREQ_MASK, + data->task_state_data.freq) * + GT_FREQUENCY_MULTIPLIER, GEN9_FREQ_SCALER); +} + +static void slpc_shared_data_reset(struct slpc_shared_data *data) +{ + memset(data, 0, sizeof(struct slpc_shared_data)); + + data->header.size = sizeof(struct slpc_shared_data); + + /* Enable only GTPERF task, disable others */ + slpc_mem_set_enabled(data, SLPC_PARAM_TASK_ENABLE_GTPERF, + SLPC_PARAM_TASK_DISABLE_GTPERF); + + slpc_mem_set_disabled(data, SLPC_PARAM_TASK_ENABLE_BALANCER, + SLPC_PARAM_TASK_DISABLE_BALANCER); + + slpc_mem_set_disabled(data, SLPC_PARAM_TASK_ENABLE_DCC, + SLPC_PARAM_TASK_DISABLE_DCC); +} + +/** + * intel_guc_slpc_set_max_freq() - Set max frequency limit for SLPC. + * @slpc: pointer to intel_guc_slpc. + * @val: frequency (MHz) + * + * This function will invoke GuC SLPC action to update the max frequency + * limit for unslice. 
+ * + * Return: 0 on success, non-zero error code on failure. + */ +int intel_guc_slpc_set_max_freq(struct intel_guc_slpc *slpc, u32 val) +{ + struct drm_i915_private *i915 = slpc_to_i915(slpc); + intel_wakeref_t wakeref; + int ret; + + if (val < slpc->min_freq || + val > slpc->rp0_freq || + val < slpc->min_freq_softlimit) + return -EINVAL; + + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { + ret = slpc_set_param(slpc, + SLPC_PARAM_GLOBAL_MAX_GT_UNSLICE_FREQ_MHZ, + val); + + /* Return standardized err code for sysfs calls */ + if (ret) + ret = -EIO; + } + + if (!ret) + slpc->max_freq_softlimit = val; + + return ret; +} + +/** + * intel_guc_slpc_get_max_freq() - Get max frequency limit for SLPC. + * @slpc: pointer to intel_guc_slpc. + * @val: pointer to val which will hold max frequency (MHz) + * + * This function will invoke GuC SLPC action to read the max frequency + * limit for unslice. + * + * Return: 0 on success, non-zero error code on failure. + */ +int intel_guc_slpc_get_max_freq(struct intel_guc_slpc *slpc, u32 *val) +{ + struct drm_i915_private *i915 = slpc_to_i915(slpc); + intel_wakeref_t wakeref; + int ret = 0; + + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { + /* Force GuC to update task data */ + ret = slpc_query_task_state(slpc); + + if (!ret) + *val = slpc_decode_max_freq(slpc); + } + + return ret; +} + +/** + * intel_guc_slpc_set_min_freq() - Set min frequency limit for SLPC. + * @slpc: pointer to intel_guc_slpc. + * @val: frequency (MHz) + * + * This function will invoke GuC SLPC action to update the min unslice + * frequency. + * + * Return: 0 on success, non-zero error code on failure. 
+ */ +int intel_guc_slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 val) +{ + struct drm_i915_private *i915 = slpc_to_i915(slpc); + intel_wakeref_t wakeref; + int ret; + + if (val < slpc->min_freq || + val > slpc->rp0_freq || + val > slpc->max_freq_softlimit) + return -EINVAL; + + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { + ret = slpc_set_param(slpc, + SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ, + val); + + /* Return standardized err code for sysfs calls */ + if (ret) + ret = -EIO; + } + + if (!ret) + slpc->min_freq_softlimit = val; + + return ret; +} + +/** + * intel_guc_slpc_get_min_freq() - Get min frequency limit for SLPC. + * @slpc: pointer to intel_guc_slpc. + * @val: pointer to val which will hold min frequency (MHz) + * + * This function will invoke GuC SLPC action to read the min frequency + * limit for unslice. + * + * Return: 0 on success, non-zero error code on failure. + */ +int intel_guc_slpc_get_min_freq(struct intel_guc_slpc *slpc, u32 *val) +{ + struct drm_i915_private *i915 = slpc_to_i915(slpc); + intel_wakeref_t wakeref; + int ret = 0; + + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { + /* Force GuC to update task data */ + ret = slpc_query_task_state(slpc); + + if (!ret) + *val = slpc_decode_min_freq(slpc); + } + + return ret; +} + +void intel_guc_pm_intrmsk_enable(struct intel_gt *gt) +{ + u32 pm_intrmsk_mbz = 0; + + /* + * Allow GuC to receive ARAT timer expiry event. + * This interrupt register is setup by RPS code + * when host based Turbo is enabled. + */ + pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK; + + intel_uncore_rmw(gt->uncore, + GEN6_PMINTRMSK, pm_intrmsk_mbz, 0); +} + +static int slpc_set_softlimits(struct intel_guc_slpc *slpc) +{ + int ret = 0; + + /* + * Softlimits are initially equivalent to platform limits + * unless they have deviated from defaults, in which case, + * we retain the values and set min/max accordingly. 
+ */ + if (!slpc->max_freq_softlimit) + slpc->max_freq_softlimit = slpc->rp0_freq; + else if (slpc->max_freq_softlimit != slpc->rp0_freq) + ret = intel_guc_slpc_set_max_freq(slpc, + slpc->max_freq_softlimit); + + if (unlikely(ret)) + return ret; + + if (!slpc->min_freq_softlimit) + slpc->min_freq_softlimit = slpc->min_freq; + else if (slpc->min_freq_softlimit != slpc->min_freq) + return intel_guc_slpc_set_min_freq(slpc, + slpc->min_freq_softlimit); + + return 0; +} + +static int slpc_ignore_eff_freq(struct intel_guc_slpc *slpc, bool ignore) +{ + int ret = 0; + + if (ignore) { + ret = slpc_set_param(slpc, + SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY, + ignore); + if (!ret) + return slpc_set_param(slpc, + SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ, + slpc->min_freq); + } else { + ret = slpc_unset_param(slpc, + SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY); + if (!ret) + return slpc_unset_param(slpc, + SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ); + } + + return ret; +} + +static int slpc_use_fused_rp0(struct intel_guc_slpc *slpc) +{ + /* Force SLPC to used platform rp0 */ + return slpc_set_param(slpc, + SLPC_PARAM_GLOBAL_MAX_GT_UNSLICE_FREQ_MHZ, + slpc->rp0_freq); +} + +static void slpc_get_rp_values(struct intel_guc_slpc *slpc) +{ + u32 rp_state_cap; + + rp_state_cap = intel_uncore_read(slpc_to_gt(slpc)->uncore, + GEN6_RP_STATE_CAP); + + slpc->rp0_freq = REG_FIELD_GET(RP0_CAP_MASK, rp_state_cap) * + GT_FREQUENCY_MULTIPLIER; + slpc->rp1_freq = REG_FIELD_GET(RP1_CAP_MASK, rp_state_cap) * + GT_FREQUENCY_MULTIPLIER; + slpc->min_freq = REG_FIELD_GET(RPN_CAP_MASK, rp_state_cap) * + GT_FREQUENCY_MULTIPLIER; +} + +/* + * intel_guc_slpc_enable() - Start SLPC + * @slpc: pointer to intel_guc_slpc. + * + * SLPC is enabled by setting up the shared data structure and + * sending reset event to GuC SLPC. Initial data is setup in + * intel_guc_slpc_init. Here we send the reset event. 
We do + * not currently need a slpc_disable since this is taken care + * of automatically when a reset/suspend occurs and the GuC + * CTB is destroyed. + * + * Return: 0 on success, non-zero error code on failure. + */ +int intel_guc_slpc_enable(struct intel_guc_slpc *slpc) +{ + struct drm_i915_private *i915 = slpc_to_i915(slpc); + int ret; + + GEM_BUG_ON(!slpc->vma); + + slpc_shared_data_reset(slpc->vaddr); + + ret = slpc_reset(slpc); + if (unlikely(ret < 0)) { + drm_err(&i915->drm, "SLPC Reset event returned (%pe)\n", + ERR_PTR(ret)); + return ret; + } + + ret = slpc_query_task_state(slpc); + if (unlikely(ret < 0)) + return ret; + + intel_guc_pm_intrmsk_enable(&i915->gt); + + slpc_get_rp_values(slpc); + + /* Ignore efficient freq and set min to platform min */ + ret = slpc_ignore_eff_freq(slpc, true); + if (unlikely(ret)) { + drm_err(&i915->drm, "Failed to set SLPC min to RPn (%pe)\n", + ERR_PTR(ret)); + return ret; + } + + /* Set SLPC max limit to RP0 */ + ret = slpc_use_fused_rp0(slpc); + if (unlikely(ret)) { + drm_err(&i915->drm, "Failed to set SLPC max to RP0 (%pe)\n", + ERR_PTR(ret)); + return ret; + } + + /* Revert SLPC min/max to softlimits if necessary */ + ret = slpc_set_softlimits(slpc); + if (unlikely(ret)) { + drm_err(&i915->drm, "Failed to set SLPC softlimits (%pe)\n", + ERR_PTR(ret)); + return ret; + } + + return 0; +} + +int intel_guc_slpc_print_info(struct intel_guc_slpc *slpc, struct drm_printer *p) +{ + struct drm_i915_private *i915 = slpc_to_i915(slpc); + struct slpc_shared_data *data = slpc->vaddr; + struct slpc_task_state_data *slpc_tasks; + intel_wakeref_t wakeref; + int ret = 0; + + GEM_BUG_ON(!slpc->vma); + + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { + ret = slpc_query_task_state(slpc); + + if (!ret) { + slpc_tasks = &data->task_state_data; + + drm_printf(p, "\tSLPC state: %s\n", slpc_get_state_string(slpc)); + drm_printf(p, "\tGTPERF task active: %s\n", + yesno(slpc_tasks->status & SLPC_GTPERF_TASK_ENABLED)); + drm_printf(p, 
"\tMax freq: %u MHz\n", + slpc_decode_max_freq(slpc)); + drm_printf(p, "\tMin freq: %u MHz\n", + slpc_decode_min_freq(slpc)); + } + } + + return ret; +} + +void intel_guc_slpc_fini(struct intel_guc_slpc *slpc) +{ + if (!slpc->vma) + return; + + i915_vma_unpin_and_release(&slpc->vma, I915_VMA_RELEASE_MAP); +} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h new file mode 100644 index 000000000000..e45054d5b9b4 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2021 Intel Corporation + */ + +#ifndef _INTEL_GUC_SLPC_H_ +#define _INTEL_GUC_SLPC_H_ + +#include "intel_guc_submission.h" +#include "intel_guc_slpc_types.h" + +struct intel_gt; +struct drm_printer; + +static inline bool intel_guc_slpc_is_supported(struct intel_guc *guc) +{ + return guc->slpc.supported; +} + +static inline bool intel_guc_slpc_is_wanted(struct intel_guc *guc) +{ + return guc->slpc.selected; +} + +static inline bool intel_guc_slpc_is_used(struct intel_guc *guc) +{ + return intel_guc_submission_is_used(guc) && intel_guc_slpc_is_wanted(guc); +} + +void intel_guc_slpc_init_early(struct intel_guc_slpc *slpc); + +int intel_guc_slpc_init(struct intel_guc_slpc *slpc); +int intel_guc_slpc_enable(struct intel_guc_slpc *slpc); +void intel_guc_slpc_fini(struct intel_guc_slpc *slpc); +int intel_guc_slpc_set_max_freq(struct intel_guc_slpc *slpc, u32 val); +int intel_guc_slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 val); +int intel_guc_slpc_get_max_freq(struct intel_guc_slpc *slpc, u32 *val); +int intel_guc_slpc_get_min_freq(struct intel_guc_slpc *slpc, u32 *val); +int intel_guc_slpc_print_info(struct intel_guc_slpc *slpc, struct drm_printer *p); +void intel_guc_pm_intrmsk_enable(struct intel_gt *gt); + +#endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc_types.h new file mode 100644 index 
000000000000..41d13527666f --- /dev/null +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc_types.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2021 Intel Corporation + */ + +#ifndef _INTEL_GUC_SLPC_TYPES_H_ +#define _INTEL_GUC_SLPC_TYPES_H_ + +#include <linux/types.h> + +#define SLPC_RESET_TIMEOUT_MS 5 + +struct intel_guc_slpc { + struct i915_vma *vma; + struct slpc_shared_data *vaddr; + bool supported; + bool selected; + + /* platform frequency limits */ + u32 min_freq; + u32 rp0_freq; + u32 rp1_freq; + + /* frequency softlimits */ + u32 min_freq_softlimit; + u32 max_freq_softlimit; +}; + +#endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 7c8ff9792f7b..87d8dc8f51b9 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -10,10 +10,13 @@ #include "gt/intel_breadcrumbs.h" #include "gt/intel_context.h" #include "gt/intel_engine_pm.h" +#include "gt/intel_engine_heartbeat.h" #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_gt_requests.h" #include "gt/intel_lrc.h" +#include "gt/intel_lrc_reg.h" #include "gt/intel_mocs.h" #include "gt/intel_ring.h" @@ -58,244 +61,705 @@ * */ +/* GuC Virtual Engine */ +struct guc_virtual_engine { + struct intel_engine_cs base; + struct intel_context context; +}; + +static struct intel_context * +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count); + #define GUC_REQUEST_SIZE 64 /* bytes */ -static inline struct i915_priolist *to_priolist(struct rb_node *rb) +/* + * Below is a set of functions which control the GuC scheduling state which do + * not require a lock as all state transitions are mutually exclusive. i.e. It + * is not possible for the context pinning code and submission, for the same + * context, to be executing simultaneously. 
We still need an atomic as it is + * possible for some of the bits to changing at the same time though. + */ +#define SCHED_STATE_NO_LOCK_ENABLED BIT(0) +#define SCHED_STATE_NO_LOCK_PENDING_ENABLE BIT(1) +#define SCHED_STATE_NO_LOCK_REGISTERED BIT(2) +static inline bool context_enabled(struct intel_context *ce) { - return rb_entry(rb, struct i915_priolist, node); + return (atomic_read(&ce->guc_sched_state_no_lock) & + SCHED_STATE_NO_LOCK_ENABLED); } -static struct guc_stage_desc *__get_stage_desc(struct intel_guc *guc, u32 id) +static inline void set_context_enabled(struct intel_context *ce) { - struct guc_stage_desc *base = guc->stage_desc_pool_vaddr; + atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock); +} - return &base[id]; +static inline void clr_context_enabled(struct intel_context *ce) +{ + atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED, + &ce->guc_sched_state_no_lock); } -static int guc_stage_desc_pool_create(struct intel_guc *guc) +static inline bool context_pending_enable(struct intel_context *ce) { - u32 size = PAGE_ALIGN(sizeof(struct guc_stage_desc) * - GUC_MAX_STAGE_DESCRIPTORS); + return (atomic_read(&ce->guc_sched_state_no_lock) & + SCHED_STATE_NO_LOCK_PENDING_ENABLE); +} - return intel_guc_allocate_and_map_vma(guc, size, &guc->stage_desc_pool, - &guc->stage_desc_pool_vaddr); +static inline void set_context_pending_enable(struct intel_context *ce) +{ + atomic_or(SCHED_STATE_NO_LOCK_PENDING_ENABLE, + &ce->guc_sched_state_no_lock); +} + +static inline void clr_context_pending_enable(struct intel_context *ce) +{ + atomic_and((u32)~SCHED_STATE_NO_LOCK_PENDING_ENABLE, + &ce->guc_sched_state_no_lock); } -static void guc_stage_desc_pool_destroy(struct intel_guc *guc) +static inline bool context_registered(struct intel_context *ce) { - i915_vma_unpin_and_release(&guc->stage_desc_pool, I915_VMA_RELEASE_MAP); + return (atomic_read(&ce->guc_sched_state_no_lock) & + SCHED_STATE_NO_LOCK_REGISTERED); +} + +static inline void 
set_context_registered(struct intel_context *ce) +{ + atomic_or(SCHED_STATE_NO_LOCK_REGISTERED, + &ce->guc_sched_state_no_lock); +} + +static inline void clr_context_registered(struct intel_context *ce) +{ + atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED, + &ce->guc_sched_state_no_lock); } /* - * Initialise/clear the stage descriptor shared with the GuC firmware. - * - * This descriptor tells the GuC where (in GGTT space) to find the important - * data structures related to work submission (process descriptor, write queue, - * etc). + * Below is a set of functions which control the GuC scheduling state which + * require a lock, aside from the special case where the functions are called + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code + * path to be executing on the context. */ -static void guc_stage_desc_init(struct intel_guc *guc) +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER BIT(0) +#define SCHED_STATE_DESTROYED BIT(1) +#define SCHED_STATE_PENDING_DISABLE BIT(2) +#define SCHED_STATE_BANNED BIT(3) +#define SCHED_STATE_BLOCKED_SHIFT 4 +#define SCHED_STATE_BLOCKED BIT(SCHED_STATE_BLOCKED_SHIFT) +#define SCHED_STATE_BLOCKED_MASK (0xfff << SCHED_STATE_BLOCKED_SHIFT) +static inline void init_sched_state(struct intel_context *ce) +{ + /* Only should be called from guc_lrc_desc_pin() */ + atomic_set(&ce->guc_sched_state_no_lock, 0); + ce->guc_state.sched_state = 0; +} + +static inline bool +context_wait_for_deregister_to_register(struct intel_context *ce) +{ + return ce->guc_state.sched_state & + SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER; +} + +static inline void +set_context_wait_for_deregister_to_register(struct intel_context *ce) { - struct guc_stage_desc *desc; + /* Only should be called from guc_lrc_desc_pin() without lock */ + ce->guc_state.sched_state |= + SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER; +} - /* we only use 1 stage desc, so hardcode it to 0 */ - desc = __get_stage_desc(guc, 0); - memset(desc, 0, sizeof(*desc)); 
+static inline void +clr_context_wait_for_deregister_to_register(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state &= + ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER; +} - desc->attribute = GUC_STAGE_DESC_ATTR_ACTIVE | - GUC_STAGE_DESC_ATTR_KERNEL; +static inline bool +context_destroyed(struct intel_context *ce) +{ + return ce->guc_state.sched_state & SCHED_STATE_DESTROYED; +} - desc->stage_id = 0; - desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL; +static inline void +set_context_destroyed(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state |= SCHED_STATE_DESTROYED; +} - desc->wq_size = GUC_WQ_SIZE; +static inline bool context_pending_disable(struct intel_context *ce) +{ + return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE; } -static void guc_stage_desc_fini(struct intel_guc *guc) +static inline void set_context_pending_disable(struct intel_context *ce) { - struct guc_stage_desc *desc; + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state |= SCHED_STATE_PENDING_DISABLE; +} - desc = __get_stage_desc(guc, 0); - memset(desc, 0, sizeof(*desc)); +static inline void clr_context_pending_disable(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state &= ~SCHED_STATE_PENDING_DISABLE; } -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) +static inline bool context_banned(struct intel_context *ce) { - /* Leaving stub as this function will be used in future patches */ + return ce->guc_state.sched_state & SCHED_STATE_BANNED; } -/* - * When we're doing submissions using regular execlists backend, writing to - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages - * pinned in mappable aperture portion of GGTT are visible to command streamer. 
- * Writes done by GuC on our behalf are not guaranteeing such ordering, - * therefore, to ensure the flush, we're issuing a POSTING READ. - */ -static void flush_ggtt_writes(struct i915_vma *vma) +static inline void set_context_banned(struct intel_context *ce) { - if (i915_vma_is_map_and_fenceable(vma)) - intel_uncore_posting_read_fw(vma->vm->gt->uncore, - GUC_STATUS); + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state |= SCHED_STATE_BANNED; } -static void guc_submit(struct intel_engine_cs *engine, - struct i915_request **out, - struct i915_request **end) +static inline void clr_context_banned(struct intel_context *ce) { - struct intel_guc *guc = &engine->gt->uc.guc; + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state &= ~SCHED_STATE_BANNED; +} - do { - struct i915_request *rq = *out++; +static inline u32 context_blocked(struct intel_context *ce) +{ + return (ce->guc_state.sched_state & SCHED_STATE_BLOCKED_MASK) >> + SCHED_STATE_BLOCKED_SHIFT; +} - flush_ggtt_writes(rq->ring->vma); - guc_add_request(guc, rq); - } while (out != end); +static inline void incr_context_blocked(struct intel_context *ce) +{ + lockdep_assert_held(&ce->engine->sched_engine->lock); + lockdep_assert_held(&ce->guc_state.lock); + + ce->guc_state.sched_state += SCHED_STATE_BLOCKED; + + GEM_BUG_ON(!context_blocked(ce)); /* Overflow check */ } -static inline int rq_prio(const struct i915_request *rq) +static inline void decr_context_blocked(struct intel_context *ce) { - return rq->sched.attr.priority; + lockdep_assert_held(&ce->engine->sched_engine->lock); + lockdep_assert_held(&ce->guc_state.lock); + + GEM_BUG_ON(!context_blocked(ce)); /* Underflow check */ + + ce->guc_state.sched_state -= SCHED_STATE_BLOCKED; } -static struct i915_request *schedule_in(struct i915_request *rq, int idx) +static inline bool context_guc_id_invalid(struct intel_context *ce) { - trace_i915_request_in(rq, idx); + return ce->guc_id == GUC_INVALID_LRC_ID; +} + +static inline 
void set_context_guc_id_invalid(struct intel_context *ce) +{ + ce->guc_id = GUC_INVALID_LRC_ID; +} + +static inline struct intel_guc *ce_to_guc(struct intel_context *ce) +{ + return &ce->engine->gt->uc.guc; +} + +static inline struct i915_priolist *to_priolist(struct rb_node *rb) +{ + return rb_entry(rb, struct i915_priolist, node); +} + +static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index) +{ + struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr; + + GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS); + + return &base[index]; +} + +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id) +{ + struct intel_context *ce = xa_load(&guc->context_lookup, id); + + GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS); + + return ce; +} + +static int guc_lrc_desc_pool_create(struct intel_guc *guc) +{ + u32 size; + int ret; + + size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * + GUC_MAX_LRC_DESCRIPTORS); + ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool, + (void **)&guc->lrc_desc_pool_vaddr); + if (ret) + return ret; + + return 0; +} + +static void guc_lrc_desc_pool_destroy(struct intel_guc *guc) +{ + guc->lrc_desc_pool_vaddr = NULL; + i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP); +} + +static inline bool guc_submission_initialized(struct intel_guc *guc) +{ + return !!guc->lrc_desc_pool_vaddr; +} + +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id) +{ + if (likely(guc_submission_initialized(guc))) { + struct guc_lrc_desc *desc = __get_lrc_desc(guc, id); + unsigned long flags; + + memset(desc, 0, sizeof(*desc)); + + /* + * xarray API doesn't have xa_erase_irqsave wrapper, so calling + * the lower level functions directly. 
+ */ + xa_lock_irqsave(&guc->context_lookup, flags); + __xa_erase(&guc->context_lookup, id); + xa_unlock_irqrestore(&guc->context_lookup, flags); + } +} + +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id) +{ + return __get_context(guc, id); +} + +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, + struct intel_context *ce) +{ + unsigned long flags; /* - * Currently we are not tracking the rq->context being inflight - * (ce->inflight = rq->engine). It is only used by the execlists - * backend at the moment, a similar counting strategy would be - * required if we generalise the inflight tracking. + * xarray API doesn't have xa_save_irqsave wrapper, so calling the + * lower level functions directly. */ + xa_lock_irqsave(&guc->context_lookup, flags); + __xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC); + xa_unlock_irqrestore(&guc->context_lookup, flags); +} + +static int guc_submission_send_busy_loop(struct intel_guc *guc, + const u32 *action, + u32 len, + u32 g2h_len_dw, + bool loop) +{ + int err; + + err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); + + if (!err && g2h_len_dw) + atomic_inc(&guc->outstanding_submission_g2h); + + return err; +} + +int intel_guc_wait_for_pending_msg(struct intel_guc *guc, + atomic_t *wait_var, + bool interruptible, + long timeout) +{ + const int state = interruptible ? + TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE; + DEFINE_WAIT(wait); - __intel_gt_pm_get(rq->engine->gt); - return i915_request_get(rq); + might_sleep(); + GEM_BUG_ON(timeout < 0); + + if (!atomic_read(wait_var)) + return 0; + + if (!timeout) + return -ETIME; + + for (;;) { + prepare_to_wait(&guc->ct.wq, &wait, state); + + if (!atomic_read(wait_var)) + break; + + if (signal_pending_state(state, current)) { + timeout = -EINTR; + break; + } + + if (!timeout) { + timeout = -ETIME; + break; + } + + timeout = io_schedule_timeout(timeout); + } + finish_wait(&guc->ct.wq, &wait); + + return (timeout < 0) ? 
timeout : 0; } -static void schedule_out(struct i915_request *rq) +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout) { - trace_i915_request_out(rq); + if (!intel_uc_uses_guc_submission(&guc_to_gt(guc)->uc)) + return 0; - intel_gt_pm_put_async(rq->engine->gt); - i915_request_put(rq); + return intel_guc_wait_for_pending_msg(guc, + &guc->outstanding_submission_g2h, + true, timeout); } -static void __guc_dequeue(struct intel_engine_cs *engine) +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop); + +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) { - struct intel_engine_execlists * const execlists = &engine->execlists; - struct i915_request **first = execlists->inflight; - struct i915_request ** const last_port = first + execlists->port_mask; - struct i915_request *last = first[0]; - struct i915_request **port; - bool submit = false; - struct rb_node *rb; + int err = 0; + struct intel_context *ce = rq->context; + u32 action[3]; + int len = 0; + u32 g2h_len_dw = 0; + bool enabled; - lockdep_assert_held(&engine->active.lock); + /* + * Corner case where requests were sitting in the priority list or a + * request resubmitted after the context was banned. + */ + if (unlikely(intel_context_is_banned(ce))) { + i915_request_put(i915_request_mark_eio(rq)); + intel_engine_signal_breadcrumbs(ce->engine); + goto out; + } - if (last) { - if (*++first) - return; + GEM_BUG_ON(!atomic_read(&ce->guc_id_ref)); + GEM_BUG_ON(context_guc_id_invalid(ce)); - last = NULL; + /* + * Corner case where the GuC firmware was blown away and reloaded while + * this context was pinned. + */ + if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) { + err = guc_lrc_desc_pin(ce, false); + if (unlikely(err)) + goto out; } /* - * We write directly into the execlists->inflight queue and don't use - * the execlists->pending queue, as we don't have a distinct switch - * event. 
+ * The request / context will be run on the hardware when scheduling + * gets enabled in the unblock. */ - port = first; - while ((rb = rb_first_cached(&execlists->queue))) { + if (unlikely(context_blocked(ce))) + goto out; + + enabled = context_enabled(ce); + + if (!enabled) { + action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET; + action[len++] = ce->guc_id; + action[len++] = GUC_CONTEXT_ENABLE; + set_context_pending_enable(ce); + intel_context_get(ce); + g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET; + } else { + action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT; + action[len++] = ce->guc_id; + } + + err = intel_guc_send_nb(guc, action, len, g2h_len_dw); + if (!enabled && !err) { + trace_intel_context_sched_enable(ce); + atomic_inc(&guc->outstanding_submission_g2h); + set_context_enabled(ce); + } else if (!enabled) { + clr_context_pending_enable(ce); + intel_context_put(ce); + } + if (likely(!err)) + trace_i915_request_guc_submit(rq); + +out: + return err; +} + +static inline void guc_set_lrc_tail(struct i915_request *rq) +{ + rq->context->lrc_reg_state[CTX_RING_TAIL] = + intel_ring_set_tail(rq->ring, rq->tail); +} + +static inline int rq_prio(const struct i915_request *rq) +{ + return rq->sched.attr.priority; +} + +static int guc_dequeue_one_context(struct intel_guc *guc) +{ + struct i915_sched_engine * const sched_engine = guc->sched_engine; + struct i915_request *last = NULL; + bool submit = false; + struct rb_node *rb; + int ret; + + lockdep_assert_held(&sched_engine->lock); + + if (guc->stalled_request) { + submit = true; + last = guc->stalled_request; + goto resubmit; + } + + while ((rb = rb_first_cached(&sched_engine->queue))) { struct i915_priolist *p = to_priolist(rb); struct i915_request *rq, *rn; priolist_for_each_request_consume(rq, rn, p) { - if (last && rq->context != last->context) { - if (port == last_port) - goto done; - - *port = schedule_in(last, - port - execlists->inflight); - port++; - } + if (last && rq->context != last->context) + 
goto done; list_del_init(&rq->sched.link); + __i915_request_submit(rq); - submit = true; + + trace_i915_request_in(rq, 0); last = rq; + submit = true; } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &sched_engine->queue); i915_priolist_free(p); } done: - execlists->queue_priority_hint = - rb ? to_priolist(rb)->priority : INT_MIN; if (submit) { - *port = schedule_in(last, port - execlists->inflight); - *++port = NULL; - guc_submit(engine, first, port); + guc_set_lrc_tail(last); +resubmit: + ret = guc_add_request(guc, last); + if (unlikely(ret == -EPIPE)) + goto deadlk; + else if (ret == -EBUSY) { + tasklet_schedule(&sched_engine->tasklet); + guc->stalled_request = last; + return false; + } } - execlists->active = execlists->inflight; + + guc->stalled_request = NULL; + return submit; + +deadlk: + sched_engine->tasklet.callback = NULL; + tasklet_disable_nosync(&sched_engine->tasklet); + return false; } static void guc_submission_tasklet(struct tasklet_struct *t) { - struct intel_engine_cs * const engine = - from_tasklet(engine, t, execlists.tasklet); - struct intel_engine_execlists * const execlists = &engine->execlists; - struct i915_request **port, *rq; + struct i915_sched_engine *sched_engine = + from_tasklet(sched_engine, t, tasklet); unsigned long flags; + bool loop; - spin_lock_irqsave(&engine->active.lock, flags); - - for (port = execlists->inflight; (rq = *port); port++) { - if (!i915_request_completed(rq)) - break; + spin_lock_irqsave(&sched_engine->lock, flags); - schedule_out(rq); - } - if (port != execlists->inflight) { - int idx = port - execlists->inflight; - int rem = ARRAY_SIZE(execlists->inflight) - idx; - memmove(execlists->inflight, port, rem * sizeof(*port)); - } + do { + loop = guc_dequeue_one_context(sched_engine->private_data); + } while (loop); - __guc_dequeue(engine); + i915_sched_engine_reset_on_empty(sched_engine); - spin_unlock_irqrestore(&engine->active.lock, flags); + 
spin_unlock_irqrestore(&sched_engine->lock, flags); } static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir) { - if (iir & GT_RENDER_USER_INTERRUPT) { + if (iir & GT_RENDER_USER_INTERRUPT) intel_engine_signal_breadcrumbs(engine); - tasklet_hi_schedule(&engine->execlists.tasklet); +} + +static void __guc_context_destroy(struct intel_context *ce); +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce); +static void guc_signal_context_fence(struct intel_context *ce); +static void guc_cancel_context_requests(struct intel_context *ce); +static void guc_blocked_fence_complete(struct intel_context *ce); + +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) +{ + struct intel_context *ce; + unsigned long index, flags; + bool pending_disable, pending_enable, deregister, destroyed, banned; + + xa_for_each(&guc->context_lookup, index, ce) { + /* Flush context */ + spin_lock_irqsave(&ce->guc_state.lock, flags); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + /* + * Once we are at this point submission_disabled() is guaranteed + * to be visible to all callers who set the below flags (see above + * flush and flushes in reset_prepare). If submission_disabled() + * is set, the caller shouldn't set these flags. + */ + + destroyed = context_destroyed(ce); + pending_enable = context_pending_enable(ce); + pending_disable = context_pending_disable(ce); + deregister = context_wait_for_deregister_to_register(ce); + banned = context_banned(ce); + init_sched_state(ce); + + if (pending_enable || destroyed || deregister) { + atomic_dec(&guc->outstanding_submission_g2h); + if (deregister) + guc_signal_context_fence(ce); + if (destroyed) { + release_guc_id(guc, ce); + __guc_context_destroy(ce); + } + if (pending_enable || deregister) + intel_context_put(ce); + } + + /* Not mutualy exclusive with above if statement. 
*/ + if (pending_disable) { + guc_signal_context_fence(ce); + if (banned) { + guc_cancel_context_requests(ce); + intel_engine_signal_breadcrumbs(ce->engine); + } + intel_context_sched_disable_unpin(ce); + atomic_dec(&guc->outstanding_submission_g2h); + spin_lock_irqsave(&ce->guc_state.lock, flags); + guc_blocked_fence_complete(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + intel_context_put(ce); + } } } -static void guc_reset_prepare(struct intel_engine_cs *engine) +static inline bool +submission_disabled(struct intel_guc *guc) +{ + struct i915_sched_engine * const sched_engine = guc->sched_engine; + + return unlikely(!sched_engine || + !__tasklet_is_enabled(&sched_engine->tasklet)); +} + +static void disable_submission(struct intel_guc *guc) { - struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_sched_engine * const sched_engine = guc->sched_engine; + + if (__tasklet_is_enabled(&sched_engine->tasklet)) { + GEM_BUG_ON(!guc->ct.enabled); + __tasklet_disable_sync_once(&sched_engine->tasklet); + sched_engine->tasklet.callback = NULL; + } +} + +static void enable_submission(struct intel_guc *guc) +{ + struct i915_sched_engine * const sched_engine = guc->sched_engine; + unsigned long flags; - ENGINE_TRACE(engine, "\n"); + spin_lock_irqsave(&guc->sched_engine->lock, flags); + sched_engine->tasklet.callback = guc_submission_tasklet; + wmb(); /* Make sure callback visible */ + if (!__tasklet_is_enabled(&sched_engine->tasklet) && + __tasklet_enable(&sched_engine->tasklet)) { + GEM_BUG_ON(!guc->ct.enabled); + + /* And kick in case we missed a new request submission. 
*/ + tasklet_hi_schedule(&sched_engine->tasklet); + } + spin_unlock_irqrestore(&guc->sched_engine->lock, flags); +} + +static void guc_flush_submissions(struct intel_guc *guc) +{ + struct i915_sched_engine * const sched_engine = guc->sched_engine; + unsigned long flags; + + spin_lock_irqsave(&sched_engine->lock, flags); + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +void intel_guc_submission_reset_prepare(struct intel_guc *guc) +{ + int i; + + if (unlikely(!guc_submission_initialized(guc))) { + /* Reset called during driver load? GuC not yet initialised! */ + return; + } + + intel_gt_park_heartbeats(guc_to_gt(guc)); + disable_submission(guc); + guc->interrupts.disable(guc); + + /* Flush IRQ handler */ + spin_lock_irq(&guc_to_gt(guc)->irq_lock); + spin_unlock_irq(&guc_to_gt(guc)->irq_lock); + + guc_flush_submissions(guc); /* - * Prevent request submission to the hardware until we have - * completed the reset in i915_gem_reset_finish(). If a request - * is completed by one engine, it may then queue a request - * to a second via its execlists->tasklet *just* as we are - * calling engine->init_hw() and also writing the ELSP. - * Turning off the execlists->tasklet until the reset is over - * prevents the race. + * Handle any outstanding G2Hs before reset. Call IRQ handler directly + * each pass as interrupt have been disabled. We always scrub for + * outstanding G2H as it is possible for outstanding_submission_g2h to + * be incremented after the context state update. 
*/ - __tasklet_disable_sync_once(&execlists->tasklet); + for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) { + intel_guc_to_host_event_handler(guc); +#define wait_for_reset(guc, wait_var) \ + intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20)) + do { + wait_for_reset(guc, &guc->outstanding_submission_g2h); + } while (!list_empty(&guc->ct.requests.incoming)); + } + scrub_guc_desc_for_outstanding_g2h(guc); +} + +static struct intel_engine_cs * +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling) +{ + struct intel_engine_cs *engine; + intel_engine_mask_t tmp, mask = ve->mask; + unsigned int num_siblings = 0; + + for_each_engine_masked(engine, ve->gt, mask, tmp) + if (num_siblings++ == sibling) + return engine; + + return NULL; +} + +static inline struct intel_engine_cs * +__context_to_physical_engine(struct intel_context *ce) +{ + struct intel_engine_cs *engine = ce->engine; + + if (intel_engine_is_virtual(engine)) + engine = guc_virtual_get_sibling(engine, 0); + + return engine; } -static void guc_reset_state(struct intel_context *ce, - struct intel_engine_cs *engine, - u32 head, - bool scrub) +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub) { + struct intel_engine_cs *engine = __context_to_physical_engine(ce); + + if (intel_context_is_banned(ce)) + return; + GEM_BUG_ON(!intel_context_is_pinned(ce)); /* @@ -313,37 +777,132 @@ static void guc_reset_state(struct intel_context *ce, lrc_update_regs(ce, engine, head); } -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled) +static void guc_reset_nop(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; - struct i915_request *rq; +} + +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled) +{ +} + +static void +__unwind_incomplete_requests(struct intel_context *ce) +{ + struct i915_request *rq, *rn; + struct list_head *pl; + int prio = 
I915_PRIORITY_INVALID; + struct i915_sched_engine * const sched_engine = + ce->engine->sched_engine; unsigned long flags; - spin_lock_irqsave(&engine->active.lock, flags); + spin_lock_irqsave(&sched_engine->lock, flags); + spin_lock(&ce->guc_active.lock); + list_for_each_entry_safe(rq, rn, + &ce->guc_active.requests, + sched.link) { + if (i915_request_completed(rq)) + continue; - /* Push back any incomplete requests for replay after the reset. */ - rq = execlists_unwind_incomplete_requests(execlists); - if (!rq) - goto out_unlock; + list_del_init(&rq->sched.link); + spin_unlock(&ce->guc_active.lock); + + __i915_request_unsubmit(rq); + + /* Push the request back into the queue for later resubmission. */ + GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); + if (rq_prio(rq) != prio) { + prio = rq_prio(rq); + pl = i915_sched_lookup_priolist(sched_engine, prio); + } + GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine)); + + list_add_tail(&rq->sched.link, pl); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + + spin_lock(&ce->guc_active.lock); + } + spin_unlock(&ce->guc_active.lock); + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +static void __guc_reset_context(struct intel_context *ce, bool stalled) +{ + struct i915_request *rq; + u32 head; + + intel_context_get(ce); + + /* + * GuC will implicitly mark the context as non-schedulable + * when it sends the reset notification. Make sure our state + * reflects this change. The context will be marked enabled + * on resubmission. 
+ */ + clr_context_enabled(ce); + + rq = intel_context_find_active_request(ce); + if (!rq) { + head = ce->ring->tail; + stalled = false; + goto out_replay; + } if (!i915_request_started(rq)) stalled = false; + GEM_BUG_ON(i915_active_is_idle(&ce->active)); + head = intel_ring_wrap(ce->ring, rq->head); __i915_request_reset(rq, stalled); - guc_reset_state(rq->context, engine, rq->head, stalled); -out_unlock: - spin_unlock_irqrestore(&engine->active.lock, flags); +out_replay: + guc_reset_state(ce, head, stalled); + __unwind_incomplete_requests(ce); + intel_context_put(ce); +} + +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled) +{ + struct intel_context *ce; + unsigned long index; + + if (unlikely(!guc_submission_initialized(guc))) { + /* Reset called during driver load? GuC not yet initialised! */ + return; + } + + xa_for_each(&guc->context_lookup, index, ce) + if (intel_context_is_pinned(ce)) + __guc_reset_context(ce, stalled); + + /* GuC is blown away, drop all references to contexts */ + xa_destroy(&guc->context_lookup); } -static void guc_reset_cancel(struct intel_engine_cs *engine) +static void guc_cancel_context_requests(struct intel_context *ce) +{ + struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine; + struct i915_request *rq; + unsigned long flags; + + /* Mark all executing requests as skipped. 
*/ + spin_lock_irqsave(&sched_engine->lock, flags); + spin_lock(&ce->guc_active.lock); + list_for_each_entry(rq, &ce->guc_active.requests, sched.link) + i915_request_put(i915_request_mark_eio(rq)); + spin_unlock(&ce->guc_active.lock); + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +static void +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_request *rq, *rn; struct rb_node *rb; unsigned long flags; - ENGINE_TRACE(engine, "\n"); + /* Can be called during boot if GuC fails to load */ + if (!sched_engine) + return; /* * Before we call engine->cancel_requests(), we should have exclusive @@ -359,47 +918,67 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) * submission's irq state, we also wish to remind ourselves that * it is irq state.) */ - spin_lock_irqsave(&engine->active.lock, flags); - - /* Mark all executing requests as skipped. */ - list_for_each_entry(rq, &engine->active.requests, sched.link) { - i915_request_set_error_once(rq, -EIO); - i915_request_mark_complete(rq); - } + spin_lock_irqsave(&sched_engine->lock, flags); /* Flush the queued requests to the timeline list (for retiring). 
*/ - while ((rb = rb_first_cached(&execlists->queue))) { + while ((rb = rb_first_cached(&sched_engine->queue))) { struct i915_priolist *p = to_priolist(rb); priolist_for_each_request_consume(rq, rn, p) { list_del_init(&rq->sched.link); + __i915_request_submit(rq); - dma_fence_set_error(&rq->fence, -EIO); - i915_request_mark_complete(rq); + + i915_request_put(i915_request_mark_eio(rq)); } - rb_erase_cached(&p->node, &execlists->queue); + rb_erase_cached(&p->node, &sched_engine->queue); i915_priolist_free(p); } /* Remaining _unready_ requests will be nop'ed when submitted */ - execlists->queue_priority_hint = INT_MIN; - execlists->queue = RB_ROOT_CACHED; + sched_engine->queue_priority_hint = INT_MIN; + sched_engine->queue = RB_ROOT_CACHED; - spin_unlock_irqrestore(&engine->active.lock, flags); + spin_unlock_irqrestore(&sched_engine->lock, flags); } -static void guc_reset_finish(struct intel_engine_cs *engine) +void intel_guc_submission_cancel_requests(struct intel_guc *guc) { - struct intel_engine_execlists * const execlists = &engine->execlists; + struct intel_context *ce; + unsigned long index; - if (__tasklet_enable(&execlists->tasklet)) - /* And kick in case we missed a new request submission. */ - tasklet_hi_schedule(&execlists->tasklet); + xa_for_each(&guc->context_lookup, index, ce) + if (intel_context_is_pinned(ce)) + guc_cancel_context_requests(ce); + + guc_cancel_sched_engine_requests(guc->sched_engine); - ENGINE_TRACE(engine, "depth->%d\n", - atomic_read(&execlists->tasklet.count)); + /* GuC is blown away, drop all references to contexts */ + xa_destroy(&guc->context_lookup); +} + +void intel_guc_submission_reset_finish(struct intel_guc *guc) +{ + /* Reset called during driver load or during wedge? */ + if (unlikely(!guc_submission_initialized(guc) || + test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags))) { + return; + } + + /* + * Technically possible for either of these values to be non-zero here, + * but very unlikely + harmless. 
Regardless let's add a warn so we can + * see in CI if this happens frequently / a precursor to taking down the + * machine. + */ + GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h)); + atomic_set(&guc->outstanding_submission_g2h, 0); + + intel_guc_global_policies_update(guc); + enable_submission(guc); + intel_gt_unpark_heartbeats(guc_to_gt(guc)); } /* @@ -410,43 +989,986 @@ int intel_guc_submission_init(struct intel_guc *guc) { int ret; - if (guc->stage_desc_pool) + if (guc->lrc_desc_pool) return 0; - ret = guc_stage_desc_pool_create(guc); + ret = guc_lrc_desc_pool_create(guc); if (ret) return ret; /* * Keep static analysers happy, let them know that we allocated the * vma after testing that it didn't exist earlier. */ - GEM_BUG_ON(!guc->stage_desc_pool); + GEM_BUG_ON(!guc->lrc_desc_pool); + + xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); + + spin_lock_init(&guc->contexts_lock); + INIT_LIST_HEAD(&guc->guc_id_list); + ida_init(&guc->guc_ids); return 0; } void intel_guc_submission_fini(struct intel_guc *guc) { - if (guc->stage_desc_pool) { - guc_stage_desc_pool_destroy(guc); + if (!guc->lrc_desc_pool) + return; + + guc_lrc_desc_pool_destroy(guc); + i915_sched_engine_put(guc->sched_engine); +} + +static inline void queue_request(struct i915_sched_engine *sched_engine, + struct i915_request *rq, + int prio) +{ + GEM_BUG_ON(!list_empty(&rq->sched.link)); + list_add_tail(&rq->sched.link, + i915_sched_lookup_priolist(sched_engine, prio)); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); +} + +static int guc_bypass_tasklet_submit(struct intel_guc *guc, + struct i915_request *rq) +{ + int ret; + + __i915_request_submit(rq); + + trace_i915_request_in(rq, 0); + + guc_set_lrc_tail(rq); + ret = guc_add_request(guc, rq); + if (ret == -EBUSY) + guc->stalled_request = rq; + + if (unlikely(ret == -EPIPE)) + disable_submission(guc); + + return ret; +} + +static void guc_submit_request(struct i915_request *rq) +{ + struct i915_sched_engine *sched_engine = 
rq->engine->sched_engine; + struct intel_guc *guc = &rq->engine->gt->uc.guc; + unsigned long flags; + + /* Will be called from irq-context when using foreign fences. */ + spin_lock_irqsave(&sched_engine->lock, flags); + + if (submission_disabled(guc) || guc->stalled_request || + !i915_sched_engine_is_empty(sched_engine)) + queue_request(sched_engine, rq, rq_prio(rq)); + else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY) + tasklet_hi_schedule(&sched_engine->tasklet); + + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +static int new_guc_id(struct intel_guc *guc) +{ + return ida_simple_get(&guc->guc_ids, 0, + GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL | + __GFP_RETRY_MAYFAIL | __GFP_NOWARN); +} + +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) +{ + if (!context_guc_id_invalid(ce)) { + ida_simple_remove(&guc->guc_ids, ce->guc_id); + reset_lrc_desc(guc, ce->guc_id); + set_context_guc_id_invalid(ce); } + if (!list_empty(&ce->guc_id_link)) + list_del_init(&ce->guc_id_link); } -static int guc_context_alloc(struct intel_context *ce) +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce) { - return lrc_alloc(ce, ce->engine); + unsigned long flags; + + spin_lock_irqsave(&guc->contexts_lock, flags); + __release_guc_id(guc, ce); + spin_unlock_irqrestore(&guc->contexts_lock, flags); +} + +static int steal_guc_id(struct intel_guc *guc) +{ + struct intel_context *ce; + int guc_id; + + lockdep_assert_held(&guc->contexts_lock); + + if (!list_empty(&guc->guc_id_list)) { + ce = list_first_entry(&guc->guc_id_list, + struct intel_context, + guc_id_link); + + GEM_BUG_ON(atomic_read(&ce->guc_id_ref)); + GEM_BUG_ON(context_guc_id_invalid(ce)); + + list_del_init(&ce->guc_id_link); + guc_id = ce->guc_id; + clr_context_registered(ce); + set_context_guc_id_invalid(ce); + return guc_id; + } else { + return -EAGAIN; + } +} + +static int assign_guc_id(struct intel_guc *guc, u16 *out) +{ + int ret; + + 
lockdep_assert_held(&guc->contexts_lock); + + ret = new_guc_id(guc); + if (unlikely(ret < 0)) { + ret = steal_guc_id(guc); + if (ret < 0) + return ret; + } + + *out = ret; + return 0; +} + +#define PIN_GUC_ID_TRIES 4 +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{ + int ret = 0; + unsigned long flags, tries = PIN_GUC_ID_TRIES; + + GEM_BUG_ON(atomic_read(&ce->guc_id_ref)); + +try_again: + spin_lock_irqsave(&guc->contexts_lock, flags); + + if (context_guc_id_invalid(ce)) { + ret = assign_guc_id(guc, &ce->guc_id); + if (ret) + goto out_unlock; + ret = 1; /* Indidcates newly assigned guc_id */ + } + if (!list_empty(&ce->guc_id_link)) + list_del_init(&ce->guc_id_link); + atomic_inc(&ce->guc_id_ref); + +out_unlock: + spin_unlock_irqrestore(&guc->contexts_lock, flags); + + /* + * -EAGAIN indicates no guc_ids are available, let's retire any + * outstanding requests to see if that frees up a guc_id. If the first + * retire didn't help, insert a sleep with the timeslice duration before + * attempting to retire more requests. Double the sleep period each + * subsequent pass before finally giving up. The sleep period has max of + * 100ms and minimum of 1ms. 
+ */ + if (ret == -EAGAIN && --tries) { + if (PIN_GUC_ID_TRIES - tries > 1) { + unsigned int timeslice_shifted = + ce->engine->props.timeslice_duration_ms << + (PIN_GUC_ID_TRIES - tries - 2); + unsigned int max = min_t(unsigned int, 100, + timeslice_shifted); + + msleep(max_t(unsigned int, max, 1)); + } + intel_gt_retire_requests(guc_to_gt(guc)); + goto try_again; + } + + return ret; +} + +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{ + unsigned long flags; + + GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0); + + if (unlikely(context_guc_id_invalid(ce))) + return; + + spin_lock_irqsave(&guc->contexts_lock, flags); + if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) && + !atomic_read(&ce->guc_id_ref)) + list_add_tail(&ce->guc_id_link, &guc->guc_id_list); + spin_unlock_irqrestore(&guc->contexts_lock, flags); +} + +static int __guc_action_register_context(struct intel_guc *guc, + u32 guc_id, + u32 offset, + bool loop) +{ + u32 action[] = { + INTEL_GUC_ACTION_REGISTER_CONTEXT, + guc_id, + offset, + }; + + return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), + 0, loop); +} + +static int register_context(struct intel_context *ce, bool loop) +{ + struct intel_guc *guc = ce_to_guc(ce); + u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + + ce->guc_id * sizeof(struct guc_lrc_desc); + int ret; + + trace_intel_context_register(ce); + + ret = __guc_action_register_context(guc, ce->guc_id, offset, loop); + if (likely(!ret)) + set_context_registered(ce); + + return ret; +} + +static int __guc_action_deregister_context(struct intel_guc *guc, + u32 guc_id, + bool loop) +{ + u32 action[] = { + INTEL_GUC_ACTION_DEREGISTER_CONTEXT, + guc_id, + }; + + return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), + G2H_LEN_DW_DEREGISTER_CONTEXT, + loop); +} + +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop) +{ + struct intel_guc *guc = ce_to_guc(ce); + + 
trace_intel_context_deregister(ce); + + return __guc_action_deregister_context(guc, guc_id, loop); +} + +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask) +{ + switch (class) { + case RENDER_CLASS: + return mask >> RCS0; + case VIDEO_ENHANCEMENT_CLASS: + return mask >> VECS0; + case VIDEO_DECODE_CLASS: + return mask >> VCS0; + case COPY_ENGINE_CLASS: + return mask >> BCS0; + default: + MISSING_CASE(class); + return 0; + } +} + +static void guc_context_policy_init(struct intel_engine_cs *engine, + struct guc_lrc_desc *desc) +{ + desc->policy_flags = 0; + + if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION) + desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE; + + /* NB: For both of these, zero means disabled. */ + desc->execution_quantum = engine->props.timeslice_duration_ms * 1000; + desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000; +} + +static inline u8 map_i915_prio_to_guc_prio(int prio); + +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) +{ + struct intel_engine_cs *engine = ce->engine; + struct intel_runtime_pm *runtime_pm = engine->uncore->rpm; + struct intel_guc *guc = &engine->gt->uc.guc; + u32 desc_idx = ce->guc_id; + struct guc_lrc_desc *desc; + const struct i915_gem_context *ctx; + int prio = I915_CONTEXT_DEFAULT_PRIORITY; + bool context_registered; + intel_wakeref_t wakeref; + int ret = 0; + + GEM_BUG_ON(!engine->mask); + + /* + * Ensure LRC + CT vmas are is same region as write barrier is done + * based on CT vma region. 
+ */ + GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) != + i915_gem_object_is_lmem(ce->ring->vma->obj)); + + context_registered = lrc_desc_registered(guc, desc_idx); + + rcu_read_lock(); + ctx = rcu_dereference(ce->gem_context); + if (ctx) + prio = ctx->sched.priority; + rcu_read_unlock(); + + reset_lrc_desc(guc, desc_idx); + set_lrc_desc_registered(guc, desc_idx, ce); + + desc = __get_lrc_desc(guc, desc_idx); + desc->engine_class = engine_class_to_guc_class(engine->class); + desc->engine_submit_mask = adjust_engine_mask(engine->class, + engine->mask); + desc->hw_context_desc = ce->lrc.lrca; + ce->guc_prio = map_i915_prio_to_guc_prio(prio); + desc->priority = ce->guc_prio; + desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD; + guc_context_policy_init(engine, desc); + init_sched_state(ce); + + /* + * The context_lookup xarray is used to determine if the hardware + * context is currently registered. There are two cases in which it + * could be registered either the guc_id has been stolen from another + * context or the lrc descriptor address of this context has changed. In + * either case the context needs to be deregistered with the GuC before + * registering this context. + */ + if (context_registered) { + trace_intel_context_steal_guc_id(ce); + if (!loop) { + set_context_wait_for_deregister_to_register(ce); + intel_context_get(ce); + } else { + bool disabled; + unsigned long flags; + + /* Seal race with Reset */ + spin_lock_irqsave(&ce->guc_state.lock, flags); + disabled = submission_disabled(guc); + if (likely(!disabled)) { + set_context_wait_for_deregister_to_register(ce); + intel_context_get(ce); + } + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + if (unlikely(disabled)) { + reset_lrc_desc(guc, desc_idx); + return 0; /* Will get registered later */ + } + } + + /* + * If stealing the guc_id, this ce has the same guc_id as the + * context whose guc_id was stolen. 
+ */ + with_intel_runtime_pm(runtime_pm, wakeref) + ret = deregister_context(ce, ce->guc_id, loop); + if (unlikely(ret == -EBUSY)) { + clr_context_wait_for_deregister_to_register(ce); + intel_context_put(ce); + } else if (unlikely(ret == -ENODEV)) { + ret = 0; /* Will get registered later */ + } + } else { + with_intel_runtime_pm(runtime_pm, wakeref) + ret = register_context(ce, loop); + if (unlikely(ret == -EBUSY)) + reset_lrc_desc(guc, desc_idx); + else if (unlikely(ret == -ENODEV)) + ret = 0; /* Will get registered later */ + } + + return ret; +} + +static int __guc_context_pre_pin(struct intel_context *ce, + struct intel_engine_cs *engine, + struct i915_gem_ww_ctx *ww, + void **vaddr) +{ + return lrc_pre_pin(ce, engine, ww, vaddr); +} + +static int __guc_context_pin(struct intel_context *ce, + struct intel_engine_cs *engine, + void *vaddr) +{ + if (i915_ggtt_offset(ce->state) != + (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK)) + set_bit(CONTEXT_LRCA_DIRTY, &ce->flags); + + /* + * GuC context gets pinned in guc_request_alloc. See that function for + * explaination of why. 
+ */ + + return lrc_pin(ce, engine, vaddr); } static int guc_context_pre_pin(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr) { - return lrc_pre_pin(ce, ce->engine, ww, vaddr); + return __guc_context_pre_pin(ce, ce->engine, ww, vaddr); } static int guc_context_pin(struct intel_context *ce, void *vaddr) { - return lrc_pin(ce, ce->engine, vaddr); + return __guc_context_pin(ce, ce->engine, vaddr); +} + +static void guc_context_unpin(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + + unpin_guc_id(guc, ce); + lrc_unpin(ce); +} + +static void guc_context_post_unpin(struct intel_context *ce) +{ + lrc_post_unpin(ce); +} + +static void __guc_context_sched_enable(struct intel_guc *guc, + struct intel_context *ce) +{ + u32 action[] = { + INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET, + ce->guc_id, + GUC_CONTEXT_ENABLE + }; + + trace_intel_context_sched_enable(ce); + + guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true); +} + +static void __guc_context_sched_disable(struct intel_guc *guc, + struct intel_context *ce, + u16 guc_id) +{ + u32 action[] = { + INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET, + guc_id, /* ce->guc_id not stable */ + GUC_CONTEXT_DISABLE + }; + + GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID); + + trace_intel_context_sched_disable(ce); + + guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true); +} + +static void guc_blocked_fence_complete(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + + if (!i915_sw_fence_done(&ce->guc_blocked)) + i915_sw_fence_complete(&ce->guc_blocked); +} + +static void guc_blocked_fence_reinit(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked)); + + /* + * This fence is always complete unless a pending schedule disable is + * outstanding. 
We arm the fence here and complete it when we receive + * the pending schedule disable complete message. + */ + i915_sw_fence_fini(&ce->guc_blocked); + i915_sw_fence_reinit(&ce->guc_blocked); + i915_sw_fence_await(&ce->guc_blocked); + i915_sw_fence_commit(&ce->guc_blocked); +} + +static u16 prep_context_pending_disable(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + + set_context_pending_disable(ce); + clr_context_enabled(ce); + guc_blocked_fence_reinit(ce); + intel_context_get(ce); + + return ce->guc_id; +} + +static struct i915_sw_fence *guc_context_block(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + struct i915_sched_engine *sched_engine = ce->engine->sched_engine; + unsigned long flags; + struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm; + intel_wakeref_t wakeref; + u16 guc_id; + bool enabled; + + spin_lock_irqsave(&ce->guc_state.lock, flags); + + /* + * Sync with submission path, increment before below changes to context + * state. + */ + spin_lock(&sched_engine->lock); + incr_context_blocked(ce); + spin_unlock(&sched_engine->lock); + + enabled = context_enabled(ce); + if (unlikely(!enabled || submission_disabled(guc))) { + if (enabled) + clr_context_enabled(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + return &ce->guc_blocked; + } + + /* + * We add +2 here as the schedule disable complete CTB handler calls + * intel_context_sched_disable_unpin (-2 to pin_count). 
+ */ + atomic_add(2, &ce->pin_count); + + guc_id = prep_context_pending_disable(ce); + + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + with_intel_runtime_pm(runtime_pm, wakeref) + __guc_context_sched_disable(guc, ce, guc_id); + + return &ce->guc_blocked; +} + +static void guc_context_unblock(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + struct i915_sched_engine *sched_engine = ce->engine->sched_engine; + unsigned long flags; + struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm; + intel_wakeref_t wakeref; + bool enable; + + GEM_BUG_ON(context_enabled(ce)); + + spin_lock_irqsave(&ce->guc_state.lock, flags); + + if (unlikely(submission_disabled(guc) || + !intel_context_is_pinned(ce) || + context_pending_disable(ce) || + context_blocked(ce) > 1)) { + enable = false; + } else { + enable = true; + set_context_pending_enable(ce); + set_context_enabled(ce); + intel_context_get(ce); + } + + /* + * Sync with submission path, decrement after above changes to context + * state. 
+ */ + spin_lock(&sched_engine->lock); + decr_context_blocked(ce); + spin_unlock(&sched_engine->lock); + + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + if (enable) { + with_intel_runtime_pm(runtime_pm, wakeref) + __guc_context_sched_enable(guc, ce); + } +} + +static void guc_context_cancel_request(struct intel_context *ce, + struct i915_request *rq) +{ + if (i915_sw_fence_signaled(&rq->submit)) { + struct i915_sw_fence *fence = guc_context_block(ce); + + i915_sw_fence_wait(fence); + if (!i915_request_completed(rq)) { + __i915_request_skip(rq); + guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head), + true); + } + guc_context_unblock(ce); + } +} + +static void __guc_context_set_preemption_timeout(struct intel_guc *guc, + u16 guc_id, + u32 preemption_timeout) +{ + u32 action[] = { + INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT, + guc_id, + preemption_timeout + }; + + intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true); +} + +static void guc_context_ban(struct intel_context *ce, struct i915_request *rq) +{ + struct intel_guc *guc = ce_to_guc(ce); + struct intel_runtime_pm *runtime_pm = + &ce->engine->gt->i915->runtime_pm; + intel_wakeref_t wakeref; + unsigned long flags; + + guc_flush_submissions(guc); + + spin_lock_irqsave(&ce->guc_state.lock, flags); + set_context_banned(ce); + + if (submission_disabled(guc) || + (!context_enabled(ce) && !context_pending_disable(ce))) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + guc_cancel_context_requests(ce); + intel_engine_signal_breadcrumbs(ce->engine); + } else if (!context_pending_disable(ce)) { + u16 guc_id; + + /* + * We add +2 here as the schedule disable complete CTB handler + * calls intel_context_sched_disable_unpin (-2 to pin_count). 
+ */ + atomic_add(2, &ce->pin_count); + + guc_id = prep_context_pending_disable(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + /* + * In addition to disabling scheduling, set the preemption + * timeout to the minimum value (1 us) so the banned context + * gets kicked off the HW ASAP. + */ + with_intel_runtime_pm(runtime_pm, wakeref) { + __guc_context_set_preemption_timeout(guc, guc_id, 1); + __guc_context_sched_disable(guc, ce, guc_id); + } + } else { + if (!context_guc_id_invalid(ce)) + with_intel_runtime_pm(runtime_pm, wakeref) + __guc_context_set_preemption_timeout(guc, + ce->guc_id, + 1); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + } +} + +static void guc_context_sched_disable(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + unsigned long flags; + struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; + intel_wakeref_t wakeref; + u16 guc_id; + bool enabled; + + if (submission_disabled(guc) || context_guc_id_invalid(ce) || + !lrc_desc_registered(guc, ce->guc_id)) { + clr_context_enabled(ce); + goto unpin; + } + + if (!context_enabled(ce)) + goto unpin; + + spin_lock_irqsave(&ce->guc_state.lock, flags); + + /* + * We have to check if the context has been disabled by another thread. + * We also have to check if the context has been pinned again as another + * pin operation is allowed to pass this function. Checking the pin + * count, within ce->guc_state.lock, synchronizes this function with + * guc_request_alloc ensuring a request doesn't slip through the + * 'context_pending_disable' fence. Checking within the spin lock (can't + * sleep) ensures another process doesn't pin this context and generate + * a request before we set the 'context_pending_disable' flag here. 
+ */ + enabled = context_enabled(ce); + if (unlikely(!enabled || submission_disabled(guc))) { + if (enabled) + clr_context_enabled(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + goto unpin; + } + if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + return; + } + guc_id = prep_context_pending_disable(ce); + + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + with_intel_runtime_pm(runtime_pm, wakeref) + __guc_context_sched_disable(guc, ce, guc_id); + + return; +unpin: + intel_context_sched_disable_unpin(ce); +} + +static inline void guc_lrc_desc_unpin(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + + GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id)); + GEM_BUG_ON(ce != __get_context(guc, ce->guc_id)); + GEM_BUG_ON(context_enabled(ce)); + + clr_context_registered(ce); + deregister_context(ce, ce->guc_id, true); +} + +static void __guc_context_destroy(struct intel_context *ce) +{ + GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] || + ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] || + ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] || + ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]); + + lrc_fini(ce); + intel_context_fini(ce); + + if (intel_engine_is_virtual(ce->engine)) { + struct guc_virtual_engine *ve = + container_of(ce, typeof(*ve), context); + + if (ve->base.breadcrumbs) + intel_breadcrumbs_put(ve->base.breadcrumbs); + + kfree(ve); + } else { + intel_context_free(ce); + } +} + +static void guc_context_destroy(struct kref *kref) +{ + struct intel_context *ce = container_of(kref, typeof(*ce), ref); + struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm; + struct intel_guc *guc = ce_to_guc(ce); + intel_wakeref_t wakeref; + unsigned long flags; + bool disabled; + + /* + * If the guc_id is invalid this context has been stolen and we can free + * it immediately. 
Also can be freed immediately if the context is not + * registered with the GuC or the GuC is in the middle of a reset. + */ + if (context_guc_id_invalid(ce)) { + __guc_context_destroy(ce); + return; + } else if (submission_disabled(guc) || + !lrc_desc_registered(guc, ce->guc_id)) { + release_guc_id(guc, ce); + __guc_context_destroy(ce); + return; + } + + /* + * We have to acquire the context spinlock and check guc_id again, if it + * is valid it hasn't been stolen and needs to be deregistered. We + * delete this context from the list of unpinned guc_ids available to + * steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB + * returns indicating this context has been deregistered the guc_id is + * returned to the pool of available guc_ids. + */ + spin_lock_irqsave(&guc->contexts_lock, flags); + if (context_guc_id_invalid(ce)) { + spin_unlock_irqrestore(&guc->contexts_lock, flags); + __guc_context_destroy(ce); + return; + } + + if (!list_empty(&ce->guc_id_link)) + list_del_init(&ce->guc_id_link); + spin_unlock_irqrestore(&guc->contexts_lock, flags); + + /* Seal race with Reset */ + spin_lock_irqsave(&ce->guc_state.lock, flags); + disabled = submission_disabled(guc); + if (likely(!disabled)) + set_context_destroyed(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + if (unlikely(disabled)) { + release_guc_id(guc, ce); + __guc_context_destroy(ce); + return; + } + + /* + * We defer GuC context deregistration until the context is destroyed + * in order to save on CTBs. With this optimization ideally we only need + * 1 CTB to register the context during the first pin and 1 CTB to + * deregister the context when the context is destroyed. Without this + * optimization, a CTB would be needed every pin & unpin. + * + * XXX: Need to acquire the runtime wakeref as this can be triggered + * from context_free_worker when runtime wakeref is not held.
+ * guc_lrc_desc_unpin requires the runtime as a GuC register is written + * in H2G CTB to deregister the context. A future patch may defer this + * H2G CTB if the runtime wakeref is zero. + */ + with_intel_runtime_pm(runtime_pm, wakeref) + guc_lrc_desc_unpin(ce); +} + +static int guc_context_alloc(struct intel_context *ce) +{ + return lrc_alloc(ce, ce->engine); +} + +static void guc_context_set_prio(struct intel_guc *guc, + struct intel_context *ce, + u8 prio) +{ + u32 action[] = { + INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY, + ce->guc_id, + prio, + }; + + GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH || + prio > GUC_CLIENT_PRIORITY_NORMAL); + + if (ce->guc_prio == prio || submission_disabled(guc) || + !context_registered(ce)) + return; + + guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true); + + ce->guc_prio = prio; + trace_intel_context_set_prio(ce); +} + +static inline u8 map_i915_prio_to_guc_prio(int prio) +{ + if (prio == I915_PRIORITY_NORMAL) + return GUC_CLIENT_PRIORITY_KMD_NORMAL; + else if (prio < I915_PRIORITY_NORMAL) + return GUC_CLIENT_PRIORITY_NORMAL; + else if (prio < I915_PRIORITY_DISPLAY) + return GUC_CLIENT_PRIORITY_HIGH; + else + return GUC_CLIENT_PRIORITY_KMD_HIGH; +} + +static inline void add_context_inflight_prio(struct intel_context *ce, + u8 guc_prio) +{ + lockdep_assert_held(&ce->guc_active.lock); + GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count)); + + ++ce->guc_prio_count[guc_prio]; + + /* Overflow protection */ + GEM_WARN_ON(!ce->guc_prio_count[guc_prio]); +} + +static inline void sub_context_inflight_prio(struct intel_context *ce, + u8 guc_prio) +{ + lockdep_assert_held(&ce->guc_active.lock); + GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count)); + + /* Underflow protection */ + GEM_WARN_ON(!ce->guc_prio_count[guc_prio]); + + --ce->guc_prio_count[guc_prio]; +} + +static inline void update_context_prio(struct intel_context *ce) +{ + struct intel_guc *guc = &ce->engine->gt->uc.guc; + int i; + + 
BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH != 0); + BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH > GUC_CLIENT_PRIORITY_NORMAL); + + lockdep_assert_held(&ce->guc_active.lock); + + for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) { + if (ce->guc_prio_count[i]) { + guc_context_set_prio(guc, ce, i); + break; + } + } +} + +static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio) +{ + /* Lower value is higher priority */ + return new_guc_prio < old_guc_prio; +} + +static void add_to_context(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq)); + + GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI); + + spin_lock(&ce->guc_active.lock); + list_move_tail(&rq->sched.link, &ce->guc_active.requests); + + if (rq->guc_prio == GUC_PRIO_INIT) { + rq->guc_prio = new_guc_prio; + add_context_inflight_prio(ce, rq->guc_prio); + } else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) { + sub_context_inflight_prio(ce, rq->guc_prio); + rq->guc_prio = new_guc_prio; + add_context_inflight_prio(ce, rq->guc_prio); + } + update_context_prio(ce); + + spin_unlock(&ce->guc_active.lock); +} + +static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_active.lock); + + if (rq->guc_prio != GUC_PRIO_INIT && + rq->guc_prio != GUC_PRIO_FINI) { + sub_context_inflight_prio(ce, rq->guc_prio); + update_context_prio(ce); + } + rq->guc_prio = GUC_PRIO_FINI; +} + +static void remove_from_context(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + + spin_lock_irq(&ce->guc_active.lock); + + list_del_init(&rq->sched.link); + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + + /* Prevent further __await_execution() registering a cb, then flush */ + set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); + + guc_prio_fini(rq, ce); + + spin_unlock_irq(&ce->guc_active.lock); + + atomic_dec(&ce->guc_id_ref); + i915_request_notify_execute_cb_imm(rq); } static 
const struct intel_context_ops guc_context_ops = { @@ -454,28 +1976,71 @@ static const struct intel_context_ops guc_context_ops = { .pre_pin = guc_context_pre_pin, .pin = guc_context_pin, - .unpin = lrc_unpin, - .post_unpin = lrc_post_unpin, + .unpin = guc_context_unpin, + .post_unpin = guc_context_post_unpin, + + .ban = guc_context_ban, + + .cancel_request = guc_context_cancel_request, .enter = intel_context_enter_engine, .exit = intel_context_exit_engine, + .sched_disable = guc_context_sched_disable, + .reset = lrc_reset, - .destroy = lrc_destroy, + .destroy = guc_context_destroy, + + .create_virtual = guc_create_virtual, }; -static int guc_request_alloc(struct i915_request *request) +static void __guc_signal_context_fence(struct intel_context *ce) { + struct i915_request *rq; + + lockdep_assert_held(&ce->guc_state.lock); + + if (!list_empty(&ce->guc_state.fences)) + trace_intel_context_fence_release(ce); + + list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) + i915_sw_fence_complete(&rq->submit); + + INIT_LIST_HEAD(&ce->guc_state.fences); +} + +static void guc_signal_context_fence(struct intel_context *ce) +{ + unsigned long flags; + + spin_lock_irqsave(&ce->guc_state.lock, flags); + clr_context_wait_for_deregister_to_register(ce); + __guc_signal_context_fence(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); +} + +static bool context_needs_register(struct intel_context *ce, bool new_guc_id) +{ + return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) || + !lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) && + !submission_disabled(ce_to_guc(ce)); +} + +static int guc_request_alloc(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + struct intel_guc *guc = ce_to_guc(ce); + unsigned long flags; int ret; - GEM_BUG_ON(!intel_context_is_pinned(request->context)); + GEM_BUG_ON(!intel_context_is_pinned(rq->context)); /* * Flush enough space to reduce the likelihood of waiting after * we start building the request - in 
which case we will just * have to repeat work. */ - request->reserved_space += GUC_REQUEST_SIZE; + rq->reserved_space += GUC_REQUEST_SIZE; /* * Note that after this point, we have committed to using @@ -486,40 +2051,232 @@ static int guc_request_alloc(struct i915_request *request) */ /* Unconditionally invalidate GPU caches and TLBs. */ - ret = request->engine->emit_flush(request, EMIT_INVALIDATE); + ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE); if (ret) return ret; - request->reserved_space -= GUC_REQUEST_SIZE; + rq->reserved_space -= GUC_REQUEST_SIZE; + + /* + * Call pin_guc_id here rather than in the pinning step as with + * dma_resv, contexts can be repeatedly pinned / unpinned thrashing the + * guc_ids and creating horrible race conditions. This is especially bad + * when guc_ids are being stolen due to oversubscription. By the time + * this function is reached, it is guaranteed that the guc_id will be + * persistent until the generated request is retired, which seals these + * race conditions. It is still safe to fail here if guc_ids are + * exhausted and return -EAGAIN to the user indicating that they can try + * again in the future. + * + * There is no need for a lock here as the timeline mutex ensures at + * most one context can be executing this code path at once. The + * guc_id_ref is incremented once for every request in flight and + * decremented on each retire. When it is zero, a lock around the + * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+ */ + if (atomic_add_unless(&ce->guc_id_ref, 1, 0)) + goto out; + + ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */ + if (unlikely(ret < 0)) + return ret; + if (context_needs_register(ce, !!ret)) { + ret = guc_lrc_desc_pin(ce, true); + if (unlikely(ret)) { /* unwind */ + if (ret == -EPIPE) { + disable_submission(guc); + goto out; /* GPU will be reset */ + } + atomic_dec(&ce->guc_id_ref); + unpin_guc_id(guc, ce); + return ret; + } + } + + clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags); + +out: + /* + * We block all requests on this context if a G2H is pending for a + * schedule disable or context deregistration as the GuC will fail a + * schedule enable or context registration if either G2H is pending + * respectively. Once a G2H returns, the fence is released that is + * blocking these requests (see guc_signal_context_fence). + * + * We can safely check the below fields outside of the lock as it isn't + * possible for these fields to transition from being clear to set but + * the converse is possible, hence the need for the check within the lock.
+ */ + if (likely(!context_wait_for_deregister_to_register(ce) && + !context_pending_disable(ce))) + return 0; + + spin_lock_irqsave(&ce->guc_state.lock, flags); + if (context_wait_for_deregister_to_register(ce) || + context_pending_disable(ce)) { + i915_sw_fence_await(&rq->submit); + + list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); + } + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + return 0; } -static inline void queue_request(struct intel_engine_cs *engine, - struct i915_request *rq, - int prio) +static int guc_virtual_context_pre_pin(struct intel_context *ce, + struct i915_gem_ww_ctx *ww, + void **vaddr) { - GEM_BUG_ON(!list_empty(&rq->sched.link)); - list_add_tail(&rq->sched.link, - i915_sched_lookup_priolist(engine, prio)); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0); + + return __guc_context_pre_pin(ce, engine, ww, vaddr); } -static void guc_submit_request(struct i915_request *rq) +static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr) { - struct intel_engine_cs *engine = rq->engine; - unsigned long flags; + struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0); - /* Will be called from irq-context when using foreign fences. 
*/ - spin_lock_irqsave(&engine->active.lock, flags); + return __guc_context_pin(ce, engine, vaddr); +} - queue_request(engine, rq, rq_prio(rq)); +static void guc_virtual_context_enter(struct intel_context *ce) +{ + intel_engine_mask_t tmp, mask = ce->engine->mask; + struct intel_engine_cs *engine; - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - GEM_BUG_ON(list_empty(&rq->sched.link)); + for_each_engine_masked(engine, ce->engine->gt, mask, tmp) + intel_engine_pm_get(engine); - tasklet_hi_schedule(&engine->execlists.tasklet); + intel_timeline_enter(ce->timeline); +} + +static void guc_virtual_context_exit(struct intel_context *ce) +{ + intel_engine_mask_t tmp, mask = ce->engine->mask; + struct intel_engine_cs *engine; - spin_unlock_irqrestore(&engine->active.lock, flags); + for_each_engine_masked(engine, ce->engine->gt, mask, tmp) + intel_engine_pm_put(engine); + + intel_timeline_exit(ce->timeline); +} + +static int guc_virtual_context_alloc(struct intel_context *ce) +{ + struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0); + + return lrc_alloc(ce, engine); +} + +static const struct intel_context_ops virtual_guc_context_ops = { + .alloc = guc_virtual_context_alloc, + + .pre_pin = guc_virtual_context_pre_pin, + .pin = guc_virtual_context_pin, + .unpin = guc_context_unpin, + .post_unpin = guc_context_post_unpin, + + .ban = guc_context_ban, + + .cancel_request = guc_context_cancel_request, + + .enter = guc_virtual_context_enter, + .exit = guc_virtual_context_exit, + + .sched_disable = guc_context_sched_disable, + + .destroy = guc_context_destroy, + + .get_sibling = guc_virtual_get_sibling, +}; + +static bool +guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b) +{ + struct intel_engine_cs *sibling; + intel_engine_mask_t tmp, mask = b->engine_mask; + bool result = false; + + for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp) + result |= intel_engine_irq_enable(sibling); + + return result; +} + +static void 
+guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b) +{ + struct intel_engine_cs *sibling; + intel_engine_mask_t tmp, mask = b->engine_mask; + + for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp) + intel_engine_irq_disable(sibling); +} + +static void guc_init_breadcrumbs(struct intel_engine_cs *engine) +{ + int i; + + /* + * In GuC submission mode we do not know which physical engine a request + * will be scheduled on, this creates a problem because the breadcrumb + * interrupt is per physical engine. To work around this we attach + * requests and direct all breadcrumb interrupts to the first instance + * of an engine per class. In addition all breadcrumb interrupts are + * enabled / disabled across an engine class in unison. + */ + for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) { + struct intel_engine_cs *sibling = + engine->gt->engine_class[engine->class][i]; + + if (sibling) { + if (engine->breadcrumbs != sibling->breadcrumbs) { + intel_breadcrumbs_put(engine->breadcrumbs); + engine->breadcrumbs = + intel_breadcrumbs_get(sibling->breadcrumbs); + } + break; + } + } + + if (engine->breadcrumbs) { + engine->breadcrumbs->engine_mask |= engine->mask; + engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs; + engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs; + } +} + +static void guc_bump_inflight_request_prio(struct i915_request *rq, + int prio) +{ + struct intel_context *ce = rq->context; + u8 new_guc_prio = map_i915_prio_to_guc_prio(prio); + + /* Short circuit function */ + if (prio < I915_PRIORITY_NORMAL || + rq->guc_prio == GUC_PRIO_FINI || + (rq->guc_prio != GUC_PRIO_INIT && + !new_guc_prio_higher(rq->guc_prio, new_guc_prio))) + return; + + spin_lock(&ce->guc_active.lock); + if (rq->guc_prio != GUC_PRIO_FINI) { + if (rq->guc_prio != GUC_PRIO_INIT) + sub_context_inflight_prio(ce, rq->guc_prio); + rq->guc_prio = new_guc_prio; + add_context_inflight_prio(ce, rq->guc_prio); + update_context_prio(ce); + } + 
spin_unlock(&ce->guc_active.lock); +} + +static void guc_retire_inflight_request_prio(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + + spin_lock(&ce->guc_active.lock); + guc_prio_fini(rq, ce); + spin_unlock(&ce->guc_active.lock); } static void sanitize_hwsp(struct intel_engine_cs *engine) @@ -588,21 +2345,68 @@ static int guc_resume(struct intel_engine_cs *engine) return 0; } +static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine) +{ + return !sched_engine->tasklet.callback; +} + static void guc_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = guc_submit_request; } +static inline void guc_kernel_context_pin(struct intel_guc *guc, + struct intel_context *ce) +{ + if (context_guc_id_invalid(ce)) + pin_guc_id(guc, ce); + guc_lrc_desc_pin(ce, true); +} + +static inline void guc_init_lrc_mapping(struct intel_guc *guc) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct intel_engine_cs *engine; + enum intel_engine_id id; + + /* make sure all descriptors are clean... */ + xa_destroy(&guc->context_lookup); + + /* + * Some contexts might have been pinned before we enabled GuC + * submission, so we need to add them to the GuC bookkeeping. + * Also, after a reset of the GuC we want to make sure that the + * information shared with GuC is properly reset. The kernel LRCs are + * not attached to the gem_context, so they need to be added separately. + * + * Note: we purposefully do not check the return of guc_lrc_desc_pin, + * because that function can only fail if a reset is just starting. This + * is at the end of reset so presumably another reset isn't happening + * and even if it did this code would be run again.
+ */ + + for_each_engine(engine, gt, id) + if (engine->kernel_context) + guc_kernel_context_pin(guc, engine->kernel_context); +} + static void guc_release(struct intel_engine_cs *engine) { engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ - tasklet_kill(&engine->execlists.tasklet); - intel_engine_cleanup_common(engine); lrc_fini_wa_ctx(engine); } +static void virtual_guc_bump_serial(struct intel_engine_cs *engine) +{ + struct intel_engine_cs *e; + intel_engine_mask_t tmp, mask = engine->mask; + + for_each_engine_masked(e, engine->gt, mask, tmp) + e->serial++; +} + static void guc_default_vfuncs(struct intel_engine_cs *engine) { /* Default vfuncs which can be overridden by each engine. */ @@ -611,13 +2415,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc; + engine->add_active_request = add_to_context; + engine->remove_active_request = remove_from_context; - engine->schedule = i915_schedule; + engine->sched_engine->schedule = i915_schedule; - engine->reset.prepare = guc_reset_prepare; - engine->reset.rewind = guc_reset_rewind; - engine->reset.cancel = guc_reset_cancel; - engine->reset.finish = guc_reset_finish; + engine->reset.prepare = guc_reset_nop; + engine->reset.rewind = guc_rewind_nop; + engine->reset.cancel = guc_reset_nop; + engine->reset.finish = guc_reset_nop; engine->emit_flush = gen8_emit_flush_xcs; engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; @@ -629,13 +2435,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->set_default_submission = guc_set_default_submission; engine->flags |= I915_ENGINE_HAS_PREEMPTION; + engine->flags |= I915_ENGINE_HAS_TIMESLICES; /* * TODO: GuC supports timeslicing and semaphores as well, but they're * handled by the firmware so some minor tweaks are required before * enabling. 
* - * engine->flags |= I915_ENGINE_HAS_TIMESLICES; * engine->flags |= I915_ENGINE_HAS_SEMAPHORES; */ @@ -666,9 +2472,21 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine) intel_engine_set_irq_handler(engine, cs_irq_handler); } +static void guc_sched_engine_destroy(struct kref *kref) +{ + struct i915_sched_engine *sched_engine = + container_of(kref, typeof(*sched_engine), ref); + struct intel_guc *guc = sched_engine->private_data; + + guc->sched_engine = NULL; + tasklet_kill(&sched_engine->tasklet); /* flush the callback */ + kfree(sched_engine); +} + int intel_guc_submission_setup(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915; + struct intel_guc *guc = &engine->gt->uc.guc; /* * The setup relies on several assumptions (e.g. irqs always enabled) @@ -676,10 +2494,28 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) */ GEM_BUG_ON(GRAPHICS_VER(i915) < 11); - tasklet_setup(&engine->execlists.tasklet, guc_submission_tasklet); + if (!guc->sched_engine) { + guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL); + if (!guc->sched_engine) + return -ENOMEM; + + guc->sched_engine->schedule = i915_schedule; + guc->sched_engine->disabled = guc_sched_engine_disabled; + guc->sched_engine->private_data = guc; + guc->sched_engine->destroy = guc_sched_engine_destroy; + guc->sched_engine->bump_inflight_request_prio = + guc_bump_inflight_request_prio; + guc->sched_engine->retire_inflight_request_prio = + guc_retire_inflight_request_prio; + tasklet_setup(&guc->sched_engine->tasklet, + guc_submission_tasklet); + } + i915_sched_engine_put(engine->sched_engine); + engine->sched_engine = i915_sched_engine_get(guc->sched_engine); guc_default_vfuncs(engine); guc_default_irqs(engine); + guc_init_breadcrumbs(engine); if (engine->class == RENDER_CLASS) rcs_submission_override(engine); @@ -695,18 +2531,19 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) void intel_guc_submission_enable(struct intel_guc 
*guc) { - guc_stage_desc_init(guc); + guc_init_lrc_mapping(guc); } void intel_guc_submission_disable(struct intel_guc *guc) { - struct intel_gt *gt = guc_to_gt(guc); - - GEM_BUG_ON(gt->awake); /* GT should be parked first */ - /* Note: By the time we're here, GuC may have already been reset */ +} - guc_stage_desc_fini(guc); +static bool __guc_submission_supported(struct intel_guc *guc) +{ + /* GuC submission is unavailable for pre-Gen11 */ + return intel_guc_is_supported(guc) && + GRAPHICS_VER(guc_to_gt(guc)->i915) >= 11; } static bool __guc_submission_selected(struct intel_guc *guc) @@ -721,5 +2558,481 @@ static bool __guc_submission_selected(struct intel_guc *guc) void intel_guc_submission_init_early(struct intel_guc *guc) { + guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); } + +static inline struct intel_context * +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) +{ + struct intel_context *ce; + + if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) { + drm_err(&guc_to_gt(guc)->i915->drm, + "Invalid desc_idx %u", desc_idx); + return NULL; + } + + ce = __get_context(guc, desc_idx); + if (unlikely(!ce)) { + drm_err(&guc_to_gt(guc)->i915->drm, + "Context is NULL, desc_idx %u", desc_idx); + return NULL; + } + + return ce; +} + +static void decr_outstanding_submission_g2h(struct intel_guc *guc) +{ + if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) + wake_up_all(&guc->ct.wq); +} + +int intel_guc_deregister_done_process_msg(struct intel_guc *guc, + const u32 *msg, + u32 len) +{ + struct intel_context *ce; + u32 desc_idx = msg[0]; + + if (unlikely(len < 1)) { + drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + return -EPROTO; + } + + ce = g2h_context_lookup(guc, desc_idx); + if (unlikely(!ce)) + return -EPROTO; + + trace_intel_context_deregister_done(ce); + + if (context_wait_for_deregister_to_register(ce)) { + struct intel_runtime_pm *runtime_pm = + 
&ce->engine->gt->i915->runtime_pm; + intel_wakeref_t wakeref; + + /* + * Previous owner of this guc_id has been deregistered, now safe to + * register this context. + */ + with_intel_runtime_pm(runtime_pm, wakeref) + register_context(ce, true); + guc_signal_context_fence(ce); + intel_context_put(ce); + } else if (context_destroyed(ce)) { + /* Context has been destroyed */ + release_guc_id(guc, ce); + __guc_context_destroy(ce); + } + + decr_outstanding_submission_g2h(guc); + + return 0; +} + +int intel_guc_sched_done_process_msg(struct intel_guc *guc, + const u32 *msg, + u32 len) +{ + struct intel_context *ce; + unsigned long flags; + u32 desc_idx = msg[0]; + + if (unlikely(len < 2)) { + drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + return -EPROTO; + } + + ce = g2h_context_lookup(guc, desc_idx); + if (unlikely(!ce)) + return -EPROTO; + + if (unlikely(context_destroyed(ce) || + (!context_pending_enable(ce) && + !context_pending_disable(ce)))) { + drm_err(&guc_to_gt(guc)->i915->drm, + "Bad context sched_state 0x%x, 0x%x, desc_idx %u", + atomic_read(&ce->guc_sched_state_no_lock), + ce->guc_state.sched_state, desc_idx); + return -EPROTO; + } + + trace_intel_context_sched_done(ce); + + if (context_pending_enable(ce)) { + clr_context_pending_enable(ce); + } else if (context_pending_disable(ce)) { + bool banned; + + /* + * Unpin must be done before __guc_signal_context_fence, + * otherwise a race exists between the requests getting + * submitted + retired before this unpin completes resulting in + * the pin_count going to zero and the context still being + * enabled.
+		 */
+		intel_context_sched_disable_unpin(ce);
+
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		banned = context_banned(ce);
+		clr_context_banned(ce);
+		clr_context_pending_disable(ce);
+		__guc_signal_context_fence(ce);
+		guc_blocked_fence_complete(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		if (banned) {
+			guc_cancel_context_requests(ce);
+			intel_engine_signal_breadcrumbs(ce->engine);
+		}
+	}
+
+	decr_outstanding_submission_g2h(guc);
+	intel_context_put(ce);
+
+	return 0;
+}
+
+static void capture_error_state(struct intel_guc *guc,
+				struct intel_context *ce)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+	intel_wakeref_t wakeref;
+
+	intel_engine_set_hung_context(engine, ce);
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
+		i915_capture_error_state(gt, engine->mask);
+	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
+}
+
+static void guc_context_replay(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+
+	__guc_reset_context(ce, true);
+	tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
+static void guc_handle_context_reset(struct intel_guc *guc,
+				     struct intel_context *ce)
+{
+	trace_intel_context_reset(ce);
+
+	if (likely(!intel_context_is_banned(ce))) {
+		capture_error_state(guc, ce);
+		guc_context_replay(ce);
+	}
+}
+
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+					const u32 *msg, u32 len)
+{
+	struct intel_context *ce;
+	int desc_idx;
+
+	if (unlikely(len != 1)) {
+		drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	desc_idx = msg[0];
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	guc_handle_context_reset(guc, ce);
+
+	return 0;
+}
+
+static struct intel_engine_cs *
+guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	u8 engine_class = guc_class_to_engine_class(guc_class);
+
+	/* Class index is checked in class converter */
+	GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);
+
+	return gt->engine_class[engine_class][instance];
+}
+
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+					 const u32 *msg, u32 len)
+{
+	struct intel_engine_cs *engine;
+	u8 guc_class, instance;
+	u32 reason;
+
+	if (unlikely(len != 3)) {
+		drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	guc_class = msg[0];
+	instance = msg[1];
+	reason = msg[2];
+
+	engine = guc_lookup_engine(guc, guc_class, instance);
+	if (unlikely(!engine)) {
+		drm_err(&guc_to_gt(guc)->i915->drm,
+			"Invalid engine %d:%d", guc_class, instance);
+		return -EPROTO;
+	}
+
+	intel_gt_handle_error(guc_to_gt(guc), engine->mask,
+			      I915_ERROR_CAPTURE,
+			      "GuC failed to reset %s (reason=0x%08x)\n",
+			      engine->name, reason);
+
+	return 0;
+}
+
+void intel_guc_find_hung_context(struct intel_engine_cs *engine)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	struct i915_request *rq;
+	unsigned long index;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		list_for_each_entry(rq, &ce->guc_active.requests, sched.link) {
+			if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
+				continue;
+
+			intel_engine_set_hung_context(engine, ce);
+
+			/* Can only cope with one hang at a time... */
+			return;
+		}
+	}
+}
+
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	unsigned long index;
+	unsigned long flags;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		spin_lock_irqsave(&ce->guc_active.lock, flags);
+		intel_engine_dump_active_requests(&ce->guc_active.requests,
+						  hung_rq, m);
+		spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+	}
+}
+
+void intel_guc_submission_print_info(struct intel_guc *guc,
+				     struct drm_printer *p)
+{
+	struct i915_sched_engine *sched_engine = guc->sched_engine;
+	struct rb_node *rb;
+	unsigned long flags;
+
+	if (!sched_engine)
+		return;
+
+	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC tasklet count: %u\n\n",
+		   atomic_read(&sched_engine->tasklet.count));
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	drm_printf(p, "Requests in GuC submit tasklet:\n");
+	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
+		struct i915_priolist *pl = to_priolist(rb);
+		struct i915_request *rq;
+
+		priolist_for_each_request(rq, pl)
+			drm_printf(p, "guc_id=%u, seqno=%llu\n",
+				   rq->context->guc_id,
+				   rq->fence.seqno);
+	}
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	drm_printf(p, "\n");
+}
+
+static inline void guc_log_context_priority(struct drm_printer *p,
+					    struct intel_context *ce)
+{
+	int i;
+
+	drm_printf(p, "\t\tPriority: %d\n",
+		   ce->guc_prio);
+	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
+	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
+	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
+		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
+			   i, ce->guc_prio_count[i]);
+	}
+	drm_printf(p, "\n");
+}
+
+void intel_guc_submission_print_context_info(struct intel_guc *guc,
+					     struct drm_printer *p)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
+		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+			   ce->ring->head,
+			   ce->lrc_reg_state[CTX_RING_HEAD]);
+		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+			   ce->ring->tail,
+			   ce->lrc_reg_state[CTX_RING_TAIL]);
+		drm_printf(p, "\t\tContext Pin Count: %u\n",
+			   atomic_read(&ce->pin_count));
+		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+			   atomic_read(&ce->guc_id_ref));
+		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
+			   ce->guc_state.sched_state,
+			   atomic_read(&ce->guc_sched_state_no_lock));
+
+		guc_log_context_priority(p, ce);
+	}
+}
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
+{
+	struct guc_virtual_engine *ve;
+	struct intel_guc *guc;
+	unsigned int n;
+	int err;
+
+	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	guc = &siblings[0]->gt->uc.guc;
+
+	ve->base.i915 = siblings[0]->i915;
+	ve->base.gt = siblings[0]->gt;
+	ve->base.uncore = siblings[0]->uncore;
+	ve->base.id = -1;
+
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.saturated = ALL_ENGINES;
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
+
+	ve->base.cops = &virtual_guc_context_ops;
+	ve->base.request_alloc = guc_request_alloc;
+	ve->base.bump_serial = virtual_guc_bump_serial;
+
+	ve->base.submit_request = guc_submit_request;
+
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	intel_context_init(&ve->context, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask) {
+			DRM_DEBUG("duplicate %s entry in load balancer\n",
+				  sibling->name);
+			err = -EINVAL;
+			goto err_put;
+		}
+
+		ve->base.mask |= sibling->mask;
+
+		if (n != 0 && ve->base.class != sibling->class) {
+			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
+				  sibling->class, ve->base.class);
+			err = -EINVAL;
+			goto err_put;
+		} else if (n == 0) {
+			ve->base.class = sibling->class;
+			ve->base.uabi_class = sibling->uabi_class;
+			snprintf(ve->base.name, sizeof(ve->base.name),
+				 "v%dx%d", ve->base.class, count);
+			ve->base.context_size = sibling->context_size;
+
+			ve->base.add_active_request =
+				sibling->add_active_request;
+			ve->base.remove_active_request =
+				sibling->remove_active_request;
+			ve->base.emit_bb_start = sibling->emit_bb_start;
+			ve->base.emit_flush = sibling->emit_flush;
+			ve->base.emit_init_breadcrumb =
+				sibling->emit_init_breadcrumb;
+			ve->base.emit_fini_breadcrumb =
+				sibling->emit_fini_breadcrumb;
+			ve->base.emit_fini_breadcrumb_dw =
+				sibling->emit_fini_breadcrumb_dw;
+			ve->base.breadcrumbs =
+				intel_breadcrumbs_get(sibling->breadcrumbs);
+
+			ve->base.flags |= sibling->flags;
+
+			ve->base.props.timeslice_duration_ms =
+				sibling->props.timeslice_duration_ms;
+			ve->base.props.preempt_timeout_ms =
+				sibling->props.preempt_timeout_ms;
+		}
+	}
+
+	return &ve->context;
+
+err_put:
+	intel_context_put(&ve->context);
+	return ERR_PTR(err);
+}
+
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (READ_ONCE(engine->props.heartbeat_interval_ms))
+			return true;
+
+	return false;
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 3f7005018939..c7ef44fa0c36 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -10,6 +10,7 @@

 #include "intel_guc.h"

+struct drm_printer;
 struct intel_engine_cs;

 void intel_guc_submission_init_early(struct intel_guc *guc);
@@ -20,11 +21,24 @@ void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
 void intel_guc_preempt_work_destroy(struct intel_guc *guc);
 int intel_guc_submission_setup(struct intel_engine_cs *engine);
+void intel_guc_submission_print_info(struct intel_guc *guc,
+				     struct drm_printer *p);
+void intel_guc_submission_print_context_info(struct intel_guc *guc,
+					     struct drm_printer *p);
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m);
+
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
+
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
+				   atomic_t *wait_var,
+				   bool interruptible,
+				   long timeout);

 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }

 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..b104fb7607eb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,14 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}

-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
 }

 /* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -69,16 +75,18 @@ static void __confirm_options(struct intel_uc *uc)
 	struct drm_i915_private *i915 = uc_to_gt(uc)->i915;

 	drm_dbg(&i915->drm,
-		"enable_guc=%d (guc:%s submission:%s huc:%s)\n",
+		"enable_guc=%d (guc:%s submission:%s huc:%s slpc:%s)\n",
 		i915->params.enable_guc,
 		yesno(intel_uc_wants_guc(uc)),
 		yesno(intel_uc_wants_guc_submission(uc)),
-		yesno(intel_uc_wants_huc(uc)));
+		yesno(intel_uc_wants_huc(uc)),
+		yesno(intel_uc_wants_guc_slpc(uc)));

 	if (i915->params.enable_guc == 0) {
 		GEM_BUG_ON(intel_uc_wants_guc(uc));
 		GEM_BUG_ON(intel_uc_wants_guc_submission(uc));
 		GEM_BUG_ON(intel_uc_wants_huc(uc));
+		GEM_BUG_ON(intel_uc_wants_guc_slpc(uc));
 		return;
 	}

@@ -120,6 +128,11 @@ void intel_uc_init_early(struct intel_uc *uc)
 		uc->ops = &uc_ops_off;
 }

+void intel_uc_init_late(struct intel_uc *uc)
+{
+	intel_guc_init_late(&uc->guc);
+}
+
 void intel_uc_driver_late_release(struct intel_uc *uc)
 {
 }
@@ -207,21 +220,6 @@ static void guc_handle_mmio_msg(struct intel_guc *guc)
 	spin_unlock_irq(&guc->irq_lock);
 }

-static void guc_reset_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.reset(guc);
-}
-
-static void guc_enable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.enable(guc);
-}
-
-static void guc_disable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.disable(guc);
-}
-
 static int guc_enable_communication(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
@@ -242,7 +240,7 @@ static int guc_enable_communication(struct intel_guc *guc)
 	guc_get_mmio_msg(guc);
 	guc_handle_mmio_msg(guc);

-	guc_enable_interrupts(guc);
+	intel_guc_enable_interrupts(guc);

 	/* check for CT messages received before we enabled interrupts */
 	spin_lock_irq(&gt->irq_lock);
@@ -265,7 +263,7 @@ static void guc_disable_communication(struct intel_guc *guc)
 	 */
 	guc_clear_mmio_msg(guc);

-	guc_disable_interrupts(guc);
+	intel_guc_disable_interrupts(guc);

 	intel_guc_ct_disable(&guc->ct);

@@ -323,9 +321,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;

-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
@@ -463,7 +458,7 @@ static int __uc_init_hw(struct intel_uc *uc)
 	if (ret)
 		goto err_out;

-	guc_reset_interrupts(guc);
+	intel_guc_reset_interrupts(guc);

 	/* WaEnableuKernelHeaderValidFix:skl */
 	/* WaEnableGuCBootHashCheckNotSet:skl,bxt,kbl */
@@ -505,12 +500,21 @@ static int __uc_init_hw(struct intel_uc *uc)
 	if (intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_enable(guc);

+	if (intel_uc_uses_guc_slpc(uc)) {
+		ret = intel_guc_slpc_enable(&guc->slpc);
+		if (ret)
+			goto err_submission;
+	}
+
 	drm_info(&i915->drm, "%s firmware %s version %u.%u %s:%s\n",
 		 intel_uc_fw_type_repr(INTEL_UC_FW_TYPE_GUC), guc->fw.path,
 		 guc->fw.major_ver_found, guc->fw.minor_ver_found,
 		 "submission",
 		 enableddisabled(intel_uc_uses_guc_submission(uc)));

+	drm_info(&i915->drm, "GuC SLPC: %s\n",
+		 enableddisabled(intel_uc_uses_guc_slpc(uc)));
+
 	if (intel_uc_uses_huc(uc)) {
 		drm_info(&i915->drm, "%s firmware %s version %u.%u %s:%s\n",
 			 intel_uc_fw_type_repr(INTEL_UC_FW_TYPE_HUC),
@@ -525,6 +529,8 @@ static int __uc_init_hw(struct intel_uc *uc)
 	/*
 	 * We've failed to load the firmware :(
 	 */
+err_submission:
+	intel_guc_submission_disable(guc);
 err_log_capture:
 	__uc_capture_load_err_log(uc);
 err_out:
@@ -565,23 +571,67 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;

-	if (!intel_guc_is_ready(guc))
+	uc->reset_in_progress = true;
+
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
 		return;

+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
+
+sanitize:
 	__uc_sanitize(uc);
 }

+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	uc->reset_in_progress = false;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
-	int err;

 	if (!intel_guc_is_ready(guc))
 		return;

-	err = intel_guc_suspend(guc);
-	if (err)
-		DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	/*
+	 * Wait for any outstanding CTB before tearing down communication /w the
+	 * GuC.
+	 */
+#define OUTSTANDING_CTB_TIMEOUT_PERIOD	(HZ / 5)
+	intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+				       false, OUTSTANDING_CTB_TIMEOUT_PERIOD);
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));

 	guc_disable_communication(guc);
 }
@@ -590,17 +640,22 @@ void intel_uc_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 	intel_wakeref_t wakeref;
+	int err;

 	if (!intel_guc_is_ready(guc))
 		return;

-	with_intel_runtime_pm(uc_to_gt(uc)->uncore->rpm, wakeref)
-		intel_uc_runtime_suspend(uc);
+	with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
+		err = intel_guc_suspend(guc);
+		if (err)
+			DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	}
 }

 static int __uc_resume(struct intel_uc *uc, bool enable_communication)
 {
 	struct intel_guc *guc = &uc->guc;
+	struct intel_gt *gt = guc_to_gt(guc);
 	int err;

 	if (!intel_guc_is_fw_running(guc))
@@ -612,6 +667,13 @@ static int __uc_resume(struct intel_uc *uc, bool enable_communication)
 	if (enable_communication)
 		guc_enable_communication(guc);

+	/* If we are only resuming GuC communication but not reloading
+	 * GuC, we need to ensure the ARAT timer interrupt is enabled
+	 * again. In case of GuC reload, it is enabled during SLPC enable.
+	 */
+	if (enable_communication && intel_uc_uses_guc_slpc(uc))
+		intel_guc_pm_intrmsk_enable(gt);
+
 	err = intel_guc_resume(guc);
 	if (err) {
 		DRM_DEBUG_DRIVER("Failed to resume GuC, err=%d", err);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 9c954c589edf..866b462821c0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -7,7 +7,9 @@
 #define _INTEL_UC_H_

 #include "intel_guc.h"
+#include "intel_guc_rc.h"
 #include "intel_guc_submission.h"
+#include "intel_guc_slpc.h"
 #include "intel_huc.h"
 #include "i915_params.h"

@@ -30,13 +32,19 @@ struct intel_uc {

 	/* Snapshot of GuC log from last failed load */
 	struct drm_i915_gem_object *load_err_log;
+
+	bool reset_in_progress;
 };

 void intel_uc_init_early(struct intel_uc *uc);
+void intel_uc_init_late(struct intel_uc *uc);
 void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
@@ -77,10 +85,17 @@ __uc_state_checker(x, func, uses, used)
 uc_state_checkers(guc, guc);
 uc_state_checkers(huc, huc);
 uc_state_checkers(guc, guc_submission);
+uc_state_checkers(guc, guc_slpc);
+uc_state_checkers(guc, guc_rc);

 #undef uc_state_checkers
 #undef __uc_state_checker

+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index df647c9a8d56..3a16d08608a5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -48,19 +48,20 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
  * firmware as TGL.
  */
 #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \
-	fw_def(ALDERLAKE_S, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl, 7, 5, 0)) \
-	fw_def(ROCKETLAKE, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl, 7, 5, 0)) \
-	fw_def(TIGERLAKE, 0, guc_def(tgl, 49, 0, 1), huc_def(tgl, 7, 5, 0)) \
-	fw_def(JASPERLAKE, 0, guc_def(ehl, 49, 0, 1), huc_def(ehl, 9, 0, 0)) \
-	fw_def(ELKHARTLAKE, 0, guc_def(ehl, 49, 0, 1), huc_def(ehl, 9, 0, 0)) \
-	fw_def(ICELAKE, 0, guc_def(icl, 49, 0, 1), huc_def(icl, 9, 0, 0)) \
-	fw_def(COMETLAKE, 5, guc_def(cml, 49, 0, 1), huc_def(cml, 4, 0, 0)) \
-	fw_def(COMETLAKE, 0, guc_def(kbl, 49, 0, 1), huc_def(kbl, 4, 0, 0)) \
-	fw_def(COFFEELAKE, 0, guc_def(kbl, 49, 0, 1), huc_def(kbl, 4, 0, 0)) \
-	fw_def(GEMINILAKE, 0, guc_def(glk, 49, 0, 1), huc_def(glk, 4, 0, 0)) \
-	fw_def(KABYLAKE, 0, guc_def(kbl, 49, 0, 1), huc_def(kbl, 4, 0, 0)) \
-	fw_def(BROXTON, 0, guc_def(bxt, 49, 0, 1), huc_def(bxt, 2, 0, 0)) \
-	fw_def(SKYLAKE, 0, guc_def(skl, 49, 0, 1), huc_def(skl, 2, 0, 0))
+	fw_def(ALDERLAKE_P, 0, guc_def(adlp, 62, 0, 3), huc_def(tgl, 7, 9, 3)) \
+	fw_def(ALDERLAKE_S, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \
+	fw_def(ROCKETLAKE, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \
+	fw_def(TIGERLAKE, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \
+	fw_def(JASPERLAKE, 0, guc_def(ehl, 62, 0, 0), huc_def(ehl, 9, 0, 0)) \
+	fw_def(ELKHARTLAKE, 0, guc_def(ehl, 62, 0, 0), huc_def(ehl, 9, 0, 0)) \
+	fw_def(ICELAKE, 0, guc_def(icl, 62, 0, 0), huc_def(icl, 9, 0, 0)) \
+	fw_def(COMETLAKE, 5, guc_def(cml, 62, 0, 0), huc_def(cml, 4, 0, 0)) \
+	fw_def(COMETLAKE, 0, guc_def(kbl, 62, 0, 0), huc_def(kbl, 4, 0, 0)) \
+	fw_def(COFFEELAKE, 0, guc_def(kbl, 62, 0, 0), huc_def(kbl, 4, 0, 0)) \
+	fw_def(GEMINILAKE, 0, guc_def(glk, 62, 0, 0), huc_def(glk, 4, 0, 0)) \
+	fw_def(KABYLAKE, 0, guc_def(kbl, 62, 0, 0), huc_def(kbl, 4, 0, 0)) \
+	fw_def(BROXTON, 0, guc_def(bxt, 62, 0, 0), huc_def(bxt, 2, 0, 0)) \
+	fw_def(SKYLAKE, 0, guc_def(skl, 62, 0, 0), huc_def(skl, 2, 0, 0))

 #define __MAKE_UC_FW_PATH(prefix_, name_, major_, minor_, patch_) \
 	"i915/" \