linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	drm/xe: Add sysfs entries for engines under its GT	Tejas Upadhyay	2023-12-21	4	-0/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add engines sysfs directory under its GT and create sub directory for all engine class (note its not per instance) present on GT. For example, DUT# cat /sys/class/drm/cardX/device/tileN/gtN/engines/ bcs/ ccs/ V9 : - Add missing drmm_add_action_or_reset V8 : - Rebase V7 : - Remove xe_gt.h from .h and include in .c - Matt V6 : - Add kernel doc and arrange file in make file by alphabet - Matt V5 : - replace xe_engine with xe_hw_engine - Matt V4 : - Rebase to resolve conflicts - CI V3 : - Move code in its own file - Rename API name V2 : - Correct class mask logic - Himal - Remove extra parenthesis Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Rename engine to exec_queue	Francois Dugast	2023-12-21	45	-1637/+1636
\| \| \| \| \| \| \| \| \| \| \| \|	Engine was inappropriately used to refer to execution queues and it also created some confusion with hardware engines. Where it applies the exec_queue variable name is changed to q and comments are also updated. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Rename xe_engine.[ch] to xe_exec_queue.[ch]	Francois Dugast	2023-12-21	15	-14/+14
\| \| \| \| \| \| \| \|	This is a preparation commit for a larger renaming of engine to exec queue. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Fix error paths of __xe_bo_create_locked	Maarten Lankhorst	2023-12-21	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ttm_bo_init_reserved() calls the destroy() callback if it fails. Because of this, __xe_bo_create_locked is required to be responsible for freeing the bo even when it's passed in as argument. Additionally, if the placement check fails, the bo was kept alive. Fix it too. Reported-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: remove header variable from parse_g2h_msg	Matthew Brost	2023-12-21	1	-2/+1
\| \| \| \| \| \| \| \| \|	The header variable is unused, remove it. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Suggested-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel	Francois Dugast	2023-12-21	33	-219/+218
\| \| \| \| \| \| \| \| \| \| \|	Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe/macro: Remove unused constant	Francois Dugast	2023-12-21	1	-1/+0
\| \| \| \| \| \| \| \|	Remove XE_EXTRA_DEBUG for cleanup as it is not used. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Add define WQ_HEADER_SIZE	Matthew Brost	2023-12-21	1	-2/+3
\| \| \| \| \| \| \| \| \|	Previously used a a magic '+ 3', use define instead. Suggested-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Remove ct->fence_context	Matthew Brost	2023-12-21	2	-3/+0
\| \| \| \| \| \| \| \| \|	This is unused, remove it. Suggested-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Remove XE_GUC_CT_SELFTEST	Matthew Brost	2023-12-21	4	-98/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	XE_GUC_CT_SELFTEST enabled a debugfs entry to which ran a very simple selftest ensuring the GuC CT code worked. This was added before the kunit framework was available and before submissions were working too. This test isn't worth porting over to the kunit frame as if the GuC CT didn't work, literally almost nothing would work so just remove this. Suggested-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe/mtl: Reduce Wa_14018575942 scope to the CCS engine	Matt Roper	2023-12-21	1	-9/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MTL version of Wa_14018575942 has been updated to suggest only applying the register change on the CCS engine. Note that DG2 and PVC have a functionally equivalent workaround with Wa_18018781329; for now that one is still applying to all engines, although we'll keep an eye on it in case it changes to be CCS-specific too. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230728175601.2343755-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Ensure memory eviction on s2idle.	Rodrigo Vivi	2023-12-21	2	-22/+43
\| \| \| \| \| \| \| \| \|	On discrete cards we cannot allow the pci subsystem to skip the regular suspend and we need to unblock the d3cold. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Only init runtime PM after all d3cold config is in place.	Rodrigo Vivi	2023-12-21	1	-1/+3
\| \| \| \| \| \| \| \| \|	We cannot allow runtime pm suspend after we configured the d3cold capable and threshold. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Fix the runtime_idle call and d3cold.allowed decision.	Rodrigo Vivi	2023-12-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to Documentation/power/runtime_pm.txt: int pm_runtime_put(struct device dev); - decrement the device's usage counter; if the result is 0 then run pm_request_idle(dev) and return its result int pm_runtime_put_autosuspend(struct device dev); - decrement the device's usage counter; if the result is 0 then run pm_request_autosuspend(dev) and return its result We need to ensure that the idle function is called before suspending so we take the right d3cold.allowed decision and respect the values set on vram_d3cold_threshold sysfs. So we need pm_runtime_put() instead of pm_runtime_put_autosuspend(). Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Tested-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Move d3cold_allowed decision all together.	Rodrigo Vivi	2023-12-21	3	-15/+12
\| \| \| \| \| \| \| \| \| \| \|	And let's use the VRAM threshold to keep d3cold temporarily disabled. With this we have the ability to run D3Cold experiments just by touching the vram_d3cold_threshold sysfs entry. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Only set PCI d3cold_allowed when we are really allowing.	Rodrigo Vivi	2023-12-21	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First of all it was strange to see: if (allowed) { ... } else { D3COLD_ENABLE } But besides this misalignment, let's also use the pci d3cold_allowed useful to us and know that we are not really allowing d3cold. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Introduce fault injection for gt reset	Himal Prasad Ghimiray	2023-12-21	3	-1/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	To trigger gt reset failure: echo 100 > /sys/kernel/debug/dri/<cardX>/fail_gt_reset/probability echo 2 > /sys/kernel/debug/dri/<cardX>/fail_gt_reset/times Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Notify Userspace when gt reset fails	Himal Prasad Ghimiray	2023-12-21	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Send uevent in case of gt reset failure. This intimation can be used by userspace monitoring tool to do the device level reset/reboot when GT reset fails. udevadm can be used to monitor the uevents. v2: - Support only gt failure notification (Rodrigo) v3 - Rectify the comments in header file. v4 - Use pci kobj instead of drm kobj for notification.(Rodrigo) - Cleanup (Badal) v5 - Add tile id and gt id as additional info provided by uevent. - Provide code documentation for the uevent. (Rodrigo) Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Invert mask and val in xe_mmio_wait32.	Rodrigo Vivi	2023-12-21	7	-24/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The order: 'offset, mask, val'; is more common in other drivers and in special in i915, where any dev could copy a sequence and end up with unexpected behavior. Done with coccinelle: @rule1@ expression gt, reg, val, mask, timeout, out, atomic; @@ - xe_mmio_wait32(gt, reg, val, mask, timeout, out, atomic) + xe_mmio_wait32(gt, reg, mask, val, timeout, out, atomic) spatch -sp_file mmio.cocci .c .h compat-i915-headers/intel_uncore.h \ --in-place v2: Rebased after changes on xe_guc_mcr usage of xe_mmio_wait32. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Fix an invalid locking wait context bug	Rodrigo Vivi	2023-12-21	1	-6/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We cannot have spin locks around xe_irq_reset, since it will call the intel_display_power_is_enabled() function, and that needs a mutex lock. Hence causing the undesired "[ BUG: Invalid wait context ]" We cannot convert i915's power domain lock to spin lock due to the nested dependency of non-atomic context waits. So, let's move the xe_irq_reset functions from the critical area, while still ensuring that we are protecting the irq.enabled and ensuring the right serialization in the irq handlers. v2: On the first version, I had missed the fact that irq.enabled is checked on the xe/display glue layer, and that i915 display code is actually using the irq spin lock properly. So, this got changed to a version suggested by Matthew Auld. v3: do not use lockdep_assert for display glue. do not save restore irq from inside IRQ or we can get bogus irq restore warnings Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/463 Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Sort xe_regs.h	Lucas De Marchi	2023-12-21	1	-30/+33
\| \| \| \| \| \| \| \| \| \| \| \|	Sort it by register address to make it easy to update when needed. v2: Do not create exception for registers with same functionality. Always sort it. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-11-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Carve out top of DSM as reserved	Lucas De Marchi	2023-12-21	2	-1/+10
\| \| \| \| \| \| \| \| \| \| \|	Top of DSM contains the WOPCM where kernel driver shouldn't access as it contains data from other HW agents. Carve it out from the stolen memory. On a MTL system, the output now matches the expected values: Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-10-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Fix MTL+ stolen memory mapping	Lucas De Marchi	2023-12-21	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Based on commit 8d8d062be6b9 ("drm/i915/mtl: Fix MTL stolen memory GGTT mapping"). For stolen on MTL and beyond, the address in the PTE is the offset from DSM base. While at it, update the comments explaining each part of the calculation. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-9-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Set PTE_DM bit for stolen on MTL	Lucas De Marchi	2023-12-21	5	-12/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Integrated graphics 1270 and beyond should set the PTE_LM bit in the PTE when it's stolen memory. Add a new function, xe_bo_is_stolen_devmem(), and use it when encoding the PTE. In some places in the spec the PTE bit is called "Local Memory", abbreviated as LM, and in others it's called "Device Memory" (DM). Since we moved away from "Local Memory" and preferred the "vram" terminology, also rename the macros as DM to follow the name of the new function. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Decouple vram check from xe_bo_addr()	Lucas De Marchi	2023-12-21	6	-43/+23
\| \| \| \| \| \| \| \| \| \| \|	The output arg is_vram in xe_bo_addr() is unused by several callers. It's also not what the function is mainly doing. Remove the argument and let the interested callers to call xe_bo_is_vram(). Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Remove vma arg from xe_pte_encode()	Lucas De Marchi	2023-12-21	4	-48/+13
\| \| \| \| \| \| \| \| \| \|	All the callers pass a NULL vma, so the buffer is always the BO. Remove the argument and the side effects of dealing with it. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: fix mcr semaphore locking for MTL	Daniele Ceraolo Spurio	2023-12-21	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	in commit 81593af6c88d ("drm/xe: Convert xe_mmio_wait32 to us so we can stop using wait_for_us.") the mcr semaphore register read was accidentally switched from waiting for the register to go to 1 to waiting for the register to go to 0, so we need to flip it back. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Fix checking for unset value	Lucas De Marchi	2023-12-21	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 37430402618d ("drm/xe: NULL binding implementation") introduced the NULL binding implementation, but left a case in which the out value is_vram is not set and the caller will use whatever was on stack. Eventually the is_vram out could be removed, but this should at least fix the current bug. Fixes: 37430402618d ("drm/xe: NULL binding implementation") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe/engine: add missing rpm for bind engines	Matthew Auld	2023-12-21	2	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bind engines need to use the migration vm, however we don't have any rpm for such a vm, otherwise the kernel would prevent rpm suspend-resume. There are two issues here, first is the actual engine create which needs to touch the lrc, but since that is in VRAM we trigger loads of missing mem_access asserts. The second issue is when destroying the actual engine, which requires GuC CT to deregister the context. v2 (Rodrigo): - Just use ENGINE_FLAG_VM as the indicator that we need to hold an rpm ref. This also handles the case in xe_vm_create() where we create default bind engines. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/499 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/504 Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Signal out-syncs on VM binds if no operations	Matthew Brost	2023-12-21	1	-0/+2
\| \| \| \| \| \| \| \| \|	If no operations are generated for VM binds the out-syncs must still be signaled. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Always use xe_vm_queue_rebind_worker helper	Matthew Brost	2023-12-21	2	-9/+8
\| \| \| \| \| \| \| \| \|	Do not queue the rebind worker directly, rather use the helper xe_vm_queue_rebind_worker. This ensures we use the correct work queue. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Invert guc vs execlists parameters and info.	Rodrigo Vivi	2023-12-21	6	-14/+9
\| \| \| \| \| \| \| \|	The module parameter should reflect the name of the optional, experimental and unsafe option, rather than the default one. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
*	drm/xe/uapi: Remove XE_QUERY_CONFIG_FLAGS_USE_GUC	Rodrigo Vivi	2023-12-21	1	-3/+0
\| \| \| \| \| \| \| \| \| \|	This config is the only real one. If execlist remains in the code it will forever be experimental and we shouldn't maintain an uapi like that for that experimental piece of code that should never be used by real users. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
*	drm/xe: fully turn on small-bar support	Matthew Auld	2023-12-21	1	-9/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This allows vram_size > io_size, instead of just clamping the vram size to the BAR size, now that the driver supports it. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe/uapi: add the userspace bits for small-bar	Matthew Auld	2023-12-21	4	-4/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mostly the same as i915. We add a new hint for userspace to force an object into the mappable part of vram. We also need to tell userspace how large the mappable part is. In Vulkan for example, there will be two vram heaps for small-bar systems. And here the size of each heap needs to be known. Likewise the used/avail tracking needs to account for the mappable part. We also limit the available tracking going forward, such that we limit to privileged users only, since these values are system wide and are technically considered an info leak. v2 (Maarten): - s/NEEDS_CPU_ACCESS/NEEDS_VISIBLE_VRAM/ in the uapi. We also no longer require smem as an extra placement. This is more flexible, and lets us use this for clear-color surfaces, since we need CPU access there but we don't want to attach smem, since that effectively disables CCS from kernel pov. - Reject clear-color CCS buffers where NEEDS_VISIBLE_VRAM is not set, instead of migrating it behind the scenes. v3 (José): - Split the changes that limit the accounting for perfmon_capable() into a separate patch. - Use XE_BO_CREATE_VRAM_MASK. v4 (Gwan-gyeong Mun): - Add some kernel-doc for the query bits. v5: - One small kernel-doc correction. The cpu_visible_size and corresponding used tracking are always zero for non XE_MEM_REGION_CLASS_VRAM. v6: - Without perfmon_capable() it likely makes more sense to report as zero, instead of reporting as used == total size. This should give similar behaviour as i915 which rather tracks free instead of used. - Only enforce NEEDS_VISIBLE_VRAM on rc_ccs_cc_plane surfaces when the device is actually small-bar. Testcase: igt/tests/xe_query Testcase: igt/tests/xe_mmap@small-bar Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Filip Hazubski <filip.hazubski@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: Effie Yu <effie.yu@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe/bo: support tiered vram allocation for small-bar	Matthew Auld	2023-12-21	4	-16/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the new flag XE_BO_NEEDS_CPU_ACCESS, to force allocating in the mappable part of vram. If no flag is specified we do a topdown allocation, to limit the chances of stealing the precious mappable part, if we don't need it. If this is a full-bar system, then this all gets nooped. For kernel users, it looks like xe_bo_create_pin_map() is the central place which users should call if they want CPU access to the object, so add the flag there. We still need to plumb this through for userspace allocations. Also it looks like page-tables are using pin_map(), which is less than ideal. If we can already use the GPU to do page-table management, then maybe we should just force that for small-bar. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: xe_engine_create_ioctl should check gt_count, not tile_count	Matt Roper	2023-12-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Platforms like MTL only have a single tile, but multiple GTs. Ensure XE_ENGINE_CREATE accepts engine creation on gt1 on such platforms. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230725003433.1992137-4-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe/mtl: Map PPGTT as CPU:WC	Matt Roper	2023-12-21	3	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On MTL and beyond, the GPU performs non-coherent accesses to the PPGTT page tables. These page tables should be mapped as CPU:WC. Removes CAT errors triggered by xe_exec_basic@once-basic on MTL: xe 0000:00:02.0: [drm:__xe_pt_bind_vma [xe]] Preparing bind, with range [1a0000...1a0fff) engine 0000000000000000. xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]] 1 entries to update xe 0000:00:02.0: [drm:xe_vm_dbg_print_entries [xe]] 0: Update level 3 at (0 + 1) [0...8000000000) f:0 xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2 xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2 xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4 v2: - Rename to XE_BO_PAGETABLE to make it more clear that this BO is the pagetable itself, rather than just being bound in the PPGTT. (Lucas) Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://lore.kernel.org/r/20230725003433.1992137-3-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: add lockdep annotation for xe_device_mem_access_put()	Matthew Auld	2023-12-21	3	-2/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main motivation is with d3cold which will make the suspend and resume callbacks even more scary, but is useful regardless. We already have the needed annotation on the acquire side with xe_device_mem_access_get(), and by adding the annotation on the release side we should have a lot more confidence that our locking hierarchy is correct. v2: - Move the annotation into both callbacks for better symmetry. Also don't hold over the entire mem_access_get(); we only need to lockep to understand what is being held upon entering mem_access_get(), and how that matches up with locks in the callbacks. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Use migrate engine for page fault binds	Matthew Brost	2023-12-21	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	We must use migrate engine for page fault binds in order to avoid a deadlock as the migrate engine has a reserved BCS instance which cannot be stuck on a fault. To use the migrate engine the engine argument to xe_migrate_update_pgtables must be NULL, this was incorrectly wired up so vm->eng[tile_id] was always being used. Fix this. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Only alloc userptr part of xe_vma for userptrs	Matthew Brost	2023-12-21	2	-27/+37
\| \| \| \| \| \| \| \| \|	Only alloc userptr part of xe_vma for userptrs, this will save on space in the common BO case. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Combine destroy_cb and destroy_work in xe_vma into union	Matthew Brost	2023-12-21	1	-5/+6
\| \| \| \| \| \| \| \| \|	The callback kicks the worker thus mutually exclusive execution, combining saves a bit of space in xe_vma. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Change tile masks from u64 to u8	Matthew Brost	2023-12-21	2	-20/+20
\| \| \| \| \| \| \| \| \| \|	This will save us a few bytes in the xe_vma structure. v2: Use hweight8 rather than hweight_long (Rodrigo) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Replace list_del_init with list_del for userptr.invalidate_link cleanup	Matthew Brost	2023-12-21	1	-1/+1
\| \| \| \| \| \| \| \|	This list isn't used again, list_del is the proper call. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Reduce the number list links in xe_vma	Matthew Brost	2023-12-21	5	-41/+49
\| \| \| \| \| \| \| \| \| \| \| \| \|	Combine the userptr, rebind, and destroy links into a union as the lists these links belong to are mutually exclusive. v2: Adjust which lists are combined (Thomas H) v3: Add kernel doc why this is safe (Thomas H), remove related change of list_del_init -> list_del (Rodrigo) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Avoid doing rebinds	Matthew Brost	2023-12-21	3	-10/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we dont change page sizes we can avoid doing rebinds rather just do a partial unbind. The algorithm to determine its page size is greedy as we assume all pages in the removed VMA are the largest page used in the VMA. v2: Don't exceed 100 lines v3: struct xe_vma_op_unmap remove in different patch, remove XXX comment Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Remove xe_vma_op_unmap	Matthew Brost	2023-12-21	2	-15/+0
\| \| \| \| \| \| \| \|	xe_vma_op_unmap isn't used, remove it. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Make bind engines safe	Matthew Brost	2023-12-21	5	-0/+140
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently have a race between bind engines which can result in corrupted page tables leading to faults. A simple example: bind A 0x0000-0x1000, engine A, has unsatisfied in-fence bind B 0x1000-0x2000, engine B, no in-fences exec A uses 0x1000-0x2000 Bind B will pass bind A and exec A will fault. This occurs as bind A programs the root of the page table in a bind job which is held up by an in-fence. Bind B in this case just programs a leaf entry of the structure. To fix use range-fence utility to track cross bind engine conflicts. In the above example bind A would insert an dependency into the range-fence tree with a key of 0x0-0x7fffffffff, bind B would find that dependency and its bind job would scheduled behind the unsatisfied in-fence and bind A's job. Reviewed-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe: Introduce a range-fence utility	Thomas Hellström	2023-12-21	3	-0/+232
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add generic utility to track range conflicts signaled by a dma-fence. Tracking implemented via an interval tree. An example use case being tracking conflicts for pending (un)binds from multiple bind engines. By being generic ths idea would this could moved to the DRM level and used in multiple drivers for similar problems. v2: Make interval tree functions static (CI) v3: Remove non-static cleanup function (CI) Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
*	drm/xe/execlist: Log when using execlist submission	Francois Dugast	2023-12-21	1	-1/+4
\| \| \| \| \| \| \| \| \|	Make explicit in the log that execlist submission is used to prevent from silently using it over GuC submission. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>