summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
Commit message (Collapse)AuthorAgeFilesLines
* treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner2025-04-051-1/+1
| | | | | | | | | | timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* drm/amdgpu/mes: clean up SDMA HQD loopAlex Deucher2025-03-211-5/+3
| | | | | | | | Follow the same logic as the other IP types. Reviewed-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: enable compute pipes across all MECAlex Deucher2025-03-211-2/+1
| | | | | | | | | Enable pipes on both MECs for MES. Fixes: 745f46b6a99f ("drm/amdgpu: enable mes v12 self test") Acked-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: drop MES 10.x leftoversAlex Deucher2025-03-211-4/+1
| | | | | | | | | Leftover from MES bring up. There is no production MES support for MES 10.x. The rest of the MES 10.x code has already been removed so drop this. Acked-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: optimize compute loop handlingAlex Deucher2025-03-211-1/+1
| | | | | | | | Break when we get to the end of the supported pipes rather than continuing the loop. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: use GFP_NOWAIT for memory allocationsChristian König2025-03-211-2/+2
| | | | | | | | | | | | | In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish. Finally clean that up and mark most memory allocations in the critical path with GFP_NOWAIT. The only exception left is the dma_fence_array() used when no VMID is available, but that will be cleaned up later on. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: keep enforce isolation up to dateAlex Deucher2025-02-251-1/+19
| | | | | | | | | | | | Re-send the mes message on resume to make sure the mes state is up to date. Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES") Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amd: Add the capability to mark certain firmware as "required"Mario Limonciello2024-12-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Some of the firmware that is loaded by amdgpu is not actually required. For example the ISP firmware on some SoCs is optional, and if it's not present the ISP IP block just won't be initialized. The firmware loader core however will show a warning when this happens like this: ``` Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2 ``` To avoid confusion for non-required firmware, adjust the amd-ucode helper to take an extra argument indicating if the firmware is required or optional. On optional firmware use firmware_request_nowarn() instead of request_firmware() to avoid the warnings. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amd/amdgpu: limit single process inside MESShaoyun Liu2024-11-121-0/+23
| | | | | | | | This is for MES to limit only one process for the user queues Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amd/amdgpu: Increase MES log buffer to dump mes scratch datashaoyunl2024-11-111-1/+1
| | | | | | | | | | MES internal scratch data is useful for mes debug, it can only located in VRAM, change the allocation type and increase size for mes 11 Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Feifei Xu <Feifei.Xu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: fix return random value when multiple threads read registers via ↵chongli22024-11-081-17/+13
| | | | | | | | | | | | | | | | | | mes. The currect code use the address "adev->mes.read_val_ptr" to store the value read from register via mes. So when multiple threads read register, multiple threads have to share the one address, and overwrite the value each other. Assign an address by "amdgpu_device_wb_get" to store register value. each thread will has an address to store register value. Signed-off-by: chongli2 <chongli2@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: fetch fw version from firmware headerAlex Deucher2024-11-051-0/+5
| | | | | | | | | | | We need this prior to the firmware being loaded so fetch from the header. v2: fetch directly from the firmware v3: store both fw versions Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ringSrinivasan Shanmugam2024-10-151-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data *ctx_data, 1146 struct amdgpu_ring **out) 1147 { 1148 struct amdgpu_ring *ring; 1149 struct amdgpu_mes_gang *gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 /* 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies. 1156 */ 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222 if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 *out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes11: update mes_reset_queue function to support sdma queueJiadong Zhu2024-09-261-1/+1
| | | | | | | | | | Reset sdma queue through mmio based on me_id and queue_id. v2: simplify callflows and register calculation. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: fix queue reset issue by mmioJesse Zhang2024-09-061-0/+1
| | | | | | | | Initialize the queue type before resetting the queue using mmio. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: implement amdgpu_mes_reset_hw_queue_mmioJiadong Zhu2024-09-021-0/+20
| | | | | | | | | | | The reset_queue api could be used from kfd or kgd. v2: add use_mmio parameter for mes_reset_legacy_queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: modify mes api for mmio queue resetJiadong Zhu2024-09-021-1/+2
| | | | | | | | | | | Add me/pipe/queue parameters for queue reset input. v2: fix build (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11Mukul Joshi2024-08-201-34/+37
| | | | | | | | | | | Add implementation for MES Suspend and Resume APIs to unmap/map all queues for GFX11. Support for GFX12 will be added when the corresponding firmware support is in place. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: add API for user queue resetAlex Deucher2024-08-161-0/+43
| | | | | | | Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes12: configure two pipes hardware resourcesJack Xiao2024-08-131-30/+47
| | | | | | | | | Configure two pipes with different hardware resources. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1Jack Xiao2024-08-131-1/+1
| | | | | | | | | Enable unified mes firmware to load on pipe0 and pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: add multiple mes ring instances supportJack Xiao2024-08-131-1/+3
| | | | | | | | | | Add multiple mes ring instances in mes structure to support multiple mes pipes. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: add API for legacy queue resetAlex Deucher2024-08-131-0/+24
| | | | | | | Add API for resetting kernel queues. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amd: Use a constant format string for amdgpu_ucode_requestArnd Bergmann2024-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multiple files in amdgpu call amdgpu_ucode_request() with a fw_name variable that the compiler cannot check for being a valid format string, as seen by enabling the (default-disabled) -Wformat-security option: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function 'amdgpu_mes_init_microcode': drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1517:61: error: format not a string literal and no format arguments [-Werror=format-security] 1517 | r = amdgpu_ucode_request(adev, &adev->mes.fw[pipe], fw_name); | ^~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c: In function 'amdgpu_uvd_sw_init': drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:9: error: format not a string literal and no format arguments [-Werror=format-security] 263 | r = amdgpu_ucode_request(adev, &adev->uvd.fw, fw_name); | ^ drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c: In function 'amdgpu_vce_sw_init': drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:161:9: error: format not a string literal and no format arguments [-Werror=format-security] 161 | r = amdgpu_ucode_request(adev, &adev->vce.fw, fw_name); | ^ drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c: In function 'amdgpu_umsch_mm_init_microcode': drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c:590:9: error: format not a string literal and no format arguments [-Werror=format-security] 590 | r = amdgpu_ucode_request(adev, &adev->umsch_mm.fw, fw_name); | ^ drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c: In function 'amdgpu_cgs_get_firmware_info': drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c:417:72: error: format not a string literal and no format arguments [-Werror=format-security] 417 | err = amdgpu_ucode_request(adev, &adev->pm.fw, fw_name); | ^~~~~~~ drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function 'load_dmcu_fw': drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2221:9: error: format not a string literal and no format arguments [-Werror=format-security] 2221 | r = amdgpu_ucode_request(adev, &adev->dm.fw_dmcu, fw_name_dmcu); | ^ drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function 'dm_init_microcode': drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5147:9: error: format not a string literal and no format arguments [-Werror=format-security] 5147 | r = amdgpu_ucode_request(adev, &adev->dm.dmub_fw, fw_name_dmub); | ^ Change these all to use a "%s" format with the actual name as an argument, to let the compiler prove this to be correct. Fixes: e5a7d047f41b ("drm/amd: Use `amdgpu_ucode_*` helpers for CGS") Fixes: 52215e2a5d4a ("drm/amd: Use `amdgpu_ucode_*` helpers for VCE") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: increase mes log buffer size for gfx12Michael Chen2024-07-271-3/+3
| | | | | | | | | MES firmware requires larger log buffer for gfx12. Allocate proper buffer respectively for gfx11 and gfx12. Signed-off-by: Michael Chen <michael.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: remove amdgpu_mes_fence_wait_polling()Alex Deucher2024-06-191-12/+0
| | | | | | | No longer used so remove it. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* Revert "drm/amdgpu: Add missing locking for MES API calls"Mukul Joshi2024-06-191-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 3612702852acbded39233b1600c8d9f47e40139f. This is causing a BUG message during suspend. [ 61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283 [ 61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14 [ 61.603553] preempt_count: 1, expected: 0 [ 61.603555] RCU nest depth: 0, expected: 0 [ 61.603557] Preemption disabled at: [ 61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G W 6.8.0+ #7 [ 61.603795] Workqueue: events_unbound async_run_entry_fn [ 61.603801] Call Trace: [ 61.603803] <TASK> [ 61.603806] dump_stack_lvl+0x37/0x50 [ 61.603811] ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.604007] dump_stack+0x10/0x20 [ 61.604010] __might_resched+0x16f/0x1d0 [ 61.604016] __might_sleep+0x43/0x70 [ 61.604020] mutex_lock+0x1f/0x60 [ 61.604024] amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu] [ 61.604226] gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu] [ 61.604422] ? srso_alias_return_thunk+0x5/0xfbef5 [ 61.604429] amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu] [ 61.604621] gfx_v11_0_hw_fini+0xda/0x100 [amdgpu] [ 61.604814] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 61.605008] amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu] [ 61.605175] amdgpu_device_suspend+0xec/0x180 [amdgpu] Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: refine mes firmware loadingYang Wang2024-06-141-4/+2
| | | | | | | | | | | | v1: refine mes firmware loading v2: use dev_info instead of DRM_INFO Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Add missing locking for MES API callsMukul Joshi2024-06-141-0/+12
| | | | | | | | | Add missing locking at a few places when calling MES APIs to ensure exclusive access to MES queue. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: add uni_mes fw loading supportJack Xiao2024-05-021-1/+4
| | | | | | | | Add the unified mes firmware loading support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: enable mes v12 self testJack Xiao2024-05-021-1/+1
| | | | | | | | | 1. fix available compute queue to use 2. enable mes v12 self test Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: init mes ucode name for gfx v12Likun Gao2024-05-021-1/+2
| | | | | | | | | Keep gfx v12 mes fw name to gc_12_x_x_mes.bin and gc_12_x_x_mes1.bin. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: add mes mapping legacy queue supportJack Xiao2024-04-261-0/+22
| | | | | | | | | Add mes mapping legacy queue framework support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes11: Use a separate fence per transactionAlex Deucher2024-04-261-0/+12
| | | | | | | | We can't use a shared fence location because each transaction should be considered independently. Reviewed-by: Shaoyun.liu <shaoyunl@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/mes: fix use-after-free issueJack Xiao2024-04-261-0/+1
| | | | | | | | | | | | Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu : Increase the mes log buffer size as per new MES FW versionshaoyunl2024-04-091-3/+2
| | | | | | | | | From MES version 0x54, the log entry increased and require the log buffer size to be increased. The 16k is maximum size agreed Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu : Add mes_log_enable to control mes log featureshaoyunl2024-04-091-1/+4
| | | | | | | | | The MES log might slow down the performance for extra step of log the data, disable it by default and introduce a parameter can enable it when necessary Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in ↵Srinivasan Shanmugam2024-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | amdgpu_mes_init_microcode The snprintf function is used to write a formatted string into fw_name. The format of the string is "amdgpu/%s_mes%s.bin", where %s is replaced by the string in ucode_prefix and the second %s is replaced by either "_2" or "1" depending on the condition pipe == AMDGPU_MES_SCHED_PIPE. The length of the string "amdgpu/%s_mes%s.bin" is 16 characters plus the length of ucode_prefix and the length of the string "_2" or "1". The size of ucode_prefix is 30, so the maximum length of ucode_prefix is 29 characters (since one character is needed for the null terminator). Therefore, the maximum possible length of the string written into fw_name is 16 + 29 + 2 = 47 characters. The size of fw_name is 40, so if the length of the string written into fw_name is more than 39 characters (since one character is needed for the null terminator), it will be truncated by the snprintf function, and thus warnings will be seen. By increasing the size of fw_name to 50, we ensure that fw_name is large enough to hold the maximum possible length of the string, so the snprintf function will not truncate the output. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function ‘amdgpu_mes_init_microcode’: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:66: warning: ‘%s’ directive output may be truncated writing up to 1 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 1482 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", | ^~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:17: note: ‘snprintf’ output between 16 and 46 bytes into a destination of size 40 1482 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1483 | ucode_prefix, | ~~~~~~~~~~~~~ 1484 | pipe == AMDGPU_MES_SCHED_PIPE ? "" : "1"); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 1 byte into a region of size between 0 and 29 [-Wformat-truncation=] 1477 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", | ^~ 1478 | ucode_prefix, 1479 | pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); | ~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 17 and 46 bytes into a destination of size 40 1477 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1478 | ucode_prefix, | ~~~~~~~~~~~~~ 1479 | pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 2 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 1477 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", | ^~ 1478 | ucode_prefix, 1479 | pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); | ~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 18 and 47 bytes into a destination of size 40 1477 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1478 | ucode_prefix, | ~~~~~~~~~~~~~ 1479 | pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:62: warning: ‘_mes.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 1489 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin", | ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 1489 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1490 | ucode_prefix); | ~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: cleanup unused variableShashank Sharma2024-03-201-7/+3
| | | | | | | | | | | This patch removes an unused input variable in the MES doorbell function. Cc: Christian König <Christian.Koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Only create mes event log debugfs when mes is enabledshaoyunl2024-02-071-3/+3
| | | | | | | | | | Skip the debugfs file creation for mes event log if the GPU doesn't use MES. This to prevent potential kernel oops when user try to read the event log in debugfs on a GPU without MES Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Enable seq64 manager and fix bugsArunpravin Paneer Selvam2024-01-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | - Enable the seq64 mapping sequence. - Fix wflinfo va conflict and other bugs. v1: - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE otherwise the areas will conflict with user space allocations (Alex) - It needs to be mapped read only in the user VM (Alex) v2: - Instead of just one define for TOP/BOTTOM reserved space separate them into two (Christian) - Fix the CPU and VA calculations and while at it also cleanup error handling and kerneldoc (Christian) Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
* Merge tag 'drm-msm-next-2023-12-15' of ↵Dave Airlie2023-12-201-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/drm/msm into drm-next Updates for v6.8: Core: - Add support for SDM670, SM8650 - Handle the CFG interconnect to fix the obscure hangs / timeouts on register write - Kconfig fix for QMP dependency - DT schema fixes DPU: - Add support for SDM670, SM8650 - Enable SmartDMA on SM8350 and SM8450 - Correct UBWC settings for SC8280XP - Fix catalog settings for SC8180X - Actually make use of the version to switch between QSEED3/3LITE/4 scalers - Use devres-managed and drm-managed allocations where appropriate - misc other fixes - Enabled YUV writeback on SC7280, SM8250 - Enabled writeback on SM8350, SM8450 - CRC fix when encoder is selected as the input source - other misc fixes MDP4: - Use devres-managed and drm-managed allocations where appropriate - flush vblank event on CRTC disable MDP5: - Use devres-managed and drm-managed allocations where appropriate DP: - Add support for SM8650 - Enable PM runtime support - Merge msm-specific debugfs dir with the generic one - Described DisplayPort on SM8150 in DeviceTree bindings - Moved dp_display_get_next_bridge() to probe() DSI: - Add support for SM8650 - Enable PM runtime support GPU/GEM: - demote userspace triggerable warnings to debug - add GEM object metadata UAPI - move GPU devcoredumps to GPU device - fix hangcheck to skip retired submits - expose UBWC config to userspace - fix a680 chip-id - drm_exec conversion - drm/ci: remove rebase-merge directory (to unblock CI) [airlied: fix drm_exec/amd interaction] Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs9auYqmo-7NSd9FsbNBCDf7aBevd=4xkcF3A5G_OGvMQ@mail.gmail.com
| * drm/exec: Pass in initial # of objectsRob Clark2023-12-101-2/+2
| | | | | | | | | | | | | | | | | | In cases where the # is known ahead of time, it is silly to do the table resize dance. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Christian König <christian.koenig@amd.com> Patchwork: https://patchwork.freedesktop.org/patch/568338/
* | drm/amdkfd: fix mes set shader debugger process managementJonathan Kim2023-12-131-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MES provides the driver a call to explicitly flush stale process memory within the MES to avoid a race condition that results in a fatal memory violation. When SET_SHADER_DEBUGGER is called, the driver passes a memory address that represents a process context address MES uses to keep track of future per-process calls. Normally, MES will purge its process context list when the last queue has been removed. The driver, however, can call SET_SHADER_DEBUGGER regardless of whether a queue has been added or not. If SET_SHADER_DEBUGGER has been called with no queues as the last call prior to process termination, the passed process context address will still reside within MES. On a new process call to SET_SHADER_DEBUGGER, the driver may end up passing an identical process context address value (based on per-process gpu memory address) to MES but is now pointing to a new allocated buffer object during KFD process creation. Since the MES is unaware of this, access of the passed address points to the stale object within MES and triggers a fatal memory violation. The solution is for KFD to explicitly flush the process context address from MES on process termination. Note that the flush call and the MES debugger calls use the same MES interface but are separated as KFD calls to avoid conflicting with each other. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Alice Wong <shiwei.wong@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* | drm/amdgpu: SW part of MES event log enablementshaoyunl2023-12-071-0/+61
|/ | | | | | | | This is the generic SW part, prepare the event log buffer and dump it through debugfs Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: fix GRBM read timeout when do mes_self_testTim Huang2023-11-031-0/+16
| | | | | | | | | | | | Use a proper MEID to make sure the CP_HQD_* and CP_GFX_HQD_* registers can be touched when initialize the compute and gfx mqd in mes_self_test. Otherwise, we expect no response from CP and an GRBM eventual timeout. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
* drm/amdgpu: Use function for IP version checkLijo Lazar2023-09-201-4/+7
| | | | | | | | | Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: cleanup MES process level doorbellsShashank Sharma2023-08-071-53/+1
| | | | | | | | | | | | | | | | | MES allocates process level doorbells, but there is no userspace client to consume it. It was only being used for the MES ring tests (in kernel), and was written by kernel doorbell write. The previous patch of this series has changed the MES ring test code to use kernel level MES doorbells. This patch now cleans up the process level doorbell allocation code which is not required. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: use doorbell mgr for MES kernel doorbellsShashank Sharma2023-08-071-57/+37
| | | | | | | | | | | | | | | | | | | | | | | | This patch: - Removes the existing doorbell management code, and its variables from the doorbell_init function, it will be done in doorbell manager now. - uses the doorbell page created for MES kernel level needs (doorbells for MES self tests) - current MES code was allocating MES doorbells in MES process context, but those were getting written using kernel doorbell calls. This patch instead allocates a MES kernel doorbell for this (in add_hw_queue). V2: Create an extra page of doorbells for MES during kernel doorbell creation (Alex) V4: Move MES doorbell size and page offset objects in this patch from patch 6. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* Merge tag 'amd-drm-next-6.6-2023-07-28' of ↵Daniel Vetter2023-08-041-1/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.6-2023-07-28: amdgpu: - Lots of checkpatch cleanups - GFX 9.4.3 updates - Add USB PD and IFWI flashing documentation - GPUVM updates - RAS fixes - DRR fixes - FAMS fixes - Virtual display fixes - Soft IH fixes - SMU13 fixes - Rework PSP firmware loading for other IPs - Kernel doc fixes - DCN 3.0.1 fixes - LTTPR fixes - DP MST fixes - DCN 3.1.6 fixes - SubVP fixes - Display bandwidth calculation fixes - VCN4 secure submission fixes - Allow building DC on RISC-V - Add visible FB info to bo_print_info - HBR3 fixes - Add PSP 14.0 support - GFX9 MCBP fix - GMC10 vmhub index fix - GMC11 vmhub index fix - Create a new doorbell manager - SR-IOV fixes amdkfd: - Cleanup CRIU dma-buf handling - Use KIQ to unmap HIQ - GFX 9.4.3 debugger updates - GFX 9.4.2 debugger fixes - Enable cooperative groups fof gfx11 - SVM fixes radeon: - Lots of checkpatch cleanups Merge conflicts: - drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c The switch to drm eu helpers in 8a206685d36f ("drm/amdgpu: use drm_exec for GEM and CSA handling v2") clashed with the cosmetic cleanups from 30953c4d000b ("drm/amdgpu: Fix style issues in amdgpu_gem.c"). I kept the former since the cleanup up code is gone. - drivers/gpu/drm/amd/amdgpu/atom.c. adf64e214280 ("drm/amd: Avoid reading the VBIOS part number twice") removed code that 992b8fe106ab ("drm/radeon: Replace all non-returning strlcpy with strscpy") polished. From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230728214228.8102-1-alexander.deucher@amd.com [sima: some merge conflict wrangling as noted] Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>