summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
Commit message (Collapse)AuthorAgeFilesLines
* drm/amdkfd: drop struct kfd_cu_infoAlex Deucher2023-10-031-14/+14
| | | | | | | | | | | | | | | | I think this was an abstraction back from when kfd supported both radeon and amdgpu. Since we just support amdgpu now, there is no more need for this and we can use the amdgpu structures directly. This also avoids having the kfd_cu_info structures on the stack when inlining which can blow up the stack. Cc: Arnd Bergmann <arnd@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Merge debug module parametersAndré Almeida2023-09-111-1/+1
| | | | | | | | | | | | Merge all developer debug options available as separated module parameters in one, making it obvious that are for developers. Drop the obsolete module options in favor of the new ones. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3Mukul Joshi2023-09-111-1/+2
| | | | | | | | | | | Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: add KFD support for GC 11.5.0Lang Yu2023-08-301-0/+1
| | | | | | | Enable KFD for GC 11.5.0. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: drop IOMMUv2 supportAlex Deucher2023-08-111-73/+0
| | | | | | | | | | | | Now that we use the dGPU path for all APUs, drop the IOMMUv2 support. v2: drop the now unused queue manager functions for gfx7/8 APUs Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: ignore crat by defaultAlex Deucher2023-08-111-4/+0
| | | | | | | | | | We are dropping the IOMMUv2 path, so no need to enable this. It's often buggy on consumer platforms anyway. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Remove DUMMY_VRAM_SIZEMukul Joshi2023-06-151-5/+0
| | | | | | | | | Remove DUMMY_VRAM_SIZE as it is not needed and can result in reporting incorrect memory size. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amd: Check that a system is a NUMA system before looking for SRATMario Limonciello2023-06-091-1/+2
| | | | | | | | | | | It's pointless on laptops to look for the SRAT table as these are not NUMA. Check the number of possible nodes is > 1 to decide whether to look for SRAT. Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Move local_mem_info to kfd_nodeMukul Joshi2023-06-091-1/+1
| | | | | | | | | | | | We need to track memory usage on a per partition basis. To do that, store the local memory information in KFD node instead of kfd device. v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Handle VRAM dependencies on GFXIP9.4.3Rajneesh Bhardwaj2023-06-091-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | [For 1P NPS1 mode driver bringup] Changes required to initialize the amdgpu driver with frontdoor firmware loading and discovery=2 with the native mode SBIOS that enables CPU GPU unified interleaved memory. sudo modprobe amdgpu discovery=2 Once PSP TMR region is reported via the ACPI interface, the dependency on the ip_discovery.bin will be removed. Choice of where to allocate driver table is given to each IP version. In general, both GTT and VRAM domains will be considered. If one of the tables has a strict restriction for VRAM domain, then only VRAM domain is considered. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> (lijo: Modified the handling for SMU Tables) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Report XGMI IOLINKs for GFXIP9.4.3Rajneesh Bhardwaj2023-06-091-1/+4
| | | | | | | | | | | GFXIP 9.4.3 could be in APU or carveout mode but we cannot use the xgmi.connected_to_cpu flag to identify the iolinks type. Use appropriate APU or Carveout mode based condition to report xgmi connection in kfd topology. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: add gpu compute cores io links for gfx9.4.3Jonathan Kim2023-06-091-15/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | The PSP TA will only provide xGMI topology info for links between GPU sockets so links between partitions from different sockets will be hardcoded as 3 xGMI hops with 1 hops weighted as xGMI and 2 hops weighted with a new intra-socket weight to indicate the longest possible distance. If the link between a partition and the CPU is non-PCIe, then assume the CPU (CCDs) is located within the same socket as the partition and represent the link as an intra-socket weighted single hop XGMI link with memory bandwidth. Links between partitions within a single socket will be abstracted as single hop xGMI links weighted with the new intra-socket weight and will have memory bandwidth. Finally, use the unused function bits in the location ID to represent the coordinates of the compute partition within its socket. A follow on patch will resolve the requirement for GPU socket xGMI link representation sometime later. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Introduce kfd_node struct (v5)Mukul Joshi2023-06-091-14/+14
| | | | | | | | | | | | | | | | | | | | | | | Introduce a new structure, kfd_node, which will now represent a compute node. kfd_node is carved out of kfd_dev structure. kfd_dev struct now will become the parent of kfd_node, and will store common resources such as doorbells, GTT sub-alloctor etc. kfd_node struct will store all resources specific to a compute node, such as device queue manager, interrupt handling etc. This is the first step in adding compute partition support in KFD. v2: introduce kfd_node struct to gc v11 (Hawking) v3: make reference to kfd_dev struct through kfd_node (Morris) v4: use kfd_node instead for kfd isr/mqd functions (Morris) v5: rebase (Alex) Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Add GC 9.4.3 KFD supportHawking Zhang2023-03-311-0/+1
| | | | | | | | | | Add initial KFD support Convert a few structures to IP version checking (Hawking) Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: add GC 11.0.4 KFD supportYifan Zhang2022-11-291-0/+1
| | | | | | | | Add initial support for GC 11.0.4 in KFD compute driver. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Fix the memory overrunMa Jun2022-11-091-1/+1
| | | | | | | | | | | Fix the memory overrun issue caused by wrong array size. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reported-by: coverity-bot <keescook+coverity-bot@chromium.org> Addresses-Coverity-ID: 1527133 ("Memory - corruptions") Fixes: c0cc999f3c32e6 ("drm/amdkfd: Fix the warning of array-index-out-of-bounds") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Fix the warning of array-index-out-of-boundsMa Jun2022-11-041-278/+34
| | | | | | | | | | | | For some GPUs with more CUs, the original sibling_map[32] in struct crat_subtype_cache is not enough to save the cache information when create the VCRAT table, so skip filling the struct crat_subtype_cache info instead fill struct kfd_cache_properties directly to fix this problem. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Remove unused variableMa Jun2022-11-011-1/+0
| | | | | | | | | kfd_topology_device->cache_count is not used by other fucntions, so remove it. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Cleanup kfd_dev structMukul Joshi2022-10-271-6/+6
| | | | | | | | | | | Cleanup kfd_dev struct by removing ddev and pdev as both drm_device and pci_dev can be fetched from amdgpu_device. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: introduce dummy cache info for property asicPrike Liang2022-10-241-1/+52
| | | | | | | | This dummy cache info will enable kfd base function support. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: correct the cache info for gfx1036Jesse Zhang2022-10-241-1/+52
| | | | | | | | | | correct the cache information for gfx1036 Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: update gfx1037 Lx cache settingPrike Liang2022-10-241-1/+52
| | | | | | | | Update the gfx1037 L1/L2 cache setting. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* amd/amdkfd: fix repeated words in commentswangjianli2022-09-131-1/+1
| | | | | | | | | Delete the redundant word 'to'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Added GFX 11.0.3 SupportDavid Belanger2022-08-301-0/+1
| | | | | | | | Added missing cases for GFX 11.0.3 code in a few switch statements. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Remove field io_link_count from struct kfd_topology_deviceRamesh Errabolu2022-06-101-2/+0
| | | | | | | | The field is redundant and does not serve any functional role Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitionsMario Limonciello2022-06-031-0/+2
| | | | | | | | | | | | | | Loading amdgpu on GC 10.3.7 shows an ERR level message: `kfd kfd: amdgpu: GC IP 0a0307 not supported in kfd` Add these targets to match yellow carp structures. Reported-by: David Chang <david.chang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Jesse(Jie) Zhang <Jesse.Zhang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.18.x
* drm/amdkfd: add GC 11.0.1 KFD supportHuang Rui2022-05-061-0/+1
| | | | | | | | | Add initial support for GC 11.0.1 in KFD compute driver. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: add asic support for GC 11.0.2Eric Huang2022-05-051-0/+1
| | | | | | | | Changes are inherited from GC 11.0.0. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Add KFD support for soc21 v3Mukul Joshi2022-05-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | Add initial support for soc21 in KFD compute driver (Mukul) - Add new definition for soc21 device. - Add new file for amdgpu-kfd interface for GFX11 family. - Add new file for queue management, interrupt handling, mqd management for GFX11 family in KFD driver. - Related changes/updates for soc21 device in KFD driver. - Repurpose last 2 entries of SDMA MQD for driver use. v2: Add an optional argument into update queue operation (Mukul) v3: Switch to ip version check, replace kgd_dev with amdgpu_device (Hawking) Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: add helper to generate cache info from gfx configAlex Deucher2022-05-041-0/+72
| | | | | | | | | Rather than using hardcoded tables, we can use the gfx and gmc config pulled from the IP discovery table to generate the cache configuration. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Fix circular lock dependency warningMukul Joshi2022-04-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ 168.544078] ====================================================== [ 168.550309] WARNING: possible circular locking dependency detected [ 168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G E [ 168.562558] ------------------------------------------------------ [ 168.568764] kfdtest/3479 is trying to acquire lock: [ 168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at: kfd_topology_device_by_id+0x16/0x60 [amdgpu] [ 168.583663] but task is already holding lock: [ 168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0xa9/0x180 [ 168.597755] which lock already depends on the new lock. [ 168.605970] the existing dependency chain (in reverse order) is: [ 168.613487] -> #3 (&mm->mmap_lock#2){++++}-{3:3}: [ 168.619700] lock_acquire+0xca/0x2e0 [ 168.623814] down_read+0x3e/0x140 [ 168.627676] do_user_addr_fault+0x40d/0x690 [ 168.632399] exc_page_fault+0x6f/0x270 [ 168.636692] asm_exc_page_fault+0x1e/0x30 [ 168.641249] filldir64+0xc8/0x1e0 [ 168.645115] call_filldir+0x7c/0x110 [ 168.649238] ext4_readdir+0x58e/0x940 [ 168.653442] iterate_dir+0x16a/0x1b0 [ 168.657558] __x64_sys_getdents64+0x83/0x140 [ 168.662375] do_syscall_64+0x35/0x80 [ 168.666492] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 168.672095] -> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}: [ 168.679008] lock_acquire+0xca/0x2e0 [ 168.683122] down_read+0x3e/0x140 [ 168.686982] path_openat+0x5b2/0xa50 [ 168.691095] do_file_open_root+0xfc/0x190 [ 168.695652] file_open_root+0xd8/0x1b0 [ 168.702010] kernel_read_file_from_path_initns+0xc4/0x140 [ 168.709542] _request_firmware+0x2e9/0x5e0 [ 168.715741] request_firmware+0x32/0x50 [ 168.721667] amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu] [ 168.730060] smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu] [ 168.738414] fiji_start_smu+0xcf/0x4e0 [amdgpu] [ 168.745539] pp_dpm_load_fw+0x21/0x30 [amdgpu] [ 168.752503] amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu] [ 168.760698] amdgpu_device_fw_loading+0xb8/0x140 [amdgpu] [ 168.768412] amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu] [ 168.776285] amdgpu_driver_load_kms+0x15/0x120 [amdgpu] [ 168.784034] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu] [ 168.791161] local_pci_probe+0x40/0x80 [ 168.797027] work_for_cpu_fn+0x10/0x20 [ 168.802839] process_one_work+0x273/0x5b0 [ 168.808903] worker_thread+0x20f/0x3d0 [ 168.814700] kthread+0x176/0x1a0 [ 168.819968] ret_from_fork+0x1f/0x30 [ 168.825563] -> #1 (&adev->pm.mutex){+.+.}-{3:3}: [ 168.834721] lock_acquire+0xca/0x2e0 [ 168.840364] __mutex_lock+0xa2/0x930 [ 168.846020] amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu] [ 168.853257] amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu] [ 168.861547] kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu] [ 168.869478] kfd_create_crat_image_virtual+0x447/0x510 [amdgpu] [ 168.877884] kfd_topology_add_device+0x5c8/0x6f0 [amdgpu] [ 168.885556] kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu] [ 168.893347] amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu] [ 168.901177] amdgpu_device_init.cold+0x141b/0x1716 [amdgpu] [ 168.909025] amdgpu_driver_load_kms+0x15/0x120 [amdgpu] [ 168.916458] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu] [ 168.923442] local_pci_probe+0x40/0x80 [ 168.929249] work_for_cpu_fn+0x10/0x20 [ 168.935008] process_one_work+0x273/0x5b0 [ 168.940944] worker_thread+0x20f/0x3d0 [ 168.946623] kthread+0x176/0x1a0 [ 168.951765] ret_from_fork+0x1f/0x30 [ 168.957277] -> #0 (&topology_lock){++++}-{3:3}: [ 168.965993] check_prev_add+0x8f/0xbf0 [ 168.971613] __lock_acquire+0x1299/0x1ca0 [ 168.977485] lock_acquire+0xca/0x2e0 [ 168.982877] down_read+0x3e/0x140 [ 168.987975] kfd_topology_device_by_id+0x16/0x60 [amdgpu] [ 168.995583] kfd_device_by_id+0xa/0x20 [amdgpu] [ 169.002180] kfd_mmap+0x95/0x200 [amdgpu] [ 169.008293] mmap_region+0x337/0x5a0 [ 169.013679] do_mmap+0x3aa/0x540 [ 169.018678] vm_mmap_pgoff+0xdc/0x180 [ 169.024095] ksys_mmap_pgoff+0x186/0x1f0 [ 169.029734] do_syscall_64+0x35/0x80 [ 169.035005] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 169.041754] other info that might help us debug this: [ 169.053276] Chain exists of: &topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2 [ 169.068389] Possible unsafe locking scenario: [ 169.076661] CPU0 CPU1 [ 169.082383] ---- ---- [ 169.088087] lock(&mm->mmap_lock#2); [ 169.092922] lock(&type->i_mutex_dir_key#6); [ 169.100975] lock(&mm->mmap_lock#2); [ 169.108320] lock(&topology_lock); [ 169.112957] *** DEADLOCK *** This commit fixes the deadlock warning by ensuring pm.mutex is not held while holding the topology lock. For this, kfd_local_mem_info is moved into the KFD dev struct and filled during device init. This cached value can then be used instead of querying the value again and again. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Cleanup IO links during KFD device removalMukul Joshi2022-04-131-2/+2
| | | | | | | | | | | | | | | | | | | | | Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu <shuotaoxu@microsoft.com> Reviewed-by: Shuotao Xu <shuotaoxu@microsoft.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: make CRAT table missing message informational onlyAlex Deucher2022-02-221-1/+1
| | | | | | | | | | | The driver has a fallback so make the message informational rather than a warning. The driver has a fallback if the Component Resource Association Table (CRAT) is missing, so make this informational now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906 Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Fix leftover errors and warningsRajneesh Bhardwaj2022-02-141-1/+1
| | | | | | | | | | A bunch of errors and warnings are leftover KFD over the years, attempt to fix the errors and most warnings reported by checkpatch tool. Still a few warnings remain which may be false positives so ignore them for now. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: update SPDX license headerRajneesh Bhardwaj2022-02-141-1/+2
| | | | | | | | Update the SPDX License header for all the KFD files. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: add support for GC 10.1.4Lang Yu2022-02-111-0/+1
| | | | | | | | | | Add basic support for GC 10.1.4, it uses same IP blocks with GC 10.1.3 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Check for null pointer after calling kmemdupJiasheng Jiang2022-01-111-0/+3
| | | | | | | | | | | | | As the possible failure of the allocation, kmemdup() may return NULL pointer. Therefore, it should be better to check the 'props2' in order to prevent the dereference of NULL pointer. Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: replace asic_family with asic_typeGraham Sider2021-11-171-1/+1
| | | | | | | | | | asic_family was a duplicate of asic_type, both of type amd_asic_type. Replace all instances of device_info->asic_family with adev->asic_type and remove asic_family from device_info. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: convert misc checks to IP version checkingGraham Sider2021-11-171-1/+1
| | | | | | | | | | Switch to IP version checking instead of asic_type on various KFD version checks. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: convert switches to IP version checkingGraham Sider2021-11-171-60/+64
| | | | | | | | | | Converts KFD switch statements to use IP version checking instead of asic_type. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: replace/remove remaining kgd_dev referencesGraham Sider2021-11-171-4/+2
| | | | | | | | | Remove get_amdgpu_device and other remaining kgd_dev references aside from declaration/kfd struct entry and initialization. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: replace kgd_dev in get amdgpu_amdkfd funcsGraham Sider2021-11-171-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Modified definitions: - amdgpu_amdkfd_get_fw_version - amdgpu_amdkfd_get_local_mem_info - amdgpu_amdkfd_get_gpu_clock_counter - amdgpu_amdkfd_get_max_engine_clock_in_mhz - amdgpu_amdkfd_get_cu_info - amdgpu_amdkfd_get_dmabuf_info - amdgpu_amdkfd_get_vram_usage - amdgpu_amdkfd_get_hive_id - amdgpu_amdkfd_get_unique_id - amdgpu_amdkfd_get_mmio_remap_phys_addr - amdgpu_amdkfd_get_num_gws - amdgpu_amdkfd_get_asic_rev_id - amdgpu_amdkfd_get_noretry - amdgpu_amdkfd_get_xgmi_hops_count - amdgpu_amdkfd_get_xgmi_bandwidth_mbytes - amdgpu_amdkfd_get_pcie_bandwidth_mbytes Also replaces kfd_device_by_kgd with kfd_device_by_adev, now searching via adev rather than kgd. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: enable cyan_skillfish KFDTao Zhou2021-07-231-0/+1
| | | | | | | | | Add KFD support for cyan_skillfish. v2: whitespace fixes (Alex) Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: report pcie bandwidth to the kfdJonathan Kim2021-07-231-0/+4
| | | | | | | | | Similar to xGMI reporting the min/max bandwidth between direct peers, PCIe will report the min/max bandwidth to the KFD. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: report xgmi bandwidth between direct peers to the kfdJonathan Kim2021-07-231-0/+12
| | | | | | | | | | Report the min/max bandwidth in megabytes to the kfd for direct xgmi connections only. Indirect peers will report 0 since indirect route is unknown. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: add yellow carp KFD supportAaron Liu2021-06-041-0/+52
| | | | | | | | | This patch is to add GFX10 based Yellow Carp KFD support. We will bypass IOMMU v2. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: support beige_goby KFDChengming Gui2021-05-191-0/+61
| | | | | | | | | | | | Add KFD support for beige_goby v2: fix asic name typo v3: squash in updates (Alex) v4: squash in needs_atomics fix (Alex) Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: add ACPI SRAT parsing for topologyEric Huang2021-05-101-0/+91
| | | | | | | | | | In NPS4 BIOS we need to find the closest numa node when creating topology io link between cpu and gpu, if PCI driver doesn't set it. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: Update L1 and add L2/3 cache informationMike Li2021-05-101-50/+699
| | | | | | | | | | The L1 cache information has been updated and the L2/L3 information has been added. The changes have been made for Vega10 and newer ASICs. There are no changes for the older ASICs before Vega10. Signed-off-by: Mike Li <Tianxinmike.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdkfd: report the numa weight between host and device over xgmiJonathan Kim2021-05-101-0/+1
| | | | | | | | | | GPUs connected to CPUs over xGMI are bidirectional so set weight by a single hop both ways. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>