path: root/drivers/iommu
Commit message | Author | Age | Files | Lines
* iommu: Don't reserve 0-length IOVA region (Ashish Mhetre, 2024-01-25, 1 file, -0/+4)
  [ Upstream commit bb57f6705960bebeb832142ce9abf43220c3eab1 ]

  When the bootloader/firmware doesn't set up the framebuffers, their
  address and size are 0 in the "iommu-addresses" property. If an IOVA
  region is reserved with 0 length, it ends up corrupting the IOVA rbtree
  with an entry which has pfn_hi < pfn_lo. If we intend to use the display
  driver in the kernel without a framebuffer, the display IOMMU mappings
  fail because the entire valid IOVA space is reserved when address and
  length are passed as 0.

  An ideal solution would be the firmware removing the "iommu-addresses"
  property and the corresponding "memory-region" if the display is not
  present. But the kernel should be able to handle this by checking the
  size of the IOVA region and skipping the IOVA reservation if the size
  is 0. Also, add a warning if firmware requests a 0-length IOVA region
  reservation.

  Fixes: a5bf3cfce8cb ("iommu: Implement of_iommu_get_resv_regions()")
  Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
  Acked-by: Robin Murphy <robin.murphy@arm.com>
  Link: https://lore.kernel.org/r/20231205065656.9544-1-amhetre@nvidia.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
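  A minimal sketch of the described guard, as it might sit in the
  reservation loop of of_iommu_get_resv_regions() (reconstructed for
  illustration; the surrounding loop and variable names are assumptions,
  not the verbatim patch):

      /* Skip (and warn about) 0-length reservations instead of
       * inserting a pfn_hi < pfn_lo entry into the IOVA rbtree. */
      if (!length) {
              dev_warn(dev, "Cannot reserve IOVA region of 0 size\n");
              continue;
      }

      region = iommu_alloc_resv_region(iova, length, prot,
                                       IOMMU_RESV_DIRECT, GFP_KERNEL);
      if (region)
              list_add_tail(&region->list, list);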
* iommu: Map reserved memory as cacheable if device is coherent (Laurentiu Tudor, 2024-01-25, 1 file, -0/+3)
  [ Upstream commit f1aad9df93f39267e890836a28d22511f23474e1 ]

  Check if the device is marked as DMA coherent in the DT and if so, map
  its reserved memory as cacheable in the IOMMU. This fixes the recently
  added IOMMU reserved memory support which uses IOMMU_RESV_DIRECT without
  properly building the PROT for the mapping.

  Fixes: a5bf3cfce8cb ("iommu: Implement of_iommu_get_resv_regions()")
  Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
  Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
  Acked-by: Thierry Reding <treding@nvidia.com>
  Link: https://lore.kernel.org/r/20230926152600.8749-1-laurentiu.tudor@nxp.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
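  A hedged sketch of the idea (illustrative context; the exact hunk in
  of_iommu_get_resv_regions() may differ):

      int prot = IOMMU_READ | IOMMU_WRITE;

      /* A DMA-coherent device gets its direct mapping as cacheable. */
      if (of_dma_is_coherent(dev->of_node))
              prot |= IOMMU_CACHE;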
* iommu/dma: Trace bounce buffer usage when mapping buffers (Isaac J. Manjarres, 2024-01-25, 1 file, -0/+3)
  commit a63c357b9fd56ad5fe64616f5b22835252c6a76a upstream.

  When commit 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce
  buffers") was introduced, it did not add the logic for tracing the
  bounce buffer usage from iommu_dma_map_page().

  All of the users of swiotlb_tbl_map_single() trace their bounce buffer
  usage, except iommu_dma_map_page(). This makes it difficult to track
  SWIOTLB usage from that function.

  Thus, trace bounce buffer usage from iommu_dma_map_page().

  Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
  Cc: stable@vger.kernel.org # v5.15+
  Cc: Tom Murphy <murphyt7@tcd.ie>
  Cc: Lu Baolu <baolu.lu@linux.intel.com>
  Cc: Saravana Kannan <saravanak@google.com>
  Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
  Link: https://lore.kernel.org/r/20231208234141.2356157-1-isaacmanjarres@google.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
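  The change amounts to emitting the existing swiotlb tracepoint before
  bouncing; a sketch with the placement approximate and the surrounding
  code elided:

      #include <trace/events/swiotlb.h>

      /* In iommu_dma_map_page(), on the bounce-buffer path: */
      trace_swiotlb_bounced(dev, phys, size);
      /* ...followed by the existing swiotlb_tbl_map_single() call. */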
* iommu/arm-smmu-qcom: Add missing GMU entry to match table (Rob Clark, 2024-01-25, 1 file, -0/+1)
  commit afc95681c3068956fed1241a1ff1612c066c75ac upstream.

  In some cases the firmware expects cbndx 1 to be assigned to the GMU, so
  we also want the default domain for the GMU to be an identity domain.
  This way it does not get a context bank assigned. Without this, both
  of_dma_configure() and drm/msm's iommu_domain_attach() will trigger
  allocating and configuring a context bank. So the GMU ends up attached
  to both cbndx 1 and later cbndx 2. This arrangement seemingly confounds
  and surprises the firmware if the GPU later triggers a translation
  fault, resulting (on sc8280xp / Lenovo x13s, at least) in the SMMU
  getting wedged and the GPU stuck without memory access.

  Cc: stable@vger.kernel.org
  Signed-off-by: Rob Clark <robdclark@chromium.org>
  Tested-by: Johan Hovold <johan+linaro@kernel.org>
  Reviewed-by: Robin Murphy <robin.murphy@arm.com>
  Link: https://lore.kernel.org/r/20231210180655.75542-1-robdclark@gmail.com
  Signed-off-by: Will Deacon <will@kernel.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
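  The fix is a one-line match-table addition; a hedged reconstruction
  (table name and neighbouring entries assumed from the driver):

      static const struct of_device_id qcom_smmu_client_of_match[] = {
              { .compatible = "qcom,adreno" },
              { .compatible = "qcom,adreno-gmu" },    /* the added entry */
              /* ... */
              { }
      };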
* iommu/vt-d: Support enforce_cache_coherency only for empty domains (Lu Baolu, 2024-01-10, 2 files, -1/+7)
  [ Upstream commit e645c20e8e9cde549bc233435d3c1338e1cd27fe ]

  The enforce_cache_coherency callback ensures DMA cache coherency for
  devices attached to the domain. Intel IOMMU supports enforced DMA cache
  coherency when the Snoop Control bit in the IOMMU's extended capability
  register is set. Supporting it differs between legacy and scalable
  modes. In legacy mode, it's supported at page level by setting the SNP
  field in second-stage page-table entries. In scalable mode, it's
  supported at PASID-table granularity by setting the PGSNP field in
  PASID-table entries.

  In legacy mode, mappings created before attaching to a device have SNP
  fields cleared, while mappings created after the callback have them set.
  This means some DMAs are cache coherent while others are not. One
  possible fix is replaying mappings and flipping SNP bits when attaching
  a domain to a device. But this seems to be over-engineered, given that
  all real use cases just attach an empty domain to a device.

  To meet practical needs while reducing mode differences, only support
  enforce_cache_coherency on a domain without mappings if the SNP field
  is used.

  Fixes: fc0051cb9590 ("iommu/vt-d: Check domain force_snooping against attached devices")
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Reviewed-by: Kevin Tian <kevin.tian@intel.com>
  Link: https://lore.kernel.org/r/20231114011036.70142-1-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
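  A hedged sketch of the new rejection condition in the Intel
  enforce_cache_coherency callback (field names per the driver, exact
  locking elided; treat this as approximate):

      /* Refuse enforcement on a legacy-mode (second-stage) domain that
       * already has mappings: those PTEs were written with SNP clear. */
      if (!domain_support_force_snooping(dmar_domain) ||
          (!dmar_domain->use_first_level && dmar_domain->has_mappings))
              return false;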
* iommu: Fix printk arg in of_iommu_get_resv_regions() (Daniel Mentz, 2023-12-08, 1 file, -1/+1)
  [ Upstream commit c2183b3dcc9dd41b768569ea88bededa58cceebb ]

  The variable phys is defined as (struct resource *) which aligns with
  the printk format specifier %pr. Taking the address of it results in a
  value of type (struct resource **) which is incompatible with the
  format specifier %pr. Therefore, remove the address-of operator (&).

  Fixes: a5bf3cfce8cb ("iommu: Implement of_iommu_get_resv_regions()")
  Signed-off-by: Daniel Mentz <danielmentz@google.com>
  Acked-by: Thierry Reding <treding@nvidia.com>
  Link: https://lore.kernel.org/r/20231108062226.928985-1-danielmentz@google.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
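  The mistake is an extra address-of operator; schematically (the message
  text here is illustrative, not the driver's actual format string):

      struct resource *phys;

      dev_warn(dev, "region %pr ...\n", &phys); /* wrong: struct resource ** */
      dev_warn(dev, "region %pr ...\n", phys);  /* fixed: %pr takes struct resource * */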
* iommu/vt-d: Make context clearing consistent with context mapping (Lu Baolu, 2023-12-08, 1 file, -2/+2)
  [ Upstream commit 9a16ab9d640274b20813d2d17475e18d3e99d834 ]

  In the iommu probe_device path, domain_context_mapping() allows setting
  up the context entry for a non-PCI device. However, in the iommu
  release_device path, domain_context_clear() only clears context entries
  for PCI devices.

  Make domain_context_clear() behave consistently with
  domain_context_mapping() by clearing context entries for both PCI and
  non-PCI devices.

  Fixes: 579305f75d34 ("iommu/vt-d: Update to use PCI DMA aliases")
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Reviewed-by: Kevin Tian <kevin.tian@intel.com>
  Link: https://lore.kernel.org/r/20231114011036.70142-4-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
* iommu/vt-d: Disable PCI ATS in legacy passthrough mode (Lu Baolu, 2023-12-08, 1 file, -1/+2)
  [ Upstream commit da37dddcf4caf015c400a930301d2ee27a7a15fb ]

  When IOMMU hardware operates in legacy mode, the TT field of the context
  entry determines the translation type, with three supported types
  (Section 9.3 Context Entry):

  - DMA translation without device TLB support
  - DMA translation with device TLB support
  - Passthrough mode with translated and translation requests blocked

  Device TLB support is absent when hardware is configured in passthrough
  mode.

  Disable the PCI ATS feature when IOMMU is configured for passthrough
  translation type in legacy (non-scalable) mode.

  Fixes: 0faa19a1515f ("iommu/vt-d: Decouple PASID & PRI enabling from SVA")
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Reviewed-by: Kevin Tian <kevin.tian@intel.com>
  Link: https://lore.kernel.org/r/20231114011036.70142-3-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
* iommu/vt-d: Omit devTLB invalidation requests when TES=0 (Lu Baolu, 2023-12-08, 1 file, -0/+18)
  [ Upstream commit 0f5432a9b839847dcfe9fa369d72e3d646102ddf ]

  The latest VT-d spec indicates that when remapping hardware is disabled
  (TES=0 in Global Status Register), upstream ATS Invalidation Completion
  requests are treated as UR (Unsupported Request).

  Consequently, the spec recommends in section 4.3 Handling of Device-TLB
  Invalidations that software refrain from submitting any Device-TLB
  invalidation requests when address remapping hardware is disabled.

  Verify address remapping hardware is enabled prior to submitting
  Device-TLB invalidation requests.

  Fixes: 792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default")
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Reviewed-by: Kevin Tian <kevin.tian@intel.com>
  Link: https://lore.kernel.org/r/20231114011036.70142-2-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
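  The added guard is small; a reconstruction matching the description
  (DMA_GCMD_TE is the driver's cached TE bit, and placement in the
  invalidation submission path is an assumption):

      /*
       * Per the spec, ATS invalidation completions are treated as UR
       * when translation is off, so submit no device-TLB invalidation
       * while TES=0.
       */
      if (!(iommu->gcmd & DMA_GCMD_TE))
              return;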
* iommu/vt-d: Add MTL to quirk list to skip TE disabling (Abdul Halim, Mohd Syazwan, 2023-12-08, 1 file, -1/+1)
  commit 85b80fdffa867d75dfb9084a839e7949e29064e8 upstream.

  The VT-d spec requires (10.4.4 Global Command Register, TE field) that:
  Hardware implementations supporting DMA draining must drain any in-flight
  DMA read/write requests queued within the Root-Complex before switching
  address translation on or off and reflecting the status of the command
  through the TES field in the Global Status register.

  Unfortunately, some integrated graphics devices fail to do so after some
  kind of power state transition. As a result, the system might get stuck
  in iommu_disable_translation(), waiting for the completion of the TE
  transition. Add MTL to the quirk list for those devices and skip TE
  disabling if the quirk hits.

  Fixes: b1012ca8dc4f ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu")
  Cc: stable@vger.kernel.org
  Signed-off-by: Abdul Halim, Mohd Syazwan <mohd.syazwan.abdul.halim@intel.com>
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Link: https://lore.kernel.org/r/20231116022324.30120-1-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
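  An approximate reconstruction of the quirk (the device-ID-byte list is
  from the existing quirk; treat the exact values as illustrative):

      static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
      {
              u16 ver;

              if (!IS_GFX_DEVICE(dev))
                      return;

              ver = (dev->device >> 8) & 0xff;
              if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
                  ver != 0x4e && ver != 0x8a && ver != 0x98 &&
                  ver != 0x9a && ver != 0xa7 && ver != 0x7d) /* 0x7d: MTL */
                      return;

              iommu_skip_te_disable = 1; /* flag consulted by
                                            iommu_disable_translation() */
      }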
* iommu: Avoid more races around device probe (Robin Murphy, 2023-12-08, 2 files, -13/+19)
  commit a2e7e59a94269484a83386972ca07c22fd188854 upstream.

  It turns out there are more subtle races beyond just the main part of
  __iommu_probe_device() itself running in parallel - the dev_iommu_free()
  on the way out of an unsuccessful probe can still manage to trip up
  concurrent accesses to a device's fwspec. Thus, extend the scope of
  iommu_probe_device_lock() to also serialise fwspec creation and initial
  retrieval.

  Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
  Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/
  Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
  Signed-off-by: Robin Murphy <robin.murphy@arm.com>
  Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  Reviewed-by: André Draszik <andre.draszik@linaro.org>
  Tested-by: André Draszik <andre.draszik@linaro.org>
  Link: https://lore.kernel.org/r/16f433658661d7cadfea51e7c65da95826112a2b.1700071477.git.robin.murphy@arm.com
  Cc: stable@vger.kernel.org
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* iommu/vt-d: Fix incorrect cache invalidation for mm notification (Lu Baolu, 2023-12-08, 1 file, -0/+26)
  commit e7ad6c2a4b1aa710db94060b716f53c812cef565 upstream.

  Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
  invalidating TLBs") moved the secondary TLB invalidations into the TLB
  invalidation functions to ensure that all secondary TLB invalidations
  happen at the same time as the CPU invalidation and added a flush-all
  type of secondary TLB invalidation for the batched mode, where a range
  of [0, -1UL) is used to indicate that the range extends to the end of
  the address space.

  However, using an end address of -1UL caused an overflow in the Intel
  IOMMU driver, where the end address was rounded up to the next page. As
  a result, both the IOTLB and device ATC were not invalidated correctly.

  Add a flush-all helper function and call it when the invalidation range
  is from 0 to -1UL, ensuring that the caches are invalidated in their
  entirety.

  Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
  Cc: stable@vger.kernel.org
  Cc: Huang Ying <ying.huang@intel.com>
  Cc: Alistair Popple <apopple@nvidia.com>
  Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT
  Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA
  Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
  Reviewed-by: Alistair Popple <apopple@nvidia.com>
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Link: https://lore.kernel.org/r/20231117090933.75267-1-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
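  Schematically, the fix short-circuits the sentinel range before any
  page-count arithmetic can overflow (the flush-all helper name is an
  assumption based on the description):

      if (start == 0 && end == -1UL) {
              /* Batched-mode flush-all: invalidate the IOTLB and the
               * device ATC in their entirety instead of rounding -1UL
               * up to the next page. */
              intel_flush_svm_all(svm);
              return;
      }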
* iommufd: Fix missing update of domains_itree after splitting iopt_area (Koichiro Den, 2023-11-28, 1 file, -0/+10)
  commit e7250ab7ca4998fe026f2149805b03e09dc32498 upstream.

  In iopt_area_split(), if the original iopt_area has filled a domain and
  is linked to domains_itree, pages_nodes have to be properly reinserted.
  Otherwise the domains_itree becomes corrupted and we will UAF.

  Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping")
  Link: https://lore.kernel.org/r/20231027162941.2864615-2-den@valinux.co.jp
  Cc: stable@vger.kernel.org
  Signed-off-by: Koichiro Den <den@valinux.co.jp>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
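  A hedged sketch of what "properly reinserted" means for the interval
  tree (field and variable names assumed from the iommufd data structures,
  not the verbatim patch):

      /* Each half of the split must carry its own node in domains_itree;
       * leaving a stale node linked corrupts the tree and leads to UAF. */
      interval_tree_remove(&area->pages_node, &pages->domains_itree);
      /* ... split the area into lhs/rhs ... */
      interval_tree_insert(&lhs->pages_node, &pages->domains_itree);
      interval_tree_insert(&rhs->pages_node, &pages->domains_itree);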
* iommufd: Add iopt_area_alloc() (Jason Gunthorpe, 2023-11-20, 2 files, -3/+17)
  [ Upstream commit 361d744ddd61de065fbeb042aaed590d32dd92ec ]

  We never initialize the two interval tree nodes, and zero fill is not
  the same as RB_CLEAR_NODE. This can hide issues where we missed adding
  the area to the trees. Factor out the allocation and clear the two
  nodes.

  Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping")
  Link: https://lore.kernel.org/r/20231030145035.GG691768@ziepe.ca
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  Signed-off-by: Sasha Levin <sashal@kernel.org>
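  The helper is small enough to quote in close-to-upstream form:

      static struct iopt_area *iopt_area_alloc(void)
      {
              struct iopt_area *area;

              area = kzalloc(sizeof(*area), GFP_KERNEL_ACCOUNT);
              if (!area)
                      return NULL;
              /* Zero fill is not the same as RB_CLEAR_NODE(): an all-zero
               * rb node does not read as "unlinked". */
              RB_CLEAR_NODE(&area->node.rb);
              RB_CLEAR_NODE(&area->pages_node.rb);
              return area;
      }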
* iommu: Avoid unnecessary cache invalidations (Lu Baolu, 2023-10-27, 1 file, -1/+2)
  The iommu_create_device_direct_mappings() only needs to flush the caches
  when the mappings are changed in the affected domain. This is not true
  for non-DMA domains, or for devices attached to the domain that have no
  reserved regions. To avoid unnecessary cache invalidations, add a check
  before iommu_flush_iotlb_all().

  Fixes: a48ce36e2786 ("iommu: Prevent RESV_DIRECT devices from blocking domains")
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Tested-by: Henry Willard <henry.willard@oracle.com>
  Link: https://lore.kernel.org/r/20231026084942.17387-1-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
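  A minimal sketch of the added condition (the flag name is illustrative):

      bool changed = false;

      /* ... set changed = true whenever a direct mapping is created ... */

      if (changed)
              iommu_flush_iotlb_all(domain);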
* iommu/vt-d: Avoid memory allocation in iommu_suspend() (Zhang Rui, 2023-09-25, 2 files, -17/+1)
  The iommu_suspend() syscore suspend callback is invoked with IRQ
  disabled. Allocating memory with the GFP_KERNEL flag may re-enable IRQs
  during the suspend callback, which can cause intermittent
  suspend/hibernation problems with the following kernel traces:

      Calling iommu_suspend+0x0/0x1d0
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0
      ...
      CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G U E 6.3-intel #r1
      RIP: 0010:ktime_get+0x9b/0xb0
      ...
      Call Trace:
       <IRQ>
       tick_sched_timer+0x22/0x90
       ? __pfx_tick_sched_timer+0x10/0x10
       __hrtimer_run_queues+0x111/0x2b0
       hrtimer_interrupt+0xfa/0x230
       __sysvec_apic_timer_interrupt+0x63/0x140
       sysvec_apic_timer_interrupt+0x7b/0xa0
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x1f/0x30
      ...
      ------------[ cut here ]------------
      Interrupts enabled after iommu_suspend+0x0/0x1d0
      WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270
      CPU: 0 PID: 27420 Comm: rtcwake Tainted: G U W E 6.3-intel #r1
      RIP: 0010:syscore_suspend+0x147/0x270
      ...
      Call Trace:
       <TASK>
       hibernation_snapshot+0x25b/0x670
       hibernate+0xcd/0x390
       state_store+0xcf/0xe0
       kobj_attr_store+0x13/0x30
       sysfs_kf_write+0x3f/0x50
       kernfs_fop_write_iter+0x128/0x200
       vfs_write+0x1fd/0x3c0
       ksys_write+0x6f/0xf0
       __x64_sys_write+0x1d/0x30
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

  Given that only 4 words of memory are needed, avoid the memory
  allocation in iommu_suspend().

  CC: stable@kernel.org
  Fixes: 33e07157105e ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed")
  Signed-off-by: Zhang Rui <rui.zhang@intel.com>
  Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com>
  Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com
  Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
  Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
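  The shape of the fix, per the description above (the struct placement is
  an assumption; MAX_SR_DMAR_REGS is the existing count of registers saved
  across suspend):

      struct intel_iommu {
              /* ... */
              u32 iommu_state[MAX_SR_DMAR_REGS]; /* fixed save area, so the
                                                    syscore suspend path
                                                    never calls kcalloc() */
      };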
* iommu/apple-dart: Handle DMA_FQ domains in attach_dev() (Hector Martin, 2023-09-25, 1 file, -2/+1)
  Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the
  IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was
  introduced in iommu_dma_init_domain() to fall back if not supported, but
  this check runs too late: by that point, devices have been attached to
  the IOMMU, and apple-dart's attach_dev() callback does not expect
  IOMMU_DOMAIN_DMA_FQ domains.

  Change the logic so the IOMMU_DOMAIN_DMA codepath is the default,
  instead of explicitly enumerating all types.

  Fixes an apple-dart regression in v6.5.

  Cc: regressions@lists.linux.dev
  Cc: stable@vger.kernel.org
  Suggested-by: Robin Murphy <robin.murphy@arm.com>
  Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
  Signed-off-by: Hector Martin <marcan@marcan.st>
  Reviewed-by: Neal Gompa <neal@gompa.dev>
  Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
  Link: https://lore.kernel.org/r/20230922-iommu-type-regression-v2-1-689b2ba9b673@marcan.st
  Signed-off-by: Joerg Roedel <jroedel@suse.de>
* Merge tag 'arm-smmu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into iommu/fixes (Joerg Roedel, 2023-09-25, 2 files, -7/+26)
|\
|   Arm SMMU fixes for 6.6 -rc
|
|   - Fix TLB range command encoding when TTL, Num and Scale are all zero
|   - Fix soft lockup by limiting TLB invalidation ops issued by SVA
|   - Fix clocks description for SDM630 platform in arm-smmu DT binding
| * iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range (Nicolin Chen, 2023-09-22, 1 file, -2/+16)
|   When running an SVA case, the following soft lockup is triggered:

      watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
      pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
      pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
      lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
      sp : ffff8000d83ef290
      x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
      x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
      x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
      x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
      x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
      x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
      x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
      x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
      x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
      x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
      Call trace:
       arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
       __arm_smmu_tlb_inv_range+0x118/0x254
       arm_smmu_tlb_inv_range_asid+0x6c/0x130
       arm_smmu_mm_invalidate_range+0xa0/0xa4
       __mmu_notifier_invalidate_range_end+0x88/0x120
       unmap_vmas+0x194/0x1e0
       unmap_region+0xb4/0x144
       do_mas_align_munmap+0x290/0x490
       do_mas_munmap+0xbc/0x124
       __vm_munmap+0xa8/0x19c
       __arm64_sys_munmap+0x28/0x50
       invoke_syscall+0x78/0x11c
       el0_svc_common.constprop.0+0x58/0x1c0
       do_el0_svc+0x34/0x60
       el0_svc+0x2c/0xd4
       el0t_64_sync_handler+0x114/0x140
       el0t_64_sync+0x1a4/0x1a8

    Note that since 6.6-rc1 the arm_smmu_mm_invalidate_range above is
    renamed to "arm_smmu_mm_arch_invalidate_secondary_tlbs", yet the
    problem remains.

    The commit 06ff87bae8d3 ("arm64: mm: remove unused functions and
    variable protoypes") fixed a similar lockup on the CPU MMU side. Yet,
    it can occur on the SMMU side too, since
    arm_smmu_mm_arch_invalidate_secondary_tlbs() is typically called next
    to the MMU tlb flush function, e.g.

      tlb_flush_mmu_tlbonly {
          tlb_flush {
              __flush_tlb_range {
                  // check MAX_TLBI_OPS
              }
          }
          mmu_notifier_arch_invalidate_secondary_tlbs {
              arm_smmu_mm_arch_invalidate_secondary_tlbs {
                  // does not check MAX_TLBI_OPS
              }
          }
      }

    Clone a CMDQ_MAX_TLBI_OPS from the MAX_TLBI_OPS in tlbflush.h, since in
    an SVA case the SMMU uses the CPU page table, so it makes sense to
    align with the tlbflush code. Then, replace per-page TLBI commands with
    a single per-asid TLBI command if the request size hits this threshold.

    Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
    Link: https://lore.kernel.org/r/20230920052257.8615-1-nicolinc@nvidia.com
    Signed-off-by: Will Deacon <will@kernel.org>
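    The resulting logic, close to the patch as described (the threshold
    macro mirrors arch/arm64's MAX_TLBI_OPS; names per the SMMUv3 SVA code,
    exact call signatures approximate):

      #define CMDQ_MAX_TLBI_OPS  (1 << (PAGE_SHIFT - 3))

      if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
          size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
              /* too many per-page ops: one per-ASID invalidation instead */
              arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
      else
              arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
                                          PAGE_SIZE, false, smmu_domain);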
| * iommu/arm-smmu-v3: Avoid constructing invalid range commands (Robin Murphy, 2023-09-18, 1 file, -5/+10)
|   Although io-pgtable's non-leaf invalidations are always for full
    tables, I missed that SVA also uses non-leaf invalidations, while being
    at the mercy of whatever range the MMU notifier throws at it. This
    means it definitely wants the previous TTL fix as well, since it also
    doesn't know exactly which leaf level(s) may need invalidating, but it
    can also give us less-aligned ranges wherein certain corners may lead
    to building an invalid command where TTL, Num and Scale are all 0. It
    should be fine to handle this by over-invalidating an extra page, since
    falling back to a non-range command opens up a whole can of
    errata-flavoured worms.

    Fixes: 6833b8f2e199 ("iommu/arm-smmu-v3: Set TTL invalidation hint better")
    Reported-by: Rui Zhu <zhurui3@huawei.com>
    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Link: https://lore.kernel.org/r/b99cfe71af2bd93a8a2930f20967fb2a4f7748dd.1694432734.git.robin.murphy@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
* | iommu/mediatek: Fix share pgtable for iova over 4GB (Yong Wu, 2023-09-25, 1 file, -5/+4)
|/
    In mt8192/mt8186, there is only one MM IOMMU that supports a 16GB iova
    space, which is shared by display, vcodec and camera. These two SoCs
    use one pgtable but do not have the SHARE_PGTABLE flag; we should also
    keep sharing the pgtable in this case.

    In mtk_iommu_domain_finalise, the MM IOMMU always shares a pgtable,
    thus remove the SHARE_PGTABLE flag check. The infra IOMMU always uses
    an independent pgtable.

    Fixes: cf69ef46dbd9 ("iommu/mediatek: Fix two IOMMU share pagetable issue")
    Reported-by: Laura Nao <laura.nao@collabora.com>
    Closes: https://lore.kernel.org/linux-iommu/20230818154156.314742-1-laura.nao@collabora.com/
    Signed-off-by: Yong Wu <yong.wu@mediatek.com>
    Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
    Tested-by: Laura Nao <laura.nao@collabora.com>
    Link: https://lore.kernel.org/r/20230819081443.8333-1-yong.wu@mediatek.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
* Merge tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu (Linus Torvalds, 2023-09-01, 30 files, -624/+866)
|\
|   Pull iommu updates from Joerg Roedel:

    "Core changes:
      - Consolidate probe_device path
      - Make the PCI-SAC IOVA allocation trick PCI-only

     AMD IOMMU:
      - Consolidate PPR log handling
      - Interrupt handling improvements
      - Refcount fixes for amd_iommu_v2 driver

     Intel VT-d driver:
      - Enable idxd device DMA with pasid through iommu dma ops
      - Lift RESV_DIRECT check from VT-d driver to core
      - Miscellaneous cleanups and fixes

     ARM-SMMU drivers:
      - Device-tree binding updates:
         - Add additional compatible strings for Qualcomm SoCs
         - Allow ASIDs to be configured in the DT to work around Qualcomm's
           broken hypervisor
         - Fix clocks for Qualcomm's MSM8998 SoC
      - SMMUv2:
         - Support for Qualcomm's legacy firmware implementation featured
           on at least MSM8956 and MSM8976
         - Match compatible strings for Qualcomm SM6350 and SM6375 SoC
           variants
      - SMMUv3:
         - Use 'ida' instead of a bitmap for VMID allocation
      - Rockchip IOMMU:
         - Lift page-table allocation restrictions on newer hardware
      - Mediatek IOMMU:
         - Add MT8188 IOMMU Support
      - Renesas IOMMU:
         - Allow PCIe devices

     ... and the usual set of cleanups and smaller fixes"

    * tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (64 commits)
      iommu: Explicitly include correct DT includes
      iommu/amd: Remove unused declarations
      iommu/arm-smmu-qcom: Add SM6375 SMMUv2
      iommu/arm-smmu-qcom: Add SM6350 DPU compatible
      iommu/arm-smmu-qcom: Add SM6375 DPU compatible
      iommu/arm-smmu-qcom: Sort the compatible list alphabetically
      dt-bindings: arm-smmu: Fix MSM8998 clocks description
      iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope()
      iommu/vt-d: Fix to convert mm pfn to dma pfn
      iommu/vt-d: Fix to flush cache of PASID directory table
      iommu/vt-d: Remove rmrr check in domain attaching device path
      iommu: Prevent RESV_DIRECT devices from blocking domains
      dmaengine/idxd: Re-enable kernel workqueue under DMA API
      iommu/vt-d: Add set_dev_pasid callback for dma domain
      iommu/vt-d: Prepare for set_dev_pasid callback
      iommu/vt-d: Make prq draining code generic
      iommu/vt-d: Remove pasid_mutex
      iommu/vt-d: Add domain_flush_pasid_iotlb()
      iommu: Move global PASID allocation from SVA to core
      iommu: Generalize PASID 0 for normal DMA w/o PASID
      ...
| *---------------. Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next (Joerg Roedel, 2023-08-21, 30 files, -624/+866)
| |\ \ \ \ \ \ \ \ \
| | | | | | | | | | * iommu: Explicitly include correct DT includes (Rob Herring, 2023-08-21, 7 files, -7/+5)
    The DT of_device.h and of_platform.h headers date back to the separate
    of_platform_bus_type before it was merged into the regular platform
    bus. As part of that merge prepping Arm DT support 13 years ago, they
    "temporarily" include each other. They also include platform_device.h
    and of.h. As a result, there's a pretty much random mix of those
    include files used throughout the tree. In order to detangle these
    headers and replace the implicit includes with struct declarations,
    users need to explicitly include the correct includes.

    Signed-off-by: Rob Herring <robh@kernel.org>
    Acked-by: Robin Murphy <robin.murphy@arm.com>
    Link: https://lore.kernel.org/r/20230714174640.4058404-1-robh@kernel.org
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
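    Illustratively, the treewide pattern replaces reliance on of_device.h's
    implicit includes with the headers a file actually uses:

      #include <linux/of.h>               /* for the of_* helpers used */
      #include <linux/platform_device.h>  /* no longer pulled in implicitly */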
| | | | | | | | | | * iommu: Remove kernel-doc warnings (Zhu Wang, 2023-08-07, 1 file, -2/+2)
    Remove kernel-doc warnings:

      drivers/iommu/iommu.c:3261: warning: Function parameter or member
        'group' not described in 'iommu_group_release_dma_owner'
      drivers/iommu/iommu.c:3261: warning: Excess function parameter 'dev'
        description in 'iommu_group_release_dma_owner'
      drivers/iommu/iommu.c:3275: warning: Function parameter or member
        'dev' not described in 'iommu_device_release_dma_owner'
      drivers/iommu/iommu.c:3275: warning: Excess function parameter
        'group' description in 'iommu_device_release_dma_owner'

    Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
    Fixes: 89395ccedbc1 ("iommu: Add device-centric DMA ownership interfaces")
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/20230731112758.214775-1-wangzhu9@huawei.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Optimise PCI SAC address trick (Robin Murphy, 2023-07-14, 3 files, -6/+31)
    Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
    PCI SAC address trick") and its subsequent revert, this mechanism no
    longer serves its original purpose, but now only works around broken
    hardware/drivers in a way that is unfortunately too impactful to
    remove.

    This does not, however, prevent us from solving the performance impact
    which that workaround has on large-scale systems that don't need it.
    Once the 32-bit IOVA space fills up and a workload starts allocating
    and freeing on both sides of the boundary, the opportunistic SAC
    allocation can then end up spending significant time hunting down
    scattered fragments of free 32-bit space, or just reestablishing
    max32_alloc_size. This can easily be exacerbated by a change in
    allocation pattern, such as by changing the network MTU, which can
    increase pressure on the 32-bit space by leaving a large quantity of
    cached IOVAs which are now the wrong size to be recycled, but also
    won't be freed since the non-opportunistic allocations can still be
    satisfied from the whole 64-bit space without triggering the reclaim
    path.

    However, in the context of a workaround where smaller DMA addresses
    aren't simply a preference but a necessity, if we get to that point at
    all then in fact it's already the endgame. The nature of the allocator
    is currently such that the first IOVA we give to a device after the
    32-bit space runs out will be the highest possible address for that
    device, ever. If that works, then great, we know we can optimise for
    speed by always allocating from the full range. And if it doesn't,
    then the worst has already happened and any brokenness is now showing,
    so there's little point in continuing to try to hide it.

    To that end, implement a flag to refine the SAC business into a
    per-device policy that can automatically get itself out of the way if
    and when it stops being useful.

    CC: Linus Torvalds <torvalds@linux-foundation.org>
    CC: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: John Garry <john.g.garry@oracle.com>
    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Tested-by: Vasant Hegde <vasant.hegde@amd.com>
    Tested-by: Jakub Kicinski <kuba@kernel.org>
    Link: https://lore.kernel.org/r/b8502b115b915d2a3fabde367e099e39106686c8.1681392791.git.robin.murphy@arm.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Avoid locking/unlocking for iommu_probe_device() (Jason Gunthorpe, 2023-07-14, 1 file, -43/+35)
    Remove the race where a hotplug of a device into an existing group will
    have the device installed in the group->devices, but not yet attached
    to the group's current domain.

    Move the group attachment logic from iommu_probe_device() and put it
    under the same mutex that updates the group->devices list so everything
    is atomic under the lock.

    We retain the two step setup of the default domain for the
    bus_iommu_probe() case solely so that we have a more complete view of
    the group when creating the default domain for boot time devices. This
    is not generally necessary with the current code structure but seems to
    be supporting some odd corner cases like alias RID's and
    IOMMU_RESV_DIRECT or driver bugs returning different default_domain
    types for the same group.

    During bus_iommu_probe() the group will have a device list but both
    group->default_domain and group->domain will be NULL.

    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/10-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Split iommu_group_add_device() (Jason Gunthorpe, 2023-07-14, 1 file, -23/+43)
    Move the list_add_tail() for the group_device into the critical region
    that immediately follows in __iommu_probe_device(). This avoids one
    case of unlocking and immediately re-locking the group->mutex.

    Consistently make the caller responsible for setting dev->iommu_group;
    prior patches moved this into iommu_init_device(), so make the
    no-driver path do this in iommu_group_add_device().

    This completes making __iommu_group_free_device() and
    iommu_group_alloc_device() into paired functions.

    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/9-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Always destroy the iommu_group during iommu_release_device() (Jason Gunthorpe, 2023-07-14, 1 file, -6/+6)
    Have release fully clean up the iommu related parts of the struct
    device, no matter what state they are in. Split the logic so that the
    three things owned by the iommu core are always cleaned up:

      - Any attached iommu_group
      - Any allocated dev->iommu and its contents including a fwspec
      - Any attached driver via a struct group_device

    This fixes a minor bug where a fwspec created without an iommu_group
    being probed would not be freed.

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/8-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Do not export iommu_device_link/unlink() (Jason Gunthorpe, 2023-07-14, 1 file, -2/+0)
    These are not used outside iommu.c, so they should not be available to
    modular code.

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/7-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Move the iommu driver sysfs setup into iommu_init/deinit_device() (Jason Gunthorpe, 2023-07-14, 2 files, -10/+9)
    It makes logical sense that once the driver is attached to the device
    the sysfs links appear, even if we haven't fully created the
    group_device or attached the device to a domain.

    Fix the missing error handling on sysfs creation since
    iommu_init_device() can trivially handle this.

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/6-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Add iommu_init/deinit_device() paired functions (Jason Gunthorpe, 2023-07-14, 1 file, -79/+112)
    Move the driver init and destruction code into two logically paired
    functions.

    There is a subtle ordering dependency in how the group's domains are
    freed: the current code does the kobject_put() on the group which will
    hopefully trigger the free of the domains before the module_put() that
    protects the domain->ops.

    Reorganize this to be explicit and documented. The domains are cleaned
    up by iommu_deinit_device() if it is the last device to be deinit'd
    from the group. This must be done in a specific order - after
    ops->release_device() and before the module_put(). Make it very clear
    and obvious by putting the order directly in one function.

    Leave WARN_ON's in case the refcounting gets messed up somehow.

    This also moves the module_put() and dev_iommu_free() under the
    group->mutex to keep the code simple.

    Building paired functions like this helps ensure that error cleanup
    flows in __iommu_probe_device() are correct because they share the same
    code that handles the normal flow. These details become relevant as
    following patches add more error unwind into __iommu_probe_device(),
    and ultimately a following series adds fine-grained locking to
    __iommu_probe_device().

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Simplify the __iommu_group_remove_device() flow (Jason Gunthorpe, 2023-07-14, 1 file, -44/+39)
    Instead of returning the struct group_device and then later freeing it,
    do the entire free under the group->mutex and defer only putting the
    iommu_group. It is safe to remove the sysfs_links and free memory while
    holding that mutex.

    Move the sanity assert of the group status into
    __iommu_group_free_device().

    The next patch will improve upon this and consolidate the group put and
    the mutex into __iommu_group_remove_device().

    __iommu_group_free_device() is close to being the paired undo of
    iommu_group_add_device(); following patches will improve on that.

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/4-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Inline iommu_group_get_for_dev() into __iommu_probe_device() (Jason Gunthorpe, 2023-07-14, 1 file, -41/+9)
    This is the only caller, and it doesn't need the generality of the
    function. We already know there is no iommu_group, so it is simply two
    function calls.

    Moving it here allows the following patches to split the logic in these
    functions.

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/3-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Use iommu_group_ref_get/put() for dev->iommu_group (Jason Gunthorpe, 2023-07-14, 1 file, -3/+2)
    No reason to open code this; use the proper helper functions.

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/2-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Have __iommu_probe_device() check for already probed devices (Jason Gunthorpe, 2023-07-14, 3 files, -18/+10)
    This is a step toward making __iommu_probe_device() self contained.

    It should, under proper locking, check if the device is already
    associated with an iommu driver and resolve parallel probes. All but
    one of the callers open code this test using two different means, but
    they all rely on dev->iommu_group.

    Currently the bus_iommu_probe()/probe_iommu_group() and
    probe_acpi_namespace_devices() reject already probed devices with an
    unlocked read of dev->iommu_group. The OF and ACPI "replay" functions
    use device_iommu_mapped(), which is the same read without the
    pointless refcount.

    Move this test into __iommu_probe_device() and put it under the
    iommu_probe_device_lock. The store to dev->iommu_group is in
    iommu_group_add_device() which is also called under this lock for
    iommu driver devices, making it properly locked.

    The only path that didn't have this check is the hotplug path triggered
    by BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned
    outside the probe path is via iommu_group_add_device(). Today the only
    caller is VFIO no-iommu which never associates with an iommu driver.
    Thus adding this additional check is safe.

    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/1-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Remove unused declarations (Yue Haibing, 2023-08-17, 1 file, -3/+0)
    Commit aafd8ba0ca74 ("iommu/amd: Implement add_device and
    remove_device") removed the implementations but left the declarations
    in place. Remove them.

    Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
    Link: https://lore.kernel.org/r/20230814135502.4808-1-yuehaibing@huawei.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Rearrange DTE bit definitions (Vasant Hegde, 2023-08-08, 1 file, -4/+4)
    Rearrange according to the 64-bit word they are in. Note that I have
    not rearranged the gcr3-related macros even though they belong to a
    different 64-bit word, as it's easy to read them in the current format.

    No functional changes intended.

    Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Link: https://lore.kernel.org/r/20230619131908.5887-1-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Enable PPR/GA interrupt after interrupt handler setup (Vasant Hegde, 2023-07-14, 1 file, -4/+7)
    Current code enables PPR and GA interrupts before setting up the
    interrupt handler (in state_next()). Make sure the interrupt handler is
    in place before enabling these interrupts.

    amd_iommu_enable_interrupts() gets called in normal boot, kdump, as
    well as the suspend/resume path. Hence moving interrupt enablement to
    this function works fine.

    Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Link: https://lore.kernel.org/r/20230628054554.6131-4-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Consolidate PPR log enablement (Vasant Hegde, 2023-07-14, 1 file, -3/+3)
    Move the PPR log interrupt bit setting to iommu_enable_ppr_log(). Also
    rearrange iommu_enable_ppr_log() such that the PPREn bit is enabled
    before the PPRLog and PPRInt bits, so that when the PPRLog bit is set
    it will clear the PPRLogOverflow bit and set the PPRLogRun bit in the
    IOMMU Status Register [MMIO Offset 2020h].

    Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Link: https://lore.kernel.org/r/20230628054554.6131-3-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Disable PPR log/interrupt in iommu_disable() (Vasant Hegde, 2023-07-14, 1 file, -0/+4)
    Similar to other logs, disable PPR log/interrupt in the iommu_disable()
    path.

    Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Link: https://lore.kernel.org/r/20230628054554.6131-2-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Enable separate interrupt for PPR and GA log (Vasant Hegde, 2023-07-14, 2 files, -12/+47)
    The AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). These
    logs can be configured to generate different interrupts when an entry
    is inserted into a log buffer. However, the current implementation
    shares a single interrupt to handle all three logs.

    With increasing usage of the GA (for IOMMU AVIC) and PPR logs (for
    IOMMUv2 APIs and SVA), interrupt sharing could potentially become a
    performance bottleneck. Hence, split the IOMMU interrupt into three
    separate vectors and irq threads with corresponding names, which will
    be displayed in /proc/interrupts as "AMD-Vi<x>-[Evt/PPR/GA]", where "x"
    is an IOMMU id.

    Note that this patch changes interrupt handling only in IOMMU x2apic
    mode (MMIO 0x18[IntCapXTEn]=1). In legacy mode it will continue to use
    a single MSI interrupt.

    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Link: https://lore.kernel.org/r/20230628053222.5962-3-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Refactor IOMMU interrupt handling logic for Event, PPR, and GA logs (Vasant Hegde, 2023-07-14, 2 files, -43/+53)
    The AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). The
    IOMMU driver processes these log entries when it receives an IOMMU
    interrupt. Then, it needs to clear the corresponding interrupt status
    bits. Also, when an overflow occurs, it needs to handle the log
    overflow by clearing the specific overflow status bit and restarting
    the log.

    Since the logic for handling these logs is the same, refactor the code
    into a helper function called amd_iommu_handle_irq(), which handles the
    steps described. Then, reuse it for all types of log.

    Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Link: https://lore.kernel.org/r/20230628053222.5962-2-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Handle PPR log overflow (Vasant Hegde, 2023-07-14, 4 files, -1/+22)
    Some ATS-capable peripherals can issue requests to the processor to
    service peripheral page requests using PCIe PRI (the Page Request
    Interface). The IOMMU supports PRI using the PPR log buffer: the IOMMU
    writes PRI requests to the PPR log buffer and sends a PPR interrupt to
    the host. When there is no space in the PPR log buffer (PPR log
    overflow) it will set the PprOverflow bit in the 'MMIO Offset 2020h
    IOMMU Status Register'. When this happens, the PPR log needs to be
    restarted as specified in IOMMU spec [1] section 2.6.2. When handling
    the event, it just resumes the PPR log without resizing (similar to the
    way event and GA log overflows are handled).

    Failing to handle PPR overflow means the device may not work properly,
    as the IOMMU stops processing new PPR events from the device.

    [1] https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf

    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Link: https://lore.kernel.org/r/20230628051624.5792-3-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Generalize log overflow handling (Vasant Hegde, 2023-07-14, 3 files, -22/+36)
    Each IOMMU has three log buffers (Event, GA and PPR log). Once a buffer
    becomes full, the IOMMU generates an interrupt with the corresponding
    overflow status bit, and stops processing the log. To handle an
    overflow, the IOMMU driver needs to disable the log, clear the overflow
    status bit, and re-enable the log. This procedure is the same among all
    types of log buffer except that each uses a different overflow status
    bit and enabling bit.

    Hence, to consolidate the log buffer restarting logic, introduce a
    helper function amd_iommu_restart_log(), to which callers can pass the
    parameters specific to each type of log buffer. Also rename
    MMIO_STATUS_EVT_OVERFLOW_INT_MASK as MMIO_STATUS_EVT_OVERFLOW_MASK.

    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Link: https://lore.kernel.org/r/20230628051624.5792-2-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd/iommu_v2: Clear pasid state in free path (Vasant Hegde, 2023-07-14, 1 file, -0/+3)
    Clear the pasid state in the amd_iommu_free_device() path. This makes
    sure no new ppr notifier is registered in the free path.

    Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Link: https://lore.kernel.org/r/20230609105146.7773-3-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd/iommu_v2: Fix pasid_state refcount dec hit 0 warning on pasid unbind (Daniel Marcovitch, 2023-07-14, 1 file, -2/+2)
| | | | | | | | | |/
    When unbinding a pasid, a race condition exists vs outstanding page
    faults. To prevent this, the pasid_state object contains a refcount:

      * set to 1 on pasid bind
      * incremented on each ppr notification start
      * decremented on each ppr notification done
      * decremented on pasid unbind

    Since refcount_dec assumes that the refcount will never reach 0, the
    current implementation causes the following to be invoked on pasid
    unbind:

      REFCOUNT_WARN("decrement hit 0; leaking memory")

    Fix this issue by changing refcount_dec to refcount_dec_and_test to
    explicitly handle refcount=1.

    Fixes: 8bc54824da4e ("iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count")
    Signed-off-by: Daniel Marcovitch <dmarcovitch@nvidia.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Link: https://lore.kernel.org/r/20230609105146.7773-2-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
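    Schematically (refcount_dec() WARNs when it would hit zero, while the
    _and_test variant makes the 1 -> 0 transition an expected, handled
    case); a hedged sketch of the put path:

      if (refcount_dec_and_test(&pasid_state->count))
              wake_up(&pasid_state->wq);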
| | | | | | | | * | iommu/vt-d: Fix to convert mm pfn to dma pfn (Yanfei Xu, 2023-08-09, 1 file, -9/+13)
    For the case where a VT-d page is smaller than an mm page, converting a
    dma pfn should be handled in two cases: for the start pfn and for the
    end pfn. Currently the calculation of the end dma pfn is incorrect and
    the result is less than the real page frame number, which causes the
    IOVA mapping to always miss some page frames.

    Rename mm_to_dma_pfn() to mm_to_dma_pfn_start() and add a new helper
    for converting the end dma pfn named mm_to_dma_pfn_end().

    Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
    Link: https://lore.kernel.org/r/20230625082046.979742-1-yanfei.xu@intel.com
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
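    The two helpers, close to their upstream form (VTD_PAGE_SHIFT <=
    PAGE_SHIFT, so one mm page spans several VT-d pages):

      static inline unsigned long mm_to_dma_pfn_start(unsigned long mm_pfn)
      {
              return mm_pfn << (PAGE_SHIFT - VTD_PAGE_SHIFT);
      }

      static inline unsigned long mm_to_dma_pfn_end(unsigned long mm_pfn)
      {
              /* Last VT-d pfn covered by this mm pfn, not the first pfn
               * of the next mm page. */
              return ((mm_pfn + 1) << (PAGE_SHIFT - VTD_PAGE_SHIFT)) - 1;
      }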
| | | | | | | | * | iommu/vt-d: Fix to flush cache of PASID directory table (Yanfei Xu, 2023-08-09, 1 file, -1/+1)
    Even if a PCI device doesn't support the PASID capability, a PASID
    table is mandatory for it in scalable mode. However, the cache flush of
    the PASID directory table for such devices is skipped after the table
    is allocated, because the computed "size" of the table is zero. Fix it
    by calculating the size from the page order.

    Found this when reading the code; no real problem has been encountered
    so far.

    Fixes: 194b3348bdbb ("iommu/vt-d: Fix PASID directory pointer coherency")
    Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
    Link: https://lore.kernel.org/r/20230616081045.721873-1-yanfei.xu@intel.com
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Remove rmrr check in domain attaching device path (Lu Baolu, 2023-08-09, 1 file, -58/+0)
    The core code now prevents devices with RMRR regions from being
    assigned to user space. There is no need to check for this condition in
    individual drivers. Remove it to avoid duplicate code.

    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Link: https://lore.kernel.org/r/20230724060352.113458-3-baolu.lu@linux.intel.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>