summaryrefslogtreecommitdiffstats
path: root/drivers/iommu
Commit message (Collapse)AuthorAgeFilesLines
* iommu/vt-d: Avoid memory allocation in iommu_suspend()Zhang Rui2023-09-252-17/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iommu_suspend() syscore suspend callback is invoked with IRQ disabled. Allocating memory with the GFP_KERNEL flag may re-enable IRQs during the suspend callback, which can cause intermittent suspend/hibernation problems with the following kernel traces: Calling iommu_suspend+0x0/0x1d0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0 ... CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G U E 6.3-intel #r1 RIP: 0010:ktime_get+0x9b/0xb0 ... Call Trace: <IRQ> tick_sched_timer+0x22/0x90 ? __pfx_tick_sched_timer+0x10/0x10 __hrtimer_run_queues+0x111/0x2b0 hrtimer_interrupt+0xfa/0x230 __sysvec_apic_timer_interrupt+0x63/0x140 sysvec_apic_timer_interrupt+0x7b/0xa0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1f/0x30 ... ------------[ cut here ]------------ Interrupts enabled after iommu_suspend+0x0/0x1d0 WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270 CPU: 0 PID: 27420 Comm: rtcwake Tainted: G U W E 6.3-intel #r1 RIP: 0010:syscore_suspend+0x147/0x270 ... Call Trace: <TASK> hibernation_snapshot+0x25b/0x670 hibernate+0xcd/0x390 state_store+0xcf/0xe0 kobj_attr_store+0x13/0x30 sysfs_kf_write+0x3f/0x50 kernfs_fop_write_iter+0x128/0x200 vfs_write+0x1fd/0x3c0 ksys_write+0x6f/0xf0 __x64_sys_write+0x1d/0x30 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Given that only 4 words memory is needed, avoid the memory allocation in iommu_suspend(). CC: stable@kernel.org Fixes: 33e07157105e ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com> Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/apple-dart: Handle DMA_FQ domains in attach_dev()Hector Martin2023-09-251-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was introduced in iommu_dma_init_domain() to fall back if not supported, but this check runs too late: by that point, devices have been attached to the IOMMU, and apple-dart's attach_dev() callback does not expect IOMMU_DOMAIN_DMA_FQ domains. Change the logic so the IOMMU_DOMAIN_DMA codepath is the default, instead of explicitly enumerating all types. Fixes an apple-dart regression in v6.5. Cc: regressions@lists.linux.dev Cc: stable@vger.kernel.org Suggested-by: Robin Murphy <robin.murphy@arm.com> Fixes: a4fdd9762272 ("iommu: Use flush queue capability") Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230922-iommu-type-regression-v2-1-689b2ba9b673@marcan.st Signed-off-by: Joerg Roedel <jroedel@suse.de>
* Merge tag 'arm-smmu-fixes' of ↵Joerg Roedel2023-09-252-7/+26
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into iommu/fixes Arm SMMU fixes for 6.6 -rc - Fix TLB range command encoding when TTL, Num and Scale are all zero - Fix soft lockup by limiting TLB invalidation ops issued by SVA - Fix clocks description for SDM630 platform in arm-smmu DT binding
| * iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_rangeNicolin Chen2023-09-221-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running an SVA case, the following soft lockup is triggered: -------------------------------------------------------------------- watchdog: BUG: soft lockup - CPU#244 stuck for 26s! pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50 sp : ffff8000d83ef290 x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000 x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000 x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0 x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0 x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001 Call trace: arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 __arm_smmu_tlb_inv_range+0x118/0x254 arm_smmu_tlb_inv_range_asid+0x6c/0x130 arm_smmu_mm_invalidate_range+0xa0/0xa4 __mmu_notifier_invalidate_range_end+0x88/0x120 unmap_vmas+0x194/0x1e0 unmap_region+0xb4/0x144 do_mas_align_munmap+0x290/0x490 do_mas_munmap+0xbc/0x124 __vm_munmap+0xa8/0x19c __arm64_sys_munmap+0x28/0x50 invoke_syscall+0x78/0x11c el0_svc_common.constprop.0+0x58/0x1c0 do_el0_svc+0x34/0x60 el0_svc+0x2c/0xd4 el0t_64_sync_handler+0x114/0x140 el0t_64_sync+0x1a4/0x1a8 -------------------------------------------------------------------- Note that since 6.6-rc1 the arm_smmu_mm_invalidate_range above is renamed to "arm_smmu_mm_arch_invalidate_secondary_tlbs", yet the problem remains. The commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable protoypes") fixed a similar lockup on the CPU MMU side. Yet, it can occur to SMMU too, since arm_smmu_mm_arch_invalidate_secondary_tlbs() is called typically next to MMU tlb flush function, e.g. tlb_flush_mmu_tlbonly { tlb_flush { __flush_tlb_range { // check MAX_TLBI_OPS } } mmu_notifier_arch_invalidate_secondary_tlbs { arm_smmu_mm_arch_invalidate_secondary_tlbs { // does not check MAX_TLBI_OPS } } } Clone a CMDQ_MAX_TLBI_OPS from the MAX_TLBI_OPS in tlbflush.h, since in an SVA case SMMU uses the CPU page table, so it makes sense to align with the tlbflush code. Then, replace per-page TLBI commands with a single per-asid TLBI command, if the request size hits this threshold. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230920052257.8615-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
| * iommu/arm-smmu-v3: Avoid constructing invalid range commandsRobin Murphy2023-09-181-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although io-pgtable's non-leaf invalidations are always for full tables, I missed that SVA also uses non-leaf invalidations, while being at the mercy of whatever range the MMU notifier throws at it. This means it definitely wants the previous TTL fix as well, since it also doesn't know exactly which leaf level(s) may need invalidating, but it can also give us less-aligned ranges wherein certain corners may lead to building an invalid command where TTL, Num and Scale are all 0. It should be fine to handle this by over-invalidating an extra page, since falling back to a non-range command opens up a whole can of errata-flavoured worms. Fixes: 6833b8f2e199 ("iommu/arm-smmu-v3: Set TTL invalidation hint better") Reported-by: Rui Zhu <zhurui3@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/b99cfe71af2bd93a8a2930f20967fb2a4f7748dd.1694432734.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* | iommu/mediatek: Fix share pgtable for iova over 4GBYong Wu2023-09-251-5/+4
|/ | | | | | | | | | | | | | | | | | | | In mt8192/mt8186, there is only one MM IOMMU that supports 16GB iova space, which is shared by display, vcodec and camera. These two SoC use one pgtable and have not the flag SHARE_PGTABLE, we should also keep share pgtable for this case. In mtk_iommu_domain_finalise, MM IOMMU always share pgtable, thus remove the flag SHARE_PGTABLE checking. Infra IOMMU always uses independent pgtable. Fixes: cf69ef46dbd9 ("iommu/mediatek: Fix two IOMMU share pagetable issue") Reported-by: Laura Nao <laura.nao@collabora.com> Closes: https://lore.kernel.org/linux-iommu/20230818154156.314742-1-laura.nao@collabora.com/ Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Laura Nao <laura.nao@collabora.com> Link: https://lore.kernel.org/r/20230819081443.8333-1-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* Merge tag 'iommu-updates-v6.6' of ↵Linus Torvalds2023-09-0130-624/+866
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Consolidate probe_device path - Make the PCI-SAC IOVA allocation trick PCI-only AMD IOMMU: - Consolidate PPR log handling - Interrupt handling improvements - Refcount fixes for amd_iommu_v2 driver Intel VT-d driver: - Enable idxd device DMA with pasid through iommu dma ops - Lift RESV_DIRECT check from VT-d driver to core - Miscellaneous cleanups and fixes ARM-SMMU drivers: - Device-tree binding updates: - Add additional compatible strings for Qualcomm SoCs - Allow ASIDs to be configured in the DT to work around Qualcomm's broken hypervisor - Fix clocks for Qualcomm's MSM8998 SoC - SMMUv2: - Support for Qualcomm's legacy firmware implementation featured on at least MSM8956 and MSM8976 - Match compatible strings for Qualcomm SM6350 and SM6375 SoC variants - SMMUv3: - Use 'ida' instead of a bitmap for VMID allocation - Rockchip IOMMU: - Lift page-table allocation restrictions on newer hardware - Mediatek IOMMU: - Add MT8188 IOMMU Support - Renesas IOMMU: - Allow PCIe devices .. and the usual set of cleanups an smaller fixes" * tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (64 commits) iommu: Explicitly include correct DT includes iommu/amd: Remove unused declarations iommu/arm-smmu-qcom: Add SM6375 SMMUv2 iommu/arm-smmu-qcom: Add SM6350 DPU compatible iommu/arm-smmu-qcom: Add SM6375 DPU compatible iommu/arm-smmu-qcom: Sort the compatible list alphabetically dt-bindings: arm-smmu: Fix MSM8998 clocks description iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope() iommu/vt-d: Fix to convert mm pfn to dma pfn iommu/vt-d: Fix to flush cache of PASID directory table iommu/vt-d: Remove rmrr check in domain attaching device path iommu: Prevent RESV_DIRECT devices from blocking domains dmaengine/idxd: Re-enable kernel workqueue under DMA API iommu/vt-d: Add set_dev_pasid callback for dma domain iommu/vt-d: Prepare for set_dev_pasid callback iommu/vt-d: Make prq draining code generic iommu/vt-d: Remove pasid_mutex iommu/vt-d: Add domain_flush_pasid_iotlb() iommu: Move global PASID allocation from SVA to core iommu: Generalize PASID 0 for normal DMA w/o PASID ...
| *---------------. Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/rockchip', ↵Joerg Roedel2023-08-2130-624/+866
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'arm/smmu', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next
| | | | | | | | | | * iommu: Explicitly include correct DT includesRob Herring2023-08-217-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it as merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230714174640.4058404-1-robh@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Remove kernel-doc warningsZhu Wang2023-08-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove kernel-doc warnings: drivers/iommu/iommu.c:3261: warning: Function parameter or member 'group' not described in 'iommu_group_release_dma_owner' drivers/iommu/iommu.c:3261: warning: Excess function parameter 'dev' description in 'iommu_group_release_dma_owner' drivers/iommu/iommu.c:3275: warning: Function parameter or member 'dev' not described in 'iommu_device_release_dma_owner' drivers/iommu/iommu.c:3275: warning: Excess function parameter 'group' description in 'iommu_device_release_dma_owner' Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Fixes: 89395ccedbc1 ("iommu: Add device-centric DMA ownership interfaces") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230731112758.214775-1-wangzhu9@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Optimise PCI SAC address trickRobin Murphy2023-07-143-6/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for PCI SAC address trick") and its subsequent revert, this mechanism no longer serves its original purpose, but now only works around broken hardware/drivers in a way that is unfortunately too impactful to remove. This does not, however, prevent us from solving the performance impact which that workaround has on large-scale systems that don't need it. Once the 32-bit IOVA space fills up and a workload starts allocating and freeing on both sides of the boundary, the opportunistic SAC allocation can then end up spending significant time hunting down scattered fragments of free 32-bit space, or just reestablishing max32_alloc_size. This can easily be exacerbated by a change in allocation pattern, such as by changing the network MTU, which can increase pressure on the 32-bit space by leaving a large quantity of cached IOVAs which are now the wrong size to be recycled, but also won't be freed since the non-opportunistic allocations can still be satisfied from the whole 64-bit space without triggering the reclaim path. However, in the context of a workaround where smaller DMA addresses aren't simply a preference but a necessity, if we get to that point at all then in fact it's already the endgame. The nature of the allocator is currently such that the first IOVA we give to a device after the 32-bit space runs out will be the highest possible address for that device, ever. If that works, then great, we know we can optimise for speed by always allocating from the full range. And if it doesn't, then the worst has already happened and any brokenness is now showing, so there's little point in continuing to try to hide it. To that end, implement a flag to refine the SAC business into a per-device policy that can automatically get itself out of the way if and when it stops being useful. CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Jakub Kicinski <kuba@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/b8502b115b915d2a3fabde367e099e39106686c8.1681392791.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Avoid locking/unlocking for iommu_probe_device()Jason Gunthorpe2023-07-141-43/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the race where a hotplug of a device into an existing group will have the device installed in the group->devices, but not yet attached to the group's current domain. Move the group attachment logic from iommu_probe_device() and put it under the same mutex that updates the group->devices list so everything is atomic under the lock. We retain the two step setup of the default domain for the bus_iommu_probe() case solely so that we have a more complete view of the group when creating the default domain for boot time devices. This is not generally necessary with the current code structure but seems to be supporting some odd corner cases like alias RID's and IOMMU_RESV_DIRECT or driver bugs returning different default_domain types for the same group. During bus_iommu_probe() the group will have a device list but both group->default_domain and group->domain will be NULL. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/10-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Split iommu_group_add_device()Jason Gunthorpe2023-07-141-23/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the list_add_tail() for the group_device into the critical region that immediately follows in __iommu_probe_device(). This avoids one case of unlocking and immediately re-locking the group->mutex. Consistently make the caller responsible for setting dev->iommu_group, prior patches moved this into iommu_init_device(), make the no-driver path do this in iommu_group_add_device(). This completes making __iommu_group_free_device() and iommu_group_alloc_device() into pair'd functions. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Always destroy the iommu_group during iommu_release_device()Jason Gunthorpe2023-07-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Have release fully clean up the iommu related parts of the struct device, no matter what state they are in. Split the logic so that the three things owned by the iommu core are always cleaned up: - Any attached iommu_group - Any allocated dev->iommu and its contents including a fwsepc - Any attached driver via a struct group_device This fixes a minor bug where a fwspec created without an iommu_group being probed would not be freed. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Do not export iommu_device_link/unlink()Jason Gunthorpe2023-07-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are not used outside iommu.c, they should not be available to modular code. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Move the iommu driver sysfs setup into iommu_init/deinit_device()Jason Gunthorpe2023-07-142-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It makes logical sense that once the driver is attached to the device the sysfs links appear, even if we haven't fully created the group_device or attached the device to a domain. Fix the missing error handling on sysfs creation since iommu_init_device() can trivially handle this. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Add iommu_init/deinit_device() paired functionsJason Gunthorpe2023-07-141-79/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the driver init and destruction code into two logically paired functions. There is a subtle ordering dependency in how the group's domains are freed, the current code does the kobject_put() on the group which will hopefully trigger the free of the domains before the module_put() that protects the domain->ops. Reorganize this to be explicit and documented. The domains are cleaned up by iommu_deinit_device() if it is the last device to be deinit'd from the group. This must be done in a specific order - after ops->release_device() and before the module_put(). Make it very clear and obvious by putting the order directly in one function. Leave WARN_ON's in case the refcounting gets messed up somehow. This also moves the module_put() and dev_iommu_free() under the group->mutex to keep the code simple. Building paired functions like this helps ensure that error cleanup flows in __iommu_probe_device() are correct because they share the same code that handles the normal flow. These details become relavent as following patches add more error unwind into __iommu_probe_device(), and ultimately a following series adds fine-grained locking to __iommu_probe_device(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Simplify the __iommu_group_remove_device() flowJason Gunthorpe2023-07-141-44/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of returning the struct group_device and then later freeing it, do the entire free under the group->mutex and defer only putting the iommu_group. It is safe to remove the sysfs_links and free memory while holding that mutex. Move the sanity assert of the group status into __iommu_group_free_device(). The next patch will improve upon this and consolidate the group put and the mutex into __iommu_group_remove_device(). __iommu_group_free_device() is close to being the paired undo of iommu_group_add_device(), following patches will improve on that. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Inline iommu_group_get_for_dev() into __iommu_probe_device()Jason Gunthorpe2023-07-141-41/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the only caller, and it doesn't need the generality of the function. We already know there is no iommu_group, so it is simply two function calls. Moving it here allows the following patches to split the logic in these functions. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Use iommu_group_ref_get/put() for dev->iommu_groupJason Gunthorpe2023-07-141-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No reason to open code this, use the proper helper functions. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | | * iommu: Have __iommu_probe_device() check for already probed devicesJason Gunthorpe2023-07-143-18/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a step toward making __iommu_probe_device() self contained. It should, under proper locking, check if the device is already associated with an iommu driver and resolve parallel probes. All but one of the callers open code this test using two different means, but they all rely on dev->iommu_group. Currently the bus_iommu_probe()/probe_iommu_group() and probe_acpi_namespace_devices() rejects already probed devices with an unlocked read of dev->iommu_group. The OF and ACPI "replay" functions use device_iommu_mapped() which is the same read without the pointless refcount. Move this test into __iommu_probe_device() and put it under the iommu_probe_device_lock. The store to dev->iommu_group is in iommu_group_add_device() which is also called under this lock for iommu driver devices, making it properly locked. The only path that didn't have this check is the hotplug path triggered by BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned outside the probe path is via iommu_group_add_device(). Today the only caller is VFIO no-iommu which never associates with an iommu driver. Thus adding this additional check is safe. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Remove unused declarationsYue Haibing2023-08-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device") removed the implementations but left declarations in place. Remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230814135502.4808-1-yuehaibing@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Rearrange DTE bit definationsVasant Hegde2023-08-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rearrage according to 64bit word they are in. Note that I have not rearranged gcr3 related macros even though they belong to different 64bit word as its easy to read it in current format. No functional changes intended. Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230619131908.5887-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Enable PPR/GA interrupt after interrupt handler setupVasant Hegde2023-07-141-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code enables PPR and GA interrupts before setting up the interrupt handler (in state_next()). Make sure interrupt handler is in place before enabling these interrupt. amd_iommu_enable_interrupts() gets called in normal boot, kdump as well as in suspend/resume path. Hence moving interrupt enablement to this function works fine. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Consolidate PPR log enablementVasant Hegde2023-07-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move PPR log interrupt bit setting to iommu_enable_ppr_log(). Also rearrange iommu_enable_ppr_log() such that PPREn bit is enabled before enabling PPRLog and PPRInt bits. So that when PPRLog bit is set it will clear the PPRLogOverflow bit and sets the PPRLogRun bit in the IOMMU Status Register [MMIO Offset 2020h]. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Disable PPR log/interrupt in iommu_disable()Vasant Hegde2023-07-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to other logs, disable PPR log/interrupt in iommu_disable() path. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Enable separate interrupt for PPR and GA logVasant Hegde2023-07-142-12/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). These logs can be configured to generate different interrupts when an entry is inserted into a log buffer. However, current implementation share single interrupt to handle all three logs. With increasing usages of the GA (for IOMMU AVIC) and PPR logs (for IOMMUv2 APIs and SVA), interrupt sharing could potentially become performance bottleneck. Hence, separate IOMMU interrupt into use three separate vectors and irq threads with corresponding name, which will be displayed in the /proc/interrupts as "AMD-Vi<x>-[Evt/PPR/GA]", where "x" is an IOMMU id. Note that this patch changes interrupt handling only in IOMMU x2apic mode (MMIO 0x18[IntCapXTEn]=1). In legacy mode it will continue to use single MSI interrupt. Signed-off-by: Vasant Hegde<vasant.hegde@amd.com> Reviewed-by: Alexey Kardashevskiy<aik@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628053222.5962-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Refactor IOMMU interrupt handling logic for Event, PPR, and GA logsVasant Hegde2023-07-142-43/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). The IOMMU driver processes these log entries when it receive an IOMMU interrupt. Then, it needs to clear the corresponding interrupt status bits. Also, when an overflow occurs, it needs to handle the log overflow by clearing the specific overflow status bit and restart the log. Since, logic for handling these logs is the same, refactor the code into a helper function called amd_iommu_handle_irq(), which handles the steps described. Then, reuse it for all types of log. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde<vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628053222.5962-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Handle PPR log overflowVasant Hegde2023-07-144-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some ATS-capable peripherals can issue requests to the processor to service peripheral page requests using PCIe PRI (the Page Request Interface). IOMMU supports PRI using PPR log buffer. IOMMU writes PRI request to PPR log buffer and sends PPR interrupt to host. When there is no space in the PPR log buffer (PPR log overflow) it will set PprOverflow bit in 'MMIO Offset 2020h IOMMU Status Register'. When this happens PPR log needs to be restarted as specified in IOMMU spec [1] section 2.6.2. When handling the event it just resumes the PPR log without resizing (similar to the way event and GA log overflow is handled). Failing to handle PPR overflow means device may not work properly as IOMMU stops processing new PPR events from device. [1] https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230628051624.5792-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd: Generalize log overflow handlingVasant Hegde2023-07-143-22/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each IOMMU has three log buffers (Event, GA and PPR log). Once a buffer becomes full, IOMMU generates an interrupt with the corresponding overflow status bit, and stop processing the log. To handle an overflow, the IOMMU driver needs to disable the log, clear the overflow status bit, and re-enable the log. This procedure is same among all types of log buffer except it uses different overflow status bit and enabling bit. Hence, to consolidate the log buffer restarting logic, introduce a helper function amd_iommu_restart_log(), which caller can specify parameters specific for each type of log buffer. Also rename MMIO_STATUS_EVT_OVERFLOW_INT_MASK as MMIO_STATUS_EVT_OVERFLOW_MASK. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230628051624.5792-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd/iommu_v2: Clear pasid state in free pathVasant Hegde2023-07-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear pasid state in device amd_iommu_free_device() path. It will make sure no new ppr notifier is registered in free path. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609105146.7773-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | | * | iommu/amd/iommu_v2: Fix pasid_state refcount dec hit 0 warning on pasid unbindDaniel Marcovitch2023-07-141-2/+2
| | | | | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When unbinding pasid - a race condition exists vs outstanding page faults. To prevent this, the pasid_state object contains a refcount. * set to 1 on pasid bind * incremented on each ppr notification start * decremented on each ppr notification done * decremented on pasid unbind Since refcount_dec assumes that refcount will never reach 0: the current implementation causes the following to be invoked on pasid unbind: REFCOUNT_WARN("decrement hit 0; leaking memory") Fix this issue by changing refcount_dec to refcount_dec_and_test to explicitly handle refcount=1. Fixes: 8bc54824da4e ("iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count") Signed-off-by: Daniel Marcovitch <dmarcovitch@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609105146.7773-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Fix to convert mm pfn to dma pfnYanfei Xu2023-08-091-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the case that VT-d page is smaller than mm page, converting dma pfn should be handled in two cases which are for start pfn and for end pfn. Currently the calculation of end dma pfn is incorrect and the result is less than real page frame number which is causing the mapping of iova always misses some page frames. Rename the mm_to_dma_pfn() to mm_to_dma_pfn_start() and add a new helper for converting end dma pfn named mm_to_dma_pfn_end(). Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Link: https://lore.kernel.org/r/20230625082046.979742-1-yanfei.xu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Fix to flush cache of PASID directory tableYanfei Xu2023-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even the PCI devices don't support pasid capability, PASID table is mandatory for a PCI device in scalable mode. However flushing cache of pasid directory table for these devices are not taken after pasid table is allocated as the "size" of table is zero. Fix it by calculating the size by page order. Found this when reading the code, no real problem encountered for now. Fixes: 194b3348bdbb ("iommu/vt-d: Fix PASID directory pointer coherency") Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Link: https://lore.kernel.org/r/20230616081045.721873-1-yanfei.xu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Remove rmrr check in domain attaching device pathLu Baolu2023-08-091-58/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The core code now prevents devices with RMRR regions from being assigned to user space. There is no need to check for this condition in individual drivers. Remove it to avoid duplicate code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230724060352.113458-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu: Prevent RESV_DIRECT devices from blocking domainsLu Baolu2023-08-091-10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Add set_dev_pasid callback for dma domainLu Baolu2023-08-092-5/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the upper layers to set a domain to a PASID of a device if the PASID feature is supported by the IOMMU hardware. The typical use cases are, for example, kernel DMA with PASID and hardware assisted mediated device drivers. The attaching device and pasid information is tracked in a per-domain list and is used for IOTLB and devTLB invalidation. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-8-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Prepare for set_dev_pasid callbackLu Baolu2023-08-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The domain_flush_pasid_iotlb() helper function is used to flush the IOTLB entries for a given PASID. Previously, this function assumed that RID2PASID was only used for the first-level DMA translation. However, with the introduction of the set_dev_pasid callback, this assumption is no longer valid. Add a check before using the RID2PASID for PASID invalidation. This check ensures that the domain has been attached to a physical device before using RID2PASID. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-7-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Make prq draining code genericLu Baolu2023-08-093-26/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently draining page requests and responses for a pasid is part of SVA implementation. This is because the driver only supports attaching an SVA domain to a device pasid. As we are about to support attaching other types of domains to a device pasid, the prq draining code becomes generic. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-6-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Remove pasid_mutexLu Baolu2023-08-091-40/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pasid_mutex was used to protect the paths of set/remove_dev_pasid(). It's duplicate with iommu_sva_lock. Remove it to avoid duplicate code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-5-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu/vt-d: Add domain_flush_pasid_iotlb()Lu Baolu2023-08-091-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VT-d spec requires to use PASID-based-IOTLB invalidation descriptor to invalidate IOTLB and the paging-structure caches for a first-stage page table. Add a generic helper to do this. RID2PASID is used if the domain has been attached to a physical device, otherwise real PASIDs that the domain has been attached to will be used. The 'real' PASID attachment is handled in the subsequent change. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-4-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu: Move global PASID allocation from SVA to coreJacob Pan2023-08-092-19/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel ENQCMD requires a single PASID to be shared between multiple devices, as the PASID is stored in a single MSR register per-process and userspace can use only that one PASID. This means that the PASID allocation for any ENQCMD using device driver must always come from a shared global pool, regardless of what kind of domain the PASID will be used with. Split the code for the global PASID allocator into iommu_alloc/free_global_pasid() so that drivers can attach non-SVA domains to PASIDs as well. This patch moves global PASID allocation APIs from SVA to IOMMU APIs. Reserved PASIDs, currently only RID_PASID, are excluded from the global PASID allocation. It is expected that device drivers will use the allocated PASIDs to attach to appropriate IOMMU domains for use. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-3-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | | * | iommu: Generalize PASID 0 for normal DMA w/o PASIDJacob Pan2023-08-095-24/+22
| | |_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PCIe Process address space ID (PASID) is used to tag DMA traffic, it provides finer grained isolation than requester ID (RID). For each device/RID, 0 is a special PASID for the normal DMA (no PASID). This is universal across all architectures that supports PASID, therefore warranted to be reserved globally and declared in the common header. Consequently, we can avoid the conflict between different PASID use cases in the generic code. e.g. SVA and DMA API with PASIDs. This paved away for device drivers to choose global PASID policy while continue doing normal DMA. Noting that VT-d could support none-zero RID/NO_PASID, but currently not used. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | | * | iommu/sprd: Add missing force_apertureJason Gunthorpe2023-08-071-0/+1
| | | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | force_aperture was intended to false only by GART drivers that have an identity translation outside the aperture. This does not describe sprd, so add the missing 'force_aperture = true'. Fixes: b23e4fc4e3fa ("iommu: add Unisoc IOMMU basic driver") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Chunyan Zhang <zhang.lyra@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | | | * | iommu/arm-smmu-qcom: Add SM6375 SMMUv2Konrad Dybcio2023-08-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SM6375 uses a qcom,smmu-v2-style SMMU just for Adreno and friends. Add a compatible for it. Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230810-topic-lost_smmu_compats-v1-4-64a0d8749404@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
| | | | | | * | iommu/arm-smmu-qcom: Add SM6350 DPU compatibleKonrad Dybcio2023-08-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the SM6350 DPU compatible to clients compatible list, as it also needs the workarounds. Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230810-topic-lost_smmu_compats-v1-3-64a0d8749404@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
| | | | | | * | iommu/arm-smmu-qcom: Add SM6375 DPU compatibleKonrad Dybcio2023-08-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the SM6375 DPU compatible to clients compatible list, as it also needs the workarounds. Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230810-topic-lost_smmu_compats-v1-2-64a0d8749404@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
| | | | | | * | iommu/arm-smmu-qcom: Sort the compatible list alphabeticallyKonrad Dybcio2023-08-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It got broken at some point, fix it up. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230810-topic-lost_smmu_compats-v1-1-64a0d8749404@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
| | | | | | * | iommu/qcom: Add support for QSMMUv2 and QSMMU-500 secured contextsAngeloGioacchino Del Regno2023-08-091-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some SoCs like MSM8956, MSM8976 and others, secure contexts are also secured: these get programmed by the bootloader or TZ (as usual) but their "interesting" registers are locked out by the hypervisor, disallowing direct register writes from Linux and, in many cases, completely disallowing the reprogramming of TTBR, TCR, MAIR and other registers including, but not limited to, resetting contexts. This is referred downstream as a "v2" IOMMU but this is effectively a "v2 firmware configuration" instead. Luckily, the described behavior of version 2 is effective only on secure contexts and not on non-secure ones: add support for that, finally getting a completely working IOMMU on at least MSM8956/76. Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org> [Marijn: Rebased over next-20221111] Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230622092742.74819-7-angelogioacchino.delregno@collabora.com Signed-off-by: Will Deacon <will@kernel.org>
| | | | | | * | iommu/qcom: Index contexts by asid number to allow asid 0AngeloGioacchino Del Regno2023-08-091-12/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This driver was indexing the contexts by asid-1, which is probably done under the assumption that the first ASID is always 1. Unfortunately this is not always true: at least for MSM8956 and MSM8976's GPU IOMMU, the gpu_user context's ASID number is zero. To allow using a zero asid number, index the contexts by `asid` instead of by `asid - 1`. While at it, also enhance human readability by renaming the `num_ctxs` member of struct qcom_iommu_dev to `max_asid`. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230622092742.74819-5-angelogioacchino.delregno@collabora.com Signed-off-by: Will Deacon <will@kernel.org>