summaryrefslogtreecommitdiffstats
path: root/drivers/iommu/intel-iommu.c
Commit message (Collapse)AuthorAgeFilesLines
* iommu/vt-d: Use right Kconfig option nameLu Baolu2020-05-011-2/+2
| | | | | | | | | | | The CONFIG_ prefix should be added in the code. Fixes: 046182525db61 ("iommu/vt-d: Add Kconfig option to enable/disable scalable mode") Reported-and-tested-by: Kumar, Sanjay K <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20200501072427.14265-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Silence RCU-list debugging warning in dmar_find_atsr()Qian Cai2020-03-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | dmar_find_atsr() calls list_for_each_entry_rcu() outside of an RCU read side critical section but with dmar_global_lock held. Silence this false positive. drivers/iommu/intel-iommu.c:4504 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff9755bee8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1a6/0xe19 Call Trace: dump_stack+0xa4/0xfe lockdep_rcu_suspicious+0xeb/0xf5 dmar_find_atsr+0x1ab/0x1c0 dmar_parse_one_atsr+0x64/0x220 dmar_walk_remapping_entries+0x130/0x380 dmar_table_init+0x166/0x243 intel_iommu_init+0x1ab/0xe19 pci_iommu_init+0x1a/0x44 do_one_initcall+0xae/0x4d0 kernel_init_freeable+0x412/0x4c5 kernel_init+0x19/0x193 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Populate debugfs if IOMMUs are detectedMegha Dey2020-03-141-1/+3
| | | | | | | | | | | | | | | Currently, the intel iommu debugfs directory(/sys/kernel/debug/iommu/intel) gets populated only when DMA remapping is enabled (dmar_disabled = 0) irrespective of whether interrupt remapping is enabled or not. Instead, populate the intel iommu debugfs directory if any IOMMUs are detected. Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: ee2636b8670b1 ("iommu/vt-d: Enable base Intel IOMMU debugfs support") Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + ↵Hans de Goede2020-03-131-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | add_taint Quoting from the comment describing the WARN functions in include/asm-generic/bug.h: * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report * significant kernel issues that need prompt attention if they should ever * appear at runtime. * * Do not use these macros when checking for invalid external inputs The (buggy) firmware tables which the dmar code was calling WARN_TAINT for really are invalid external inputs. They are not under the kernel's control and the issues in them cannot be fixed by a kernel update. So logging a backtrace, which invites bug reports to be filed about this, is not helpful. Fixes: 556ab45f9a77 ("ioat2: catch and recover from broken vtd configurations v6") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200309182510.373875-1-hdegoede@redhat.com BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=701847 Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taintHans de Goede2020-03-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoting from the comment describing the WARN functions in include/asm-generic/bug.h: * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report * significant kernel issues that need prompt attention if they should ever * appear at runtime. * * Do not use these macros when checking for invalid external inputs The (buggy) firmware tables which the dmar code was calling WARN_TAINT for really are invalid external inputs. They are not under the kernel's control and the issues in them cannot be fixed by a kernel update. So logging a backtrace, which invites bug reports to be filed about this, is not helpful. Some distros, e.g. Fedora, have tools watching for the kernel backtraces logged by the WARN macros and offer the user an option to file a bug for this when these are encountered. The WARN_TAINT in dmar_parse_one_rmrr + another iommu WARN_TAINT, addressed in another patch, have lead to over a 100 bugs being filed this way. This commit replaces the WARN_TAINT("...") call, with a pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) call avoiding the backtrace and thus also avoiding bug-reports being filed about this against the kernel. Fixes: f5a68bb0752e ("iommu/vt-d: Mark firmware tainted if RMRR fails sanity check") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: stable@vger.kernel.org Cc: Barret Rhoden <brho@google.com> Link: https://lore.kernel.org/r/20200309140138.3753-3-hdegoede@redhat.com BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1808874
* iommu/vt-d: Fix RCU-list bugs in intel_iommu_init()Qian Cai2020-03-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several places traverse RCU-list without holding any lock in intel_iommu_init(). Fix them by acquiring dmar_global_lock. WARNING: suspicious RCU usage ----------------------------- drivers/iommu/intel-iommu.c:5216 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/0/1. Call Trace: dump_stack+0xa0/0xea lockdep_rcu_suspicious+0x102/0x10b intel_iommu_init+0x947/0xb13 pci_iommu_init+0x26/0x62 do_one_initcall+0xfe/0x500 kernel_init_freeable+0x45a/0x4f8 kernel_init+0x11/0x139 ret_from_fork+0x3a/0x50 DMAR: Intel(R) Virtualization Technology for Directed I/O Fixes: d8190dc63886 ("iommu/vt-d: Enable DMA remapping after rmrr mapped") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge pageYonghyun Hwang2020-03-021-2/+4
| | | | | | | | | | | | | | intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge page onto its corresponding physical address. This commit fixes the bug by accomodating the level of page entry for the IOVA and adds IOVA's lower address to the physical address. Cc: <stable@vger.kernel.org> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: Yonghyun Hwang <yonghyun@google.com> Fixes: 3871794642579 ("VT-d: Changes to support KVM") Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Simplify check in identity_mapping()Joerg Roedel2020-02-181-1/+1
| | | | | | | | | | | | The function only has one call-site and there it is never called with dummy or deferred devices. Simplify the check in the function to account for that. Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper") Cc: stable@vger.kernel.org # v5.5 Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Remove deferred_attach_domain()Joerg Roedel2020-02-181-8/+3
| | | | | | | | | | | The function is now only a wrapper around find_domain(). Remove the function and call find_domain() directly at the call-sites. Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper") Cc: stable@vger.kernel.org # v5.5 Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Do deferred attachment in iommu_need_mapping()Joerg Roedel2020-02-181-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | The attachment of deferred devices needs to happen before the check whether the device is identity mapped or not. Otherwise the check will return wrong results, cause warnings boot failures in kdump kernels, like WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70 [...] Call Trace: __intel_map_single+0x55/0x190 intel_alloc_coherent+0xac/0x110 dmam_alloc_attrs+0x50/0xa0 ahci_port_start+0xfb/0x1f0 [libahci] ata_host_start.part.39+0x104/0x1e0 [libata] With the earlier check the kdump boot succeeds and a crashdump is written. Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper") Cc: stable@vger.kernel.org # v5.5 Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Move deferred device attachment into helper functionJoerg Roedel2020-02-181-8/+12
| | | | | | | | | | | Move the code that does the deferred device attachment into a separate helper function. Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper") Cc: stable@vger.kernel.org # v5.5 Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/vt-d: Add attach_deferred() helperJoerg Roedel2020-02-181-4/+8
| | | | | | | | | | | Implement a helper function to check whether a device's attach process is deferred. Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper") Cc: stable@vger.kernel.org # v5.5 Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* Merge tag 'iommu-updates-v5.6' of ↵Linus Torvalds2020-02-051-91/+275
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Allow compiling the ARM-SMMU drivers as modules. - Fixes and cleanups for the ARM-SMMU drivers and io-pgtable code collected by Will Deacon. The merge-commit (6855d1ba7537) has all the details. - Cleanup of the iommu_put_resv_regions() call-backs in various drivers. - AMD IOMMU driver cleanups. - Update for the x2APIC support in the AMD IOMMU driver. - Preparation patches for Intel VT-d nested mode support. - RMRR and identity domain handling fixes for the Intel VT-d driver. - More small fixes and cleanups. * tag 'iommu-updates-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits) iommu/amd: Remove the unnecessary assignment iommu/vt-d: Remove unnecessary WARN_ON_ONCE() iommu/vt-d: Unnecessary to handle default identity domain iommu/vt-d: Allow devices with RMRRs to use identity domain iommu/vt-d: Add RMRR base and end addresses sanity check iommu/vt-d: Mark firmware tainted if RMRR fails sanity check iommu/amd: Remove unused struct member iommu/amd: Replace two consecutive readl calls with one readq iommu/vt-d: Don't reject Host Bridge due to scope mismatch PCI/ATS: Add PASID stubs iommu/arm-smmu-v3: Return -EBUSY when trying to re-add a device iommu/arm-smmu-v3: Improve add_device() error handling iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE iommu/arm-smmu-v3: Add second level of context descriptor table iommu/arm-smmu-v3: Prepare for handling arm_smmu_write_ctx_desc() failure iommu/arm-smmu-v3: Propagate ssid_bits iommu/arm-smmu-v3: Add support for Substream IDs iommu/arm-smmu-v3: Add context descriptor tables allocators iommu/arm-smmu-v3: Prepare arm_smmu_s1_cfg for SSID support ACPI/IORT: Parse SSID property of named component node ...
| *---. Merge branches 'iommu/fixes', 'arm/smmu', 'x86/amd', 'x86/vt-d' and 'core' ↵Joerg Roedel2020-01-241-91/+275
| |\ \ \ | | | | | | | | | | | | | | | into next
| | | | * iommu: intel: Use generic_iommu_put_resv_regions()Thierry Reding2019-12-231-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new standard function instead of open-coding it. Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | | * iommu/iova: Silence warnings under memory pressureQian Cai2019-12-231-1/+2
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running heavy memory pressure workloads, this 5+ old system is throwing endless warnings below because disk IO is too slow to recover from swapping. Since the volume from alloc_iova_fast() could be large, once it calls printk(), it will trigger disk IO (writing to the log files) and pending softirqs which could cause an infinite loop and make no progress for days by the ongoimng memory reclaim. This is the counter part for Intel where the AMD part has already been merged. See the commit 3d708895325b ("iommu/amd: Silence warnings under memory pressure"). Since the allocation failure will be reported in intel_alloc_iova(), so just call dev_err_once() there because even the "ratelimited" is too much, and silence the one in alloc_iova_mem() to avoid the expensive warn_alloc(). hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed slab_out_of_memory: 66 callbacks suppressed SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 0: slabs: 1822, objs: 16398, free: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 1: slabs: 2051, objs: 18459, free: 31 node 0: slabs: 697, objs: 4182, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) node 1: slabs: 381, objs: 2286, free: 27 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed warn_alloc: 96 callbacks suppressed kworker/11:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G B Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015 Workqueue: kblockd blk_mq_run_work_fn Call Trace: dump_stack+0xa0/0xea warn_alloc.cold.94+0x8a/0x12d __alloc_pages_slowpath+0x1750/0x1870 __alloc_pages_nodemask+0x58a/0x710 alloc_pages_current+0x9c/0x110 alloc_slab_page+0xc9/0x760 allocate_slab+0x48f/0x5d0 new_slab+0x46/0x70 ___slab_alloc+0x4ab/0x7b0 __slab_alloc+0x43/0x70 kmem_cache_alloc+0x2dd/0x450 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) alloc_iova+0x33/0x210 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 alloc_iova_fast+0x62/0x3d1 node 1: slabs: 381, objs: 2286, free: 27 intel_alloc_iova+0xce/0xe0 intel_map_sg+0xed/0x410 scsi_dma_map+0xd7/0x160 scsi_queue_rq+0xbf7/0x1310 blk_mq_dispatch_rq_list+0x4d9/0xbc0 blk_mq_sched_dispatch_requests+0x24a/0x300 __blk_mq_run_hw_queue+0x156/0x230 blk_mq_run_work_fn+0x3b/0x40 process_one_work+0x579/0xb90 worker_thread+0x63/0x5b0 kthread+0x1e6/0x210 ret_from_fork+0x3a/0x50 Mem-Info: active_anon:2422723 inactive_anon:361971 isolated_anon:34403 active_file:2285 inactive_file:1838 isolated_file:0 unevictable:0 dirty:1 writeback:5 unstable:0 slab_reclaimable:13972 slab_unreclaimable:453879 mapped:2380 shmem:154 pagetables:6948 bounce:0 free:19133 free_pcp:7363 free_cma:0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Unnecessary to handle default identity domainLu Baolu2020-01-241-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iommu default domain framework has been designed to take care of setting identity default domain type. It's unnecessary to handle this again in the VT-d driver. Hence, remove it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Allow devices with RMRRs to use identity domainLu Baolu2020-01-241-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ea2447f700cab ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain"), the Intel IOMMU driver doesn't allow any devices with RMRR locked to use the identity domain. This was added to to fix the issue where the RMRR info for devices being placed in and out of the identity domain gets lost. This identity maps all RMRRs when setting up the identity domain, so that devices with RMRRs could also use it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Add RMRR base and end addresses sanity checkBarret Rhoden2020-01-241-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VT-d spec specifies requirements for the RMRR entries base and end (called 'Limit' in the docs) addresses. This commit will cause the DMAR processing to mark the firmware as tainted if any RMRR entries that do not meet these requirements. Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Mark firmware tainted if RMRR fails sanity checkBarret Rhoden2020-01-241-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RMRR entries describe memory regions that are DMA targets for devices outside the kernel's control. RMRR entries that fail the sanity check are pointing to regions of memory that the firmware did not tell the kernel are reserved or otherwise should not be used. Instead of aborting DMAR processing, this commit marks the firmware as tainted. These RMRRs will still be identity mapped, otherwise, some devices, e.x. graphic devices, will not work during boot. Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Fixes: f036c7fa0ab60 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: debugfs: Add support to show page table internalsLu Baolu2020-01-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export page table internals of the domain attached to each device. Example of such dump on a Skylake machine: $ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct [ ... ] Device 0000:00:14.0 with pasid 0 @0x15f3d9000 IOVA_PFN PML5E PML4E 0x000000008ced0 | 0x0000000000000000 0x000000015f3da003 0x000000008ced1 | 0x0000000000000000 0x000000015f3da003 0x000000008ced2 | 0x0000000000000000 0x000000015f3da003 0x000000008ced3 | 0x0000000000000000 0x000000015f3da003 0x000000008ced4 | 0x0000000000000000 0x000000015f3da003 0x000000008ced5 | 0x0000000000000000 0x000000015f3da003 0x000000008ced6 | 0x0000000000000000 0x000000015f3da003 0x000000008ced7 | 0x0000000000000000 0x000000015f3da003 0x000000008ced8 | 0x0000000000000000 0x000000015f3da003 0x000000008ced9 | 0x0000000000000000 0x000000015f3da003 PDPE PDE PTE 0x000000015f3db003 0x000000015f3dc003 0x000000008ced0003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced1003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced2003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced3003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced4003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced5003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced6003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced7003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced8003 0x000000015f3db003 0x000000015f3dc003 0x000000008ced9003 [ ... ] Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Use iova over first levelLu Baolu2020-01-071-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After we make all map/unmap paths support first level page table. Let's turn it on if hardware supports scalable mode. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Update first level super page capabilityLu Baolu2020-01-071-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First-level translation may map input addresses to 4-KByte pages, 2-MByte pages, or 1-GByte pages. Support for 4-KByte pages and 2-Mbyte pages are mandatory for first-level translation. Hardware support for 1-GByte page is reported through the FL1GP field in the Capability Register. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Make first level IOVA canonicalLu Baolu2020-01-071-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First-level translation restricts the input-address to a canonical address (i.e., address bits 63:N have the same value as address bit [N-1], where N is 48-bits with 4-level paging and 57-bits with 5-level paging). (section 3.6 in the spec) This makes first level IOVA canonical by using IOVA with bit [N-1] always cleared. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Flush PASID-based iotlb for iova over first levelLu Baolu2020-01-071-15/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When software has changed first-level tables, it should invalidate the affected IOTLB and the paging-structure-caches using the PASID- based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Setup pasid entries for iova over first levelLu Baolu2020-01-071-5/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel VT-d in scalable mode supports two types of page tables for IOVA translation: first level and second level. The IOMMU driver can choose one from both for IOVA translation according to the use case. This sets up the pasid entry if a domain is selected to use the first-level page table for iova translation. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attrLu Baolu2020-01-071-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the Intel VT-d specific callback of setting DOMAIN_ATTR_NESTING domain attribution. It is necessary to let the VT-d driver know that the domain represents a virtual machine which requires the IOMMU hardware to support nested translation mode. Return success if the IOMMU hardware suports nested mode, otherwise failure. Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Identify domains using first level page tableLu Baolu2020-01-071-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This checks whether a domain should use the first level page table for map/unmap and marks it in the domain structure. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Loose requirement for flush queue initializatonLu Baolu2020-01-071-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if flush queue initialization fails, we return error or enforce the system-wide strict mode. These are unnecessary because we always check the existence of a flush queue before queuing any iova's for lazy flushing. Printing a informational message is enough. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Avoid iova flush queue in strict modeLu Baolu2020-01-071-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If Intel IOMMU strict mode is enabled by users, it's unnecessary to create the iova flush queue. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: trace: Extend map_sg trace eventLu Baolu2020-01-071-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current map_sg stores trace message in a coarse manner. This extends it so that more detailed messages could be traced. The map_sg trace message looks like: map_sg: dev=0000:00:17.0 [1/9] dev_addr=0xf8f90000 phys_addr=0x158051000 size=4096 map_sg: dev=0000:00:17.0 [2/9] dev_addr=0xf8f91000 phys_addr=0x15a858000 size=4096 map_sg: dev=0000:00:17.0 [3/9] dev_addr=0xf8f92000 phys_addr=0x15aa13000 size=4096 map_sg: dev=0000:00:17.0 [4/9] dev_addr=0xf8f93000 phys_addr=0x1570f1000 size=8192 map_sg: dev=0000:00:17.0 [5/9] dev_addr=0xf8f95000 phys_addr=0x15c6d0000 size=4096 map_sg: dev=0000:00:17.0 [6/9] dev_addr=0xf8f96000 phys_addr=0x157194000 size=4096 map_sg: dev=0000:00:17.0 [7/9] dev_addr=0xf8f97000 phys_addr=0x169552000 size=4096 map_sg: dev=0000:00:17.0 [8/9] dev_addr=0xf8f98000 phys_addr=0x169dde000 size=4096 map_sg: dev=0000:00:17.0 [9/9] dev_addr=0xf8f99000 phys_addr=0x148351000 size=4096 Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Replace Intel specific PASID allocator with IOASIDJacob Pan2020-01-071-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of generic IOASID code to manage PASID allocation, free, and lookup. Replace Intel specific code. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Fix CPU and IOMMU SVM feature matching checksJacob Pan2020-01-071-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shared Virtual Memory(SVM) is based on a collective set of hardware features detected at runtime. There are requirements for matching CPU and IOMMU capabilities. The current code checks CPU and IOMMU feature set for SVM support but the result is never stored nor used. Therefore, SVM can still be used even when these checks failed. The consequences can be: 1. CPU uses 5-level paging mode for virtual address of 57 bits, but IOMMU can only support 4-level paging mode with 48 bits address for DMA. 2. 1GB page size is used by CPU but IOMMU does not support it. VT-d unrecoverable faults may be generated. The best solution to fix these problems is to prevent them in the first place. This patch consolidates code for checking PASID, CPU vs. IOMMU paging mode compatibility, as well as provides specific error messages for each failed checks. On sane hardware configurations, these error message shall never appear in kernel log. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Add Kconfig option to enable/disable scalable modeLu Baolu2020-01-071-1/+6
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | This adds Kconfig option INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON to make it easier for distributions to enable or disable the Intel IOMMU scalable mode by default during kernel build. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | | Merge tag 'pci-v5.6-changes' of ↵Linus Torvalds2020-01-311-7/+4
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Resource management: - Improve resource assignment for hot-added nested bridges, e.g., Thunderbolt (Nicholas Johnson) Power management: - Optionally print config space of devices before suspend (Chen Yu) - Increase D3 delay for AMD Ryzen5/7 XHCI controllers (Daniel Drake) Virtualization: - Generalize DMA alias quirks (James Sewart) - Add DMA alias quirk for PLX PEX NTB (James Sewart) - Fix IOV memory leak (Navid Emamdoost) AER: - Log which device prevents error recovery (Yicong Yang) Peer-to-peer DMA: - Whitelist Intel SkyLake-E (Armen Baloyan) Broadcom iProc host bridge driver: - Apply PAXC quirk whether driver is built-in or module (Wei Liu) Broadcom STB host bridge driver: - Add Broadcom STB PCIe host controller driver (Jim Quinlan) Intel Gateway SoC host bridge driver: - Add driver for Intel Gateway SoC (Dilip Kota) Intel VMD host bridge driver: - Add support for DMA aliases on other buses (Jon Derrick) - Remove dma_map_ops overrides (Jon Derrick) - Remove now-unused X86_DEV_DMA_OPS (Christoph Hellwig) NVIDIA Tegra host bridge driver: - Fix Tegra30 afi_pex2_ctrl register offset (Marcel Ziswiler) Panasonic UniPhier host bridge driver: - Remove module code since driver can't be built as a module (Masahiro Yamada) Qualcomm host bridge driver: - Add support for SDM845 PCIe controller (Bjorn Andersson) TI Keystone host bridge driver: - Fix "num-viewport" DT property error handling (Kishon Vijay Abraham I) - Fix link training retries initiation (Yurii Monakov) - Fix outbound region mapping (Yurii Monakov) Misc: - Add Switchtec Gen4 support (Kelvin Cao) - Add Switchtec Intercomm Notify and Upstream Error Containment support (Logan Gunthorpe) - Use dma_set_mask_and_coherent() since Switchtec supports 64-bit addressing (Wesley Sheng)" * tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (60 commits) PCI: Allow adjust_bridge_window() to shrink resource if necessary PCI: Set resource size directly in adjust_bridge_window() PCI: Rename extend_bridge_window() to adjust_bridge_window() PCI: Rename extend_bridge_window() parameter PCI: Consider alignment of hot-added bridges when assigning resources PCI: Remove local variable usage in pci_bus_distribute_available_resources() PCI: Pass size + alignment to pci_bus_distribute_available_resources() PCI: Rename variables PCI: vmd: Add two VMD Device IDs PCI: Remove unnecessary braces PCI: brcmstb: Add MSI support PCI: brcmstb: Add Broadcom STB PCIe host controller driver x86/PCI: Remove X86_DEV_DMA_OPS PCI: vmd: Remove dma_map_ops overrides iommu/vt-d: Remove VMD child device sanity check iommu/vt-d: Use pci_real_dma_dev() for mapping PCI: Introduce pci_real_dma_dev() x86/PCI: Expose VMD's pci_dev in struct pci_sysdata x86/PCI: Add to_pci_sysdata() helper PCI/AER: Initialize aer_fifo ...
| * | iommu/vt-d: Remove VMD child device sanity checkJon Derrick2020-01-241-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the sanity check required for VMD child devices. The new pci_real_dma_dev() DMA alias mechanism places them in the same IOMMU group as the VMD endpoint. Assignment of the group would require assigning the VMD endpoint, where unbinding the VMD endpoint removes the child device domain from the hierarchy. Link: https://lore.kernel.org/r/1579613871-301529-6-git-send-email-jonathan.derrick@intel.com Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
| * | iommu/vt-d: Use pci_real_dma_dev() for mappingJon Derrick2020-01-241-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PCI device may have a DMA requester on another bus, such as VMD subdevices needing to use the VMD endpoint. This case requires the real DMA device for the IOMMU mapping, so use pci_real_dma_dev() to find that device. Link: https://lore.kernel.org/r/1579613871-301529-5-git-send-email-jonathan.derrick@intel.com Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
* | | iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointerJerry Snitselaar2020-01-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible for archdata.iommu to be set to DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for those values before calling __dmar_remove_one_dev_info. Without a check it can result in a null pointer dereference. This has been seen while booting a kdump kernel on an HP dl380 gen9. Cc: Joerg Roedel <joro@8bytes.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: stable@vger.kernel.org # 5.3+ Cc: linux-kernel@vger.kernel.org Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | | iommu/vt-d: Unlink device if failed to add to groupJon Derrick2020-01-071-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device fails to be added to the group, make sure to unlink the reference before returning. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device") Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | | iommu/vt-d: Fix adding non-PCI devices to Intel IOMMUPatrick Steinhardt2020-01-071-1/+8
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with commit fa212a97f3a3 ("iommu/vt-d: Probe DMA-capable ACPI name space devices"), we now probe DMA-capable ACPI name space devices. On Dell XPS 13 9343, which has an Intel LPSS platform device INTL9C60 enumerated via ACPI, this change leads to the following warning: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1 at pci_device_group+0x11a/0x130 CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.5.0-rc3+ #22 Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A20 06/06/2019 RIP: 0010:pci_device_group+0x11a/0x130 Code: f0 ff ff 48 85 c0 49 89 c4 75 c4 48 8d 74 24 10 48 89 ef e8 48 ef ff ff 48 85 c0 49 89 c4 75 af e8 db f7 ff ff 49 89 c4 eb a5 <0f> 0b 49 c7 c4 ea ff ff ff eb 9a e8 96 1e c7 ff 66 0f 1f 44 00 00 RSP: 0000:ffffc0d6c0043cb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffffa3d1d43dd810 RCX: 0000000000000000 RDX: ffffa3d1d4fecf80 RSI: ffffa3d12943dcc0 RDI: ffffa3d1d43dd810 RBP: ffffa3d1d43dd810 R08: 0000000000000000 R09: ffffa3d1d4c04a80 R10: ffffa3d1d4c00880 R11: ffffa3d1d44ba000 R12: 0000000000000000 R13: ffffa3d1d4383b80 R14: ffffa3d1d4c090d0 R15: ffffa3d1d4324530 FS: 0000000000000000(0000) GS:ffffa3d1d6700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000460a001 CR4: 00000000003606e0 Call Trace: ? iommu_group_get_for_dev+0x81/0x1f0 ? intel_iommu_add_device+0x61/0x170 ? iommu_probe_device+0x43/0xd0 ? intel_iommu_init+0x1fa2/0x2235 ? pci_iommu_init+0x52/0xe7 ? e820__memblock_setup+0x15c/0x15c ? do_one_initcall+0xcc/0x27e ? kernel_init_freeable+0x169/0x259 ? rest_init+0x95/0x95 ? kernel_init+0x5/0xeb ? ret_from_fork+0x35/0x40 ---[ end trace 28473e7abc25b92c ]--- DMAR: ACPI name space devices didn't probe correctly The bug results from the fact that while we now enumerate ACPI devices, we aren't able to handle any non-PCI device when generating the device group. Fix the issue by implementing an Intel-specific callback that returns `pci_device_group` only if the device is a PCI device. Otherwise, it will return a generic device group. Fixes: fa212a97f3a3 ("iommu/vt-d: Probe DMA-capable ACPI name space devices") Signed-off-by: Patrick Steinhardt <ps@pks.im> Cc: stable@vger.kernel.org # v5.3+ Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | iommu/vt-d: Allocate reserved region for ISA with correct permissionJerry Snitselaar2019-12-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the reserved region for ISA is allocated with no permissions. If a dma domain is being used, mapping this region will fail. Set the permissions to DMA_PTE_READ|DMA_PTE_WRITE. Cc: Joerg Roedel <jroedel@suse.de> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: iommu@lists.linux-foundation.org Cc: stable@vger.kernel.org # v5.3+ Fixes: d850c2ee5fe2 ("iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | iommu/vt-d: Fix dmar pte read access not set errorLu Baolu2019-12-171-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the default DMA domain of a group doesn't fit a device, it will still sit in the group but use a private identity domain. When map/unmap/iova_to_phys come through iommu API, the driver should still serve them, otherwise, other devices in the same group will be impacted. Since identity domain has been mapped with the whole available memory space and RMRRs, we don't need to worry about the impact on it. Link: https://www.spinics.net/lists/iommu/msg40416.html Cc: Jerry Snitselaar <jsnitsel@redhat.com> Reported-by: Jerry Snitselaar <jsnitsel@redhat.com> Fixes: 942067f1b6b97 ("iommu/vt-d: Identify default domains replaced with private") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | iommu/vt-d: Set ISA bridge reserved region as relaxableAlex Williamson2019-12-171-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d850c2ee5fe2 ("iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions") created a direct-mapped reserved memory region in order to replace the static identity mapping of the ISA address space, where the latter was then removed in commit df4f3c603aeb ("iommu/vt-d: Remove static identity map code"). According to the history of this code and the Kconfig option surrounding it, this direct mapping exists for the benefit of legacy ISA drivers that are not compatible with the DMA API. In conjuntion with commit 9b77e5c79840 ("vfio/type1: check dma map request is within a valid iova range") this change introduced a regression where the vfio IOMMU backend enforces reserved memory regions per IOMMU group, preventing userspace from creating IOMMU mappings conflicting with prescribed reserved regions. A necessary prerequisite for the vfio change was the introduction of "relaxable" direct mappings introduced by commit adfd37382090 ("iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions"). These relaxable direct mappings provide the same identity mapping support in the default domain, but also indicate that the reservation is software imposed and may be relaxed under some conditions, such as device assignment. Convert the ISA bridge direct-mapped reserved region to relaxable to reflect that the restriction is self imposed and need not be enforced by drivers such as vfio. Fixes: 1c5c59fbad20 ("iommu/vt-d: Differentiate relaxable and non relaxable RMRRs") Cc: stable@vger.kernel.org # v5.3+ Link: https://lore.kernel.org/linux-iommu/20191211082304.2d4fab45@x1.home Reported-by: cprt <cprt@protonmail.com> Tested-by: cprt <cprt@protonmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
*---. Merge branches 'iommu/fixes', 'arm/qcom', 'arm/renesas', 'arm/rockchip', ↵Joerg Roedel2019-11-121-15/+46
|\ \ \ | | | | | | | | | | | | 'arm/mediatek', 'arm/tegra', 'arm/smmu', 'x86/amd', 'x86/vt-d', 'virtio' and 'core' into next
| | | * iommu/vt-d: Turn off translations at shutdownDeepa Dinamani2019-11-111-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intel-iommu driver assumes that the iommu state is cleaned up at the start of the new kernel. But, when we try to kexec boot something other than the Linux kernel, the cleanup cannot be relied upon. Hence, cleanup before we go down for reboot. Keeping the cleanup at initialization also, in case BIOS leaves the IOMMU enabled. I considered turning off iommu only during kexec reboot, but a clean shutdown seems always a good idea. But if someone wants to make it conditional, such as VMM live update, we can do that. There doesn't seem to be such a condition at this time. Tested that before, the info message 'DMAR: Translation was enabled for <iommu> but we are not in kdump mode' would be reported for each iommu. The message will not appear when the DMA-remapping is not enabled on entry to the kernel. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reservedYian Chen2019-11-111-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VT-d RMRR (Reserved Memory Region Reporting) regions are reserved for device use only and should not be part of allocable memory pool of OS. BIOS e820_table reports complete memory map to OS, including OS usable memory ranges and BIOS reserved memory ranges etc. x86 BIOS may not be trusted to include RMRR regions as reserved type of memory in its e820 memory map, hence validate every RMRR entry with the e820 memory map to make sure the RMRR regions will not be used by OS for any other purposes. ia64 EFI is working fine so implement RMRR validation as a dummy function Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Yian Chen <yian.chen@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/vt-d: Refactor find_domain() helperLu Baolu2019-10-151-13/+18
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current find_domain() helper checks and does the deferred domain attachment and return the domain in use. This isn't always the use case for the callers. Some callers only want to retrieve the current domain in use. This refactors find_domain() into two helpers: 1) find_domain() only returns the domain in use; 2) deferred_attach_domain() does the deferred domain attachment if required and return the domain in use. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | * iommu: Add gfp parameter to iommu_ops::mapTom Murphy2019-10-151-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a gfp_t parameter to the iommu_ops::map function. Remove the needless locking in the AMD iommu driver. The iommu_ops::map function (or the iommu_map function which calls it) was always supposed to be sleepable (according to Joerg's comment in this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so should probably have had a "might_sleep()" since it was written. However currently the dma-iommu api can call iommu_map in an atomic context, which it shouldn't do. This doesn't cause any problems because any iommu driver which uses the dma-iommu api uses gfp_atomic in it's iommu_ops::map function. But doing this wastes the memory allocators atomic pools. Signed-off-by: Tom Murphy <murphyt7@tcd.ie> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | iommu/vt-d: Fix panic after kexec -p for kdumpJohn Donnelly2019-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cures a panic on restart after a kexec operation on 5.3 and 5.4 kernels. The underlying state of the iommu registers (iommu->flags & VTD_FLAG_TRANS_PRE_ENABLED) on a restart results in a domain being marked as "DEFER_DEVICE_DOMAIN_INFO" that produces an Oops in identity_mapping(). [ 43.654737] BUG: kernel NULL pointer dereference, address: 0000000000000056 [ 43.655720] #PF: supervisor read access in kernel mode [ 43.655720] #PF: error_code(0x0000) - not-present page [ 43.655720] PGD 0 P4D 0 [ 43.655720] Oops: 0000 [#1] SMP PTI [ 43.655720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.2-1940.el8uek.x86_64 #1 [ 43.655720] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018 [ 43.655720] RIP: 0010:iommu_need_mapping+0x29/0xd0 [ 43.655720] Code: 00 0f 1f 44 00 00 48 8b 97 70 02 00 00 48 83 fa ff 74 53 48 8d 4a ff b8 01 00 00 00 48 83 f9 fd 76 01 c3 48 8b 35 7f 58 e0 01 <48> 39 72 58 75 f2 55 48 89 e5 41 54 53 48 8b 87 28 02 00 00 4c 8b [ 43.655720] RSP: 0018:ffffc9000001b9b0 EFLAGS: 00010246 [ 43.655720] RAX: 0000000000000001 RBX: 0000000000001000 RCX: fffffffffffffffd [ 43.655720] RDX: fffffffffffffffe RSI: ffff8880719b8000 RDI: ffff8880477460b0 [ 43.655720] RBP: ffffc9000001b9e8 R08: 0000000000000000 R09: ffff888047c01700 [ 43.655720] R10: 00002194036fc692 R11: 0000000000000000 R12: 0000000000000000 [ 43.655720] R13: ffff8880477460b0 R14: 0000000000000cc0 R15: ffff888072d2b558 [ 43.655720] FS: 0000000000000000(0000) GS:ffff888071c00000(0000) knlGS:0000000000000000 [ 43.655720] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 43.655720] CR2: 0000000000000056 CR3: 000000007440a002 CR4: 00000000001606b0 [ 43.655720] Call Trace: [ 43.655720] ? intel_alloc_coherent+0x2a/0x180 [ 43.655720] ? __schedule+0x2c2/0x650 [ 43.655720] dma_alloc_attrs+0x8c/0xd0 [ 43.655720] dma_pool_alloc+0xdf/0x200 [ 43.655720] ehci_qh_alloc+0x58/0x130 [ 43.655720] ehci_setup+0x287/0x7ba [ 43.655720] ? _dev_info+0x6c/0x83 [ 43.655720] ehci_pci_setup+0x91/0x436 [ 43.655720] usb_add_hcd.cold.48+0x1d4/0x754 [ 43.655720] usb_hcd_pci_probe+0x2bc/0x3f0 [ 43.655720] ehci_pci_probe+0x39/0x40 [ 43.655720] local_pci_probe+0x47/0x80 [ 43.655720] pci_device_probe+0xff/0x1b0 [ 43.655720] really_probe+0xf5/0x3a0 [ 43.655720] driver_probe_device+0xbb/0x100 [ 43.655720] device_driver_attach+0x58/0x60 [ 43.655720] __driver_attach+0x8f/0x150 [ 43.655720] ? device_driver_attach+0x60/0x60 [ 43.655720] bus_for_each_dev+0x74/0xb0 [ 43.655720] driver_attach+0x1e/0x20 [ 43.655720] bus_add_driver+0x151/0x1f0 [ 43.655720] ? ehci_hcd_init+0xb2/0xb2 [ 43.655720] ? do_early_param+0x95/0x95 [ 43.655720] driver_register+0x70/0xc0 [ 43.655720] ? ehci_hcd_init+0xb2/0xb2 [ 43.655720] __pci_register_driver+0x57/0x60 [ 43.655720] ehci_pci_init+0x6a/0x6c [ 43.655720] do_one_initcall+0x4a/0x1fa [ 43.655720] ? do_early_param+0x95/0x95 [ 43.655720] kernel_init_freeable+0x1bd/0x262 [ 43.655720] ? rest_init+0xb0/0xb0 [ 43.655720] kernel_init+0xe/0x110 [ 43.655720] ret_from_fork+0x24/0x50 Fixes: 8af46c784ecfe ("iommu/vt-d: Implement is_attach_deferred iommu ops entry") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: John Donnelly <john.p.donnelly@oracle.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | iommu/vt-d: Return the correct dma mask when we are bypassing the IOMMUArvind Sankar2019-10-181-1/+9
|/ | | | | | | | | | | | | | | We must return a mask covering the full physical RAM when bypassing the IOMMU mapping. Also, in iommu_need_mapping, we need to check using dma_direct_get_required_mask to ensure that the device's dma_mask can cover physical RAM before deciding to bypass IOMMU mapping. Based on an earlier patch from Christoph Hellwig. Fixes: 249baa547901 ("dma-mapping: provide a better default ->get_required_mask") Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>