Commit message (Author, Date, Files changed, Lines -/+)
* VT-d: adapt device attach and detach functions for IOMMU API (Joerg Roedel, 2009-01-03, 2 files, -16/+15)
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* VT-d: adapt domain init and destroy functions for IOMMU API (Joerg Roedel, 2009-01-03, 2 files, -17/+18)
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* KVM: change KVM to use IOMMU API (Joerg Roedel, 2009-01-03, 9 files, -34/+33)
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* select IOMMU_API when DMAR and/or AMD_IOMMU is selected (Joerg Roedel, 2009-01-03, 3 files, -0/+7)
  These two IOMMUs can implement the current version of this API. So select the API if one or both of these IOMMU drivers is selected.
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* add frontend implementation for the IOMMU API (Joerg Roedel, 2009-01-03, 1 file, -0/+100)
  This API can be used by KVM for accessing different types of IOMMUs to do device passthrough to guests. Besides that, this API can also be used by device drivers to map non-linear host memory into dma-linear addresses, avoiding the need for scatter-gather DMA. UIO may be another user of this API.
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
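
To make the shape of this frontend concrete, here is a rough sketch of how a user such as KVM might drive it: allocate a domain, attach an assigned device, and map a range of host memory at an I/O virtual address. The calls shown (iommu_domain_alloc() with no argument, iommu_map_range() taking a byte size) follow the early form of this API; later kernels changed these signatures, so treat this as an illustration rather than a reference.

      #include <linux/iommu.h>
      #include <linux/pci.h>
      #include <linux/errno.h>

      /*
       * Illustrative only: put one assigned PCI device into a fresh IOMMU
       * domain and map a single page of host memory read/write at the I/O
       * virtual address 'iova'. Error handling is abbreviated.
       */
      static int example_assign_and_map(struct pci_dev *pdev,
                                        unsigned long iova, phys_addr_t paddr)
      {
              struct iommu_domain *domain;
              int ret;

              domain = iommu_domain_alloc();          /* backed by VT-d or AMD IOMMU */
              if (!domain)
                      return -ENOMEM;

              ret = iommu_attach_device(domain, &pdev->dev);
              if (ret)
                      goto out_free;

              ret = iommu_map_range(domain, iova, paddr, PAGE_SIZE,
                                    IOMMU_READ | IOMMU_WRITE);
              if (ret)
                      goto out_detach;

              return 0;

      out_detach:
              iommu_detach_device(domain, &pdev->dev);
      out_free:
              iommu_domain_free(domain);
              return ret;
      }
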
* introduce linux/iommu.h for an IOMMU API (Joerg Roedel, 2009-01-03, 1 file, -0/+112)
  This patch introduces the API to abstract the exported VT-d functions for KVM into a generic API. This way the AMD IOMMU implementation can plug into this API later.
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
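
The abstraction described here boils down to an ops table that each hardware driver (VT-d now, AMD IOMMU later) registers, plus thin wrappers that dispatch through it. Below is a condensed sketch of that shape; the member list and wrapper are reconstructed from this description and the surrounding commits, not quoted from the header, so exact names and signatures may differ.

      #include <linux/device.h>
      #include <linux/types.h>

      /* Condensed sketch of the generic IOMMU abstraction. */
      struct iommu_domain {
              void *priv;                     /* driver-private domain data */
      };

      struct iommu_ops {
              int  (*domain_init)(struct iommu_domain *domain);
              void (*domain_destroy)(struct iommu_domain *domain);
              int  (*attach_dev)(struct iommu_domain *domain, struct device *dev);
              void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
              int  (*map)(struct iommu_domain *domain, unsigned long iova,
                          phys_addr_t paddr, size_t size, int prot);
              void (*unmap)(struct iommu_domain *domain, unsigned long iova,
                            size_t size);
              phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
                                          unsigned long iova);
      };

      /* The frontend keeps one registered ops pointer and forwards to it,
       * so callers like KVM never see which hardware driver sits underneath. */
      static struct iommu_ops *iommu_ops;

      int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
      {
              return iommu_ops->attach_dev(domain, dev);
      }
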
* KVM: rename vtd.c to iommu.c (Joerg Roedel, 2009-01-03, 3 files, -2/+2)
  Impact: file renamed
  The code in the vtd.c file can be reused for other IOMMUs as well. So rename it to make it clear that it handles more than VT-d.
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Deassign device in kvm_free_assigned_device (Weidong Han, 2009-01-03, 3 files, -10/+2)
  In kvm_iommu_unmap_memslots(), assigned_dev_head is already empty, so the deassignment is done in kvm_free_assigned_device() instead.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* KVM: support device deassignment (Weidong Han, 2009-01-03, 3 files, -0/+74)
  Support device deassignment; it can be used for device hotplug.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* KVM: use the new intel iommu APIs (Weidong Han, 2009-01-03, 3 files, -49/+71)
  The Intel IOMMU APIs have been updated, so use the new APIs. In addition, change kvm_iommu_map_guest() to just create the domain and let kvm_iommu_assign_device() assign the device.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Check agaw is sufficient for mapped memory (Weidong Han, 2009-01-03, 1 file, -0/+61)
  When a domain is related to multiple iommus, we need to check whether the minimum agaw is sufficient for the mapped memory.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
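
In outline, the check compares the end of the mapped range against the address width implied by the smallest agaw among the domain's iommus. A minimal sketch of that comparison, assuming the usual VT-d encoding where agaw N corresponds to a 30 + 9*N bit address width; the helper name is made up for illustration.

      #include <linux/types.h>
      #include <linux/errno.h>

      /*
       * Sketch: reject a mapping that reaches beyond what the weakest
       * (smallest-agaw) IOMMU in the domain can translate.
       */
      static int example_check_min_agaw(int min_agaw, u64 end_of_mapping)
      {
              u64 max_addr;

              if (min_agaw < 0)
                      return -ENODEV;         /* no IOMMU attached yet */

              /* agaw 0 = 30-bit; each extra page-table level adds 9 bits */
              max_addr = 1ULL << (30 + min_agaw * 9);

              if (end_of_mapping > max_addr)
                      return -EFAULT;         /* beyond hardware reach */

              return 0;
      }
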
* Change intel iommu APIs of virtual machine domain (Weidong Han, 2009-01-03, 2 files, -79/+70)
  These APIs are used by KVM to make use of VT-d.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Change domain_context_mapping_one for virtual machine domain (Weidong Han, 2009-01-03, 1 file, -3/+52)
  vm_domid won't be set in the context entry, so find an available domain id for the device from its iommu. For a virtual machine domain, a default agaw is set, and the top levels of the page table are skipped for an iommu whose agaw is smaller than the default.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Allocation and free functions of virtual machine domain (Weidong Han, 2009-01-03, 1 file, -2/+105)
  A virtual machine domain is different from a native DMA-API domain, so implement separate allocation and free functions for virtual machine domains.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Add domain_flush_cache (Weidong Han, 2009-01-03, 1 file, -17/+26)
  Because a virtual machine domain may have multiple devices from different iommus, it cannot use __iommu_flush_cache directly. In the common low-level functions, use domain_flush_cache instead of __iommu_flush_cache. In the functions where the iommu is explicitly specified or the domain cannot be obtained, __iommu_flush_cache is still used.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
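
A minimal sketch of what such a wrapper can look like, assuming the domain carries a combined iommu_coherency flag (introduced by the "iommu coherency" patch further down this log): CPU-written structures such as page-table and context entries only need an explicit cache flush when at least one IOMMU in the domain is not coherent. The struct name below is a stand-in, not the driver's real definition.

      #include <asm/cacheflush.h>

      /* Stand-in for the real dmar_domain; only the field needed here. */
      struct example_domain {
              int iommu_coherency;    /* 1 if every spanned IOMMU snoops the CPU cache */
      };

      /*
       * Sketch: flush CPU-written IOMMU structures (PTEs, context entries)
       * only when some IOMMU spanned by this domain does not snoop the
       * CPU cache; fully coherent domains can skip the flush entirely.
       */
      static void example_domain_flush_cache(struct example_domain *domain,
                                             void *addr, int size)
      {
              if (!domain->iommu_coherency)
                      clflush_cache_range(addr, size);        /* x86 cache-line flush */
      }
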
* Add/remove domain device info for virtual machine domain (Weidong Han, 2009-01-03, 1 file, -5/+166)
  Add an iommu reference count in the domain, and add a lock to protect the iommu settings, including iommu_bmp, iommu_count and iommu_coherency. A virtual machine domain may have multiple devices from different iommus, so it needs to do more work when adding/removing domain device info. Thus implement separate versions of these functions for virtual machine domains.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Add domain flag DOMAIN_FLAG_VIRTUAL_MACHINE (Weidong Han, 2009-01-03, 1 file, -0/+7)
  Add this flag for VT-d domains used in a virtual machine, e.g. by KVM.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* iommu coherency (Weidong Han, 2009-01-03, 1 file, -0/+24)
  In a dmar_domain, more than one iommu may be included in iommu_bmp. Because the "Coherency" capability may differ across iommus, set this variable to indicate whether iommu access is coherent or not. Iommu access of a dmar_domain is coherent only when all of its related iommus are coherent.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
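
A sketch of the recomputation this describes: walk every iommu currently set in the domain's iommu_bmp and declare the domain coherent only if each one advertises coherent page-walk accesses (ECAP bit 0 on VT-d). Names prefixed example_ are stand-ins for the driver's internals.

      #include <linux/bitops.h>
      #include <linux/types.h>

      #define EXAMPLE_MAX_IOMMUS      64

      struct example_iommu {
              u64 ecap;                       /* extended capability register */
      };

      static struct example_iommu *example_iommus[EXAMPLE_MAX_IOMMUS];

      /* VT-d: ECAP bit 0 ("C") = coherent page-walk / structure accesses. */
      static int example_ecap_coherent(u64 ecap)
      {
              return ecap & 0x1;
      }

      /* A domain is coherent only if every IOMMU set in its bitmap is. */
      static int example_domain_coherency(const unsigned long *iommu_bmp)
      {
              int i;

              for_each_set_bit(i, iommu_bmp, EXAMPLE_MAX_IOMMUS)
                      if (!example_ecap_coherent(example_iommus[i]->ecap))
                              return 0;

              return 1;
      }
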
* calculate agaw for each iommu (Weidong Han, 2009-01-03, 4 files, -0/+34)
  The "SAGAW" capability may differ across iommus. Use a default agaw, but if the default agaw is not supported by some iommu, choose a smaller agaw that it does support.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
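
The selection itself is a short loop: start from the default agaw (4-level tables on VT-d) and walk downwards until the iommu's SAGAW capability field has the corresponding bit set. A hedged sketch, with an example_ name standing in for the driver's helper.

      #include <linux/bitops.h>

      /*
       * Sketch: pick the widest supported agaw that does not exceed the
       * requested default. In the SAGAW field, bit N set means an
       * (N + 2)-level page table (agaw value N) is supported.
       */
      static int example_calculate_agaw(unsigned long sagaw, int default_agaw)
      {
              int agaw;

              for (agaw = default_agaw; agaw >= 0; agaw--)
                      if (test_bit(agaw, &sagaw))
                              break;

              return agaw;            /* -1 if no suitable width is supported */
      }
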
* iommu bitmap instead of iommu pointer in dmar_domain (Weidong Han, 2009-01-03, 1 file, -30/+67)
  In order to support assigning multiple devices from different iommus to a domain, an iommu bitmap is used to keep track of all the iommus the domain is related to.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
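
The bookkeeping around the bitmap is simple: when a device behind a given iommu joins the domain, that iommu's seq_id bit is set and a reference count is bumped, and derived state such as coherency is recomputed; the bit is cleared again when the last such device leaves. A small sketch of the attach side, with hypothetical names.

      #include <linux/bitops.h>

      /*
       * Sketch: record that a domain now spans the IOMMU identified by
       * seq_id. Returns 1 if this is the first device behind that IOMMU,
       * so the caller knows to recompute per-domain state (coherency etc.).
       */
      static int example_domain_add_iommu(unsigned long *iommu_bmp,
                                          int *iommu_count, int seq_id)
      {
              if (test_and_set_bit(seq_id, iommu_bmp))
                      return 0;               /* already tracked */

              (*iommu_count)++;
              return 1;
      }
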
* Get iommu from g_iommus for deferred flush (Weidong Han, 2009-01-03, 1 file, -3/+4)
  deferred_flush[] is indexed by iommu seq_id, so the iommu for each entry is fixed and can be obtained from g_iommus.
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Add global iommu list (Weidong Han, 2009-01-03, 1 file, -0/+25)
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* change P2P domain flags (Weidong Han, 2009-01-03, 1 file, -3/+5)
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Initialize domain flags to 0 (Weidong Han, 2009-01-03, 1 file, -0/+1)
  The flags field holds random data after the domain is allocated by kmem_cache_alloc().
  Signed-off-by: Weidong Han <weidong.han@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* VT-d: fix segment number being ignored when searching DRHD (Yu Zhao, 2009-01-03, 1 file, -18/+18)
  On platforms with multiple PCI segments, any of the segments can have a DRHD with the INCLUDE_PCI_ALL flag, so the DRHD's segment number needs to be checked against the PCI device's segment when searching for its DRHD.
  Signed-off-by: Yu Zhao <yu.zhao@intel.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* intel-iommu: trivially inline DMA PTE macros (Mark McLoughlin, 2009-01-03, 1 file, -23/+48)
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: trivially inline context entry macros (Mark McLoughlin, 2009-01-03, 1 file, -30/+55)
  Some macros were unused, so I just dropped them:
    context_fault_disable
    context_translation_type
    context_address_root
    context_address_width
    context_domain_id
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: move iommu_prepare_gfx_mapping() out of dma_remapping.h (Mark McLoughlin, 2009-01-03, 2 files, -7/+5)
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: kill off duplicate def of dmar_disabled (Mark McLoughlin, 2009-01-03, 1 file, -1/+0)
  This is only used in dmar.c and intel-iommu.h, so dma_remapping.h seems like the appropriate place for it.
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: move struct device_domain_info out of dma_remapping.h (Mark McLoughlin, 2009-01-03, 2 files, -10/+10)
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: move struct dmar_domain def out dma_remapping.h (Mark McLoughlin, 2009-01-03, 2 files, -20/+20)
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: move DMA PTE defs out of dma_remapping.h (Mark McLoughlin, 2009-01-03, 2 files, -22/+22)
  DMA_PTE_READ/WRITE are needed by kvm.
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: move context entry defs out from dma_remapping.h (Mark McLoughlin, 2009-01-03, 2 files, -38/+38)
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: move root entry defs from dma_remapping.h (Mark McLoughlin, 2009-01-03, 2 files, -33/+34)
  We keep the struct root_entry forward declaration for the pointer in struct intel_iommu.
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: move DMA_32/64BIT_PFN into intel-iommu.c (Mark McLoughlin, 2009-01-03, 2 files, -5/+3)
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: make init_dmars() static (Mark McLoughlin, 2009-01-03, 2 files, -2/+1)
  init_dmars() is not used outside of drivers/pci/intel-iommu.c.
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* intel-iommu: remove some unused struct intel_iommu fields (Mark McLoughlin, 2009-01-03, 2 files, -6/+0)
  The seg, saved_msg and sysdev fields appear to be unused since before the code was first merged. linux/msi.h is not needed in linux/intel-iommu.h anymore since there is no longer a reference to struct msi_msg. The MSI code in drivers/pci/intel-iommu.c still has linux/msi.h included via linux/dmar.h. linux/sysdev.h isn't needed because there is no reference to struct sys_device.
  Signed-off-by: Mark McLoughlin <markmc@redhat.com>
  Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* Disallow gcc versions 3.{0,1} (Ingo Molnar, 2009-01-02, 1 file, -0/+4)
  GCC 3.0 and 3.1 are too old to build a working kernel.
  Signed-off-by: Ingo Molnar <mingo@elte.hu>
  [ This check got dropped as obsolete when I simplified the gcc header inclusion mess in f153b82121b0366fe0e5f9553545cce237335175, but Willy Tarreau reports actually having those old versions still.. -Linus ]
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
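
The check itself is a handful of preprocessor lines in the gcc-3 compatibility header; it looks roughly like this (the exact file and wording in the tree may differ slightly).

      /* gcc-3 compatibility header: gcc 3.0 and 3.1 cannot build the kernel */
      #if __GNUC_MINOR__ < 2
      # error Sorry, your compiler is too old - please upgrade it.
      #endif
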
* Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2009-01-02, 198 files, -1423/+2018)
  * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...
  Fixed up trivial conflict in kernel/time/tick-sched.c manually
  * x86: export vector_used_by_percpu_irq (Ingo Molnar, 2008-12-23, 1 file, -0/+3)
    Impact: build fix
    lguest can be built as a module and makes use of this new symbol:
      ERROR: "vector_used_by_percpu_irq" [drivers/lguest/lg.ko] undefined!
    export it.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() (Suresh Siddha, 2008-12-23, 1 file, -3/+3)
    These commits:
      commit 95d313cf1c1ecedc8bec5727b09bdacbf67dfc45
      Author: Mike Travis <travis@sgi.com>
      Date:   Tue Dec 16 17:33:54 2008 -0800
          x86: Add cpu_mask_to_apicid_and
    and
      commit 6eeb7c5a99434596c5953a95baa17d2f085664e3
      Author: Mike Travis <travis@sgi.com>
      Date:   Tue Dec 16 17:33:55 2008 -0800
          x86: update add-cpu_mask_to_apicid_and to use struct cpumask*
    broke interrupt delivery on x2apic platforms. As x2apic cluster mode uses logical delivery mode, we need to use logical apicid instead of physical apicid in x2apic_cpu_mask_to_apicid_and().
    Impact: fixes the broken interrupt delivery issue on generic x2apic platforms.
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Acked-by: Mike Travis <travis@sgi.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * sched: nominate preferred wakeup cpu, fix (Vaidyanathan Srinivasan, 2008-12-23, 1 file, -1/+1)
    Andrew Morton reported:
      > kernel/sched.c: In function 'schedule':
      > kernel/sched.c:3679: warning: 'active_balance' may be used uninitialized in this function
      >
      > This warning is correct - the code is buggy.
    In sched.c load_balance_newidle, there's real potential use of uninitialised variable - fix it.
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * x86: fix lguest used_vectors breakage, -v2 (Yinghai Lu, 2008-12-23, 8 files, -24/+52)
    Impact: fix lguest, clean up
    32-bit lguest used used_vectors to record vectors, but that model of allocating vectors changed and got broken after we changed vector allocation to a per_cpu array. Try to enable that for 64-bit too; the array is used for all vectors that are not managed by the vector_irq per_cpu array. Also kill system_vectors[], which is now a duplication of the used_vectors bitmap.
    [ merged in cpus4096 due to io_apic.c cpumask changes. ]
    [ -v2, fix build failure ]
    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * x86: fix warning in arch/x86/kernel/io_apic.c (Ingo Molnar, 2008-12-19, 1 file, -1/+1)
    this warning:
      arch/x86/kernel/io_apic.c: In function ‘ir_set_msi_irq_affinity’:
      arch/x86/kernel/io_apic.c:3373: warning: ‘cfg’ may be used uninitialized in this function
    triggers because the variable was truly uninitialized. We'd crash on entering this code.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * sched: fix warning in kernel/sched.c (Ingo Molnar, 2008-12-19, 1 file, -1/+1)
    Impact: fix cpumask conversion bug
    this warning:
      kernel/sched.c: In function ‘find_busiest_group’:
      kernel/sched.c:3429: warning: passing argument 1 of ‘__first_cpu’ from incompatible pointer type
    shows that we forgot to convert a new patch to the new cpumask APIs.
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * sched: move test_sd_parent() to an SMP section of sched.h (Ingo Molnar, 2008-12-19, 1 file, -9/+9)
    Impact: build fix
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 (Vaidyanathan Srinivasan, 2008-12-19, 2 files, -2/+17)
    Impact: change task balancing to save power more aggressively
    Add the SD_BALANCE_NEWIDLE flag at the MC level and CPU level if sched_mc is set. This helps power savings and will not affect performance when sched_mc=0.
    Ingo and Mike Galbraith have optimised the SD flags by removing SD_BALANCE_NEWIDLE at MC and CPU level. This helps performance but hurts power savings, since it slows down task consolidation by reducing the number of times load_balance is run:
      sched: fine-tune SD_MC_INIT
      commit 14800984706bf6936bbec5187f736e928be5c218
      Author: Mike Galbraith <efault@gmx.de>
      Date:   Fri Nov 7 15:26:50 2008 +0100
      sched: re-tune balancing -- revert
      commit 9fcd18c9e63e325dbd2b4c726623f760788d5aa8
      Author: Ingo Molnar <mingo@elte.hu>
      Date:   Wed Nov 5 16:52:08 2008 +0100
    This patch selectively enables the SD_BALANCE_NEWIDLE flag only when sched_mc is set to 1 or 2. This helps power savings by task consolidation and also does not hurt performance at sched_mc=0, where all power saving optimisations are turned off.
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * sched: activate active load balancing in new idle cpus (Vaidyanathan Srinivasan, 2008-12-19, 1 file, -0/+54)
    Impact: tweak task balancing to save power more aggressively
    Active load balancing is a process by which the migration thread is woken up on the target CPU in order to pull a currently running task on another package into this newly idle package. This method is already in use by normal load_balance(); this patch introduces it for newly idle cpus when sched_mc is set to POWERSAVINGS_BALANCE_WAKEUP. This logic provides effective consolidation of short running daemon jobs in an almost idle system.
    The side effect of this patch may be ping-ponging of tasks if the system is moderately utilised; the iterations before triggering may need to be adjusted.
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
  * sched: bias task wakeups to preferred semi-idle packages (Vaidyanathan Srinivasan, 2008-12-19, 1 file, -0/+18)
    Impact: tweak task wakeup to save power more aggressively
    The preferred wakeup cpu (from a semi-idle package) has been nominated in find_busiest_group() in the previous patch. Use this information, stored in sched_mc_preferred_wakeup_cpu, in wake_idle() to bias task wakeups if the following conditions are satisfied:
      - The present cpu that is trying to wake up the process is idle, and waking the target process on this cpu will potentially wake up a completely idle package
      - The previous cpu on which the target process ran is also idle, and hence selecting the previous cpu may wake up a semi-idle cpu package
      - The task being woken up is allowed to run on the nominated cpu (cpu affinity and restrictions)
    Basically, if both the current cpu and the previous cpu on which the task ran are idle, select the nominated cpu from the semi-idle cpu package for running the new task that is waking up. Cache hotness is considered, since the actual biasing happens in wake_idle() only if the application is cache cold.
    This technique will effectively move short running bursty jobs in a mostly idle system. Wakeup biasing for power savings gets automatically disabled if system utilisation increases, due to the fact that the probability of finding both this_cpu and prev_cpu idle decreases.
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
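
Reduced to its core, the bias described in the entry above amounts to one extra test in the wakeup path: if both the waking cpu and the task's previous cpu are idle and the task is allowed on the nominated cpu, pick the nominated cpu from the semi-idle package; otherwise fall back to the normal choice. A hedged sketch of just that decision, with made-up helper names; the real wake_idle() has more conditions.

      #include <linux/cpumask.h>

      /*
       * Sketch of the wakeup bias: consolidate onto a semi-idle package
       * only when waking here or on the previous cpu would otherwise
       * wake a completely idle package.
       */
      static int example_biased_wakeup_cpu(int this_cpu, int prev_cpu,
                                           int nominated_cpu,
                                           const struct cpumask *allowed,
                                           int (*cpu_is_idle)(int cpu))
      {
              if (cpu_is_idle(this_cpu) && cpu_is_idle(prev_cpu) &&
                  cpumask_test_cpu(nominated_cpu, allowed))
                      return nominated_cpu;

              return prev_cpu;        /* usual choice */
      }
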
  * sched: nominate preferred wakeup cpu (Vaidyanathan Srinivasan, 2008-12-19, 1 file, -0/+12)
    Impact: extend load-balancing code (no change in behavior yet)
    When system utilisation is low and more cpus are idle, a process waking up from sleep should prefer to wake up an idle cpu from a semi-idle cpu package (multi-core package) rather than a completely idle cpu package, which would waste power. Use the sched_mc balance logic in find_busiest_group() to nominate a preferred wakeup cpu.
    This info could be stored in the appropriate sched_domain, but updating it in all copies of the sched_domain is not practical. Hence it is stored in the root_domain struct, of which there is one copy per partitioned sched domain and which can be accessed from each cpu's runqueue.
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>