summaryrefslogtreecommitdiffstats
path: root/virt
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | KVM: use after free in kvm_ioctl_create_device()Dan Carpenter2016-12-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should move the ops->destroy(dev) after the list_del(&dev->vm_node) so that we don't use "dev" after freeing it. Fixes: a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | | Merge tag 'kvm-arm-for-4.9-rc7' of ↵Radim Krčmář2016-12-012-4/+8
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc7 - Do not call kvm_notify_acked for PPIs
| | * | | KVM: arm/arm64: vgic: Don't notify EOI for non-SPIsMarc Zyngier2016-11-242-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we inject a level triggerered interrupt (and unless it is backed by the physical distributor - timer style), we request a maintenance interrupt. Part of the processing for that interrupt is to feed to the rest of KVM (and to the eventfd subsystem) the information that the interrupt has been EOIed. But that notification only makes sense for SPIs, and not PPIs (such as the PMU interrupt). Skip over the notification if the interrupt is not an SPI. Cc: stable@vger.kernel.org # 4.7+ Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | | | | kvm: Introduce kvm_write_guest_offset_cached()Pan Xinhui2016-11-221-6/+14
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It allows us to update some status or field of a structure partially. We can also save a kvm_read_guest_cached() call if we just update one fild of the struct regardless of its current value. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | KVM: async_pf: avoid recursive flushing of work itemsPaolo Bonzini2016-11-191-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was reported by syzkaller: [ INFO: possible recursive locking detected ] 4.9.0-rc4+ #49 Not tainted --------------------------------------------- kworker/2:1/5658 is trying to acquire lock: ([ 1644.769018] (&work->work) [< inline >] list_empty include/linux/compiler.h:243 [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511 but task is already holding lock: ([ 1644.769018] (&work->work) [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093 stack backtrace: CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events async_pf_execute ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480 0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27 ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0 Call Trace: ... [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846 [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916 [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951 [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126 [< inline >] kvm_free_vcpus arch/x86/kvm/x86.c:7841 [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946 [< inline >] kvm_destroy_vm virt/kvm/kvm_main.c:731 [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752 [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111 [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096 [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230 [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209 [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 The reason is that kvm_put_kvm is causing the destruction of the VM, but the page fault is still on the ->queue list. The ->queue list is owned by the VCPU, not by the work items, so we cannot just add list_del to the work item. Instead, use work->vcpu to note async page faults that have been resolved and will be processed through the done list. There is no need to flush those. Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* | | | Merge tag 'kvm-arm-for-4.9-rc6' of ↵Radim Krčmář2016-11-191-3/+5
|\| | | | |/ / |/| | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc6 - Fix handling of the 32bit cycle counter - Fix cycle counter filtering
| * | KVM: arm64: Fix the issues when guest PMCCFILTR is configuredWei Huang2016-11-181-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured. But this function can't deals with PMCCFILTR correctly because the evtCount bits of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this function shouldn't return immediately; instead it needs to check further if select_idx is ARMV8_PMU_CYCLE_IDX. Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER blindly to attr.config. Instead it ought to convert the request to the "cpu cycle" event type (i.e. 0x11). To support this patch and to prevent duplicated definitions, a limited set of ARMv8 perf event types were relocated from perf_event.c to asm/perf_event.h. Cc: stable@vger.kernel.org # 4.6+ Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | | Merge tag 'kvm-arm-for-v4.9-rc4' of ↵Paolo Bonzini2016-11-113-21/+46
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for v4.9-rc4 - Kick the vcpu when a pending interrupt becomes pending again - Prevent access to invalid interrupt registers - Invalid TLBs when two vcpus from the same VM share a CPU
| * | KVM: arm/arm64: vgic: Kick VCPUs when queueing already pending IRQsShih-Wei Li2016-11-041-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cases like IPI, we could be queueing an interrupt for a VCPU that is already running and is not about to exit, because the VCPU has entered the VM with the interrupt pending and would not trap on EOI'ing that interrupt. This could result to delays in interrupt deliveries or even loss of interrupts. To guarantee prompt interrupt injection, here we have to try to kick the VCPU. Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: vgic: Prevent access to invalid SPIsAndre Przywara2016-11-042-21/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In our VGIC implementation we limit the number of SPIs to a number that the userland application told us. Accordingly we limit the allocation of memory for virtual IRQs to that number. However in our MMIO dispatcher we didn't check if we ever access an IRQ beyond that limit, leading to out-of-bound accesses. Add a test against the number of allocated SPIs in check_region(). Adjust the VGIC_ADDR_TO_INT macro to avoid an actual division, which is not implemented on ARM(32). [maz: cleaned-up original patch] Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-11-042-3/+25
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Check memopp before dereference (CVE-2016-8630) kvm: nVMX: VMCLEAR an active shadow VMCS after last use KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK KVM: x86: fix wbinvd_dirty_mask use-after-free kvm/x86: Show WRMSR data is in hex kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types KVM: document lock orders KVM: fix OOPS on flush_work KVM: s390: Fix STHYI buffer alignment for diag224 KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Make ERET handle ERL before EXL KVM: MIPS: Fix lazy user ASID regenerate for SMP
| * | KVM: fix OOPS on flush_workPaolo Bonzini2016-10-262-3/+25
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion done by commit 3706feacd007 ("KVM: Remove deprecated create_singlethread_workqueue") is broken. It flushes a single work item &irqfd->shutdown instead of all of them, and even worse if there is no irqfd on the list then you get a NULL pointer dereference. Revert the virt/kvm/eventfd.c part of that patch; to avoid the deprecated function, just allocate our own workqueue---it does not even have to be unbound---with alloc_workqueue. Fixes: 3706feacd007 Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* / mm: unexport __get_user_pages()Lorenzo Stoakes2016-10-241-6/+4
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch unexports the low-level __get_user_pages() function. Recent refactoring of the get_user_pages* functions allow flags to be passed through get_user_pages() which eliminates the need for access to this function from its one user, kvm. We can see that the two calls to get_user_pages() which replace __get_user_pages() in kvm_main.c are equivalent by examining their call stacks: get_user_page_nowait(): get_user_pages(start, 1, flags, page, NULL) __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, start, 1, flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL) check_user_page_hwpoison(): get_user_pages(addr, 1, flags, NULL, NULL) __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL, NULL, NULL) Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove write/force parameters from __get_user_pages_unlocked()Lorenzo Stoakes2016-10-182-4/+10
| | | | | | | | | | | | | This removes the redundant 'write' and 'force' parameters from __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'kvm-arm-for-v4.9' of ↵Radim Krčmář2016-09-2914-147/+694
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM Changes for v4.9 - Various cleanups and removal of redundant code - Two important fixes for not using an in-kernel irqchip - A bit of optimizations - Handle SError exceptions and present them to guests if appropriate - Proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - Preparations for GICv3 save/restore, including ABI docs
| * KVM: arm/arm64: vgic: Don't flush/sync without a working vgicChristoffer Dall2016-09-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If the vgic hasn't been created and initialized, we shouldn't attempt to look at its data structures or flush/sync anything to the GIC hardware. This fixes an issue reported by Alexander Graf when using a userspace irqchip. Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Cc: stable@vger.kernel.org Reported-by: Alexander Graf <agraf@suse.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: Require in-kernel irqchip for PMU supportChristoffer Dall2016-09-271-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If userspace creates a PMU for the VCPU, but doesn't create an in-kernel irqchip, then we end up in a nasty path where we try to take an uninitialized spinlock, which can lead to all sorts of breakages. Luckily, QEMU always creates the VGIC before the PMU, so we can establish this as ABI and check for the VGIC in the PMU init stage. This can be relaxed at a later time if we want to support PMU with a userspace irqchip. Cc: stable@vger.kernel.org Cc: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * ARM: KVM: Support vgic-v3Vladimir Murzin2016-09-224-66/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows to build and use vgic-v3 in 32-bit mode. Unfortunately, it can not be split in several steps without extra stubs to keep patches independent and bisectable. For instance, virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre to be already defined. It is how support has been done: * handle SGI requests from the guest * report configured SRE on access to GICv3 cpu interface from the guest * required vgic-v3 macros are provided via uapi.h * static keys are used to select GIC backend * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with the static inlines Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm: vgic: Support 64-bit data manipulation on 32-bit host systemsVladimir Murzin2016-09-222-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have couple of 64-bit registers defined in GICv3 architecture, so unsigned long accesses to these registers will only access a single 32-bit part of that regitser. On the other hand these registers can't be accessed as 64-bit with a single instruction like ldrd/strd or ldmia/stmia if we run a 32-bit host because KVM does not support access to MMIO space done by these instructions. It means that a 32-bit guest accesses these registers in 32-bit chunks, so the only thing we need to do is to ensure that extract_bytes() always takes 64-bit data. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm: vgic: Fix compiler warnings when built for 32-bitVladimir Murzin2016-09-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Well, this patch is looking ahead of time, but we'll get following compiler warnings as soon as we introduce vgic-v3 to 32-bit world CC arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer': arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow] value = (mpidr & GENMASK(23, 0)) << 32; ^ In file included from ./include/linux/kernel.h:10:0, from ./include/asm-generic/bug.h:13, from ./arch/arm/include/asm/bug.h:59, from ./include/linux/bug.h:4, from ./include/linux/io.h:23, from ./arch/arm/include/asm/arch_gicv3.h:23, from ./include/linux/irqchip/arm-gic-v3.h:411, from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14: arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi': ./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow] #define BIT(nr) (1UL << (nr)) ^ arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT' broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT); ^ Let's fix them now. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: vgic-its: Introduce config option to guard ITS specific codeVladimir Murzin2016-09-223-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | By now ITS code guarded with KVM_ARM_VGIC_V3 config option which was introduced to hide everything specific to vgic-v3 from 32-bit world. We are going to support vgic-v3 in 32-bit world and KVM_ARM_VGIC_V3 will gone, but we don't have support for ITS there yet and we need to continue keeping ITS away. Introduce the new config option to prevent ITS code being build in 32-bit mode when support for vgic-v3 is done. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: Move vgic-v3 save/restore to virt/kvm/arm/hypVladimir Murzin2016-09-221-0/+328
| | | | | | | | | | | | | | | | So we can reuse the code under arch/arm Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: Use static keys for selecting the GIC backendVladimir Murzin2016-09-222-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently GIC backend is selected via alternative framework and this is fine. We are going to introduce vgic-v3 to 32-bit world and there we don't have patching framework in hand, so we can either check support for GICv3 every time we need to choose which backend to use or try to optimise it by using static keys. The later looks quite promising because we can share logic involved in selecting GIC backend between architectures if both uses static keys. This patch moves arm64 from alternative to static keys framework for selecting GIC backend. For that we embed static key into vgic_global and enable the key during vgic initialisation based on what has already been exposed by the host GIC driver. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: ARM: cleanup kvm_timer_hyp_initPaolo Bonzini2016-09-081-5/+1
| | | | | | | | | | | | | | | | Remove two unnecessary labels now that kvm_timer_hyp_init is not creating its own workqueue anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: Inject a vSerror if detecting a bad GICV access at EL2Marc Zyngier2016-09-081-5/+16
| | | | | | | | | | | | | | | | | | If, when proxying a GICV access at EL2, we detect that the guest is doing something silly, report an EL1 SError instead ofgnoring the access. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: vgic-v2: Enable GICV access from HYP if access from guest is unsafeMarc Zyngier2016-09-081-27/+42
| | | | | | | | | | | | | | | | | | | | | | So far, we've been disabling KVM on systems where the GICV region couldn't be safely given to a guest. Now that we're able to handle this access safely by emulating it in HYP, we can enable this feature when we detect an unsafe configuration. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: vgic-v2: Add GICV access from HYPMarc Zyngier2016-09-081-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have the necessary infrastructure to handle MMIO accesses in HYP, perform the GICV access on behalf of the guest. This requires checking that the access is strictly 32bit, properly aligned, and falls within the expected range. When all condition are satisfied, we perform the access and tell the rest of the HYP code that the instruction has been correctly emulated. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: vgic-v2: Add the GICV emulation infrastructureMarc Zyngier2016-09-082-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to efficiently perform the GICV access on behalf of the guest, we need to be able to avoid going back all the way to the host kernel. For this, we introduce a new hook in the world switch code, conveniently placed just after populating the fault info. At that point, we only have saved/restored the GP registers, and we can quickly perform all the required checks (data abort, translation fault, valid faulting syndrome, not an external abort, not a PTW). Coming back from the emulation code, we need to skip the emulated instruction. This involves an additional bit of save/restore in order to be able to access the guest's PC (and possibly CPSR if this is a 32bit guest). At this stage, no emulation code is provided. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: Make kvm_skip_instr32 available to HYPMarc Zyngier2016-09-081-2/+3
| | | | | | | | | | | | | | | | | | As we plan to do some emulation at HYP, let's make kvm_skip_instr32 as part of the hyp_text section. This doesn't preclude the kernel from using it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm: KVM: Use common AArch32 conditional execution codeMarc Zyngier2016-09-081-0/+5
| | | | | | | | | | | | | | | | Add the bit of glue and const-ification that is required to use the code inherited from the arm64 port, and move over to it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: Move the AArch32 conditional execution to common codeMarc Zyngier2016-09-081-0/+146
| | | | | | | | | | | | | | | | | | | | | | | | It would make some sense to share the conditional execution code between 32 and 64bit. In order to achieve this, let's move that code to virt/kvm/arm/aarch32.c. While we're at it, drop a superfluous BUG_ON() that wasn't that useful. Following patches will migrate the 32bit port to that code base. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm: vgic: Drop build compatibility hack for older kernel versionsMarc Zyngier2016-09-081-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | As kvm_set_routing_entry() was changing prototype between 4.7 and 4.8, an ugly hack was put in place in order to survive both building in -next and the merge window. Now that everything has been merged, let's dump the compatibility hack for good. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Rename vgic_attr_regs_access to vgic_attr_regs_access_v2Christoffer Dall2016-09-081-15/+11
| | | | | | | | | | | | | | | | | | | | | | Just a rename so we can implement a v3-specific function later. We take the chance to get rid of the V2/V3 ops comments as well. No functional change. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Factor out vgic_attr_regs_access functionalityChristoffer Dall2016-09-081-27/+73
| | | | | | | | | | | | | | | | | | | | | | As we are about to deal with multiple data types and situations where the vgic should not be initialized when doing userspace accesses on the register attributes, factor out the functionality of vgic_attr_regs_access into smaller bits which can be reused by a new function later. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
* | kvm: create per-vcpu dirs in debugfsLuiz Capitulino2016-09-161-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds the ability for archs to export per-vcpu information via a new per-vcpu dir in the VM's debugfs directory. If kvm_arch_has_vcpu_debugfs() returns true, then KVM will create a vcpu dir for each vCPU in the VM's debugfs directory. Then kvm_arch_create_vcpu_debugfs() is responsible for populating each vcpu directory with arch specific entries. The per-vcpu path in debugfs will look like: /sys/kernel/debug/kvm/29162-10/vcpu0 /sys/kernel/debug/kvm/29162-10/vcpu1 This is all arch specific for now because the only user of this interface (x86) wants to export x86-specific per-vcpu information to user-space. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | kvm: kvm_destroy_vm_debugfs(): check debugfs_stat_data pointerLuiz Capitulino2016-09-161-3/+5
| | | | | | | | | | | | | | | | | | This make it possible to call kvm_destroy_vm_debugfs() from kvm_create_vm_debugfs() in error conditions. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini2016-09-131-2/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Paul Mackerras writes: The highlights are: * Reduced latency for interrupts from PCI pass-through devices, from Suresh Warrier and me. * Halt-polling implementation from Suraj Jitindar Singh. * 64-bit VCPU statistics, also from Suraj. * Various other minor fixes and improvements.
| * KVM: Add provisioning for ulong vm stats and u64 vcpu statsSuraj Jitindar Singh2016-09-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar2016-09-073-34/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The workqueue "irqfd_cleanup_wq" queues a single work item &irqfd->shutdown and hence doesn't require ordering. It is a host-wide workqueue for issuing deferred shutdown requests aggregated from all vm* instances. It is not being used on a memory reclaim path. Hence, it has been converted to use system_wq. The work item has been flushed in kvm_irqfd_release(). The workqueue "wqueue" queues a single work item &timer->expired and hence doesn't require ordering. Also, it is not being used on a memory reclaim path. Hence, it has been converted to use system_wq. System workqueues have been able to handle high level of concurrency for a long time now and hence it's not required to have a singlethreaded workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue created with create_singlethread_workqueue(), system_wq allows multiple work items to overlap executions even on the same CPU; however, a per-cpu workqueue doesn't have any CPU locality or global ordering guarantee unless the target CPU is explicitly specified and thus the increase of local concurrency shouldn't make any difference. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge tag 'kvm-arm-for-v4.8-rc3' of ↵Paolo Bonzini2016-08-186-60/+159
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Fixes for v4.8-rc3 This tag contains the following fixes on top of v4.8-rc1: - ITS init issues - ITS error handling issues - ITS IRQ leakage fix - Plug a couple of ITS race conditions - An erratum workaround for timers - Some removal of misleading use of errors and comments - A fix for GICv3 on 32-bit guests
| * KVM: arm/arm64: timer: Workaround misconfigured timer interruptMarc Zyngier2016-08-171-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Similarily to f005bd7e3b84 ("clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered"), make sure we can survive an interrupt that has been misconfigured as edge-triggered by forcing it to be level-triggered (active low is assumed, but the GIC doesn't really care whether this is high or low). Hopefully, the amount of shouting in the kernel log will convince the user to do something about their firmware. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: ITS: avoid re-mapping LPIsAndre Przywara2016-08-161-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | When a guest wants to map a device-ID/event-ID combination that is already mapped, we may end up in a situation where an LPI is never "put", thus never being freed. Since the GICv3 spec says that mapping an already mapped LPI is UNPREDICTABLE, lets just bail out early in this situation to avoid any potential leaks. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: check for ITS device on MSI injectionAndre Przywara2016-08-151-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When userspace provides the doorbell address for an MSI to be injected into the guest, we find a KVM device which feels responsible. Lets check that this device is really an emulated ITS before we make real use of the container_of-ed pointer. [ Moved NULL-pointer check to caller of static function - Christoffer ] Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: ITS: move ITS registration into first VCPU runAndre Przywara2016-08-153-10/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we register an ITS device upon userland issuing the CTLR_INIT ioctl to mark initialization of the ITS as done. This deviates from the initialization sequence of the existing GIC devices and does not play well with the way QEMU handles things. To be more in line with what we are used to, register the ITS(es) just before the first VCPU is about to run, so in the map_resources() call. This involves iterating through the list of KVM devices and map each ITS that we find. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomicChristoffer Dall2016-08-151-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two problems with the current implementation of the MMIO handlers for the propbaser and pendbaser: First, the write to the value itself is not guaranteed to be an atomic 64-bit write so two concurrent writes to the structure field could be intermixed. Second, because we do a read-modify-update operation without any synchronization, if we have two 32-bit accesses to separate parts of the register, we can loose one of them. By using the atomic cmpxchg64 we should cover both issues above. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: vgic-its: Plug race in vgic_put_irqChristoffer Dall2016-08-101-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now the following sequence of events can happen: 1. Thread X calls vgic_put_irq 2. Thread Y calls vgic_add_lpi 3. Thread Y gets lpi_list_lock 4. Thread X drops the ref count to 0 and blocks on lpi_list_lock 5. Thread Y finds the irq via the lpi_list_lock, raises the ref count to 1, and release the lpi_list_lock. 6. Thread X proceeds and frees the irq. Avoid this by holding the spinlock around the kref_put. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: vgic-its: Handle errors from vgic_add_lpiChristoffer Dall2016-08-101-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During low memory conditions, we could be dereferencing a NULL pointer when vgic_add_lpi fails to allocate memory. Consider for example this call sequence: vgic_its_cmd_handle_mapi itte->irq = vgic_add_lpi(kvm, lpi_nr); update_lpi_config(kvm, itte->irq, NULL); ret = kvm_read_guest(kvm, propbase + irq->intid ^^^^ kaboom? Instead, return an error pointer from vgic_add_lpi and check the return value from its single caller. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm64: ITS: return 1 on successful MSI injectionAndre Przywara2016-08-091-19/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the KVM API documentation a successful MSI injection should return a value > 0 on success. Return possible errors in vgic_its_trigger_msi() and report a successful injection back to userland, while also reporting the case where the MSI could not be delivered due to the guest not having the LPI mapped, for instance. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* | KVM: Protect device ops->create and list_add with kvm->lockChristoffer Dall2016-08-122-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM devices were manipulating list data structures without any form of synchronization, and some implementations of the create operations also suffered from a lack of synchronization. Now when we've split the xics create operation into create and init, we can hold the kvm->lock mutex while calling the create operation and when manipulating the devices list. The error path in the generic code gets slightly ugly because we have to take the mutex again and delete the device from the list, but holding the mutex during anon_inode_getfd or releasing/locking the mutex in the common non-error path seemed wrong. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* | KVM: PPC: Move xics_debugfs_init out of createChristoffer Dall2016-08-121-0/+3
|/ | | | | | | | | | | | | | As we are about to hold the kvm->lock during the create operation on KVM devices, we should move the call to xics_debugfs_init into its own function, since holding a mutex over extended amounts of time might not be a good idea. Introduce an init operation on the kvm_device_ops struct which cannot fail and call this, if configured, after the device has been created. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>