summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* kvm: avoid page allocation failure in kvm_set_memory_region()Igor Mammedov2015-03-231-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: call irq notifiers with directed EOIRadim Krčmář2015-03-232-3/+4
| | | | | | | | | | | | | kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled. We need to do that for irq notifiers. (Like with edge interrupts.) Fix it by skipping EOI broadcast only. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211 Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: nVMX: mask unrestricted_guest if disabled on L0Radim Krčmář2015-03-171-2/+5
| | | | | | | | | | | | | | | | | | If EPT was enabled, unrestricted_guest was allowed in L1 regardless of L0. L1 triple faulted when running L2 guest that required emulation. Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)' in L0's dmesg: WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] () Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when the host doesn't have it enabled. Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0") Cc: stable@vger.kernel.org Tested-By: Kashyap Chamarthy <kchamart@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2015-03-1713-73/+66
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes from all around the place: - a KASLR related revert where we ran out of time to get a fix - this represents a substantial portion of the diffstat, - two FPU fixes, - two x86 platform fixes: an ACPI reduced-hw fix and a NumaChip fix, - an entry code fix, - and a VDSO build fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86/mm/ASLR: Propagate base load address calculation" x86/fpu: Drop_fpu() should not assume that tsk equals current x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() x86/apic/numachip: Fix sibling map with NumaChip x86/platform, acpi: Bypass legacy PIC and PIT in ACPI hardware reduced mode x86/asm/entry/32: Fix user_mode() misuses x86/vdso: Fix the build on GCC5
| * Revert "x86/mm/ASLR: Propagate base load address calculation"Borislav Petkov2015-03-167-61/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation") The main reason for the revert is that the new boot flag does not work at all currently, and in order to make this work, we need non-trivial changes to the x86 boot code which we didn't manage to get done in time for merging. And even if we did, they would've been too risky so instead of rushing things and break booting 4.1 on boxes left and right, we will be very strict and conservative and will take our time with this to fix and test it properly. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/fpu: Drop_fpu() should not assume that tsk equals currentOleg Nesterov2015-03-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drop_fpu() does clear_used_math() and usually this is correct because tsk == current. However switch_fpu_finish()->restore_fpu_checking() is called before __switch_to() updates the "current_task" variable. If it fails, we will wrongly clear the PF_USED_MATH flag of the previous task. So use clear_stopped_child_used_math() instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/fpu: Avoid math_state_restore() without used_math() in ↵Oleg Nesterov2015-03-131-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __restore_xstate_sig() math_state_restore() assumes it is called with irqs disabled, but this is not true if the caller is __restore_xstate_sig(). This means that if ia32_fxstate == T and __copy_from_user() fails, __restore_xstate_sig() returns with irqs disabled too. This triggers: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41 dump_stack ___might_sleep ? _raw_spin_unlock_irqrestore __might_sleep down_read ? _raw_spin_unlock_irqrestore print_vma_addr signal_fault sys32_rt_sigreturn Change __restore_xstate_sig() to call set_used_math() unconditionally. This avoids enabling and disabling interrupts in math_state_restore(). If copy_from_user() fails, we can simply do fpu_finit() by hand. [ Note: this is only the first step. math_state_restore() should not check used_math(), it should set this flag. While init_fpu() should simply die. ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/apic/numachip: Fix sibling map with NumaChipDaniel J Blueman2015-03-121-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On NumaChip systems, the physical processor ID assignment wasn't accounting for the number of nodes in AMD multi-module processors, giving an incorrect sibling map: $ cd /sys/devices/system/cpu/cpu29/topology $ grep . * core_id:5 core_siblings:00000000,ff000000 core_siblings_list:24-31 physical_package_id:3 thread_siblings:00000000,30000000 thread_siblings_list:28-29 This fixes it: $ cd /sys/devices/system/cpu/cpu29/topology $ grep . * core_id:5 core_siblings:00000000,ffff0000 core_siblings_list:16-31 physical_package_id:1 thread_siblings:00000000,30000000 thread_siblings_list:28-29 Signed-off-by: Daniel J Blueman <daniel@numascale.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426135950-10110-1-git-send-email-daniel@numascale.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/platform, acpi: Bypass legacy PIC and PIT in ACPI hardware reduced modeLi, Aubrey2015-03-121-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a platform in ACPI Hardware-reduced mode, the legacy PIC and PIT may not be initialized even though they may be present in silicon. Touching these legacy components causes unexpected results on the system. On the Bay Trail-T(ASUS-T100) platform, touching these legacy components blocks platform hardware low idle power state(S0ix) during system suspend. So we should bypass them in ACPI hardware reduced mode. Suggested-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Li Aubrey <aubrey.li@linux.intel.com> Cc: <alan@linux.intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/54FFF81C.20703@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/asm/entry/32: Fix user_mode() misusesAndy Lutomirski2015-03-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The one in do_debug() is probably harmless, but better safe than sorry. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d67deaa9df5458363623001f252d1aee3215d014.1425948056.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/vdso: Fix the build on GCC5Jiri Slaby2015-03-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On gcc5 the kernel does not link: ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670. Because prior GCC versions always emitted NOPs on ALIGN directives, but gcc5 started omitting them. .LSTARTFDEDLSI1 says: /* HACK: The dwarf2 unwind routines will subtract 1 from the return address to get an address in the middle of the presumed call instruction. Since we didn't get here via a call, we need to include the nop before the real start to make up for it. */ .long .LSTART_sigreturn-1-. /* PC-relative start address */ But commit 69d0627a7f6e ("x86 vDSO: reorder vdso32 code") from 2.6.25 replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before __kernel_sigreturn. Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN". So fix this by adding to that point at least a single NOP and make the function ALIGN possibly with more NOPs then. Kudos for reporting and diagnosing should go to Richard. Reported-by: Richard Biener <rguenther@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
| |
| \
*-. \ Merge branches 'perf-urgent-for-linus' and 'timers-urgent-for-linus' of ↵Linus Torvalds2015-03-174-7/+9
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf and timer fixes from Ingo Molnar: "Two small perf fixes: - kernel side context leak fix - tooling crash fix And two clocksource driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix context leak in put_event() perf annotate: Fix fallback to unparsed disassembler line * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents: sun5i: Fix setup_irq init sequence clocksource: efm32: Fix a NULL pointer dereference
| | * \ Merge branch 'clockevents/4.0-rc2' of ↵Ingo Molnar2015-03-052-6/+6
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | http://git.linaro.org/people/daniel.lezcano/linux into timers/urgent Pull clockevents fixes from Daniel Lezcano: " These two patches fix a potential crash at boot time. - Fix setup_irq / clockevents_config_and_register init ordering in order to prevent to have an interrupt to be fired before the handler is set for sun5i and efm32. (Yongbae Park)" Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | * | clockevents: sun5i: Fix setup_irq init sequenceYongbae Park2015-03-051-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The interrupt is enabled before the handler is set. Even this bug did not appear, it is potentially dangerous as it can lead to a NULL pointer dereference. Fix the error by enabling the interrupt after clockevents_config_and_register() is called. Cc: stable@vger.kernel.org Signed-off-by: Yongbae Park <yongbae2@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | | * | clocksource: efm32: Fix a NULL pointer dereferenceYongbae Park2015-03-051-2/+2
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initialisation of the efm32 clocksource first sets up the irq and only after that initialises the data needed for irq handling. In case this initialisation is delayed the irq handler would dereference a NULL pointer. I'm not aware of anything that could delay the process in such a way, but it's better to be safe than sorry, so setup the irq only when the clock event device is ready. Cc: stable@vger.kernel.org Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Yongbae Park <yongbae2@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| * | | perf: Fix context leak in put_event()Leon Yu2015-03-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: a83fe28e2e45 ("perf: Fix put_event() ctx lock") changed the locking logic in put_event() by replacing mutex_lock_nested() with perf_event_ctx_lock_nested(), but didn't fix the subsequent mutex_unlock() with a correct counterpart, perf_event_ctx_unlock(). Contexts are thus leaked as a result of incremented refcount in perf_event_ctx_lock_nested(). Signed-off-by: Leon Yu <chianglungyu@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: a83fe28e2e45 ("perf: Fix put_event() ctx lock") Link: http://lkml.kernel.org/r/1424954613-5034-1-git-send-email-chianglungyu@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar2015-03-051-0/+2
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fix from Arnaldo Carvalho de Melo: " - Fix SEGFAULT when freeing unparsed lock prefixed instructions in 'perf annotate' (Arnaldo Carvalho de Melo)" Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | perf annotate: Fix fallback to unparsed disassembler lineArnaldo Carvalho de Melo2015-03-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When annotating source/disasm lines the perf tools parse the output of objdump, trying to provide augmented output that allows navigating jumps, calls, etc. But when a line output by objdump can't be parsed the annotation code falls back to just presenting the unparsed line. When fixing a leak in the 0fb9f2aab738 commit ("perf annotate: Fix memory leaks in LOCK handling") we failed to take that into account and instead tried to free one of the data structures that should be freed only when successfully allocated, oops, segfault. There was a change in the way the objdump output for lock prefixed instructions is formatted that lead the relevant parser to fail to grok it. At least RHEL7 works ok, but Fedora 20 segfaults. Fix it by making the ins__delete() destructor work like the most basic destructor: free(). Namely make it accept a NULL pointer and when handling it just do nothing. Further investigation is needed to figure out the nature of the objdump output change so as to make the parser grok it. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rabin Vincent <rabin@rab.in> Link: http://lkml.kernel.org/n/tip-7wsy0zo292pif0yjoqpfryrz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | | Merge tag 'regulator-fix-v4.0-rc4' of ↵Linus Torvalds2015-03-172-17/+18
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "The two main fixes here from Javier and Doug both fix issues seen on the Exynos-based ARM Chromebooks with reference counting of GPIO regulators over system suspend. The GPIO enable code didn't properly take account of this case (a full analysis is in Doug's commit log). This is fixed by both fixing the reference counting directly and by making the resume code skip enables it doesn't need to do. We could skip the change in the resume code but it's a very simple change and adds extra robustness against problems in other drivers" * tag 'regulator-fix-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: tps65910: Add missing #include <linux/of.h> regulator: core: Fix enable GPIO reference counting regulator: Only enable disabled regulators on resume
| | \ \ \ \
| | \ \ \ \
| *-. \ \ \ \ Merge remote-tracking branches 'regulator/fix/gpio-enable' and ↵Mark Brown2015-03-162-17/+18
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | 'regulator/fix/tps65910' into regulator-linus
| | | * | | | | regulator: tps65910: Add missing #include <linux/of.h>Geert Uytterhoeven2015-03-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/regulator/tps65910-regulator.c: In function ‘tps65910_parse_dt_reg_data’: drivers/regulator/tps65910-regulator.c:1018: error: implicit declaration of function ‘of_get_child_by_name’ drivers/regulator/tps65910-regulator.c:1018: warning: assignment makes pointer from integer without a cast drivers/regulator/tps65910-regulator.c:1034: error: implicit declaration of function ‘of_node_put’ drivers/regulator/tps65910-regulator.c:1056: error: implicit declaration of function ‘of_property_read_u32’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Mark Brown <broonie@kernel.org>
| | * | | | | | regulator: core: Fix enable GPIO reference countingDoug Anderson2015-03-081-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally _regulator_do_enable() isn't called on an already-enabled rdev. That's because the main caller, _regulator_enable() always calls _regulator_is_enabled() and only calls _regulator_do_enable() if the rdev was not already enabled. However, there is one caller of _regulator_do_enable() that doesn't check: regulator_suspend_finish(). While we might want to make regulator_suspend_finish() behave more like _regulator_enable(), it's probably also a good idea to make _regulator_do_enable() robust if it is called on an already enabled rdev. At the moment, _regulator_do_enable() is _not_ robust for already enabled rdevs if we're using an ena_pin. Each time _regulator_do_enable() is called for an rdev using an ena_pin the reference count of the ena_pin is incremented even if the rdev was already enabled. This is not as intended because the ena_pin is for something else: for keeping track of how many active rdevs there are sharing the same ena_pin. Here's how the reference counting works here: * Each time _regulator_enable() is called we increment rdev->use_count, so _regulator_enable() calls need to be balanced with _regulator_disable() calls. * There is no explicit reference counting in _regulator_do_enable() which is normally just a warapper around rdev->desc->ops->enable() with code for supporting delays. It's not expected that the "ops->enable()" call do reference counting. * Since regulator_ena_gpio_ctrl() does have reference counting (handling the sharing of the pin amongst multiple rdevs), we shouldn't call it if the current rdev is already enabled. Note that as part of this we cleanup (remove) the initting of ena_gpio_state in regulator_register(). In _regulator_do_enable(), _regulator_do_disable() and _regulator_is_enabled() is is clear that ena_gpio_state should be the state of whether this particular rdev has requested the GPIO be enabled. regulator_register() was initting it as the actual state of the pin. Fixes: 967cfb18c0e3 ("regulator: core: manage enable GPIO list") Signed-off-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
| | * | | | | | regulator: Only enable disabled regulators on resumeJavier Martinez Canillas2015-03-081-3/+5
| | |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The _regulator_do_enable() call ought to be a no-op when called on an already-enabled regulator. However, as an optimization _regulator_enable() doesn't call _regulator_do_enable() on an already enabled regulator. That means we never test the case of calling _regulator_do_enable() during normal usage and there may be hidden bugs or warnings. We have seen warnings issued by the tps65090 driver and bugs when using the GPIO enable pin. Let's match the same optimization that _regulator_enable() in regulator_suspend_finish(). That may speed up suspend/resume and also avoids exposing hidden bugs. [Use much clearer commit message from Doug Anderson] Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
* | | | | | | Merge tag 'regmap-v4.0-rc4' of ↵Linus Torvalds2015-03-173-4/+7
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fixes from Mark Brown: "A few things here: - a change from Lars to fix insertion of cache values at the start of rather than end of a rbtree block. This hadn't been noticed before since almost everything lists registers in ascending order. - a fix from Takashi for spurious warnings during cache sync with read once registers, a problem which can be very noticeable on devices that it affects. - a fix from Valentin for a tighening of the oneshot IRQ request interface which would have broken affected devices" * tag 'regmap-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: regcache-rbtree: Fix present bitmap resize regmap: Skip read-only registers in regcache_sync() regmap-irq: set IRQF_ONESHOT flag to ensure IRQ request
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| *---. \ \ \ \ \ \ Merge remote-tracking branches 'regmap/fix/irq', 'regmap/fix/rbtree' and ↵Mark Brown2015-03-073-4/+7
| |\ \ \ \ \ \ \ \ \ | | |_|_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | 'regmap/fix/sync' into regmap-linus
| | | | * | | | | | regmap: Skip read-only registers in regcache_sync()Takashi Iwai2015-03-041-2/+4
| | | | | |/ / / / | | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | regcache_sync() spews warnings when a value was cached for a read-only register as it tries to write all registers no matter whether they are writable or not. This patch adds regmap_wrtieable() checks for avoiding it in regcache_sync_block_single() and regcache_block_raw(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mark Brown <broonie@kernel.org>
| | | * / | | | | regmap: regcache-rbtree: Fix present bitmap resizeLars-Peter Clausen2015-03-071-1/+1
| | | |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When inserting a new register into a block at the lower end the present bitmap is currently shifted into the wrong direction. The effect of this is that the bitmap becomes corrupted and registers which are present might be reported as not present and vice versa. Fix this by shifting left rather than right. Fixes: 472fdec7380c("regmap: rbtree: Reduce number of nodes, take 2") Reported-by: Daniel Baluta <daniel.baluta@gmail.com> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
| | * | | | | | regmap-irq: set IRQF_ONESHOT flag to ensure IRQ requestValentin Rothberg2015-02-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 1c6c69525b40eb76de8adf039409722015927dc3 ("genirq: Reject bogus threaded irq requests") threaded IRQs without a primary handler need to be requested with IRQF_ONESHOT, otherwise the request will fail. The %irq_flags flag is used to request the threaded IRQ and is also a parameter of the caller. Hence, we cannot be sure that IRQF_ONESHOT is set. This change avoids the potentially missing flag by setting IRQF_ONESHOT when requesting the threaded IRQ. Generated by: scripts/coccinelle/misc/irqf_oneshot.cocci Signed-off-by: Valentin Rothberg <Valentin.Rothberg@lip6.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
* | | | | | | | Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds2015-03-177-23/+168
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio fixes from Rusty Russell: "Not entirely surprising: the ongoing QEMU work on virtio 1.0 has revealed more minor issues with our virtio 1.0 drivers just introduced in the kernel. (I would normally use my fixes branch for this, but there were a batch of them...)" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio_mmio: fix access width for mmio uapi/virtio_scsi: allow overriding CDB/SENSE size virtio_mmio: generation support virtio_rpmsg: set DRIVER_OK before using device 9p/trans_virtio: fix hot-unplug virtio-balloon: do not call blocking ops when !TASK_RUNNING virtio_blk: fix comment for virtio 1.0 virtio_blk: typo fix virtio_balloon: set DRIVER_OK before using device virtio_console: avoid config access from irq virtio_console: init work unconditionally
| * | | | | | | | virtio_mmio: fix access width for mmioMichael S. Tsirkin2015-03-171-8/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Going over the virtio mmio code, I noticed that it doesn't correctly access modern device config values using "natural" accessors: it uses readb to get/set them byte by byte, while the virtio 1.0 spec explicitly states: 4.2.2.2 Driver Requirements: MMIO Device Register Layout ... The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers described in table 4.1. For the device-specific configuration space, the driver MUST use 8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and aligned accesses for 32 and 64 bit wide fields. Borrow code from virtio_pci_modern to do this correctly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | uapi/virtio_scsi: allow overriding CDB/SENSE sizeMichael S. Tsirkin2015-03-131-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QEMU wants to use virtio scsi structures with a different VIRTIO_SCSI_CDB_SIZE/VIRTIO_SCSI_SENSE_SIZE, let's add ifdefs to allow overriding them. Keep the old defines under new names: VIRTIO_SCSI_CDB_DEFAULT_SIZE/VIRTIO_SCSI_SENSE_DEFAULT_SIZE, since that's what these values really are: defaults for cdb/sense size fields. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | virtio_mmio: generation supportMichael S. Tsirkin2015-03-131-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio_mmio currently lacks generation support which makes multi-byte field access racy. Fix by getting the value at offset 0xfc for version 2 devices. Nothing we can do for version 1, so return generation id 0. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | virtio_rpmsg: set DRIVER_OK before using deviceMichael S. Tsirkin2015-03-131-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio spec requires that all drivers set DRIVER_OK before using devices. While rpmsg isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio rpmsg violates this rule: is calls kick before setting DRIVER_OK. The fix isn't trivial since simply calling virtio_device_ready earlier would mean we might get an interrupt in parallel with adding buffers. Instead, split kick out to prepare+notify calls. prepare before virtio_device_ready - when we know we won't get interrupts. notify right afterwards. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | 9p/trans_virtio: fix hot-unplugMichael S. Tsirkin2015-03-131-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On device hot-unplug, 9p/virtio currently will kfree channel while it might still be in use. Of course, it might stay used forever, so it's an extremely ugly hack, but it seems better than use-after-free that we have now. [ Unused variable removed, whitespace cleanup, msg single-lined --RR ] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | virtio-balloon: do not call blocking ops when !TASK_RUNNINGMichael S. Tsirkin2015-03-101-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio balloon has this code: wait_event_interruptible(vb->config_change, (diff = towards_target(vb)) != 0 || vb->need_stats_update || kthread_should_stop() || freezing(current)); Which is a problem because towards_target() call might block after wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing the task_struct::state collision typical of nesting of sleeping primitives See also http://lwn.net/Articles/628628/ or Thomas's bug report http://article.gmane.org/gmane.linux.kernel.virtualization/24846 for a fuller explanation. To fix, rewrite using wait_woken. Cc: stable@vger.kernel.org Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | virtio_blk: fix comment for virtio 1.0Michael S. Tsirkin2015-03-101-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up comment to match virtio 1.0 logic: virtio_blk_outhdr isn't the first elements anymore, the only requirement is that it comes first in the s/g list. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | virtio_blk: typo fixMichael S. Tsirkin2015-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that QEmu reuses linux virtio headers, we noticed a typo in the exported virtio block header. Fix it up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | virtio_balloon: set DRIVER_OK before using deviceMichael S. Tsirkin2015-03-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio spec requires that all drivers set DRIVER_OK before using devices. While balloon isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio balloon might violate this rule: probe calls kthread_run before setting DRIVER_OK, which might run immediately and cause balloon to inflate/deflate. To fix, call virtio_device_ready before running the kthread. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
| * | | | | | | | virtio_console: avoid config access from irqMichael S. Tsirkin2015-03-051-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when multiport is off, virtio console invokes config access from irq context, config access is blocking on s390. Fix this up by scheduling work from config irq - similar to what we do for multiport configs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
| * | | | | | | | virtio_console: init work unconditionallyMichael S. Tsirkin2015-03-051-1/+2
| | |_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when multiport is off, we don't initialize config work, but we then cancel uninitialized control_work on freeze. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
* | | | | | | | Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-03-1713-81/+114
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Marcelo Tosatti: "KVM bug fixes (ARM and x86)" * git://git.kernel.org/pub/scm/virt/kvm/kvm: arm/arm64: KVM: Keep elrsr/aisr in sync with software model KVM: VMX: Set msr bitmap correctly if vcpu is in guest mode arm/arm64: KVM: fix missing unlock on error in kvm_vgic_create() kvm: x86: i8259: return initialized data on invalid-size read arm64: KVM: Fix outdated comment about VTCR_EL2.PS arm64: KVM: Do not use pgd_index to index stage-2 pgd arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting kvm: move advertising of KVM_CAP_IRQFD to common code
| * \ \ \ \ \ \ \ Merge tag 'kvm-arm-fixes-4.0-rc5' of ↵Marcelo Tosatti2015-03-168-75/+105
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm Fixes for KVM/ARM for 4.0-rc5. Fixes page refcounting issues in our Stage-2 page table management code, fixes a missing unlock in a gicv3 error path, and fixes a race that can cause lost interrupts if signals are pending just prior to entering the guest.
| | * | | | | | | | arm/arm64: KVM: Keep elrsr/aisr in sync with software modelChristoffer Dall2015-03-144-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Alex Bennee <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | | | arm/arm64: KVM: fix missing unlock on error in kvm_vgic_create()Wei Yongjun2015-03-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the missing unlock before return from function kvm_vgic_create() in the error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | | | arm64: KVM: Fix outdated comment about VTCR_EL2.PSMarc Zyngier2015-03-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 87366d8cf7b3 ("arm64: Add boot time configuration of Intermediate Physical Address size") removed the hardcoded setting of VTCR_EL2.PS to use ID_AA64MMFR0_EL1.PARange instead, but didn't remove the (now rather misleading) comment. Fix the comments to match reality (at least for the next few minutes). Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | | | arm64: KVM: Do not use pgd_index to index stage-2 pgdMarc Zyngier2015-03-113-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel's pgd_index macro is designed to index a normal, page sized array. KVM is a bit diffferent, as we can use concatenated pages to have a bigger address space (for example 40bit IPA with 4kB pages gives us an 8kB PGD. In the above case, the use of pgd_index will always return an index inside the first 4kB, which makes a guest that has memory above 0x8000000000 rather unhappy, as it spins forever in a page fault, whist the host happilly corrupts the lower pgd. The obvious fix is to get our own kvm_pgd_index that does the right thing(tm). Tested on X-Gene with a hacked kvmtool that put memory at a stupidly high address. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | | | arm64: KVM: Fix stage-2 PGD allocation to have per-page refcountingMarc Zyngier2015-03-113-66/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're using __get_free_pages with to allocate the guest's stage-2 PGD. The standard behaviour of this function is to return a set of pages where only the head page has a valid refcount. This behaviour gets us into trouble when we're trying to increment the refcount on a non-head page: page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) BUG: failure at include/linux/mm.h:548/get_page()! Kernel panic - not syncing: BUG! CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 Hardware name: APM X-Gene Mustang board (DT) Call trace: [<ffff80000008a09c>] dump_backtrace+0x0/0x13c [<ffff80000008a1e8>] show_stack+0x10/0x1c [<ffff800000691da8>] dump_stack+0x74/0x94 [<ffff800000690d78>] panic+0x100/0x240 [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 [<ffff8000000a420c>] handle_exit+0x58/0x180 [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 CPU0: stopping A possible approach for this is to split the compound page using split_page() at allocation time, and change the teardown path to free one page at a time. It turns out that alloc_pages_exact() and free_pages_exact() does exactly that. While we're at it, the PGD allocation code is reworked to reduce duplication. This has been tested on an X-Gene platform with a 4kB/48bit-VA host kernel, and kvmtool hacked to place memory in the second page of the hardware PGD (PUD for the host kernel). Also regression-tested on a Cubietruck (Cortex-A7). [ Reworked to use alloc_pages_exact() and free_pages_exact() and to return pointers directly instead of by reference as arguments - Christoffer ] Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | | | | | | | | KVM: VMX: Set msr bitmap correctly if vcpu is in guest modeWincy Van2015-03-131-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap"), we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This is not enough since the field will be modified by following vmx_set_efer. Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is in guest mode. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | | | | | | kvm: x86: i8259: return initialized data on invalid-size readPetr Matousek2015-03-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If data is read from PIC with invalid access size, the return data stays uninitialized even though success is returned. Fix this by always initializing the data. Signed-off-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | | | | | | kvm: move advertising of KVM_CAP_IRQFD to common codePaolo Bonzini2015-03-103-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 297e21053a52f060944e9f0de4c64fad9bcd72fc Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>