summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* MIPS: SMP: Update cpu_foreign_map on CPU disableJames Hogan2016-07-296-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | When a CPU is disabled via CPU hotplug, cpu_foreign_map is not updated. This could result in cache management SMP calls being sent to offline CPUs instead of online siblings in the same core. Add a call to calculate_cpu_foreign_map() in the various MIPS cpu disable callbacks after set_cpu_online(). All cases are updated for consistency and to keep cpu_foreign_map strictly up to date, not just those which may support hardware multithreading. Fixes: cccf34e9411c ("MIPS: c-r4k: Fix cache flushing for MT cores") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: David Daney <david.daney@cavium.com> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: Hongliang Tao <taohl@lemote.com> Cc: Hua Yan <yanh@lemote.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13799/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: SMP: Clear ASID without confusing has_valid_asid()James Hogan2016-07-292-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SMP flush_tlb_*() functions may clear the memory map's ASIDs for other CPUs if the mm has only a single user (the current CPU) in order to avoid SMP calls. However this makes it appear to has_valid_asid(), which is used by various cache flush functions, as if the CPUs have never run in the mm, and therefore can't have cached any of its memory. For flush_tlb_mm() this doesn't sound unreasonable. flush_tlb_range() corresponds to flush_cache_range() which does do full indexed cache flushes, but only on the icache if the specified mapping is executable, otherwise it doesn't guarantee that there are no cache contents left for the mm. flush_tlb_page() corresponds to flush_cache_page(), which will perform address based cache ops on the specified page only, and also only touches the icache if the page is executable. It does not guarantee that there are no cache contents left for the mm. For example, this affects flush_cache_range() which uses the has_valid_asid() optimisation. It is required to flush the icache when mappings are made executable (e.g. using mprotect) so they are immediately usable. If some code is changed to non executable in order to be modified then it will not be flushed from the icache during that time, but the ASID on other CPUs may still be cleared for TLB flushing. When the code is changed back to executable, flush_cache_range() will assume the code hasn't run on those other CPUs due to the zero ASID, and won't invalidate the icache on them. This is fixed by clearing the other CPUs ASIDs to 1 instead of 0 for the above two flush_tlb_*() functions when the corresponding cache flushes are likely to be incomplete (non executable range flush, or any page flush). This ASID appears valid to has_valid_asid(), but still triggers ASID regeneration due to the upper ASID version bits being 0, which is less than the minimum ASID version of 1 and so always treated as stale. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13795/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFOJames Hogan2016-07-282-0/+3
| | | | | | | | | | | | | | | | | | | | | AT_VECTOR_SIZE_ARCH should be defined with the maximum number of NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined for MIPS at all even though ARCH_DLINFO will contain one NEW_AUX_ENT for the VDSO address. This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for AT_BASE_PLATFORM which MIPS doesn't use, but lets define it now and add the comment above ARCH_DLINFO as found in several other architectures to remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to date. Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13823/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Octeon: Improve USB reset code for OCTEON II.Steven J. Hill2016-07-281-48/+60
| | | | | | | | | | | | At boot time, do a better job of resetting the USB host controller to make the frequency "eye" diagram more compliant with the USB standard while making the controller more reliable. Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13831/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Octeon: Put restrictions on DMA descriptors.Steven J. Hill2016-07-281-1/+5
| | | | | | | | | | | Set the DMA mask such that all descriptors stay in the lower 4GB of memory. Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13830/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Octeon: Remove forced mappings of USB interrupts.Steven J. Hill2016-07-282-14/+0
| | | | | | | | | | | Get rid of unnecessary forced interrupt mappings for the USB host controller on OCTEON II. Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13824/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspaceDavid Howells2016-07-282-2/+2
| | | | | | | | | | | | | | | | MIPS64 needs to use compat_sys_keyctl for 32-bit userspace rather than calling sys_keyctl. The latter will work in a lot of cases, thereby hiding the issue. Reported-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: keyrings@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13832/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Print segment physical address when EU=1James Hogan2016-07-281-5/+8
| | | | | | | | | | | | | | | | Currently the debugfs interface to print the segment configuration refuses to print the physical address of mapped segments. However if the EU bit is set these become unmapped at error level (when CP0_Status.ERL=1), so the physical address is still relevant. Update the logic to print the physical address of mapped segments when the EU bit is set, while still hiding the Cache Coherency Attribute (since EU overrides that to uncached when ERL=1 too). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13833/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: tlbex: Avoid duplicated single_insn_swpdJames Hogan2016-07-241-1/+1
| | | | | | | | | | | | | The expression "uasm_in_compat_space_p(swpd) && !uasm_rel_lo(swpd)" is used twice in build_get_pgd_vmalloc64(), one of which is assigned to the local variable single_insn_swpd. Update the other use to just use single_insn_swpd instead to remove the duplication. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: David Daney <ddaney@caviumnetworks.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13779/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: uasm: Handle low values in uasm_in_compat_space_p()James Hogan2016-07-241-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | uasm_in_compat_space_p() determines whether the given value is in the 32-bit compatibility part of the 64-bit address space, i.e. is in 32-bit sign-extended form, however it only handles the top half of the value space (corresponding to the kernel compatibility segments in the upper half of the address space). Since values < 2^31 (corresponding to the low 2GiB of the address space) can also be handled using 32-bit instructions (e.g. a LUI and ADDIU) rather than convoluted 64-bit immediate generation, rewrite it with a cast to check whether the address matches its 32-bit sign extended form. This allows UASM_i_LA to be used to generate arbitrary 32-bit immediates more efficiently on 64-bit CPUs, i.e. more like the li (load immediate) pseudo-instruction. For example this code to load the immediate (ST0_EXL | KSU_USER | ST0_BEV | ST0_KX) into k0 with UASM_i_LA(): lui k0,0x0 dsll k0,k0,0x10 daddiu k0,k0,64 dsll k0,k0,0x10 daddiu k0,k0,146 Changes to this more efficient version: lui k0,0x40 addiu k0,k0,146 Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13778/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Add default configuration for ath25Sergey Ryazanov2016-07-241-0/+119
| | | | | | | Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Cc: Linux MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/13700/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Pistachio: Remove plat_setup_iocoherencyZubair Lutfullah Kakakhel2016-07-242-26/+1
| | | | | | | | | | | | | | | | The Pistachio SoC does not have an IOCU. Hence, DMA is non-coherent. Remove the function checking for iocoherency and select CONFIG_DMA_NONCOHERENT in Kconfig This code is probably accidentally inherited from Malta. Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com> Reviewed-by: James Hartley <james.hartley@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13433/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Delete use of ARCH_WANT_OPTIONAL_GPIOLIBLinus Walleij2016-07-241-1/+0
| | | | | | | | | | | The Loongson1 added a new instance of ARCH_WANT_OPTIONAL_GPIOLIB which is no longer required to have GPIOLIB available in Kconfig. Delete it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13543/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Lantiq: Use the real EXIN countJohn Crispin2016-07-241-3/+3
| | | | | | | | | | | | | We runtime load the available external interrupts into an array and store the number inside exin_avail. Some of the code however uses MAX_EIU for looping over the array which may partially be 0. This is a cosmetic fix as the existing code works as is. It is just nicer to only loop over the array elements that were actually populated during probe. Signed-off-by: John Crispin <john@phrozen.org> Cc: Linux-MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/13602/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Lantiq: Fix eiu interrupt loading codeJohn Crispin2016-07-241-9/+10
| | | | | | | | | | | | Using of_irq_count to load the irq index from the devicetree is incorrect. This will cause the kernel to map them regardless, even if they dont actually get used. Change the code to use of_property_count_u32_elems() instead which is the correct API to use in this case. Signed-off-by: John Crispin <john@phrozen.org> Cc: Linux-MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/13601/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Move CPU Hotplug config option into submenuMatt Redfearn2016-07-241-10/+10
| | | | | | | | | | | The KConfig option HOTPLUG_CPU should appear in the "Kernel Type" submenu. Relocate it to where SMP support is configured. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13751/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: smp-cps: Add support for CPU hotplug of MIPSr6 processorsMatt Redfearn2016-07-241-5/+27
| | | | | | | | | | | | | | | | | Introduce support for hotplug of Virtual Processors in MIPSr6 systems. The method is simpler than the VPE parallel from the now-deprecated MT ASE, it can now simply write the VP_STOP register with the mask of VPs to halt, and use the VP_RUNNING register to determine when the VP has halted. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Reviewed-by: Paul Burton <paul.burton@imgtec.com> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Qais Yousef <qais.yousef@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13752/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: smp-cps: Allow booting of CPU other than VP0 within a coreMatt Redfearn2016-07-241-4/+5
| | | | | | | | | | | | | | | | | | | | | The boot_core function was hardcoded to always start VP0 when starting a core via the CPC. When hotplugging a CPU this may not be the desired behaviour. Make boot_core receive the VP ID to start running on the core, such that alternate VPs can be started via CPU hotplug. Also ensure that all other VPs within the core are stopped before bringing the core out of reset so that only the desired VP starts. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Reviewed-by: Paul Burton <paul.burton@imgtec.com> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Qais Yousef <qais.yousef@imgtec.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13750/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Octeon: cavium_octeon_defconfig: Enable OCTEON SATAMatt Redfearn2016-07-211-0/+2
| | | | | | | | | | | | | Commit a2127e400edd ("libata: support AHCI on OCTEON platform") added a driver for the OCTEON AHCI controller. Enable this driver in the OCTEON defconfig. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13816/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Octeon: Changes to support readq()/writeq() usage.Steven J. Hill2016-07-113-31/+32
| | | | | | | | | | | | | | Update OCTEON port mangling code to support readq() and writeq() functions to allow driver code to be more portable. Updates also for word and long function pairs. We also remove SWAP_IO_SPACE for OCTEON platforms as the function macros are redundant with the new mangling code. Signed-off-by: Steven J. Hill <steven.hill@cavium.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13780/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Fix page table corruption on THP permission changes.David Daney2016-07-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the core THP code is modifying the permissions of a huge page it calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit of the page table entry. The result can be kernel messages like: mm/memory.c:397: bad pmd 000000040080004d. mm/memory.c:397: bad pmd 00000003ff00004d. mm/memory.c:397: bad pmd 000000040100004d. or: ------------[ cut here ]------------ WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158() Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80 CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4 Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0 0000000000000000 0000000000000000 ffffffff85110000 0000000000000119 0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f 0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000 0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80 ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001 0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924 80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780 80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f 0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000 ... Call Trace: [<ffffffff80865c4c>] show_stack+0x6c/0xf8 [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8 [<ffffffff809207a0>] exit_mmap+0x150/0x158 [<ffffffff80882d44>] mmput+0x5c/0x110 [<ffffffff8088b450>] do_exit+0x230/0xa68 [<ffffffff8088be34>] do_group_exit+0x54/0x1d0 [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18 ---[ end trace c7b38293191c57dc ]--- BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536 Fix by not clearing _PAGE_HUGE bit. Signed-off-by: David Daney <david.daney@cavium.com> Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: stable@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13687/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* MIPS: Remove cpu_has_safe_index_cacheopsRalf Baechle2016-07-061-9/+3
| | | | | | | | | | | Very early versions of the 1004K had an hardware issue that made index cache ops unsafe so they had to be avoided and hit ops be used instead. This may significantly slow down cache maintenance operations. Only very early FPGA versions of the 1004K were affected so let's get rid of the workaround which was only implemented for the DMA cache maintenance operations anyway. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds2016-07-021-4/+6
|\ | | | | | | | | | | | | | | | | Pull MIPS fix from Ralf Baechle: "Only a single fix for 4.7 pending at this point. It fixes an issue that may lead to corruption of the cache mode bits in the page table" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Fix possible corruption of cache mode by mprotect.
| * MIPS: Fix possible corruption of cache mode by mprotect.Ralf Baechle2016-07-021-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following testcase may result in a page table entries with a invalid CCA field being generated: static void *bindstack; static int sysrqfd; static void protect_low(int protect) { mprotect(bindstack, BINDSTACK_SIZE, protect); } static void sigbus_handler(int signal, siginfo_t * info, void *context) { void *addr = info->si_addr; write(sysrqfd, "x", 1); printf("sigbus, fault address %p (should not happen, but might)\n", addr); abort(); } static void run_bind_test(void) { unsigned int *p = bindstack; p[0] = 0xf001f001; write(sysrqfd, "x", 1); /* Set trap on access to p[0] */ protect_low(PROT_NONE); write(sysrqfd, "x", 1); /* Clear trap on access to p[0] */ protect_low(PROT_READ | PROT_WRITE | PROT_EXEC); write(sysrqfd, "x", 1); /* Check the contents of p[0] */ if (p[0] != 0xf001f001) { write(sysrqfd, "x", 1); /* Reached, but shouldn't be */ printf("badness, shouldn't happen but does\n"); abort(); } } int main(void) { struct sigaction sa; sysrqfd = open("/proc/sysrq-trigger", O_WRONLY); if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) { perror("sigprocmask"); return 0; } sa.sa_sigaction = sigbus_handler; sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART; if (sigaction(SIGBUS, &sa, NULL)) { perror("sigaction"); return 0; } bindstack = mmap(NULL, BINDSTACK_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (bindstack == MAP_FAILED) { perror("mmap bindstack"); return 0; } printf("bindstack: %p\n", bindstack); run_bind_test(); printf("done\n"); return 0; } There are multiple ingredients for this: 1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3 on all platforms except SB1 where it's CCA 5. 2) _page_cachable_default must have bits set which are not set _CACHE_CACHABLE_NONCOHERENT. 3) Either the defective version of pte_modify for XPA or the standard version must be in used. However pte_modify for the 36 bit address space support is no affected. In that case additional bits in the final CCA mode may generate an invalid value for the CCA field. On the R10000 system where this was tracked down for example a CCA 7 has been observed, which is Uncached Accelerated. Fixed by: 1) Using the proper CCA mode for PAGE_NONE just like for all the other PAGE_* pte/pmd bits. 2) Fix the two affected variants of pte_modify. Further code inspection also shows the same issue to exist in pmd_modify which would affect huge page systems. Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE and pmd_modify issue found by me. The history of this goes back beyond Linus' git history. Chris Dearman's commit 351336929ccf222ae38ff0cb7a8dd5fd5c6236a0 ("[MIPS] Allow setting of the cache attribute at run time.") missed the opportunity to fix this but it was originally introduced in lmo commit d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.") and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option CONFIG_MIPS_UNCACHED.") Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
* | Merge tag 'powerpc-4.7-5' of ↵Linus Torvalds2016-07-027-19/+65
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - tm: Always reclaim in start_thread() for exec() class syscalls from Cyril Bur - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael Neuling - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan - Initialise pci_io_base as early as possible from Darren Stevens * tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Initialise pci_io_base as early as possible powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 powerpc/eeh: Fix wrong argument passed to eeh_rmv_device() powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
| * | powerpc: Initialise pci_io_base as early as possibleDarren Stevens2016-06-304-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix") turned kernel memory and IO addresses from #defined constants to variables initialised at runtime. On PA6T (pasemi) systems the setup_arch() machine call initialises the onboard PCI-e root-ports, and uses pci_io_base to do this, which is now before its value has been set, resulting in a panic early in boot before console IO is initialised. Move the pci_io_base initialisation to the same place as vmalloc ranges are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the earliest possible place we can initialise it. Fixes: d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Darren Stevens <darren@stevens-zone.net> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Add #ifdef CONFIG_PCI, massage change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0Michael Neuling2016-06-291-17/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we have 2 segments that are bolted for the kernel linear mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel stacks. Anything accessed outside of these regions may need to be faulted in. (In practice machines with TM always have 1T segments) If a machine has < 2TB of memory we never fault on the kernel linear mapping as these two segments cover all physical memory. If a machine has > 2TB of memory, there may be structures outside of these two segments that need to be faulted in. This faulting can occur when running as a guest as the hypervisor may remove any SLB that's not bolted. When we treclaim and trecheckpoint we have a window where we need to run with the userspace GPRs. This means that we no longer have a valid stack pointer in r1. For this window we therefore clear MSR RI to indicate that any exceptions taken at this point won't be able to be handled. This means that we can't take segment misses in this RI=0 window. In this RI=0 region, we currently access the thread_struct for the process being context switched to or from. This thread_struct access may cause a segment fault since it's not guaranteed to be covered by the two bolted segment entries described above. We've seen this with a crash when running as a guest with > 2TB of memory on PowerVM: Unrecoverable exception 4100 at c00000000004f138 Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G X 4.4.13-46-default #1 task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000 NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20 REGS: c000189001d5f730 TRAP: 4100 Tainted: G X (4.4.13-46-default) MSR: 8000000100001031 <SF,ME,IR,DR,LE> CR: 24000048 XER: 00000000 CFAR: c00000000004ed18 SOFTE: 0 GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288 GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620 GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0 GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211 GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110 GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050 GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30 GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680 NIP [c00000000004f138] restore_gprs+0xd0/0x16c LR [0000000010003a24] 0x10003a24 Call Trace: [c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable) [c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0 [c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350 [c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0 [c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0 [c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0 [c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130 [c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4 Instruction dump: 7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6 38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098 ---[ end trace 602126d0a1dedd54 ]--- This fixes this by copying the required data from the thread_struct to the stack before we clear MSR RI. Then once we clear RI, we only access the stack, guaranteeing there's no segment miss. We also tighten the region over which we set RI=0 on the treclaim() path. This may have a slight performance impact since we're adding an mtmsr instruction. Fixes: 090b9284d725 ("powerpc/tm: Clear MSR RI in non-recoverable TM code") Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()Gavin Shan2016-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calling eeh_rmv_device() in eeh_reset_device() for partial hotplug case, @rmv_data instead of its address is the proper argument. Otherwise, the stack frame is corrupted when writing to @rmv_data (actually its address) in eeh_rmv_device(). It results in kernel crash as observed. This fixes the issue by passing @rmv_data, not its address to eeh_rmv_device() in eeh_reset_device(). Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE") Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/tm: Always reclaim in start_thread() for exec() class syscallsCyril Bur2016-06-271-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can quite legitimately perform an exec() syscall with a suspended transaction. exec() does not return to the old process, rather it load a new one and starts that, the expectation therefore is that the new process starts not in a transaction. Currently exec() is not treated any differently to any other syscall which creates problems. Firstly it could allow a new process to start with a suspended transaction for a binary that no longer exists. This means that the checkpointed state won't be valid and if the suspended transaction were ever to be resumed and subsequently aborted (a possibility which is exceedingly likely as exec()ing will likely doom the transaction) the new process will jump to invalid state. Secondly the incorrect attempt to keep the transactional state while still zeroing state for the new process creates at least two TM Bad Things. The first triggers on the rfid to return to userspace as start_thread() has given the new process a 'clean' MSR but the suspend will still be set in the hardware MSR. The second TM Bad Thing triggers in __switch_to() as the processor is still transactionally suspended but __switch_to() wants to zero the TM sprs for the new process. This is an example of the outcome of calling exec() with a suspended transaction. Note the first 700 is likely the first TM bad thing decsribed earlier only the kernel can't report it as we've loaded userspace registers. c000000000009980 is the rfid in fast_exception_return() Bad kernel stack pointer 3fffcfa1a370 at c000000000009980 Oops: Bad kernel stack pointer, sig: 6 [#1] CPU: 0 PID: 2006 Comm: tm-execed Not tainted NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000 REGS: c00000003ffefd40 TRAP: 0700 Not tainted MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000 CFAR: c0000000000098b4 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000 NIP [c000000000009980] fast_exception_return+0xb0/0xb8 LR [0000000000000000] (null) Call Trace: Instruction dump: f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b Kernel BUG at c000000000043e80 [verbose debug info unavailable] Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033) Oops: Unrecoverable exception, sig: 6 [#2] CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000 NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000 REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000 CFAR: c000000000015a20 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000 GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000 GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004 GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000 GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000 GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80 NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c LR [c000000000015a24] __switch_to+0x1f4/0x420 Call Trace: Instruction dump: 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 This fixes CVE-2016-5828. Fixes: bc2a9408fa65 ("powerpc: Hook in new transactional memory code") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-06-307-37/+39
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "ARM and x86 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode. KVM: LAPIC: cap __delay at lapic_timer_advance_ns KVM: x86: move nsec_to_cycles from x86.c to x86.h pvclock: Get rid of __pvclock_read_cycles in function pvclock_read_flags pvclock: Cleanup to remove function pvclock_get_nsec_offset pvclock: Add CPU barriers to get correct version value KVM: arm/arm64: Stop leaking vcpu pid references arm64: KVM: fix build with CONFIG_ARM_PMU disabled
| * \ \ Merge tag 'kvm-arm-for-v4.7-rc6' of ↵Paolo Bonzini2016-06-301-0/+1
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM Fixes for v4.7-rc6: Fixes a build issue without CONFIG_ARM_PMU and plugs pid leak on arm/arm64.
| | * | | KVM: arm/arm64: Stop leaking vcpu pid referencesJames Morse2016-06-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm provides kvm_vcpu_uninit(), which amongst other things, releases the last reference to the struct pid of the task that was last running the vcpu. On arm64 built with CONFIG_DEBUG_KMEMLEAK, starting a guest with kvmtool, then killing it with SIGKILL results (after some considerable time) in: > cat /sys/kernel/debug/kmemleak > unreferenced object 0xffff80007d5ea080 (size 128): > comm "lkvm", pid 2025, jiffies 4294942645 (age 1107.776s) > hex dump (first 32 bytes): > 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > backtrace: > [<ffff8000001b30ec>] create_object+0xfc/0x278 > [<ffff80000071da34>] kmemleak_alloc+0x34/0x70 > [<ffff80000019fa2c>] kmem_cache_alloc+0x16c/0x1d8 > [<ffff8000000d0474>] alloc_pid+0x34/0x4d0 > [<ffff8000000b5674>] copy_process.isra.6+0x79c/0x1338 > [<ffff8000000b633c>] _do_fork+0x74/0x320 > [<ffff8000000b66b0>] SyS_clone+0x18/0x20 > [<ffff800000085cb0>] el0_svc_naked+0x24/0x28 > [<ffffffffffffffff>] 0xffffffffffffffff On x86 kvm_vcpu_uninit() is called on the path from kvm_arch_destroy_vm(), on arm no equivalent call is made. Add the call to kvm_arch_vcpu_free(). Signed-off-by: James Morse <james.morse@arm.com> Fixes: 749cf76c5a36 ("KVM: ARM: Initial skeleton to compile KVM support") Cc: <stable@vger.kernel.org> # 3.10+ Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | | | KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.Quentin Casasnovas2016-06-271-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I couldn't get Xen to boot a L2 HVM when it was nested under KVM - it was getting a GP(0) on a rather unspecial vmread from Xen: (XEN) ----[ Xen-4.7.0-rc x86_64 debug=n Not tainted ]---- (XEN) CPU: 1 (XEN) RIP: e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450 (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d1v0) (XEN) rax: ffff82d0801e6288 rbx: ffff83003ffbfb7c rcx: fffffffffffab928 (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: ffff83000bdd0000 (XEN) rbp: ffff83000bdd0000 rsp: ffff83003ffbfab0 r8: ffff830038813910 (XEN) r9: ffff83003faf3958 r10: 0000000a3b9f7640 r11: ffff83003f82d418 (XEN) r12: 0000000000000000 r13: ffff83003ffbffff r14: 0000000000004802 (XEN) r15: 0000000000000008 cr0: 0000000080050033 cr4: 00000000001526e0 (XEN) cr3: 000000003fc79000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450): (XEN) 00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00 (XEN) Xen stack trace from rsp=ffff83003ffbfab0: ... (XEN) Xen call trace: (XEN) [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450 (XEN) [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300 (XEN) [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60 (XEN) [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70 (XEN) [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0 (XEN) [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0 (XEN) [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0 (XEN) [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270 (XEN) [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0 (XEN) [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90 (XEN) [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140 (XEN) [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140 (XEN) [<ffff82d080164aeb>] context_switch+0x13b/0xe40 (XEN) [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570 (XEN) [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90 (XEN) [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=0000] (XEN) **************************************** Tracing my host KVM showed it was the one injecting the GP(0) when emulating the VMREAD and checking the destination segment permissions in get_vmx_mem_address(): 3) | vmx_handle_exit() { 3) | handle_vmread() { 3) | nested_vmx_check_permission() { 3) | vmx_get_segment() { 3) 0.074 us | vmx_read_guest_seg_base(); 3) 0.065 us | vmx_read_guest_seg_selector(); 3) 0.066 us | vmx_read_guest_seg_ar(); 3) 1.636 us | } 3) 0.058 us | vmx_get_rflags(); 3) 0.062 us | vmx_read_guest_seg_ar(); 3) 3.469 us | } 3) | vmx_get_cs_db_l_bits() { 3) 0.058 us | vmx_read_guest_seg_ar(); 3) 0.662 us | } 3) | get_vmx_mem_address() { 3) 0.068 us | vmx_cache_reg(); 3) | vmx_get_segment() { 3) 0.074 us | vmx_read_guest_seg_base(); 3) 0.068 us | vmx_read_guest_seg_selector(); 3) 0.071 us | vmx_read_guest_seg_ar(); 3) 1.756 us | } 3) | kvm_queue_exception_e() { 3) 0.066 us | kvm_multiple_exception(); 3) 0.684 us | } 3) 4.085 us | } 3) 9.833 us | } 3) + 10.366 us | } Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software Developper Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine Control Structure", I found that we're enforcing that the destination operand is NOT located in a read-only data segment or any code segment when the L1 is in long mode - BUT that check should only happen when it is in protected mode. Shuffling the code a bit to make our emulation follow the specification allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests without problems. Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions") Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Eugene Korenevsky <ekorenevsky@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | KVM: LAPIC: cap __delay at lapic_timer_advance_nsMarcelo Tosatti2016-06-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The host timer which emulates the guest LAPIC TSC deadline timer has its expiration diminished by lapic_timer_advance_ns nanoseconds. Therefore if, at wait_lapic_expire, a difference larger than lapic_timer_advance_ns is encountered, delay at most lapic_timer_advance_ns. This fixes a problem where the guest can cause the host to delay for large amounts of time. Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | KVM: x86: move nsec_to_cycles from x86.c to x86.hMarcelo Tosatti2016-06-272-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the inline function nsec_to_cycles from x86.c to x86.h, as the next patch uses it from lapic.c. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | pvclock: Get rid of __pvclock_read_cycles in function pvclock_read_flagsMinfei Huang2016-06-271-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a generic function __pvclock_read_cycles to be used to get both flags and cycles. For function pvclock_read_flags, it's useless to get cycles value. To make this function be more effective, get this variable flags directly in function. Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | pvclock: Cleanup to remove function pvclock_get_nsec_offsetMinfei Huang2016-06-271-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function __pvclock_read_cycles is short enough, so there is no need to have another function pvclock_get_nsec_offset to calculate tsc delta. It's better to combine it into function __pvclock_read_cycles. Remove useless variables in function __pvclock_read_cycles. Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | pvclock: Add CPU barriers to get correct version valueMinfei Huang2016-06-272-0/+6
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Protocol for the "version" fields is: hypervisor raises it (making it uneven) before it starts updating the fields and raises it again (making it even) when it is done. Thus the guest can make sure the time values it got are consistent by checking the version before and after reading them. Add CPU barries after getting version value just like what function vread_pvclock does, because all of callees in this function is inline. Fixes: 502dfeff239e8313bfbe906ca0a1a6827ac8481b Cc: stable@vger.kernel.org Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | Merge tag 'arc-4.7-rc6-fixes' of ↵Linus Torvalds2016-06-302-3/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fix from Vineet Gupta: "Reinstate dwarf unwinder/loadable-modules with new gnu tools" * tag 'arc-4.7-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: arc: unwind: warn only once if DW2_UNWIND is disabled ARC: unwind: ensure that .debug_frame is generated (vs. .eh_frame)
| * | | | arc: unwind: warn only once if DW2_UNWIND is disabledAlexey Brodkin2016-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CONFIG_ARC_DW2_UNWIND is disabled every time arc_unwind_core() gets called following message gets printed in debug console: ----------------->8--------------- CONFIG_ARC_DW2_UNWIND needs to be enabled ----------------->8--------------- That message makes sense if user indeed wants to see a backtrace or get nice function call-graphs in perf but what if user disabled unwinder for the purpose? Why pollute his debug console? So instead we'll warn user about possibly missing feature once and let him decide if that was what he or she really wanted. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: unwind: ensure that .debug_frame is generated (vs. .eh_frame)Vineet Gupta2016-06-281-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With recent binutils update to support dwarf CFI pseudo-ops in gas, we now get .eh_frame vs. .debug_frame. Although the call frame info is exactly the same in both, the CIE differs, which the current kernel unwinder can't cope with. This broke both the kernel unwinder as well as loadable modules (latter because of a new unhandled relo R_ARC_32_PCREL from .rela.eh_frame in the module loader) The ideal solution would be to switch unwinder to .eh_frame. For now however we can make do by just ensureing .debug_frame is generated by removing -fasynchronous-unwind-tables .eh_frame generated with -gdwarf-2 -fasynchronous-unwind-tables .debug_frame generated with -gdwarf-2 Fixes STAR 9001058196 Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* | | | | s390: fix test_fp_ctl inline assembly contraintsMartin Schwidefsky2016-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test_fp_ctl function is used to test if a given value is a valid floating-point control. The inline assembly in test_fp_ctl uses an incorrect constraint for the 'orig_fpc' variable. If the compiler chooses the same register for 'fpc' and 'orig_fpc' the test_fp_ctl() function always returns true. This allows user space to trigger kernel oopses with invalid floating-point control values on the signal stack. This problem has been introduced with git commit 4725c86055f5bbdcdf "s390: fix save and restore of the floating-point-control register" Cc: stable@vger.kernel.org # v3.13+ Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | | | | Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"Michael Holzheu2016-06-281-7/+0
| |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 852ffd0f4e23248b47531058e531066a988434b5. There are use cases where an intermediate boot kernel (1) uses kexec to boot the final production kernel (2). For this scenario we should provide the original boot information to the production kernel (2). Therefore clearing the boot information during kexec() should not be done. Cc: stable@vger.kernel.org # v3.17+ Reported-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2016-06-251-0/+12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kprobe fix from Thomas Gleixner: "A single fix clearing the TF bit when a fault is single stepped" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Clear TF bit in fault on single-stepping
| * | | | kprobes/x86: Clear TF bit in fault on single-steppingMasami Hiramatsu2016-06-141-0/+12
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kprobe_fault_handler() to clear the TF (trap flag) bit of the flags register in the case of a fault fixup on single-stepping. If we put a kprobe on the instruction which caused a page fault (e.g. actual mov instructions in copy_user_*), that fault happens on the single-stepping buffer. In this case, kprobes resets running instance so that the CPU can retry execution on the original ip address. However, current code forgets to reset the TF bit. Since this fault happens with TF bit set for enabling single-stepping, when it retries, it causes a debug exception and kprobes can not handle it because it already reset itself. On the most of x86-64 platform, it can be easily reproduced by using kprobe tracer. E.g. # cd /sys/kernel/debug/tracing # echo p copy_user_enhanced_fast_string+5 > kprobe_events # echo 1 > events/kprobes/enable And you'll see a kernel panic on do_debug(), since the debug trap is not handled by kprobes. To fix this problem, we just need to clear the TF bit when resetting running kprobe. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: systemtap@sourceware.org Cc: stable@vger.kernel.org # All the way back to ancient kernels Link: http://lkml.kernel.org/r/20160611140648.25885.37482.stgit@devbox [ Updated the comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge tag 'powerpc-4.7-4' of ↵Linus Torvalds2016-06-2515-51/+137
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "mm/radix (Aneesh Kumar K.V): - Update to tlb functions ric argument - Flush page walk cache when freeing page table - Update Radix tree size as per ISA 3.0 mm/hash (Aneesh Kumar K.V): - Use the correct PPP mask when updating HPTE - Don't add memory coherence if cache inhibited is set eeh (Gavin Shan): - Fix invalid cached PE primary bus bpf/jit (Naveen N. Rao): - Disable classic BPF JIT on ppc64le .. and fix faults caused by radix patching of SLB miss handler (Michael Ellerman)" * tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/bpf/jit: Disable classic BPF JIT on ppc64le powerpc: Fix faults caused by radix patching of SLB miss handler powerpc/eeh: Fix invalid cached PE primary bus powerpc/mm/radix: Update Radix tree size as per ISA 3.0 powerpc/mm/hash: Don't add memory coherence if cache inhibited is set powerpc/mm/hash: Use the correct PPP mask when updating HPTE powerpc/mm/radix: Flush page walk cache when freeing page table powerpc/mm/radix: Update to tlb functions ric argument
| * | | powerpc/bpf/jit: Disable classic BPF JIT on ppc64leNaveen N. Rao2016-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Classic BPF JIT was never ported completely to work on little endian powerpc. However, it can be enabled and will crash the system when used. As such, disable use of BPF JIT on ppc64le. Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Reported-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | powerpc: Fix faults caused by radix patching of SLB miss handlerMichael Ellerman2016-06-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of the Radix MMU support we added some feature sections in the SLB miss handler. These are intended to catch the case that we incorrectly take an SLB miss when Radix is enabled, and instead of crashing weirdly they bail out to a well defined exit path and trigger an oops. However the way they were written meant the bailout case was enabled by default until we did CPU feature patching. On powermacs the early debug prints in setup_system() can cause an SLB miss, which happens before code patching, and so the SLB miss handler would incorrectly bailout and crash during boot. Fix it by inverting the sense of the feature section, so that the code which is in place at boot is correct for the hash case. Once we determine we are using Radix - which will never happen on a powermac - only then do we patch in the bailout case which unconditionally jumps. Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Reported-by: Denis Kirjanov <kda@linux-powerpc.org> Tested-by: Denis Kirjanov <kda@linux-powerpc.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | powerpc/eeh: Fix invalid cached PE primary busGavin Shan2016-06-171-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PE primary bus cannot be got from its child devices when having full hotplug in error recovery. The PE primary bus is cached, which is done in commit <05ba75f84864> ("powerpc/eeh: Fix stale cached primary bus"). In eeh_reset_device(), the flag (EEH_PE_PRI_BUS) is cleared before the PCI hot remove. eeh_pe_bus_get() then returns NULL as the PE primary bus in pnv_eeh_reset() and it crashes the kernel eventually. This fixes the issue by clearing the flag (EEH_PE_PRI_BUS) before the PCI hot add. With it, the PowerNV EEH reset backend (pnv_eeh_reset()) can get valid PE primary bus through eeh_pe_bus_get(). Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE") Reported-by: Pridhiviraj Paidipeddi <ppaiddipe@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | powerpc/mm/radix: Update Radix tree size as per ISA 3.0Aneesh Kumar K.V2016-06-173-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ISA 3.0 updated it to be encoded as Radix tree size = 2^(RTS + 31). We have it encoded as 2^(RTS + 28). Add a helper with the correct encoding and use it instead of opencoding. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>