summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include/asm
Commit message (Collapse)AuthorAgeFilesLines
* powerpc/mm/4k: Limit 4k page size config to 64TB virtual address spaceAneesh Kumar K.V2017-06-082-14/+13
| | | | | | | | | | | Supporting 512TB requires us to do a order 3 allocation for level 1 page table (pgd). This results in page allocation failures with certain workloads. For now limit 4k linux page size config to 64TB. Fixes: f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/numa: Fix percpu allocations to be NUMA awareMichael Ellerman2017-06-061-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we switched to the generic implementation of cpu_to_node(), which uses a percpu variable to hold the NUMA node for each CPU. Unfortunately we neglected to notice that we use cpu_to_node() in the allocation of our percpu areas, leading to a chicken and egg problem. In practice what happens is when we are setting up the percpu areas, cpu_to_node() reports that all CPUs are on node 0, so we allocate all percpu areas on node 0. This is visible in the dmesg output, as all pcpu allocs being in group 0: pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07 pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15 pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23 pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31 pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39 pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47 To fix it we need an early_cpu_to_node() which can run prior to percpu being setup. We already have the numa_cpu_lookup_table we can use, so just plumb it in. With the patch dmesg output shows two groups, 0 and 1: pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07 pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15 pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23 pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31 pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39 pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47 We can also check the data_offset in the paca of various CPUs, with the fix we see: CPU 0: data_offset = 0x0ffe8b0000 CPU 24: data_offset = 0x1ffe5b0000 And we can see from dmesg that CPU 24 has an allocation on node 1: node 0: [mem 0x0000000000000000-0x0000000fffffffff] node 1: [mem 0x0000001000000000-0x0000001fffffffff] Cc: stable@vger.kernel.org # v3.16+ Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64: Reclaim CPU_FTR_SUBCOREMichael Ellerman2017-06-011-2/+1
| | | | | | | | | | | | | | | | | | | | We are running low on CPU feature bits, so we only want to use them when it's really necessary. CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't need it in order to make asm patching work. It can only be set on "Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL. There are no plans to implement it on future CPUs, but if there ever were we could retrofit it then. Although KVM uses subcores, it never looks at the CPU feature, it either looks at the ISA level or the threads_per_subcore value. So drop the CPU feature and do a PVR check instead. Drop the device tree "subcore" feature as we no longer support doing anything with it, and we will drop it from skiboot too. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hashMichael Ellerman2017-05-191-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on an address. What this means in practice is that it should only return true for addresses in the linear mapping which are backed by a valid PFN. We are failing to properly check that the address is in the linear mapping, because virt_to_pfn() will return a valid looking PFN for more or less any address. That bug is actually caused by __pa(), used in virt_to_pfn(). eg: __pa(0xc000000000010000) = 0x10000 # Good __pa(0xd000000000010000) = 0x10000 # Bad! __pa(0x0000000000010000) = 0x10000 # Bad! This started happening after commit bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give you something bogus back. Until we can verify if that GCC bug is no longer an issue, or come up with another solution, this commit does the minimal fix to make virt_addr_valid() work, by explicitly checking that the address is in the linear mapping region. Fixes: bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Breno Leitao <breno.leitao@gmail.com>
* powerpc/modules: If mprofile-kernel is enabled add it to vermagicMichael Ellerman2017-05-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On powerpc we can build the kernel with two different ABIs for mcount(), which is used by ftrace. Kernels built with one ABI do not know how to load modules built with the other ABI. The new style ABI is called "mprofile-kernel", for want of a better name. Currently if we build a module using the old style ABI, and the kernel with mprofile-kernel, when we load the module we'll oops something like: # insmod autofs4-no-mprofile-kernel.ko ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount ------------[ cut here ]------------ WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0 CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 #11 ... NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0 LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590 Call Trace: alloc_pages_current+0xc4/0x1d0 (unreliable) ftrace_process_locs+0x4a8/0x590 load_module+0x1c8c/0x28f0 SyS_finit_module+0x110/0x140 system_call+0x38/0xfc ... ftrace failed to modify [<d000000002a31024>] 0xd000000002a31024 actual: 35:65:00:48 We can avoid this by including in the vermagic whether the kernel/module was built with mprofile-kernel. Which results in: # insmod autofs4-pg.ko autofs4: version magic '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 SMP mod_unload modversions ' should be '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269-dirty SMP mod_unload modversions mprofile-kernel' insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format Fixes: 8c50b72a3b4f ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Jessica Yu <jeyu@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge tag 'powerpc-4.12-2' of ↵Linus Torvalds2017-05-127-6/+42
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "The change to the Linux page table geometry was delayed for more testing with 16G pages, and there's the new CPU features stuff which just needed one more polish before going in. Plus a few changes from Scott which came in a bit late. And then various fixes, mostly minor. Summary highlights: - rework the Linux page table geometry to lower memory usage on 64-bit Book3S (IBM chips) using the Hash MMU. - support for a new device tree binding for discovering CPU features on future firmwares. - Freescale updates from Scott: "Includes a fix for a powerpc/next mm regression on 64e, a fix for a kernel hang on 64e when using a debugger inside a relocated kernel, a qman fix, and misc qe improvements." Thanks to: Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong, Nicholas Piggin, Roy Pledge, Scott Wood, Valentin Longchamp" * tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Support new device tree binding for discovering CPU features powerpc: Don't print cpu_spec->cpu_name if it's NULL of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle powerpc/64s: Fix unnecessary machine check handler relocation branch powerpc/mm/book3s/64: Rework page table geometry for lower memory usage powerpc: Fix distclean with Makefile.postlink powerpc/64e: Don't place the stack beyond TASK_SIZE powerpc/powernv: Block PCI config access on BCM5718 during EEH recovery powerpc/8xx: Adding support of IRQ in MPC8xx GPIO soc/fsl/qbman: Disable IRQs for deferred QBMan work soc/fsl/qe: add EXPORT_SYMBOL for the 2 qe_tdm functions soc/fsl/qe: only apply QE_General4 workaround on affected SoCs soc/fsl/qe: round brg_freq to 1kHz granularity soc/fsl/qe: get rid of immrbar_virt_to_phys() net: ethernet: ucc_geth: fix MEM_PART_MURAM mode powerpc/64e: Fix hang when debugging programs with relocated kernel
| * powerpc/64s: Support new device tree binding for discovering CPU featuresNicholas Piggin2017-05-094-3/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ibm,powerpc-cpu-features device tree binding describes CPU features with ASCII names and extensible compatibility, privilege, and enablement metadata that allows improved flexibility and compatibility with new hardware. The interface is described in detail in ibm,powerpc-cpu-features.txt in this patch. Currently this code is not enabled by default, and there are no released firmwares that provide the binding. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * Merge branch 'next' of ↵Michael Ellerman2017-05-092-0/+7
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Includes a fix for a powerpc/next mm regression on 64e, a fix for a kernel hang on 64e when using a debugger inside a relocated kernel, a qman fix, and misc qe improvements."
| | * powerpc/64e: Don't place the stack beyond TASK_SIZEScott Wood2017-05-051-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") increased the task size on book3s, and introduced a mechanism to dynamically control whether a task uses these larger addresses. While the change to the task size itself was ifdef-protected to only apply on book3s, the change to STACK_TOP_USER64 was not. On book3e, this had the effect of trying to use addresses up to 128TiB for the stack despite a 64TiB task size limit -- which broke 64-bit userspace producing the following errors: Starting init: /sbin/init exists but couldn't execute it (error -14) Starting init: /bin/sh exists but couldn't execute it (error -14) Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Scott Wood <oss@buserror.net>
| | * powerpc/8xx: Adding support of IRQ in MPC8xx GPIOChristophe Leroy2017-05-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows the use of IRQ to notify the change of GPIO status on MPC8xx CPM IO ports. This then allows to associate IRQs to GPIOs in the Device Tree. Ex: CPM1_PIO_C: gpio-controller@960 { #gpio-cells = <2>; compatible = "fsl,cpm1-pario-bank-c"; reg = <0x960 0x10>; fsl,cpm1-gpio-irq-mask = <0x0fff>; interrupts = <1 2 6 9 10 11 14 15 23 24 26 31>; interrupt-parent = <&CPM_PIC>; gpio-controller; }; The property 'fsl,cpm1-gpio-irq-mask' defines which of the 16 GPIOs have the associated interrupts defined in the 'interrupts' property. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| * | powerpc/mm/book3s/64: Rework page table geometry for lower memory usageMichael Ellerman2017-05-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently in commit f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB") we increased the virtual address space for user processes to 128TB by default, and up to 512TB if user space opts in. This obviously required expanding the range of the Linux page tables. For Book3s 64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries. This meant we could cover the full address range, while still being able to insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD. The downside of that geometry is that it uses a lot of memory for the PGD, and in particular makes the PGD a 4-page allocation, which means it's much more likely to fail under memory pressure. Instead we can make the PMD larger, so that a single PUD entry maps 16G, allowing the 16G hugepages to sit at that level in the tree. We're then able to split the remaining bits between the PUG and PGD. We make the PGD slightly larger as that results in lower memory usage for typical programs. When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14 bytes, which is large but still < PAGE_SIZE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
* | | Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini2017-05-094-6/+107
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
| * \ \ Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-nextPaul Mackerras2017-04-2812-60/+536
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This merges in the powerpc topic/xive branch to bring in the code for the in-kernel XICS interrupt controller emulation to use the new XIVE (eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip directly, rather than via a XICS emulation in firmware. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt2017-04-274-6/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | | | Merge tag 'pci-v4.12-changes' of ↵Linus Torvalds2017-05-082-5/+6
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ...
| * \ \ \ \ Merge branch 'pci/resource-mmap' into nextBjorn Helgaas2017-04-281-5/+4
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pci/resource-mmap: ia64: Use generic pci_mmap_resource_range() ia64: Remove redundant checks for WC in pci_mmap_page_range() ia64: Remove redundant valid_mmap_phys_addr_range() from pci_mmap_page_range() PCI: Add I/O BAR support to generic pci_mmap_resource_range() x86/PCI: Use generic pci_mmap_resource_range() unicore32/PCI: Use generic pci_mmap_resource_range() sh/PCI: Use generic pci_mmap_resource_range() parisc: Use generic pci_mmap_resource_range() mn10300/PCI: Use generic pci_mmap_resource_range() MIPS: PCI: Use generic pci_mmap_resource_range() cris/PCI: Use generic pci_mmap_resource_range() ARM/PCI: Use generic pci_mmap_resource_range() PCI: Add pci_mmap_resource_range() and use it for ARM64 PCI: Add BAR index argument to pci_mmap_page_range() PCI: Use BAR index in sysfs attr->private instead of resource pointer PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O space PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h> PCI: Add arch_can_pci_mmap_wc() macro xtensa/PCI: Do not mmap PCI BARs to userspace as write-through PCI: Only allow WC mmap on prefetchable resources PCI: Fix another sanity check bug in /proc/pci mmap PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
| | * | | | | PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O spaceDavid Woodhouse2017-04-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is relatively esoteric, and knowing that we don't have it makes life easier in some cases rather than just an eventual -EINVAL from pci_mmap_page_range(). Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| | * | | | | PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h>David Woodhouse2017-04-181-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can declare it <linux/pci.h> even on platforms where it isn't going to be defined. There's no need to have it littered through the various <asm/pci.h> files. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| | * | | | | PCI: Add arch_can_pci_mmap_wc() macroDavid Woodhouse2017-04-181-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the almost-identical versions of pci_mmap_page_range() silently ignore the 'write_combine' argument and give uncached mappings. Yet we allow the PCIIOC_WRITE_COMBINE ioctl in /proc/bus/pci, expose the 'resourceX_wc' file in sysfs, and allow an attempted mapping to apparently succeed. To fix this, introduce a macro arch_can_pci_mmap_wc() which indicates whether the platform can do a write-combining mapping. On x86 this ends up being pat_enabled(), while the few other platforms that support it can just set it to a literal '1'. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * | | | | | powerpc/powernv: Override pcibios_default_alignment() to force PCI devices ↵Yongji Xie2017-04-191-0/+2
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to be page aligned Override pcibios_default_alignment() to set default alignment to PAGE_SIZE for all PCI devices on PowerNV platform. Thus sub-page BARs would not share a page and could be mapped into guest when VFIO passthrough them. Signed-off-by: Yongji Xie <elohimes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2017-05-081-0/+2
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...
| * | | | | | powerpc/fadump: remove dependency with CONFIG_KEXECHari Bathini2017-05-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that crashkernel parameter parsing and vmcoreinfo related code is moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove dependency with CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid of definitions of fadump_append_elf_note() & fadump_final_note() functions to reuse similar functions compiled under CONFIG_CRASH_CORE. Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-05-085-11/+129
|\ \ \ \ \ \ \ | |/ / / / / / |/| | / / / / | | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...
| * | | | | KVM: PPC: VFIO: Add in-kernel acceleration for VFIOAlexey Kardashevskiy2017-04-202-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO without passing them to user space which saves time on switching to user space and back. This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM. KVM tries to handle a TCE request in the real mode, if failed it passes the request to the virtual mode to complete the operation. If it a virtual mode handler fails, the request is passed to the user space; this is not expected to happen though. To avoid dealing with page use counters (which is tricky in real mode), this only accelerates SPAPR TCE IOMMU v2 clients which are required to pre-register the userspace memory. The very first TCE request will be handled in the VFIO SPAPR TCE driver anyway as the userspace view of the TCE table (iommu_table::it_userspace) is not allocated till the very first mapping happens and we cannot call vmalloc in real mode. If we fail to update a hardware IOMMU table unexpected reason, we just clear it and move on as there is nothing really we can do about it - for example, if we hot plug a VFIO device to a guest, existing TCE tables will be mirrored automatically to the hardware and there is no interface to report to the guest about possible failures. This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd and associates a physical IOMMU table with the SPAPR TCE table (which is a guest view of the hardware IOMMU table). The iommu_table object is cached and referenced so we do not have to look up for it in real mode. This does not implement the UNSET counterpart as there is no use for it - once the acceleration is enabled, the existing userspace won't disable it unless a VFIO container is destroyed; this adds necessary cleanup to the KVM_DEV_VFIO_GROUP_DEL handler. This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. This adds real mode version of WARN_ON_ONCE() as the generic version causes problems with rcu_sched. Since we testing what vmalloc_to_phys() returns in the code, this also adds a check for already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect(). This finally makes use of vfio_external_user_iommu_id() which was introduced quite some time ago and was considered for removal. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | KVM: PPC: iommu: Unify TCE checkingAlexey Kardashevskiy2017-04-202-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reworks helpers for checking TCE update parameters in way they can be used in KVM. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | KVM: PPC: Pass kvm* to kvmppc_find_table()Alexey Kardashevskiy2017-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm* there. This will be used in the following patches where we will be attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than to VCPU). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras2017-04-202-2/+14
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This merges in the commits in the topic/ppc-kvm branch of the powerpc tree to get the changes to arch/powerpc which subsequent patches will rely on. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | KVM: PPC: Book3S PR: Preserve storage control bitsAlexey Kardashevskiy2017-04-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PR KVM page fault handler performs eaddr to pte translation for a guest, however kvmppc_mmu_book3s_64_xlate() does not preserve WIMG bits (storage control) in the kvmppc_pte struct. If PR KVM is running as a second level guest under HV KVM, and PR KVM tries inserting HPT entry, this fails in HV KVM if it already has this mapping. This preserves WIMG bits between kvmppc_mmu_book3s_64_xlate() and kvmppc_mmu_map_page(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | KVM: PPC: Add MMIO emulation for remaining floating-point instructionsPaul Mackerras2017-04-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For completeness, this adds emulation of the lfiwax and lfiwzx instructions. With this, all floating-point load and store instructions as of Power ISA V2.07 are emulated. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | KVM: PPC: Emulation for more integer loads and storesPaul Mackerras2017-04-201-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds emulation for the following integer loads and stores, thus enabling them to be used in a guest for accessing emulated MMIO locations. - lhaux - lwaux - lwzux - ldu - lwa - stdux - stwux - stdu - ldbrx - stdbrx Previously, most of these would cause an emulation failure exit to userspace, though ldu and lwa got treated incorrectly as ld, and stdu got treated incorrectly as std. This also tidies up some of the formatting and updates the comment listing instructions that still need to be implemented. With this, all integer loads and stores that are defined in the Power ISA v2.07 are emulated, except for those that are permitted to trap when used on cache-inhibited or write-through mappings (and which do in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx, lq/stq, and l[bhwdq]arx/st[bhwdq]cx. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | KVM: PPC: Add MMIO emulation for stdx (store doubleword indexed)Alexey Kardashevskiy2017-04-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds missing stdx emulation for emulated MMIO accesses by KVM guests. This allows the Mellanox mlx5_core driver from recent kernels to work when MMIO emulation is enforced by userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructionsBin Lu2017-04-204-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides the MMIO load/store emulation for instructions of 'double & vector unsigned char & vector signed char & vector unsigned short & vector signed short & vector unsigned int & vector signed int & vector double '. The instructions that this adds emulation for are: - ldx, ldux, lwax, - lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux, - stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx, - lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx, - stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x [paulus@ozlabs.org - some cleanups, fixes and rework, make it compile for Book E, fix build when PR KVM is built in] Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interruptsPaul Mackerras2017-04-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This provides functions that can be used for generating interrupts indicating that a given functional unit (floating point, vector, or VSX) is unavailable. These functions will be used in instruction emulation code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET publicPaolo Bonzini2017-04-071-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* | | | | | | Merge tag 'powerpc-4.12-1' of ↵Linus Torvalds2017-05-0548-456/+981
|\ \ \ \ \ \ \ | | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerpc/book3s/mce: Move add_taint() later in virtual mode powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body powerpc/smp: Document irq enable/disable after migrating IRQs powerpc/mpc52xx: Don't select user-visible RTAS_PROC powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset() powerpc/eeh: Clean up and document event handling functions powerpc/eeh: Avoid use after free in eeh_handle_special_event() cxl: Mask slice error interrupts after first occurrence cxl: Route eeh events to all drivers in cxl_pci_error_detected() cxl: Force context lock during EEH flow powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST powerpc/xmon: Teach xmon oops about radix vectors powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids powerpc/pseries: Enable VFIO powerpc/powernv: Fix iommu table size calculation hook for small tables powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc powerpc: Add arch/powerpc/tools directory ...
| * | | | | | powerpc/mm/hash: Fix off-by-one in comment about kernel contexts idsMichael Ellerman2017-04-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michal Suchánek noticed a comment in book3s/64/mmu-hash.h about the context ids we use for the kernel was inconsistent with the code and other comments in the same file. It should read 1-4 not 1-5. While we're touching it, update "address" to "addresses" which makes more sense as it's referring to more than one address below. Reported-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | powerpc: Add struct smp_ops_t.cause_nmi_ipi operationNicholas Piggin2017-04-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Have the NMI IPI code use this op when the platform defines it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | powerpc: Add NMI IPI infrastructureNicholas Piggin2017-04-281-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a simple NMI IPI system that handles concurrency and reentrancy. The platform does not have to implement a true non-maskable interrupt, the default is to simply use the debugger break IPI message. This has now been co-opted for a general IPI message, and users (debugger and crash) have been reimplemented on top of the NMI system. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Incorporate incremental fixes from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | powerpc/64s: Dedicated system reset interrupt stackNicholas Piggin2017-04-282-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The system reset interrupt is used for crash/debug situations, so it is desirable to have as little impact on the normal state of the system as possible. Currently it uses the current kernel stack to process the exception. This stores into the stack which may be involved with the crash. The stack pointer may be corrupted, or it may have overflowed. Avoid or minimise these problems by creating a dedicated NMI stack for the system reset interrupt to use. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | powerpc/64s: Disallow system reset vs system reset reentrancyNicholas Piggin2017-04-282-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for using a dedicated stack for system reset interrupts, prevent a nested system reset from recovering, in order to simplify code that is called in crash/debug path. This allows a system reset interrupt to just use the base stack pointer. Keep an in_nmi nesting counter similarly to the in_mce counter. Consider the interrrupt non-recoverable if it is taken inside another system reset. Interrupt nesting could be allowed similarly to MCE, but system reset is a special case that's not for normal operation, so simplicity wins until there is requirement for nested system reset interrupts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | powerpc/64s: Fix system reset vs general interrupt reentrancyNicholas Piggin2017-04-282-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The system reset interrupt can occur when MSR_EE=0, and it currently uses the PACA_EXGEN save area. Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0 when the save area is still in use. A system reset interrupt in this window can lead to undetected corruption when the save area gets overwritten. This patch introduces PACA_EXNMI save area for system reset exceptions, which closes this corruption window. It's also helpful to retain the EXGEN state for debugging situations, even if not considering the recoverability aspect. This patch also moves the PACA_EXMC area down to a less frequently used part of the paca with the new save area. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | powerpc/64s: Exception macro for stack frame and initial register saveNicholas Piggin2017-04-281-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code is common to a few exceptions, and another user will be added. This causes a trivial change to generated code: - 604: std r9,416(r1) - 608: mfspr r11,314 - 60c: std r11,368(r1) - 610: mfspr r12,315 + 604: mfspr r11,314 + 608: mfspr r12,315 + 60c: std r9,416(r1) + 610: std r11,368(r1) machine_check_powernv_early could also use this, but that requires non trivial changes to generated code, so that's for another patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | powerpc/64s: Add exception macro that does not enable RINicholas Piggin2017-04-281-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subsequent patches will add more non-RI variant exceptions, so create a macro for it rather than open-code it. This does not change generated instructions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | | Merge branch 'topic/ppc-kvm' into nextMichael Ellerman2017-04-282-2/+14
| |\ \ \ \ \ \ | | | |/ / / / | | |/| | | / | | |_|_|_|/ | |/| | | | Merge the topic branch we were sharing with kvm-ppc, Paul has also merged it.
| | * | | | powerpc/vfio_spapr_tce: Add reference counting to iommu_tableAlexey Kardashevskiy2017-03-301-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far iommu_table obejcts were only used in virtual mode and had a single owner. We are going to change this by implementing in-kernel acceleration of DMA mapping requests. The proposed acceleration will handle requests in real mode and KVM will keep references to tables. This adds a kref to iommu_table and defines new helpers to update it. This replaces iommu_free_table() with iommu_tce_table_put() and makes iommu_free_table() static. iommu_tce_table_get() is not used in this patch but it will be in the following patch. Since this touches prototypes, this also removes @node_name parameter as it has never been really useful on powernv and carrying it for the pseries platform code to iommu_free_table() seems to be quite useless as well. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | | | powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()Alexey Kardashevskiy2017-03-301-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In real mode, TCE tables are invalidated using special cache-inhibited store instructions which are not available in virtual mode This defines and implements exchange_rm() callback. This does not define set_rm/clear_rm/flush_rm callbacks as there is no user for those - exchange/exchange_rm are only to be used by KVM for VFIO. The exchange_rm callback is defined for IODA1/IODA2 powernv platforms. This replaces list_for_each_entry_rcu with its lockless version as from now on pnv_pci_ioda2_tce_invalidate() can be called in the real mode too. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | | | powerpc/mmu: Add real mode support for IOMMU preregistered memoryAlexey Kardashevskiy2017-03-271-0/+4
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes mm_iommu_lookup() able to work in realmode by replacing list_for_each_entry_rcu() (which can do debug stuff which can fail in real mode) with list_for_each_entry_lockless(). This adds realmode version of mm_iommu_ua_to_hpa() which adds explicit vmalloc'd-to-linear address conversion. Unlike mm_iommu_ua_to_hpa(), mm_iommu_ua_to_hpa_rm() can fail. This changes mm_iommu_preregistered() to receive @mm as in real mode @current does not always have a correct pointer. This adds realmode version of mm_iommu_lookup() which receives @mm (for the same reason as for mm_iommu_preregistered()) and uses lockless version of list_for_each_entry_rcu(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/mm: Fix missing page attributes in page table dumpChristophe Leroy2017-04-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some targets, _PAGE_RW is 0 and this is _PAGE_RO which is used. There is also _PAGE_SHARED that is missing. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/mm: Ensure IRQs are off in switch_mm()David Gibson2017-04-251-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | powerpc expects IRQs to already be (soft) disabled when switch_mm() is called, as made clear in the commit message of 9c1e105238c4 ("powerpc: Allow perf_counters to access user memory at interrupt time"). Aside from any race conditions that might exist between switch_mm() and an IRQ, there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't followed at some point by an IRQ enable then interrupts will remain disabled until we return to userspace. It is true that when switch_mm() is called from the scheduler IRQs are off, but not when it's called by use_mm(). Looking closer we see that last year in commit f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") this was made more explicit by the addition of switch_mm_irqs_off() which is now called by the scheduler, vs switch_mm() which is used by use_mm(). Arguably it is a bug in use_mm() to call switch_mm() in a different context than it expects, but fixing that will take time. This was discovered recently when vhost started throwing warnings such as: BUG: sleeping function called from invalid context at kernel/mutex.c:578 in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760 no locks held by vhost-10760/10768. irq event stamp: 10 hardirqs last enabled at (9): _raw_spin_unlock_irq+0x40/0x80 hardirqs last disabled at (10): switch_slb+0x2e4/0x490 softirqs last enabled at (0): copy_process+0x5e8/0x1260 softirqs last disabled at (0): (null) Call Trace: show_stack+0x88/0x390 (unreliable) dump_stack+0x30/0x44 __might_sleep+0x1c4/0x2d0 mutex_lock_nested+0x74/0x5c0 cgroup_attach_task_all+0x5c/0x180 vhost_attach_cgroups_work+0x58/0x80 [vhost] vhost_worker+0x24c/0x3d0 [vhost] kthread+0xec/0x100 ret_from_kernel_thread+0x5c/0xd4 Prior to commit 04b96e5528ca ("vhost: lockless enqueuing") (Aug 2016) the vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(), which had the effect of reenabling IRQs. Since that commit removed the locking in vhost_worker() the body of the vhost_worker() loop now runs with interrupts off causing the warnings. This patch addresses the problem by making the powerpc code mirror the x86 code, ie. we disable interrupts in switch_mm(), and optimise the scheduler case by defining switch_mm_irqs_off(). Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [mpe: Flesh out/rewrite change log, add stable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | Merge branch 'topic/kprobes' into nextMichael Ellerman2017-04-252-53/+51
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although most of these kprobes patches are powerpc specific, there's a couple that touch generic code (with Acks). At the moment there's one conflict with acme's tree, but it's not too bad. Still just in case some other conflicts show up, we've put these in a topic branch so another tree could merge some or all of it if necessary.