summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include/asm/book3s/64/mmu.h
Commit message (Collapse)AuthorAgeFilesLines
* powerpc/mm: Cleanup memory block size probingAneesh Kumar K.V2023-08-181-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | Parse the device tree in early init to find the memory block size to be used by the kernel. Consolidate the memory block size device tree parsing to one helper and use that on both powernv and pseries. We still want to use machine-specific callback because on all machine types other than powernv and pseries we continue to return MIN_MEMORY_BLOCK_SIZE. pseries_memory_block_size used to look for the second memory block (memory@x) to determine the memory_block_size value. This patch changed that to look at all memory blocks and make sure we can map them all correctly using the computed memory block size value. Add workaround to force 256MB memory block size if device driver managed memory such as GPU memory is present. This helps to add GPU memory that is not aligned to 1G. Co-developed-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230801044447.11275-1-aneesh.kumar@linux.ibm.com
* powerpc/64s: Use dec_mm_active_cpus helperNicholas Piggin2023-08-021-1/+1
| | | | | | | | | | | Avoid open-coded atomic_dec on mm->context.active_cpus and use the function made for it. Add CONFIG_DEBUG_VM underflow checking on the counter. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230524060821.148015-3-npiggin@gmail.com
* powerpc/mm: Move get_unmapped_area functions to slice.cChristophe Leroy2022-05-051-6/+0
| | | | | | | | | | | | | | | hugetlb_get_unmapped_area() is now identical to the generic version if only RADIX is enabled, so move it to slice.c and let it fallback on the generic one when HASH MMU is not compiled in. Do the same with arch_get_unmapped_area() and arch_get_unmapped_area_topdown(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b5d9c124e82889e0cb115c150915a0c0d84eb960.1649523076.git.christophe.leroy@csgroup.eu
* powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not setMurilo Opsfelder Araujo2022-03-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not set: arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’: arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’? 811 | if (mmu_linear_psize == MMU_PAGE_4K) | ^~~~~~~~~~~~~~~~ | mmu_virtual_psize arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in Move the declaration of mmu_linear_psize outside of CONFIG_PPC_64S_HASH_MMU ifdef. After the above is fixed, it fails later with the following error: ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe': file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range' Fix that, too, by conditioning add_htab_mem_range() symbol to CONFIG_PPC_64S_HASH_MMU. Fixes: 387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU") Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567 Link: https://lore.kernel.org/r/20220301204743.45133-1-muriloo@linux.ibm.com
* powerpc/pseries: Add __init attribute to eligible functionsNick Child2021-12-231-1/+1
| | | | | | | | | | | | Some functions defined in 'arch/powerpc/platforms/pseries' are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-13-nick.child@ibm.com
* powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMUNicholas Piggin2021-12-091-3/+18
| | | | | | | | | | | | | Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
* powerpc/64s: Always define arch unmapped area callsNicholas Piggin2021-12-091-0/+6
| | | | | | | | | | | To avoid any functional changes to radix paths when building with hash MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the arch get_unmapped_area calls on 64s platforms. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com
* powerpc/64s: Get LPID bit width from device treeNicholas Piggin2021-11-301-4/+5
| | | | | | | | | | | | | | | | | | | | | Allow the LPID bit width and partition table size to be set at runtime from the device tree. Move the PID bit width detection into the same place. KVM does not support using the extra bits yet, this is mainly required to get the PTCR register values correct (so KVM will run but it will not allocate > 4096 LPIDs). OPAL firmware provides this property for POWER10 CPUs since skiboot commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for POWER10 CPUs"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
* powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_defBharata B Rao2021-06-211-0/+1
| | | | | | | | | | | | | Add a field to mmu_psize_def to store the page size encodings of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix AP encodings. This will be used when invalidating with required page size encoding in the hcall. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-3-bharata@linux.ibm.com
* powerpc: remove unneeded semicolonsChengyang Fan2021-02-091-1/+1
| | | | | | | | Remove superfluous semicolons after function definitions. Signed-off-by: Chengyang Fan <cy.fan@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210125095338.1719405-1-cy.fan@huawei.com
* Merge tag 'powerpc-5.11-1' of ↵Linus Torvalds2020-12-171-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Switch to the generic C VDSO, as well as some cleanups of our VDSO setup/handling code. - Support for KUAP (Kernel User Access Prevention) on systems using the hashed page table MMU, using memory protection keys. - Better handling of PowerVM SMT8 systems where all threads of a core do not share an L2, allowing the scheduler to make better scheduling decisions. - Further improvements to our machine check handling. - Show registers when unwinding interrupt frames during stack traces. - Improvements to our pseries (PowerVM) partition migration code. - Several series from Christophe refactoring and cleaning up various parts of the 32-bit code. - Other smaller features, fixes & cleanups. Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu. * tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits) powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug powerpc: Add config fragment for disabling -Werror powerpc/configs: Add ppc64le_allnoconfig target powerpc/powernv: Rate limit opal-elog read failure message powerpc/pseries/memhotplug: Quieten some DLPAR operations powerpc/ps3: use dma_mapping_error() powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 powerpc/perf: Fix Threshold Event Counter Multiplier width for P10 powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range() KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp KVM: PPC: fix comparison to bool warning KVM: PPC: Book3S: Assign boolean values to a bool variable powerpc: Inline setup_kup() powerpc/64s: Mark the kuap/kuep functions non __init KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering powerpc/xive: Improve error reporting of OPAL calls powerpc/xive: Simplify xive_do_source_eoi() powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG ...
| * powerpc/book3s64/kuap/kuep: Add PPC_PKEY config on book3s64Aneesh Kumar K.V2020-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | The config CONFIG_PPC_PKEY is used to select the base support that is required for PPC_MEM_KEYS, KUAP, and KUEP. Adding this dependency reduces the code complexity(in terms of #ifdefs) and enables us to move some of the initialization code to pkeys.c Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-4-aneesh.kumar@linux.ibm.com
| * powerpc/vdso: Replace vdso_base by vdsoChristophe Leroy2020-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All other architectures but s390 use a void pointer named 'vdso' to reference the VDSO mapping. In a following patch, the VDSO data page will be put in front of text, vdso_base will then not anymore point to VDSO text. To avoid confusion between vdso_base and VDSO text, rename vdso_base into vdso and make it a void __user *. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8e6cefe474aa4ceba028abb729485cd46c140990.1601197618.git.christophe.leroy@csgroup.eu
* | powerpc/64s: Trim offlined CPUs from mm_cpumasksNicholas Piggin2020-11-271-0/+12
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When offlining a CPU, powerpc/64s does not flush TLBs, rather it just leaves the CPU set in mm_cpumasks, so it continues to receive TLBIEs to manage its TLBs. However the exit_flush_lazy_tlbs() function expects that after returning, all CPUs (except self) have flushed TLBs for that mm, in which case TLBIEL can be used for this flush. This breaks for offline CPUs because they don't get the IPI to flush their TLB. This can lead to stale translations. Fix this by clearing the CPU from mm_cpumasks, then flushing all TLBs before going offline. These offlined CPU bits stuck in the cpumask also prevents the cpumask from being trimmed back to local mode, which means continual broadcast IPIs or TLBIEs are needed for TLB flushing. This patch prevents that situation too. A cast of many were involved in working this out, but in particular Milton, Aneesh, Paul made key discoveries. Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Milton Miller <miltonm@us.ibm.com> Debugged-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-5-npiggin@gmail.com
* powerpc/book3s64/radix: Make radix_mem_block_size 64bitAneesh Kumar K.V2020-10-081-1/+1
| | | | | | | | | | Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic") make sure different variables tracking lmb_size are updated to be 64 bit. Fixes: af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for hot-plugged memory") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007114836.282468-4-aneesh.kumar@linux.ibm.com
* powerpc/mm/book3s: Split radix and hash MAX_PHYSMEM limitAneesh Kumar K.V2020-09-151-15/+0
| | | | | | | | | | | | | MAX_PHYSMEM #define is used along with sparsemem to determine the SECTION_SHIFT value. Powerpc also uses the same value to limit the max memory enabled on the system. With 4K PAGE_SIZE and hash translation mode, we want to limit the max memory enabled to 64TB due to page table size restrictions. However, with radix translation, we don't have these restrictions. Hence split the radix and hash MA_PHYSMEM limit and use different limit for each of them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200608070904.387440-4-aneesh.kumar@linux.ibm.com
* powerpc/book3s64/radix: Fix boot failure with large amount of guest memoryAneesh Kumar K.V2020-08-281-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the hypervisor doesn't support hugepages, the kernel ends up allocating a large number of page table pages. The early page table allocation was wrongly setting the max memblock limit to ppc64_rma_size with radix translation which resulted in boot failure as shown below. Kernel panic - not syncing: early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x1000000 nid=-1 from=0x0000000000000000 max_addr=0xffffffffffffffff CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2 Call Trace: [c0000000016f3d00] [c0000000007c6470] dump_stack+0xc4/0x114 (unreliable) [c0000000016f3d40] [c00000000014c78c] panic+0x164/0x418 [c0000000016f3dd0] [c000000000098890] early_alloc_pgtable+0xe0/0xec [c0000000016f3e60] [c0000000010a5440] radix__early_init_mmu+0x360/0x4b4 [c0000000016f3ef0] [c000000001099bac] early_init_mmu+0x1c/0x3c [c0000000016f3f10] [c00000000109a320] early_setup+0x134/0x170 This was because the kernel was checking for the radix feature before we enable the feature via mmu_features. This resulted in the kernel using hash restrictions on radix. Rework the early init code such that the kernel boot with memblock restrictions as imposed by hash. At that point, the kernel still hasn't finalized the translation the kernel will end up using. We have three different ways of detecting radix. 1. dt_cpu_ftrs_scan -> used only in case of PowerNV 2. ibm,pa-features -> Used when we don't use cpu_dt_ftr_scan 3. CAS -> Where we negotiate with hypervisor about the supported translation. We look at 1 or 2 early in the boot and after that, we look at the CAS vector to finalize the translation the kernel will use. We also support a kernel command line option (disable_radix) to switch to hash. Update the memblock limit after mmu_early_init_devtree() if the kernel is going to use radix translation. This forces some of the memblock allocations we do before mmu_early_init_devtree() to be within the RMA limit. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Reported-by: Shirisha Ganta <shiganta@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200828100852.426575-1-aneesh.kumar@linux.ibm.com
* powerpc/book3s64/pkeys: Add MMU_FTR_PKEYAneesh Kumar K.V2020-07-201-0/+6
| | | | | | | | | | | | Parse storage keys related device tree entry in early_init_devtree and enable MMU feature MMU_FTR_PKEY if pkeys are supported. MMU feature is used instead of CPU feature because this enables us to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-14-aneesh.kumar@linux.ibm.com
* powerpc/mm/radix: Create separate mappings for hot-plugged memoryAneesh Kumar K.V2020-07-201-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To enable memory unplug without splitting kernel page table mapping, we force the max mapping size to the LMB size. LMB size is the unit in which hypervisor will do memory add/remove operation. Pseries systems supports max LMB size of 256MB. Hence on pseries, we now end up mapping memory with 2M page size instead of 1G. To improve that we want hypervisor to hint the kernel about the hotplug memory range. That was added that as part of commit b6eca183e23e ("powerpc/kernel: Enables memory hot-remove after reboot on pseries guests") But PowerVM doesn't provide that hint yet. Once we get PowerVM updated, we can then force the 2M mapping only to hot-pluggable memory region using memblock_is_hotpluggable(). Till then let's depend on LMB size for finding the mapping page size for linear range. With this change KVM guest will also be doing linear mapping with 2M page size. The actual TLB benefit of mapping guest page table entries with hugepage size can only be materialized if the partition scoped entries are also using the same or higher page size. A guest using 1G hugetlbfs backing guest memory can have a performance impact with the above change. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Fold in fix from Aneesh spotted by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-5-aneesh.kumar@linux.ibm.com
* powerpc/64s: Fix early_init_mmu section mismatchNicholas Piggin2020-05-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christian reports: MODPOST vmlinux.o WARNING: modpost: vmlinux.o(.text.unlikely+0x1a0): Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu() The function .early_init_mmu() references the function __init .radix__early_init_mmu(). This is often because .early_init_mmu lacks a __init annotation or the annotation of .radix__early_init_mmu is wrong. WARNING: modpost: vmlinux.o(.text.unlikely+0x1ac): Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.hash__early_init_mmu() The function .early_init_mmu() references the function __init .hash__early_init_mmu(). This is often because .early_init_mmu lacks a __init annotation or the annotation of .hash__early_init_mmu is wrong. The compiler is uninlining early_init_mmu and not putting it in an init section because there is no annotation. Add it. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Link: https://lore.kernel.org/r/20200429070247.1678172-1-npiggin@gmail.com
* powerpc: Use mm_context vas_windows counter to issue CP_ABORTHaren Myneni2020-04-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set_thread_uses_vas() sets used_vas flag for a process that opened VAS window and issue CP_ABORT during context switch for only that process. In multi-thread application, windows can be shared. For example Thread A can open a window and Thread B can run COPY/PASTE instructions to send NX request which may cause corruption or snooping or a covert channel Also once this flag is set, continue to run CP_ABORT even the VAS window is closed. So define vas-windows counter in process mm_context, increment this counter for each window open and decrement it for window close. If vas-windows is set, issue CP_ABORT during context switch. It means clear the foreign real address mapping only if the process / thread uses COPY/PASTE. Then disable it for that process if windows are not open. Moved set_thread_uses_vas() code to vas_tx_win_open() as this functionality is needed only for userspace open windows. We are adding VAS userspace support along with this fix. So no need to include this fix in stable releases. Fixes: 9d2a4d71332c ("powerpc: Define set_thread_uses_vas()") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reported-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Milton Miller <miltonm@us.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017291.2275.1077.camel@hbabu-laptop
* powerpc/64s: remove register_process_table callbackNicholas Piggin2019-09-051-4/+0
| | | | | | | | | | | | This callback is only required because the partition table init comes before process table allocation on powernv (aka bare metal aka native). Change the order to allocate the process table first, and remove the callback. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-2-npiggin@gmail.com
* powerpc/powernv: remove unused NPU DMA codeChristoph Hellwig2019-07-011-2/+0
| | | | | | | | None of these routines were ever used anywhere in the kernel tree since they were added to the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: move pgtable_t in asm/mmu.hChristophe Leroy2019-05-031-8/+0
| | | | | | | | | pgtable_t is now identical for all subarches, move it to the top level asm/mmu.h Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: get rid of mm_ctx_slice_mask_xxx()Christophe Leroy2019-05-031-28/+4
| | | | | | | | | | Now that slice_mask_for_size() is in mmu.h, the mm_ctx_slice_mask_xxx() are not needed anymore, so drop them. Note that the 8xx ones where not used anyway. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: move slice_mask_for_size() into mmu.hChristophe Leroy2019-05-031-0/+17
| | | | | | | | | Move slice_mask_for_size() into subarch mmu.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Retain the BUG_ON()s, rather than converting to VM_BUG_ON()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerc/mm/hash: Reduce hash_mm_context sizeAneesh Kumar K.V2019-04-211-1/+1
| | | | | | | | | | | | | | Allocate subpage protect related variables only if we use the feature. This helps in reducing the hash related mm context struct by around 4K Before the patch sizeof(struct hash_mm_context) = 8288 After the patch sizeof(struct hash_mm_context) = 4160 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Reduce memory usage for mm_context_t for radixAneesh Kumar K.V2019-04-211-37/+12
| | | | | | | | | | | | | | | Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask and subpage protection details. We can skip allocating these with radix translation. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Add helpers for accessing hash translation related variablesAneesh Kumar K.V2019-04-211-1/+62
| | | | | | | | | We want to switch to allocating them runtime only when hash translation is enabled. Add helpers so that both book3s and nohash can be adapted to upcoming change easily. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Remove PPC_MM_SLICES #ifdef for book3s64Aneesh Kumar K.V2019-04-211-4/+0
| | | | | | | Book3s64 always have PPC_MM_SLICES enabled. So remove the unncessary #ifdef Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Fix build error with FLATMEM book3s64 configAneesh Kumar K.V2019-04-211-0/+15
| | | | | | | | | | | | | | | | The current value of MAX_PHYSMEM_BITS cannot work with 32 bit configs. We used to have MAX_PHYSMEM_BITS not defined without SPARSEMEM and 32 bit configs never expected a value to be set for MAX_PHYSMEM_BITS. Dependent code such as zsmalloc derived the right values based on other fields. Instead of finding a value that works with different configs, use new values only for book3s_64. For 64 bit booke, use the definition of MAX_PHYSMEM_BITS as per commit a7df61a0e2b6 ("[PATCH] ppc64: Increase sparsemem defaults") That change was done in 2005 and hopefully will work with book3e 64. Fixes: 8bc086899816 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Move pgtable_t into platform headersChristophe Leroy2018-12-041-0/+9
| | | | | | | | | | | This patch move pgtable_t into platform headers. It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64 as nohash/64 doesn't support CONFIG_PPC_64K_PAGES. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm/hash: Rename get_ea_context to get_user_contextAneesh Kumar K.V2018-10-141-2/+2
| | | | | | | | We will be adding get_kernel_context later. Update function name to indicate this handle context allocation user space address. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Implement helpers for pagetable fragment support at PMD levelAneesh Kumar K.V2018-05-151-0/+1
| | | | | Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm/book3s64/4k: Switch 4k pagesize config to use pagetable fragmentAneesh Kumar K.V2018-05-151-3/+3
| | | | | | | | | | | 4K config use one full page at level 4 of the pagetable. Add support for single fragment allocation in pagetable fragment code and and use that for 4K config. This makes both 4k and 64k use the same code path. Later we will switch pmd to use the page table fragment code. This is done only for 64bit platforms which is using page table fragment support. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Add support for handling > 512TB address in SLB missAneesh Kumar K.V2018-03-311-1/+32
| | | | | | | | | | | | | | For addresses above 512TB we allocate additional mmu contexts. To make it all easy, addresses above 512TB are handled with IR/DR=1 and with stack frame setup. The mmu_context_t is also updated to track the new extended_ids. To support upto 4PB we need a total 8 contexts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Minor formatting tweaks and comment wording, switch BUG to WARN in get_ea_context().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge branch 'fixes' into nextMichael Ellerman2018-03-281-0/+3
|\ | | | | | | | | | | | | | | | | Merge our fixes branch from the 4.16 cycle. There were a number of important fixes merged, in particular some Power9 workarounds that we want in next for testing purposes. There's also been some conflicting changes in the CPU features code which are best merged and tested before going upstream.
| * powerpc/mm: Add tracking of the number of coprocessors using a contextBenjamin Herrenschmidt2018-03-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when using coprocessors (which use the Nest MMU), we simply increment the active_cpu count to force all TLB invalidations to be come broadcast. Unfortunately, due to an errata in POWER9, we will need to know more specifically that coprocessors are in use. This maintains a separate copros counter in the MMU context for that purpose. NB. The commit mentioned in the fixes tag below is not at fault for the bug we're fixing in this commit and the next, but this fix applies on top the infrastructure it introduced. Fixes: 03b8abedf4f4 ("cxl: Enable global TLBIs for cxl contexts") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mm/slice: implement a slice mask cacheNicholas Piggin2018-03-131-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calculating the slice mask can become a signifcant overhead for get_unmapped_area. This patch adds a struct slice_mask for each page size in the mm_context, and keeps these in synch with the slices psize arrays and slb_addr_limit. On Book3S/64 this adds 288 bytes to the mm_context_t for the slice mask caches. On POWER8, this increases vfork+exec+exit performance by 9.9% and reduces time to mmap+munmap a 64kB page by 28%. Reduces time to mmap+munmap by about 10% on 8xx. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mm/slice: Allow up to 64 low slicesChristophe Leroy2018-03-061-1/+2
|/ | | | | | | | | | | | | | | | | | | While the implementation of the "slices" address space allows a significant amount of high slices, it limits the number of low slices to 16 due to the use of a single u64 low_slices_psize element in struct mm_context_t On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, ie 4M in 4K pages mode and 64M in 16K pages mode. This means we could have at least 64 slices. In order to override this limitation, this patch switches the handling of low_slices_psize to char array as done already for high_slices_psize. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: introduce execute-only pkeyRam Pai2018-01-201-0/+1
| | | | | | | | | | | This patch provides the implementation of execute-only pkey. The architecture-independent layer expects the arch-dependent layer, to support the ability to create and enable a special key which has execute-only permission. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: track allocation status of all pkeysRam Pai2018-01-201-0/+9
| | | | | | | | | | | | | | | | | | | | Total 32 keys are available on power7 and above. However pkey 0,1 are reserved. So effectively we have 30 pkeys. On 4K kernels, we do not have 5 bits in the PTE to represent all the keys; we only have 3bits. Two of those keys are reserved; pkey 0 and pkey 1. So effectively we have 6 pkeys. This patch keeps track of reserved keys, allocated keys and keys that are currently free. Also it adds skeletal functions and macros, that the architecture-independent code expects to be available. Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge tag 'powerpc-4.15-1' of ↵Linus Torvalds2017-11-161-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "A bit of a small release, I suspect in part due to me travelling for KS. But my backlog of patches to review is smaller than usual, so I think in part folks just didn't send as much this cycle. Non-highlights: - Five fixes for the >128T address space handling, both to fix bugs in our implementation and to bring the semantics exactly into line with x86. Highlights: - Support for a new OPAL call on bare metal machines which gives us a true NMI (ie. is not masked by MSR[EE]=0) for debugging etc. - Support for Power9 DD2 in the CXL driver. - Improvements to machine check handling so that uncorrectable errors can be reported into the generic memory_failure() machinery. - Some fixes and improvements for VPHN, which is used under PowerVM to notify the Linux partition of topology changes. - Plumbing to enable TM (transactional memory) without suspend on some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND). - Support for emulating vector loads form cache-inhibited memory, on some Power9 revisions. - Disable the fast-endian switch "syscall" by default (behind a CONFIG), we believe it has never had any users. - A major rework of the API drivers use when initiating and waiting for long running operations performed by OPAL firmware, and changes to the powernv_flash driver to use the new API. - Several fixes for the handling of FP/VMX/VSX while processes are using transactional memory. - Optimisations of TLB range flushes when using the radix MMU on Power9. - Improvements to the VAS facility used to access coprocessors on Power9, and related improvements to the way the NX crypto driver handles requests. - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit. Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A. Kennington III" * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits) powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature powerpc/64s: Fix masking of SRR1 bits on instruction fault powerpc/64s: mm_context.addr_limit is only used on hash powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary powerpc/64s/hash: Fix fork() with 512TB process address space powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Fix 512T hint detection to use >= 128T powerpc: Fix DABR match on hash based systems powerpc/signal: Properly handle return value from uprobe_deny_signal() powerpc/fadump: use kstrtoint to handle sysfs store powerpc/lib: Implement UACCESS_FLUSHCACHE API powerpc/lib: Implement PMEM API powerpc/powernv/npu: Don't explicitly flush nmmu tlb powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm() powerpc/powernv/idle: Round up latency and residency values powerpc/kprobes: refactor kprobe_lookup_name for safer string operations powerpc/kprobes: Blacklist emulate_update_regs() from kprobes powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace powerpc/kprobes: Disable preemption before invoking probe handler for optprobes ...
| * powerpc/64s: mm_context.addr_limit is only used on hashNicholas Piggin2017-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | Radix keeps no meaningful state in addr_limit, so remove it from radix code and rename to slb_addr_limit to make it clear it applies to hash only. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2017-11-021-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/mm: Optimize detection of thread local mm'sBenjamin Herrenschmidt2017-08-231-0/+3
| | | | | | | | | | Instead of comparing the whole CPU mask every time, let's keep a counter of how many bits are set in the mask. Thus testing for a local mm only requires testing if that counter is 1 and the current CPU bit is set in the mask. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge branch 'fixes' into nextMichael Ellerman2017-08-231-7/+8
|\ | | | | | | | | | | There's a non-trivial dependency between some commits we want to put in next and the KVM prefetch work around that went into fixes. So merge fixes into next.
| * powerpc/mm/radix: Workaround prefetch issue with KVMBenjamin Herrenschmidt2017-07-261-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a somewhat architectural issue with Radix MMU and KVM. When coming out of a guest with AIL (Alternate Interrupt Location, ie, MMU enabled), we start executing hypervisor code with the PID register still containing whatever the guest has been using. The problem is that the CPU can (and will) then start prefetching or speculatively load from whatever host context has that same PID (if any), thus bringing translations for that context into the TLB, which Linux doesn't know about. This can cause stale translations and subsequent crashes. Fixing this in a way that is neither racy nor a huge performance impact is difficult. We could just make the host invalidations always use broadcast forms but that would hurt single threaded programs for example. We chose to fix it instead by partitioning the PID space between guest and host. This is possible because today Linux only use 19 out of the 20 bits of PID space, so existing guests will work if we make the host use the top half of the 20 bits space. We additionally add support for a property to indicate to Linux the size of the PID register which will be useful if we eventually have processors with a larger PID space available. There is still an issue with malicious guests purposefully setting the PID register to a value in the hosts PID range. Hopefully future HW can prevent that, but in the meantime, we handle it with a pair of kludges: - On the way out of a guest, before we clear the current VCPU in the PACA, we check the PID and if it's outside of the permitted range we flush the TLB for that PID. - When context switching, if the mm is "new" on that CPU (the corresponding bit was set for the first time in the mm cpumask), we check if any sibling thread is in KVM (has a non-NULL VCPU pointer in the PACA). If that is the case, we also flush the PID for that CPU (core). This second part is needed to handle the case where a process is migrated (or starts a new pthread) on a sibling thread of the CPU coming out of KVM, as there's a window where stale translations can exist before we detect it and flush them out. A future optimization could be added by keeping track of whether the PID has ever been used and avoid doing that for completely fresh PIDs. We could similarily mark PIDs that have been the subject of a global invalidation as "fresh". But for now this will do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop unneeded include of kvm_book3s_asm.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Remove old unused icswx based coprocessor supportBenjamin Herrenschmidt2017-08-031-5/+0
|/ | | | | | | | | | | | | | | | We have a whole pile of unused code to maintain the ACOP register, allocate coprocessor PIDs and handle ACOP faults. This mechanism was used for the HFI adapter on POWER7 which is dead and gone and whose driver never went upstream. It was used on some A2 core based stuff that also never saw the light of day. Take out all that code. There is still some POWER8 coprocessor code that uses icswx but it's kernel only and thus doesn't use any of that infrastructure. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Introduce address translation services for Nvlink2Alistair Popple2017-04-041-0/+6
| | | | | | | | | | | | | | | | | | | | | Nvlink2 supports address translation services (ATS) allowing devices to request address translations from an mmu known as the nest MMU which is setup to walk the CPU page tables. To access this functionality certain firmware calls are required to setup and manage hardware context tables in the nvlink processing unit (NPU). The NPU also manages forwarding of TLB invalidates (known as address translation shootdowns/ATSDs) to attached devices. This patch exports several methods to allow device drivers to register a process id (PASID/PID) in the hardware tables and to receive notification of when a device should stop issuing address translation requests (ATRs). It also adds a fault handler to allow device drivers to demand fault pages in. Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Fix up comment formatting, use flush_tlb_mm()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>