path: root/arch/powerpc/mm
* Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2022-12-19, 8 files, -5/+165)

  Pull powerpc updates from Michael Ellerman:

   - Add powerpc qspinlock implementation optimised for large system
     scalability and paravirt. See the merge message for more details

   - Enable objtool to be built on powerpc to generate mcount locations

   - Use a temporary mm for code patching with the Radix MMU, so the
     writable mapping is restricted to the patching CPU

   - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI

   - Sanitise user registers on interrupt entry on 64-bit Book3S

   - Many other small features and fixes

  Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, and Wolfram Sang.

  * tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
    powerpc/code-patching: Fix oops with DEBUG_VM enabled
    powerpc/qspinlock: Fix 32-bit build
    powerpc/prom: Fix 32-bit build
    powerpc/rtas: mandate RTAS syscall filtering
    powerpc/rtas: define pr_fmt and convert printk call sites
    powerpc/rtas: clean up includes
    powerpc/rtas: clean up rtas_error_log_max initialization
    powerpc/pseries/eeh: use correct API for error log size
    powerpc/rtas: avoid scheduling in rtas_os_term()
    powerpc/rtas: avoid device tree lookups in rtas_os_term()
    powerpc/rtasd: use correct OF API for event scan rate
    powerpc/rtas: document rtas_call()
    powerpc/pseries: unregister VPA when hot unplugging a CPU
    powerpc/pseries: reset the RCU watchdogs after a LPM
    powerpc: Take in account addition CPU node when building kexec FDT
    powerpc: export the CPU node count
    powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
    powerpc/dts/fsl: Fix pca954x i2c-mux node names
    cxl: Remove unnecessary cxl_pci_window_alignment()
    selftests/powerpc: Fix resource leaks
    ...

* powerpc/code-patching: Remove protection against patching init addresses after init (Christophe Leroy, 2022-12-02, 1 file, -1/+0)

  Once init section is freed, attempting to patch init code ends up in the weed.

  Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") protected patch_instruction() against that, but it is the responsibility of the caller to ensure that the patched memory is valid.

  All callers have now been verified and fixed so the check can be removed.

  This improves ftrace activation by about 2% on 8xx.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/504310828f473d424e2ed229eff57bf075f52796.1669969781.git.christophe.leroy@csgroup.eu

* powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults (Nicholas Piggin, 2022-12-02, 4 files, -1/+155)

  This option increases the number of hash misses by limiting the number of kernel HPT entries, by keeping a per-CPU record of the last kernel HPTEs installed, and removing that from the hash table on the next hash insertion. A timer round-robins CPUs removing remaining kernel HPTEs and clearing the TLB (in the case of bare metal) to increase and slightly randomise kernel fault activity.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  [mpe: Add comment about NR_CPUS usage, fixup whitespace]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221024030150.852517-1-npiggin@gmail.com

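  The eviction scheme described above can be pictured with a small standalone toy: each CPU remembers the slot of the last kernel entry it installed and removes it before installing the next one, so kernel translations keep being thrown away and keep faulting. This is a conceptual sketch only; the names (toy_hpt, hpt_insert, hpt_remove, last_kernel_slot) are hypothetical and do not correspond to the kernel's real data structures:

    #include <stdio.h>

    #define NR_SLOTS 8                 /* toy hash table size */
    #define NR_CPUS  2

    static unsigned long long toy_hpt[NR_SLOTS];     /* 0 means empty slot */
    static int last_kernel_slot[NR_CPUS] = { -1, -1 };

    static int hpt_insert(unsigned long long va)     /* take first free slot */
    {
            for (int i = 0; i < NR_SLOTS; i++) {
                    if (!toy_hpt[i]) {
                            toy_hpt[i] = va;
                            return i;
                    }
            }
            return -1;
    }

    static void hpt_remove(int slot)
    {
            toy_hpt[slot] = 0;
    }

    /* "stress" insertion: evict this CPU's previously installed kernel entry
     * first, so the next access to that address faults again. */
    static int stress_hpt_insert(int cpu, unsigned long long va)
    {
            if (last_kernel_slot[cpu] >= 0)
                    hpt_remove(last_kernel_slot[cpu]);
            last_kernel_slot[cpu] = hpt_insert(va);
            return last_kernel_slot[cpu];
    }

    int main(void)
    {
            printf("slot %d\n", stress_hpt_insert(0, 0xc000000000000000ULL));
            printf("slot %d\n", stress_hpt_insert(0, 0xc000000000010000ULL));
            return 0;
    }
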
* powerpc/tlb: Add local flush for page given mm_struct and psize (Benjamin Gray, 2022-11-30, 1 file, -0/+8)

  Adds a local TLB flush operation that works given an mm_struct, VA to flush, and page size representation. Most implementations mirror the surrounding code. The book3s/32/tlbflush.h implementation is left as a BUILD_BUG because it is more complicated and not required for anything as yet.

  This removes the need to create a vm_area_struct, which the temporary patching mm work does not need.

  Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221109045112.187069-8-bgray@linux.ibm.com

* powerpc/book3e: remove #include <generated/utsrelease.h> (Thomas Weißschuh, 2022-11-30, 1 file, -1/+0)

  Commit 7ad4bd887d27 ("powerpc/book3e: get rid of #include <generated/compile.h>") removed the usage of the define UTS_RELEASE but forgot to drop the include.

  utsrelease.h is potentially generated on each build. By removing the unused include we can get rid of some spurious recompilations.

  Fixes: 7ad4bd887d27 ("powerpc/book3e: get rid of #include <generated/compile.h>")
  Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
  Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
  Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  [mpe: Fix typo in change log and add more explanation]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221126051002.123199-2-linux@weissschuh.net

* Merge branch 'fixes' into next (Michael Ellerman, 2022-11-30, 3 files, -17/+70)

  Merge our fixes branch to bring in some changes that are prerequisites for work in next.

* powerpc: Remove find_current_mm_pte() (Christophe Leroy, 2022-11-24, 1 file, -2/+2)

  Last usage of find_current_mm_pte() was removed by commit 15759cb054ef ("powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slow")

  Remove it.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/ec79f462a3bfa8365b7df505e574d5d85246bc68.1646818177.git.christophe.leroy@csgroup.eu

* hugetlb: simplify hugetlb handling in follow_page_mask (Mike Kravetz, 2022-11-08, 1 file, -37/+0)

  During discussions of this series [1], it was suggested that hugetlb handling code in follow_page_mask could be simplified. At the beginning of follow_page_mask, there currently is a call to follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only architecture which provides a follow_huge_addr routine that does not return error.

  Instead, at each level of the page table a check is made for a hugetlb entry. If a hugetlb entry is found, a call to a routine associated with that entry is made.

  Currently, there are two checks for hugetlb entries at each page table level. The first check is of the form:

    if (p?d_huge())
        page = follow_huge_p?d();

  the second check is of the form:

    if (is_hugepd())
        page = follow_huge_pd().

  We can replace these checks, as well as the special handling routines such as follow_huge_p?d() and follow_huge_pd() with a single routine to handle hugetlb vmas.

  A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the beginning of follow_page_mask. hugetlb_follow_page_mask will use the existing routine huge_pte_offset to walk page tables looking for hugetlb entries. huge_pte_offset can be overwritten by architectures, and already handles special cases such as hugepd entries.

  [1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.wang@linux.alibaba.com/

  [mike.kravetz@oracle.com: remove vma (pmd sharing) per Peter]
    Link: https://lkml.kernel.org/r/20221028181108.119432-1-mike.kravetz@oracle.com
  [mike.kravetz@oracle.com: remove left over hugetlb_vma_unlock_read()]
    Link: https://lkml.kernel.org/r/20221030225825.40872-1-mike.kravetz@oracle.com
  Link: https://lkml.kernel.org/r/20220919021348.22151-1-mike.kravetz@oracle.com
  Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
  Suggested-by: David Hildenbrand <david@redhat.com>
  Reviewed-by: David Hildenbrand <david@redhat.com>
  Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
  Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
  Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
  Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
  Cc: Michael Ellerman <mpe@ellerman.id.au>
  Cc: Muchun Song <songmuchun@bytedance.com>
  Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

* powerpc/64s: Fix hash__change_memory_range preemption warning (Nicholas Piggin, 2022-10-18, 1 file, -3/+5)

  stop_machine_cpuslocked takes a mutex so it must be called in a preemptible context, so it can't simply be fixed by disabling preemption.

  This is not a bug, because CPU hotplug is locked, so this processor will call in to the stop machine function. So raw_smp_processor_id() could be used. This leaves a small chance that this thread will be migrated to another CPU, so the master work would be done by a CPU from a different context. Better for test coverage to make that a common case by just having the first CPU to call in become the master.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Tested-by: Guenter Roeck <linux@roeck-us.net>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221013151647.1857994-2-npiggin@gmail.com

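  The "first CPU to call in becomes the master" idea boils down to a single compare-and-swap among the CPUs that stop_machine() calls into. A kernel-style sketch of that pattern (illustrative only, not the actual patch; the variable and function names are placeholders):

    #include <linux/atomic.h>
    #include <linux/smp.h>

    static atomic_t change_range_master = ATOMIC_INIT(-1);

    /* Run on every online CPU by stop_machine(); whichever CPU gets here
     * first claims the master role and performs the update, the others
     * simply take part in the rendezvous. */
    static int change_memory_range_fn(void *data)
    {
            if (atomic_cmpxchg(&change_range_master, -1,
                               raw_smp_processor_id()) == -1) {
                    /* this CPU won the race: do the range update here */
            }
            return 0;
    }
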
* powerpc/64s: make linear_map_hash_lock a raw spinlock (Nicholas Piggin, 2022-10-18, 1 file, -6/+6)

  This lock is taken while the raw kfence_freelist_lock is held, so it must also be a raw spinlock, as reported by lockdep when raw lock nesting checking is enabled.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221013230710.1987253-3-npiggin@gmail.com

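  As a rough illustration of what such a conversion implies (a sketch, not the actual diff): the lock is declared and taken with the raw_ spinlock API, which may legally nest inside another raw spinlock such as kfence_freelist_lock and stays a true spinning lock even on PREEMPT_RT:

    #include <linux/spinlock.h>

    /* Before: static DEFINE_SPINLOCK(linear_map_hash_lock);    */
    /* After: a raw spinlock, allowed to nest under a raw lock. */
    static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);

    static void linear_map_hash_example(void)
    {
            raw_spin_lock(&linear_map_hash_lock);
            /* ... update the linear mapping hash slot ... */
            raw_spin_unlock(&linear_map_hash_lock);
    }
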
* powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe (Nicholas Piggin, 2022-10-18, 1 file, -2/+25)

  With kfence enabled, there are several cases where HPTE and TLBIE locks are called from softirq context, for example:

    WARNING: inconsistent lock state
    6.0.0-11845-g0cbbc95b12ac #1 Tainted: G    N
    --------------------------------
    inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
    swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
    c000000002734de8 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600
    {IN-SOFTIRQ-W} state was registered at:
      .lock_acquire+0x20c/0x520
      ._raw_spin_lock+0x4c/0x70
      .native_hpte_invalidate+0x62c/0x840
      .hash__kernel_map_pages+0x450/0x640
      .kfence_protect+0x58/0xc0
      .kfence_guarded_free+0x374/0x5a0
      .__slab_free+0x3d0/0x630
      .put_cred_rcu+0xcc/0x120
      .rcu_core+0x3c4/0x14e0
      .__do_softirq+0x1dc/0x7dc
      .do_softirq_own_stack+0x40/0x60

  Fix this by consistently disabling irqs while taking either of these locks. Don't just disable bh because several of the more common cases already disable irqs, so this just makes the locks always irq-safe.

  Reported-by: Guenter Roeck <linux@roeck-us.net>
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221013230710.1987253-2-npiggin@gmail.com

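  The "always irq-safe" rule the message describes corresponds to the usual irqsave locking pattern. A kernel-style sketch (the lock name is taken from the log above; the body is a placeholder, not the actual patch):

    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(native_tlbie_lock);

    static void tlbie_critical_section(void)
    {
            unsigned long flags;

            /* The irqsave variant is safe from process, softirq and hardirq
             * context alike, so lockdep sees one consistent usage pattern. */
            raw_spin_lock_irqsave(&native_tlbie_lock, flags);
            /* ... issue the tlbie sequence ... */
            raw_spin_unlock_irqrestore(&native_tlbie_lock, flags);
    }
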
* powerpc/64s: Add lockdep for HPTE lock (Nicholas Piggin, 2022-10-18, 1 file, -7/+35)

  Add lockdep annotation for the HPTE bit-spinlock. Modern systems don't take the tlbie lock, so this shows up some of the same lockdep warnings that were being reported by the ppc970. And they're not taken in exactly the same places so this is nice to have in its own right.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221013230710.1987253-1-npiggin@gmail.com

* Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm (Linus Torvalds, 2022-10-10, 2 files, -16/+8)

  Pull MM updates from Andrew Morton:

   - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that).

   - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It is apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention.

     Liam has identified a number of other tree users in the kernel which could be beneficially converted to mapletrees.

     Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up.

   - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-uninitialized bugs down to the single bit level.

     KMSAN keeps finding bugs. New ones, as well as the legacy ones.

   - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs.

   - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages.

   - userfaultfd updates from Axel Rasmussen

   - zsmalloc cleanups from Alexey Romanov

   - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure

   - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages.

   - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption.

   - memcg cleanups from Kairui Song.

   - memcg fixes and cleanups from Johannes Weiner.

   - Vishal Moola provides more folio conversions

   - Zhang Yi removed ll_rw_block() :(

   - migration enhancements from Peter Xu

   - migration error-path bugfixes from Huang Ying

   - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc.

   - vma merging improvements from Jakub Matěn.

   - NUMA hinting cleanups from David Hildenbrand.

   - xu xin added additional userspace visibility into KSM merging activity.

   - THP & KSM code consolidation from Qi Zheng.

   - more folio work from Matthew Wilcox.

   - KASAN updates from Andrey Konovalov.

   - DAMON cleanups from Kaixu Xia.

   - DAMON work from SeongJae Park: fixes, cleanups.

   - hugetlb sysfs cleanups from Muchun Song.

   - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

  Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

  * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
    hugetlb: allocate vma lock for all sharable vmas
    hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
    hugetlb: fix vma lock handling during split vma and range unmapping
    mglru: mm/vmscan.c: fix imprecise comments
    mm/mglru: don't sync disk for each aging cycle
    mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
    mm: memcontrol: use do_memsw_account() in a few more places
    mm: memcontrol: deprecate swapaccounting=0 mode
    mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
    mm/secretmem: remove reduntant return value
    mm/hugetlb: add available_huge_pages() func
    mm: remove unused inline functions from include/linux/mm_inline.h
    selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
    selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
    selftests/vm: add thp collapse shmem testing
    selftests/vm: add thp collapse file and tmpfs testing
    selftests/vm: modularize thp collapse memory operations
    selftests/vm: dedup THP helpers
    mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
    mm/madvise: add file and shmem support to MADV_COLLAPSE
    ...

* powerpc: remove mmap linked list walks (Matthew Wilcox (Oracle), 2022-09-26, 2 files, -16/+8)

  Use the VMA iterator instead.

  Link: https://lkml.kernel.org/r/20220906194824.2110408-34-Liam.Howlett@oracle.com
  Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
  Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
  Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
  Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
  Tested-by: Yu Zhao <yuzhao@google.com>
  Cc: Catalin Marinas <catalin.marinas@arm.com>
  Cc: David Hildenbrand <david@redhat.com>
  Cc: David Howells <dhowells@redhat.com>
  Cc: SeongJae Park <sj@kernel.org>
  Cc: Sven Schnelle <svens@linux.ibm.com>
  Cc: Will Deacon <will@kernel.org>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

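  For reference, the general shape of this conversion in kernel code (a generic sketch of the VMA iterator API as of this kernel series, not the powerpc-specific diff):

    #include <linux/mm.h>

    /* Old pattern, removed tree-wide:
     *
     *        for (vma = mm->mmap; vma; vma = vma->vm_next)
     *                do_something(vma);
     */
    static void walk_all_vmas(struct mm_struct *mm)
    {
            struct vm_area_struct *vma;
            VMA_ITERATOR(vmi, mm, 0);          /* start iterating at address 0 */

            for_each_vma(vmi, vma) {
                    /* do_something(vma); */
            }
    }
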
* Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2022-10-09, 20 files, -160/+128)

  Pull powerpc updates from Michael Ellerman:

   - Remove our now never-true definitions for pgd_huge() and p4d_leaf().

   - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.

   - Add support for syscall wrappers.

   - Add support for KFENCE on 64-bit.

   - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.

   - Support execute-only memory when using the Radix MMU (P9 or later).

   - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.

   - Updates to our linker script to move more data into read-only sections.

   - Allow the VDSO to be randomised on 32-bit.

   - Many other small features and fixes.

  Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng Yongjun.

  * tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
    KVM: PPC: Book3S HV: Fix stack frame regs marker
    powerpc: Don't add __powerpc_ prefix to syscall entry points
    powerpc/64s/interrupt: Fix stack frame regs marker
    powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
    powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
    powerpc/pseries: Add firmware details to the hardware description
    powerpc/powernv: Add opal details to the hardware description
    powerpc: Add device-tree model to the hardware description
    powerpc/64: Add logical PVR to the hardware description
    powerpc: Add PVR & CPU name to hardware description
    powerpc: Add hardware description string
    powerpc/configs: Enable PPC_UV in powernv_defconfig
    powerpc/configs: Update config files for removed/renamed symbols
    powerpc/mm: Fix UBSAN warning reported on hugetlb
    powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
    powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
    powerpc: Drops STABS_DEBUG from linker scripts
    powerpc/64s: Remove lost/old comment
    powerpc/64s: Remove old STAB comment
    powerpc: remove orphan systbl_chk.sh
    ...

* powerpc/mm: Fix UBSAN warning reported on hugetlb (Aneesh Kumar K.V, 2022-09-30, 1 file, -3/+3)

  Powerpc architecture supports 16GB hugetlb pages with hash translation. For 4K page size, this is implemented as a hugepage directory entry at PGD level and for 64K it is implemented as a huge page pte at PUD level

  With 16GB hugetlb size, offset within a page is greater than 32 bits. Hence switch to use unsigned long type when using hugepd_shift.

  In order to keep things simpler, we make sure we always use unsigned long type when using hugepd_shift() even though all the hugetlb page size won't require that.

  The hugetlb_free_p*d_range changes are all related to nohash usage where we can have multiple pgd entries pointing to the same hugepd entries. Hence on book3s64 where we can have > 4GB hugetlb page size we will always find more < next even if we compute the value of more correctly.

  Hence there is no functional change in this patch except that it fixes the below warning.

    UBSAN: shift-out-of-bounds in arch/powerpc/mm/hugetlbpage.c:499:21
    shift exponent 34 is too large for 32-bit type 'int'
    CPU: 39 PID: 1673 Comm: a.out Not tainted 6.0.0-rc2-00327-gee88a56e8517-dirty #1
    Call Trace:
      dump_stack_lvl+0x98/0xe0 (unreliable)
      ubsan_epilogue+0x18/0x70
      __ubsan_handle_shift_out_of_bounds+0x1bc/0x390
      hugetlb_free_pgd_range+0x5d8/0x600
      free_pgtables+0x114/0x290
      exit_mmap+0x150/0x550
      mmput+0xcc/0x210
      do_exit+0x420/0xdd0
      do_group_exit+0x4c/0xd0
      sys_exit_group+0x24/0x30
      system_call_exception+0x250/0x600
      system_call_common+0xec/0x250

  Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
  [mpe: Drop generic change to be sent separately, change 1ULL to 1UL]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220908072440.258301-1-aneesh.kumar@linux.ibm.com

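  The class of bug being fixed here can be reproduced in plain C: shifting a 32-bit int by 34 bits is undefined behaviour, while promoting to unsigned long first is well defined on a 64-bit target. A minimal standalone illustration (not the kernel code itself):

    #include <stdio.h>

    int main(void)
    {
            unsigned int shift = 34;   /* e.g. a 16GB huge page: 2^34 bytes */

            /* Undefined behaviour: the constant 1 is a 32-bit int, and
             * shifting it by 34 overflows the type.  This is what UBSAN
             * reports as "shift exponent 34 is too large for 32-bit type". */
            /* unsigned long bad = 1 << shift; */

            /* Well defined on 64-bit: promote before shifting. */
            unsigned long good = 1UL << shift;

            printf("page size: %lu bytes\n", good);
            return 0;
    }
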
* powerpc/mm: Always update max/min_low_pfn in mem_topology_setup() (Aneesh Kumar K.V, 2022-09-30, 1 file, -3/+3)

  For both CONFIG_NUMA enabled/disabled use mem_topology_setup() to update max/min_low_pfn.

  This also adds min_low_pfn update to CONFIG_NUMA which was initialized to zero before. (mpe: Though MEMORY_START is == 0 for PPC64=y which is all possible NUMA=y systems)

  Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220704063851.295482-1-aneesh.kumar@linux.ibm.com

* powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range (Aneesh Kumar K.V, 2022-09-30, 2 files, -2/+2)

  This function does the hash page table update. Hence rename it to indicate this better to avoid confusion with flush_pmd_tlb_range()

  Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
  [mpe: Drop unnecessary extern]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220907081941.209501-1-aneesh.kumar@linux.ibm.com

* powerpc: Ignore DSI error caused by the copy/paste instruction (Haren Myneni, 2022-09-28, 1 file, -1/+16)

  The data storage interrupt (DSI) error will be generated when the paste operation is issued on the suspended Nest Accelerator (NX) window due to NX state changes. The hypervisor expects the partition to ignore this error during page fault handling.

  To differentiate DSI caused by an actual HW configuration or by the NX window, a new "ibm,pi-features" type value is defined. Byte 0, bit 3 of pi-attribute-specifier-type is now defined to indicate this DSI error. If this error is not ignored, the user space can get SIGBUS when the NX request is issued.

  This patch adds changes to read ibm,pi-features property and ignore DSI error during page fault handling if MMU_FTR_NX_DSI is defined.

  Signed-off-by: Haren Myneni <haren@linux.ibm.com>
  [mpe: Mention PAPR version in comment]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/b9cd844b85eb8f70459109ce1b14e44c4cc85fa7.camel@linux.ibm.com

* powerpc/64e: provide an addressing macro for use with TOC in alternate register (Nicholas Piggin, 2022-09-28, 1 file, -1/+1)

  The interrupt entry code carefully saves a minimal number of registers, so in some places the TOC is required, it is loaded into a different register, so provide a macro that can supply an alternate TOC register.

  This continues to use got addressing because TOC-relative results in "got/toc optimization is not supported" messages by the linker. Having r2 be one of the saved registers and using that for TOC addressing may be the best way to avoid that and switch this to TOC addressing.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220926034057.2360083-6-npiggin@gmail.com

* powerpc/64: provide a helper macro to load r2 with the kernel TOC (Nicholas Piggin, 2022-09-28, 1 file, -1/+1)

  A later change stops the kernel using r2 and loads it with a poison value. Provide a PACATOC loading abstraction which can hide this detail.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com

* powerpc/64s: Enable KFENCE on book3s64 (Nicholas Miehlbradt, 2022-09-28, 2 files, -7/+9)

  KFENCE support was added for ppc32 in commit 90cbac0e995d ("powerpc: Enable KFENCE for PPC32").

  Enable KFENCE on ppc64 architecture with hash and radix MMUs. It uses the same mechanism as debug pagealloc to protect/unprotect pages. All KFENCE kunit tests pass on both MMUs.

  KFENCE memory is initially allocated using memblock but is later marked as SLAB allocated. This necessitates the change to __pud_free to ensure that the KFENCE pages are freed appropriately.

  Based on previous work by Christophe Leroy and Jordan Niethe.

  Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
  Reviewed-by: Russell Currey <ruscur@russell.cc>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220926075726.2846-4-nicholas@linux.ibm.com

* powerpc/64s: Allow double call of kernel_[un]map_linear_page() (Christophe Leroy, 2022-09-28, 1 file, -1/+7)

  If the page is already mapped resp. already unmapped, bail out.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220926075726.2846-3-nicholas@linux.ibm.com

* powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils (Christophe Leroy, 2022-09-28, 1 file, -7/+2)

  debug_pagealloc_enabled() is always defined and constant folds to 'false' when CONFIG_DEBUG_PAGEALLOC is not enabled.

  Remove the #ifdefs, the code and associated static variables will be optimised out by the compiler when CONFIG_DEBUG_PAGEALLOC is not defined.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220926075726.2846-2-nicholas@linux.ibm.com

* powerpc/64s: Add DEBUG_PAGEALLOC for radix (Nicholas Miehlbradt, 2022-09-28, 1 file, -4/+14)

  There is support for DEBUG_PAGEALLOC on hash but not on radix. Add support on radix.

  Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220926075726.2846-1-nicholas@linux.ibm.com

* powerpc: Remove impossible mmu_psize_defs[] on nohash (Christophe Leroy, 2022-09-26, 1 file, -49/+15)

  Today there is:

    if e500 or 8xx
      if e500
        mmu_psize_defs[] =
      else if 8xx
        mmu_psize_defs[] =
      else
        mmu_psize_defs[] =
      endif
    endif

  The else leg is dead definition.

  Drop that else leg and rewrite as:

    if e500
      mmu_psize_defs[] =
    endif
    if 8xx
      mmu_psize_defs[] =
    endif

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/030a843449f348c0b709ca5349640624f36a016f.1663606876.git.christophe.leroy@csgroup.eu

* powerpc: Remove CONFIG_PPC_BOOK3E_MMU (Christophe Leroy, 2022-09-26, 2 files, -3/+3)

  CONFIG_PPC_BOOK3E_MMU is redundant with CONFIG_PPC_E500.

  Remove it.

  Also rename mmu-book3e.h to mmu-e500.h

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/c5549cd59a131204ff94ab909cad2e2dad4ddf2f.1663606876.git.christophe.leroy@csgroup.eu

* powerpc: Remove CONFIG_PPC_FSL_BOOK3E (Christophe Leroy, 2022-09-26, 8 files, -16/+16)

  CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

  Remove it.

  And rename five files accordingly.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  [mpe: Rename include guards to match new file names]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/795cb93b88c9a0279289712e674f39e3b108a1b4.1663606876.git.christophe.leroy@csgroup.eu

* powerpc: Remove CONFIG_PPC_BOOK3E (Christophe Leroy, 2022-09-26, 2 files, -4/+4)

  CONFIG_PPC_BOOK3E is redundant with CONFIG_PPC_BOOK3E_64. The latter is more explicit about the fact that it's a 64 bits target.

  Remove CONFIG_PPC_BOOK3E.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/5d0891490813c19cdcfc04678f512ea68cba3e64.1663606876.git.christophe.leroy@csgroup.eu

* powerpc: Remove CONFIG_FSL_BOOKE (Christophe Leroy, 2022-09-26, 5 files, -7/+7)

  PPC_85xx is PPC32 only. PPC_85xx always selects E500 and is the only PPC32 that selects E500. FSL_BOOKE is selected when E500 and PPC32 are selected.

  So FSL_BOOKE is redundant with PPC_85xx.

  Remove FSL_BOOKE.

  And rename four files accordingly.

  cpu_setup_fsl_booke.S is not renamed because it is linked to PPC_FSL_BOOK3E and not to FSL_BOOKE as suggested by its name.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/08e3e15594e66d63b9e89c5b4f9c35153913c28f.1663606875.git.christophe.leroy@csgroup.eu

* powerpc/64e: Remove unnecessary #ifdef CONFIG_PPC_FSL_BOOK3E (Christophe Leroy, 2022-09-26, 1 file, -6/+0)

  CONFIG_PPC_BOOK3E_64 implies CONFIG_PPC_FSL_BOOK3E so no need of additional #ifdefs in files built exclusively for CONFIG_PPC_BOOK3E_64.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/df16255c13b63b0221c9be63b94a6864bed22c12.1663606875.git.christophe.leroy@csgroup.eu

* powerpc/highmem: Properly handle fragmented memory (Christophe Leroy, 2022-09-26, 1 file, -1/+1)

  In addition to checking whether a page is reserved before allocating it to highmem, verify that it is valid memory.

  Otherwise the kernel Oopses as below:

    mem auto-init: stack:off, heap alloc:off, heap free:off
    Kernel attempted to read user page (7df58) - exploit attempt? (uid: 0)
    BUG: Unable to handle kernel data access on read at 0x0007df58
    Faulting instruction address: 0xc01c8348
    Oops: Kernel access of bad area, sig: 11 [#1]
    BE PAGE_SIZE=4K SMP NR_CPUS=2 P2020RDB-PC
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc2-0caacb197b677410bdac81bc34f05235+ #121
    NIP: c01c8348 LR: c01cb2bc CTR: 0000000a
    REGS: c10d7e20 TRAP: 0300 Not tainted (6.0.0-rc2-0caacb197b677410bdac81bc34f05235+)
    MSR: 00021000 <CE,ME> CR: 48044224 XER: 00000000
    DEAR: 0007df58 ESR: 00000000
    GPR00: c01cb294 c10d7f10 c1045340 00000001 00000004 c112bcc0 00000015 eedf1000
    GPR08: 00000003 0007df58 00000000 f0000000 28044228 00000200 00000000 00000000
    GPR16: 00000000 00000000 00000000 0275cb7a c0000000 00000001 0000075f 00000000
    GPR24: c1031004 00000000 00000000 00000001 c10f0000 eedf1000 00080000 00080000
    NIP free_unref_page_prepare.part.93+0x48/0x60
    LR free_unref_page+0x84/0x4b8
    Call Trace:
      0xeedf1000 (unreliable)
      free_unref_page+0x5c/0x4b8
      mem_init+0xd0/0x194
      start_kernel+0x4c0/0x6d0
      set_ivor+0x13c/0x178

  Reported-by: Pali Rohár <pali@kernel.org>
  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Fixes: b0e0d68b1c52 ("powerpc/32: Allow fragmented physical memory")
  Tested-by: Pali Rohár <pali@kernel.org>
  [mpe: Trim oops]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/f08cca5c46d67399c53262eca48e015dcf1841f9.1663695394.git.christophe.leroy@csgroup.eu

* powerpc/book3s: Inline first level of update_mmu_cache() (Christophe Leroy, 2022-09-26, 2 files, -7/+2)

  update_mmu_cache() voids when hash page tables are not used. On PPC32 that means when MMU_FTR_HPTE_TABLE is not defined. On PPC64 that means when RADIX is enabled.

  Rename core part of update_mmu_cache() as __update_mmu_cache() and include the initial verification in an inlined caller.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/bea5ad0de7f83eff256116816d46c84fa0a444de.1662370698.git.christophe.leroy@csgroup.eu

* powerpc: move __end_rodata to cover arch read-only sections (Nicholas Piggin, 2022-09-26, 4 files, -6/+7)

  powerpc has a number of read-only sections and tables that are put after RO_DATA(). Move the __end_rodata symbol to cover these as well.

  Setting memory to read-only at boot is done using __init_begin, change that to use __end_rodata.

  This makes is_kernel_rodata() exactly cover the read-only region, as well as other things using __end_rodata (e.g., kernel/dma/debug.c). Boot dmesg also prints the rodata size more accurately.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220916040755.2398112-2-npiggin@gmail.com

* powerpc/vmlinux.lds: Add an explicit symbol for the SRWX boundary (Michael Ellerman, 2022-09-26, 2 files, -3/+3)

  Currently __init_begin is used as the boundary for strict RWX between executable/read-only text and data, and non-executable (after boot) code and data.

  But that's a little subtle, so add an explicit symbol to document that the SRWX boundary lies there, and add a comment making it clear that __init_begin must also begin there.

  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220916131422.318752-2-mpe@ellerman.id.au

* powerpc/32: Remove wii_memory_fixups() (Christophe Leroy, 2022-09-01, 1 file, -8/+0)

  wii_memory_fixups() is not called anymore, remove it.

  Also remove left-overs in mmu_decl.h which were forgotten by commit 160985f3025b ("powerpc/wii: remove wii_mmu_mapin_mem2()")

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/5f2091f86528b59ef92ef1daed5d3dd8c0d7bebd.1661938317.git.christophe.leroy@csgroup.eu

* powerpc/32: Allow fragmented physical memory (Christophe Leroy, 2022-09-01, 1 file, -9/+0)

  Since commit 9e849f231c3c ("powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.") it is possible to map all blocks as RAM on any PPC32.

  Remove related restrictions.

  And remove call to wii_memory_fixups() which doesn't do anything else than checks since commit 160985f3025b ("powerpc/wii: remove wii_mmu_mapin_mem2()")

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/fc9a5b8ef49f6eb86e970d4c7ccfba9b407fd4eb.1661938317.git.christophe.leroy@csgroup.eu

* powerpc/32: Drop a stale comment about reservation of gigantic pages (Christophe Leroy, 2022-09-01, 1 file, -4/+0)

  A comment about the reservation of gigantic pages was left in MMU_init() after commit 79cc38ded1e1 ("powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line")

  Remove it.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/959d77be630b9b46a7458f0fbd41dc3a94ec811a.1661938317.git.christophe.leroy@csgroup.eu

* powerpc/mm: Support execute-only memory on the Radix MMU (Russell Currey, 2022-08-26, 2 files, -3/+14)

  Add support for execute-only memory (XOM) for the Radix MMU by using an execute-only mapping, as opposed to the RX mapping used by powerpc's other MMUs.

  The Hash MMU already supports XOM through the execute-only pkey, which is a separate mechanism shared with x86. A PROT_EXEC-only mapping will map to RX, and then the pkey will be applied on top of it.

  mmap() and mprotect() consumers in userspace should observe the same behaviour on Hash and Radix despite the differences in implementation.

  Replacing the vma_is_accessible() check in access_error() with a read check should be functionally equivalent for non-Radix MMUs, since it follows write and execute checks. For Radix, the change enables detecting faults on execute-only mappings where vma_is_accessible() would return true.

  Signed-off-by: Russell Currey <ruscur@russell.cc>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220817050640.406017-1-ruscur@russell.cc

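  From userspace, an execute-only mapping is simply requested with PROT_EXEC and no PROT_READ/PROT_WRITE; whether reads are then actually blocked depends on the MMU, as explained above. A minimal illustrative example (not taken from the patch):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 4096;

            /* Populate a writable mapping first (machine code would be
             * copied in here before flipping the permissions). */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                    return 1;

            /* Drop to execute-only.  On the Radix MMU (P9 and later) a later
             * data read of *p faults; on the Hash MMU the mapping is RX with
             * an execute-only pkey applied on top, so userspace observes the
             * same behaviour. */
            if (mprotect(p, len, PROT_EXEC) != 0) {
                    perror("mprotect");
                    return 1;
            }

            printf("execute-only mapping at %p\n", p);
            return 0;
    }
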
* powerpc/fsl_booke: Make calc_cam_sz() static (Christophe Leroy, 2022-08-26, 2 files, -4/+2)

  calc_cam_sz() is used only in fsl_book3e.c, make it static.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/a7469848371b2cf5e8f654ec79800e209d88595e.1660919200.git.christophe.leroy@csgroup.eu

* powerpc: Remove stale declarations in mmu_decl.h (Christophe Leroy, 2022-08-26, 1 file, -4/+0)

  rtas_size and rtas_data are not used anymore since at least commit 7c8c6b9776fb ("powerpc: Merge lmb.c and make MM initialization use it.")

  Remove them.

  Since commit 4b74a35fc7e9 ("powerpc/32s: Make Hash var static") the forward declaration of struct hash_pte is unneeded.

  Remove it.

  __initial_memory_limit_addr was removed by commit e63075a3c937 ("memblock: Introduce default allocation limit and use it to replace explicit ones")

  Remove the declaration.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/a821e8397dd56b8177ecc04966d3b3a7c4bda6d4.1660919016.git.christophe.leroy@csgroup.eu

* powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush (Yang Shi, 2022-09-26, 1 file, -9/+0)

  The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will move to use RCU instead of disabling local interrupts in fast-GUP. Using an IPI is the old-styled way of serializing against fast-GUP although it still works as expected now.

  And fast-GUP now fixed the potential race with THP collapse by checking whether PMD is changed or not. So IPI broadcast in radix pmd collapse flush is not necessary anymore. But it is still needed for hash TLB.

  Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com
  Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
  Signed-off-by: Yang Shi <shy828301@gmail.com>
  Acked-by: David Hildenbrand <david@redhat.com>
  Acked-by: Peter Xu <peterx@redhat.com>
  Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
  Cc: Hugh Dickins <hughd@google.com>
  Cc: Jason Gunthorpe <jgg@nvidia.com>
  Cc: John Hubbard <jhubbard@nvidia.com>
  Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
  Cc: Michael Ellerman <mpe@ellerman.id.au>
  Cc: Nicholas Piggin <npiggin@gmail.com>
  Cc: <stable@vger.kernel.org>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

* Merge tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl (Linus Torvalds, 2022-08-10, 1 file, -0/+1)

  Pull cxl updates from Dan Williams:
  "Compute Express Link (CXL) updates for 6.0:

   - Introduce a 'struct cxl_region' object with support for provisioning and assembling persistent memory regions.

   - Introduce alloc_free_mem_region() to accompany the existing request_free_mem_region() as a method to allocate physical memory capacity out of an existing resource.

   - Export insert_resource_expand_to_fit() for the CXL subsystem to late-publish CXL platform windows in iomem_resource.

   - Add a polled mode PCI DOE (Data Object Exchange) driver service and use it in cxl_pci to retrieve the CDAT (Coherent Device Attribute Table)"

  * tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (74 commits)
    cxl/hdm: Fix skip allocations vs multiple pmem allocations
    cxl/region: Disallow region granularity != window granularity
    cxl/region: Fix x1 interleave to greater than x1 interleave routing
    cxl/region: Move HPA setup to cxl_region_attach()
    cxl/region: Fix decoder interleave programming
    Documentation: cxl: remove dangling kernel-doc reference
    cxl/region: describe targets and nr_targets members of cxl_region_params
    cxl/regions: add padding for cxl_rr_ep_add nested lists
    cxl/region: Fix IS_ERR() vs NULL check
    cxl/region: Fix region reference target accounting
    cxl/region: Fix region commit uninitialized variable warning
    cxl/region: Fix port setup uninitialized variable warnings
    cxl/region: Stop initializing interleave granularity
    cxl/hdm: Fix DPA reservation vs cxl_endpoint_decoder lifetime
    cxl/acpi: Minimize granularity for x1 interleaves
    cxl/region: Delete 'region' attribute from root decoders
    cxl/acpi: Autoload driver for 'cxl_acpi' test devices
    cxl/region: decrement ->nr_targets on error in cxl_region_attach()
    cxl/region: prevent underflow in ways_to_cxl()
    cxl/region: uninitialized variable in alloc_hpa()
    ...

* powerpc/mm: Export memory_add_physaddr_to_nid() for modules (Michael Ellerman, 2022-07-29, 1 file, -0/+1)

  The cxl_pmem module wants to call memory_add_physaddr_to_nid(), so export the symbol.

  Link: http://lore.kernel.org/r/87sfmkbfyg.fsf@mpe.ellerman.id.au
  Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects")
  Reported-by: Sachin Sant <sachinp@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Tested-by: Sachin Sant <sachinp@linux.ibm.com>
  Signed-off-by: Dan Williams <dan.j.williams@intel.com>

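  Exporting a symbol for module use is a one-line addition after the function's definition; a sketch of the usual pattern (whether the plain or the _GPL export variant was used here is not visible from this log, so the exact macro is an assumption, and the function body below is a placeholder):

    #include <linux/export.h>
    #include <linux/types.h>

    int memory_add_physaddr_to_nid(u64 start)
    {
            /* ... existing powerpc implementation (placeholder body) ... */
            return 0;
    }
    EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);  /* assumption: _GPL variant */
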
* Merge tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2022-08-06, 22 files, -212/+256)

  Pull powerpc updates from Michael Ellerman:

   - Add support for syscall stack randomization

   - Add support for atomic operations to the 32 & 64-bit BPF JIT

   - Full support for KASAN on 64-bit Book3E

   - Add a watchdog driver for the new PowerVM hypervisor watchdog

   - Add a number of new selftests for the Power10 PMU support

   - Add a driver for the PowerVM Platform KeyStore

   - Increase the NMI watchdog timeout during live partition migration, to avoid timeouts due to increased memory access latency

   - Add support for using the 'linux,pci-domain' device tree property for PCI domain assignment

   - Many other small features and fixes

  Thanks to Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira Rajeev, Bagas Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Greg Kurz, Haowen Bai, Hari Bathini, Jason A. Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg Haefliger, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada, Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Ning Qiang, Pali Rohár, Petr Mladek, Rashmica Gupta, Sachin Sant, Scott Cheloha, Segher Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu Jianfeng, and Zhouyi Zhou.

  * tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (191 commits)
    powerpc/64e: Fix kexec build error
    EDAC/ppc_4xx: Include required of_irq header directly
    powerpc/pci: Fix PHB numbering when using opal-phbid
    powerpc/64: Init jump labels before parse_early_param()
    selftests/powerpc: Avoid GCC 12 uninitialised variable warning
    powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address
    powerpc/xive: Fix refcount leak in xive_get_max_prio
    powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader
    powerpc/perf: Include caps feature for power10 DD1 version
    powerpc: add support for syscall stack randomization
    powerpc: Move system_call_exception() to syscall.c
    powerpc/powernv: rename remaining rng powernv_ functions to pnv_
    powerpc/powernv/kvm: Use darn for H_RANDOM on Power9
    powerpc/powernv: Avoid crashing if rng is NULL
    selftests/powerpc: Fix matrix multiply assist test
    powerpc/signal: Update comment for clarity
    powerpc: make facility_unavailable_exception 64s
    powerpc/platforms/83xx/suspend: Remove write-only global variable
    powerpc/platforms/83xx/suspend: Prevent unloading the driver
    powerpc/platforms/83xx/suspend: Reorder to get rid of a forward declaration
    ...

* powerpc/44x: Fix build failure with GCC 12 (unrecognized opcode: `wrteei') (Christophe Leroy, 2022-07-27, 1 file, -2/+2)

  Building ppc40x_defconfig leads to following error:

    CC      arch/powerpc/kernel/idle.o
    {standard input}: Assembler messages:
    {standard input}:67: Error: unrecognized opcode: `wrteei'
    {standard input}:78: Error: unrecognized opcode: `wrteei'

  Add -mcpu=440 by default and alternatively 464 and 476.

  Once that's done, -mcpu=powerpc is only for book3s/32 now.

  But then comes:

    CC      arch/powerpc/kernel/io.o
    {standard input}: Assembler messages:
    {standard input}:198: Error: unrecognized opcode: `eieio'
    {standard input}:230: Error: unrecognized opcode: `eieio'
    {standard input}:245: Error: unrecognized opcode: `eieio'
    {standard input}:254: Error: unrecognized opcode: `eieio'
    {standard input}:273: Error: unrecognized opcode: `eieio'
    {standard input}:396: Error: unrecognized opcode: `eieio'
    {standard input}:404: Error: unrecognized opcode: `eieio'
    {standard input}:423: Error: unrecognized opcode: `eieio'
    {standard input}:512: Error: unrecognized opcode: `eieio'
    {standard input}:520: Error: unrecognized opcode: `eieio'
    {standard input}:539: Error: unrecognized opcode: `eieio'
    {standard input}:628: Error: unrecognized opcode: `eieio'
    {standard input}:636: Error: unrecognized opcode: `eieio'
    {standard input}:655: Error: unrecognized opcode: `eieio'

  Fix it by replacing eieio by mbar on booke.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/b0d982e223314ed82ab959f5d4ad2c4c00bedb99.1657549153.git.christophe.leroy@csgroup.eu

* powerpc/32s: Fix boot failure with KASAN + SMP + JUMP_LABEL_FEATURE_CHECK_DEBUG (Christophe Leroy, 2022-07-27, 1 file, -1/+1)

  Since commit 4291d085b0b0 ("powerpc/32s: Make pte_update() non atomic on 603 core"), pte_update() has been using mmu_has_feature(MMU_FTR_HPTE_TABLE) to avoid a useless atomic operation on 603 cores.

  When kasan_early_init() sets up the early zero shadow, it uses __set_pte_at(). On book3s/32, __set_pte_at() calls pte_update() when CONFIG_SMP is selected in order to ensure the preservation of _PAGE_HASHPTE in case of concurrent update of the PTE. But that's too early for mmu_has_feature(), so when CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG is selected, mmu_has_feature() calls printk(). That's too early to call printk() because KASAN early zero shadow page is not set up yet. It leads to a deadlock.

  However, when kasan_early_init() is called, there is only one CPU running and no risk of concurrent PTE update. So __set_pte_at() can be called with the 'percpu' flag. With that flag set, the PTE is written directly instead of being written via pte_update().

  Fixes: 4291d085b0b0 ("powerpc/32s: Make pte_update() non atomic on 603 core")
  Reported-by: Erhard Furtner <erhard_f@mailbox.org>
  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/2ee707512b8b212b079b877f4ceb525a1606a3fb.1656655567.git.christophe.leroy@csgroup.eu

* powerpc/32: Set an IBAT covering up to _einittext during init (Christophe Leroy, 2022-07-27, 1 file, -6/+4)

  Always set an IBAT covering up to _einittext during init because when CONFIG_MODULES is not selected there is no reason to have an exception handler for kernel instruction TLB misses.

  It implies DBAT and IBAT are now totally independent, IBATs are set by setibat() and DBAT by setbat().

  This allows to revert commit 9bb162fa26ed ("powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE")

  Reported-by: Maxime Bizon <mbizon@freebox.fr>
  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/ce7f04a39593934d9b1ee68c69144ccd3d4da4a1.1655202804.git.christophe.leroy@csgroup.eu

* powerpc/32: Call mmu_mark_initmem_nx() regardless of data block mapping (Christophe Leroy, 2022-07-27, 2 files, -5/+5)

  mark_initmem_nx() calls either mmu_mark_initmem_nx() or set_memory_attr() based on return from v_block_mapped() of _sinittext.

  But we can now handle text and data independently, so that text may be mapped by block even when data is mapped by pages.

  On the 8xx for instance, at startup 32Mbytes of memory are pinned in TLB. So the pinned entries need to go away for sinittext.

  In next patch a BAT will be set to also covers sinittext on book3s/32. So it will also be needed to call mmu_mark_initmem_nx() even when data above sinittext is not mapped with BATs.

  As this is highly dependent on the platform, call mmu_mark_initmem_nx() regardless of data block mapping. Then the platform will know what to do.

  Modify 8xx mmu_mark_initmem_nx() so that inittext mapping is modified only when pagealloc debug and kfence are not active, otherwise inittext is mapped with standard pages. And don't do anything on kernel text which is already mapped with PAGE_KERNEL_TEXT.

  Fixes: da1adea07576 ("powerpc/8xx: Allow STRICT_KERNEL_RwX with pinned TLB")
  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/db3fc14f3bfa6215b0786ef58a6e2bc1e1f964d7.1655202804.git.christophe.leroy@csgroup.eu

* powerpc/64s: POWER10 nest MMU can upgrade PTE access authority without TLB flush (Nicholas Piggin, 2022-07-27, 2 files, -17/+28)

  The nest MMU in POWER9 does not re-fetch the PTE in response to permission mismatch, contrary to the architecture[*] and unlike the core MMU. This requires a TLB flush before upgrading permissions of valid PTEs, for any address space with a coprocessor attached.

  Per (non-public) Nest MMU Workbook, POWER10 nest MMU conforms to the architecture in this regard, so skip the workaround.

  [*] See: Power ISA Version 3.1B, 6.10.1.2 Modifying a Translation Table Entry, Setting a Reference or Change Bit or Upgrading Access Authority (PTE Subject to Atomic Hardware Updates):

    "If the only change being made to a valid PTE that is subject to atomic hardware updates is to set the Reference or Change bit to 1 or to upgrade access authority, a simpler sequence suffices because the translation hardware will refetch the PTE if an access is attempted for which the only problems were reference and/or change bits needing to be set or insufficient access authority."

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20220525022358.780745-3-npiggin@gmail.com