From 071c95c1acbd96e76bab8b25b5cad0d71a011f37 Mon Sep 17 00:00:00 2001
From: Benjamin Gray
Date: Wed, 9 Nov 2022 15:51:05 +1100
Subject: powerpc/code-patching: Use WARN_ON and fix check in poking_init

BUG_ON() when failing to initialise the code patching window is
unnecessary, and use of BUG_ON is discouraged. We don't set
poking_init_done in this case, so failure to init the boot CPU will
result in a strict RWX error when a following patch_instruction uses
raw_patch_instruction.

If it only fails for later CPUs, they won't be onlined in the first
place.

The return value of cpuhp_setup_state() is also >= 0 on success, so
check for < 0.

Signed-off-by: Benjamin Gray
Reviewed-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20221109045112.187069-3-bgray@linux.ibm.com
---
 arch/powerpc/lib/code-patching.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

(limited to 'arch/powerpc/lib/code-patching.c')

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index ad0cf3108dd0..3055eef7dcdc 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -81,16 +81,17 @@ static int text_area_cpu_down(unsigned int cpu)
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(poking_init_done);
 
-/*
- * Although BUG_ON() is rude, in this case it should only happen if ENOMEM, and
- * we judge it as being preferable to a kernel that will crash later when
- * someone tries to use patch_instruction().
- */
 void __init poking_init(void)
 {
-        BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-                "powerpc/text_poke:online", text_area_cpu_up,
-                text_area_cpu_down));
+        int ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+                                    "powerpc/text_poke:online",
+                                    text_area_cpu_up,
+                                    text_area_cpu_down);
+
+        /* cpuhp_setup_state returns >= 0 on success */
+        if (WARN_ON(ret < 0))
+                return;
+
         static_branch_enable(&poking_init_done);
 }

-- cgit v1.2.3

From c28c15b6d28a776538482101522cbcd9f906b15c Mon Sep 17 00:00:00 2001
From: "Christopher M. Riedl"
Date: Wed, 9 Nov 2022 15:51:11 +1100
Subject: powerpc/code-patching: Use temporary mm for Radix MMU

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. Another benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use - this may include breakpoints set by perf.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm using mm_alloc(). Choose a
randomized patching address inside the temporary mm userspace address
space. The patching address is randomized between PAGE_SIZE and
DEFAULT_MAP_WINDOW-PAGE_SIZE.
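A minimal stand-alone sketch of that randomization (illustrative only,
not part of the patch; libc random() stands in for get_random_long(),
with the 64K-page / 128TB example values used below):

  #include <stdio.h>
  #include <stdlib.h>

  #define PAGE_SHIFT          16                   /* 64K pages */
  #define PAGE_SIZE           (1UL << PAGE_SHIFT)
  #define DEFAULT_MAP_WINDOW  (128UL << 40)         /* 128TB */

  int main(void)
  {
          /*
           * Random page index in [1 .. DEFAULT_MAP_WINDOW/PAGE_SIZE - 2],
           * giving a page-aligned address within
           * [PAGE_SIZE .. DEFAULT_MAP_WINDOW - PAGE_SIZE], avoiding the
           * zero page - the same interval the patch uses.
           */
          unsigned long addr = (1 + (random() % (DEFAULT_MAP_WINDOW / PAGE_SIZE - 2))) << PAGE_SHIFT;

          printf("patching address: 0x%lx\n", addr);
          return 0;
  }

With these values there are log2(DEFAULT_MAP_WINDOW / PAGE_SIZE) = 31
page-sized slots to choose from, which matches the entropy calculation
worked out next.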
Bits of entropy with 64K page size on BOOK3S_64:

        bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

        PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB

        bits of entropy = log2(128TB / 64K)
        bits of entropy = 31

The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU
operates - by default the space above DEFAULT_MAP_WINDOW is not
available. Currently the Hash MMU does not use a temporary mm so
technically this upper limit isn't necessary; however, a larger
randomization range does not further "harden" this overall approach and
future work may introduce patching with a temporary mm on Hash as well.

Randomization occurs only once during initialization for each CPU as it
comes online.

The patching page is mapped with PAGE_KERNEL to set EAA[0] for the PTE
which ignores the AMR (so no need to unlock/lock KUAP) according to
PowerISA v3.0b Figure 35 on Radix.

Based on x86 implementation:

  commit 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")

and:

  commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking")

From: Benjamin Gray

Synchronisation is done according to ISA 3.1B Book 3 Chapter 13
"Synchronization Requirements for Context Alterations". Switching the mm
is a change to the PID, which requires a CSI before and after the
change, and a hwsync between the change and the last instruction that
performs address translation for an associated storage access.

Instruction fetch is an associated storage access, but the instruction
address mappings are not being changed, so it should not matter which
context they use. We must still perform a hwsync to guard arbitrary
prior code that may have accessed a userspace address.

TLB invalidation is local and VA specific. Local because only this core
used the patching mm, and VA specific because we only care that the
writable mapping is purged. Leaving the other mappings intact is more
efficient, especially when performing many code patches in a row (e.g.,
as ftrace would).

Signed-off-by: Christopher M. Riedl
Signed-off-by: Benjamin Gray
[mpe: Use mm_alloc() per 107b6828a7cd ("x86/mm: Use mm_alloc() in poking_init()")]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20221109045112.187069-9-bgray@linux.ibm.com
---
 arch/powerpc/lib/code-patching.c | 177 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 172 insertions(+), 5 deletions(-)

(limited to 'arch/powerpc/lib/code-patching.c')

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 3055eef7dcdc..a1902241ff5d 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -4,12 +4,17 @@
  */
 
 #include
+#include
+#include
 #include
 #include
 #include
 #include
 #include
 
+#include
+#include
+#include
 #include
 #include
 #include
@@ -42,11 +47,54 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
+static DEFINE_PER_CPU(struct mm_struct *, cpu_patching_mm);
+static DEFINE_PER_CPU(unsigned long, cpu_patching_addr);
+static DEFINE_PER_CPU(pte_t *, cpu_patching_pte);
 
 static int map_patch_area(void *addr, unsigned long text_poke_addr);
 static void unmap_patch_area(unsigned long addr);
 
+static bool mm_patch_enabled(void)
+{
+        return IS_ENABLED(CONFIG_SMP) && radix_enabled();
+}
+
+/*
+ * The following applies for Radix MMU. Hash MMU has different requirements,
+ * and so is not supported.
+ *
+ * Changing mm requires context synchronising instructions on both sides of
+ * the context switch, as well as a hwsync between the last instruction for
+ * which the address of an associated storage access was translated using
+ * the current context.
+ *
+ * switch_mm_irqs_off() performs an isync after the context switch. It is
+ * the responsibility of the caller to perform the CSI and hwsync before
+ * starting/stopping the temp mm.
+ */
+static struct mm_struct *start_using_temp_mm(struct mm_struct *temp_mm)
+{
+        struct mm_struct *orig_mm = current->active_mm;
+
+        lockdep_assert_irqs_disabled();
+        switch_mm_irqs_off(orig_mm, temp_mm, current);
+
+        WARN_ON(!mm_is_thread_local(temp_mm));
+
+        suspend_breakpoints();
+        return orig_mm;
+}
+
+static void stop_using_temp_mm(struct mm_struct *temp_mm,
+                               struct mm_struct *orig_mm)
+{
+        lockdep_assert_irqs_disabled();
+        switch_mm_irqs_off(temp_mm, orig_mm, current);
+        restore_breakpoints();
+}
+
 static int text_area_cpu_up(unsigned int cpu)
 {
         struct vm_struct *area;
@@ -79,14 +127,86 @@ static int text_area_cpu_down(unsigned int cpu)
         return 0;
 }
 
+static void put_patching_mm(struct mm_struct *mm, unsigned long patching_addr)
+{
+        struct mmu_gather tlb;
+
+        tlb_gather_mmu(&tlb, mm);
+        free_pgd_range(&tlb, patching_addr, patching_addr + PAGE_SIZE, 0, 0);
+        mmput(mm);
+}
+
+static int text_area_cpu_up_mm(unsigned int cpu)
+{
+        struct mm_struct *mm;
+        unsigned long addr;
+        pte_t *pte;
+        spinlock_t *ptl;
+
+        mm = mm_alloc();
+        if (WARN_ON(!mm))
+                goto fail_no_mm;
+
+        /*
+         * Choose a random page-aligned address from the interval
+         * [PAGE_SIZE .. DEFAULT_MAP_WINDOW - PAGE_SIZE].
+         * The lower address bound is PAGE_SIZE to avoid the zero-page.
+         */
+        addr = (1 + (get_random_long() % (DEFAULT_MAP_WINDOW / PAGE_SIZE - 2))) << PAGE_SHIFT;
+
+        /*
+         * PTE allocation uses GFP_KERNEL which means we need to
+         * pre-allocate the PTE here because we cannot do the
+         * allocation during patching when IRQs are disabled.
+         *
+         * Using get_locked_pte() to avoid open coding, the lock
+         * is unnecessary.
+         */
+        pte = get_locked_pte(mm, addr, &ptl);
+        if (!pte)
+                goto fail_no_pte;
+        pte_unmap_unlock(pte, ptl);
+
+        this_cpu_write(cpu_patching_mm, mm);
+        this_cpu_write(cpu_patching_addr, addr);
+        this_cpu_write(cpu_patching_pte, pte);
+
+        return 0;
+
+fail_no_pte:
+        put_patching_mm(mm, addr);
+fail_no_mm:
+        return -ENOMEM;
+}
+
+static int text_area_cpu_down_mm(unsigned int cpu)
+{
+        put_patching_mm(this_cpu_read(cpu_patching_mm),
+                        this_cpu_read(cpu_patching_addr));
+
+        this_cpu_write(cpu_patching_mm, NULL);
+        this_cpu_write(cpu_patching_addr, 0);
+        this_cpu_write(cpu_patching_pte, NULL);
+
+        return 0;
+}
+
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(poking_init_done);
 
 void __init poking_init(void)
 {
-        int ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-                                    "powerpc/text_poke:online",
-                                    text_area_cpu_up,
-                                    text_area_cpu_down);
+        int ret;
+
+        if (mm_patch_enabled())
+                ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+                                        "powerpc/text_poke_mm:online",
+                                        text_area_cpu_up_mm,
+                                        text_area_cpu_down_mm);
+        else
+                ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+                                        "powerpc/text_poke:online",
+                                        text_area_cpu_up,
+                                        text_area_cpu_down);
 
         /* cpuhp_setup_state returns >= 0 on success */
         if (WARN_ON(ret < 0))
@@ -148,6 +268,50 @@ static void unmap_patch_area(unsigned long addr)
         flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
+static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+{
+        int err;
+        u32 *patch_addr;
+        unsigned long text_poke_addr;
+        pte_t *pte;
+        unsigned long pfn = get_patch_pfn(addr);
+        struct mm_struct *patching_mm;
+        struct mm_struct *orig_mm;
+
+        patching_mm = __this_cpu_read(cpu_patching_mm);
+        pte = __this_cpu_read(cpu_patching_pte);
+        text_poke_addr = __this_cpu_read(cpu_patching_addr);
+        patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+
+        __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
+
+        /* order PTE update before use, also serves as the hwsync */
+        asm volatile("ptesync": : :"memory");
+
+        /* order context switch after arbitrary prior code */
+        isync();
+
+        orig_mm = start_using_temp_mm(patching_mm);
+
+        err = __patch_instruction(addr, instr, patch_addr);
+
+        /* hwsync performed by __patch_instruction (sync) if successful */
+        if (err)
+                mb(); /* sync */
+
+        /* context synchronisation performed by __patch_instruction (isync or exception) */
+        stop_using_temp_mm(patching_mm, orig_mm);
+
+        pte_clear(patching_mm, text_poke_addr, pte);
+        /*
+         * ptesync to order PTE update before TLB invalidation done
+         * by radix__local_flush_tlb_page_psize (in _tlbiel_va)
+         */
+        local_flush_tlb_page_psize(patching_mm, text_poke_addr, mmu_virtual_psize);
+
+        return err;
+}
+
 static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
         int err;
@@ -187,7 +351,10 @@ static int do_patch_instruction(u32 *addr, ppc_inst_t instr)
                 return raw_patch_instruction(addr, instr);
 
         local_irq_save(flags);
-        err = __do_patch_instruction(addr, instr);
+        if (mm_patch_enabled())
+                err = __do_patch_instruction_mm(addr, instr);
+        else
+                err = __do_patch_instruction(addr, instr);
         local_irq_restore(flags);
 
         return err;

-- cgit v1.2.3

From 2f228ee1ade5d8d1f26cf94863a36c5693023c58 Mon Sep 17 00:00:00 2001
From: Benjamin Gray
Date: Wed, 9 Nov 2022 15:51:12 +1100
Subject: powerpc/code-patching: Consolidate and cache per-cpu patching context

With the temp mm context support, there are CPU local variables to hold
the patch address and pte. Use these in the non-temp mm path as well
instead of adding a level of indirection through the text_poke_area
vm_struct and pointer chasing the pte.
As both paths use these fields now, there is no need to let unreferenced
variables be dropped by the compiler, so it is cleaner to merge them
into a single context struct. This has the additional benefit of
removing a redundant CPU local pointer, as only one of cpu_patching_mm /
text_poke_area is ever used, while remaining well-typed. It also groups
each CPU's data into a single cacheline.

Signed-off-by: Benjamin Gray
[mpe: Shorten name to 'area' as suggested by Christophe]
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20221109045112.187069-10-bgray@linux.ibm.com
---
 arch/powerpc/lib/code-patching.c | 49 ++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 19 deletions(-)

(limited to 'arch/powerpc/lib/code-patching.c')

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index a1902241ff5d..5b8f87db1217 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -48,10 +48,16 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
-static DEFINE_PER_CPU(struct mm_struct *, cpu_patching_mm);
-static DEFINE_PER_CPU(unsigned long, cpu_patching_addr);
-static DEFINE_PER_CPU(pte_t *, cpu_patching_pte);
+struct patch_context {
+        union {
+                struct vm_struct *area;
+                struct mm_struct *mm;
+        };
+        unsigned long addr;
+        pte_t *pte;
+};
+
+static DEFINE_PER_CPU(struct patch_context, cpu_patching_context);
 
 static int map_patch_area(void *addr, unsigned long text_poke_addr);
 static void unmap_patch_area(unsigned long addr);
@@ -116,14 +122,19 @@ static int text_area_cpu_up(unsigned int cpu)
 
         unmap_patch_area(addr);
 
-        this_cpu_write(text_poke_area, area);
+        this_cpu_write(cpu_patching_context.area, area);
+        this_cpu_write(cpu_patching_context.addr, addr);
+        this_cpu_write(cpu_patching_context.pte, virt_to_kpte(addr));
 
         return 0;
 }
 
 static int text_area_cpu_down(unsigned int cpu)
 {
-        free_vm_area(this_cpu_read(text_poke_area));
+        free_vm_area(this_cpu_read(cpu_patching_context.area));
+        this_cpu_write(cpu_patching_context.area, NULL);
+        this_cpu_write(cpu_patching_context.addr, 0);
+        this_cpu_write(cpu_patching_context.pte, NULL);
         return 0;
 }
 
@@ -167,9 +178,9 @@ static int text_area_cpu_up_mm(unsigned int cpu)
                 goto fail_no_pte;
         pte_unmap_unlock(pte, ptl);
 
-        this_cpu_write(cpu_patching_mm, mm);
-        this_cpu_write(cpu_patching_addr, addr);
-        this_cpu_write(cpu_patching_pte, pte);
+        this_cpu_write(cpu_patching_context.mm, mm);
+        this_cpu_write(cpu_patching_context.addr, addr);
+        this_cpu_write(cpu_patching_context.pte, pte);
 
         return 0;
 
@@ -181,12 +192,12 @@ fail_no_mm:
 
 static int text_area_cpu_down_mm(unsigned int cpu)
 {
-        put_patching_mm(this_cpu_read(cpu_patching_mm),
-                        this_cpu_read(cpu_patching_addr));
+        put_patching_mm(this_cpu_read(cpu_patching_context.mm),
+                        this_cpu_read(cpu_patching_context.addr));
 
-        this_cpu_write(cpu_patching_mm, NULL);
-        this_cpu_write(cpu_patching_addr, 0);
-        this_cpu_write(cpu_patching_pte, NULL);
+        this_cpu_write(cpu_patching_context.mm, NULL);
+        this_cpu_write(cpu_patching_context.addr, 0);
+        this_cpu_write(cpu_patching_context.pte, NULL);
 
         return 0;
 }
@@ -278,9 +289,9 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
         struct mm_struct *patching_mm;
         struct mm_struct *orig_mm;
 
-        patching_mm = __this_cpu_read(cpu_patching_mm);
-        pte = __this_cpu_read(cpu_patching_pte);
-        text_poke_addr = __this_cpu_read(cpu_patching_addr);
+        patching_mm = __this_cpu_read(cpu_patching_context.mm);
+        pte = __this_cpu_read(cpu_patching_context.pte);
+        text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
         patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
 
         __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
@@ -320,10 +331,10 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
         pte_t *pte;
         unsigned long pfn = get_patch_pfn(addr);
 
-        text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK;
+        text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
         patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
 
-        pte = virt_to_kpte(text_poke_addr);
+        pte = __this_cpu_read(cpu_patching_context.pte);
         __set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
         /* See ptesync comment in radix__set_pte_at() */
         if (radix_enabled())

-- cgit v1.2.3

From 84ecfe6f38ae4ee779ebd97ee173937fff565bf9 Mon Sep 17 00:00:00 2001
From: Christophe Leroy
Date: Fri, 2 Dec 2022 09:31:39 +0100
Subject: powerpc/code-patching: Remove #ifdef CONFIG_STRICT_KERNEL_RWX

No need to have one implementation of patch_instruction() for
CONFIG_STRICT_KERNEL_RWX and one for !CONFIG_STRICT_KERNEL_RWX.

In patch_instruction(), call raw_patch_instruction() when
!CONFIG_STRICT_KERNEL_RWX.

In poking_init(), bail out immediately; it is then equivalent to the
weak default implementation.

Everything else is declared static and will be discarded by GCC when
!CONFIG_STRICT_KERNEL_RWX.

Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/f67d2a109404d03e8fdf1ea15388c8778337a76b.1669969781.git.christophe.leroy@csgroup.eu
---
 arch/powerpc/lib/code-patching.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

(limited to 'arch/powerpc/lib/code-patching.c')

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 5b8f87db1217..a6a5047f8ba2 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -46,8 +46,6 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
         return __patch_instruction(addr, instr, addr);
 }
 
-#ifdef CONFIG_STRICT_KERNEL_RWX
-
 struct patch_context {
         union {
                 struct vm_struct *area;
@@ -208,6 +206,9 @@ void __init poking_init(void)
 {
         int ret;
 
+        if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+                return;
+
         if (mm_patch_enabled())
                 ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                         "powerpc/text_poke_mm:online",
@@ -358,7 +359,8 @@ static int do_patch_instruction(u32 *addr, ppc_inst_t instr)
          * when text_poke_area is not ready, but we still need
          * to allow patching. We just do the plain old patching
          */
-        if (!static_branch_likely(&poking_init_done))
+        if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) ||
+            !static_branch_likely(&poking_init_done))
                 return raw_patch_instruction(addr, instr);
 
         local_irq_save(flags);
@@ -370,14 +372,6 @@ static int do_patch_instruction(u32 *addr, ppc_inst_t instr)
 
         return err;
 }
-#else /* !CONFIG_STRICT_KERNEL_RWX */
-
-static int do_patch_instruction(u32 *addr, ppc_inst_t instr)
-{
-        return raw_patch_instruction(addr, instr);
-}
-
-#endif /* CONFIG_STRICT_KERNEL_RWX */
 
 __ro_after_init DEFINE_STATIC_KEY_FALSE(init_mem_is_free);

-- cgit v1.2.3

From 6f3a81b60091031c2c14eb2373d1937b027deb46 Mon Sep 17 00:00:00 2001
From: Christophe Leroy
Date: Fri, 2 Dec 2022 09:31:43 +0100
Subject: powerpc/code-patching: Remove protection against patching init addresses after init

Once the init section is freed, attempting to patch init code ends up
in the weeds.
Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections")
protected patch_instruction() against that, but it is the responsibility
of the caller to ensure that the patched memory is valid.

All callers have now been verified and fixed so the check can be
removed.

This improves ftrace activation by about 2% on 8xx.

Signed-off-by: Christophe Leroy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/504310828f473d424e2ed229eff57bf075f52796.1669969781.git.christophe.leroy@csgroup.eu
---
 arch/powerpc/include/asm/code-patching.h |  2 --
 arch/powerpc/lib/code-patching.c         | 13 +------------
 arch/powerpc/mm/mem.c                    |  1 -
 3 files changed, 1 insertion(+), 15 deletions(-)

(limited to 'arch/powerpc/lib/code-patching.c')

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 1c6316ec4b74..3f881548fb61 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,8 +22,6 @@
 #define BRANCH_SET_LINK 0x1
 #define BRANCH_ABSOLUTE 0x2
 
-DECLARE_STATIC_KEY_FALSE(init_mem_is_free);
-
 /*
  * Powerpc branch instruction is :
  *
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index a6a5047f8ba2..73ce4b90bb1b 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -349,7 +349,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
         return err;
 }
 
-static int do_patch_instruction(u32 *addr, ppc_inst_t instr)
+int patch_instruction(u32 *addr, ppc_inst_t instr)
 {
         int err;
         unsigned long flags;
@@ -372,17 +372,6 @@ static int do_patch_instruction(u32 *addr, ppc_inst_t instr)
 
         return err;
 }
-
-__ro_after_init DEFINE_STATIC_KEY_FALSE(init_mem_is_free);
-
-int patch_instruction(u32 *addr, ppc_inst_t instr)
-{
-        /* Make sure we aren't patching a freed init section */
-        if (static_branch_likely(&init_mem_is_free) && init_section_contains(addr, 4))
-                return 0;
-
-        return do_patch_instruction(addr, instr);
-}
 NOKPROBE_SYMBOL(patch_instruction);
 
 int patch_branch(u32 *addr, unsigned long target, int flags)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 84d171953ba4..8b121df7b08f 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -344,7 +344,6 @@ void free_initmem(void)
 {
         ppc_md.progress = ppc_printk_progress;
         mark_initmem_nx();
-        static_branch_enable(&init_mem_is_free);
         free_initmem_default(POISON_FREE_INITMEM);
         ftrace_free_init_tramp();
 }

-- cgit v1.2.3

From 980411a4d1bb925d28cd9e8d8301dc982ece788d Mon Sep 17 00:00:00 2001
From: Michael Ellerman
Date: Fri, 16 Dec 2022 12:43:12 +1100
Subject: powerpc/code-patching: Fix oops with DEBUG_VM enabled

Nathan reported that the new per-cpu mm patching oopses if DEBUG_VM is
enabled:

  ------------[ cut here ]------------
  kernel BUG at arch/powerpc/mm/pgtable.c:333!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc2+ #1
  Hardware name: IBM PowerNV (emulated by qemu) POWER9 0x4e1200 opal:v7.0 PowerNV
  ...
  NIP assert_pte_locked+0x180/0x1a0
  LR  assert_pte_locked+0x170/0x1a0
  Call Trace:
    0x60000000 (unreliable)
    patch_instruction+0x618/0x6d0
    arch_prepare_kprobe+0xfc/0x2d0
    register_kprobe+0x520/0x7c0
    arch_init_kprobes+0x28/0x3c
    init_kprobes+0x108/0x184
    do_one_initcall+0x60/0x2e0
    kernel_init_freeable+0x1f0/0x3e0
    kernel_init+0x34/0x1d0
    ret_from_kernel_thread+0x5c/0x64

It's caused by the assert_spin_locked() failing in assert_pte_locked().
The assert fails because the PTE was unlocked in text_area_cpu_up_mm(),
and never relocked.

The PTE page shouldn't be freed: the patching_mm is only used for
patching on this CPU, only that single PTE is ever mapped, and it's only
unmapped at CPU offline.

In fact assert_pte_locked() has a special case to ignore init_mm
entirely, and the patching_mm is more-or-less like init_mm, so possibly
the check could be skipped for patching_mm too.

But for now be conservative, and use the proper PTE accessors at
patching time, so that the PTE lock is held while the PTE is used. That
also avoids the warning in assert_pte_locked().

With that it's no longer necessary to save the PTE in
cpu_patching_context for the mm_patch_enabled() case.

Fixes: c28c15b6d28a ("powerpc/code-patching: Use temporary mm for Radix MMU")
Reported-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20221216125913.990972-1-mpe@ellerman.id.au
---
 arch/powerpc/lib/code-patching.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

(limited to 'arch/powerpc/lib/code-patching.c')

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 73ce4b90bb1b..b00112d7ad46 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -178,7 +178,6 @@ static int text_area_cpu_up_mm(unsigned int cpu)
 
         this_cpu_write(cpu_patching_context.mm, mm);
         this_cpu_write(cpu_patching_context.addr, addr);
-        this_cpu_write(cpu_patching_context.pte, pte);
 
         return 0;
 
@@ -195,7 +194,6 @@ static int text_area_cpu_down_mm(unsigned int cpu)
 
         this_cpu_write(cpu_patching_context.mm, NULL);
         this_cpu_write(cpu_patching_context.addr, 0);
-        this_cpu_write(cpu_patching_context.pte, NULL);
 
         return 0;
 }
@@ -289,12 +287,16 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
         unsigned long pfn = get_patch_pfn(addr);
         struct mm_struct *patching_mm;
         struct mm_struct *orig_mm;
+        spinlock_t *ptl;
 
         patching_mm = __this_cpu_read(cpu_patching_context.mm);
-        pte = __this_cpu_read(cpu_patching_context.pte);
         text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
         patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
 
+        pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
+        if (!pte)
+                return -ENOMEM;
+
         __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
 
         /* order PTE update before use, also serves as the hwsync */
@@ -321,6 +323,8 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
          */
         local_flush_tlb_page_psize(patching_mm, text_poke_addr, mmu_virtual_psize);
 
+        pte_unmap_unlock(pte, ptl);
+
         return err;
 }

-- cgit v1.2.3
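A minimal caller-side sketch of the API this series leaves in place
(hypothetical example, not taken from the patches): patch_instruction()
is now the exported entry point, and the caller is responsible for
making sure the target address is still valid - e.g. not in a freed
init section - per the commits above.

  #include <asm/code-patching.h>
  #include <asm/ppc-opcode.h>

  /* Replace one kernel text word with a nop; returns 0 on success. */
  static int example_patch_nop(u32 *addr)
  {
          return patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
  }

Under CONFIG_STRICT_KERNEL_RWX this goes through the per-cpu text_poke
area (or the temporary mm on Radix with SMP); otherwise it falls back
to raw_patch_instruction().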