path: root/arch
Commit message | Author | Age | Files | Lines
* ARM: 8875/1: Kconfig: default to AEABI w/ Clang | Nick Desaulniers | 2019-10-07 | 1 | -2/+3

[ Upstream commit a05b9608456e0d4464c6f7ca8572324ace57a3f4 ]

Clang produces references to __aeabi_uidivmod and __aeabi_idivmod for arm-linux-gnueabi and arm-linux-gnueabihf targets incorrectly when AEABI is not selected (such as when OABI_COMPAT is selected).

While this means that OABI userspaces won't be able to upgrade to kernels built with Clang, it means that boards that don't enable AEABI, like s3c2410_defconfig, will stop failing to link in KernelCI when built with Clang.

Link: https://github.com/ClangBuiltLinux/linux/issues/482
Link: https://groups.google.com/forum/#!msg/clang-built-linux/yydsAAux5hk/GxjqJSW-AQAJ
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: 8898/1: mm: Don't treat faults reported from cache maintenance as writes | Will Deacon | 2019-10-07 | 2 | -2/+3

[ Upstream commit 834020366da9ab3fb87d1eb9a3160eb22dbed63a ]

Translation faults arising from cache maintenance instructions are rather unhelpfully reported with an FSR value where the WnR field is set to 1, indicating that the faulting access was a write. Since cache maintenance instructions on 32-bit ARM do not require any particular permissions, this can cause our private 'cacheflush' system call to fail spuriously if a translation fault is generated due to page aging when targeting a read-only VMA.

In this situation, we will return -EFAULT to userspace, although this is unfortunately suppressed by the popular '__builtin___clear_cache()' intrinsic provided by GCC, which returns void.

Although it's tempting to write this off as a userspace issue, we can actually do a little bit better on CPUs that support LPAE, even if the short-descriptor format is in use. On these CPUs, cache maintenance faults additionally set the CM field in the FSR, which we can use to suppress the write permission checks in the page fault handler and succeed in performing cache maintenance to read-only areas even in the presence of a translation fault.

Reported-by: Orion Hodson <oth@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
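The decision described above can be sketched as a small predicate over the FSR value. This is a userspace illustration, not the kernel's fault handler: the macro and function names are hypothetical, and we assume the ARMv7 short-descriptor layout in which WnR is bit 11 and CM is bit 13.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed short-descriptor FSR bit positions; names are hypothetical. */
#define FSR_WNR_SKETCH (UINT32_C(1) << 11) /* "write, not read" */
#define FSR_CM_SKETCH  (UINT32_C(1) << 13) /* fault came from cache maintenance */

/*
 * Treat a fault as a write only if WnR is set AND it was not caused by a
 * cache maintenance instruction. Cache maintenance reports WnR = 1 but needs
 * no write permission, so CM suppresses the write classification.
 */
static bool fault_is_write_sketch(uint32_t fsr)
{
	return (fsr & FSR_WNR_SKETCH) && !(fsr & FSR_CM_SKETCH);
}
```

On CPUs that never set CM, the predicate degrades to the old behaviour of trusting WnR alone.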
* mips/atomic: Fix smp_mb__{before,after}_atomic() | Peter Zijlstra | 2019-10-07 | 4 | -29/+45

[ Upstream commit 42344113ba7a1ed7b5654cd5270af0d5698d8521 ]

Recent probing at the Linux Kernel Memory Model uncovered a 'surprise'. Strongly ordered architectures where the atomic RmW primitive implies full memory ordering and smp_mb__{before,after}_atomic() are a simple barrier() (such as MIPS without WEAK_REORDERING_BEYOND_LLSC) fail for:

    *x = 1;
    atomic_inc(u);
    smp_mb__after_atomic();
    r0 = *y;

Because, while the atomic_inc() implies memory order, it (surprisingly) does not provide a compiler barrier. This then allows the compiler to re-order like so:

    atomic_inc(u);
    *x = 1;
    smp_mb__after_atomic();
    r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

    atomic_inc(u);
    r0 = *y;
    *x = 1;

And this very much was not intended. Therefore strengthen the atomic RmW ops to include a compiler barrier.

Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
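A minimal userspace sketch of the idea, assuming GCC or Clang inline-asm syntax: an empty asm statement with a "memory" clobber is a pure compiler barrier. The names are hypothetical, and the relaxed __atomic_fetch_add() builtin only supplies the arithmetic; it does not model the MIPS LL/SC sequence or its hardware ordering.

```c
#include <assert.h>

/* Compiler barrier: emits no instruction, but the "memory" clobber forbids
 * the compiler from moving loads/stores across it. */
#define barrier_sketch() __asm__ __volatile__("" ::: "memory")

static int x_sketch, y_sketch, u_sketch;

/* Hypothetical atomic increment that also acts as a compiler barrier, which
 * is the property the commit says the MIPS atomic RmW ops were missing. */
static inline void atomic_inc_sketch(int *v)
{
	barrier_sketch();
	__atomic_fetch_add(v, 1, __ATOMIC_RELAXED); /* arithmetic only */
	barrier_sketch();
}

/* The litmus shape from the commit message: with the barriers in place the
 * compiler may no longer sink the store to x below the atomic. */
static int litmus_sketch(void)
{
	x_sketch = 1;
	atomic_inc_sketch(&u_sketch);
	barrier_sketch(); /* smp_mb__after_atomic() is just barrier() here */
	return y_sketch;
}
```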
* mips/atomic: Fix loongson_llsc_mb() wreckage | Peter Zijlstra | 2019-10-07 | 5 | -16/+32

[ Upstream commit 1c6c1ca318585f1096d4d04bc722297c85e9fb8a ]

The comment describing the loongson_llsc_mb() reorder case doesn't make any sense whatsoever. Instruction re-ordering is not an SMP artifact, but rather a CPU local phenomenon. Clarify the comment by explaining that these issues cause a coherence failure.

For the branch speculation case; if futex_atomic_cmpxchg_inatomic() needs one at the bne branch target, then surely the normal __cmpxchg_asm() implementation does too. We cannot rely on the barriers from cmpxchg() because cmpxchg_local() is implemented with the same macro, and branch prediction and speculation are, too, CPU local.

Fixes: e02e07e3127d ("MIPS: Loongson: Introduce and use loongson_llsc_mb()")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* MIPS: tlbex: Explicitly cast _PAGE_NO_EXEC to a boolean | Nathan Chancellor | 2019-10-07 | 1 | -1/+1

[ Upstream commit c59ae0a1055127dd3828a88e111a0db59b254104 ]

clang warns:

    arch/mips/mm/tlbex.c:634:19: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
            if (cpu_has_rixi && _PAGE_NO_EXEC) {
                             ^  ~~~~~~~~~~~~~
    arch/mips/mm/tlbex.c:634:19: note: use '&' for a bitwise operation
            if (cpu_has_rixi && _PAGE_NO_EXEC) {
                             ^~
                             &
    arch/mips/mm/tlbex.c:634:19: note: remove constant to silence this warning
            if (cpu_has_rixi && _PAGE_NO_EXEC) {
                            ~^~~~~~~~~~~~~~~
    1 error generated.

Explicitly cast this value to a boolean so that clang understands we intend for this to be a non-zero value.

Fixes: 00bf1c691d08 ("MIPS: tlbex: Avoid placing software PTE bits in Entry* PFN fields")
Link: https://github.com/ClangBuiltLinux/linux/issues/609
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
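The pattern can be reproduced outside the kernel. In this sketch the macro value and function name are made up, and we assume the cast is written as a double negation (!!), which is one common way to turn a mask into 0 or 1 without changing the truth value of the test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PTE bit mask such as _PAGE_NO_EXEC. */
#define PAGE_NO_EXEC_SKETCH (1u << 3)

static bool use_rixi_sketch(bool cpu_has_rixi)
{
	/*
	 * "cpu_has_rixi && PAGE_NO_EXEC_SKETCH" is valid C, but clang flags a
	 * constant operand of && whose value is not 0 or 1. "!!" collapses the
	 * mask to 1 (or 0 if the mask were empty), stating the intent: "is
	 * this bit non-zero?", not a mistyped bitwise '&'.
	 */
	return cpu_has_rixi && !!PAGE_NO_EXEC_SKETCH;
}
```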
* MIPS: Don't use bc_false uninitialized in __mm_isBranchInstr | Nathan Chancellor | 2019-10-07 | 1 | -1/+1

[ Upstream commit c2869aafe7191d366d74c55cb8a93c6d0baba317 ]

clang warns:

    arch/mips/kernel/branch.c:148:8: error: variable 'bc_false' is used uninitialized whenever switch case is taken [-Werror,-Wsometimes-uninitialized]
                    case mm_bc2t_op:
                         ^~~~~~~~~~
    arch/mips/kernel/branch.c:157:8: note: uninitialized use occurs here
                            if (bc_false)
                                ^~~~~~~~
    arch/mips/kernel/branch.c:149:8: error: variable 'bc_false' is used uninitialized whenever switch case is taken [-Werror,-Wsometimes-uninitialized]
                    case mm_bc1t_op:
                         ^~~~~~~~~~
    arch/mips/kernel/branch.c:157:8: note: uninitialized use occurs here
                            if (bc_false)
                                ^~~~~~~~
    arch/mips/kernel/branch.c:142:4: note: variable 'bc_false' is declared here
                            int bc_false = 0;
                            ^
    2 errors generated.

When mm_bc1t_op and mm_bc2t_op are taken, the bc_false initialization does not happen, which leads to a garbage value upon use, as illustrated below with a small sample program.
    $ mipsel-linux-gnu-gcc --version | head -n1
    mipsel-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0

    $ clang --version | head -n1
    ClangBuiltLinux clang version 9.0.0 (git://github.com/llvm/llvm-project 544315b4197034a3be8acd12cba56a75fb1f08dc) (based on LLVM 9.0.0svn)

    $ cat test.c
    #include <stdio.h>

    static void switch_scoped(int opcode)
    {
            switch (opcode) {
            case 1:
            case 2: {
                    int bc_false = 0;

                    bc_false = 4;
            case 3:
            case 4:
                    printf("\t* switch scoped bc_false = %d\n", bc_false);
            }
            }
    }

    static void function_scoped(int opcode)
    {
            int bc_false = 0;

            switch (opcode) {
            case 1:
            case 2: {
                    bc_false = 4;
            case 3:
            case 4:
                    printf("\t* function scoped bc_false = %d\n", bc_false);
            }
            }
    }

    int main(void)
    {
            int opcode;

            for (opcode = 1; opcode < 5; opcode++) {
                    printf("opcode = %d:\n", opcode);
                    switch_scoped(opcode);
                    function_scoped(opcode);
                    printf("\n");
            }

            return 0;
    }

    $ mipsel-linux-gnu-gcc -std=gnu89 -static test.c && \
      qemu-mipsel a.out
    opcode = 1:
            * switch scoped bc_false = 4
            * function scoped bc_false = 4

    opcode = 2:
            * switch scoped bc_false = 4
            * function scoped bc_false = 4

    opcode = 3:
            * switch scoped bc_false = 2147483004
            * function scoped bc_false = 0

    opcode = 4:
            * switch scoped bc_false = 2147483004
            * function scoped bc_false = 0

    $ clang -std=gnu89 --target=mipsel-linux-gnu -m32 -static test.c && \
      qemu-mipsel a.out
    opcode = 1:
            * switch scoped bc_false = 4
            * function scoped bc_false = 4

    opcode = 2:
            * switch scoped bc_false = 4
            * function scoped bc_false = 4

    opcode = 3:
            * switch scoped bc_false = 2147483004
            * function scoped bc_false = 0

    opcode = 4:
            * switch scoped bc_false = 2147483004
            * function scoped bc_false = 0

Move the definition up so that we get the right behavior and mark it __maybe_unused as it will not be used when CONFIG_MIPS_FP_SUPPORT isn't enabled.
Fixes: 6a1cc218b9cc ("MIPS: branch: Remove FP branch handling when CONFIG_MIPS_FP_SUPPORT=n")
Link: https://github.com/ClangBuiltLinux/linux/issues/603
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
* MIPS: Ingenic: Disable broken BTB lookup optimization. | Zhou Yanjie | 2019-10-07 | 2 | -0/+11

[ Upstream commit 053951dda71ecb4b554a2cdbe26f5f6f9bee9dd2 ]

In order to further reduce power consumption, the XBurst core by default attempts to avoid branch target buffer lookups by detecting and special-casing loops. This feature causes BogoMIPS and lpj to be calculated incorrectly. Set cp0 config7 bit 4 to disable this feature.

Signed-off-by: Zhou Yanjie <zhouyanjie@zoho.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: ralf@linux-mips.org
Cc: paul@crapouillou.net
Cc: jhogan@kernel.org
Cc: malat@debian.org
Cc: gregkh@linuxfoundation.org
Cc: tglx@linutronix.de
Cc: allison@lohutok.net
Cc: syq@debian.org
Cc: chenhc@lemote.com
Cc: jiaxun.yang@flygoat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
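As a worked illustration of the bit the commit sets, here is a sketch that treats Config7 as a plain 32-bit value. The macro and function names are hypothetical; real code would go through the kernel's CP0 accessors rather than pass an integer around.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 4 of CP0 Config7, per the commit message; the name is hypothetical. */
#define CONF7_BTB_LOOP_BIT_SKETCH (UINT32_C(1) << 4)

/* Setting (not clearing) the bit disables the loop-detection optimization.
 * All other Config7 bits are preserved. */
static uint32_t disable_btb_loop_opt_sketch(uint32_t config7)
{
	return config7 | CONF7_BTB_LOOP_BIT_SKETCH;
}
```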
* powerpc: dump kernel log before carrying out fadump or kdump | Ganesh Goudar | 2019-10-07 | 1 | -0/+1

[ Upstream commit e7ca44ed3ba77fc26cf32650bb71584896662474 ]

Since commit 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path"), the pstore dmesg file is not updated when a dump is triggered from the HMC. This commit modified the system reset (sreset) handler to invoke fadump or kdump (if configured), without pushing dmesg to pstore. This leaves pstore with stale dmesg data, which is of little help if kdump fails to capture the dump. This patch fixes that by calling kmsg_dump() before heading to fadump or kdump.

Fixes: 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path")
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64: fix unreachable code issue with cmpxchg | Arnd Bergmann | 2019-10-07 | 1 | -3/+3

[ Upstream commit 920fdab7b3ce98c14c840261e364f490f3679a62 ]

On arm64 build with clang, sometimes the __cmpxchg_mb is not inlined when CONFIG_OPTIMIZE_INLINING is set. Clang then fails a compile-time assertion, because it cannot tell at compile time what the size of the argument is:

    mm/memcontrol.o: In function `__cmpxchg_mb':
    memcontrol.c:(.text+0x1a4c): undefined reference to `__compiletime_assert_175'
    memcontrol.c:(.text+0x1a4c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `__compiletime_assert_175'

Mark all of the cmpxchg() style functions as __always_inline to ensure that the compiler can see the result.

Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/648
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Tested-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
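A userspace sketch of why forced inlining matters: the size-dispatching switch only folds to a single case when the function is inlined at a call site where sizeof(*ptr) is a known constant. The names are hypothetical, GCC/Clang __atomic builtins stand in for the arm64 atomic sequences, and abort() replaces the kernel's compile-time assertion so the example still links without optimization.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Forced inlining, as the fix does with __always_inline: once inlined,
 * 'size' is a compile-time constant and the switch folds away. */
static inline __attribute__((always_inline)) unsigned long
cmpxchg_sketch(void *ptr, unsigned long old, unsigned long new, int size)
{
	switch (size) {
	case 4: {
		uint32_t expected = (uint32_t)old;

		/* On failure 'expected' is overwritten with the current value,
		 * so either way it holds the value seen before the call. */
		__atomic_compare_exchange_n((uint32_t *)ptr, &expected,
					    (uint32_t)new, 0,
					    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
		return expected;
	}
	case 8: {
		uint64_t expected = (uint64_t)old;

		__atomic_compare_exchange_n((uint64_t *)ptr, &expected,
					    (uint64_t)new, 0,
					    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
		return (unsigned long)expected;
	}
	default:
		/* The kernel uses a compile-time assertion here, which only
		 * disappears if this branch is provably dead after inlining. */
		abort();
	}
}

#define CMPXCHG_SKETCH(p, o, n) \
	cmpxchg_sketch((p), (unsigned long)(o), (unsigned long)(n), (int)sizeof(*(p)))
```

If the function were emitted out of line, the size would become a runtime parameter and the "impossible" default branch could no longer be discarded, which is the failure mode the commit message describes.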
* powerpc/pseries: correctly track irq state in default idle | Nathan Lynch | 2019-10-07 | 1 | -0/+3

[ Upstream commit 92c94dfb69e350471473fd3075c74bc68150879e ]

prep_irq_for_idle() is intended to be called before entering H_CEDE (and it is used by the pseries cpuidle driver). However the default pseries idle routine does not call it, leading to mismanaged lazy irq state when the cpuidle driver isn't in use. Manifestations of this include:

  * Dropped IPIs in the time immediately after a cpu comes online (before it has installed the cpuidle handler), making the online operation block indefinitely waiting for the new cpu to respond.

  * Hitting this WARN_ON in arch_local_irq_restore():

      /*
       * We should already be hard disabled here. We had bugs
       * where that wasn't the case so let's dbl check it and
       * warn if we are wrong. Only do that when IRQ tracing
       * is enabled as mfmsr() can be costly.
       */
      if (WARN_ON_ONCE(mfmsr() & MSR_EE))
              __hard_irq_disable();

Call prep_irq_for_idle() from pseries_lpar_idle() and honor its result.

Fixes: 363edbe2614a ("powerpc: Default arch idle could cede processor on pseries")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
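The shape of the fix can be modeled in userspace. Everything here is a hypothetical stand-in: the pending flag plays the role of lazily masked interrupts, and we assume, as the commit implies, that the prep routine reports whether it is safe to cede.

```c
#include <assert.h>
#include <stdbool.h>

static bool irq_pending_sketch; /* an interrupt arrived while soft-disabled */
static int cede_calls_sketch;   /* how often we "gave the CPU away" */

/* Hypothetical prep routine: refuses (returns false) if work is pending. */
static bool prep_irq_for_idle_sketch(void)
{
	return !irq_pending_sketch;
}

static void cede_processor_sketch(void)
{
	cede_calls_sketch++;
}

/* The fix in miniature: call the prep routine and honor its result. */
static void lpar_idle_sketch(void)
{
	if (!prep_irq_for_idle_sketch())
		return; /* go back and handle the pending interrupt instead */
	cede_processor_sketch();
}
```

Ignoring the return value would let the CPU sleep with an interrupt already pending, which is how IPIs could appear to be dropped.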
* powerpc/eeh: Clean up EEH PEs after recovery finishes | Oliver O'Halloran | 2019-10-07 | 3 | -3/+64

[ Upstream commit 799abe283e5103d48e079149579b4f167c95ea0e ]

When the last device in an eeh_pe is removed the eeh_pe structure itself (and any empty parents) are freed since they are no longer needed. This results in a crash when a hotplug driver is involved since the following may occur:

  1. Device is surprise removed.
  2. Driver performs an MMIO, which fails and queues an eeh_event.
  3. Hotplug driver receives a hotplug interrupt and removes any pci_devs that were under the slot.
  4. pci_dev is torn down and the eeh_pe is freed.
  5. The EEH event handler thread processes the eeh_event and crashes since the eeh_pe pointer in the eeh_event structure is no longer valid.

Crashing is generally considered poor form. Instead of doing that, use the fact that PEs are marked as EEH_PE_INVALID to keep them around until the end of the recovery cycle, at which point we can safely prune any empty PEs.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-2-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
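A simplified model of the deferred-free approach, assuming a PE is just a device count plus an "invalid" flag. The struct and function names are hypothetical and ignore the PE tree, locking, and the event queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified PE. */
struct pe_sketch {
	int ndevs;    /* devices still attached */
	bool invalid; /* stands in for EEH_PE_INVALID */
};

/* Removing the last device no longer frees the PE; it only marks it
 * invalid, so an already-queued event that points at it stays safe. */
static void pe_remove_dev_sketch(struct pe_sketch *pe)
{
	if (--pe->ndevs == 0)
		pe->invalid = true;
}

/* Run once recovery has finished: only now is it safe to free an empty,
 * invalid PE. Returns true if the PE was freed. */
static bool pe_prune_sketch(struct pe_sketch *pe)
{
	if (pe->invalid && pe->ndevs == 0) {
		free(pe);
		return true;
	}
	return false;
}
```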
* powerpc/64s/exception: machine check use correct cfar for late handler | Nicholas Piggin | 2019-10-07 | 1 | -0/+4

[ Upstream commit 0b66370c61fcf5fcc1d6901013e110284da6e2bb ]

Bare metal machine checks run an "early" handler in real mode before running the main handler which reports the event.

The main handler runs exactly as a normal interrupt handler, after the "windup" which sets registers back as they were at interrupt entry. CFAR does not get restored by the windup code, so that will be wrong when the handler is run.

Restore the CFAR to the saved value before running the late handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-8-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag | Sam Bobroff | 2019-10-07 | 1 | -1/+10

[ Upstream commit aa06e3d60e245284d1e55497eb3108828092818d ]

The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the use of driver callbacks in drivers that have been bound part way through the recovery process. This is necessary to prevent later stage handlers from being called when the earlier stage handlers haven't, which can be confusing for drivers.

However, the flag is set for all devices that are added after boot time and only cleared at the end of the EEH recovery process. This results in hot plugged devices erroneously having the flag set during the first recovery after they are added (causing their driver's handlers to be incorrectly ignored).

To remedy this, clear the flag at the beginning of recovery processing. The flag is still cleared at the end of recovery processing, although it is no longer really necessary.

Also clear the flag during eeh_handle_special_event(), for the same reasons.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b8ca5629d27de74c957d4f4b250177d1b6fc4bbd.1565930772.git.sbobroff@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
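A minimal sketch of clearing the stale flag at the start of recovery. The flag value, struct, and function names are hypothetical; the real code walks the EEH PE and device hierarchy rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>

#define DEV_NO_HANDLER_SKETCH 0x1u /* stands in for EEH_DEV_NO_HANDLER */

struct edev_sketch {
	unsigned int mode;
};

/* At the start of recovery, drop the stale flag from every device so that
 * devices hot-plugged before this recovery get their handlers called.
 * Other mode bits are left untouched. */
static void recovery_begin_sketch(struct edev_sketch *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].mode &= ~DEV_NO_HANDLER_SKETCH;
}

static int handler_allowed_sketch(const struct edev_sketch *d)
{
	return !(d->mode & DEV_NO_HANDLER_SKETCH);
}
```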
* powerpc/perf: fix imc allocation failure handling | Nicholas Piggin | 2019-10-07 | 1 | -11/+18

[ Upstream commit 10c4bd7cd28e77aeb8cfa65b23cb3c632ede2a49 ]

The alloc_pages_node return value should be tested for failure before being passed to page_address.

Tested-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190724084638.24982-3-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
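The general pattern, sketched with hypothetical userspace stand-ins for alloc_pages_node() and page_address(): check the allocation for NULL before deriving anything from it, and return -ENOMEM otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct page_sketch {
	char data[64];
};

/* Hypothetical allocator that can fail, like alloc_pages_node(). */
static struct page_sketch *alloc_page_sketch(int fail)
{
	return fail ? NULL : malloc(sizeof(struct page_sketch));
}

/* Hypothetical page_address(): must only ever be handed a valid page. */
static void *page_address_sketch(struct page_sketch *pg)
{
	return pg->data;
}

/* Corrected order: test the allocation before converting it. */
static int imc_mem_init_sketch(int fail, void **vbase)
{
	struct page_sketch *pg = alloc_page_sketch(fail);

	if (!pg)
		return -ENOMEM;
	*vbase = page_address_sketch(pg);
	return 0;
}
```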
* powerpc/pseries/mobility: use cond_resched when updating device tree | Nathan Lynch | 2019-10-07 | 1 | -0/+9

[ Upstream commit ccfb5bd71d3d1228090a8633800ae7cdf42a94ac ]

After a partition migration, pseries_devicetree_update() processes changes to the device tree communicated from the platform to Linux. This is a relatively heavyweight operation, with multiple device tree searches, memory allocations, and conversations with partition firmware.

There are a few levels of nested loops which are bounded only by decisions made by the platform, outside of Linux's control, and indeed we have seen RCU stalls on large systems while executing this call graph. Use cond_resched() in these loops so that the cpu is yielded when needed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-4-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/64s/radix: Fix memory hotplug section page table creation | Nicholas Piggin | 2019-10-07 | 1 | -1/+1

[ Upstream commit 8f51e3929470942e6a8744061254fdeef646cd36 ]

create_physical_mapping expects physical addresses, but creating and splitting these mappings after boot is supplying virtual (effective) addresses. This can be triggered by booting with mem= to limit memory then probing an unused physical memory range:

    echo <addr> > /sys/devices/system/memory/probe

This mostly works by accident, firstly because __va(__va(x)) == __va(x) so the virtual address does not get corrupted. Secondly because pfn_pte masks out the upper bits of the pfn beyond the physical address limit, so a pfn constructed with a 0xc000000000000000 virtual linear address will be masked back to the correct physical address in the pte.

Fixes: 6cc27341b21a8 ("powerpc/mm: add radix__create_section_mapping()")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190724084638.24982-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
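The two accidents described above can be checked with plain integer arithmetic. This sketch assumes that __va() ORs in a 0xc000000000000000 linear-map offset, and uses a made-up 52-bit mask to stand in for the physical-address limit that pfn_pte() applies; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_SKETCH 0xc000000000000000ULL /* assumed linear-map base */
#define PHYS_MASK_SKETCH   ((1ULL << 52) - 1)    /* made-up physical limit */

/* __va()-style conversion: OR-ing is idempotent, so applying it twice
 * yields the same address ("accident" number one). */
static uint64_t va_sketch(uint64_t pa)
{
	return pa | PAGE_OFFSET_SKETCH;
}

/* Models pfn_pte() discarding bits above the physical limit, which strips
 * the 0xc... prefix again ("accident" number two). */
static uint64_t pte_addr_sketch(uint64_t addr)
{
	return addr & PHYS_MASK_SKETCH;
}
```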
* powerpc/futex: Fix warning: 'oldval' may be used uninitialized in this function | Christophe Leroy | 2019-10-07 | 1 | -2/+1

[ Upstream commit 38a0d0cdb46d3f91534e5b9839ec2d67be14c59d ]

We see warnings such as:

    kernel/futex.c: In function 'do_futex':
    kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
                    return oldval == cmparg;
                                  ^
    kernel/futex.c:1651:6: note: 'oldval' was declared here
            int oldval, ret;
                ^

This is because arch_futex_atomic_op_inuser() only sets *oval if ret is 0 and GCC doesn't see that it will only use it when ret is 0.

Anyway, the non-zero ret path is an error path that won't suffer from setting *oval, and as *oval is a local var in futex_atomic_op_inuser() it will have no impact.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: reword change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/86b72f0c134367b214910b27b9a6dd3321af93bb.1565774657.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
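A hypothetical userspace model of the change: the helper writes *oval on every path, so the caller's local is always initialized as far as the compiler can tell. The operation, the fault flag, and the names are illustrative only; the real helper uses inline assembly on a user address.

```c
#include <assert.h>
#include <errno.h>

static int futex_op_sketch(int fault, int *uaddr, int oparg, int *oval)
{
	int oldval = 0, ret = 0;

	if (fault) {
		ret = -EFAULT; /* e.g. the user access faulted */
	} else {
		oldval = *uaddr;
		*uaddr = oldval + oparg; /* an ADD-style futex op */
	}
	/* Unconditional store: harmless on the error path, and it means the
	 * caller's variable is never left uninitialized. */
	*oval = oldval;
	return ret;
}
```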
* powerpc/rtas: use device model APIs and serialization during LPM | Nathan Lynch | 2019-10-07 | 1 | -3/+8

[ Upstream commit a6717c01ddc259f6f73364779df058e2c67309f8 ]

The LPAR migration implementation and userspace-initiated cpu hotplug can interleave their executions like so:

  1. Set cpu 7 offline via sysfs.

  2. Begin a partition migration, whose implementation requires the OS to ensure all present cpus are online; cpu 7 is onlined:

       rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up

     This sets cpu 7 online in all respects except for the cpu's corresponding struct device; dev->offline remains true.

  3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is already online and returns success. The driver core (device_online) sets dev->offline = false.

  4. The migration completes and restores cpu 7 to offline state:

       rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down

This leaves cpu 7 in a state where the driver core considers the cpu device online, but in all other respects it is offline and unused. Attempts to online the cpu via sysfs appear to succeed but the driver core actually does not pass the request to the lower-level cpuhp support code. This makes the cpu unusable until the cpu device is manually set offline and then online again via sysfs.

Instead of directly calling cpu_up/cpu_down, the migration code should use the higher-level device core APIs to maintain consistent state and serialize operations.

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL | Cédric Le Goater | 2019-10-07 | 1 | -7/+10

[ Upstream commit c3e0dbd7f780a58c4695f1cd8fc8afde80376737 ]

Currently, the xmon 'dx' command calls OPAL to dump the XIVE state in the OPAL logs and also outputs some of the fields of the internal XIVE structures in Linux. The OPAL calls can only be done on baremetal (PowerNV) and they crash a pseries machine. Fix by checking the hypervisor feature of the CPU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190814154754.23682-2-clg@kaod.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window | Alexey Kardashevskiy | 2019-10-07 | 2 | -11/+11

[ Upstream commit c37c792dec0929dbb6360a609fb00fa20bb16fc2 ]

We allocate only the first level of multilevel TCE tables for KVM already (alloc_userspace_copy==true), and the rest is allocated on demand. This is not enabled though for bare metal.

This removes the KVM limitation (implicit, via the alloc_userspace_copy parameter) and always allocates just the first level. The on-demand allocation of missing levels is already implemented.

As DMA mapping might now happen with interrupts disabled, this allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors [1]. In practice just a single page is allocated there, so chances for failure are quite low.

To save time when creating a new clean table, this skips non-allocated indirect TCE entries in pnv_tce_free just like we already do in the VFIO IOMMU TCE driver.

This changes the default level number from 1 to 2 to reduce the amount of memory required for the default 32bit DMA window at boot time. The default window size is up to 2GB, which requires 4MB of TCEs. That memory is unlikely to be used entirely, or at all, as most devices these days are 64bit capable, so by switching to 2 levels by default we save 4032KB of RAM per device.

While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace can trigger this path via VFIO, see the failure and try creating a table again with different parameters which might succeed.
[1]:
    ===
    BUG: sleeping function called from invalid context at mm/page_alloc.c:4596
    in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1
    2 locks held by scsi_eh_1/1038:
     #0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80
     #1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0
    irq event stamp: 500
    hardirqs last enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0
    hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120
    softirqs last enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80
    softirqs last disabled at (0): [<0000000000000000>] 0x0
    CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634
    Call Trace:
    [c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable)
    [c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310
    [c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560
    [c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130
    [c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0
    [c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0
    [c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0
    [c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550
    [c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0
    [c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470
    [c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0
    [c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0
    [c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640
    [c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0
    [c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0
    [c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0
    [c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0
    [c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100
    [c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510
    [c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0
    [c000003d064cfe20] [c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70
    ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    irq event stamp: 2305

    ========================================================
    hardirqs last enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34
    hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654
    WARNING: possible irq lock inversion dependency detected
    5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G W
    softirqs last enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654
    softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180
    --------------------------------------------------------
    swapper/0/0 just changed the state of lock:
    0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120
    but this lock took another, HARDIRQ-unsafe lock in the past:
     (fs_reclaim){+.+.}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
     Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(fs_reclaim);
                                   local_irq_disable();
                                   lock(&(&host->lock)->rlock);
                                   lock(fs_reclaim);
      <Interrupt>
        lock(&(&host->lock)->rlock);

     *** DEADLOCK ***

    no locks held by swapper/0/0.

    the shortest dependencies between 2nd lock and 1st lock:
     -> (fs_reclaim){+.+.} ops: 167579 {
        HARDIRQ-ON-W at:
          lock_acquire+0xf8/0x2a0
          fs_reclaim_acquire.part.23+0x44/0x60
          kmem_cache_alloc_node_trace+0x80/0x590
          alloc_desc+0x64/0x270
          __irq_alloc_descs+0x2e4/0x3a0
          irq_domain_alloc_descs+0xb0/0x150
          irq_create_mapping+0x168/0x2c0
          xics_smp_probe+0x2c/0x98
          pnv_smp_probe+0x40/0x9c
          smp_prepare_cpus+0x524/0x6c4
          kernel_init_freeable+0x1b4/0x650
          kernel_init+0x2c/0x148
          ret_from_kernel_thread+0x5c/0x70
        SOFTIRQ-ON-W at:
          lock_acquire+0xf8/0x2a0
          fs_reclaim_acquire.part.23+0x44/0x60
          kmem_cache_alloc_node_trace+0x80/0x590
          alloc_desc+0x64/0x270
          __irq_alloc_descs+0x2e4/0x3a0
          irq_domain_alloc_descs+0xb0/0x150
          irq_create_mapping+0x168/0x2c0
          xics_smp_probe+0x2c/0x98
          pnv_smp_probe+0x40/0x9c
          smp_prepare_cpus+0x524/0x6c4
          kernel_init_freeable+0x1b4/0x650
          kernel_init+0x2c/0x148
          ret_from_kernel_thread+0x5c/0x70
        INITIAL USE at:
          lock_acquire+0xf8/0x2a0
          fs_reclaim_acquire.part.23+0x44/0x60
          kmem_cache_alloc_node_trace+0x80/0x590
          alloc_desc+0x64/0x270
          __irq_alloc_descs+0x2e4/0x3a0
          irq_domain_alloc_descs+0xb0/0x150
          irq_create_mapping+0x168/0x2c0
          xics_smp_probe+0x2c/0x98
          pnv_smp_probe+0x40/0x9c
          smp_prepare_cpus+0x524/0x6c4
          kernel_init_freeable+0x1b4/0x650
          kernel_init+0x2c/0x148
          ret_from_kernel_thread+0x5c/0x70
      }
    ===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
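The 4032KB figure quoted in this commit can be reproduced with a little arithmetic. The sizes below are our assumptions, not stated in the commit: 4 KiB IOMMU pages, 8-byte TCEs, and a 64 KiB allocation for each table level (as with 64K system pages). Under those assumptions a one-level table needs 4 MiB up front, while a two-level table initially needs only its 64 KiB first level.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes (see the lead-in); macro names are hypothetical. */
#define WINDOW_BYTES (UINT64_C(2) << 30)  /* 2 GiB default 32-bit window */
#define IOMMU_PAGE   (UINT64_C(4) << 10)  /* 4 KiB per TCE mapping */
#define TCE_BYTES    UINT64_C(8)          /* one TCE entry */
#define LEVEL_ALLOC  (UINT64_C(64) << 10) /* one table-level allocation */

/* One level: every TCE is allocated at boot. Result in KiB. */
static uint64_t one_level_kib(void)
{
	return ((WINDOW_BYTES / IOMMU_PAGE) * TCE_BYTES) >> 10; /* 4096 */
}

/* Two levels: how many second-level pages the first level points to. */
static uint64_t first_level_entries(void)
{
	return (WINDOW_BYTES / IOMMU_PAGE) / (LEVEL_ALLOC / TCE_BYTES); /* 64 */
}

/* Two levels, on demand: only the first level exists at boot. In KiB. */
static uint64_t two_level_initial_kib(void)
{
	return LEVEL_ALLOC >> 10; /* 64 */
}
```

The first level only needs 64 pointers (512 bytes), so it fits comfortably in one allocation; the saving is 4096 KiB - 64 KiB = 4032 KiB per device.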
* arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328 | Shawn Lin | 2019-10-05 | 1 | -0/+3

commit 03e61929c0d227ed3e1c322fc3804216ea298b7e upstream.

150MHz is a fundamental limitation of the RK3328 SoC. Without this limit, eMMC, for instance, will run at a 200MHz clock rate in HS200 mode, which means RK3328 boards do not always boot properly. Adding the limit in rk3328.dtsi also removes the risk of missing it when adding new boards.

Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
Cc: stable@vger.kernel.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Liang Chen <cl@rock-chips.com>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: tlb: Ensure we execute an ISB following walk cache invalidation (Will Deacon, 2019-10-05, 1 file, -0/+1)
commit 51696d346c49c6cf4f29e9b20d6e15832a2e3408 upstream. 05f2d2f83b5a ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable") added a new TLB invalidation helper which is used when freeing intermediate levels of page table used for kernel mappings, but is missing the required ISB instruction after completion of the TLBI instruction. Add the missing barrier. Cc: <stable@vger.kernel.org> Fixes: 05f2d2f83b5a ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up (Luis Araneda, 2019-10-05, 1 file, -1/+1)
commit b7005d4ef4f3aa2dc24019ffba03a322557ac43d upstream. This fixes a kernel panic on memcpy when FORTIFY_SOURCE is enabled. The initial smp implementation in commit aa7eb2bb4e4a ("arm: zynq: Add smp support") used memcpy, which worked fine until commit ee333554fed5 ("ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE") enabled overflow checks at runtime, producing a read overflow panic. The computed sizes of the memcpy args are: - p_size (dst): 4294967295 = (size_t) -1 - q_size (src): 1 - size (len): 8 Additionally, the memory is marked as __iomem, so one of the memcpy_* functions should be used for read/write. Fixes: aa7eb2bb4e4a ("arm: zynq: Add smp support") Signed-off-by: Luis Araneda <luaraneda@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
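The sizes above can be made concrete with a small userspace model of the two runtime checks a fortified memcpy() performs. The model assumes, as the numbers suggest, that a size of (size_t)-1 means "object size unknown". The function names are hypothetical. The model does not cover __iomem accessors. It only shows why the destination passes and the 1-byte source trips the read-overflow check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: (size_t)-1 stands for "object size unknown", which is
 * what __builtin_object_size() reports when it cannot determine a size. */

/* True if copying len bytes would overrun a destination of p_size bytes. */
static bool fortify_write_overflow(size_t p_size, size_t len)
{
	return p_size < len;
}

/* True if copying len bytes would read past a source of q_size bytes. */
static bool fortify_read_overflow(size_t q_size, size_t len)
{
	return q_size < len;
}
```

With the values from the changelog, fortify_write_overflow((size_t)-1, 8) is false and fortify_read_overflow(1, 8) is true. That result matches the reported read overflow panic.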
* ARM: samsung: Fix system restart on S3C6410 (Lihua Yao, 2019-10-05, 1 file, -0/+1)
commit 16986074035cc0205472882a00d404ed9d213313 upstream. S3C6410 system restart is triggered by watchdog reset. Cc: <stable@vger.kernel.org> Fixes: 9f55342cc2de ("ARM: dts: s3c64xx: Fix infinite interrupt in soft mode") Signed-off-by: Lihua Yao <ylhuajnu@outlook.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* spi: ep93xx: Repair SPI CS lookup tables (Alexander Sverdlin, 2019-10-05, 4 files, -5/+5)
commit 4fbc485324d2975c54201091dfad0a7dd4902324 upstream. The actual device name of the SPI controller being registered on EP93xx is "spi0" (as seen by gpiod_find_lookup_table()). This patch fixes all relevant lookup tables and the following failure (seen on EDB9302): ep93xx-spi ep93xx-spi.0: failed to register SPI master ep93xx-spi: probe of ep93xx-spi.0 failed with error -22 Fixes: 1dfbf334f1236 ("spi: ep93xx: Convert to use CS GPIO descriptors") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Lukasz Majewski <lukma@denx.de> Link: https://lore.kernel.org/r/20190831180402.10008-1-alexander.sverdlin@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes (Sean Christopherson, 2019-10-05, 1 file, -14/+3)
commit 92f58b5c0181596d9f1e317b49ada2e728fb76eb upstream. Use the fast invalidate mechanism to zap MMIO sptes on an MMIO generation wrap. The fast invalidate flow was reintroduced to fix a livelock bug in kvm_mmu_zap_all() that can occur if kvm_mmu_zap_all() is invoked when the guest has live vCPUs. I.e. using kvm_mmu_zap_all() to handle the MMIO generation wrap is theoretically susceptible to the livelock bug. This effectively reverts commit 4771450c345dc ("Revert "KVM: MMU: drop kvm_mmu_zap_mmio_sptes""), i.e. restores the behavior of commit a8eca9dcc656a ("KVM: MMU: drop kvm_mmu_zap_mmio_sptes"). Note, this actually fixes commit 571c5af06e303 ("KVM: x86/mmu: Voluntarily reschedule as needed when zapping MMIO sptes"), but there is no need to incrementally revert back to using fast invalidate, e.g. doing so doesn't provide any bisection or stability benefits. Fixes: 571c5af06e303 ("KVM: x86/mmu: Voluntarily reschedule as needed when zapping MMIO sptes") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes (Alexander Graf, 2019-10-05, 3 files, -2/+15)
commit fdcf756213756c23b533ca4974d1f48c6a4d4281 upstream. We can easily route hardware interrupts directly into VM context when they target the "Fixed" or "LowPriority" delivery modes. However, on modes such as "SMI" or "Init", we need to go via KVM code to actually put the vCPU into a different mode of operation, so we cannot post the interrupt. Add code in the VMX and SVM PI logic to explicitly refuse to establish posted mappings for advanced IRQ delivery modes. This reflects the logic in __apic_accept_irq(), which also only ever passes Fixed and LowPriority interrupts as posted interrupts into the guest. This fixes a bug I have with code which configures real hardware to inject virtual SMIs into my guest. Signed-off-by: Alexander Graf <graf@amazon.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
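The rule above is a single predicate: only Fixed and LowPriority may be posted. Below is a minimal sketch of that predicate. It assumes the delivery-mode encodings of bits 10:8 in the local APIC ICR and redirection entries. The DM_* macros, struct irq_route and irq_is_postable() are hypothetical names for illustration. They are not the code the patch added to the VMX and SVM PI paths.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delivery-mode field (bits 10:8) of an APIC ICR / redirection entry.
 * Names are local to this sketch. */
#define DM_FIXED   0x000u
#define DM_LOWEST  0x100u
#define DM_SMI     0x200u
#define DM_NMI     0x400u
#define DM_INIT    0x500u
#define DM_EXTINT  0x700u

/* Hypothetical, stripped-down interrupt routing entry. */
struct irq_route {
	uint32_t delivery_mode;
	uint32_t vector;
};

/* Only Fixed and LowPriority interrupts can be posted straight into the
 * guest; SMI, NMI, INIT, etc. must go through KVM so it can change the
 * vCPU's mode of operation. */
static bool irq_is_postable(const struct irq_route *irq)
{
	return irq->delivery_mode == DM_FIXED ||
	       irq->delivery_mode == DM_LOWEST;
}
```

A caller would check irq_is_postable() before setting up a posted mapping. When it returns false, the caller falls back to KVM's normal injection path.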
* KVM: x86: Manually calculate reserved bits when loading PDPTRS (Sean Christopherson, 2019-10-05, 1 file, -3/+8)
commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream. Manually generate the PDPTR reserved bit mask when explicitly loading PDPTRs. The reserved bits that are being tracked by the MMU reflect the current paging mode, which is unlikely to be PAE paging in the vast majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation, __set_sregs(), etc... This can cause KVM to incorrectly signal a bad PDPTR, or more likely, miss a reserved bit check and subsequently fail a VM-Enter due to a bad VMCS.GUEST_PDPTR. Add a one-off helper to generate the reserved bits instead of sharing code across the MMU's calculations and the PDPTR emulation. The PDPTR reserved bits are basically set in stone, and pushing a helper into the MMU's calculation adds unnecessary complexity without improving readability. Opportunistically fix/update the comment for load_pdptrs(). Note, the buggy commit also introduced a deliberate functional change, "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was effectively (and correctly) reverted by commit cd9ae5fe47df ("KVM: x86: Fix page-tables reserved bits"). A bit of SDM archaeology shows that the SDM from late 2008 had a bug (likely a copy+paste error) where it listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved for 2mb entries. I.e. the SDM contradicted itself, and bits 6:5 are and always have been reserved. Fixes: 20c466b56168d ("KVM: Use rsvd_bits_mask in load_pdptrs()") Cc: stable@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Reported-by: Doug Reiland <doug.reiland@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
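The changelog describes a one-off helper for the PDPTR reserved bits but does not show it. The sketch below assumes the SDM's PAE PDPTE layout, in which bits 2:1, bits 8:5, and every bit from MAXPHYADDR up to 63 are reserved. rsvd_bits() and pdptr_rsvd_bits() are illustrative names. The sketch takes maxphyaddr as a parameter instead of reading it from guest CPUID.

```c
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. Shifting 2ULL by (e - s), rather
 * than 1ULL by (e - s + 1), avoids an undefined 64-bit shift when the
 * range covers all 64 bits. */
static inline uint64_t rsvd_bits(int s, int e)
{
	if (e < s)
		return 0;
	return ((2ULL << (e - s)) - 1) << s;
}

/* Reserved bits of a PAE PDPTE: 63:MAXPHYADDR, 8:5 and 2:1. This does not
 * depend on the current paging mode, unlike the MMU's tracked masks. */
static inline uint64_t pdptr_rsvd_bits(int maxphyaddr)
{
	return rsvd_bits(maxphyaddr, 63) | rsvd_bits(5, 8) | rsvd_bits(1, 2);
}
```

For MAXPHYADDR = 36, pdptr_rsvd_bits(36) evaluates to 0xFFFFFFF0000001E6. A PDPTE is rejected if it is present and has any of those bits set.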
* KVM: x86: set ctxt->have_exception in x86_decode_insn() (Jan Dakinevich, 2019-10-05, 2 files, -0/+8)
commit c8848cee74ff05638e913582a476bde879c968ad upstream. x86_emulate_instruction() takes into account ctxt->have_exception flag during instruction decoding, but in practice this flag is never set in x86_decode_insn(). Fixes: 6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn") Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: x86: always stop emulation on page fault (Jan Dakinevich, 2019-10-05, 1 file, -1/+3)
commit 8530a79c5a9f4e29e6ffb35ec1a79d81f4968ec8 upstream. inject_emulated_exception() returns true if and only if a nested page fault happens. However, a page fault can come from a guest page table walk, either nested or not nested. In both cases we should stop attempting to read under RIP and let the guest proceed to its own page fault handler. This is also visible when an emulated instruction causes a #GP fault and the VMware backdoor is enabled. To handle the VMware backdoor, KVM intercepts #GP faults; with only the next patch applied, x86_emulate_instruction() injects a #GP but returns EMULATE_FAIL instead of EMULATE_DONE. EMULATE_FAIL causes handle_exception_nmi() (or gp_interception() for SVM) to re-inject the original #GP because it thinks emulation failed due to a non-VMware opcode. This patch prevents the issue, as x86_emulate_instruction() will return EMULATE_DONE after injecting the #GP. Fixes: 6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn") Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/imc: Dont create debugfs files for cpu-less nodes (Madhavan Srinivasan, 2019-10-05, 1 file, -6/+6)
commit 41ba17f20ea835c489e77bd54e2da73184e22060 upstream. Commit <684d984038aa> ('powerpc/powernv: Add debugfs interface for imc-mode and imc') added a debugfs interface for the nest imc pmu devices to support changing between different ucode modes, primarily for debugging. But when doing so, the code did not consider the case of cpu-less nodes, so reading the _cmd_ or _mode_ file of a cpu-less node causes this crash: Faulting instruction address: 0xc0000000000d0d58 Oops: Kernel access of bad area, sig: 11 [#1] ... CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-20190627+ #19 NIP: c0000000000d0d58 LR: c00000000049aa18 CTR:c0000000000d0d50 REGS: c00020194548f9e0 TRAP: 0300 Not tainted (5.2.0-rc6-next-20190627+) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR:28022822 XER: 00000000 CFAR: c00000000049aa14 DAR: 000000000003fc08 DSISR:40000000 IRQMASK: 0 ... NIP imc_mem_get+0x8/0x20 LR simple_attr_read+0x118/0x170 Call Trace: simple_attr_read+0x70/0x170 (unreliable) debugfs_attr_read+0x6c/0xb0 __vfs_read+0x3c/0x70 vfs_read+0xbc/0x1a0 ksys_read+0x7c/0x140 system_call+0x5c/0x70 The patch fixes the issue with a more robust check for a NULL vbase. Before the patch, ls output for the debugfs imc directory: # ls /sys/kernel/debug/powerpc/imc/ imc_cmd_0 imc_cmd_251 imc_cmd_253 imc_cmd_255 imc_mode_0 imc_mode_251 imc_mode_253 imc_mode_255 imc_cmd_250 imc_cmd_252 imc_cmd_254 imc_cmd_8 imc_mode_250 imc_mode_252 imc_mode_254 imc_mode_8 After the patch, ls output for the debugfs imc directory: # ls /sys/kernel/debug/powerpc/imc/ imc_cmd_0 imc_cmd_8 imc_mode_0 imc_mode_8 The actual bug here is that we have two loops with potentially different loop counts. That is, in imc_get_mem_addr_nest(), the loop count is obtained from the dt entries, but in export_imc_mode_and_cmd(), the loop was based on the for_each_nid() count. The patch fixes the loop count in the latter based on the struct mem_info. Ideally it would be better to have the array size in struct imc_pmu. 
Fixes: 684d984038aa ('powerpc/powernv: Add debugfs interface for imc-mode and imc') Reported-by: Qian Cai <cai@lca.pw> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190827101635.6942-1-maddy@linux.vnet.ibm.com Cc: Jan Stancek <jstancek@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
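The two-loop mismatch can be removed by making the debugfs loop walk the same table that the device-tree-driven allocation loop filled. The sketch below assumes a hypothetical struct imc_mem_info_sketch whose end is marked by a NULL vbase. The real powernv IMC structure and field names may differ. The function only generates the file names and does not touch debugfs.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-chip memory info built from the
 * device tree; a NULL vbase marks the end of the table. */
struct imc_mem_info_sketch {
	unsigned int id;   /* chip/node id taken from the device tree */
	void *vbase;       /* mapped counter memory, NULL terminates */
};

/* Write one "imc_cmd_<id>" name per populated entry, iterating over the
 * table itself rather than over every NUMA node. Returns the number of
 * names written (at most max). */
static size_t imc_debugfs_names(const struct imc_mem_info_sketch *info,
				char names[][32], size_t max)
{
	size_t n = 0;

	for (; info->vbase != NULL && n < max; info++, n++)
		snprintf(names[n], sizeof(names[n]), "imc_cmd_%u", info->id);
	return n;
}
```

Take a table with entries for chips 0 and 8. The sketch produces only imc_cmd_0 and imc_cmd_8, which matches the "after patch" listing. No names are created for the cpu-less nodes 250 to 255.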
* ARM: dts: am3517-evm: Fix missing video (Adam Ford, 2019-10-05, 1 file, -19/+4)
commit 24cf23276a54dd2825d3e3965c1b1b453e2a113d upstream. A previous commit removed the panel-dpi driver, which made the video on the AM3517-evm stop working because it relied on the dpi driver for setting video timings. Now that the simple-panel driver is available in omap2plus, this patch migrates the am3517-evm to use a similar panel and remove the manual timing requirements. Fixes: 8bf4b1621178 ("drm/omap: Remove panel-dpi driver") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: omap2plus_defconfig: Fix missing video (Adam Ford, 2019-10-05, 1 file, -0/+1)
commit 4957eccf979b025286b39388fd1a60cde601a10a upstream. When the panel-dpi driver was removed, the simple-panels driver was never enabled, so anyone who used the panel-dpi driver lost video, and those who used it in conjunction with simple-panels would have to manually enable CONFIG_DRM_PANEL_SIMPLE. This patch makes CONFIG_DRM_PANEL_SIMPLE a module in the same way the deprecated panel-dpi was. Fixes: 8bf4b1621178 ("drm/omap: Remove panel-dpi driver") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: dts: logicpd-torpedo-baseboard: Fix missing video (Adam Ford, 2019-10-05, 1 file, -31/+6)
commit f9f5518a38684e031d913f40482721ff553f5ba2 upstream. A previous commit removed the panel-dpi driver, which made the Torpedo video stop working because it relied on the dpi driver for setting video timings. Now that the simple-panel driver is available in omap2plus, this patch migrates the Torpedo dev kits to use a similar panel and remove the manual timing requirements. Fixes: 8bf4b1621178 ("drm/omap: Remove panel-dpi driver") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kvm: Nested KVM MMUs need PAE root too (Jiří Paleček, 2019-10-05, 1 file, -8/+22)
[ Upstream commit 1cfff4d9a5d01fa61e5768a6afffc81ae1c8ecb9 ] On AMD processors, in PAE 32bit mode, nested KVM instances don't work. The L0 host gets a kernel OOPS, which is related to arch.mmu->pae_root being NULL. The reason for this is that when setting up a nested KVM instance, arch.mmu is set to &arch.guest_mmu (while normally, it would be &arch.root_mmu). However, the initialization and allocation of pae_root only creates it in root_mmu. KVM code (i.e. in mmu_alloc_shadow_roots) then accesses arch.mmu->pae_root, which is the unallocated arch.guest_mmu->pae_root. This fix just allocates (and frees) pae_root in both guest_mmu and root_mmu (and also lm_root if it was allocated). The allocation is subject to the previous restrictions, i.e. it won't allocate anything on 64-bit and AFAIK not on Intel. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=203923 Fixes: 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu") Signed-off-by: Jiri Palecek <jpalecek@web.de> Tested-by: Jiri Palecek <jpalecek@web.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
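The failure mode above comes from a pointer switching between two contexts when only one of them was fully initialised. Below is a heavily reduced, hypothetical model of that situation. The structs and helpers are invented for illustration. Here the allocation is unconditional, although the changelog says the real one is skipped in some configurations. The real pae_root is also a page-sized table with extra placement constraints, which the sketch ignores.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced model of one MMU context. */
struct mmu_sketch {
	uint64_t *pae_root;            /* 4 PDPTE-sized shadow roots */
};

/* Hypothetical reduced model of the per-vCPU MMU state. */
struct vcpu_arch_sketch {
	struct mmu_sketch root_mmu;    /* normal (L1) context */
	struct mmu_sketch guest_mmu;   /* nested (L2) context */
	struct mmu_sketch *mmu;        /* whichever context is active */
};

static int alloc_mmu_pages(struct mmu_sketch *mmu)
{
	mmu->pae_root = calloc(4, sizeof(*mmu->pae_root));
	return mmu->pae_root ? 0 : -1;
}

static void free_mmu_pages(struct mmu_sketch *mmu)
{
	free(mmu->pae_root);
	mmu->pae_root = NULL;
}

/* Allocate pae_root for both contexts, so that pointing arch->mmu at
 * guest_mmu can never expose a NULL pae_root. */
static int mmu_create(struct vcpu_arch_sketch *arch)
{
	arch->mmu = &arch->root_mmu;
	if (alloc_mmu_pages(&arch->guest_mmu))
		return -1;
	if (alloc_mmu_pages(&arch->root_mmu)) {
		free_mmu_pages(&arch->guest_mmu);
		return -1;
	}
	return 0;
}
```

Suppose only root_mmu were allocated, as before the fix. Switching arch->mmu to &arch->guest_mmu would then leave arch->mmu->pae_root NULL, and that is exactly the OOPS described.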
* x86/cpu: Add Tiger Lake to Intel family (Gayatri Kammela, 2019-10-05, 1 file, -0/+3)
[ Upstream commit 6e1c32c5dbb4b90eea8f964c2869d0bde050dbe0 ] Add the model numbers/CPUIDs of Tiger Lake mobile and desktop to the Intel family. Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rahul Tanwar <rahul.tanwar@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190905193020.14707-2-tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding (Harald Freudenberger, 2019-10-05, 1 file, -0/+6)
[ Upstream commit 9e323d45ba94262620a073a3f9945ca927c07c71 ] With 'extra run-time crypto self tests' enabled, the selftest for s390-xts fails with: alg: skcipher: xts-aes-s390 encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace use_digest nosimd src_divs=[2.61%@+4006, 84.44%@+21, 1.55%@+13, 4.50%@+344, 4.26%@+21, 2.64%@+27]" This special case with nbytes=0 is not handled correctly; this fix makes sure that -EINVAL is returned when encryption or decryption is called with 0 bytes. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
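The fix amounts to an early guard on the request length. The sketch below assumes a hypothetical xts_crypt_sketch() entry point that receives only that length. The changelog does not show the real s390 glue code or its parameters.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical XTS entry point: reject an empty request instead of
 * reporting it as a successful no-op. */
static int xts_crypt_sketch(size_t nbytes)
{
	if (nbytes == 0)
		return -EINVAL;   /* the self test expects expected_error=-22 */

	/* ... walk the request and run the cipher here ... */
	return 0;
}
```

With this guard, the "random: len=0" test vector gets the error the self test expects instead of an unexpected success.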
* x86/mm: Fix cpumask_of_node() error condition (Peter Zijlstra, 2019-10-05, 1 file, -2/+2)
[ Upstream commit bc04a049f058a472695aa22905d57e2b1f4c77d9 ] When CONFIG_DEBUG_PER_CPU_MAPS=y we validate that the @node argument of cpumask_of_node() is a valid node_id. However, it forgets to check for negative numbers. Fix this by explicitly casting to unsigned int. The single check (unsigned)node >= nr_node_ids rejects anything outside 0 <= node < nr_node_ids. Also amend the error message to match the condition. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lkml.kernel.org/r/20190903075352.GY2369@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
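The unsigned-cast trick works because a negative int converts to a very large unsigned value, so one comparison checks both bounds. Here is a minimal sketch of that check. nr_node_ids_sketch and node_id_valid() are hypothetical stand-ins. The kernel's own nr_node_ids and debug check are not shown in the changelog.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nr_node_ids. */
static unsigned int nr_node_ids_sketch = 4;

/* Equivalent to 0 <= node && node < nr_node_ids_sketch: a negative node
 * becomes a huge unsigned value and fails the upper-bound test too. */
static bool node_id_valid(int node)
{
	return (unsigned int)node < nr_node_ids_sketch;
}
```

For example, node_id_valid(-1) is false because (unsigned int)-1 is UINT_MAX. A signed comparison such as node >= nr_node_ids would have let -1 through.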
* x86/amd_nb: Add PCI device IDs for family 17h, model 70h (Marcel Bocu, 2019-10-05, 1 file, -0/+3)
[ Upstream commit af4e1c5eca95bed1192d8dc45c8ed63aea2209e8 ] The AMD Ryzen gen 3 processors came with different PCI IDs for functions 3 and 4, which are used to access the SMN interface. The root PCI address, however, remained the same as for model 30h. Adding the F3/F4 PCI IDs to the misc and link IDs respectively appears to be sufficient for k10temp, so let's add them and follow up on the patch if other functions need more tweaking. Vicki Pfau sent an identical patch after I had checked that no one had written this patch. I would have been happy to drop my patch, but unlike for her patch series, I had already Cc:ed the x86 people and they had already reviewed the changes. Since Vicki has not answered any email after her initial series, let's assume she is on vacation, avoid duplicating reviews from the maintainers, and merge my series. To acknowledge Vicki's anteriority, I added her S-o-b to the patch. v2, suggested by Guenter Roeck and Brian Woods: - rename from 71h to 70h Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Marcel Bocu <marcel.p.bocu@gmail.com> Tested-by: Marcel Bocu <marcel.p.bocu@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Brian Woods <brian.woods@amd.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: "Woods, Brian" <Brian.Woods@amd.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: linux-hwmon@vger.kernel.org Link: https://lore.kernel.org/r/20190722174510.2179-1-marcel.p.bocu@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks (Marek Szyprowski, 2019-10-05, 2 files, -0/+2)
[ Upstream commit 5b0eeeaa37615df37a9a30929b73e9defe61ca84 ] Commit aff138bf8e37 ("ARM: dts: exynos: Add TMU nodes regulator supply for Peach boards") assigned LDO10 to the Exynos Thermal Measurement Unit, but it turned out that it also supplies some other critical parts, and the board freezes/crashes when it is turned off. The mentioned commit made the Exynos TMU a consumer of that regulator, and in the typical case the Exynos TMU driver keeps it enabled from early boot. However, there are configurations (for example multi_v7_defconfig) in which some of the regulators are compiled as modules and are not available from early boot. In such a case LDO10 may be turned off by the regulator core, because it has no consumers yet (the consumer drivers cannot get it, because its supply regulators are not yet available). This in turn causes the board to crash. This patch restores the 'always-on' property for the LDO10 regulator. Fixes: aff138bf8e37 ("ARM: dts: exynos: Add TMU nodes regulator supply for Peach boards") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() (Song Liu, 2019-10-05, 1 file, -2/+4)
[ Upstream commit 825d0b73cd7526b0bb186798583fae810091cbac ] pti_clone_pmds() assumes that the supplied address is either: - properly PUD/PMD aligned or - the address is actually mapped which means that independently of the mapping level (PUD/PMD/PTE) the next higher mapping exists. If that's not the case the unaligned address can be incremented by PUD or PMD size incorrectly. All callers supply mapped and/or aligned addresses, but for the sake of robustness it's better to handle that case properly and to emit a warning. [ tglx: Rewrote changelog and added WARN_ON_ONCE() ] Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282352470.1938@nanos.tec.linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
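The changelog does not show how the stepping is fixed. One way to make it robust is to advance to the next boundary with round_up(addr + 1, size) instead of adding size to a possibly unaligned address. The sketch below assumes a 2 MiB PMD size as on x86-64 with 4 KiB pages. All names are hypothetical. round_up_pow2() mirrors the kernel's round_up() for power-of-two sizes.

```c
#include <stdint.h>

#define PMD_SIZE_SKETCH (1ULL << 21)   /* 2 MiB, assumed PMD size */

/* Round x up to a multiple of the power-of-two y. */
static inline uint64_t round_up_pow2(uint64_t x, uint64_t y)
{
	return ((x - 1) | (y - 1)) + 1;
}

/* Naive step: only correct when addr is already PMD aligned. */
static inline uint64_t next_pmd_naive(uint64_t addr)
{
	return addr + PMD_SIZE_SKETCH;
}

/* Robust step: always lands on the next PMD boundary above addr. */
static inline uint64_t next_pmd_robust(uint64_t addr)
{
	return round_up_pow2(addr + 1, PMD_SIZE_SKETCH);
}
```

For an aligned address both steps agree: 0x200000 goes to 0x400000. For the unaligned 0x201000, the naive step gives 0x401000 and skips the start of the next PMD. The robust step gives 0x400000.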
* x86/mm/pti: Do not invoke PTI functions when PTI is disabled (Thomas Gleixner, 2019-10-05, 1 file, -0/+2)
[ Upstream commit 990784b57731192b7d90c8d4049e6318d81e887d ] When PTI is disabled at boot time, either because the CPU is not affected or because PTI has been disabled on the command line, the boot code still calls into pti_finalize(), which then unconditionally invokes: pti_clone_entry_text() pti_clone_kernel_text() pti_clone_kernel_text() was called unconditionally before the 32bit support was added, and the 32bit support added the call to pti_clone_entry_text(). The call has no side effects, as cloning the page tables into the available second one, which was allocated for PTI, does no damage. But it does not make sense either, and if this functionality were extended later, it might lead to hard-to-diagnose issues. Neither function should be called when PTI is disabled at runtime. Make the invocation conditional. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190828143124.063353972@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64: kpti: ensure patched kernel text is fetched from PoU (Mark Rutland, 2019-10-05, 1 file, -0/+9)
[ Upstream commit f32c7a8e45105bd0af76872bf6eef0438ff12fb2 ] While the MMU is disabled, I-cache speculation can result in instructions being fetched from the PoC. During boot we may patch instructions (e.g. for alternatives and jump labels), and these may be dirty at the PoU (and stale at the PoC). Thus, while the MMU is disabled in the KPTI pagetable fixup code, we may load stale instructions into the I-cache, potentially leading to subsequent crashes when executing regions of code which have been modified at runtime. Similarly to commit: 8ec41987436d566f ("arm64: mm: ensure patched kernel text is fetched from PoU") ... we can invalidate the I-cache after enabling the MMU to prevent such issues. The KPTI pagetable fixup code itself should be clean to the PoC per the boot protocol, so no maintenance is required for this code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/apic/vector: Warn when vector space exhaustion breaks affinity (Neil Horman, 2019-10-05, 1 file, -0/+11)
[ Upstream commit 743dac494d61d991967ebcfab92e4f80dc7583b3 ] On x86, CPUs are limited in the number of interrupts they can have affined to them, as they only support 256 interrupt vectors per CPU. 32 vectors are reserved for the CPU and the kernel reserves another 22 for internal purposes. That leaves 202 vectors for assignment to devices. When an interrupt is set up or the affinity is changed by the kernel or the administrator, the vector assignment code attempts to honor the requested affinity mask. If the vector space on the CPUs in that affinity mask is exhausted, the code falls back to a wider set of CPUs and silently assigns a vector on a CPU outside of the requested affinity mask. While the effective affinity is reflected in the corresponding /proc/irq/$N/effective_affinity* files, the silent breakage of the requested affinity can lead to unexpected behaviour for administrators. Add a pr_warn() when this happens so that administrators are at least informed about it in the syslog. [ tglx: Massaged changelog and made the pr_warn() more informative ] Reported-by: djuran@redhat.com Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: djuran@redhat.com Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com Signed-off-by: Sasha Levin <sashal@kernel.org>
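The vector budget above is simple arithmetic, written out here with the changelog's own figures. The constants are not read from a running kernel. min_cpus_for_irqs() is a hypothetical back-of-the-envelope helper. It assumes every device vector on each CPU is still free, which a real system rarely guarantees.

```c
/* Figures from the changelog above. */
enum {
	VECTORS_PER_CPU         = 256,
	CPU_RESERVED_VECTORS    = 32,  /* reserved by the CPU (vectors 0-31) */
	KERNEL_RESERVED_VECTORS = 22,  /* kernel-internal vectors */
};

/* Vectors left on one CPU for device interrupts. */
static int device_vectors_per_cpu(void)
{
	return VECTORS_PER_CPU - CPU_RESERVED_VECTORS - KERNEL_RESERVED_VECTORS;
}

/* Smallest affinity mask (in CPUs) that could hold nr_irqs device vectors,
 * assuming every device vector on each CPU is still free. */
static int min_cpus_for_irqs(int nr_irqs)
{
	int per_cpu = device_vectors_per_cpu();

	return (nr_irqs + per_cpu - 1) / per_cpu;
}
```

By this count, a mask of one CPU cannot hold a 203rd device vector. Once the mask's free vectors run out, the allocator falls back to CPUs outside the mask. That fallback is the case the new pr_warn() reports.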
* ARM: OMAP2+: move platform-specific asm-offset.h to arch/arm/mach-omap2 (Masahiro Yamada, 2019-10-05, 4 files, -4/+6)
[ Upstream commit ccf4975dca233b1d6a74752d6ab35c239edc0d58 ] <generated/ti-pm-asm-offsets.h> is only generated and included by arch/arm/mach-omap2/, so it does not need to reside in the globally visible include/generated/. I renamed it to arch/arm/mach-omap2/pm-asm-offsets.h since the prefix 'ti-' is just redundant in mach-omap2/. My main motivation for this change is to avoid the race condition in the parallel build (-j) when CONFIG_IKHEADERS is enabled. When it is enabled, all the headers under include/ are archived into kernel/kheaders_data.tar.xz and exposed in the sysfs. In the parallel build, we have no idea in which order files are built. - If ti-pm-asm-offsets.h is built before kheaders_data.tar.xz, the header will be included in the archive. Probably nobody will use it, but it is harmless except that it will increase the archive size needlessly. - If kheaders_data.tar.xz is built before ti-pm-asm-offsets.h, the header will not be included in the archive. However, in the next build, the archive will be re-generated to include the newly-found ti-pm-asm-offsets.h. This is not nice from the build system point of view. - If ti-pm-asm-offsets.h and kheaders_data.tar.xz are built at the same time, a corrupted header might be included in the archive, which does not look nice either. This commit fixes the race. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: at91: move platform-specific asm-offset.h to arch/arm/mach-at91 (Masahiro Yamada, 2019-10-05, 3 files, -3/+5)
[ Upstream commit 9fac85a6db8999922f2cd92dfe2e83e063b31a94 ] <generated/at91_pm_data-offsets.h> is only generated and included by arch/arm/mach-at91/, so it does not need to reside in the globally visible include/generated/. I renamed it to arch/arm/mach-at91/pm_data-offsets.h since the prefix 'at91_' is just redundant in mach-at91/. My main motivation for this change is to avoid the race condition in the parallel build (-j) when CONFIG_IKHEADERS is enabled. When it is enabled, all the headers under include/ are archived into kernel/kheaders_data.tar.xz and exposed in the sysfs. In the parallel build, we have no idea in which order files are built. - If at91_pm_data-offsets.h is built before kheaders_data.tar.xz, the header will be included in the archive. Probably nobody will use it, but it is harmless except that it will increase the archive size needlessly. - If kheaders_data.tar.xz is built before at91_pm_data-offsets.h, the header will not be included in the archive. However, in the next build, the archive will be re-generated to include the newly-found at91_pm_data-offsets.h. This is not nice from the build system point of view. - If at91_pm_data-offsets.h and kheaders_data.tar.xz are built at the same time, a corrupted header might be included in the archive, which does not look nice either. This commit fixes the race. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Link: https://lore.kernel.org/r/20190823024346.591-1-yamada.masahiro@socionext.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* s390/kasan: provide uninstrumented __strlen (Vasily Gorbik, 2019-10-05, 1 file, -2/+7)
[ Upstream commit f45f7b5bdaa4828ce871cf03f7c01599a0de57a5 ] s390 kasan code uses sclp_early_printk to report initialization failures. The code doing that should not be instrumented, because kasan shadow memory has not been set up yet. Even though sclp_early_core.c is compiled with instrumentation disabled, it uses the strlen function, which is instrumented and would access shadow memory if used. To avoid that, introduce an uninstrumented __strlen function to be used instead. Before commit 7e0d92f00246 ("s390/kasan: improve string/memory functions checks"), a few string functions (including strlen) escaped kasan instrumentation because they used platform-specific versions implemented in inline assembly. Fixes: 7e0d92f00246 ("s390/kasan: improve string/memory functions checks") Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
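In plain C, a compiler can be told not to emit KASAN checks for one function with the no_sanitize_address attribute, which GCC and Clang both accept. The sketch below is a hypothetical illustration of that idea. The name strlen_noasan is invented. This is not the s390 __strlen that the commit adds, whose body the changelog does not show. As the changelog notes, earlier s390 string functions avoided instrumentation by being written in inline assembly instead.

```c
#include <stddef.h>

/* Hypothetical uninstrumented strlen(): under -fsanitize=address or
 * -fsanitize=kernel-address, no shadow-memory checks are emitted for the
 * loads in this function, so it is usable before shadow memory exists. */
static __attribute__((no_sanitize_address)) size_t
strlen_noasan(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return (size_t)(p - s);
}
```

Early-boot code such as an early printk path would call this variant instead of strlen(), so it never touches shadow memory that has not been set up yet.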
* arm64: entry: Move ct_user_exit before any other exception (James Morse, 2019-10-05, 3 files, -17/+30)
[ Upstream commit 2671828c3ff4ffadf777f793a1f3232d6e51394a ] When taking an SError or Debug exception from EL0, we run the C handler for these exceptions before updating the context tracking code and unmasking lower priority interrupts. When booting with nohz_full, lockdep tells us we got this wrong: | ============================= | WARNING: suspicious RCU usage | 5.3.0-rc2-00010-gb4b5e9dcb11b-dirty #11271 Not tainted | ----------------------------- | include/linux/rcupdate.h:643 rcu_read_unlock() used illegally wh! | | other info that might help us debug this: | | | RCU used illegally from idle CPU! | rcu_scheduler_active = 2, debug_locks = 1 | RCU used illegally from extended quiescent state! | 1 lock held by a.out/432: | #0: 00000000c7a79515 (rcu_read_lock){....}, at: brk_handler+0x00 | | stack backtrace: | CPU: 1 PID: 432 Comm: a.out Not tainted 5.3.0-rc2-00010-gb4b5e9d1 | Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno De8 | Call trace: | dump_backtrace+0x0/0x140 | show_stack+0x14/0x20 | dump_stack+0xbc/0x104 | lockdep_rcu_suspicious+0xf8/0x108 | brk_handler+0x164/0x1b0 | do_debug_exception+0x11c/0x278 | el0_dbg+0x14/0x20 Moving the ct_user_exit calls to be before do_debug_exception() means they are also before trace_hardirqs_off() has been updated. Add a new ct_user_exit_irqoff macro to avoid the context-tracking code using irqsave/restore before we've updated trace_hardirqs_off(). To be consistent, do this everywhere. The C helper is called enter_from_user_mode() to match x86 in the hope we can merge them into kernel/context_tracking.c later. Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 6c81fe7925cc4c42 ("arm64: enable context tracking") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/Makefile: Always pass --synthetic to nm if supported (Michael Ellerman, 2019-10-05, 1 file, -2/+0)
[ Upstream commit 117acf5c29dd89e4c86761c365b9724dba0d9763 ]

Back in 2004 we added logic to arch/ppc64/Makefile to pass the --synthetic option to nm, if it was supported by nm.

Then in 2005 when arch/ppc64 and arch/ppc were merged, the logic to add --synthetic was moved inside an #ifdef CONFIG_PPC64 block within arch/powerpc/Makefile, and has remained there since.

That was fine, though crufty, until recently when a change to init/Kconfig added a config-time check that uses $(NM). On powerpc that leads to an infinite loop because Kconfig uses $(NM) to calculate some values, then the powerpc Makefile changes $(NM), which Kconfig notices and restarts.

The original commit that added --synthetic simply said:

  On new toolchains we need to use nm --synthetic or we miss code symbols.

And the nm man page says that the --synthetic option causes nm to:

  Include synthetic symbols in the output. These are special symbols created by the linker for various purposes.

So it seems safe to always pass --synthetic if nm supports it, i.e. on both 32-bit and 64-bit; it just means 32-bit kernels might have more symbols reported (and in practice I see no extra symbols). Making it unconditional avoids the #ifdef CONFIG_PPC64, which in turn avoids the infinite loop.

Debugged-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/platform/intel/iosf_mbi Rewrite locking (Hans de Goede, 2019-10-05, 1 file, -38/+62)
[ Upstream commit 00452ba9fdb5bf6fb5fea1dae5227b4bbed44fc4 ]

There are 2 problems with the old iosf PMIC I2C bus arbitration code which need to be addressed:

1. The lockdep code complains about a possible deadlock in the iosf_mbi_[un]block_punit_i2c_access code:

[ 6.712662] ======================================================
[ 6.712673] WARNING: possible circular locking dependency detected
[ 6.712685] 5.3.0-rc2+ #79 Not tainted
[ 6.712692] ------------------------------------------------------
[ 6.712702] kworker/0:1/7 is trying to acquire lock:
[ 6.712712] 00000000df1c5681 (iosf_mbi_block_punit_i2c_access_count_mutex){+.+.}, at: iosf_mbi_unblock_punit_i2c_access+0x13/0x90
[ 6.712739] but task is already holding lock:
[ 6.712749] 0000000067cb23e7 (iosf_mbi_punit_mutex){+.+.}, at: iosf_mbi_block_punit_i2c_access+0x97/0x186
[ 6.712768] which lock already depends on the new lock.
[ 6.712780] the existing dependency chain (in reverse order) is:
[ 6.712792] -> #1 (iosf_mbi_punit_mutex){+.+.}:
[ 6.712808] __mutex_lock+0xa8/0x9a0
[ 6.712818] iosf_mbi_block_punit_i2c_access+0x97/0x186
[ 6.712831] i2c_dw_acquire_lock+0x20/0x30
[ 6.712841] i2c_dw_set_reg_access+0x15/0xb0
[ 6.712851] i2c_dw_probe+0x57/0x473
[ 6.712861] dw_i2c_plat_probe+0x33e/0x640
[ 6.712874] platform_drv_probe+0x38/0x80
[ 6.712884] really_probe+0xf3/0x380
[ 6.712894] driver_probe_device+0x59/0xd0
[ 6.712905] bus_for_each_drv+0x84/0xd0
[ 6.712915] __device_attach+0xe4/0x170
[ 6.712925] bus_probe_device+0x9f/0xb0
[ 6.712935] deferred_probe_work_func+0x79/0xd0
[ 6.712946] process_one_work+0x234/0x560
[ 6.712957] worker_thread+0x50/0x3b0
[ 6.712967] kthread+0x10a/0x140
[ 6.712977] ret_from_fork+0x3a/0x50
[ 6.712986] -> #0 (iosf_mbi_block_punit_i2c_access_count_mutex){+.+.}:
[ 6.713004] __lock_acquire+0xe07/0x1930
[ 6.713015] lock_acquire+0x9d/0x1a0
[ 6.713025] __mutex_lock+0xa8/0x9a0
[ 6.713035] iosf_mbi_unblock_punit_i2c_access+0x13/0x90
[ 6.713047] i2c_dw_set_reg_access+0x4d/0xb0
[ 6.713058] i2c_dw_probe+0x57/0x473
[ 6.713068] dw_i2c_plat_probe+0x33e/0x640
[ 6.713079] platform_drv_probe+0x38/0x80
[ 6.713089] really_probe+0xf3/0x380
[ 6.713099] driver_probe_device+0x59/0xd0
[ 6.713109] bus_for_each_drv+0x84/0xd0
[ 6.713119] __device_attach+0xe4/0x170
[ 6.713129] bus_probe_device+0x9f/0xb0
[ 6.713140] deferred_probe_work_func+0x79/0xd0
[ 6.713150] process_one_work+0x234/0x560
[ 6.713160] worker_thread+0x50/0x3b0
[ 6.713170] kthread+0x10a/0x140
[ 6.713180] ret_from_fork+0x3a/0x50
[ 6.713189] other info that might help us debug this:
[ 6.713202] Possible unsafe locking scenario:
[ 6.713212]        CPU0                    CPU1
[ 6.713221]        ----                    ----
[ 6.713229]   lock(iosf_mbi_punit_mutex);
[ 6.713239]                                lock(iosf_mbi_block_punit_i2c_access_count_mutex);
[ 6.713253]                                lock(iosf_mbi_punit_mutex);
[ 6.713265]   lock(iosf_mbi_block_punit_i2c_access_count_mutex);
[ 6.713276]  *** DEADLOCK ***

In practice this can never happen, because only the first caller which increments iosf_mbi_block_punit_i2c_access_count will also take iosf_mbi_punit_mutex; that is the whole purpose of the counter, which itself is protected by iosf_mbi_block_punit_i2c_access_count_mutex. But there is no way to tell the lockdep code about this, and we really want to be able to run a kernel with lockdep enabled without these warnings being triggered.

2. The lockdep warning also points out another real problem: if 2 threads are both in a block of code protected by iosf_mbi_block_punit_i2c_access, and the first thread to acquire the block exits before the second thread, then the second thread will call mutex_unlock on iosf_mbi_punit_mutex. But it is not the thread which took the mutex, and unlocking by another thread is not allowed.

Fix this by getting rid of the notion of holding a mutex for the entire duration of the PMIC accesses, be it from the PUnit side or from an in-kernel I2C driver. In general, holding a mutex after exiting a function is a bad idea, and the above problems show this case is no different.

Instead, 2 counters are now used: one for PMIC accesses from the PUnit and one for accesses from in-kernel I2C code. When access is requested, the code now waits (using a waitqueue) for the counter of the other type of access to reach 0; on release, if the counter reaches 0, the waitqueue is woken.

Note that the counter approach is necessary to allow nested calls. The main reason for this is so that a series of I2C transfers can be done with the PUnit blocked from accessing the bus the whole time. This is necessary to be able to safely read/modify/write a PMIC register without racing with the PUnit doing the same thing.

Allowing nested iosf_mbi_block_punit_i2c_access() calls is also desirable from a performance point of view, since the whole dance necessary to block the PUnit from accessing the PMIC I2C bus is somewhat expensive.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lkml.kernel.org/r/20190812102113.95794-1-hdegoede@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
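The two-counter scheme described in this commit can be sketched in user space as follows. This is a minimal illustration assuming POSIX threads: a `pthread_cond_t` stands in for the kernel waitqueue, and a `pthread_mutex_t` guards the counters only for the duration of each call. All names here (`punit_access_begin()`, `kernel_access_begin()`, `punit_access_try_begin()` and so on) are hypothetical and are not the functions in iosf_mbi.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of the two-counter arbitration scheme.
 * The mutex is never held after a function returns; the condition
 * variable plays the role of the kernel waitqueue. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t waitq = PTHREAD_COND_INITIALIZER;
static unsigned int punit_count;  /* PMIC accesses from the PUnit side */
static unsigned int kernel_count; /* PMIC accesses from in-kernel I2C code */

/* Wait until the *other* type of access has drained, then count ours.
 * Nested calls of the same type simply increment the counter. */
static void access_begin(unsigned int *mine, const unsigned int *other)
{
    pthread_mutex_lock(&lock);
    while (*other != 0)
        pthread_cond_wait(&waitq, &lock);
    (*mine)++;
    pthread_mutex_unlock(&lock); /* nothing is held after returning */
}

/* Non-blocking variant (hypothetical, handy for checking exclusion). */
static bool access_try_begin(unsigned int *mine, const unsigned int *other)
{
    bool ok;

    pthread_mutex_lock(&lock);
    ok = (*other == 0);
    if (ok)
        (*mine)++;
    pthread_mutex_unlock(&lock);
    return ok;
}

/* Drop one access; when our counter reaches 0, wake waiters of the other
 * type. Any thread may call this, not only the one that began the access. */
static void access_end(unsigned int *mine)
{
    pthread_mutex_lock(&lock);
    assert(*mine > 0); /* begin/end calls must be balanced */
    if (--(*mine) == 0)
        pthread_cond_broadcast(&waitq);
    pthread_mutex_unlock(&lock);
}

void punit_access_begin(void)  { access_begin(&punit_count, &kernel_count); }
void punit_access_end(void)    { access_end(&punit_count); }
void kernel_access_begin(void) { access_begin(&kernel_count, &punit_count); }
void kernel_access_end(void)   { access_end(&kernel_count); }
bool punit_access_try_begin(void)
{
    return access_try_begin(&punit_count, &kernel_count);
}
```

Because the mutex is only held inside each call, `kernel_access_end()` may run on a different thread than the matching `kernel_access_begin()`, which is the situation that broke the old mutex-based code. Like other simple schemes of this kind, the sketch makes no attempt to stop a continuous stream of one access type from starving the other.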