summaryrefslogtreecommitdiffstats
path: root/arch/mips
Commit message (Collapse)AuthorAgeFilesLines
* MIPS: kernel: Reserve exception base early to prevent corruptionThomas Bogendoerfer2021-03-094-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BMIPS is one of the few platforms that do change the exception base. After commit 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") we started seeing BMIPS boards fail to boot with the built-in FDT being corrupted. Before the cited commit, early allocations would be in the [kernel_end, RAM_END] range, but after commit they would be within [RAM_START + PAGE_SIZE, RAM_END]. The custom exception base handler that is installed by bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the memory region allocated by unflatten_and_copy_device_tree() thus corrupting the FDT used by the kernel. To fix this, we need to perform an early reservation of the custom exception space. Additional we reserve the first 4k (1k for R3k) for either normal exception vector space (legacy CPUs) or special vectors like cache exceptions. Huge thanks to Serge for analysing and proposing a solution to this issue. Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") Reported-by: Kamal Dasu <kdasu.kdev@gmail.com> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
* MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytesBjørn Mork2021-03-081-1/+5
| | | | | | | | | | | | | | | | | | | | | The devicetree specification requires 8-byte alignment in memory. This is now enforced by libfdt since commit 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") which included the upstream commit 5e735860c478 ("libfdt: Check for 8-byte address alignment in fdt_ro_probe_()"). This broke the MIPS raw appended DTBs which would be appended to the image immediately following the initramfs section. This ends with a 32bit size, resulting in a 4-byte alignment of the DTB. Fix by padding with zeroes to 8-bytes when MIPS_RAW_APPENDED_DTB is defined. Fixes: 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
* crypto: mips/poly1305 - enable for all MIPS processorsMaciej W. Rozycki2021-03-081-2/+2
| | | | | | | | | | | | | | The MIPS Poly1305 implementation is generic MIPS code written such as to support down to the original MIPS I and MIPS III ISA for the 32-bit and 64-bit variant respectively. Lift the current limitation then to enable code for MIPSr1 ISA or newer processors only and have it available for all MIPS processors. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: a11d055e7a64 ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation") Cc: stable@vger.kernel.org # v5.5+ Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
* MIPS: boot/compressed: Copy DTB to aligned addressPaul Cercueil2021-03-082-0/+10
| | | | | | | | | | | | | | | Since 5.12-rc1, the Device Tree blob must now be properly aligned. Therefore, the decompress routine must be careful to copy the blob at the next aligned address after the kernel image. This commit fixes the kernel sometimes not booting with a Device Tree blob appended to it. Fixes: 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") Signed-off-by: Paul Cercueil <paul@crapouillou.net> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
* Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-blockLinus Torvalds2021-02-271-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring thread rewrite from Jens Axboe: "This converts the io-wq workers to be forked off the tasks in question instead of being kernel threads that assume various bits of the original task identity. This kills > 400 lines of code from io_uring/io-wq, and it's the worst part of the code. We've had several bugs in this area, and the worry is always that we could be missing some pieces for file types doing unusual things (recent /dev/tty example comes to mind, userfaultfd reads installing file descriptors is another fun one... - both of which need special handling, and I bet it's not the last weird oddity we'll find). With these identical workers, we can have full confidence that we're never missing anything. That, in itself, is a huge win. Outside of that, it's also more efficient since we're not wasting space and code on tracking state, or switching between different states. I'm sure we're going to find little things to patch up after this series, but testing has been pretty thorough, from the usual regression suite to production. Any issue that may crop up should be manageable. There's also a nice series of further reductions we can do on top of this, but I wanted to get the meat of it out sooner rather than later. The general worry here isn't that it's fundamentally broken. Most of the little issues we've found over the last week have been related to just changes in how thread startup/exit is done, since that's the main difference between using kthreads and these kinds of threads. In fact, if all goes according to plan, I want to get this into the 5.10 and 5.11 stable branches as well. That said, the changes outside of io_uring/io-wq are: - arch setup, simple one-liner to each arch copy_thread() implementation. - Removal of net and proc restrictions for io_uring, they are no longer needed or useful" * tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits) io-wq: remove now unused IO_WQ_BIT_ERROR io_uring: fix SQPOLL thread handling over exec io-wq: improve manager/worker handling over exec io_uring: ensure SQPOLL startup is triggered before error shutdown io-wq: make buffered file write hashed work map per-ctx io-wq: fix race around io_worker grabbing io-wq: fix races around manager/worker creation and task exit io_uring: ensure io-wq context is always destroyed for tasks arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread() io_uring: cleanup ->user usage io-wq: remove nr_process accounting io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS net: remove cmsg restriction from io_uring based send/recvmsg calls Revert "proc: don't allow async path resolution of /proc/self components" Revert "proc: don't allow async path resolution of /proc/thread-self components" io_uring: move SQPOLL thread io-wq forked worker io-wq: make io_wq_fork_thread() available to other users io-wq: only remove worker from free_list, if it was there io_uring: remove io_identity io_uring: remove any grabbing of context ...
| * arch: setup PF_IO_WORKER threads like PF_KTHREADJens Axboe2021-02-211-1/+1
| | | | | | | | | | | | | | | | | | PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the sense that we don't assign ->set_child_tid with our own structure. Just ensure that every arch sets up the PF_IO_WORKER threads like kthreads in the arch implementation of copy_thread(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2021-02-261-14/+16
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge more updates from Andrew Morton: "118 patches: - The rest of MM. Includes kfence - another runtime memory validator. Not as thorough as KASAN, but it has unmeasurable overhead and is intended to be usable in production builds. - Everything else Subsystems affected by this patch series: alpha, procfs, sysctl, misc, core-kernel, MAINTAINERS, lib, bitops, checkpatch, init, coredump, seq_file, gdb, ubsan, initramfs, and mm (thp, cma, vmstat, memory-hotplug, mlock, rmap, zswap, zsmalloc, cleanups, kfence, kasan2, and pagemap2)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) MIPS: make userspace mapping young by default initramfs: panic with memory information ubsan: remove overflow checks kgdb: fix to kill breakpoints on initmem after boot scripts/gdb: fix list_for_each x86: fix seq_file iteration for pat/memtype.c seq_file: document how per-entry resources are managed. fs/coredump: use kmap_local_page() init/Kconfig: fix a typo in CC_VERSION_TEXT help text init: clean up early_param_on_off() macro init/version.c: remove Version_<LINUX_VERSION_CODE> symbol checkpatch: do not apply "initialise globals to 0" check to BPF progs checkpatch: don't warn about colon termination in linker scripts checkpatch: add kmalloc_array_node to unnecessary OOM message check checkpatch: add warning for avoiding .L prefix symbols in assembly files checkpatch: improve TYPECAST_INT_CONSTANT test message checkpatch: prefer ftrace over function entry/exit printks checkpatch: trivial style fixes checkpatch: ignore warning designated initializers using NR_CPUS checkpatch: improve blank line after declaration test ...
| * | MIPS: make userspace mapping young by defaultHuang Pei2021-02-261-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MIPS page fault path(except huge page) takes 3 exceptions (1 TLB Miss + 2 TLB Invalid), butthe second TLB Invalid exception is just triggered by __update_tlb from do_page_fault writing tlb without _PAGE_VALID set. With this patch, user space mapping prot is made young by default (with both _PAGE_VALID and _PAGE_YOUNG set), and it only take 1 TLB Miss + 1 TLB Invalid exception Remove pte_sw_mkyoung without polluting MM code and make page fault delay of MIPS on par with other architecture Link: https://lkml.kernel.org/r/20210204013942.8398-1-huangpei@loongson.cn Signed-off-by: Huang Pei <huangpei@loongson.cn> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: <huangpei@loongson.cn> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: <ambrosehua@gmail.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Li Xuefeng <lixuefeng@loongson.cn> Cc: Yang Tiezhu <yangtiezhu@loongson.cn> Cc: Gao Juxin <gaojuxin@loongson.cn> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Huacai Chen <chenhc@lemote.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'mips_5.12_1' of ↵Linus Torvalds2021-02-254-4/+4
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull more MIPS updates from Thomas Bogendoerfer: - added n64 block driver - fix for ubsan warnings - fix for bcm63xx platform - update of linux-mips mailinglist * tag 'mips_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: arch: mips: update references to current linux-mips list mips: bmips: init clocks earlier vmlinux.lds.h: catch even more instrumentation symbols into .data n64: store dev instance into disk private data n64: cleanup n64cart_probe() n64: cosmetics changes n64: remove curly brackets n64: use sector SECTOR_SHIFT instead 512 n64: use enums for reg n64: move module param at the top n64: move module info at the end n64: use pr_fmt to avoid duplicate string block: Add n64 cart driver
| * | arch: mips: update references to current linux-mips listLukas Bulwahn2021-02-233-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The linux-mips mailing list now lives at kernel.org. Update all references in the kernel tree. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Huacai Chen <chenhuacai@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | mips: bmips: init clocks earlierÁlvaro Fernández Rojas2021-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | device_initcall() is too late for bcm63xx. We need to call of_clk_init() earlier in order to properly boot. Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
* | | Merge tag 'kbuild-v5.12' of ↵Linus Torvalds2021-02-253-18/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits) initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD kbuild: remove deprecated 'always' and 'hostprogs-y/m' kbuild: parse C= and M= before changing the working directory kbuild: reuse this-makefile to define abs_srctree kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig kconfig: omit --oldaskconfig option for 'make config' kconfig: fix 'invalid option' for help option kconfig: remove dead code in conf_askvalue() kconfig: clean up nested if-conditionals in check_conf() kconfig: Remove duplicate call to sym_get_string_value() Makefile: Remove # characters from compiler string Makefile: reuse CC_VERSION_TEXT kbuild: check the minimum linker version in Kconfig kbuild: remove ld-version macro scripts: add generic syscallhdr.sh scripts: add generic syscalltbl.sh arch: syscalls: remove $(srctree)/ prefix from syscall tables arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work gen_compile_commands: prune some directories kbuild: simplify access to the kernel's version ...
| * | | arch: syscalls: remove $(srctree)/ prefix from syscall tablesMasahiro Yamada2021-02-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'syscall' variables are not directly used in the commands. Remove the $(srctree)/ prefix because we can rely on VPATH. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * | | arch: syscalls: add missing FORCE and fix 'targets' to make if_changed workMasahiro Yamada2021-02-221-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rules in these Makefiles cannot detect the command line change because the prerequisite 'FORCE' is missing. Adding 'FORCE' will result in the headers being rebuilt every time because the 'targets' additions are also wrong; the file paths in 'targets' must be relative to the current Makefile. Fix all of them so the if_changed rules work correctly. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
| * | | kbuild: LD_VERSION redenominationMasahiro Yamada2021-02-122-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ccbef1674a15 ("Kbuild, lto: add ld-version and ld-ifversion macros") introduced scripts/ld-version.sh for GCC LTO. At that time, this script handled 5 version fields because GCC LTO needed the downstream binutils. (https://lkml.org/lkml/2014/4/8/272) The code snippet from the submitted patch was as follows: # We need HJ Lu's Linux binutils because mainline binutils does not # support mixing assembler and LTO code in the same ld -r object. # XXX check if the gcc plugin ld is the expected one too # XXX some Fedora binutils should also support it. How to check for that? ifeq ($(call ld-ifversion,-ge,22710001,y),y) ... However, GCC LTO was not merged into the mainline after all. (https://lkml.org/lkml/2014/4/8/272) So, the 4th and 5th fields were never used, and finally removed by commit 0d61ed17dd30 ("ld-version: Drop the 4th and 5th version components"). Since then, the last 4-digits returned by this script is always zeros. Remove the meaningless last 4-digits. This makes the version format consistent with GCC_VERSION, CLANG_VERSION, LLD_VERSION. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
* | | | MIPS: do not call flush_tlb_all when setting pmd entryBibo Mao2021-02-242-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function set_pmd_at is to set pmd entry, if tlb entry need to be flushed, there exists pmdp_huge_clear_flush alike function before set_pmd_at is called. So it is not necessary to call flush_tlb_all in this function. In these scenarios, tlb for the pmd range needs to be flushed: - privilege degrade such as wrprotect is set on the pmd entry - pmd entry is cleared - there is exception if set_pmd_at is issued by dup_mmap, since flush_tlb_mm is called for parent process, it is not necessary to flush tlb in function copy_huge_pmd. Link: http://lkml.kernel.org/r/1592990792-1923-3-git-send-email-maobibo@loongson.cn Signed-off-by: Bibo Mao <maobibo@loongson.cn> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Daniel Silsby <dansilsby@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds2021-02-233-0/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
| * | | | fs: add mount_setattr()Christian Brauner2021-01-243-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements the missing mount_setattr() syscall. While the new mount api allows to change the properties of a superblock there is currently no way to change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. In addition the old mount api has the restriction that mount options cannot be applied recursively. This hasn't changed since changing mount options on a per-mount basis was implemented in [1] and has been a frequent request not just for convenience but also for security reasons. The legacy mount syscall is unable to accommodate this behavior without introducing a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND | MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost mount. Changing MS_REC to apply to the whole mount tree would mean introducing a significant uapi change and would likely cause significant regressions. The new mount_setattr() syscall allows to recursively clear and set mount options in one shot. Multiple calls to change mount options requesting the same changes are idempotent: int mount_setattr(int dfd, const char *path, unsigned flags, struct mount_attr *uattr, size_t usize); Flags to modify path resolution behavior are specified in the @flags argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW, and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to restrict path resolution as introduced with openat2() might be supported in the future. The mount_setattr() syscall can be expected to grow over time and is designed with extensibility in mind. It follows the extensible syscall pattern we have used with other syscalls such as openat2(), clone3(), sched_{set,get}attr(), and others. The set of mount options is passed in the uapi struct mount_attr which currently has the following layout: struct mount_attr { __u64 attr_set; __u64 attr_clr; __u64 propagation; __u64 userns_fd; }; The @attr_set and @attr_clr members are used to clear and set mount options. This way a user can e.g. request that a set of flags is to be raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in @attr_set while at the same time requesting that another set of flags is to be lowered such as removing noexec from a mount tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr. Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0, not a bitmap, users wanting to transition to a different atime setting cannot simply specify the atime setting in @attr_set, but must also specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in @attr_clr. The @propagation field lets callers specify the propagation type of a mount tree. Propagation is a single property that has four different settings and as such is not really a flag argument but an enum. Specifically, it would be unclear what setting and clearing propagation settings in combination would amount to. The legacy mount() syscall thus forbids the combination of multiple propagation settings too. The goal is to keep the semantics of mount propagation somewhat simple as they are overly complex as it is. The @userns_fd field lets user specify a user namespace whose idmapping becomes the idmapping of the mount. This is implemented and explained in detail in the next patch. [1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount") Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com Cc: David Howells <dhowells@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-api@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | | | | Merge tag 'modules-for-v5.12' of ↵Linus Torvalds2021-02-232-2/+0
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: - Retire EXPORT_UNUSED_SYMBOL() and EXPORT_SYMBOL_GPL_FUTURE(). These export types were introduced between 2006 - 2008. All the of the unused symbols have been long removed and gpl future symbols were converted to gpl quite a long time ago, and I don't believe these export types have been used ever since. So, I think it should be safe to retire those export types now (Christoph Hellwig) - Refactor and clean up some aged code cruft in the module loader (Christoph Hellwig) - Build {,module_}kallsyms_on_each_symbol only when livepatching is enabled, as it is the only caller (Christoph Hellwig) - Unexport find_module() and module_mutex and fix the last module callers to not rely on these anymore. Make module_mutex internal to the module loader (Christoph Hellwig) - Harden ELF checks on module load and validate ELF structures before checking the module signature (Frank van der Linden) - Fix undefined symbol warning for clang (Fangrui Song) - Fix smatch warning (Dan Carpenter) * tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: potential uninitialized return in module_kallsyms_on_each_symbol() module: remove EXPORT_UNUSED_SYMBOL* module: remove EXPORT_SYMBOL_GPL_FUTURE module: move struct symsearch to module.c module: pass struct find_symbol_args to find_symbol module: merge each_symbol_section into find_symbol module: remove each_symbol_in_section module: mark module_mutex static kallsyms: only build {,module_}kallsyms_on_each_symbol when required kallsyms: refactor {,module_}kallsyms_on_each_symbol module: use RCU to synchronize find_module module: unexport find_module and module_mutex drm: remove drm_fb_helper_modinit powerpc/powernv: remove get_cxl_module module: harden ELF info handling module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
| * | | | module: remove EXPORT_UNUSED_SYMBOL*Christoph Hellwig2021-02-082-2/+0
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EXPORT_UNUSED_SYMBOL* is not actually used anywhere. Remove the unused functionality as we generally just remove unused code anyway. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jessica Yu <jeyu@kernel.org>
* | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2021-02-212-2/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "x86: - Support for userspace to emulate Xen hypercalls - Raise the maximum number of user memslots - Scalability improvements for the new MMU. Instead of the complex "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent, but the code that can run against page faults is limited. Right now only page faults take the lock for reading; in the future this will be extended to some cases of page table destruction. I hope to switch the default MMU around 5.12-rc3 (some testing was delayed due to Chinese New Year). - Cleanups for MAXPHYADDR checks - Use static calls for vendor-specific callbacks - On AMD, use VMLOAD/VMSAVE to save and restore host state - Stop using deprecated jump label APIs - Workaround for AMD erratum that made nested virtualization unreliable - Support for LBR emulation in the guest - Support for communicating bus lock vmexits to userspace - Add support for SEV attestation command - Miscellaneous cleanups PPC: - Support for second data watchpoint on POWER10 - Remove some complex workarounds for buggy early versions of POWER9 - Guest entry/exit fixes ARM64: - Make the nVHE EL2 object relocatable - Cleanups for concurrent translation faults hitting the same page - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Simplification of the early init hypercall handling Non-KVM changes (with acks): - Detection of contended rwlocks (implemented only for qrwlocks, because KVM only needs it for x86) - Allow __DISABLE_EXPORTS from assembly code - Provide a saner follow_pfn replacements for modules" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits) KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes KVM: selftests: Don't bother mapping GVA for Xen shinfo test KVM: selftests: Fix hex vs. decimal snafu in Xen test KVM: selftests: Fix size of memslots created by Xen tests KVM: selftests: Ignore recently added Xen tests' build output KVM: selftests: Add missing header file needed by xAPIC IPI tests KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static locking/arch: Move qrwlock.h include after qspinlock.h KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path KVM: PPC: remove unneeded semicolon KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest KVM: PPC: Book3S HV: Fix radix guest SLB side channel KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR ...
| * | | | Merge tag 'kvmarm-5.12' of ↵Paolo Bonzini2021-02-125-4/+25
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.12 - Make the nVHE EL2 object relocatable, resulting in much more maintainable code - Handle concurrent translation faults hitting the same page in a more elegant way - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Allow the disabling of symbol export from assembly code - Simplification of the early init hypercall handling
| * | | | locking/arch: Move qrwlock.h include after qspinlock.hWaiman Long2021-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | include/asm-generic/qrwlock.h was trying to get arch_spin_is_locked via asm-generic/qspinlock.h. However, this does not work because architectures might be using queued rwlocks but not queued spinlocks (csky), or because they might be defining their own queued_* macros before including asm/qspinlock.h. To fix this, ensure that asm/spinlock.h always includes qrwlock.h after defining arch_spin_is_locked (either directly for csky, or via asm/qspinlock.h for other architectures). The only inclusion elsewhere is in kernel/locking/qrwlock.c. That one is really unnecessary because the file is only compiled in SMP configurations (config QUEUED_RWLOCKS depends on SMP) and in that case linux/spinlock.h already includes asm/qrwlock.h if needed, via asm/spinlock.h. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Waiman Long <longman@redhat.com> Fixes: 26128cb6c7e6 ("locking/rwlocks: Add contention detection for rwlocks") Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Ben Gardon <bgardon@google.com> [Add arch/sparc and kernel/locking parts per discussion with Waiman. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | KVM: Raise the maximum number of user memslotsVitaly Kuznetsov2021-02-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86, 32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are allocated dynamically in KVM when added so the only real limitation is 'id_to_index' array which is 'short'. We don't have any other KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures. Low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations. In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC enabled and e.g. 256 vCPUs the limit is hit as SynIC requires two pages per vCPU and the guest is free to pick any GFN for each of them, this fragments memslots as QEMU wants to have a separate memslot for each of these pages (which are supposed to act as 'overlay' pages). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | Merge tag 'mips_5.12' of ↵Linus Torvalds2021-02-21122-898/+1021
|\ \ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Thomas Bogendoerfer: - added support for Nintendo N64 - added support for Realtek RTL83XX SoCs - kaslr support for Loongson64 - first steps to get rid of set_fs() - DMA runtime coherent/non-coherent selection cleanup - cleanups and fixes * tag 'mips_5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (98 commits) Revert "MIPS: Add basic support for ptrace single step" vmlinux.lds.h: catch more UBSAN symbols into .data MIPS: kernel: Drop kgdb_call_nmi_hook MAINTAINERS: Add git tree for KVM/mips MIPS: Use common way to parse elfcorehdr MIPS: Simplify EVA cache handling Revert "MIPS: kernel: {ftrace,kgdb}: Set correct address limit for cache flushes" MIPS: remove CONFIG_DMA_PERDEV_COHERENT MIPS: remove CONFIG_DMA_MAYBE_COHERENT driver core: lift dma_default_coherent into common code MIPS: refactor the runtime coherent vs noncoherent DMA indicators MIPS/alchemy: factor out the DMA coherent setup MIPS/malta: simplify plat_setup_iocoherency MIPS: Add basic support for ptrace single step MAINTAINERS: replace non-matching patterns for loongson{2,3} MIPS: Make check condition for SDBBP consistent with EJTAG spec mips: Replace lkml.org links with lore Revert "MIPS: microMIPS: Fix the judgment of mm_jr16_op and mm_jalr_op" MIPS: crash_dump.c: Simplify copy_oldmem_page() Revert "mips: Manually call fdt_init_reserved_mem() method" ...
| * | | | Revert "MIPS: Add basic support for ptrace single step"Thomas Bogendoerfer2021-02-184-116/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 7c86ff9925cbc83e8a21f164a8fdc2767e03531e. There are too many special cases for MIPS not covered by this patch. In the end it might be better to implement single stepping in userland than emulating it in the kernel. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: kernel: Drop kgdb_call_nmi_hookThomas Bogendoerfer2021-02-151-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the removal of set_fs() calls kgdb_call_nmi_hook() is now the same as the default implementation, so we can remove it. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: Use common way to parse elfcorehdrJinyang He2021-02-131-28/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "elfcorehdr" can be parsed at kernel/crash_dump.c Signed-off-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: Simplify EVA cache handlingThomas Bogendoerfer2021-02-131-56/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | protected_cache_op is only used for flushing user addresses, so we only need to define protected_cache_op different in EVA mode and be done with it. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | Revert "MIPS: kernel: {ftrace,kgdb}: Set correct address limit for cache ↵Thomas Bogendoerfer2021-02-132-21/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | flushes" This reverts commit 6ebda44f366478d1eea180d93154e7d97b591f50. All icache flushes in this code paths are done via flush_icache_range(), which only uses normal cache instruction. And this is the correct thing for EVA mode, too. So no need to do set_fs(KERNEL_DS) here. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * | | | MIPS: remove CONFIG_DMA_PERDEV_COHERENTChristoph Hellwig2021-02-132-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just select DMA_NONCOHERENT and ARCH_HAS_SETUP_DMA_OPS from the MIPS_GENERIC platform instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: remove CONFIG_DMA_MAYBE_COHERENTChristoph Hellwig2021-02-132-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_DMA_MAYBE_COHERENT just guards two early init options now. Just enable them unconditionally for CONFIG_DMA_NONCOHERENT. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | driver core: lift dma_default_coherent into common codeChristoph Hellwig2021-02-139-33/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lift the dma_default_coherent variable from the mips architecture code to the driver core. This allows an architecture to sdefault all device to be DMA coherent at run time, even if the kernel is build with support for DMA noncoherent device. By allowing device_initialize to set the ->dma_coherent field to this default the amount of arch hooks required for this behavior can be greatly reduced. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: refactor the runtime coherent vs noncoherent DMA indicatorsChristoph Hellwig2021-02-136-40/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the global coherentio enum, and the hw_coherentio (fake) boolean variables with a single boolean dma_default_coherent flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS/alchemy: factor out the DMA coherent setupChristoph Hellwig2021-02-131-14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor out a alchemy_dma_coherent helper that determines if the platform is DMA coherent. Also stop initializing the hw_coherentio variable, given that is only ever set to a non-zero value by the malta setup code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS/malta: simplify plat_setup_iocoherencyChristoph Hellwig2021-02-131-23/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given that plat_mem_setup runs before earlyparams are handled and malta selects CONFIG_DMA_MAYBE_COHERENT, coherentio can only be set to IO_COHERENCE_DEFAULT at this point. So remove the checking for other options and merge plat_enable_iocoherency into plat_setup_iocoherency to simplify the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: Add basic support for ptrace single stepTiezhu Yang2021-02-134-1/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current code, arch_has_single_step() is not defined on MIPS, that means MIPS does not support instruction single-step for user mode. Delve is a debugger for the Go programming language, the ptrace syscall PtraceSingleStep() failed [1] on MIPS and then the single step function can not work well, we can see that PtraceSingleStep() definition returns ptrace(PTRACE_SINGLESTEP) [2]. So it is necessary to support ptrace single step on MIPS. At the beginning, we try to use the Debug Single Step exception on the Loongson 3A4000 platform, but it has no effect when set CP0_DEBUG SSt bit, this is because CP0_DEBUG NoSSt bit is 1 which indicates no single-step feature available [3], so this way which is dependent on the hardware is almost impossible. With further research, we find out there exists a common way used with break instruction in arch/alpha/kernel/ptrace.c, it is workable. For the above analysis, define arch_has_single_step(), add the common function user_enable_single_step() and user_disable_single_step(), set flag TIF_SINGLESTEP for child process, use break instruction to set breakpoint. We can use the following testcase to test it: tools/testing/selftests/breakpoints/step_after_suspend_test.c $ make -C tools/testing/selftests TARGETS=breakpoints $ cd tools/testing/selftests/breakpoints Without this patch: $ ./step_after_suspend_test -n TAP version 13 1..4 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error ok 1 # SKIP CPU 0 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error ok 2 # SKIP CPU 1 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error ok 3 # SKIP CPU 2 # ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error ok 4 # SKIP CPU 3 # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:4 error:0 With this patch: $ ./step_after_suspend_test -n TAP version 13 1..4 ok 1 CPU 0 ok 2 CPU 1 ok 3 CPU 2 ok 4 CPU 3 # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 [1] https://github.com/go-delve/delve/blob/master/pkg/proc/native/threads_linux.go#L50 [2] https://github.com/go-delve/delve/blob/master/vendor/golang.org/x/sys/unix/syscall_linux.go#L1573 [3] http://www.t-es-t.hu/download/mips/md00047f.pdf Reported-by: Guoqi Chen <chenguoqi@loongson.cn> Signed-off-by: Xingxing Su <suxingxing@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: Make check condition for SDBBP consistent with EJTAG specTiezhu Yang2021-02-112-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to MIPS EJTAG Specification [1], a Debug Breakpoint exception occurs when an SDBBP instruction is executed, the CP0_DEBUG bit DBp indicates that a Debug Breakpoint exception occurred. When I read the original code, it looks a little confusing at first glance, just check bit DBp for SDBBP to make the code more readable, it will be much easier to understand. [1] http://www.t-es-t.hu/download/mips/md00047f.pdf Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | mips: Replace lkml.org links with loreKees Cook2021-02-111-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As started by commit 05a5f51ca566 ("Documentation: Replace lkml.org links with lore"), replace lkml.org links with lore to better use a single source that's more likely to stay available long-term. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | Revert "MIPS: microMIPS: Fix the judgment of mm_jr16_op and mm_jalr_op"Thomas Bogendoerfer2021-02-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 9308579fef3ddde19da9d45e23bf36d41932417f. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: crash_dump.c: Simplify copy_oldmem_page()Youling Tang2021-02-091-35/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace kmap_atomic_pfn() with kmap_local_pfn() which is preemptible and can take page faults. Remove the indirection of the dump page and the related cruft which is not longer required. Remove unused or redundant header files. Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | Revert "mips: Manually call fdt_init_reserved_mem() method"Serge Semin2021-02-091-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 3751cbda8f223549d7ea28803cbec8ac87e43ed2. Originally the patch was created to fix the reserved-memory DT-node parsing failure on the early stages of the platform memory initialization. That happened due to the two early memory allocators utilization that time: bootmem and memblock. At first the platform-specific memory mapping array was initialized. Then the early_init_fdt_scan_reserved_mem() was called, which couldn't fully parse the "reserved-memory" DT-node since neither memblock nor bootmem allocators hadn't been initialized at that stage, so the fdt_init_reserved_mem() method failed on the memory allocation calls. Only after that the platform-specific memory mapping were used to create proper bootmem and memblock structures and let the early memory allocations work. That's why we had to call the fdt_init_reserved_mem() method one more time to retry the initialization of the features like CMA. The necessity to have that fix was disappeared after the full memblock support had been added to the MIPS kernel and all plat_mem_setup() had been fixed to add the memory regions right into the memblock memory pool. Let's revert that patch then especially after having Paul reported that the second fdt_init_reserved_mem() call causes the reserved memory pool being created twice bigger than implied. Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map") Reported-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: process: Fix no previous prototype warningJinyang He2021-02-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unwind_stack_by_address and unwind_stack need <asm/stacktrace.h>. arch_align_stack needs <asm/exec.h> link: https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/ZPL2RRA6RZKRQZI5IGOVLFXN2GVZBN3L/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: compressed: fix build with enabled UBSANAlexander Lobakin2021-02-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1e35918ad9d1 ("MIPS: Enable Undefined Behavior Sanitizer UBSAN") added a possibility to build the entire kernel with UBSAN instrumentation for MIPS, with the exception for VDSO. However, self-extracting head wasn't been added to exceptions, so this occurs: mips-alpine-linux-musl-ld: arch/mips/boot/compressed/decompress.o: in function `FSE_buildDTable_wksp': decompress.c:(.text.FSE_buildDTable_wksp+0x278): undefined reference to `__ubsan_handle_shift_out_of_bounds' mips-alpine-linux-musl-ld: decompress.c:(.text.FSE_buildDTable_wksp+0x2a8): undefined reference to `__ubsan_handle_shift_out_of_bounds' mips-alpine-linux-musl-ld: decompress.c:(.text.FSE_buildDTable_wksp+0x2c4): undefined reference to `__ubsan_handle_shift_out_of_bounds' mips-alpine-linux-musl-ld: arch/mips/boot/compressed/decompress.o: decompress.c:(.text.FSE_buildDTable_raw+0x9c): more undefined references to `__ubsan_handle_shift_out_of_bounds' follow Add UBSAN_SANITIZE := n to mips/boot/compressed/Makefile to exclude it from instrumentation scope and fix this issue. Fixes: 1e35918ad9d1 ("MIPS: Enable Undefined Behavior Sanitizer UBSAN") Cc: stable@vger.kernel.org # 5.0+ Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: relocatable: Use __kaslr_offset in show_kernel_relocationJinyang He2021-02-091-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The type of the VMLINUX_LOAD_ADDRESS macro is the (unsigned long long) in 32bits kernel but (unsigned long) in the 64-bit kernel. Although there is no error here, avoid using it to calculate kaslr_offset. Signed-off-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: relocatable: Provide kaslr_offset() to get the kernel offsetJinyang He2021-02-093-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide kaslr_offset() to get the kernel offset when KASLR is enabled. Error may occur before update_kaslr_offset(), so put it at the end of the offset branch. Fixes: a307a4ce9ecd ("MIPS: Loongson64: Add KASLR support") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: kernel: Support extracting off-line stack traces from user-space with perfTiezhu Yang2021-02-044-1/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add perf_event_mips_regs/perf_reg_value/perf_reg_validate to support features HAVE_PERF_REGS/HAVE_PERF_USER_STACK_DUMP in kernel. [ayan@wavecomp.com: Repick this patch for unwinding userstack backtrace by perf and libunwind on MIPS based CPU.] [ralf@linux-mips.org: Add perf_get_regs_user() which is required after 'commit 88a7c26af8da ("perf: Move task_pt_regs sampling into arch code")'.] [yangtiezhu@loongson.cn: Fix build error about perf_get_regs_user() after commit 76a4efa80900 ("perf/arch: Remove perf_sample_data::regs_user_copy"), and also separate the original patches into two parts (MIPS kernel and perf tools) to merge easily.] The original patches: https://lore.kernel.org/patchwork/patch/1126521/ https://lore.kernel.org/patchwork/patch/1126520/ Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Archer Yan <ayan@wavecomp.com> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: pistachio: remove obsolete include/asm/mach-pistachioAlexander Lobakin2021-02-042-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 02bd530f888c ("MIPS: generic: Increase NR_IRQS to 256") include/asm/mach-pistachio/irq.h just does nothing. Remove the file along with mach-pistachio folder and include compiler directive. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | mips: dts: Add support for Cisco SG220-26 switchBert Vermeulen2021-02-042-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Bert Vermeulen <bert@biot.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
| * | | | MIPS: Add Realtek RTL838x/RTL839x support as generic MIPS systemBert Vermeulen2021-02-041-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is just enough system to boot the kernel with earlycon working. Signed-off-by: Bert Vermeulen <bert@biot.com> Signed-off-by: Sander Vanheule <sander@svanheule.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>