summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | tasks: Add a count of task RCU usersEric W. Biederman2019-09-252-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a count of the number of RCU users (currently 1) of the task struct so that we can later add the scheduler case and get rid of the very subtle task_rcu_dereference(), and just use rcu_dereference(). As suggested by Oleg have the count overlap rcu_head so that no additional space in task_struct is required. Inspired-by: Linus Torvalds <torvalds@linux-foundation.org> Inspired-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87woebdplt.fsf_-_@x220.int.ebiederm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched/fair: Remove unused cfs_rq_clock_task() functionQian Cai2019-09-171-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfs_rq_clock_task() was first introduced and used in: f1b17280efbd ("sched: Maintain runnable averages across throttled periods") Over time its use has been graduately removed by the following commits: d31b1a66cbe0 ("sched/fair: Factorize PELT update") 23127296889f ("sched/fair: Update scale invariance of PELT") Today, there is no single user left, so it can be safely removed. Found via the -Wunused-function build warning. Signed-off-by: Qian Cai <cai@lca.pw> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/1568668775-2127-1-git-send-email-cai@lca.pw [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'next-lockdown' of ↵Linus Torvalds2019-09-288-22/+137
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull kernel lockdown mode from James Morris: "This is the latest iteration of the kernel lockdown patchset, from Matthew Garrett, David Howells and others. From the original description: This patchset introduces an optional kernel lockdown feature, intended to strengthen the boundary between UID 0 and the kernel. When enabled, various pieces of kernel functionality are restricted. Applications that rely on low-level access to either hardware or the kernel may cease working as a result - therefore this should not be enabled without appropriate evaluation beforehand. The majority of mainstream distributions have been carrying variants of this patchset for many years now, so there's value in providing a doesn't meet every distribution requirement, but gets us much closer to not requiring external patches. There are two major changes since this was last proposed for mainline: - Separating lockdown from EFI secure boot. Background discussion is covered here: https://lwn.net/Articles/751061/ - Implementation as an LSM, with a default stackable lockdown LSM module. This allows the lockdown feature to be policy-driven, rather than encoding an implicit policy within the mechanism. The new locked_down LSM hook is provided to allow LSMs to make a policy decision around whether kernel functionality that would allow tampering with or examining the runtime state of the kernel should be permitted. The included lockdown LSM provides an implementation with a simple policy intended for general purpose use. This policy provides a coarse level of granularity, controllable via the kernel command line: lockdown={integrity|confidentiality} Enable the kernel lockdown feature. If set to integrity, kernel features that allow userland to modify the running kernel are disabled. If set to confidentiality, kernel features that allow userland to extract confidential information from the kernel are also disabled. This may also be controlled via /sys/kernel/security/lockdown and overriden by kernel configuration. New or existing LSMs may implement finer-grained controls of the lockdown features. Refer to the lockdown_reason documentation in include/linux/security.h for details. The lockdown feature has had signficant design feedback and review across many subsystems. This code has been in linux-next for some weeks, with a few fixes applied along the way. Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode") is missing a Signed-off-by from its author. Matthew responded that he is providing this under category (c) of the DCO" * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits) kexec: Fix file verification on S390 security: constify some arrays in lockdown LSM lockdown: Print current->comm in restriction messages efi: Restrict efivar_ssdt_load when the kernel is locked down tracefs: Restrict tracefs when the kernel is locked down debugfs: Restrict debugfs when the kernel is locked down kexec: Allow kexec_file() with appropriate IMA policy when locked down lockdown: Lock down perf when in confidentiality mode bpf: Restrict bpf when kernel lockdown is in confidentiality mode lockdown: Lock down tracing and perf kprobes when in confidentiality mode lockdown: Lock down /proc/kcore x86/mmiotrace: Lock down the testmmiotrace module lockdown: Lock down module params that specify hardware parameters (eg. ioport) lockdown: Lock down TIOCSSERIAL lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down acpi: Disable ACPI table override if the kernel is locked down acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down ACPI: Limit access to custom_method when the kernel is locked down x86/msr: Restrict MSR access when the kernel is locked down x86: Lock down IO port access when the kernel is locked down ...
| * | | | | kexec: Allow kexec_file() with appropriate IMA policy when locked downMatthew Garrett2019-08-191-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Systems in lockdown mode should block the kexec of untrusted kernels. For x86 and ARM we can ensure that a kernel is trustworthy by validating a PE signature, but this isn't possible on other architectures. On those platforms we can use IMA digital signatures instead. Add a function to determine whether IMA has or will verify signatures for a given event type, and if so permit kexec_file() even if the kernel is otherwise locked down. This is restricted to cases where CONFIG_INTEGRITY_TRUSTED_KEYRING is set in order to prevent an attacker from loading additional keys at runtime. Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: linux-integrity@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | lockdown: Lock down perf when in confidentiality modeDavid Howells2019-08-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disallow the use of certain perf facilities that might allow userspace to access kernel data. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | bpf: Restrict bpf when kernel lockdown is in confidentiality modeDavid Howells2019-08-191-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf_read() and bpf_read_str() could potentially be abused to (eg) allow private keys in kernel memory to be leaked. Disable them if the kernel has been locked down in confidentiality mode. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: netdev@vger.kernel.org cc: Chun-Yi Lee <jlee@suse.com> cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | lockdown: Lock down tracing and perf kprobes when in confidentiality modeDavid Howells2019-08-191-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disallow the creation of perf and ftrace kprobes when the kernel is locked down in confidentiality mode by preventing their registration. This prevents kprobes from being used to access kernel memory to steal crypto data, but continues to allow the use of kprobes from signed modules. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: davem@davemloft.net Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | lockdown: Lock down module params that specify hardware parameters (eg. ioport)David Howells2019-08-191-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provided an annotation for module parameters that specify hardware parameters (such as io ports, iomem addresses, irqs, dma channels, fixed dma buffers and other types). Suggested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | hibernate: Disable when the kernel is locked downJosh Boyer2019-08-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is currently no way to verify the resume image when returning from hibernate. This might compromise the signed modules trust model, so until we can work with signed hibernate images we disable it when the kernel is locked down. Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: rjw@rjwysocki.net Cc: pavel@ucw.cz cc: linux-pm@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | kexec_file: Restrict at runtime if the kernel is locked downJiri Bohac2019-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When KEXEC_SIG is not enabled, kernel should not load images through kexec_file systemcall if the kernel is locked down. [Modified by David Howells to fit with modifications to the previous patch and to return -EPERM if the kernel is locked down for consistency with other lockdowns. Modified by Matthew Garrett to remove the IMA integration, which will be replaced by integrating with the IMA architecture policy patches.] Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCEJiri Bohac2019-08-191-9/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preparatory patch for kexec_file_load() lockdown. A locked down kernel needs to prevent unsigned kernel images from being loaded with kexec_file_load(). Currently, the only way to force the signature verification is compiling with KEXEC_VERIFY_SIG. This prevents loading usigned images even when the kernel is not locked down at runtime. This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE. Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG turns on the signature verification but allows unsigned images to be loaded. KEXEC_SIG_FORCE disallows images without a valid signature. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | kexec_load: Disable at runtime if the kernel is locked downMatthew Garrett2019-08-191-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kexec_load() syscall permits the loading and execution of arbitrary code in ring 0, which is something that lock-down is meant to prevent. It makes sense to disable kexec_load() in this situation. This does not affect kexec_file_load() syscall which can check for a signature on the image to be booted. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Dave Young <dyoung@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | lockdown: Enforce module signatures if the kernel is locked downDavid Howells2019-08-191-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the kernel is locked down, require that all modules have valid signatures that we can verify. I have adjusted the errors generated: (1) If there's no signature (ENODATA) or we can't check it (ENOPKG, ENOKEY), then: (a) If signatures are enforced then EKEYREJECTED is returned. (b) If there's no signature or we can't check it, but the kernel is locked down then EPERM is returned (this is then consistent with other lockdown cases). (2) If the signature is unparseable (EBADMSG, EINVAL), the signature fails the check (EKEYREJECTED) or a system error occurs (eg. ENOMEM), we return the error we got. Note that the X.509 code doesn't check for key expiry as the RTC might not be valid or might not have been transferred to the kernel's clock yet. [Modified by Matthew Garrett to remove the IMA integration. This will be replaced with integration with the IMA architecture policy patchset.] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <matthewgarrett@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
* | | | | | Merge branch 'next-integrity' of ↵Linus Torvalds2019-09-274-48/+56
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "The major feature in this time is IMA support for measuring and appraising appended file signatures. In addition are a couple of bug fixes and code cleanup to use struct_size(). In addition to the PE/COFF and IMA xattr signatures, the kexec kernel image may be signed with an appended signature, using the same scripts/sign-file tool that is used to sign kernel modules. Similarly, the initramfs may contain an appended signature. This contained a lot of refactoring of the existing appended signature verification code, so that IMA could retain the existing framework of calculating the file hash once, storing it in the IMA measurement list and extending the TPM, verifying the file's integrity based on a file hash or signature (eg. xattrs), and adding an audit record containing the file hash, all based on policy. (The IMA support for appended signatures patch set was posted and reviewed 11 times.) The support for appended signature paves the way for adding other signature verification methods, such as fs-verity, based on a single system-wide policy. The file hash used for verifying the signature and the signature, itself, can be included in the IMA measurement list" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: ima_api: Use struct_size() in kzalloc() ima: use struct_size() in kzalloc() sefltest/ima: support appended signatures (modsig) ima: Fix use after free in ima_read_modsig() MODSIGN: make new include file self contained ima: fix freeing ongoing ahash_request ima: always return negative code for error ima: Store the measurement again when appraising a modsig ima: Define ima-modsig template ima: Collect modsig ima: Implement support for module-style appended signatures ima: Factor xattr_verify() out of ima_appraise_measurement() ima: Add modsig appraise_type option for module-style appended signatures integrity: Select CONFIG_KEYS instead of depending on it PKCS#7: Introduce pkcs7_get_digest() PKCS#7: Refactor verify_pkcs7_signature() MODSIGN: Export module signature definitions ima: initialize the "template" field with the default template
| * | | | | | MODSIGN: Export module signature definitionsThiago Jung Bauermann2019-08-054-48/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IMA will use the module_signature format for append signatures, so export the relevant definitions and factor out the code which verifies that the appended signature trailer is valid. Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it and be able to use mod_check_sig() without having to depend on either CONFIG_MODULE_SIG or CONFIG_MODULES. s390 duplicated the definition of struct module_signature so now they can use the new <linux/module_signature.h> header instead. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Acked-by: Jessica Yu <jeyu@kernel.org> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* | | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2019-09-272-3/+10
|\ \ \ \ \ \ \ | |_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more KVM updates from Paolo Bonzini: "x86 KVM changes: - The usual accuracy improvements for nested virtualization - The usual round of code cleanups from Sean - Added back optimizations that were prematurely removed in 5.2 (the bare minimum needed to fix the regression was in 5.3-rc8, here comes the rest) - Support for UMWAIT/UMONITOR/TPAUSE - Direct L2->L0 TLB flushing when L0 is Hyper-V and L1 is KVM - Tell Windows guests if SMT is disabled on the host - More accurate detection of vmexit cost - Revert a pvqspinlock pessimization" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (56 commits) KVM: nVMX: cleanup and fix host 64-bit mode checks KVM: vmx: fix build warnings in hv_enable_direct_tlbflush() on i386 KVM: x86: Don't check kvm_rebooting in __kvm_handle_fault_on_reboot() KVM: x86: Drop ____kvm_handle_fault_on_reboot() KVM: VMX: Add error handling to VMREAD helper KVM: VMX: Optimize VMX instruction error and fault handling KVM: x86: Check kvm_rebooting in kvm_spurious_fault() KVM: selftests: fix ucall on x86 Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" kvm: nvmx: limit atomic switch MSRs kvm: svm: Intercept RDPRU kvm: x86: Add "significant index" flag to a few CPUID leaves KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero KVM: x86/mmu: Explicitly track only a single invalid mmu generation KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call" KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first"" KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages"" KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch"" KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages"" KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints"" ...
| * | | | | | Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"Wanpeng Li2019-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted). A large performance regression was caused by this commit. on over-subscription scenarios. The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads, with three VMs of 80 vCPUs each. The score of ebizzy -M is reduced from 13000-14000 records/s to 1700-1800 records/s: Host Guest score vanilla w/o kvm optimizations upstream 1700-1800 records/s vanilla w/o kvm optimizations revert 13000-14000 records/s vanilla w/ kvm optimizations upstream 4500-5000 records/s vanilla w/ kvm optimizations revert 14000-15500 records/s Exit from aggressive wait-early mechanism can result in premature yield and extra scheduling latency. Actually, only 6% of wait_early events are caused by vcpu_is_preempted() being true. However, when one vCPU voluntarily releases its vCPU, all the subsequently waiters in the queue will do the same and the cascading effect leads to bad performance. kvm optimizations: [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts) [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption) Tested-by: loobinliu@tencent.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: loobinliu@tencent.com Cc: stable@vger.kernel.org Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted) Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | cpu/SMT: create and export cpu_smt_possible()Vitaly Kuznetsov2019-09-241-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM needs to know if SMT is theoretically possible, this means it is supported and not forcefully disabled ('nosmt=force'). Create and export cpu_smt_possible() answering this question. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2019-09-261-3/+5
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a timer expiry bug that would cause spurious delay of timers" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer: Read jiffies once when forwarding base clk
| * | | | | | | timer: Read jiffies once when forwarding base clkLi RongQing2019-09-191-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The timer delayed for more than 3 seconds warning was triggered during testing. Workqueue: events_unbound sched_tick_remote RIP: 0010:sched_tick_remote+0xee/0x100 ... Call Trace: process_one_work+0x18c/0x3a0 worker_thread+0x30/0x380 kthread+0x113/0x130 ret_from_fork+0x22/0x40 The reason is that the code in collect_expired_timers() uses jiffies unprotected: if (next_event > jiffies) base->clk = jiffies; As the compiler is allowed to reload the value base->clk can advance between the check and the store and in the worst case advance farther than next event. That causes the timer expiry to be delayed until the wheel pointer wraps around. Convert the code to use READ_ONCE() Fixes: 236968383cf5 ("timers: Optimize collect_expired_timers() for NOHZ") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Liang ZhiCheng <liangzhicheng@baidu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu.com
* | | | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2019-09-261-3/+3
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "The only kernel change is comment typo fixes. The rest is mostly tooling fixes, but also new vendor event additions and updates, a bigger libperf/libtraceevent library and a header files reorganization that came in a bit late" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (108 commits) perf unwind: Fix libunwind build failure on i386 systems perf parser: Remove needless include directives perf build: Add detection of java-11-openjdk-devel package perf jvmti: Include JVMTI support for s390 perf vendor events: Remove P8 HW events which are not supported perf evlist: Fix access of freed id arrays perf stat: Fix free memory access / memory leaks in metrics perf tools: Replace needless mmap.h with what is needed, event.h perf evsel: Move config terms to a separate header perf evlist: Remove unused perf_evlist__fprintf() method perf evsel: Introduce evsel_fprintf.h perf evsel: Remove need for symbol_conf in evsel_fprintf.c perf copyfile: Move copyfile routines to separate files libperf: Add perf_evlist__poll() function libperf: Add perf_evlist__add_pollfd() function libperf: Add perf_evlist__alloc_pollfd() function libperf: Add libperf_init() call to the tests libperf: Merge libperf_set_print() into libperf_init() libperf: Add libperf dependency for tests targets libperf: Use sys/types.h to get ssize_t, not unistd.h ...
| * | | | | | | | perf/core: Fix several typos in commentsRoy Ben Shlomo2019-09-201-3/+3
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix typos in a few functions' documentation comments. Signed-off-by: Roy Ben Shlomo <royb@sentinelone.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: royb@sentinelone.com Link: http://lore.kernel.org/lkml/20190920171254.31373-1-royb@sentinelone.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | | | | | Merge tag 'trace-v5.4-2' of ↵Linus Torvalds2019-09-262-4/+6
|\ \ \ \ \ \ \ \ | | |_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Srikar Dronamraju fixed a bug in the newmulti probe code" * tag 'trace-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/probe: Fix same probe event argument matching
| * | | | | | | tracing/probe: Fix same probe event argument matchingSrikar Dronamraju2019-09-252-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit fe60b0ce8e73 ("tracing/probe: Reject exactly same probe event") tries to reject a event which matches an already existing probe. However it currently continues to match arguments and rejects adding a probe even when the arguments don't match. Fix this by only rejecting a probe if and only if all the arguments match. Link: http://lkml.kernel.org/r/20190924114906.14038-1-srikar@linux.vnet.ibm.com Fixes: fe60b0ce8e73 ("tracing/probe: Reject exactly same probe event") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | | | | | | | bug: consolidate __WARN_FLAGS usageKees Cook2019-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of having separate tests for __WARN_FLAGS, merge the two #ifdef blocks and replace the synonym WANT_WARN_ON_SLOWPATH macro. Link: http://lkml.kernel.org/r/20190819234111.9019-7-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | bug: lift "cut here" out of __warn()Kees Cook2019-09-251-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for cleaning up "cut here", move the "cut here" logic up out of __warn() and into callers that pass non-NULL args. For anyone looking closely, there are two callers that pass NULL args: one already explicitly prints "cut here". The remaining case is covered by how a WARN is built, which will be cleaned up in the next patch. Link: http://lkml.kernel.org/r/20190819234111.9019-5-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | bug: consolidate warn_slowpath_fmt() usageKees Cook2019-09-251-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of having a separate helper for no printk output, just consolidate the logic into warn_slowpath_fmt(). Link: http://lkml.kernel.org/r/20190819234111.9019-4-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | bug: refactor away warn_slowpath_fmt_taint()Kees Cook2019-09-251-15/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "Clean up WARN() "cut here" handling", v2. Christophe Leroy noticed that the fix for missing "cut here" in the WARN() case was adding explicit printk() calls instead of teaching the exception handler to add it. This refactors the bug/warn infrastructure to pass this information as a new BUGFLAG. Longer details repeated from the last patch in the series: bug: move WARN_ON() "cut here" into exception handler The original cleanup of "cut here" missed the WARN_ON() case (that does not have a printk message), which was fixed recently by adding an explicit printk of "cut here". This had the downside of adding a printk() to every WARN_ON() caller, which reduces the utility of using an instruction exception to streamline the resulting code. By making this a new BUGFLAG, all of these can be removed and "cut here" can be handled by the exception handler. This was very pronounced on PowerPC, but the effect can be seen on x86 as well. The resulting text size of a defconfig build shows some small savings from this patch: text data bss dec hex filename 19691167 5134320 1646664 26472151 193eed7 vmlinux.before 19676362 5134260 1663048 26473670 193f4c6 vmlinux.after This change also opens the door for creating something like BUG_MSG(), where a custom printk() before issuing BUG(), without confusing the "cut here" line. This patch (of 7): There's no reason to have specialized helpers for passing the warn taint down to __warn(). Consolidate and refactor helper macros, removing __WARN_printf() and warn_slowpath_fmt_taint(). Link: http://lkml.kernel.org/r/20190819234111.9019-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | kgdb: don't use a notifier to enter kgdb at panic; call directlyDouglas Anderson2019-09-252-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now kgdb/kdb hooks up to debug panics by registering for the panic notifier. This works OK except that it means that kgdb/kdb gets called _after_ the CPUs in the system are taken offline. That means that if anything important was happening on those CPUs (like something that might have contributed to the panic) you can't debug them. Specifically I ran into a case where I got a panic because a task was "blocked for more than 120 seconds" which was detected on CPU 2. I nicely got shown stack traces in the kernel log for all CPUs including CPU 0, which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of __mmc_switch(). I then ended up at the kdb prompt where switched over to kgdb to try to look at local variables of the process on CPU 0. I found that I couldn't. Digging more, I found that I had no info on any tasks running on CPUs other than CPU 2 and that asking kdb for help showed me "Error: no saved data for this cpu". This was because all the CPUs were offline. Let's move the entry of kdb/kgdb to a direct call from panic() and stop using the generic notifier. Putting a direct call in allows us to order things more properly and it also doesn't seem like we're breaking any abstractions by calling into the debugger from the panic function. Daniel said: : This patch changes the way kdump and kgdb interact with each other. : However it would seem rather odd to have both tools simultaneously armed : and, even if they were, the user still has the option to use panic_timeout : to force a kdump to happen. Thus I think the change of order is : acceptable. Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Feng Tang <feng.tang@intel.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | kexec: bail out upon SIGKILL when allocating memory.Tetsuo Handa2019-09-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot found that a thread can stall for minutes inside kexec_load() after that thread was killed by SIGKILL [1]. It turned out that the reproducer was trying to allocate 2408MB of memory using kimage_alloc_page() from kimage_load_normal_segment(). Let's check for SIGKILL before doing memory allocation. [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e Link: http://lkml.kernel.org/r/993c9185-d324-2640-d061-bed2dd18b1f7@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | fork: improve error message for corrupted page tablesSai Praneeth Prakhya2019-09-251-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a user process exits, the kernel cleans up the mm_struct of the user process and during cleanup, check_mm() checks the page tables of the user process for corruption (E.g: unexpected page flags set/cleared). For corrupted page tables, the error message printed by check_mm() isn't very clear as it prints the loop index instead of page table type (E.g: Resident file mapping pages vs Resident shared memory pages). The loop index in check_mm() is used to index rss_stat[] which represents individual memory type stats. Hence, instead of printing index, print memory type, thereby improving error message. Without patch: -------------- [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467) [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2 [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5 [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 With patch: ----------- [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467) [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2 [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5 [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so that it matches the other print statement. Link: http://lkml.kernel.org/r/da75b5153f617f4c5739c08ee6ebeb3d19db0fbc.1565123758.git.sai.praneeth.prakhya@intel.com Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | kernel/elfcore.c: include proper prototypesValdis Kletnieks2019-09-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building with W=1, gcc properly complains that there's no prototypes: CC kernel/elfcore.o kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes] 7 | Elf_Half __weak elf_core_extra_phdrs(void) | ^~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes] 12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes] 17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm) | ^~~~~~~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes] 22 | size_t __weak elf_core_extra_data_size(void) | ^~~~~~~~~~~~~~~~~~~~~~~~ Provide the include file so gcc is happy, and we don't have potential code drift Link: http://lkml.kernel.org/r/29875.1565224705@turing-police Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2019-09-244-24/+68
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge updates from Andrew Morton: - a few hot fixes - ocfs2 updates - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan, cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug, sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy, oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap, zsmalloc) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) mm/zsmalloc.c: fix a -Wunused-function warning zswap: do not map same object twice zswap: use movable memory if zpool support allocate movable memory zpool: add malloc_support_movable to zpool_driver shmem: fix obsolete comment in shmem_getpage_gfp() mm/madvise: reduce code duplication in error handling paths mm: mmap: increase sockets maximum memory size pgoff for 32bits mm/mmap.c: refine find_vma_prev() with rb_last() riscv: make mmap allocation top-down by default mips: use generic mmap top-down layout and brk randomization mips: replace arch specific way to determine 32bit task with generic version mips: adjust brk randomization offset to fit generic version mips: use STACK_TOP when computing mmap base address mips: properly account for stack randomization and stack guard gap arm: use generic mmap top-down layout and brk randomization arm: use STACK_TOP when computing mmap base address arm: properly account for stack randomization and stack guard gap arm64, mm: make randomization selected by generic topdown mmap layout arm64, mm: move generic mmap layout functions to mm arm64: consider stack randomization for mmap base only when necessary ...
| * | | | | | | | arm64, mm: move generic mmap layout functions to mmAlexandre Ghiti2019-09-241-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arm64 handles top-down mmap layout in a way that can be easily reused by other architectures, so make it available in mm. It then introduces a new config ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT that can be set by other architectures to benefit from those functions. Note that this new config depends on MMU being enabled, if selected without MMU support, a warning will be thrown. Link: http://lkml.kernel.org/r/20190730055113.23635-5-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Suggested-by: Christoph Hellwig <hch@infradead.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | uprobe: collapse THP pmd after removing all uprobesSong Liu2019-09-241-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After all uprobes are removed from the huge page (with PTE pgtable), it is possible to collapse the pmd and benefit from THP again. This patch does the collapse by calling collapse_pte_mapped_thp(). Link: http://lkml.kernel.org/r/20190815164525.1848545-7-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLITSong Liu2019-09-241-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the newly added FOLL_SPLIT_PMD in uprobe. This preserves the huge page when the uprobe is enabled. When the uprobe is disabled, newer instances of the same application could still benefit from huge page. For the next step, we will enable khugepaged to regroup the pmd, so that existing instances of the application could also benefit from huge page after the uprobe is disabled. Link: http://lkml.kernel.org/r/20190815164525.1848545-5-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | uprobe: use original page when all uprobes are removedSong Liu2019-09-241-15/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, uprobe swaps the target page with a anonymous page in both install_breakpoint() and remove_breakpoint(). When all uprobes on a page are removed, the given mm is still using an anonymous page (not the original page). This patch allows uprobe to use original page when possible (all uprobes on the page are already removed, and the original page is in page cache and uptodate). As suggested by Oleg, we unmap the old_page and let the original page fault in. Link: http://lkml.kernel.org/r/20190815164525.1848545-3-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range()David Hildenbrand2019-09-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/memory_hotplug: online_pages() cleanups", v2. Some cleanups (+ one fix for a special case) in the context of online_pages(). This patch (of 5): This makes it clearer that we will never call func() with duplicate PFNs in case we have multiple sub-page memory resources. All unaligned parts of PFNs are completely discarded. Link: http://lkml.kernel.org/r/20190814154109.3448-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Nadav Amit <namit@vmware.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Arun KS <arunks@codeaurora.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | mm: remove quicklist page table cachesNicholas Piggin2019-09-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm: remove quicklist page table caches". A while ago Nicholas proposed to remove quicklist page table caches [1]. I've rebased his patch on the curren upstream and switched ia64 and sh to use generic versions of PTE allocation. [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com This patch (of 3): Remove page table allocator "quicklists". These have been around for a long time, but have not got much traction in the last decade and are only used on ia64 and sh architectures. The numbers in the initial commit look interesting but probably don't apply anymore. If anybody wants to resurrect this it's in the git history, but it's unhelpful to have this code and divergent allocator behaviour for minor archs. Also it might be better to instead make more general improvements to page allocator if this is still so slow. Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | Merge branch 'work.mount3' of ↵Linus Torvalds2019-09-241-34/+58
|\ \ \ \ \ \ \ \ \ | |_|_|_|/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more mount API conversions from Al Viro: "Assorted conversions of options parsing to new API. gfs2 is probably the most serious one here; the rest is trivial stuff. Other things in what used to be #work.mount are going to wait for the next cycle (and preferably go via git trees of the filesystems involved)" * 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: gfs2: Convert gfs2 to fs_context vfs: Convert spufs to use the new mount API vfs: Convert hypfs to use the new mount API hypfs: Fix error number left in struct pointer member vfs: Convert functionfs to use the new mount API vfs: Convert bpf to use the new mount API
| * | | | | | | | vfs: Convert bpf to use the new mount APIDavid Howells2019-09-181-34/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the bpf filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: Alexei Starovoitov <ast@kernel.org> cc: Daniel Borkmann <daniel@iogearbox.net> cc: Martin KaFai Lau <kafai@fb.com> cc: Song Liu <songliubraving@fb.com> cc: Yonghong Song <yhs@fb.com> cc: netdev@vger.kernel.org cc: bpf@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2019-09-231-0/+1
|\ \ \ \ \ \ \ \ \ | |_|/ / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching fix from Jiri Kosina: "Error handling fix in livepatching module notifier, from Miroslav Benes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Nullify obj->mod in klp_module_coming()'s error path
| * | | | | | | | livepatch: Nullify obj->mod in klp_module_coming()'s error pathMiroslav Benes2019-08-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | klp_module_coming() is called for every module appearing in the system. It sets obj->mod to a patched module for klp_object obj. Unfortunately it leaves it set even if an error happens later in the function and the patched module is not allowed to be loaded. klp_is_object_loaded() uses obj->mod variable and could currently give a wrong return value. The bug is probably harmless as of now. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
* | | | | | | | | Merge tag 'modules-for-v5.4' of ↵Linus Torvalds2019-09-221-7/+69
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "The main bulk of this pull request introduces a new exported symbol namespaces feature. The number of exported symbols is increasingly growing with each release (we're at about 31k exports as of 5.3-rc7) and we currently have no way of visualizing how these symbols are "clustered" or making sense of this huge export surface. Namespacing exported symbols allows kernel developers to more explicitly partition and categorize exported symbols, as well as more easily limiting the availability of namespaced symbols to other parts of the kernel. For starters, we have introduced the USB_STORAGE namespace to demonstrate the API's usage. I have briefly summarized the feature and its main motivations in the tag below. Summary: - Introduce exported symbol namespaces. This new feature allows subsystem maintainers to partition and categorize their exported symbols into explicit namespaces. Module authors are now required to import the namespaces they need. Some of the main motivations of this feature include: allowing kernel developers to better manage the export surface, allow subsystem maintainers to explicitly state that usage of some exported symbols should only be limited to certain users (think: inter-module or inter-driver symbols, debugging symbols, etc), as well as more easily limiting the availability of namespaced symbols to other parts of the kernel. With the module import requirement, it is also easier to spot the misuse of exported symbols during patch review. Two new macros are introduced: EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL(). The API is thoroughly documented in Documentation/kbuild/namespaces.rst. - Some small code and kbuild cleanups here and there" * tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: Remove leftover '#undef' from export header module: remove unneeded casts in cmp_name() module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES module: remove redundant 'depends on MODULES' module: Fix link failure due to invalid relocation on namespace offset usb-storage: export symbols in USB_STORAGE namespace usb-storage: remove single-use define for debugging docs: Add documentation for Symbol Namespaces scripts: Coccinelle script for namespace dependencies. modpost: add support for generating namespace dependencies export: allow definition default namespaces in Makefiles or sources module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS modpost: add support for symbol namespaces module: add support for symbol namespaces. export: explicitly align struct kernel_symbol module: support reading multiple values per modinfo tag
| * | | | | | | | | module: remove unneeded casts in cmp_name()Masahiro Yamada2019-09-111-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | You can pass opaque pointers directly. I also renamed 'va' and 'vb' into more meaningful arguments. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | | | | | | | | module: Fix link failure due to invalid relocation on namespace offsetWill Deacon2019-09-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8651ec01daed ("module: add support for symbol namespaces.") broke linking for arm64 defconfig: | lib/crypto/arc4.o: In function `__ksymtab_arc4_setkey': | arc4.c:(___ksymtab+arc4_setkey+0x8): undefined reference to `no symbol' | lib/crypto/arc4.o: In function `__ksymtab_arc4_crypt': | arc4.c:(___ksymtab+arc4_crypt+0x8): undefined reference to `no symbol' This is because the dummy initialisation of the 'namespace_offset' field in 'struct kernel_symbol' when using EXPORT_SYMBOL on architectures with support for PREL32 locations uses an offset from an absolute address (0) in an effort to trick 'offset_to_pointer' into behaving as a NOP, allowing non-namespaced symbols to be treated in the same way as those belonging to a namespace. Unfortunately, place-relative relocations require a symbol reference rather than an absolute value and, although x86 appears to get away with this due to placing the kernel text at the top of the address space, it almost certainly results in a runtime failure if the kernel is relocated dynamically as a result of KASLR. Rework 'namespace_offset' so that a value of 0, which cannot occur for a valid namespaced symbol, indicates that the corresponding symbol does not belong to a namespace. Cc: Matthias Maennich <maennich@google.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 8651ec01daed ("module: add support for symbol namespaces.") Reported-by: kbuild test robot <lkp@intel.com> Tested-by: Matthias Maennich <maennich@google.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matthias Maennich <maennich@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | | | | | | | | module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTSMatthias Maennich2019-09-101-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is enabled (default=n), the requirement for modules to import all namespaces that are used by the module is relaxed. Enabling this option effectively allows (invalid) modules to be loaded while only a warning is emitted. Disabling this option keeps the enforcement at module loading time and loading is denied if the module's imports are not satisfactory. Reviewed-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | | | | | | | | module: add support for symbol namespaces.Matthias Maennich2019-09-101-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL() macros can be used to export a symbol to a specific namespace. There are no _GPL_FUTURE and _UNUSED variants because these are currently unused, and I'm not sure they are necessary. I didn't add EXPORT_SYMBOL_NS() for ASM exports; this patch sets the namespace of ASM exports to NULL by default. In case of relative references, it will be relocatable to NULL. If there's a need, this should be pretty easy to add. A module that wants to use a symbol exported to a namespace must add a MODULE_IMPORT_NS() statement to their module code; otherwise, modpost will complain when building the module, and the kernel module loader will emit an error and fail when loading the module. MODULE_IMPORT_NS() adds a modinfo tag 'import_ns' to the module. That tag can be observed by the modinfo command, modpost and kernel/module.c at the time of loading the module. The ELF symbols are renamed to include the namespace with an asm label; for example, symbol 'usb_stor_suspend' in namespace USB_STORAGE becomes 'usb_stor_suspend.USB_STORAGE'. This allows modpost to do namespace checking, without having to go through all the effort of parsing ELF and relocation records just to get to the struct kernel_symbols. On x86_64 I saw no difference in binary size (compression), but at runtime this will require a word of memory per export to hold the namespace. An alternative could be to store namespaced symbols in their own section and use a separate 'struct namespaced_kernel_symbol' for that section, at the cost of making the module loader more complex. Co-developed-by: Martijn Coenen <maco@android.com> Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | | | | | | | | module: support reading multiple values per modinfo tagMatthias Maennich2019-09-101-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to modpost's get_next_modinfo(), introduce get_next_modinfo() in kernel/module.c to acquire any further values associated with the same modinfo tag name. That is useful for any tags that have multiple occurrences (such as 'alias'), but is in particular introduced here as part of the symbol namespaces patch series to read the (potentially) multiple namespaces a module is importing. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
* | | | | | | | | | Merge tag 'for-linus-5.4-rc1' of ↵Linus Torvalds2019-09-211-1/+1
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - virtio support - fixes for our new time travel mode - various improvements to make lockdep and kasan work better - SPDX header updates * tag 'for-linus-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (25 commits) um: irq: Fix LAST_IRQ usage in init_IRQ() um: Add SPDX headers for files in arch/um/include um: Add SPDX headers for files in arch/um/os-Linux um: Add SPDX headers to files in arch/um/kernel/ um: Add SPDX headers for files in arch/um/drivers um: virtio: Implement VHOST_USER_PROTOCOL_F_REPLY_ACK um: virtio: Implement VHOST_USER_PROTOCOL_F_SLAVE_REQ um: drivers: Add virtio vhost-user driver um: Use real DMA barriers um: Don't use generic barrier.h um: time-travel: Restrict time update in IRQ handler um: time-travel: Fix periodic timers um: Enable CONFIG_CONSTRUCTORS um: Place (soft)irq text with macros um: Fix VDSO compiler warning um: Implement TRACE_IRQFLAGS_SUPPORT um: Remove misleading #define ARCh_IRQ_ENABLED um: Avoid using uninitialized regs um: Remove sig_info[SIGALRM] um: Error handling fixes in vector drivers ...