summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* samples/seccomp: fix dependencies on arch macrosWill Drewry2012-04-192-11/+19
| | | | | | | | | | | | | | | | | | | | | | | This change fixes the compilation error triggered here for i386 allmodconfig in linux-next: http://kisskb.ellerman.id.au/kisskb/buildresult/6123842/ Logic attempting to predict the host architecture has been removed from the Makefile. Instead, the bpf-direct sample should now compile on any architecture, but if the architecture is not supported, it will compile a minimal main() function. This change also ensures the samples are not compiled when there is no seccomp filter support. (Note, I wasn't able to reproduce the error locally, but the existing approach was clearly flawed. This tweak should resolve your issue and avoid other future weirdness.) Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
* Yama: add additional ptrace scopesKees Cook2012-04-192-12/+60
| | | | | | | | | This expands the available Yama ptrace restrictions to include two more modes. Mode 2 requires CAP_SYS_PTRACE for PTRACE_ATTACH, and mode 3 completely disables PTRACE_ATTACH (and locks the sysctl). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTERWill Drewry2012-04-181-4/+9
| | | | | | | | | | | | | | | If both audit and seccomp filter support are disabled, 'ret' is marked as unused. If just seccomp filter support is disabled, data and skip are considered unused. This change fixes those build warnings. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: ignore secure_computing return valuesWill Drewry2012-04-188-7/+14
| | | | | | | | | | | | | | | | | | | | | | This change is inspired by https://lkml.org/lkml/2012/4/16/14 which fixes the build warnings for arches that don't support CONFIG_HAVE_ARCH_SECCOMP_FILTER. In particular, there is no requirement for the return value of secure_computing() to be checked unless the architecture supports seccomp filter. Instead of silencing the warnings with (void) a new static inline is added to encode the expected behavior in a compiler and human friendly way. v2: - cleans things up with a static inline - removes sfr's signed-off-by since it is a different approach v1: - matches sfr's original change Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: use a static inline for a function stubStephen Rothwell2012-04-171-1/+1
| | | | | | | | | | Fixes this error message when CONFIG_SECCOMP is not set: arch/powerpc/kernel/ptrace.c: In function 'do_syscall_trace_enter': arch/powerpc/kernel/ptrace.c:1713:2: error: statement with no effect [-Werror=unused-value] Signed-off-by: Stephen Rothwell <sfr@ozlabs.au.ibm.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
* Documentation: prctl/seccomp_filterWill Drewry2012-04-148-1/+875
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Documents how system call filtering using Berkeley Packet Filter programs works and how it may be used. Includes an example for x86 and a semi-generic example using a macro-based code generator. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> v18: - added acked by - update no new privs numbers v17: - remove @compat note and add Pitfalls section for arch checking (keescook@chromium.org) v16: - v15: - v14: - rebase/nochanges v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - comment on the ptrace_event use - update arch support comment - note the behavior of SECCOMP_RET_DATA when there are multiple filters (keescook@chromium.org) - lots of samples/ clean up incl 64-bit bpf-direct support (markus@chromium.org) - rebase to linux-next v11: - overhaul return value language, updates (keescook@chromium.org) - comment on do_exit(SIGSYS) v10: - update for SIGSYS - update for new seccomp_data layout - update for ptrace option use v9: - updated bpf-direct.c for SIGILL v8: - add PR_SET_NO_NEW_PRIVS to the samples. v7: - updated for all the new stuff in v7: TRAP, TRACE - only talk about PR_SET_SECCOMP now - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com) - adds dropper.c: a simple system call disabler v6: - tweak the language to note the requirement of PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu) v5: - update sample to use system call arguments - adds a "fancy" example using a macro-based generator - cleaned up bpf in the sample - update docs to mention arguments - fix prctl value (eparis@redhat.com) - language cleanup (rdunlap@xenotime.net) v4: - update for no_new_privs use - minor tweaks v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net) - document use of tentative always-unprivileged - guard sample compilation for i386 and x86_64 v2: - move code to samples (corbet@lwn.net) Signed-off-by: James Morris <james.l.morris@oracle.com>
* x86: Enable HAVE_ARCH_SECCOMP_FILTERWill Drewry2012-04-142-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable support for seccomp filter on x86: - syscall_get_arch() - syscall_get_arguments() - syscall_rollback() - syscall_set_return_value() - SIGSYS siginfo_t support - secure_computing is called from a ptrace_event()-safe context - secure_computing return value is checked (see below). SECCOMP_RET_TRACE and SECCOMP_RET_TRAP may result in seccomp needing to skip a system call without killing the process. This is done by returning a non-zero (-1) value from secure_computing. This change makes x86 respect that return value. To ensure that minimal kernel code is exposed, a non-zero return value results in an immediate return to user space (with an invalid syscall number). Signed-off-by: Will Drewry <wad@chromium.org> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> v18: rebase and tweaked change description, acked-by v17: added reviewed by and rebased v..: all rebases since original introduction. Signed-off-by: James Morris <james.l.morris@oracle.com>
* ptrace,seccomp: Add PTRACE_SECCOMP supportWill Drewry2012-04-144-6/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP, and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE. When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit SECCOMP_RET_DATA mask of the BPF program return value will be passed as the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG. If the subordinate process is not using seccomp filter, then no system call notifications will occur even if the option is specified. If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE is returned, the system call will not be executed and an -ENOSYS errno will be returned to userspace. This change adds a dependency on the system call slow path. Any future efforts to use the system call fast path for seccomp filter will need to address this restriction. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: - rebase - comment fatal_signal check - acked-by - drop secure_computing_int comment v17: - ... v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu) [note PT_TRACE_MASK disappears in linux-next] v15: - add audit support for non-zero return codes - clean up style (indan@nul.nu) v14: - rebase/nochanges v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc (Brings back a change to ptrace.c and the masks.) v12: - rebase to linux-next - use ptrace_event and update arch/Kconfig to mention slow-path dependency - drop all tracehook changes and inclusion (oleg@redhat.com) v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator (indan@nul.nu) v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP v9: - n/a v8: - guarded PTRACE_SECCOMP use with an ifdef v7: - introduced Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: Add SECCOMP_RET_TRAPWill Drewry2012-04-144-6/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a new return value to seccomp filters that triggers a SIGSYS to be delivered with the new SYS_SECCOMP si_code. This allows in-process system call emulation, including just specifying an errno or cleanly dumping core, rather than just dying. Suggested-by: Markus Gutschke <markus@chromium.org> Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: - acked-by, rebase - don't mention secure_computing_int() anymore v15: - use audit_seccomp/skip - pad out error spacing; clean up switch (indan@nul.nu) v14: - n/a v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - rebase on to linux-next v11: - clarify the comment (indan@nul.nu) - s/sigtrap/sigsys v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig note suggested-by (though original suggestion had other behaviors) v9: - changes to SIGILL v8: - clean up based on changes to dependent patches v7: - introduction Signed-off-by: James Morris <james.l.morris@oracle.com>
* signal, x86: add SIGSYS info and make it synchronous.Will Drewry2012-04-144-1/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change enables SIGSYS, defines _sigfields._sigsys, and adds x86 (compat) arch support. _sigsys defines fields which allow a signal handler to receive the triggering system call number, the relevant AUDIT_ARCH_* value for that number, and the address of the callsite. SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it to have setup_frame() called for it. The goal is to ensure that ucontext_t reflects the machine state from the time-of-syscall and not from another signal handler. The first consumer of SIGSYS would be seccomp filter. In particular, a filter program could specify a new return value, SECCOMP_RET_TRAP, which would result in the system call being denied and the calling thread signaled. This also means that implementing arch-specific support can be dependent upon HAVE_ARCH_SECCOMP_FILTER. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Eric Paris <eparis@redhat.com> v18: - added acked by, rebase v17: - rebase and reviewed-by addition v14: - rebase/nochanges v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - reworded changelog (oleg@redhat.com) v11: - fix dropped words in the change description - added fallback copy_siginfo support. - added __ARCH_SIGSYS define to allow stepped arch support. v10: - first version based on suggestion Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: add SECCOMP_RET_ERRNOWill Drewry2012-04-143-16/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds the SECCOMP_RET_ERRNO as a valid return value from a seccomp filter. Additionally, it makes the first use of the lower 16-bits for storing a filter-supplied errno. 16-bits is more than enough for the errno-base.h calls. Returning errors instead of immediately terminating processes that violate seccomp policy allow for broader use of this functionality for kernel attack surface reduction. For example, a linux container could maintain a whitelist of pre-existing system calls but drop all new ones with errnos. This would keep a logically static attack surface while providing errnos that may allow for graceful failure without the downside of do_exit() on a bad call. This change also changes the signature of __secure_computing. It appears the only direct caller is the arm entry code and it clobbers any possible return value (register) immediately. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: - fix up comments and rebase - fix bad var name which was fixed in later revs - remove _int() and just change the __secure_computing signature v16-v17: ... v15: - use audit_seccomp and add a skip label. (eparis@redhat.com) - clean up and pad out return codes (indan@nul.nu) v14: - no change/rebase v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - move to WARN_ON if filter is NULL (oleg@redhat.com, luto@mit.edu, keescook@chromium.org) - return immediately for filter==NULL (keescook@chromium.org) - change evaluation to only compare the ACTION so that layered errnos don't result in the lowest one being returned. (keeschook@chromium.org) v11: - check for NULL filter (keescook@chromium.org) v10: - change loaders to fn v9: - n/a v8: - update Kconfig to note new need for syscall_set_return_value. - reordered such that TRAP behavior follows on later. - made the for loop a little less indent-y v7: - introduced Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: remove duplicated failure loggingKees Cook2012-04-143-20/+11
| | | | | | | | | | | | | | | This consolidates the seccomp filter error logging path and adds more details to the audit log. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> v18: make compat= permanent in the record v15: added a return code to the audit_seccomp path by wad@chromium.org (suggested by eparis@redhat.com) v*: original by keescook@chromium.org Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: add system call filtering using BPFWill Drewry2012-04-146-23/+472
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [This patch depends on luto@mit.edu's no_new_privs patch: https://lkml.org/lkml/2012/1/30/264 The whole series including Andrew's patches can be found here: https://github.com/redpig/linux/tree/seccomp Complete diff here: https://github.com/redpig/linux/compare/1dc65fed...seccomp ] This patch adds support for seccomp mode 2. Mode 2 introduces the ability for unprivileged processes to install system call filtering policy expressed in terms of a Berkeley Packet Filter (BPF) program. This program will be evaluated in the kernel for each system call the task makes and computes a result based on data in the format of struct seccomp_data. A filter program may be installed by calling: struct sock_fprog fprog = { ... }; ... prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog); The return value of the filter program determines if the system call is allowed to proceed or denied. If the first filter program installed allows prctl(2) calls, then the above call may be made repeatedly by a task to further reduce its access to the kernel. All attached programs must be evaluated before a system call will be allowed to proceed. Filter programs will be inherited across fork/clone and execve. However, if the task attaching the filter is unprivileged (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task. This ensures that unprivileged tasks cannot attach filters that affect privileged tasks (e.g., setuid binary). There are a number of benefits to this approach. A few of which are as follows: - BPF has been exposed to userland for a long time - BPF optimization (and JIT'ing) are well understood - Userland already knows its ABI: system call numbers and desired arguments - No time-of-check-time-of-use vulnerable data accesses are possible. - system call arguments are loaded on access only to minimize copying required for system call policy decisions. Mode 2 support is restricted to architectures that enable HAVE_ARCH_SECCOMP_FILTER. In this patch, the primary dependency is on syscall_get_arguments(). The full desired scope of this feature will add a few minor additional requirements expressed later in this series. Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be the desired additional functionality. No architectures are enabled in this patch. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: Indan Zupancic <indan@nul.nu> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> v18: - rebase to v3.4-rc2 - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org) - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu) - add a comment for get_u32 regarding endianness (akpm@) - fix other typos, style mistakes (akpm@) - added acked-by v17: - properly guard seccomp filter needed headers (leann@ubuntu.com) - tighten return mask to 0x7fff0000 v16: - no change v15: - add a 4 instr penalty when counting a path to account for seccomp_filter size (indan@nul.nu) - drop the max insns to 256KB (indan@nul.nu) - return ENOMEM if the max insns limit has been hit (indan@nul.nu) - move IP checks after args (indan@nul.nu) - drop !user_filter check (indan@nul.nu) - only allow explicit bpf codes (indan@nul.nu) - exit_code -> exit_sig v14: - put/get_seccomp_filter takes struct task_struct (indan@nul.nu,keescook@chromium.org) - adds seccomp_chk_filter and drops general bpf_run/chk_filter user - add seccomp_bpf_load for use by net/core/filter.c - lower max per-process/per-hierarchy: 1MB - moved nnp/capability check prior to allocation (all of the above: indan@nul.nu) v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com) - removed copy_seccomp (keescook@chromium.org,indan@nul.nu) - reworded the prctl_set_seccomp comment (indan@nul.nu) v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com) - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu) - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu) - pare down Kconfig doc reference. - extra comment clean up v10: - seccomp_data has changed again to be more aesthetically pleasing (hpa@zytor.com) - calling convention is noted in a new u32 field using syscall_get_arch. This allows for cross-calling convention tasks to use seccomp filters. (hpa@zytor.com) - lots of clean up (thanks, Indan!) v9: - n/a v8: - use bpf_chk_filter, bpf_run_filter. update load_fns - Lots of fixes courtesy of indan@nul.nu: -- fix up load behavior, compat fixups, and merge alloc code, -- renamed pc and dropped __packed, use bool compat. -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch dependencies v7: (massive overhaul thanks to Indan, others) - added CONFIG_HAVE_ARCH_SECCOMP_FILTER - merged into seccomp.c - minimal seccomp_filter.h - no config option (part of seccomp) - no new prctl - doesn't break seccomp on systems without asm/syscall.h (works but arg access always fails) - dropped seccomp_init_task, extra free functions, ... - dropped the no-asm/syscall.h code paths - merges with network sk_run_filter and sk_chk_filter v6: - fix memory leak on attach compat check failure - require no_new_privs || CAP_SYS_ADMIN prior to filter installation. (luto@mit.edu) - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com) - cleaned up Kconfig (amwang@redhat.com) - on block, note if the call was compat (so the # means something) v5: - uses syscall_get_arguments (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org) - uses union-based arg storage with hi/lo struct to handle endianness. Compromises between the two alternate proposals to minimize extra arg shuffling and account for endianness assuming userspace uses offsetof(). (mcgrathr@chromium.org, indan@nul.nu) - update Kconfig description - add include/seccomp_filter.h and add its installation - (naive) on-demand syscall argument loading - drop seccomp_t (eparis@redhat.com) v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS - now uses current->no_new_privs (luto@mit.edu,torvalds@linux-foundation.com) - assign names to seccomp modes (rdunlap@xenotime.net) - fix style issues (rdunlap@xenotime.net) - reworded Kconfig entry (rdunlap@xenotime.net) v3: - macros to inline (oleg@redhat.com) - init_task behavior fixed (oleg@redhat.com) - drop creator entry and extra NULL check (oleg@redhat.com) - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com) - adds tentative use of "always_unprivileged" as per torvalds@linux-foundation.org and luto@mit.edu v2: - (patch 2 only) Signed-off-by: James Morris <james.l.morris@oracle.com>
* arch/x86: add syscall_get_arch to syscall.hWill Drewry2012-04-141-0/+27
| | | | | | | | | | | | | | | | | | Add syscall_get_arch() to export the current AUDIT_ARCH_* based on system call entry path. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> v18: - update comment about x32 tasks - rebase to v3.4-rc2 v17: rebase and reviewed-by v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc Signed-off-by: James Morris <james.l.morris@oracle.com>
* asm/syscall.h: add syscall_get_archWill Drewry2012-04-141-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | Adds a stub for a function that will return the AUDIT_ARCH_* value appropriate to the supplied task based on the system call convention. For audit's use, the value can generally be hard-coded at the audit-site. However, for other functionality not inlined into syscall entry/exit, this makes that information available. seccomp_filter is the first planned consumer and, as such, the comment indicates a tie to CONFIG_HAVE_ARCH_SECCOMP_FILTER. Suggested-by: Roland McGrath <mcgrathr@chromium.org> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Paris <eparis@redhat.com> v18: comment and change reword and rebase. v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: rebase on to linux-next v11: fixed improper return type v10: introduced Signed-off-by: James Morris <james.l.morris@oracle.com>
* seccomp: kill the seccomp_t typedefWill Drewry2012-04-142-5/+7
| | | | | | | | | | | | | | | | | | | | Replaces the seccomp_t typedef with struct seccomp to match modern kernel style. Signed-off-by: Will Drewry <wad@chromium.org> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Paris <eparis@redhat.com> v18: rebase ... v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: rebase on to linux-next v8-v11: no changes v7: struct seccomp_struct -> struct seccomp v6: original inclusion in this series. Signed-off-by: James Morris <james.l.morris@oracle.com>
* net/compat.c,linux/filter.h: share compat_sock_fprogWill Drewry2012-04-142-8/+11
| | | | | | | | | | | | | | | | | | | Any other users of bpf_*_filter that take a struct sock_fprog from userspace will need to be able to also accept a compat_sock_fprog if the arch supports compat calls. This change allows the existing compat_sock_fprog be shared. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Paris <eparis@redhat.com> v18: tasered by the apostrophe police v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: rebase on to linux-next v11: introduction Signed-off-by: James Morris <james.l.morris@oracle.com>
* sk_run_filter: add BPF_S_ANC_SECCOMP_LD_WWill Drewry2012-04-142-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | Introduces a new BPF ancillary instruction that all LD calls will be mapped through when skb_run_filter() is being used for seccomp BPF. The rewriting will be done using a secondary chk_filter function that is run after skb_chk_filter. The code change is guarded by CONFIG_SECCOMP_FILTER which is added, along with the seccomp_bpf_load() function later in this series. This is based on http://lkml.org/lkml/2012/3/2/141 Suggested-by: Indan Zupancic <indan@nul.nu> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Paris <eparis@redhat.com> v18: rebase ... v15: include seccomp.h explicitly for when seccomp_bpf_load exists. v14: First cut using a single additional instruction ... v13: made bpf functions generic. Signed-off-by: James Morris <james.l.morris@oracle.com>
* Fix execve behavior apparmor for PR_{GET,SET}_NO_NEW_PRIVSJohn Johansen2012-04-141-4/+35
| | | | | | | | | | | | | | | | Add support for AppArmor to explicitly fail requested domain transitions if NO_NEW_PRIVS is set and the task is not unconfined. Transitions from unconfined are still allowed because this always results in a reduction of privileges. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> v18: new acked-by, new description Signed-off-by: James Morris <james.l.morris@oracle.com>
* Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privsAndy Lutomirski2012-04-148-4/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this change, calling prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) disables privilege granting operations at execve-time. For example, a process will not be able to execute a setuid binary to change their uid or gid if this bit is set. The same is true for file capabilities. Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that LSMs respect the requested behavior. To determine if the NO_NEW_PRIVS bit is set, a task may call prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0); It returns 1 if set and 0 if it is not set. If any of the arguments are non-zero, it will return -1 and set errno to -EINVAL. (PR_SET_NO_NEW_PRIVS behaves similarly.) This functionality is desired for the proposed seccomp filter patch series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the system call behavior for itself and its child tasks without being able to impact the behavior of a more privileged task. Another potential use is making certain privileged operations unprivileged. For example, chroot may be considered "safe" if it cannot affect privileged tasks. Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is set and AppArmor is in use. It is fixed in a subsequent patch. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Eric Paris <eparis@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> v18: updated change desc v17: using new define values as per 3.4 Signed-off-by: James Morris <james.l.morris@oracle.com>
* maintainers: update wiki url for the security subsystemJames Morris2012-04-091-1/+1
| | | | | | | | Update the wiki url for the security subsystem to: http://kernsec.org/ Signed-off-by: James Morris <james.l.morris@oracle.com>
* maintainers: add kernel/capability.c to capabilities entryJames Morris2012-04-091-0/+1
| | | | | | | Add kernel/capability.c to capabilities entry. Reported-by: Eric Paris <eparis@parisplace.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
* Merge branch 'linus-master'; commit 'v3.4-rc2' into nextJames Morris2012-04-090-0/+0
|
* Linux 3.4-rc2v3.4-rc2Linus Torvalds2012-04-071-1/+1
|
* Merge tag 'regmap-3.4-fixes' of ↵Linus Torvalds2012-04-072-1/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull two more small regmap fixes from Mark Brown: - Now we have users for it that aren't running Android it turns out that regcache_sync_region() is much more useful to drivers if it's exported for use by modules. Who knew? - Make sure we don't divide by zero when doing debugfs dumps of rbtrees, not visible up until now because everything was providing at least some cache on startup. * tag 'regmap-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: prevent division by zero in rbtree_show regmap: Export regcache_sync_region()
| * regmap: prevent division by zero in rbtree_showStephen Warren2012-04-041-1/+7
| | | | | | | | | | | | | | | | | | If there are no nodes in the cache, nodes will be 0, so calculating "registers / nodes" will cause division by zero. Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: stable@vger.kernel.org
| * regmap: Export regcache_sync_region()Mark Brown2012-04-031-0/+1
| | | | | | | | | | | | | | regcache_sync_region() isn't going to be useful to most drivers if we don't export it since otherwise they can't use it when built modular. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
* | Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2012-04-078-12/+36
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull a few KVM fixes from Avi Kivity: "A bunch of powerpc KVM fixes, a guest and a host RCU fix (unrelated), and a small build fix." * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Resolve RCU vs. async page fault problem KVM: VMX: vmx_set_cr0 expects kvm->srcu locked KVM: PMU: Fix integer constant is too large warning in kvm_pmu_set_msr() KVM: PPC: Book3S: PR: Fix preemption KVM: PPC: Save/Restore CR over vcpu_run KVM: PPC: Book3S HV: Save and restore CR in __kvmppc_vcore_entry KVM: PPC: Book3S HV: Fix kvm_alloc_linear in case where no linears exist KVM: PPC: Book3S: Compile fix for ppc32 in HIOR access code
| * | KVM: Resolve RCU vs. async page fault problemGleb Natapov2012-04-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Page ready" async PF can kick vcpu out of idle state much like IRQ. We need to tell RCU about this. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | Merge tag 'powerpc-fixes' of git://github.com/paulusmack/linux into new/masterAvi Kivity2012-04-055-11/+29
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Five fixes for bugs that have crept in to the powerpc KVM implementations. These are all small simple patches that only affect arch/powerpc/kvm. They come from the series that Alex Graf put together but which was too late for the 3.4 merge window. * tag 'powerpc-fixes' of git://github.com/paulusmack/linux: KVM: PPC: Book3S: PR: Fix preemption KVM: PPC: Save/Restore CR over vcpu_run KVM: PPC: Book3S HV: Save and restore CR in __kvmppc_vcore_entry KVM: PPC: Book3S HV: Fix kvm_alloc_linear in case where no linears exist KVM: PPC: Book3S: Compile fix for ppc32 in HIOR access code Signed-off-by: Avi Kivity <avi@redhat.com>
| | * | KVM: PPC: Book3S: PR: Fix preemptionAlexander Graf2012-04-031-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were leaking preemption counters. Fix the code to always toggle between preempt and non-preempt properly. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
| | * | KVM: PPC: Save/Restore CR over vcpu_runAlexander Graf2012-04-032-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls. We didn't respect that for any architecture until Paul spotted it in his patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
| | * | KVM: PPC: Book3S HV: Save and restore CR in __kvmppc_vcore_entryPaul Mackerras2012-04-031-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ABI specifies that CR fields CR2--CR4 are nonvolatile across function calls. Currently __kvmppc_vcore_entry doesn't save and restore the CR, leading to CR2--CR4 getting corrupted with guest values, possibly leading to incorrect behaviour in its caller. This adds instructions to save and restore CR at the points where we save and restore the nonvolatile GPRs. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
| | * | KVM: PPC: Book3S HV: Fix kvm_alloc_linear in case where no linears existPaul Mackerras2012-04-031-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In kvm_alloc_linear we were using and deferencing ri after the list_for_each_entry had come to the end of the list. In that situation, ri is not really defined and probably points to the list head. This will happen every time if the free_linears list is empty, for instance. This led to a NULL pointer dereference crash in memset on POWER7 while trying to allocate an HPT in the case where no HPTs were preallocated. This fixes it by using a separate variable for the return value from the loop iterator. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
| | * | KVM: PPC: Book3S: Compile fix for ppc32 in HIOR access codeAlexander Graf2012-04-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were failing to compile on book3s_32 with the following errors: arch/powerpc/kvm/book3s_pr.c:883:45: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] arch/powerpc/kvm/book3s_pr.c:898:79: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] Fix this by explicity casting the u64 to long before we use it as a pointer. Also, on PPC32 we can not use get_user/put_user for 64bit wide variables, as there is no single instruction that could load or store variables that big. So instead, we have to use copy_from/to_user which works everywhere. Reported-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | KVM: VMX: vmx_set_cr0 expects kvm->srcu lockedMarcelo Tosatti2012-04-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmx_set_cr0 is called from vcpu run context, therefore it expects kvm->srcu to be held (for setting up the real-mode TSS). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | KVM: PMU: Fix integer constant is too large warning in kvm_pmu_set_msr()Sasikantha babu2012-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-shLinus Torvalds2012-04-0720-105/+147
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SuperH fixes from Paul Mundt. * tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: sh: fix clock-sh7757 for the latest sh_mobile_sdhi driver serial: sh-sci: use serial_port_in/out vs sci_in/out. sh: vsyscall: Fix up .eh_frame generation. sh: dma: Fix up device attribute mismatch from sysdev fallout. sh: dwarf unwinder depends on SHcompact. sh: fix up fallout from system.h disintegration.
| * | | | sh: fix clock-sh7757 for the latest sh_mobile_sdhi driverShimoda, Yoshihiro2012-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 996bc8aebd2cd5b6d4c5d85085f171fa2447f364 (mmc: sh_mobile_sdhi: do not manage PM clocks manually) modified the sh_mobile_sdhi driver to remove the clk_enable/clk_disable. So, we need to change the "CLKDEV_CON_ID" to "CLKDEV_DEV_ID". If we don't change this, we will see the following error from the driver: sh_mobile_sdhi sh_mobile_sdhi.0: timeout waiting for hardware interrupt (CMD52) Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| | | | |
| | \ \ \
| *-. \ \ \ Merge branches 'sh/urgent', 'sh/vsyscall' and 'common/serial-rework' into ↵Paul Mundt2012-03-304-102/+131
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | sh-latest
| | | * | | | serial: sh-sci: use serial_port_in/out vs sci_in/out.Paul Mundt2012-03-302-89/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follows the 8250 change for pretty much the same rationale. See commit "serial: use serial_port_in/out vs serial_in/out in 8250". Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| | * | | | | sh: vsyscall: Fix up .eh_frame generation.Paul Mundt2012-03-302-13/+45
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some improper formatting caused the .eh_frame generation to fail, resulting in gcc/g++ testsuite failures with regards to unwinding through the vDSO. Now that someone is actually working on this on the gcc side it's time to fix up the kernel side, too. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | sh: dma: Fix up device attribute mismatch from sysdev fallout.Paul Mundt2012-03-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes up an attribute mismatch that was introduced in the sysdev->struct device migration. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | sh: dwarf unwinder depends on SHcompact.Paul Mundt2012-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Presently there's no SHmedia support plugged in for the dwarf unwinder. While it's trivial to provide an SHmedia version of dwarf_read_arch_reg(), the general sh64 case is more complicated in that the TLB miss handler uses a locked down set of registers for optimization (including the frame pointer) which we need for the unwind table generation. While freeing up the frame pointer for use in the TLB miss handler is reasonably straightforward, it's still more trouble than it's worth, so we simply restrict the unwinder to 32-bit for now. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | sh: fix up fallout from system.h disintegration.Paul Mundt2012-03-3013-1/+13
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | Quite a bit of fallout all over the place, nothing terribly exciting. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-04-071-4/+4
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer fixlet from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: sysctl: fix write access to dmesg_restrict/kptr_restrict
| * | | | | sysctl: fix write access to dmesg_restrict/kptr_restrictKees Cook2012-04-051-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bfdc0b4 adds code to restrict access to dmesg_restrict, however, it incorrectly alters kptr_restrict rather than dmesg_restrict. The original patch from Richard Weinberger (https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as expected, and so the patch seems to have been misapplied. This adds the CAP_SYS_ADMIN check to both dmesg_restrict and kptr_restrict, since both are sensitive. Reported-by: Phillip Lougher <plougher@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Richard Weinberger <richard@nod.at> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
* | | | | | Merge branch 'release' of ↵Linus Torvalds2012-04-063-3/+6
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull ACPI & Power Management patches from Len Brown: "Two fixes for cpuidle merge-window changes, plus a URL fix in MAINTAINERS" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: MAINTAINERS: Update git url for ACPI cpuidle: Fix panic in CPU off-lining with no idle driver ACPI processor: Use safe_halt() rather than halt() in acpi_idle_play_dead()
| * \ \ \ \ \ Merge branches 'idle-fix' and 'misc' into releaseLen Brown2012-04-0610406-323273/+532118
| |\ \ \ \ \ \
| | * | | | | | MAINTAINERS: Update git url for ACPIIgor Murzov2012-04-061-1/+1
| | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Igor Murzov <e-mail@date.by> Signed-off-by: Len Brown <len.brown@intel.com>