summaryrefslogtreecommitdiffstats
path: root/kernel/sysctl_check.c
Commit message (Collapse)AuthorAgeFilesLines
* capabilities: introduce per-process capability bounding setSerge E. Hallyn2008-02-051-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The capability bounding set is a set beyond which capabilities cannot grow. Currently cap_bset is per-system. It can be manipulated through sysctl, but only init can add capabilities. Root can remove capabilities. By default it includes all caps except CAP_SETPCAP. This patch makes the bounding set per-process when file capabilities are enabled. It is inherited at fork from parent. Noone can add elements, CAP_SETPCAP is required to remove them. One example use of this is to start a safer container. For instance, until device namespaces or per-container device whitelists are introduced, it is best to take CAP_MKNOD away from a container. The bounding set will not affect pP and pE immediately. It will only affect pP' and pE' after subsequent exec()s. It also does not affect pI, and exec() does not constrain pI'. So to really start a shell with no way of regain CAP_MKNOD, you would do prctl(PR_CAPBSET_DROP, CAP_MKNOD); cap_t cap = cap_get_proc(); cap_value_t caparray[1]; caparray[0] = CAP_MKNOD; cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP); cap_set_proc(cap); cap_free(cap); The following test program will get and set the bounding set (but not pI). For instance ./bset get (lists capabilities in bset) ./bset drop cap_net_raw (starts shell with new bset) (use capset, setuid binary, or binary with file capabilities to try to increase caps) ************************************************************ cap_bound.c ************************************************************ #include <sys/prctl.h> #include <linux/capability.h> #include <sys/types.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #ifndef PR_CAPBSET_READ #define PR_CAPBSET_READ 23 #endif #ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif int usage(char *me) { printf("Usage: %s get\n", me); printf(" %s drop <capability>\n", me); return 1; } #define numcaps 32 char *captable[numcaps] = { "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner", "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast", "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner", "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace", "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice", "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod", "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap" }; int getbcap(void) { int comma=0; unsigned long i; int ret; printf("i know of %d capabilities\n", numcaps); printf("capability bounding set:"); for (i=0; i<numcaps; i++) { ret = prctl(PR_CAPBSET_READ, i); if (ret < 0) perror("prctl"); else if (ret==1) printf("%s%s", (comma++) ? ", " : " ", captable[i]); } printf("\n"); return 0; } int capdrop(char *str) { unsigned long i; int found=0; for (i=0; i<numcaps; i++) { if (strcmp(captable[i], str) == 0) { found=1; break; } } if (!found) return 1; if (prctl(PR_CAPBSET_DROP, i)) { perror("prctl"); return 1; } return 0; } int main(int argc, char *argv[]) { if (argc<2) return usage(argv[0]); if (strcmp(argv[1], "get")==0) return getbcap(); if (strcmp(argv[1], "drop")!=0 || argc<3) return usage(argv[0]); if (capdrop(argv[2])) { printf("unknown capability\n"); return 1; } return execl("/bin/bash", "/bin/bash", NULL); } ************************************************************ [serue@us.ibm.com: fix typo] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Casey Schaufler <casey@schaufler-ca.com>a Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com> Tested-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysctl: Infrastructure for per namespace sysctlsEric W. Biederman2008-01-281-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the basic infrastructure for per namespace sysctls. A list of lists of sysctl headers is added, allowing each namespace to have it's own list of sysctl headers. Each list of sysctl headers has a lookup function to find the first sysctl header in the list, allowing the lists to have a per namespace instance. register_sysct_root is added to tell sysctl.c about additional lists of sysctl_headers. As all of the users are expected to be in kernel no unregister function is provided. sysctl_head_next is updated to walk through the list of lists. __register_sysctl_paths is added to add a new sysctl table on a non-default sysctl list. The only intrusive part of this patch is propagating the information to decided which list of sysctls to use for sysctl_check_table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [S390] Remove appldata include from sysctl_check.cHeiko Carstens2008-01-261-1/+0
| | | | | | | Forgot to remove this when removing the appldata binary sysctls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* sysctl: fix ax25 checksEric W. Biederman2007-12-171-1/+6
| | | | | | | | | | | | | | | | | | | | Fix: sysctl table check failed: /net/ax25/ax0/ax25_default_mode .3.9.1.2 Unknown sysctl binary path Pid: 2936, comm: kissattach Not tainted 2.6.24-rc5 #1 [<c012ca6a>] set_fail+0x3b/0x43 [<c012ce7a>] sysctl_check_table+0x408/0x456 [<c012ce8e>] sysctl_check_table+0x41c/0x456 [<c012ce8e>] sysctl_check_table+0x41c/0x456 ... Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Bernard Pidoux <pidoux@ccr.jussieu.fr> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string.David S. Miller2007-12-051-1/+1
| | | | | | Based upon a report by Mikael Pettersson. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6Linus Torvalds2007-11-261-31/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (41 commits) [XFRM]: Fix leak of expired xfrm_states [ATM]: [he] initialize lock and tasklet earlier [IPV4]: Remove bogus ifdef mess in arp_process [SKBUFF]: Free old skb properly in skb_morph [IPV4]: Fix memory leak in inet_hashtables.h when NUMA is on [IPSEC]: Temporarily remove locks around copying of non-atomic fields [TCP] MTUprobe: Cleanup send queue check (no need to loop) [TCP]: MTUprobe: receiver window & data available checks fixed [MAINTAINERS]: tlan list is subscribers-only [SUNRPC]: Remove SPIN_LOCK_UNLOCKED [SUNRPC]: Make xprtsock.c:xs_setup_{udp,tcp}() static [PFKEY]: Sending an SADB_GET responds with an SADB_GET [IRDA]: Compilation for CONFIG_INET=n case [IPVS]: Fix compiler warning about unused register_ip_vs_protocol [ARP]: Fix arp reply when sender ip 0 [IPV6] TCPMD5: Fix deleting key operation. [IPV6] TCPMD5: Check return value of tcp_alloc_md5sig_pool(). [IPV4] TCPMD5: Use memmove() instead of memcpy() because we have overlaps. [IPV4] TCPMD5: Omit redundant NULL check for kfree() argument. ieee80211: Stop net_ratelimit/IEEE80211_DEBUG_DROP log pollution ...
| * [IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBEREDSimon Horman2007-11-191-25/+0
| | | | | | | | | | | | | | | | | | Switch the remaining IPVS sysctl entries over to to use CTL_UNNUMBERED, I stronly doubt that anyone is using the sys_sysctl interface to these variables. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [IPVS]: Fix sysctl warnings about missing strategy in schedulersSimon Horman2007-11-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy Switch these entried over to use CTL_UNNUMBERED as clearly the sys_syscal portion wasn't working. This is along the same lines as Christian Borntraeger's patch that fixes up entries with no stratergy in net/ipv4/ipvs/ip_vs_ctl.c Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [IPVS]: Fix sysctl warnings about missing strategyChristian Borntraeger2007-11-191-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Running the latest git code I get the following messages during boot: sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy I removed the binary sysctl handler for those messages and also removed the definitions in ip_vs.h. The alternative would be to implement a proper strategy handler, but syscall sysctl is deprecated. There are other sysctl definitions that are commented out or work with the default sysctl_data strategy. I did not touch these. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [S390] appldata: remove unused binary sysctls.Heiko Carstens2007-11-201-11/+0
| | | | | | | | | | | | | | | | | | | | Remove binary sysctls that never worked due to missing strategy functions. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | [S390] cmm: remove unused binary sysctls.Heiko Carstens2007-11-201-3/+0
|/ | | | | | | | | | Remove binary sysctls that never worked due to missing strategy functions. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* [SYSCTL]: Fix warning for token-ring from sysctl checkerOlof Johansson2007-11-131-1/+1
| | | | | | | | | As seen when booting ppc64_defconfig: sysctl table check failed: /net/token-ring .3.14 procname does not match binary path procname Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Dump stack during sysctl registration failureAlexey Dobriyan2007-11-051-0/+1
| | | | | | | | | | Let's make immediately obvious from where sysctl comes from and messages itself more noticeable. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix appletalk sysctl entry nameEric W. Biederman2007-10-221-1/+1
| | | | | | | | | | | | | | Gabriel C reported that modprobing appletalk on current git gives a warning in dmesg : "sysctl table check failed: /net/appletalk .3.7 procname does not match binary path procname" Oops. My apologies it appears I made a mistake when creating my table to check up on sysctl values. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Tested-by: Gabriel C <nix.or.die@googlemail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* V3 file capabilities: alter behavior of cap_setpcapAndrew Morgan2007-10-181-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1, can change the capabilities of another process, p2. This is not the meaning that was intended for this capability at all, and this implementation came about purely because, without filesystem capabilities, there was no way to use capabilities without one process bestowing them on another. Since we now have a filesystem support for capabilities we can fix the implementation of CAP_SETPCAP. The most significant thing about this change is that, with it in effect, no process can set the capabilities of another process. The capabilities of a program are set via the capability convolution rules: pI(post-exec) = pI(pre-exec) pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI) pE(post-exec) = fE ? pP(post-exec) : 0 at exec() time. As such, the only influence the pre-exec() program can have on the post-exec() program's capabilities are through the pI capability set. The correct implementation for CAP_SETPCAP (and that enabled by this patch) is that it can be used to add extra pI capabilities to the current process - to be picked up by subsequent exec()s when the above convolution rules are applied. Here is how it works: Let's say we have a process, p. It has capability sets, pE, pP and pI. Generally, p, can change the value of its own pI to pI' where (pI' & ~pI) & ~pP = 0. That is, the only new things in pI' that were not present in pI need to be present in pP. The role of CAP_SETPCAP is basically to permit changes to pI beyond the above: if (pE & CAP_SETPCAP) { pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */ } This capability is useful for things like login, which (say, via pam_cap) might want to raise certain inheritable capabilities for use by the children of the logged-in user's shell, but those capabilities are not useful to or needed by the login program itself. One such use might be to limit who can run ping. You set the capabilities of the 'ping' program to be "= cap_net_raw+i", and then only shells that have (pI & CAP_NET_RAW) will be able to run it. Without CAP_SETPCAP implemented as described above, login(pam_cap) would have to also have (pP & CAP_NET_RAW) in order to raise this capability and pass it on through the inheritable set. Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysctl: for irda update sysctl_checks list of binary pathsEric W. Biederman2007-10-181-0/+19
| | | | | | | | | | | | It turns out that the net/irda code didn't register any of it's binary paths in the global sysctl.h header file so I missed them completely when making an authoritative list of binary sysctl paths in the kernel. So add them to the list of valid binary sysctl paths. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysctl: update sysctl_check_tableEric W. Biederman2007-10-181-14/+22
| | | | | | | | | | | | | | | | | | | Well it turns out after I dug into the problems a little more I was returning a few false positives so this patch updates my logic to remove them. - Don't complain about 0 ctl_names in sysctl_check_binary_path It is valid for someone to remove the sysctl binary interface and still keep the same sysctl proc interface. - Count ctl_names and procnames as matching if they both don't exist. - Only warn about missing min&max when the generic functions care. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysctl: Error on bad sysctl tablesEric W. Biederman2007-10-181-0/+1556
After going through the kernels sysctl tables several times it has become clear that code review and testing is just not effective in prevent problematic sysctl tables from being used in the stable kernel. I certainly can't seem to fix the problems as fast as they are introduced. Therefore this patch adds sysctl_check_table which is called when a sysctl table is registered and checks to see if we have a problematic sysctl table. The biggest part of the code is the table of valid binary sysctl entries, but since we have frozen our set of binary sysctls this table should not need to change, and it makes it much easier to detect when someone unintentionally adds a new binary sysctl value. As best as I can determine all of the several hundred errors spewed on boot up now are legitimate. [bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>