From 89c5b53b16bf577079d4f0311406dbea3c71202c Mon Sep 17 00:00:00 2001 From: "Luis R. Rodriguez" Date: Wed, 12 Jul 2017 14:33:27 -0700 Subject: sysctl: fix lax sysctl_check_table() sanity check Patch series "sysctl: few fixes", v5. I've been working on making kmod more deterministic, and as I did that I couldn't help but notice a few issues with sysctl. My end goal was just to fix unsigned int support, which back then was completely broken. Liping Zhang has sent up small atomic fixes, however it still missed yet one more fix and Alexey Dobriyan had also suggested to just drop array support given its complexity. I have inspected array support using Coccinelle and indeed its not that popular, so if in fact we can avoid it for new interfaces, I agree its best. I did develop a sysctl stress driver but will hold that off for another series. This patch (of 5): Commit 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks") improved sanity checks considerbly, however the enhancements on sysctl_check_table() meant adding a functional change so that only the last table entry's sanity error is propagated. It also changed the way errors were propagated so that each new check reset the err value, this means only last sanity check computed is used for an error. This has been in the kernel since v3.4 days. Fix this by carrying on errors from previous checks and iterations as we traverse the table and ensuring we keep any error from previous checks. We keep iterating on the table even if an error is found so we can complain for all errors found in one shot. This works as -EINVAL is always returned on error anyway, and the check for error is any non-zero value. Fixes: 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks") Link: http://lkml.kernel.org/r/20170519033554.18592-2-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez Cc: Al Viro Cc: "Eric W. Biederman" Cc: Alexey Dobriyan Cc: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/proc_sysctl.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'fs/proc') diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 67985a7233c2..32c9c5630507 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1066,7 +1066,7 @@ static int sysctl_check_table(const char *path, struct ctl_table *table) int err = 0; for (; table->procname; table++) { if (table->child) - err = sysctl_err(path, table, "Not a file"); + err |= sysctl_err(path, table, "Not a file"); if ((table->proc_handler == proc_dostring) || (table->proc_handler == proc_dointvec) || @@ -1078,15 +1078,15 @@ static int sysctl_check_table(const char *path, struct ctl_table *table) (table->proc_handler == proc_doulongvec_minmax) || (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) { if (!table->data) - err = sysctl_err(path, table, "No data"); + err |= sysctl_err(path, table, "No data"); if (!table->maxlen) - err = sysctl_err(path, table, "No maxlen"); + err |= sysctl_err(path, table, "No maxlen"); } if (!table->proc_handler) - err = sysctl_err(path, table, "No proc_handler"); + err |= sysctl_err(path, table, "No proc_handler"); if ((table->mode & (S_IRUGO|S_IWUGO)) != table->mode) - err = sysctl_err(path, table, "bogus .mode 0%o", + err |= sysctl_err(path, table, "bogus .mode 0%o", table->mode); } return err; -- cgit v1.2.3 From 4f2fec00afa60aa8e5d1b7f2a8e0526900f55623 Mon Sep 17 00:00:00 2001 From: "Luis R. Rodriguez" Date: Wed, 12 Jul 2017 14:33:36 -0700 Subject: sysctl: simplify unsigned int support Commit e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields") added proc_douintvec() to start help adding support for unsigned int, this however was only half the work needed. Two fixes have come in since then for the following issues: o Printing the values shows a negative value, this happens since do_proc_dointvec() and this uses proc_put_long() This was fixed by commit 5380e5644afbba9 ("sysctl: don't print negative flag for proc_douintvec"). o We can easily wrap around the int values: UINT_MAX is 4294967295, if we echo in 4294967295 + 1 we end up with 0, using 4294967295 + 2 we end up with 1. o We echo negative values in and they are accepted This was fixed by commit 425fffd886ba ("sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec"). It still also failed to be added to sysctl_check_table()... instead of adding it with the current implementation just provide a proper and simplified unsigned int support without any array unsigned int support with no negative support at all. Historically sysctl proc helpers have supported arrays, due to the complexity this adds though we've taken a step back to evaluate array users to determine if its worth upkeeping for unsigned int. An evaluation using Coccinelle has been done to perform a grammatical search to ask ourselves: o How many sysctl proc_dointvec() (int) users exist which likely should be moved over to proc_douintvec() (unsigned int) ? Answer: about 8 - Of these how many are array users ? Answer: Probably only 1 o How many sysctl array users exist ? Answer: about 12 This last question gives us an idea just how popular arrays: they are not. Array support should probably just be kept for strings. The identified uint ports are: drivers/infiniband/core/ucma.c - max_backlog drivers/infiniband/core/iwcm.c - default_backlog net/core/sysctl_net_core.c - rps_sock_flow_sysctl() net/netfilter/nf_conntrack_timestamp.c - nf_conntrack_timestamp -- bool net/netfilter/nf_conntrack_acct.c nf_conntrack_acct -- bool net/netfilter/nf_conntrack_ecache.c - nf_conntrack_events -- bool net/netfilter/nf_conntrack_helper.c - nf_conntrack_helper -- bool net/phonet/sysctl.c proc_local_port_range() The only possible array users is proc_local_port_range() but it does not seem worth it to add array support just for this given the range support works just as well. Unsigned int support should be desirable more for when you *need* more than INT_MAX or using int min/max support then does not suffice for your ranges. If you forget and by mistake happen to register an unsigned int proc entry with an array, the driver will fail and you will get something as follows: sysctl table check failed: debug/test_sysctl//uint_0002 array now allowed CPU: 2 PID: 1342 Comm: modprobe Tainted: G W E Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Call Trace: dump_stack+0x63/0x81 __register_sysctl_table+0x350/0x650 ? kmem_cache_alloc_trace+0x107/0x240 __register_sysctl_paths+0x1b3/0x1e0 ? 0xffffffffc005f000 register_sysctl_table+0x1f/0x30 test_sysctl_init+0x10/0x1000 [test_sysctl] do_one_initcall+0x52/0x1a0 ? kmem_cache_alloc_trace+0x107/0x240 do_init_module+0x5f/0x200 load_module+0x1867/0x1bd0 ? __symbol_put+0x60/0x60 SYSC_finit_module+0xdf/0x110 SyS_finit_module+0xe/0x10 entry_SYSCALL_64_fastpath+0x1e/0xad RIP: 0033:0x7f042b22d119 Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields") Link: http://lkml.kernel.org/r/20170519033554.18592-5-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez Suggested-by: Alexey Dobriyan Cc: Subash Abhinov Kasiviswanathan Cc: Liping Zhang Cc: Alexey Dobriyan Cc: Heinrich Schuchardt Cc: Kees Cook Cc: "David S. Miller" Cc: Ingo Molnar Cc: Al Viro Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/proc_sysctl.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'fs/proc') diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 32c9c5630507..ee6feba8b6c0 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1061,6 +1061,18 @@ static int sysctl_err(const char *path, struct ctl_table *table, char *fmt, ...) return -EINVAL; } +static int sysctl_check_table_array(const char *path, struct ctl_table *table) +{ + int err = 0; + + if (table->proc_handler == proc_douintvec) { + if (table->maxlen != sizeof(unsigned int)) + err |= sysctl_err(path, table, "array now allowed"); + } + + return err; +} + static int sysctl_check_table(const char *path, struct ctl_table *table) { int err = 0; @@ -1081,6 +1093,8 @@ static int sysctl_check_table(const char *path, struct ctl_table *table) err |= sysctl_err(path, table, "No data"); if (!table->maxlen) err |= sysctl_err(path, table, "No maxlen"); + else + err |= sysctl_check_table_array(path, table); } if (!table->proc_handler) err |= sysctl_err(path, table, "No proc_handler"); -- cgit v1.2.3 From 61d9b56a89208d8cccd0b4cfec7e6959717e16e3 Mon Sep 17 00:00:00 2001 From: "Luis R. Rodriguez" Date: Wed, 12 Jul 2017 14:33:40 -0700 Subject: sysctl: add unsigned int range support To keep parity with regular int interfaces provide the an unsigned int proc_douintvec_minmax() which allows you to specify a range of allowed valid numbers. Adding proc_douintvec_minmax_sysadmin() is easy but we can wait for an actual user for that. Link: http://lkml.kernel.org/r/20170519033554.18592-6-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez Acked-by: Kees Cook Cc: Subash Abhinov Kasiviswanathan Cc: Heinrich Schuchardt Cc: Kees Cook Cc: "David S. Miller" Cc: Ingo Molnar Cc: Al Viro Cc: "Eric W. Biederman" Cc: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/proc_sysctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fs/proc') diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index ee6feba8b6c0..8f9d564d0969 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1065,7 +1065,8 @@ static int sysctl_check_table_array(const char *path, struct ctl_table *table) { int err = 0; - if (table->proc_handler == proc_douintvec) { + if ((table->proc_handler == proc_douintvec) || + (table->proc_handler == proc_douintvec_minmax)) { if (table->maxlen != sizeof(unsigned int)) err |= sysctl_err(path, table, "array now allowed"); } @@ -1083,6 +1084,7 @@ static int sysctl_check_table(const char *path, struct ctl_table *table) if ((table->proc_handler == proc_dostring) || (table->proc_handler == proc_dointvec) || (table->proc_handler == proc_douintvec) || + (table->proc_handler == proc_douintvec_minmax) || (table->proc_handler == proc_dointvec_minmax) || (table->proc_handler == proc_dointvec_jiffies) || (table->proc_handler == proc_dointvec_userhz_jiffies) || -- cgit v1.2.3 From e41d58185f1444368873d4d7422f7664a68be61d Mon Sep 17 00:00:00 2001 From: Dmitry Vyukov Date: Wed, 12 Jul 2017 14:34:35 -0700 Subject: fault-inject: support systematic fault injection Add /proc/self/task//fail-nth file that allows failing 0-th, 1-st, 2-nd and so on calls systematically. Excerpt from the added documentation: "Write to this file of integer N makes N-th call in the current task fail (N is 0-based). Read from this file returns a single char 'Y' or 'N' that says if the fault setup with a previous write to this file was injected or not, and disables the fault if it wasn't yet injected. Note that this file enables all types of faults (slab, futex, etc). This setting takes precedence over all other generic settings like probability, interval, times, etc. But per-capability settings (e.g. fail_futex/ignore-private) take precedence over it. This feature is intended for systematic testing of faults in a single system call. See an example below" Why add a new setting: 1. Existing settings are global rather than per-task. So parallel testing is not possible. 2. attr->interval is close but it depends on attr->count which is non reset to 0, so interval does not work as expected. 3. Trying to model this with existing settings requires manipulations of all of probability, interval, times, space, task-filter and unexposed count and per-task make-it-fail files. 4. Existing settings are per-failure-type, and the set of failure types is potentially expanding. 5. make-it-fail can't be changed by unprivileged user and aggressive stress testing better be done from an unprivileged user. Similarly, this would require opening the debugfs files to the unprivileged user, as he would need to reopen at least times file (not possible to pre-open before dropping privs). The proposed interface solves all of the above (see the example). We want to integrate this into syzkaller fuzzer. A prototype has found 10 bugs in kernel in first day of usage: https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance I've made the current interface work with all types of our sandboxes. For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to make /proc entries non-root owned. So I am fine with the current version of the code. [akpm@linux-foundation.org: fix build] Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov Cc: Akinobu Mita Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/base.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) (limited to 'fs/proc') diff --git a/fs/proc/base.c b/fs/proc/base.c index f1e1927ccd48..88b773f318cd 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1355,6 +1355,53 @@ static const struct file_operations proc_fault_inject_operations = { .write = proc_fault_inject_write, .llseek = generic_file_llseek, }; + +static ssize_t proc_fail_nth_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + int err, n; + + task = get_proc_task(file_inode(file)); + if (!task) + return -ESRCH; + put_task_struct(task); + if (task != current) + return -EPERM; + err = kstrtoint_from_user(buf, count, 10, &n); + if (err) + return err; + if (n < 0 || n == INT_MAX) + return -EINVAL; + current->fail_nth = n + 1; + return count; +} + +static ssize_t proc_fail_nth_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct task_struct *task; + int err; + + task = get_proc_task(file_inode(file)); + if (!task) + return -ESRCH; + put_task_struct(task); + if (task != current) + return -EPERM; + if (count < 1) + return -EINVAL; + err = put_user((char)(current->fail_nth ? 'N' : 'Y'), buf); + if (err) + return err; + current->fail_nth = 0; + return 1; +} + +static const struct file_operations proc_fail_nth_operations = { + .read = proc_fail_nth_read, + .write = proc_fail_nth_write, +}; #endif @@ -3311,6 +3358,11 @@ static const struct pid_entry tid_base_stuff[] = { #endif #ifdef CONFIG_FAULT_INJECTION REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations), + /* + * Operations on the file check that the task is current, + * so we create it with 0666 to support testing under unprivileged user. + */ + REG("fail-nth", 0666, proc_fail_nth_operations), #endif #ifdef CONFIG_TASK_IO_ACCOUNTING ONE("io", S_IRUSR, proc_tid_io_accounting), -- cgit v1.2.3