summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | | | | | | xfs: prevent reading uninitialized stack memoryDan Rosenberg2010-09-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* | | | | | | | | | | | | | x86, tsc: Fix a preemption leak in restore_sched_clock_state()Peter Zijlstra2010-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A real life genuine preemption leak.. Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | | | execve: make responsive to SIGKILL with large argumentsRoland McGrath2010-09-101-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | | | execve: improve interactivity with large argumentsRoland McGrath2010-09-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | | | setup_arg_pages: diagnose excessive argument sizeRoland McGrath2010-09-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | | | Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2010-09-105-10/+16
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Perform hardware_enable in CPU_STARTING callback KVM: i8259: fix migration KVM: fix i8259 oops when no vcpus are online KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
| * | | | | | | | | | | | | | KVM: x86: Perform hardware_enable in CPU_STARTING callbackZachary Amsden2010-09-091-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CPU_STARTING callback was added upstream with the intention of being used for KVM, specifically for the hardware enablement that must be done before we can run in hardware virt. It had bugs on the x86_64 architecture at the time, where it was called after CPU_ONLINE. The arches have since merged and the bug is gone. It might be noted other features should probably start making use of this callback; microcode updates in particular which might be fixing important erratums would be best applied before beginning to run user tasks. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | | | | | | | | | | | KVM: i8259: fix migrationGleb Natapov2010-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Top of kvm_kpic_state structure should have the same memory layout as kvm_pic_state since it is copied by memcpy. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | | | | | | | | | | | KVM: fix i8259 oops when no vcpus are onlineAvi Kivity2010-09-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are no vcpus, found will be NULL. Check before doing anything with it. Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | | | | | | | | | | | KVM: x86 emulator: fix regression with cmpxchg8b on i386 hostsAvi Kivity2010-09-082-6/+10
| | |_|/ / / / / / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b operands are 64-bit. Fix by adding val64 and orig_val64 union members to struct operand, and using them where needed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | | | | | | | | | | | | | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-09-108-36/+73
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread perf symbols: Fix multiple initialization of symbol system perf: Fix CPU hotplug perf, trace: Fix module leak tracing/kprobe: Fix handling of C-unlike argument names tracing/kprobes: Fix handling of argument names perf probe: Fix handling of arguments names perf probe: Fix return probe support tracing/kprobe: Fix a memory leak in error case tracing: Do not allow llseek to set_ftrace_filter
| * \ \ \ \ \ \ \ \ \ \ \ \ \ Merge branch 'tip/perf/urgent' of ↵Ingo Molnar2010-09-101-0/+2
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
| | * | | | | | | | | | | | | | tracing: t_start: reset FTRACE_ITER_HASH in case of seek/preadChris Wright2010-09-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without having properly started the iterator to iterate the hash. This case is degenerate and, as discovered by Robert Swiecki, can cause t_hash_show() to misuse a pointer. This causes a NULL ptr deref with possible security implications. Tracked as CVE-2010-3079. Cc: Robert Swiecki <swiecki@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | | | | | | | | Merge branch 'perf/urgent' of ↵Ingo Molnar2010-09-102-1/+9
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | | |/ / / / / / / / / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent
| | * | | | | | | | | | | | | | perf symbols: Fix multiple initialization of symbol systemJovi Zhang2010-09-092-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By returning immediately if it was already initialized, do it as well at symbol__exit, refusing multiple deinitializations. This fixes problems in the kmem, sched and timechart commands. Reported-by: Davidlohr Bueso <dave@gnu.org> Cc: Davidlohr Bueso <dave@gnu.org> Signed-off-by: Jovi Zhang <bookjovi@gmail.com> LKML-Reference: AANLkTi=9Cn=R8SPMCRp5z+gEjXbaBHeb-AaOtRbuwwcn@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | | | | | | | perf: Fix CPU hotplugPeter Zijlstra2010-09-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we have UP_PREPARE, we should also have UP_CANCELED. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | | | | | | | | | perf, trace: Fix module leakLi Zefan2010-09-091-0/+3
| |/ / / / / / / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1c024eca (perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events) caused a module refcount leak. Reported-And-Tested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4C7E1F12.8030304@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | | | | | | | | tracing/kprobe: Fix handling of C-unlike argument namesMasami Hiramatsu2010-09-081-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check the argument name whether it is invalid (not C-like symbol name). This makes event format simple. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113912.22882.62313.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | | | | | | tracing/kprobes: Fix handling of argument namesMasami Hiramatsu2010-09-081-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set "argN" name for each argument automatically if it has no specified name. Since dynamic trace event(kprobe_events) accepts special characters for its argument, its format can show those special characters (e.g. '$', '%', '+'). However, perf can't parse those format because of the character (especially '%') mess up the format. This sets "argX" name for those arguments if user omitted the argument names. E.g. # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events # cat tracing/kprobe_events p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113906.22882.59312.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | | | | | | perf probe: Fix handling of arguments namesMasami Hiramatsu2010-09-081-13/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't make argument names from raw parameters (means the parameters are written in kprobe-tracer syntax), because the argument syntax may include special characters. Just leave it, then kprobe-tracer gives a new name. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100827113859.22882.75598.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | | | | | | perf probe: Fix return probe supportMasami Hiramatsu2010-09-082-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug to support %return probe syntax again. Previous commit 4235b04 has a bug which disables the %return syntax on perf probe. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100827113852.22882.87447.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | | | | | | tracing/kprobe: Fix a memory leak in error caseMasami Hiramatsu2010-09-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a memory leak which happens when a field name conflicts with others. In error case, free_trace_probe() will free all arguments until nr_args, so this increments nr_args the begining of the loop instead of the end. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113846.22882.12670.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | | | | | | | tracing: Do not allow llseek to set_ftrace_filterSteven Rostedt2010-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reading the file set_ftrace_filter does three things. 1) shows whether or not filters are set for the function tracer 2) shows what functions are set for the function tracer 3) shows what triggers are set on any functions 3 is independent from 1 and 2. The way this file currently works is that it is a state machine, and as you read it, it may change state. But this assumption breaks when you use lseek() on the file. The state machine gets out of sync and the t_show() may use the wrong pointer and cause a kernel oops. Luckily, this will only kill the app that does the lseek, but the app dies while holding a mutex. This prevents anyone else from using the set_ftrace_filter file (or any other function tracing file for that matter). A real fix for this is to rewrite the code, but that is too much for a -rc release or stable. This patch simply disables llseek on the set_ftrace_filter() file for now, and we can do the proper fix for the next major release. Reported-by: Robert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Cc: <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | | | | | | | | | | | KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyringDavid Howells2010-09-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership of the parent process's session keyring whether or not the parent has a session keyring [CVE-2010-2960]. This results in the following oops: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443 ... Call Trace: [<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443 [<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0 [<ffffffff811af98c>] sys_keyctl+0xb4/0xb8 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b if the parent process has no session keyring. If the system is using pam_keyinit then it mostly protected against this as all processes derived from a login will have inherited the session keyring created by pam_keyinit during the log in procedure. To test this, pam_keyinit calls need to be commented out in /etc/pam.d/. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | | | | KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()David Howells2010-09-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's an protected access to the parent process's credentials in the middle of keyctl_session_to_parent(). This results in the following RCU warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by keyctl-session-/2137: #0: (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236 stack backtrace: Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1 Call Trace: [<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236 [<ffffffff811af77e>] sys_keyctl+0xb4/0xb6 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b The code should take the RCU read lock to make sure the parents credentials don't go away, even though it's holding a spinlock and has IRQ disabled. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2010-09-1017-46/+238
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Range check cpu in blk_cpu_to_group scatterlist: prevent invalid free when alloc fails writeback: Fix lost wake-up shutting down writeback thread writeback: do not lose wakeup events when forking bdi threads cciss: fix reporting of max queue depth since init block: switch s390 tape_block and mg_disk to elevator_change() block: add function call to switch the IO scheduler from a driver fs/bio-integrity.c: return -ENOMEM on kmalloc failure bio-integrity.c: remove dependency on __GFP_NOFAIL BLOCK: fix bio.bi_rw handling block: put dev->kobj in blk_register_queue fail path cciss: handle allocation failure cfq-iosched: Documentation help for new tunables cfq-iosched: blktrace print per slice sector stats cfq-iosched: Implement tunable group_idle cfq-iosched: Do group share accounting in IOPS when slice_idle=0 cfq-iosched: Do not idle if slice_idle=0 cciss: disable doorbell reset on reset_devices blkio: Fix return code for mkdir calls
| * | | | | | | | | | | | | | | block: Range check cpu in blk_cpu_to_groupBrian King2010-09-101-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing CPU DLPAR, the following problem was discovered. We were DLPAR removing the first CPU, which in this case was logical CPUs 0-3. CPUs 0-2 were already marked offline and we were in the process of offlining CPU 3. After marking the CPU inactive and offline in cpu_disable, but before the cpu was completely idle (cpu_die), we ended up in __make_request on CPU 3. There we looked at the topology map to see which CPU to complete the I/O on and found no CPUs in the cpu_sibling_map. This resulted in the block layer setting the completion cpu to be NR_CPUS, which then caused an oops when we tried to complete the I/O. Fix this by sanity checking the value we return from blk_cpu_to_group to be a valid cpu value. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | scatterlist: prevent invalid free when alloc failsJeffrey Carlyle2010-08-301-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When alloc fails, free_table is being called. Depending on the number of bytes requested, we determine if we are going to call _get_free_page() or kmalloc(). When alloc fails, our math is wrong (due to sg_size - 1), and the last buffer is wrongfully assumed to have been allocated by kmalloc. Hence, kfree gets called and a panic occurs. Signed-off-by: Jeffrey Carlyle <jeff.carlyle@motorola.com> Signed-off-by: Olusanya Soyannwo <c23746@motorola.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | writeback: Fix lost wake-up shutting down writeback threadJ. Bruce Fields2010-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting the task state here may cause us to miss the wake up from kthread_stop(), so we need to recheck kthread_should_stop() or risk sleeping forever in the following schedule(). Symptom was an indefinite hang on an NFSv4 mount. (NFSv4 may create multiple mounts in a temporary namespace while traversing the mount path, and since the temporary namespace is immediately destroyed, it may end up destroying a mount very soon after it was created, possibly making this race more likely.) INFO: task mount.nfs4:4314 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.nfs4 D 0000000000000000 2880 4314 4313 0x00000000 ffff88001ed6da28 0000000000000046 ffff88001ed6dfd8 ffff88001ed6dfd8 ffff88001ed6c000 ffff88001ed6c000 ffff88001ed6c000 ffff88001e5003a0 ffff88001ed6dfd8 ffff88001e5003a8 ffff88001ed6c000 ffff88001ed6dfd8 Call Trace: [<ffffffff8196090d>] schedule_timeout+0x1cd/0x2e0 [<ffffffff8106a31c>] ? mark_held_locks+0x6c/0xa0 [<ffffffff819639a0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff8106a5fd>] ? trace_hardirqs_on_caller+0x14d/0x190 [<ffffffff819671fe>] ? sub_preempt_count+0xe/0xd0 [<ffffffff8195fc80>] wait_for_common+0x120/0x190 [<ffffffff81033c70>] ? default_wake_function+0x0/0x20 [<ffffffff8195fdcd>] wait_for_completion+0x1d/0x20 [<ffffffff810595fa>] kthread_stop+0x4a/0x150 [<ffffffff81061a60>] ? thaw_process+0x70/0x80 [<ffffffff810cc68a>] bdi_unregister+0x10a/0x1a0 [<ffffffff81229dc9>] nfs_put_super+0x19/0x20 [<ffffffff810ee8c4>] generic_shutdown_super+0x54/0xe0 [<ffffffff810ee9b6>] kill_anon_super+0x16/0x60 [<ffffffff8122d3b9>] nfs4_kill_super+0x39/0x90 [<ffffffff810eda45>] deactivate_locked_super+0x45/0x60 [<ffffffff810edfb9>] deactivate_super+0x49/0x70 [<ffffffff81108294>] mntput_no_expire+0x84/0xe0 [<ffffffff811084ef>] release_mounts+0x9f/0xc0 [<ffffffff81108575>] put_mnt_ns+0x65/0x80 [<ffffffff8122cc56>] nfs_follow_remote_path+0x1e6/0x420 [<ffffffff8122cfbf>] nfs4_try_mount+0x6f/0xd0 [<ffffffff8122d0c2>] nfs4_get_sb+0xa2/0x360 [<ffffffff810edcb8>] vfs_kern_mount+0x88/0x1f0 [<ffffffff810ede92>] do_kern_mount+0x52/0x130 [<ffffffff81963d9a>] ? _lock_kernel+0x6a/0x170 [<ffffffff81108e9e>] do_mount+0x26e/0x7f0 [<ffffffff81106b3a>] ? copy_mount_options+0xea/0x190 [<ffffffff811094b8>] sys_mount+0x98/0xf0 [<ffffffff810024d8>] system_call_fastpath+0x16/0x1b 1 lock held by mount.nfs4/4314: #0: (&type->s_umount_key#24){+.+...}, at: [<ffffffff810edfb1>] deactivate_super+0x41/0x70 Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | | | | | | | | | | | | | | writeback: do not lose wakeup events when forking bdi threadsArtem Bityutskiy2010-08-271-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following issue: INFO: task mount.nfs4:1120 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.nfs4 D 00000000fffc6a21 0 1120 1119 0x00000000 ffff880235643948 0000000000000046 ffffffff00000000 ffffffff00000000 ffff880235643fd8 ffff880235314760 00000000001d44c0 ffff880235643fd8 00000000001d44c0 00000000001d44c0 00000000001d44c0 00000000001d44c0 Call Trace: [<ffffffff813bc747>] schedule_timeout+0x34/0xf1 [<ffffffff813bc530>] ? wait_for_common+0x3f/0x130 [<ffffffff8106b50b>] ? trace_hardirqs_on+0xd/0xf [<ffffffff813bc5c3>] wait_for_common+0xd2/0x130 [<ffffffff8104159c>] ? default_wake_function+0x0/0xf [<ffffffff813beaa0>] ? _raw_spin_unlock+0x26/0x2a [<ffffffff813bc6bb>] wait_for_completion+0x18/0x1a [<ffffffff81101a03>] sync_inodes_sb+0xca/0x1bc [<ffffffff811056a6>] __sync_filesystem+0x47/0x7e [<ffffffff81105798>] sync_filesystem+0x47/0x4b [<ffffffff810e7ffd>] generic_shutdown_super+0x22/0xd2 [<ffffffff810e80f8>] kill_anon_super+0x11/0x4f [<ffffffffa00d06d7>] nfs4_kill_super+0x3f/0x72 [nfs] [<ffffffff810e7b68>] deactivate_locked_super+0x21/0x41 [<ffffffff810e7fd6>] deactivate_super+0x40/0x45 [<ffffffff810fc66c>] mntput_no_expire+0xb8/0xed [<ffffffff810fc73b>] release_mounts+0x9a/0xb0 [<ffffffff810fc7bb>] put_mnt_ns+0x6a/0x7b [<ffffffffa00d0fb2>] nfs_follow_remote_path+0x19a/0x296 [nfs] [<ffffffffa00d11ca>] nfs4_try_mount+0x75/0xaf [nfs] [<ffffffffa00d1790>] nfs4_get_sb+0x276/0x2ff [nfs] [<ffffffff810e7dba>] vfs_kern_mount+0xb8/0x196 [<ffffffff810e7ef6>] do_kern_mount+0x48/0xe8 [<ffffffff810fdf68>] do_mount+0x771/0x7e8 [<ffffffff810fe062>] sys_mount+0x83/0xbd [<ffffffff810089c2>] system_call_fastpath+0x16/0x1b The reason of this hang was a race condition: when the flusher thread is forking a bdi thread, we use 'kthread_run()', so we run it _before_ we make it visible in 'bdi->wb.task'. The bdi thread runs, does all works, and goes sleep. 'bdi->wb.task' is still NULL. And this is a dangerous time window. If at this time someone queues a work for this bdi, he does not see the bdi thread and wakes up the forker thread instead! But the forker has already forked this bdi thread, but just did not make it visible yet! The result is that we lose the wake up event for this bdi thread and the NFS4 code waits forever. To fix the problem, we should use 'ktrhead_create()' for creating bdi threads, then make them visible in 'bdi->wb.task', and only after this wake them up. This is exactly what this patch does. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cciss: fix reporting of max queue depth since initStephen M. Cameron2010-08-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ioctl path and the scsi tape path were not accounting for their additions to the queue depth. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | block: switch s390 tape_block and mg_disk to elevator_change()Jens Axboe2010-08-232-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have this API, switch the two in-kernel users to it. Resolves an oops introduced by commit 1abec4fdbb142e3ccb6ce99832fae42129134a96. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | block: add function call to switch the IO scheduler from a driverJens Axboe2010-08-232-13/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently drivers must do an elevator_exit() + elevator_init() to switch IO schedulers. There are a few problems with this: - Since commit 1abec4fdbb142e3ccb6ce99832fae42129134a96, elevator_init() requires a zeroed out q->elevator pointer. The two existing in-kernel users don't do that. - It will only work at initialization time, since using the above two-staged construct does not properly quisce the queue. So add elevator_change() which takes care of this, and convert the elv_iosched_store() sysfs interface to use this helper as well. Reported-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reported-by: Kevin Vigor <kevin@vigor.nu> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | fs/bio-integrity.c: return -ENOMEM on kmalloc failureAndrew Morton2010-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | bio-integrity.c: remove dependency on __GFP_NOFAILDavid Rientjes2010-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kmalloc() in bio_integrity_prep() is failable, so remove __GFP_NOFAIL from its mask. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | BLOCK: fix bio.bi_rw handlingJiri Slaby2010-08-232-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return of the bi_rw tests is no longer bool after commit 74450be1. But results of such tests are stored in bools. This doesn't fit in there for some compilers (gcc 4.5 here), so either use !! magic to get real bools or use ulong where the result is assigned somewhere. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | block: put dev->kobj in blk_register_queue fail pathXiaotian Feng2010-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel needs to kobject_put on dev->kobj if elv_register_queue fails. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cciss: handle allocation failureDan Carpenter2010-08-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If kmalloc() fails then cleanup and return failure (-1). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cfq-iosched: Documentation help for new tunablesVivek Goyal2010-08-232-0/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some documentation to provide help with tunables. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cfq-iosched: blktrace print per slice sector statsVivek Goyal2010-08-231-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Divyesh had gotten rid of this code in the past. I want to re-introduce it back as it helps me a lot during debugging. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Divyesh Shah <dpshah@google.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cfq-iosched: Implement tunable group_idleVivek Goyal2010-08-231-9/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Implement a new tunable group_idle, which allows idling on the group instead of a cfq queue. Hence one can set slice_idle = 0 and not idle on the individual queues but idle on the group. This way on fast storage we can get fairness between groups at the same time overall throughput improves. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cfq-iosched: Do group share accounting in IOPS when slice_idle=0Vivek Goyal2010-08-231-6/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Implement another CFQ mode where we charge group in terms of number of requests dispatched instead of measuring the time. Measuring in terms of time is not possible when we are driving deeper queue depths and there are requests from multiple cfq queues in the request queue. o This mode currently gets activated if one sets slice_idle=0 and associated disk supports NCQ. Again the idea is that on an NCQ disk with idling disabled most of the queues will dispatch 1 or more requests and then cfq queue expiry happens and we don't have a way to measure time. So start providing fairness in terms of IOPS. o Currently IOPS mode works only with cfq group scheduling. CFQ is following different scheduling algorithms for queue and group scheduling. These IOPS stats are used only for group scheduling hence in non-croup mode nothing should change. o For CFQ group scheduling one can disable slice idling so that we don't idle on queue and drive deeper request queue depths (achieving better throughput), at the same time group idle is enabled so one should get service differentiation among groups. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cfq-iosched: Do not idle if slice_idle=0Vivek Goyal2010-08-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not idle either on cfq queue or service tree if slice_idle=0. User does not want any queue or service tree idling. Currently even if slice_idle=0, we were waiting for request to finish before expiring the queue and that can lead to lower queue depths. Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | cciss: disable doorbell reset on reset_devicesStephen M. Cameron2010-08-231-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The doorbell reset initially appears to work correctly, the controller resets, comes up, some i/o can even be done, but on at least some Smart Arrays in some servers, it eventually causes a subsequent controller lockup due to some kind of PCIe error, and kdump can end up leaving the root filesystem in an unbootable state. For this reason, until the problem is fixed, or at least isolated to certain hardware enough to be avoided, the doorbell reset should not be used at all. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | | | | | | | blkio: Fix return code for mkdir callsCiju Rajan K2010-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the cgroup hierarchy for blkio control groups is deeper than two levels, kernel should not allow the creation of further levels. mkdir system call does not except EINVAL as a return value. This patch replaces EINVAL with more appropriate EPERM Signed-off-by: Ciju Rajan K <ciju@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | | | | | | | | | | | | | | Merge branch 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91Linus Torvalds2010-09-104-19/+36
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91: AT91: at91sam9261ek: remove C99 comments but keep information AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC AT91: dm9000 initialization update AT91: SAM9G45 - add a separate clock entry for every single TC block AT91: clock: peripheral clocks can have other parent than mck AT91: change dma resource index
| * | | | | | | | | | | | | | | | AT91: at91sam9261ek: remove C99 comments but keep informationNicolas Ferre2010-09-101-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
| * | | | | | | | | | | | | | | | AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMCNicolas Ferre2010-09-101-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sd/mmc data structure is not used if SPI is selected. The configuration of PIO on the board prevent from using both interfaces at the same time (board dependent). Remove the warnings at compilation time adding a preprocessor condition. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
| * | | | | | | | | | | | | | | | AT91: dm9000 initialization updateNicolas Ferre2010-09-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add information in dm9000 mac/phy chip initialization: - irq resource details - platform data details Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
| * | | | | | | | | | | | | | | | AT91: SAM9G45 - add a separate clock entry for every single TC blockFabian Godehardt2010-09-092-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this patch you will not be able to register the first block because of the second association call on at91_add_device_tc(). Signed-off-by: Fabian Godehardt <fg@emlix.com> [nicolas.ferre@atmel.com: change tcb1_clk to fake child clock of tcb0_clk] Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Acked-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>