summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-11-191-25/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...
| * genetlink: only pass array to genl_register_family_with_ops()Johannes Berg2013-11-191-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * genetlink: make all genl_ops users constJohannes Berg2013-11-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that genl_ops are no longer modified in place when registering, they can be made const. This patch was done mostly with spatch: @@ identifier ops; @@ +const struct genl_ops ops[] = { ... }; (except the struct thing in net/openvswitch/datapath.c) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * taskstats: use genl_register_family_with_ops()Johannes Berg2013-11-141-25/+14
| | | | | | | | | | | | | | | | This simplifies the code since there's no longer a need to have error handling in the registration. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | kernel/bounds: avoid circular dependencies in generated headersKirill A. Shutemov2013-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <linux/spinlock.h> has heavy dependencies on other header files. It triggers circular dependencies in generated headers on IA64, at least: CC kernel/bounds.s In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0, from include/linux/thread_info.h:54, from include/asm-generic/preempt.h:4, from arch/ia64/include/generated/asm/preempt.h:1, from include/linux/preempt.h:18, from include/linux/spinlock.h:50, from kernel/bounds.c:14: /home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory compilation terminated. Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>, it's enough to find out size of spinlock_t. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-and-Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2013-11-192-3/+16
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Ingo Molnar: "This is a multi-arch cleanup series from Thomas Gleixner, which we kept to near the end of the merge window, to not interfere with architecture updates. This series (motivated by the -rt kernel) unifies more aspects of IRQ handling and generalizes PREEMPT_ACTIVE" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: preempt: Make PREEMPT_ACTIVE generic sparc: Use preempt_schedule_irq ia64: Use preempt_schedule_irq m32r: Use preempt_schedule_irq hardirq: Make hardirq bits generic m68k: Simplify low level interrupt handling code genirq: Prevent spurious detection for unconditionally polled interrupts
| * | genirq: Prevent spurious detection for unconditionally polled interruptsThomas Gleixner2013-11-132-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a 68k platform a couple of interrupts are demultiplexed and "polled" from a top level interrupt. Unfortunately there is no way to determine which of the sub interrupts raised the top level interrupt, so all of the demultiplexed interrupt handlers need to be invoked. Given a high enough frequency this can trigger the spurious interrupt detection mechanism, if one of the demultiplex interrupts returns IRQ_NONE continuously. But this is a false positive as the polling causes this behaviour and not buggy hardware/software. Introduce IRQ_POLLED which can be set at interrupt chip setup time via irq_set_status_flags(). The flag excludes the interrupt from the spurious detector and from all core polling activities. Reported-and-tested-by: Michael Schmitz <schmitzmic@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-m68k@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311061149250.23353@ionos.tec.linutronix.de
* | | Merge tag 'trace-3.13' of ↵Linus Torvalds2013-11-1616-212/+521
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing update from Steven Rostedt: "This batch of changes is mostly clean ups and small bug fixes. The only real feature that was added this release is from Namhyung Kim, who introduced "set_graph_notrace" filter that lets you run the function graph tracer and not trace particular functions and their call chain. Tom Zanussi added some updates to the ftrace multibuffer tracing that made it more consistent with the top level tracing. One of the fixes for perf function tracing required an API change in RCU; the addition of "rcu_is_watching()". As Paul McKenney is pushing that change in this release too, he gave me a branch that included all the changes to get that working, and I pulled that into my tree in order to complete the perf function tracing fix" * tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add rcu annotation for syscall trace descriptors tracing: Do not use signed enums with unsigned long long in fgragh output tracing: Remove unused function ftrace_off_permanent() tracing: Do not assign filp->private_data to freed memory tracing: Add helper function tracing_is_disabled() tracing: Open tracer when ftrace_dump_on_oops is used tracing: Add support for SOFT_DISABLE to syscall events tracing: Make register/unregister_ftrace_command __init tracing: Update event filters for multibuffer recordmcount.pl: Add support for __fentry__ ftrace: Have control op function callback only trace when RCU is watching rcu: Do not trace rcu_is_watching() functions ftrace/x86: skip over the breakpoint for ftrace caller trace/trace_stat: use rbtree postorder iteration helper instead of opencoding ftrace: Add set_graph_notrace filter ftrace: Narrow down the protected area of graph_lock ftrace: Introduce struct ftrace_graph_data ftrace: Get rid of ftrace_graph_filter_enabled tracing: Fix potential out-of-bounds in trace_get_user() tracing: Show more exact help information about snapshot
| * | | tracing: Add rcu annotation for syscall trace descriptorsSteven Rostedt (Red Hat)2013-11-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sparse complains about the enter/exit_sysycall_files[] variables being dereferenced with rcu_dereference_sched(). The fields need to be annotated with __rcu. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Do not use signed enums with unsigned long long in fgragh outputSteven Rostedt (Red Hat)2013-11-062-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The duration field of print_graph_duration() can also be used to do the space filling by passing an enum in it: DURATION_FILL_FULL DURATION_FILL_START DURATION_FILL_END The problem is that these are enums and defined as negative, but the duration field is unsigned long long. Most archs are fine with this but blackfin fails to compile because of it: kernel/built-in.o: In function `print_graph_duration': kernel/trace/trace_functions_graph.c:782: undefined reference to `__ucmpdi2' Overloading a unsigned long long with an signed enum is just bad in principle. We can accomplish the same thing by using part of the flags field instead. Cc: Mike Frysinger <vapier@gentoo.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Remove unused function ftrace_off_permanent()Steven Rostedt (Red Hat)2013-11-061-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the past, ftrace_off_permanent() was called if something strange was detected. But the ftrace_bug() now handles all the anomolies that can happen with ftrace (function tracing), and there are no uses of ftrace_off_permanent(). Get rid of it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Do not assign filp->private_data to freed memoryGeyslan G. Bem2013-11-061-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In system_tr_open(), the filp->private_data can be assigned the 'dir' variable even if it was freed. This is on the error path, and is harmless because the error return code will prevent filp->private_data from being used. But for correctness, we should not assign it to a recently freed variable, as that can cause static tools to give false warnings. Also have both subsystem_open() and system_tr_open() return -ENODEV if tracing has been disabled. Link: http://lkml.kernel.org/r/1383764571-7318-1-git-send-email-geyslan@gmail.com Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Add helper function tracing_is_disabled()Geyslan G. Bem2013-11-062-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch creates the function 'tracing_is_disabled', which can be used outside of trace.c. Link: http://lkml.kernel.org/r/1382141754-12155-1-git-send-email-geyslan@gmail.com Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Open tracer when ftrace_dump_on_oops is usedCody P Schafer2013-11-061-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With ftrace_dump_on_oops, we previously did not open the tracer in question, sometimes causing the trace output to be useless. For example, the function_graph tracer with tracing_thresh set dumped via ftrace_dump_on_oops would show a series of '}' indented at different levels, but no function names. call trace->open() (and do a few other fixups copied from the normal dump path) to make the output more intelligible. Link: http://lkml.kernel.org/r/1382554197-16961-1-git-send-email-cody@linux.vnet.ibm.com Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Add support for SOFT_DISABLE to syscall eventsTom Zanussi2013-11-052-12/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original SOFT_DISABLE patches didn't add support for soft disable of syscall events; this adds it. Add an array of ftrace_event_file pointers indexed by syscall number to the trace array and remove the existing enabled bitmaps, which as a result are now redundant. The ftrace_event_file structs in turn contain the soft disable flags we need for per-syscall soft disable accounting. Adding ftrace_event_files also means we can remove the USE_CALL_FILTER bit, thus enabling multibuffer filter support for syscall events. Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Make register/unregister_ftrace_command __initTom Zanussi2013-11-052-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | register/unregister_ftrace_command() are only ever called from __init functions, so can themselves be made __init. Also make register_snapshot_cmd() __init for the same reason. Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Update event filters for multibufferTom Zanussi2013-11-0512-91/+239
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Have control op function callback only trace when RCU is watchingSteven Rostedt (Red Hat)2013-11-051-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Jones reported that trinity would be able to trigger the following back trace: =============================== [ INFO: suspicious RCU usage. ] 3.10.0-rc2+ #38 Not tainted ------------------------------- include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 1 lock held by trinity-child1/18786: #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310 stack backtrace: CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38 0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8 ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8 0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48 Call Trace: [<ffffffff816e2f6b>] dump_stack+0x19/0x1b [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310 [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310 [<ffffffff81309289>] ? __const_udelay+0x29/0x30 [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0 [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0 [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90 [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0 [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0 [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20 [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0 [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210 [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20 [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470 [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210 [<ffffffff816f4000>] ftrace_call+0x5/0x2f [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210 [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40 [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40 [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470 [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0 [<ffffffff81103673>] rcu_user_enter+0x13/0x20 [<ffffffff8114541a>] user_enter+0x6a/0xd0 [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140 [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d ------------[ cut here ]------------ Perf uses rcu_read_lock() but as the function tracer can trace functions even when RCU is not currently active, this makes the rcu_read_lock() used by perf ineffective. As perf is currently the only user of the ftrace_ops_control_func() and perf is also the only function callback that actively uses rcu_read_lock(), the quick fix is to prevent the ftrace_ops_control_func() from calling its callbacks if RCU is not active. With Paul's new "rcu_is_watching()" we can tell if RCU is active or not. Reported-by: Dave Jones <davej@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | rcu: Do not trace rcu_is_watching() functionsSteven Rostedt2013-11-052-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As perf uses the rcu_read_lock() primitives for recording into its ring buffer, perf tracing can not be called when RCU in inactive. With the perf function tracing, there are functions that can be traced when RCU is not active, and perf must not have its function callback called when this is the case. Luckily, Paul McKenney has created a way to detect when RCU is active or not with the rcu_is_watching() function. Unfortunately, this function can also be traced, and if that happens it can cause a bit of overhead for the perf function calls that do the check. Recursion protection prevents anything bad from happening, but there is a bit of added overhead for every function being traced that must detect that the rcu_is_watching() is also being traced. As rcu_is_watching() is a helper routine and not part of the critical logic in RCU, it does not need to be traced in order to debug RCU itself. Add the "notrace" annotation to all the rcu_is_watching() calls such that we never trace it. Link: http://lkml.kernel.org/r/20131104202736.72dd8e45@gandalf.local.home Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | Merge branch 'idle.2013.09.25a' of ↵Steven Rostedt (Red Hat)2013-11-056-24/+41
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into HEAD Need to use Paul McKenney's "rcu_is_watching()" changes to fix a perf/ftrace bug.
| * | | | trace/trace_stat: use rbtree postorder iteration helper instead of opencodingCody P Schafer2013-11-051-36/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Link: http://lkml.kernel.org/r/1383345566-25087-2-git-send-email-cody@linux.vnet.ibm.com Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | ftrace: Add set_graph_notrace filterNamhyung Kim2013-10-183-3/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The set_graph_notrace filter is analogous to set_ftrace_notrace and can be used for eliminating uninteresting part of function graph trace output. It also works with set_graph_function nicely. # cd /sys/kernel/debug/tracing/ # echo do_page_fault > set_graph_function # perf ftrace live true 2) | do_page_fault() { 2) | __do_page_fault() { 2) 0.381 us | down_read_trylock(); 2) 0.055 us | __might_sleep(); 2) 0.696 us | find_vma(); 2) | handle_mm_fault() { 2) | handle_pte_fault() { 2) | __do_fault() { 2) | filemap_fault() { 2) | find_get_page() { 2) 0.033 us | __rcu_read_lock(); 2) 0.035 us | __rcu_read_unlock(); 2) 1.696 us | } 2) 0.031 us | __might_sleep(); 2) 2.831 us | } 2) | _raw_spin_lock() { 2) 0.046 us | add_preempt_count(); 2) 0.841 us | } 2) 0.033 us | page_add_file_rmap(); 2) | _raw_spin_unlock() { 2) 0.057 us | sub_preempt_count(); 2) 0.568 us | } 2) | unlock_page() { 2) 0.084 us | page_waitqueue(); 2) 0.126 us | __wake_up_bit(); 2) 1.117 us | } 2) 7.729 us | } 2) 8.397 us | } 2) 8.956 us | } 2) 0.085 us | up_read(); 2) + 12.745 us | } 2) + 13.401 us | } ... # echo handle_mm_fault > set_graph_notrace # perf ftrace live true 1) | do_page_fault() { 1) | __do_page_fault() { 1) 0.205 us | down_read_trylock(); 1) 0.041 us | __might_sleep(); 1) 0.344 us | find_vma(); 1) 0.069 us | up_read(); 1) 4.692 us | } 1) 5.311 us | } ... Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | ftrace: Narrow down the protected area of graph_lockNamhyung Kim2013-10-181-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The parser set up is just a generic utility that uses local variables allocated by the function. There's no need to hold the graph_lock for this set up. This also makes the code simpler. Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | ftrace: Introduce struct ftrace_graph_dataNamhyung Kim2013-10-181-19/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct ftrace_graph_data is for generalizing the access to set_graph_function file. This is a preparation for adding support to set_graph_notrace. Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | ftrace: Get rid of ftrace_graph_filter_enabledNamhyung Kim2013-10-182-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ftrace_graph_filter_enabled means that user sets function filter and it always has same meaning of ftrace_graph_count > 0. Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | tracing: Fix potential out-of-bounds in trace_get_user()Steven Rostedt2013-10-181-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey reported the following report: ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3 ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3) Accessed by thread T13003: #0 ffffffff810dd2da (asan_report_error+0x32a/0x440) #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40) #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20) #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260) #4 ffffffff812a1065 (__fput+0x155/0x360) #5 ffffffff812a12de (____fput+0x1e/0x30) #6 ffffffff8111708d (task_work_run+0x10d/0x140) #7 ffffffff810ea043 (do_exit+0x433/0x11f0) #8 ffffffff810eaee4 (do_group_exit+0x84/0x130) #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30) #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Allocated by thread T5167: #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0) #1 ffffffff8128337c (__kmalloc+0xbc/0x500) #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90) #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0) #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40) #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430) #6 ffffffff8129b668 (finish_open+0x68/0xa0) #7 ffffffff812b66ac (do_last+0xb8c/0x1710) #8 ffffffff812b7350 (path_openat+0x120/0xb50) #9 ffffffff812b8884 (do_filp_open+0x54/0xb0) #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0) #11 ffffffff8129d4b7 (SyS_open+0x37/0x50) #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Shadow bytes around the buggy address: ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap redzone: fa Heap kmalloc redzone: fb Freed heap region: fd Shadow gap: fe The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;' Although the crash happened in ftrace_regex_open() the real bug occurred in trace_get_user() where there's an incrementation to parser->idx without a check against the size. The way it is triggered is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop that reads the last character stores it and then breaks out because there is no more characters. Then the last character is read to determine what to do next, and the index is incremented without checking size. Then the caller of trace_get_user() usually nulls out the last character with a zero, but since the index is equal to the size, it writes a nul character after the allocated space, which can corrupt memory. Luckily, only root user has write access to this file. Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | tracing: Show more exact help information about snapshotWang YanQing2013-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current "help" that comes out of the snapshot file when it is not allocated looks like this: # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') Echo 2 says that it does not allocate the buffer, which is correct, but to be more consistent with "echo 0" it should also state that it does not free. Link: http://lkml.kernel.org/r/20130914045916.GA4243@udknight Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-153-3/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...
| * | | | | timekeeping: Fix some trivial typos in commentsXie XiuQi2013-10-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | | | | irq: Fix some trivial typos in commentsXie XiuQi2013-10-182-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> [jkosina@suse.cz: fix 'explicitly', noticed by Randy Dunlap] Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | | | | kexec: Typo s/the/then/Geert Uytterhoeven2013-10-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-11-151-0/+11
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM changes from Paolo Bonzini: "Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) kvm, vmx: Fix lazy FPU on nested guest arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu arm/arm64: KVM: MMIO support for BE guest kvm, cpuid: Fix sparse warning kvm: Delete prototype for non-existent function kvm_check_iopl kvm: Delete prototype for non-existent function complete_pio hung_task: add method to reset detector pvclock: detect watchdog reset at pvclock read kvm: optimize out smp_mb after srcu_read_unlock srcu: API for barrier after srcu read unlock KVM: remove vm mmap method KVM: IOMMU: hva align mapping page size KVM: x86: trace cpuid emulation when called from emulator KVM: emulator: cleanup decode_register_operand() a bit KVM: emulator: check rex prefix inside decode_register() KVM: x86: fix emulation of "movzbl %bpl, %eax" kvm_host: typo fix KVM: x86: emulate SAHF instruction MAINTAINERS: add tree for kvm.git Documentation/kvm: add a 00-INDEX file ...
| * | | | | | hung_task: add method to reset detectorMarcelo Tosatti2013-11-061-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In certain occasions it is possible for a hung task detector positive to be false: continuation from a paused VM, for example. Add a method to reset detection, similar as is done with other kernel watchdogs. Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | | | | | | Merge tag 'modules-next-for-linus' of ↵Linus Torvalds2013-11-151-40/+26
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Mainly boring here, too. rmmod --wait finally removed, though" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: modpost: fix bogus 'exported twice' warnings. init: fix in-place parameter modification regression asmlinkage, module: Make ksymtab and kcrctab symbols and __this_module __visible kernel: add support for init_array constructors modpost: Optionally ignore secondary errors seen if a single module build fails module: remove rmmod --wait option.
| * | | | | | | kernel: add support for init_array constructorsFrantisek Hrbata2013-10-171-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the .init_array section as yet another section with constructors. This is needed because gcc could add __gcov_init calls to .init_array or .ctors section, depending on gcc (and binutils) version . v2: - reuse mod->ctors for .init_array section for modules, because gcc uses .ctors or .init_array, but not both at the same time v3: - fail to load if that does happen somehow. Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | module: remove rmmod --wait option.Rusty Russell2013-09-231-38/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The option to wait for a module reference count to reach zero was in the initial module implementation, but it was never supported in modprobe (you had to use rmmod --wait). After discussion with Lucas, It has been deprecated (with a 10 second sleep) in kmod for the last year. This finally removes it: the flag will evoke a printk warning and a normal (non-blocking) remove attempt. Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | | | | | | kernel: fix generic_exec_single indentationChristoph Hellwig2013-11-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | kernel: provide a __smp_call_function_single stub for !CONFIG_SMPChristoph Hellwig2013-11-151-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | kernel: remove CONFIG_USE_GENERIC_SMP_HELPERSChristoph Hellwig2013-11-152-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've switched over every architecture that supports SMP to it, so remove the new useless config variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | revert "softirq: Add support for triggering softirq work on softirqs"Christoph Hellwig2013-11-151-131/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit was incomplete in that code to remove items from the per-cpu lists was missing and never acquired a user in the 5 years it has been in the tree. We're going to implement what it seems to try to archive in a simpler way, and this code is in the way of doing so. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | mm: properly separate the bloated ptl from the regular casePeter Zijlstra2013-11-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use kernel/bounds.c to convert build-time spinlock_t size check into a preprocessor symbol and apply that to properly separate the page::ptl situation. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | mm: implement split page table lock for PMD levelKirill A. Shutemov2013-11-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The basic idea is the same as with PTE level: the lock is embedded into struct page of table's page. We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't take mm->page_table_lock anymore. Let's reuse page->lru of table's page for that. pgtable_pmd_page_ctor() returns true, if initialization is successful and false otherwise. Current implementation never fails, but assumption that constructor can fail will help to port it to -rt where spinlock_t is rather huge and cannot be embedded into struct page -- dynamic allocation is required. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | mm: convert mm->nr_ptes to atomic_long_tKirill A. Shutemov2013-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With split page table lock for PMD level we can't hold mm->page_table_lock while updating nr_ptes. Let's convert it to atomic_long_t to avoid races. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2013-11-143-4/+46
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Four bugfixes and one performance fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Avoid integer overflow sched: Optimize task_sched_runtime() sched/numa: Cure update_numa_stats() vs. hotplug sched/numa: Fix NULL pointer dereference in task_numa_migrate() sched: Fix endless sync_sched/rcu() loop inside _cpu_down()
| * | | | | | | | sched/fair: Avoid integer overflowMichal Nazarewicz2013-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sa->runnable_avg_sum is of type u32 but after shifting it by NICE_0_SHIFT bits it is promoted to u64. This of course makes no sense, since the result will never be more then 32-bit long. Casting sa->runnable_avg_sum to u64 before it is shifted, fixes this problem. Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1384112521-25177-1-git-send-email-mpn@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | sched: Optimize task_sched_runtime()Peter Zijlstra2013-11-131-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Large multi-threaded apps like to hit this using do_sys_times() and then queue up on the rq->lock. Avoid when possible. Larry reported ~20% performance increase his test case. Reported-by: Larry Woodman <lwoodman@redhat.com> Suggested-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131111172925.GG26898@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | sched/numa: Cure update_numa_stats() vs. hotplugPeter Zijlstra2013-11-131-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because we're completely unserialized against hotplug its well possible to try and generate numa stats for an offlined node. Bail out early (and avoid a /0) in this case. The resulting stats are all 0 which should result in an undesirable balance target -- not to mention that actually trying to migrate to an offline CPU will fail. Reported-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/n/tip-orja0qylcvyhxfsuebcyL5sI@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | sched/numa: Fix NULL pointer dereference in task_numa_migrate()Rik van Riel2013-11-131-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cpusets code can split up the scheduler's domain tree into smaller domains. Some of those smaller domains may not cross NUMA nodes at all, leading to a NULL pointer dereference on the per-cpu sd_numa pointer. Tasks cannot be migrated out of their domain, so the patch also sets p->numa_preferred_nid to whereever they are, to prevent the migration from being retried over and over again. Reported-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/n/tip-oosqomw0Jput0Jkvoowhrqtu@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | sched: Fix endless sync_sched/rcu() loop inside _cpu_down()Michael wang2013-11-131-1/+4
| | |_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6acce3ef8: sched: Remove get_online_cpus() usage tries to do sync_sched/rcu() inside _cpu_down() but triggers: INFO: task swapper/0:1 blocked for more than 120 seconds. ... [<ffffffff811263dc>] synchronize_rcu+0x2c/0x30 [<ffffffff81d1bd82>] _cpu_down+0x2b2/0x340 ... It was caused by that in the rcu boost case we rely on smpboot thread to finish the rcu callback, which has already been parked before sync in here and leads to the endless sync_sched/rcu(). This patch exchanges the sequence of smpboot_park_threads() and sync_sched/rcu() to fix the bug. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | Merge branch 'core-locking-for-linus' of ↵Linus Torvalds2013-11-1428-32/+1107
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...