summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* sock: remove skb argument from sk_rcvqueues_fullSorin Dumitru2014-07-231-3/+2
| | | | | | | It hasn't been used since commit 0fd7bac(net: relax rcvbuf limits). Signed-off-by: Sorin Dumitru <sorin@returnze.ro> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Use low memory profile on kdump kernelAmir Vadai2014-07-221-0/+7
| | | | | | | | | | When running in kdump kernel, reduce number of resources allocated for the hardware. This will enable the NIC to operate in this low memory environment at the expense of performance and some features not related to the basic NIC functionality. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skbuff: Use ALIGN macro instead of open coding itTobias Klauser2014-07-221-2/+1
| | | | | | | | Use ALIGN from linux/kernel.h to define SKB_DATA_ALIGN instead of open coding it. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAYDavid Laight2014-07-221-1/+1
| | | | | | | | | | MSG_MORE and 'corking' a socket would require that the transmit of a data chunk be delayed. Rename the return value to be less specific. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-07-228-65/+66
|\ | | | | | | | | | | | | | | | | Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-07-212-3/+5
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Null termination fix in dns_resolver got the pointer dereferncing wrong, fix from Ben Hutchings. 2) ip_options_compile() has a benign but real buffer overflow when parsing options. From Eric Dumazet. 3) Table updates can crash in netfilter's nftables if none of the state flags indicate an actual change, from Pablo Neira Ayuso. 4) Fix race in nf_tables dumping, also from Pablo. 5) GRE-GRO support broke the forwarding path because the segmentation state was not fully initialized in these paths, from Jerry Chu. 6) sunvnet driver leaks objects and potentially crashes on module unload, from Sowmini Varadhan. 7) We can accidently generate the same handle for several u32 classifier filters, fix from Cong Wang. 8) Several edge case bug fixes in fragment handling in xen-netback, from Zoltan Kiss. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits) ipv4: fix buffer overflow in ip_options_compile() batman-adv: fix TT VLAN inconsistency on VLAN re-add batman-adv: drop QinQ claim frames in bridge loop avoidance dns_resolver: Null-terminate the right string xen-netback: Fix pointer incrementation to avoid incorrect logging xen-netback: Fix releasing header slot on error path xen-netback: Fix releasing frag_list skbs in error path xen-netback: Fix handling frag_list on grant op error path net_sched: avoid generating same handle for u32 filters net: huawei_cdc_ncm: add "subclass 3" devices net: qmi_wwan: add two Sierra Wireless/Netgear devices wan/x25_asy: integer overflow in x25_asy_change_mtu() net: ppp: fix creating PPP pass and active filters net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's sunvnet: clean up objects created in vnet_new() on vnet_exit() r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40 net-gre-gro: Fix a bug that breaks the forwarding path netfilter: nf_tables: 64bit stats need some extra synchronization netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale netfilter: nf_tables: safe RCU iteration on list when dumping ...
| | * Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2014-07-162-3/+5
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter/nf_tables fixes The following patchset contains nf_tables fixes, they are: 1) Fix wrong transaction handling when the table flags are not modified. 2) Fix missing rcu read_lock section in the netlink dump path, which is not protected by the nfnl_lock. 3) Set NLM_F_DUMP_INTR in the netlink dump path to indicate interferences with updates. 4) Fix 64 bits chain counters when they are retrieved from a 32 bits arch, from Eric Dumazet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * netfilter: nf_tables: 64bit stats need some extra synchronizationEric Dumazet2014-07-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use generic u64_stats_sync infrastructure to get proper 64bit stats, even on 32bit arches, at no extra cost for 64bit arches. Without this fix, 32bit arches can have some wrong counters at the time the carry is propagated into upper word. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | | * netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stalePablo Neira Ayuso2014-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An updater may interfer with the dumping of any of the object lists. Fix this by using a per-net generation counter and use the nl_dump_check_consistent() interface so the NLM_F_DUMP_INTR flag is set to notify userspace that it has to restart the dump since an updater has interfered. This patch also replaces the existing consistency checking code in the rule dumping path since it is broken. Basically, the value that the dump callback returns is not propagated to userspace via netlink_dump_start(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2014-07-194-24/+49
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "The locking department delivers: - A rather large and intrusive bundle of fixes to address serious performance regressions introduced by the new rwsem / mcs technology. Simpler solutions have been discussed, but they would have been ugly bandaids with more risk than doing the right thing. - Make the rwsem spin on owner technology opt-in for architectures and enable it only on the known to work ones. - A few fixes to the lockdep userspace library" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER locking/mutex: Disable optimistic spinning on some architectures locking/rwsem: Reduce the size of struct rw_semaphore locking/rwsem: Rename 'activity' to 'count' locking/spinlocks/mcs: Micro-optimize osq_unlock() locking/spinlocks/mcs: Introduce and use init macro and function for osq locks locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node() locking/rwsem: Allow conservative optimistic spinning when readers have lock tools/liblockdep: Account for bitfield changes in lockdeps lock_acquire tools/liblockdep: Remove debug print left over from development tools/liblockdep: Fix comparison of a boolean value with a value of 2
| | * | | locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNERDavidlohr Bueso2014-07-161-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER), encapsulate the dependencies for rwsem optimistic spinning. No logical changes here as it continues to depend on both SMP and the XADD algorithm variant. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Jason Low <jason.low2@hp.com> [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com Cc: aswin@hp.com Cc: Chris Mason <clm@fb.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Josef Bacik <jbacik@fusionio.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | locking/rwsem: Reduce the size of struct rw_semaphoreJason Low2014-07-161-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent optimistic spinning additions to rwsem provide significant performance benefits on many workloads on large machines. The cost of it was increasing the size of the rwsem structure by up to 128 bits. However, now that the previous patches in this series bring the overhead of struct optimistic_spin_queue to 32 bits, this patch reorders some fields in struct rw_semaphore such that we can reduce the overhead of the rwsem structure by 64 bits (on 64 bit systems). The extra overhead required for rwsem optimistic spinning would now be up to 8 additional bytes instead of up to 16 bytes. Additionally, the size of rwsem would now be more in line with mutexes. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Scott Norton <scott.norton@hp.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fusionio.com> Link: http://lkml.kernel.org/r/1405358872-3732-6-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | locking/rwsem: Rename 'activity' to 'count'Peter Zijlstra2014-07-161-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two definitions of struct rw_semaphore, one in linux/rwsem.h and one in linux/rwsem-spinlock.h. For some reason they have different names for the initial field. This makes it impossible to use C99 named initialization for __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing along with the structure definitions. The simpler patch is renaming the rwsem-spinlock variant to match the regular rwsem. This allows us to switch to C99 named initialization. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | locking/spinlocks/mcs: Introduce and use init macro and function for osq locksJason Low2014-07-162-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we initialize the osq lock by directly setting the lock's values. It would be preferable if we use an init macro to do the initialization like we do with other locks. This patch introduces and uses a macro and function for initializing the osq lock. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Scott Norton <scott.norton@hp.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fusionio.com> Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overheadJason Low2014-07-163-6/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cancellable MCS spinlock is currently used to queue threads that are doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining the lock would access and queue the local node corresponding to the CPU that it's running on. Currently, the cancellable MCS lock is implemented by using pointers to these nodes. In this patch, instead of operating on pointers to the per-cpu nodes, we store the CPU numbers in which the per-cpu nodes correspond to in atomic_t. A similar concept is used with the qspinlock. By operating on the CPU # of the nodes using atomic_t instead of pointers to those nodes, this can reduce the overhead of the cancellable MCS spinlock by 32 bits (on 64 bit systems). Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Scott Norton <scott.norton@hp.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chris Mason <clm@fb.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Josef Bacik <jbacik@fusionio.com> Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node()Jason Low2014-07-162-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the per-cpu nodes structure for the cancellable MCS spinlock is named "optimistic_spin_queue". However, in a follow up patch in the series we will be introducing a new structure that serves as the new "handle" for the lock. It would make more sense if that structure is named "optimistic_spin_queue". Additionally, since the current use of the "optimistic_spin_queue" structure are "nodes", it might be better if we rename them to "node" anyway. This preparatory patch renames all current "optimistic_spin_queue" to "optimistic_spin_node". Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Scott Norton <scott.norton@hp.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chris Mason <clm@fb.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Josef Bacik <jbacik@fusionio.com> Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2014-07-191-36/+10
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fixes from Thomas Gleixner: "Two RCU patches: - Address a serious performance regression on open/close caused by commit ac1bea85781e ("Make cond_resched() report RCU quiescent states") - Export RCU debug functions. Not a regression, but enablement to address a serious recursion bug in the sl*b allocators in 3.17" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Reduce overhead of cond_resched() checks for RCU rcu: Export debug_init_rcu_head() and and debug_init_rcu_head()
| | * \ \ \ Merge branch 'urgent.2014.06.23a' of ↵Ingo Molnar2014-06-251-36/+10
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fixes from Paul E. McKenney: " This series includes the following: 1. Export a pair of debug-object interfaces for RCU that will allow the slab allocators to avoid a recursion bug located by Sasha Levin. Strictly speaking, this is not a regression, but it would be good to enable the fix. 2. Address a serious performance regression on an open/close micro-benchmark located by Dave Hansen. The offending commit is ac1bea85781e (Make cond_resched() report RCU quiescent states). " Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | * | | | rcu: Reduce overhead of cond_resched() checks for RCUPaul E. McKenney2014-06-231-36/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ac1bea85781e (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale "open1" test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the ktest build robot. Also fixed smp_mb() comment as noted by Oleg Nesterov. ] Merge with e552592e (Reduce overhead of cond_resched() checks for RCU) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * | | | rcu: Export debug_init_rcu_head() and and debug_init_rcu_head()Paul E. McKenney2014-06-231-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, call_rcu() relies on implicit allocation and initialization for the debug-objects handling of RCU callbacks. If you hammer the kernel hard enough with Sasha's modified version of trinity, you can end up with the sl*b allocators recursing into themselves via this implicit call_rcu() allocation. This commit therefore exports the debug_init_rcu_head() and debug_rcu_head_free() functions, which permits the allocators to allocated and pre-initialize the debug-objects information, so that there no longer any need for call_rcu() to do that initialization, which in turn prevents the recursion into the memory allocators. Reported-by: Sasha Levin <sasha.levin@oracle.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Looks-good-to: Christoph Lameter <cl@linux.com>
| * | | | | | Merge tag 'pm+acpi-3.16-rc6' of ↵Linus Torvalds2014-07-181-2/+2
| |\ \ \ \ \ \ | | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are a few recent regression fixes, a revert of the ACPI video commit I promised, a system resume fix related to request_firmware(), an ACPI video quirk for one more Win8-oriented BIOS, an ACPI device enumeration documentation update and a few fixes for ARM cpufreq drivers. Specifics: - Fix for a recently introduced NULL pointer dereference in the core system suspend code occuring when platforms without ACPI attempt to use the "freeze" sleep state from Zhang Rui. - Fix for a recently introduced build warning in cpufreq headers from Brian W Hart. - Fix for a 3.13 cpufreq regression related to sysem resume that triggers on some systems with multiple CPU clusters from Viresh Kumar. - Fix for a 3.4 regression in request_firmware() resulting in WARN_ON()s on some systems during system resume from Takashi Iwai. - Revert of the ACPI video commit that changed the default value of the video.brightness_switch_enabled command line argument to 0 as it has been reported to break existing setups. - ACPI device enumeration documentation update to take recent code changes into account and make the documentation match the code again from Darren Hart. - Fixes for the sa1110, imx6q, kirkwood, and cpu0 cpufreq drivers from Linus Walleij, Nicolas Del Piano, Quentin Armitage, Viresh Kumar. - New ACPI video blacklist entry for HP ProBook 4540s from Hans de Goede" * tag 'pm+acpi-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: make table sentinel macros unsigned to match use cpufreq: move policy kobj to policy->cpu at resume cpufreq: cpu0: OPPs can be populated at runtime cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD cpufreq: imx6q: Select PM_OPP cpufreq: sa1110: set memory type for h3600 ACPI / video: Add use_native_backlight quirk for HP ProBook 4540s PM / sleep: fix freeze_ops NULL pointer dereferences PM / sleep: Fix request_firmware() error at resume Revert "ACPI / video: change acpi-video brightness_switch_enabled default to 0" ACPI / documentation: Remove reference to acpi_platform_device_ids from enumeration.txt
| | * | | | | cpufreq: make table sentinel macros unsigned to match useBrian W Hart2014-07-181-2/+2
| | | |_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*) moved function cpufreq_next_valid() to a public header. Warnings are now generated when objects including that header are built with -Wsign-compare (as an out-of-tree module might be): .../include/linux/cpufreq.h: In function ‘cpufreq_next_valid’: .../include/linux/cpufreq.h:519:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] while ((*pos)->frequency != CPUFREQ_TABLE_END) ^ .../include/linux/cpufreq.h:520:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID) ^ Constants CPUFREQ_ENTRY_INVALID and CPUFREQ_TABLE_END are signed, but are used with unsigned member 'frequency' of cpufreq_frequency_table. Update the macro definitions to be explicitly unsigned to match their use. This also corrects potentially wrong behavior of clk_rate_table_iter() if unsigned long is wider than usigned int. Fixes: 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*) Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2014-07-209-152/+69
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains updates for your net-next tree, they are: 1) Use kvfree() helper function from x_tables, from Eric Dumazet. 2) Remove extra timer from the conntrack ecache extension, use a workqueue instead to redeliver lost events to userspace instead, from Florian Westphal. 3) Removal of the ulog targets for ebtables and iptables. The nflog infrastructure superseded this almost 9 years ago, time to get rid of this code. 4) Replace the list of loggers by an array now that we can only have two possible non-overlapping logger flavours, ie. kernel ring buffer and netlink logging. 5) Move Eric Dumazet's log buffer code to nf_log to reuse it from all of the supported per-family loggers. 6) Consolidate nf_log_packet() as an unified interface for packet logging. After this patch, if the struct nf_loginfo is available, it explicitly selects the logger that is used. 7) Move ip and ip6 logging code from xt_LOG to the corresponding per-family loggers. Thus, x_tables and nf_tables share the same code for packet logging. 8) Add generic ARP packet logger, which is used by nf_tables. The format aims to be consistent with the output of xt_LOG. 9) Add generic bridge packet logger. Again, this is used by nf_tables and it routes the packets to the real family loggers. As a result, we get consistent logging format for the bridge family. The ebt_log logging code has been intentionally left in place not to break backward compatibility since the logging output differs from xt_LOG. 10) Update nft_log to explicitly request the required family logger when needed. 11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families. Allowing selection between netlink and kernel buffer ring logging. 12) Several fixes coming after the netfilter core logging changes spotted by robots. 13) Use IS_ENABLED() macros whenever possible in the netfilter tree, from Duan Jiong. 14) Removal of a couple of unnecessary branch before kfree, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | netfilter: nft_log: complete logging supportPablo Neira Ayuso2014-06-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the unified nf_log_packet() interface that allows us explicit logger selection through the nf_loginfo structure. If you specify the group attribute, this means you want to receive logging messages through nfnetlink_log. In that case, the snaplen and qthreshold attributes allows you to tune internal aspects of the netlink logging infrastructure. On the other hand, if the level is specified, then the plain text format through the kernel logging ring is used instead, which is also used by default if neither group nor level are indicated. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | netfilter: bridge: add generic packet loggerPablo Neira Ayuso2014-06-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the generic plain text packet loggger for bridged packets. It routes the logging message to the real protocol packet logger. I decided not to refactor the ebt_log code for two reasons: 1) The ebt_log output is not consistent with the IPv4 and IPv6 Netfilter packet loggers. The output is different for no good reason and it adds redundant code to handle packet logging. 2) To avoid breaking backward compatibility for applications outthere that are parsing the specific ebt_log output, the ebt_log output has been left as is. So only nftables will use the new consistent logging format for logged bridged packets. More decisions coming in this patch: 1) This also removes ebt_log as default logger for bridged packets. Thus, nf_log_packet() routes packet to this new packet logger instead. This doesn't break backward compatibility since nf_log_packet() is not used to log packets in plain text format from anywhere in the ebtables/netfilter bridge code. 2) The new bridge packet logger also performs a lazy request to register the real IPv4, ARP and IPv6 netfilter packet loggers. If the real protocol logger is no available (not compiled or the module is not available in the system, not packet logging happens. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | netfilter: log: nf_log_packet() as real unified interfacePablo Neira Ayuso2014-06-271-14/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, the nf_loginfo parameter specified the logging configuration in case the specified default logger was loaded. This patch updates the semantics of the nf_loginfo parameter in nf_log_packet() which now indicates the logger that you explicitly want to use. Thus, nf_log_packet() is exposed as an unified interface which internally routes the log message to the corresponding logger type by family. The module dependencies are expressed by the new nf_logger_find_get() and nf_logger_put() functions which bump the logger module refcount. Thus, you can not remove logger modules that are used by rules anymore. Another important effect of this change is that the family specific module is only loaded when required. Therefore, xt_LOG and nft_log will just trigger the autoload of the nf_log_{ip,ip6} modules according to the family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c filesPablo Neira Ayuso2014-06-271-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The plain text logging is currently embedded into the xt_LOG target. In order to be able to use the plain text logging from nft_log, as a first step, this patch moves the family specific code to the following files and Kconfig symbols: 1) net/ipv4/netfilter/nf_log_ip.c: CONFIG_NF_LOG_IPV4 2) net/ipv6/netfilter/nf_log_ip6.c: CONFIG_NF_LOG_IPV6 3) net/netfilter/nf_log_common.c: CONFIG_NF_LOG_COMMON These new modules will be required by xt_LOG and nft_log. This patch is based on original patch from Arturo Borrero Gonzalez. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | netfilter: nf_log: move log buffering to core loggingPablo Neira Ayuso2014-06-252-54/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves Eric Dumazet's log buffer implementation from the xt_log.h header file to the core net/netfilter/nf_log.c. This also includes the renaming of the structure and functions to avoid possible undesired namespace clashes. This change allows us to use it from the arp and bridge packet logging implementation in follow up patches.
| * | | | | | netfilter: nf_log: use an array of loggers instead of listPablo Neira Ayuso2014-06-251-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that legacy ulog targets are not available anymore in the tree, we can have up to two possible loggers: 1) The plain text logging via kernel logging ring. 2) The nfnetlink_log infrastructure which delivers log messages to userspace. This patch replaces the list of loggers by an array of two pointers per family for each possible logger and it also introduces a new field to the nf_logger structure which indicates the position in the logger array (based on the logger type). This prepares a follow up patch that consolidates the nf_log_packet() interface by allowing to specify the logger as parameter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | netfilter: kill ulog targetsPablo Neira Ayuso2014-06-254-89/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has been marked as deprecated for quite some time and the NFLOG target replacement has been also available since 2006. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | netfilter: conntrack: remove timer from ecache extensionFlorian Westphal2014-06-252-3/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This brings the (per-conntrack) ecache extension back to 24 bytes in size (was 152 byte on x86_64 with lockdep on). When event delivery fails, re-delivery is attempted via work queue. Redelivery is attempted at least every 0.1 seconds, but can happen more frequently if userspace is not congested. The nf_ct_release_dying_list() function is removed. With this patch, ownership of the to-be-redelivered conntracks (on-dying-list-with-DYING-bit not yet set) is with the work queue, which will release the references once event is out. Joint work with Pablo Neira Ayuso. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | | | net: print net_device reg_state in netdev_* unless it's registeredVeaceslav Falico2014-07-201-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This way we'll always know in what status the device is, unless it's running normally (i.e. NETDEV_REGISTERED). Also, emit a warning once in case of a bad reg_state. CC: "David S. Miller" <davem@davemloft.net> CC: Jason Baron <jbaron@akamai.com> CC: Eric Dumazet <edumazet@google.com> CC: Vlad Yasevich <vyasevic@redhat.com> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Joe Perches <joe@perches.com> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | net: use dev->name in netdev_pr* when it's availableVeaceslav Falico2014-07-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netdev_name() returns dev->name only when the net_device is in NETREG_REGISTERED state. However, dev->name is always populated on creation, so we can easily use it. There are two cases when there's no real name - when it's an empty string or when the name is in form of "eth%d", then netdev_name() returns "unnamed net_device". CC: "David S. Miller" <davem@davemloft.net> CC: Tom Gundersen <teg@jklm.no> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Acked-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Update setapp/getapp prototypes in dcbnl_rtnl_ops to return int instead of u8Anish Bhatt2014-07-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: fixed issue with checking return of dcbnl_rtnl_ops->getapp() Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | net: clean up some sparse endianness warnings in ipv6.hJeff Layton2014-07-161-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sparse is throwing warnings when building sunrpc modules due to some endianness shenanigans in ipv6.h. Specifically: CHECK net/sunrpc/addr.c include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer Sprinkle some endianness fixups to silence them. These should all get fixed up at compile time, so I don't think this will add any extra work to be done at runtime. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.David Held2014-07-161-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many multicast sources can have the same port which can result in a very large list when hashing by port only. Hash by address and port instead if this is the case. This makes multicast more similar to unicast. On a 24-core machine receiving from 500 multicast sockets on the same port, before this patch 80% of system CPU was used up by spin locking and only ~25% of packets were successfully delivered. With this patch, all packets are delivered and kernel overhead is ~8% system CPU on spinlocks. Signed-off-by: David Held <drheld@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | net: sctp: implement rfc6458, 8.1.31. SCTP_DEFAULT_SNDINFO supportGeir Ola Vaagland2014-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements section 8.1.31. of RFC6458, which adds support for setting/retrieving SCTP_DEFAULT_SNDINFO: Applications that wish to use the sendto() system call may wish to specify a default set of parameters that would normally be supplied through the inclusion of ancillary data. This socket option allows such an application to set the default sctp_sndinfo structure. The application that wishes to use this socket option simply passes the sctp_sndinfo structure (defined in Section 5.3.4) to this call. The input parameters accepted by this call include snd_sid, snd_flags, snd_ppid, and snd_context. The snd_flags parameter is composed of a bitwise OR of SCTP_UNORDERED, SCTP_EOF, and SCTP_SENDALL. The snd_assoc_id field specifies the association to which to apply the parameters. For a one-to-many style socket, any of the predefined constants are also allowed in this field. The field is ignored for one-to-one style sockets. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg supportGeir Ola Vaagland2014-07-164-19/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements section 5.3.6. of RFC6458, that is, support for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which is placed into ancillary data cmsghdr structure for each recvmsg() call, if this information is already available when delivering the current message. This option can be enabled/disabled via setsockopt(2) on SOL_SCTP level by setting an int value with 1/0 for SCTP_RECVNXTINFO in user space applications as per RFC6458, section 8.1.30. The sctp_nxtinfo structure is defined as per RFC as below ... struct sctp_nxtinfo { uint16_t nxt_sid; uint16_t nxt_flags; uint32_t nxt_ppid; uint32_t nxt_length; sctp_assoc_t nxt_assoc_id; }; ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | net: sctp: implement rfc6458, 5.3.5. SCTP_RCVINFO cmsg supportGeir Ola Vaagland2014-07-163-8/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements section 5.3.5. of RFC6458, that is, support for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is placed into ancillary data cmsghdr structure for each recvmsg() call. This option can be enabled/disabled via setsockopt(2) on SOL_SCTP level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user space applications as per RFC6458, section 8.1.29. The sctp_rcvinfo structure is defined as per RFC as below ... struct sctp_rcvinfo { uint16_t rcv_sid; uint16_t rcv_ssn; uint16_t rcv_flags; <-- 2 bytes hole --> uint32_t rcv_ppid; uint32_t rcv_tsn; uint32_t rcv_cumtsn; uint32_t rcv_context; sctp_assoc_t rcv_assoc_id; }; ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo. An sctp_rcvinfo item always corresponds to the data in msg_iov. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | net: sctp: implement rfc6458, 5.3.4. SCTP_SNDINFO cmsg supportGeir Ola Vaagland2014-07-162-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements section 5.3.4. of RFC6458, that is, support for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be placed into ancillary data cmsghdr structure for sendmsg() calls. The sctp_sndinfo structure is defined as per RFC as below ... struct sctp_sndinfo { uint16_t snd_sid; uint16_t snd_flags; uint32_t snd_ppid; uint32_t snd_context; sctp_assoc_t snd_assoc_id; }; ... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo. An sctp_sndinfo item always corresponds to the data in msg_iov. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-07-1626-55/+91
|\ \ \ \ \ \ \ | | |/ / / / / | |/| | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2014-07-161-4/+4
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A cpufreq lockup fix and a compiler warning fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix compiler warnings x86, tsc: Fix cpufreq lockup
| | * | | | | | sched: Fix compiler warningsGuenter Roeck2014-07-021-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 143e1e28cb (sched: Rework sched_domain topology definition) introduced a number of functions with a return value of 'const int'. gcc doesn't know what to do with that and, if the kernel is compiled with W=1, complains with the following warnings whenever sched.h is included. include/linux/sched.h:875:25: warning: type qualifiers ignored on function return type include/linux/sched.h:882:25: warning: type qualifiers ignored on function return type include/linux/sched.h:889:25: warning: type qualifiers ignored on function return type include/linux/sched.h:1002:21: warning: type qualifiers ignored on function return type Commits fb2aa855 (sched, ARM: Create a dedicated scheduler topology table) and 607b45e9a (sched, powerpc: Create a dedicated topology table) introduce the same warning in the arm and powerpc code. Drop 'const' from the function declarations to fix the problem. The fix for all three patches has to be applied together to avoid compilation failures for the affected architectures. Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1403658329-13196-1-git-send-email-linux@roeck-us.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-07-155-18/+9
| |\ \ \ \ \ \ \ | | |_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Bluetooth pairing fixes from Johan Hedberg. 2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB, from Max Stepanov. 3) New iwlwifi chip IDs, from Oren Givon. 4) bnx2x driver reads wrong PCI config space MSI register, from Yijing Wang. 5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu. 6) Fix double SKB free in openvswitch, from Andy Zhou. 7) Fix sk_dst_set() being racey with UDP sockets, leading to strange crashes, from Eric Dumazet. 8) Interpret the NAPI budget correctly in the new systemport driver, from Florian Fainelli. 9) VLAN code frees percpu stats in the wrong place, leading to crashes in the get stats handler. From Eric Dumazet. 10) TCP sockets doing a repair can crash with a divide by zero, because we invoke tcp_push() with an MSS value of zero. Just skip that part of the sendmsg paths in repair mode. From Christoph Paasch. 11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai. 12) Don't ignore path MTU icmp messages with a zero mtu, machines out there still spit them out, and all of our per-protocol handlers for PMTU can cope with it just fine. From Edward Allcutt. 13) Some NETDEV_CHANGE notifier invocations were not passing in the correct kind of cookie as the argument, from Loic Prylli. 14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul Maloy. 15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix from Dmitry Popov. 16) Fix skb->sk assigned without taking a reference to 'sk' in appletalk, from Andrey Utkin. 17) Fix some info leaks in ULP event signalling to userspace in SCTP, from Daniel Borkmann. 18) Fix deadlocks in HSO driver, from Olivier Sobrie. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) hso: fix deadlock when receiving bursts of data hso: remove unused workqueue net: ppp: don't call sk_chk_filter twice mlx4: mark napi id for gro_skb bonding: fix ad_select module param check net: pppoe: use correct channel MTU when using Multilink PPP neigh: sysctl - simplify address calculation of gc_* variables net: sctp: fix information leaks in ulpevent layer MAINTAINERS: update r8169 maintainer net: bcmgenet: fix RGMII_MODE_EN bit tipc: clear 'next'-pointer of message fragments before reassembly r8152: fix r8152_csum_workaround function be2net: set EQ DB clear-intr bit in be_open() GRE: enable offloads for GRE farsync: fix invalid memory accesses in fst_add_one() and fst_init_card() igb: do a reset on SR-IOV re-init if device is down igb: Workaround for i210 Errata 25: Slow System Clock usbnet: smsc95xx: add reset_resume function with reset operation dp83640: Always decode received status frames r8169: disable L23 ...
| | * | | | | | neigh: sysctl - simplify address calculation of gc_* variablesMathias Krause2014-07-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in neigh_sysctl_register() relies on a specific layout of struct neigh_table, namely that the 'gc_*' variables are directly following the 'parms' member in a specific order. The code, though, expresses this in the most ugly way. Get rid of the ugly casts and use the 'tbl' pointer to get a handle to the table. This way we can refer to the 'gc_*' variables directly. Similarly seen in the grsecurity patch, written by Brad Spengler. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | ieee802154: reassembly: fix possible buffer overflowAlexander Aring2014-07-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The max_dsize attribute in ctl_table for lowpan_frags_ns_ctl_table is configured with integer accessing methods. This patch change the max_dsize attribute to int to avoid a possible buffer overflow. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ ↵Amir Vadai2014-07-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | affinity map IRQ affinity notifier can only have a single notifier - cpu_rmap notifier. Can't use it to track changes in IRQ affinity map. Detect IRQ affinity changes by comparing CPU to current IRQ affinity map during NAPI poll thread. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Hutchings <ben@decadent.org.uk> Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | net: fix sparse warning in sk_dst_set()Eric Dumazet2014-07-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk_dst_cache has __rcu annotation, so we need a cast to avoid following sparse error : include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces) include/net/sock.h:1774:19: expected struct dst_entry [noderef] <asn:4>*__ret include/net/sock.h:1774:19: got struct dst_entry *dst Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 7f502361531e ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix") Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | net: fix circular dependency in of_mdio codeDaniel Mack2014-07-021-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 86f6cf4127 (net: of_mdio: add of_mdiobus_link_phydev()) introduced a circular dependency between libphy and of_mdio. depmod: ERROR: <modroot>/kernel/drivers/net/phy/libphy.ko in dependency cycle! depmod: ERROR: <modroot>/kernel/drivers/of/of_mdio.ko in dependency cycle! The problem is that of_mdio.c references &mdio_bus_type and libphy now references of_mdiobus_link_phydev. Fix this by not exporting of_mdiobus_link_phydev() from of_mdio.ko. Make it a static function in mdio_bus.c instead. Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-by: Jeff Mahoney <jeffm@suse.com> Fixes: 86f6cf4127 (net: of_mdio: add of_mdiobus_link_phydev()) Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fixEric Dumazet2014-06-301-6/+6
| | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have two different ways to handle changes to sk->sk_dst First way (used by TCP) assumes socket lock is owned by caller, and use no extra lock : __sk_dst_set() & __sk_dst_reset() Another way (used by UDP) uses sk_dst_lock because socket lock is not always taken. Note that sk_dst_lock is not softirq safe. These ways are not inter changeable for a given socket type. ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used the socket lock as synchronization, but users might be UDP sockets. Instead of converting sk_dst_lock to a softirq safe version, use xchg() as we did for sk_rx_dst in commit e47eb5dfb296b ("udp: ipv4: do not use sk_dst_lock from softirq context") In a follow up patch, we probably can remove sk_dst_lock, as it is only used in IPv6. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Fixes: 9cb3a50c5f63e ("ipv4: Invalidate the socket cached route on pmtu events if possible") Signed-off-by: David S. Miller <davem@davemloft.net>