summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2014-06-123-3/+209
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more locking changes from Ingo Molnar: "This is the second round of locking tree updates for v3.16, offering large system scalability improvements: - optimistic spinning for rwsems, from Davidlohr Bueso. - 'qrwlocks' core code and x86 enablement, from Waiman Long and PeterZ" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, locking/rwlocks: Enable qrwlocks on x86 locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks locking/mutexes: Documentation update/rewrite locking/rwsem: Fix checkpatch.pl warnings locking/rwsem: Fix warnings for CONFIG_RWSEM_GENERIC_SPINLOCK locking/rwsem: Support optimistic spinning
| * locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocksWaiman Long2014-06-062-0/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This rwlock uses the arch_spin_lock_t as a waitqueue, and assuming the arch_spin_lock_t is a fair lock (ticket,mcs etc..) the resulting rwlock is a fair lock. It fits in the same 8 bytes as the regular rwlock_t by folding the reader and writer count into a single integer, using the remaining 4 bytes for the arch_spinlock_t. Architectures that can single-copy adress bytes can optimize queue_write_unlock() with a 0 write to the LSB (the write count). Performance as measured by Davidlohr Bueso (rwlock_t -> qrwlock_t): +--------------+-------------+---------------+ | Workload | #users | delta | +--------------+-------------+---------------+ | alltests | > 1400 | -4.83% | | custom | 0-100,> 100 | +1.43%,-1.57% | | high_systime | > 1000 | -2.61 | | shared | all | +0.32 | +--------------+-------------+---------------+ http://www.stgolabs.net/qrwlock-stuff/aim7-results-vs-rwsem_optsin/ Signed-off-by: Waiman Long <Waiman.Long@hp.com> [peterz: near complete rewrite] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/n/tip-gac1nnl3wvs2ij87zv2xkdzq@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * locking/rwsem: Fix warnings for CONFIG_RWSEM_GENERIC_SPINLOCKDavidlohr Bueso2014-06-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimistic spinning is only used by the xadd variant of rw-semaphores. Make sure that we use the old version of the __RWSEM_INITIALIZER macro for systems that rely on the spinlock one, otherwise warnings can be triggered, such as the following reported on an arm box: ipc/ipcns_notifier.c:22:8: warning: excess elements in struct initializer [enabled by default] ipc/ipcns_notifier.c:22:8: warning: (near initialization for 'ipcns_chain.rwsem') [enabled by default] ipc/ipcns_notifier.c:22:8: warning: excess elements in struct initializer [enabled by default] ipc/ipcns_notifier.c:22:8: warning: (near initialization for 'ipcns_chain.rwsem') [enabled by default] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Alex Shi <alex.shi@linaro.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jason Low <jason.low2@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fusionio.com> Link: http://lkml.kernel.org/r/1400545677.6399.10.camel@buesod1.americas.hpqcorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * locking/rwsem: Support optimistic spinningDavidlohr Bueso2014-06-051-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have reached the point where our mutexes are quite fine tuned for a number of situations. This includes the use of heuristics and optimistic spinning, based on MCS locking techniques. Exclusive ownership of read-write semaphores are, conceptually, just about the same as mutexes, making them close cousins. To this end we need to make them both perform similarly, and right now, rwsems are simply not up to it. This was discovered by both reverting commit 4fc3f1d6 (mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable) and similarly, converting some other mutexes (ie: i_mmap_mutex) to rwsems. This creates a situation where users have to choose between a rwsem and mutex taking into account this important performance difference. Specifically, biggest difference between both locks is when we fail to acquire a mutex in the fastpath, optimistic spinning comes in to play and we can avoid a large amount of unnecessary sleeping and overhead of moving tasks in and out of wait queue. Rwsems do not have such logic. This patch, based on the work from Tim Chen and I, adds support for write-side optimistic spinning when the lock is contended. It also includes support for the recently added cancelable MCS locking for adaptive spinning. Note that is is only applicable to the xadd method, and the spinlock rwsem variant remains intact. Allowing optimistic spinning before putting the writer on the wait queue reduces wait queue contention and provided greater chance for the rwsem to get acquired. With these changes, rwsem is on par with mutex. The performance benefits can be seen on a number of workloads. For instance, on a 8 socket, 80 core 64bit Westmere box, aim7 shows the following improvements in throughput: +--------------+---------------------+-----------------+ | Workload | throughput-increase | number of users | +--------------+---------------------+-----------------+ | alltests | 20% | >1000 | | custom | 27%, 60% | 10-100, >1000 | | high_systime | 36%, 30% | >100, >1000 | | shared | 58%, 29% | 10-100, >1000 | +--------------+---------------------+-----------------+ There was also improvement on smaller systems, such as a quad-core x86-64 laptop running a 30Gb PostgreSQL (pgbench) workload for up to +60% in throughput for over 50 clients. Additionally, benefits were also noticed in exim (mail server) workloads. Furthermore, no performance regression have been seen at all. Based-on-work-from: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> [peterz: rej fixup due to comment patches, sched/rt.h header] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Scott J Norton" <scott.norton@hp.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fusionio.com> Link: http://lkml.kernel.org/r/1399055055.6275.15.camel@buesod1.americas.hpqcorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-06-12108-540/+2197
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
| * | net_sched: drr: warn when qdisc is not work conservingFlorian Westphal2014-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DRR scheduler requires that items on the active list are work conserving, i.e. do not hold on to skbs for throttling purposes, etc. Attaching e.g. tbf renders DRR useless because all other classes on the active list are delayed as well. So, warn users that this configuration won't work as expected; we already do this in couple of other qdiscs, see e.g. commit b00355db3f88d96810a60011a30cfb2c3469409d ('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler') The 'const' change is needed to avoid compiler warning ("discards 'const' qualifier from pointer target type"). tested with: drr_hier() { parent=$1 classes=$2 for i in $(seq 1 $classes); do classid=$parent$(printf %x $i) tc class add dev eth0 parent $parent classid $classid drr tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit done } tc qdisc add dev eth0 root handle 1: drr drr_hier 1: 32 tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Save software checksum completeTom Herbert2014-06-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In skb_checksum complete, if we need to compute the checksum for the packet (via skb_checksum) save the result as CHECKSUM_COMPLETE. Subsequent checksum verification can use this. Also, added csum_complete_sw flag to distinguish between software and hardware generated checksum complete, we should always be able to trust the software computation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Preserve CHECKSUM_COMPLETE at validationTom Herbert2014-06-111-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when the first checksum in a packet is validated using CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY so that any subsequent checksums in the packet are not correctly validated. This patch adds csum_valid flag in sk_buff and uses that to indicate validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit is set accordingly in the skb_checksum_validate_* functions. The flag is checked in skb_checksum_complete, so that validation is communicated between checksum_init and checksum_complete sequence in TCP and UDP. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ip_vti: fix sparse warnings for VTI_ISVTIDmitry Popov2014-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following sparse warnings: net/ipv4/ip_tunnel.c:245:53: warning: restricted __be16 degrades to integer net/ipv4/ip_vti.c:321:19: warning: incorrect type in assignment (different base types) net/ipv4/ip_vti.c:321:19: expected restricted __be16 [addressable] [assigned] [usertype] i_flags net/ipv4/ip_vti.c:321:19: got int net/ipv4/ip_vti.c:447:24: warning: incorrect type in assignment (different base types) net/ipv4/ip_vti.c:447:24: expected restricted __be16 [usertype] i_flags net/ipv4/ip_vti.c:447:24: got int Since VTI_ISVTI is always used with ip_tunnel_parm->i_flags (which is __be16), we can __force cast VTI_ISVTI to __be16 in header file. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: add __pskb_copy_fclone and pskb_copy_for_cloneOctavian Purdila2014-06-111-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several instances where a pskb_copy or __pskb_copy is immediately followed by an skb_clone. Add a couple of new functions to allow the copy skb to be allocated from the fclone cache and thus speed up subsequent skb_clone calls. Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | cpumask: Utility function to set n'th cpu - local cpu firstAmir Vadai2014-06-111-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function sets the n'th cpu - local cpu's first. For example: in a 16 cores server with even cpu's local, will get the following values: cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set ... cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set ... cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set Curently this function will be used by multi queue networking devices to calculate the irq affinity mask, such that as many local cpu's as possible will be utilized to handle the mq device irq's. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sctp: migrate most recently used transport to ktimeDaniel Borkmann2014-06-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Be more precise in transport path selection and use ktime helpers instead of jiffies to compare and pick the better primary and secondary recently used transports. This also avoids any side-effects during a possible roll-over, and could lead to better path decision-making. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ktime: add ktime_after and ktime_before helperDaniel Borkmann2014-06-111-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | Add two minimal helper functions analogous to time_before() and time_after() that will later on both be needed by SCTP code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: filter: cleanup A/X name usageAlexei Starovoitov2014-06-111-72/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The macro 'A' used in internal BPF interpreter: #define A regs[insn->a_reg] was easily confused with the name of classic BPF register 'A', since 'A' would mean two different things depending on context. This patch is trying to clean up the naming and clarify its usage in the following way: - A and X are names of two classic BPF registers - BPF_REG_A denotes internal BPF register R0 used to map classic register A in internal BPF programs generated from classic - BPF_REG_X denotes internal BPF register R7 used to map classic register X in internal BPF programs generated from classic - internal BPF instruction format: struct sock_filter_int { __u8 code; /* opcode */ __u8 dst_reg:4; /* dest register */ __u8 src_reg:4; /* source register */ __s16 off; /* signed offset */ __s32 imm; /* signed immediate constant */ }; - BPF_X/BPF_K is 1 bit used to encode source operand of instruction In classic: BPF_X - means use register X as source operand BPF_K - means use 32-bit immediate as source operand In internal: BPF_X - means use 'src_reg' register as source operand BPF_K - means use 32-bit immediate as source operand Suggested-by: Chema Gonzalez <chema@google.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: memorize and export selected IGMP/MLD querier portLinus Lüssing2014-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in IGMP/MLD queriers to be able to reliably serve any multicast listener behind this same bridge. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: add export of multicast database adjacent to net_devLinus Lüssing2014-06-101-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this new, exported function br_multicast_list_adjacent(net_dev) a list of IPv4/6 addresses is returned. This list contains all multicast addresses sensed by the bridge multicast snooping feature on all bridge ports of the bridge interface of net_dev, excluding addresses from the specified net_device itself. Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in multicast listeners to be able to reliably serve them with multicast packets. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: add gfp parameter to tcp_fragmentOctavian Purdila2014-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcp_fragment can be called from process context (from tso_fragment). Add a new gfp parameter to allow it to preserve atomic memory if possible. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: phy: fix sparse warning in fixed.cKonrad Zapalowicz2014-06-051-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit fixes the following sparse warning: drivers/net/phy/fixed.c:207 - warning: symbol 'fixed_phy_del' was not declared. Should it be static? by adding symbol definition to the phy_fixed.h API file. It is ok to do because the function in question is an exported symbol. Signed-off-by: Konrad Zapalowicz <bergo.torino@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | isdn/capi: move capi_info2str to capidrv.cPaul Bolle2014-06-041-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | capi_info2str() is apparently meant to be of general utility. It is actually only used in capidrv.c. So move it from capiutil.c to capidrv.c and (obviously) stop exporting it. And, since we're touching this, merge the two versions of this function. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | vxlan: Add support for UDP checksums (v4 sending, v6 zero csums)Tom Herbert2014-06-042-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | gre: Call gso_make_checksumTom Herbert2014-06-043-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call gso_make_checksum. This should have the benefit of using a checksum that may have been previously computed for the packet. This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that offload GRE GSO with and without the GRE checksum offloaed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Add GSO support for UDP tunnels with checksumTom Herbert2014-06-042-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Support for multiple checksums with gsoTom Herbert2014-06-041-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating a GSO packet segment we may need to set more than one checksum in the packet (for instance a TCP checksum and UDP checksum for VXLAN encapsulation). To be efficient, we want to do checksum calculation for any part of the packet at most once. This patch adds csum_start offset to skb_gso_cb. This tracks the starting offset for skb->csum which is initially set in skb_segment. When a protocol needs to compute a transport checksum it calls gso_make_checksum which computes the checksum value from the start of transport header to csum_start and then adds in skb->csum to get the full checksum. skb->csum and csum_start are then updated to reflect the checksum of the resultant packet starting from the transport header. This patch also adds a flag to skbuff, encap_hdr_csum, which is set in *gso_segment fucntions to indicate that a tunnel protocol needs checksum calculation Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | udp: Generic functions to set checksumTom Herbert2014-06-042-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Added udp_set_csum and udp6_set_csum functions to set UDP checksums in packets. These are for simple UDP packets such as those that might be created in UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xen-net{back, front}: Document multi-queue feature in netif.hAndrew J. Bennieston2014-06-041-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document the multi-queue feature in terms of XenStore keys to be written by the backend and by the frontend. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-06-039-6/+12
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ \ Merge branch 'ethtool-rssh-fixes' of ↵David S. Miller2014-06-022-25/+24
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next Ben Hutchings says: ==================== Pull request: Fixes for new ethtool RSS commands This addresses several problems I previously identified with the new ETHTOOL_{G,S}RSSH commands: 1. Missing validation of reserved parameters 2. Vague documentation 3. Use of unnamed magic number 4. No consolidation with existing driver operations I don't currently have access to suitable network hardware, but have tested these changes with a dummy driver that can support various combinations of operations and sizes, together with (a) Debian's ethtool 3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH and minor adjustment for fixes 1 and 3. v2: Update RSS operations in vmxnet3 too ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh()Ben Hutchings2014-06-031-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers regardless of whether they expose the hash key, unless you try to set a hash key for a driver that doesn't expose it. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| | * | | ethtool, be2net: constify array pointer parameters to ethtool_ops::set_rxfhBen Hutchings2014-05-191-1/+2
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| | * | | ethtool: Disallow ETHTOOL_SRSSH with both indir table and hash key unchangedBen Hutchings2014-05-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This would be a no-op, so there is no reason to request it. This also allows conversion of the current implementations of ethtool_ops::{get,set}_rxfh_indir to ethtool_ops::{get,set}_rxfh with no change other than their parameters. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| | * | | ethtool: Expand documentation of ethtool_ops::{get,set}_rxfh()Ben Hutchings2014-05-191-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some corner-cases are not explained properly. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| | * | | ethtool: Improve explanation of the two arrays following struct ethtool_rxfhBen Hutchings2014-05-191-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of two variable-length arrays is unusual so deserves a bit more explanation. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| | * | | ethtool: Name the 'no change' value for setting RSS hash key but not indir tableBen Hutchings2014-05-191-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We usually allocate special values of u32 fields starting from the top down, so also change the value to 0xffffffff. As these operations haven't been included in a stable release yet, it's not too late to change. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| * | | | bridge: Add bridge ifindex to bridge fdb notify msgsRoopa Prabhu2014-06-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (This patch was previously posted as RFC at http://patchwork.ozlabs.org/patch/352677/) This patch adds NDA_MASTER attribute to neighbour attributes enum for bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs. Today bridge fdb notifications dont contain bridge information. Userspace can derive it from the port information in the fdb notification. However this is tricky in some scenarious. Example, bridge port delete notification comes before bridge fdb delete notifications. And we have seen problems in userspace when using libnl where, the bridge fdb delete notification handling code does not understand which bridge this fdb entry is part of because the bridge and port association has already been deleted. And these notifications (port membership and fdb) are generated on separate rtnl groups. Fixing the order of notifications could possibly solve the problem for some cases (I can submit a separate patch for that). This patch chooses to add NDA_MASTER to bridge fdb notify msgs because it not only solves the problem described above, but also helps userspace avoid another lookup into link msgs to derive the master index. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | macvlan: add netpoll supportdingtianhong2014-06-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add netpoll support to macvlan devices. Based on the netpoll support in the 802.1q vlan code. Tested and macvlan could work well with netconsole. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Merge branch 'for-davem' of ↵David S. Miller2014-06-026-3/+97
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-06-02 Please pull this remaining batch of updates intended for the 3.16 stream... For the mac80211 bits, Johannes says: "The remainder for -next right now is mostly fixes, and a handful of small new things like some CSA infrastructure, the regdb script mW/dBm conversion change and sending wiphy notifications." For the bluetooth bits, Gustavo says: "Some more patches for 3.16. There is nothing really special here, just a bunch of clean ups, fixes plus some small improvements. Please pull." For the nfc bits, Samuel says: "We have: - Felica (Type3) tags support for trf7970a - Type 4b tags support for port100 - st21nfca DTS typo fix - A few sparse warning fixes" For the atheros bits, Kalle says: "Ben added support for setting antenna configurations. Michal improved warm reset so that we would not need to fall back to cold reset that often, an issue where ath10k stripped protected flag while in monitor mode and made module initialisation asynchronous to fix the problems with firmware loading when the driver is linked to the kernel. Luca removed unused channel_switch_beacon callbacks both from ath9k and ath10k. Marek fixed Protected Management Frames (PMF) when using Action Frames. Also we had other small fixes everywhere in the driver." Along with that, there are a handful of updates to a variety of drivers. This includes updates to at76c50x-usb, ath9k, b43, brcmfmac, mwifiex, rsi, rtlwifi, and wil6210. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * \ \ \ Merge branch 'master' of ↵John W. Linville2014-06-026-3/+97
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
| | | * \ \ \ Merge branch 'for-upstream' of ↵John W. Linville2014-05-294-3/+50
| | | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Conflicts: drivers/bluetooth/btusb.c
| | | | * | | | Bluetooth: Clearly distinguish mgmt LTK type from authenticated propertyJohan Hedberg2014-05-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the mgmt level we have a key type parameter which currently accepts two possible values: 0x00 for unauthenticated and 0x01 for authenticated. However, in the internal struct smp_ltk representation we have an explicit "authenticated" boolean value. To make this distinction clear, add defines for the possible mgmt values and do conversion to and from the internal authenticated value. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | | | * | | | Bluetooth: Store max TX power level for connectionAndrzej Kaczmarek2014-05-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to store local maximum TX power level for connection when reply for HCI_Read_Transmit_Power_Level is received. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | | | * | | | Bluetooth: Add support to get connection informationAndrzej Kaczmarek2014-05-152-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for Get Connection Information mgmt command which can be used to query for information about connection, i.e. RSSI and local TX power level. In general values cached in hci_conn are returned as long as they are considered valid, i.e. do not exceed age limit set in hdev. This limit is calculated as random value between min/max values to avoid client trying to guess when to poll for updated information. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | | | * | | | Bluetooth: Add conn info lifetime parameters to debugfsAndrzej Kaczmarek2014-05-151-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds conn_info_min_age and conn_info_max_age parameters to debugfs which determine lifetime of connection information. Actual lifetime will be random value between min and max age. Default values for min and max age are 1000ms and 3000ms respectively. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | | | * | | | Bluetooth: Store TX power level for connectionAndrzej Kaczmarek2014-05-092-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to store local TX power level for connection when reply for HCI_Read_Transmit_Power_Level is received. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | | | * | | | Bluetooth: Store RSSI for connectionAndrzej Kaczmarek2014-05-082-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to store RSSI for connection when reply for HCI_Read_RSSI is received. Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | | | * | | | Bluetooth: Convert RFCOMM spinlocks into mutexesLibor Pechacek2014-05-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enabling CONFIG_DEBUG_ATOMIC_SLEEP has shown that some rfcomm functions acquiring spinlocks call sleeping locks further in the chain. Converting the offending spinlocks into mutexes makes sleeping safe. Signed-off-by: Libor Pechacek <lpechacek@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | | * | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-nextJohn W. Linville2014-05-292-0/+47
| | | |\ \ \ \ \
| | | | * | | | | wireless: add missing WLAN_EID_BSS_INTOLERANT_CHL_REPORTJes Sorensen2014-05-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | | | * | | | | mac80211: add a single-transaction driver op to switch contextsLuciano Coelho2014-05-261-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases, when the driver is already using all the channel contexts it can handle at once, we have to do an in-place switch (ie. we cannot afford using an extra context temporarily for the transaction). But some drivers may not support switching the channel context assigned to a vif on the fly (ie. without unassigning and assigning it) while others may only work if the context is changed on the fly, without unassigning it first. To allow these different scenarios, add a new driver operation that let's the driver decide how to handle an in-place switch. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | | | | | | inetpeer: get rid of ip_id_countEric Dumazet2014-06-024-41/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: Add support for device specific address syncingAlexander Duyck2014-06-021-0/+73
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change provides a function to be used in order to break the ndo_set_rx_mode call into a set of address add and remove calls. The code is based on the implementation of dev_uc_sync/dev_mc_sync. Since they essentially do the same thing but with only one dev I simply named my functions __dev_uc_sync/__dev_mc_sync. I also implemented an unsync version of the functions as well to allow for cleanup on close. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>