summaryrefslogtreecommitdiffstats
path: root/include/linux/skbuff.h
Commit message (Collapse)AuthorAgeFilesLines
* net: remove skb_orphan_try()Eric Dumazet2012-07-161-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 62b1a8ab9b3660bb820d8dfe23148ed6cda38574 ] Orphaning skb in dev_hard_start_xmit() makes bonding behavior unfriendly for applications sending big UDP bursts : Once packets pass the bonding device and come to real device, they might hit a full qdisc and be dropped. Without orphaning, the sender is automatically throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming sk_sndbuf is not too big) We could try to defer the orphaning adding another test in dev_hard_start_xmit(), but all this seems of little gain, now that BQL tends to make packets more likely to be parked in Qdisc queues instead of NIC TX ring, in cases where performance matters. Reverts commits : fc6055a5ba31 net: Introduce skb_orphan_try() 87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try() and removes SKBTX_DRV_NEEDS_SK_REF flag Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* skb: avoid unnecessary reallocations in __skb_cowFelix Fietkau2012-06-101-2/+0
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 617c8c11236716dcbda877e764b7bf37c6fd8063 ] At the beginning of __skb_cow, headroom gets set to a minimum of NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not cloned and the headroom is just below NET_SKB_PAD, but still more than the amount requested by the caller. This was showing up frequently in my tests on VLAN tx, where vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN). Locally generated packets should have enough headroom, and for forward paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to add any extra space here. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: fix two typos in skbuff.hEric Dumazet2012-05-011-2/+2
| | | | | | | fix kernel doc typos in function names Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* skbuff: struct ubuf_info callback type safetyMichael S. Tsirkin2012-04-131-3/+4
| | | | | | | | | | | The skb struct ubuf_info callback gets passed struct ubuf_info itself, not the arg value as the field name and the function signature seem to imply. Rename the arg field to ctx to match usage, add documentation and change the callback argument type to make usage clear and to have compiler check correctness. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: avoid order-1 allocations on wifi and tx pathEric Dumazet2012-04-111-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marc Merlin reported many order-1 allocations failures in TX path on its wireless setup, that dont make any sense with MTU=1500 network, and non SG capable hardware. After investigation, it turns out TCP uses sk_stream_alloc_skb() and used as a convention skb_tailroom(skb) to know how many bytes of data payload could be put in this skb (for non SG capable devices) Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER + sizeof(struct skb_shared_info) being above 2048) Later, mac80211 layer need to add some bytes at the tail of skb (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is available has to call pskb_expand_head() and request order-1 allocations. This patch changes sk_stream_alloc_skb() so that only sk->sk_prot->max_header bytes of headroom are reserved, and use a new skb field, avail_size to hold the data payload limit. This way, order-0 allocations done by TCP stack can leave more than 2 KB of tailroom and no more allocation is performed in mac80211 layer (or any layer needing some tailroom) avail_size is unioned with mark/dropcount, since mark will be set later in IP stack for output packets. Therefore, skb size is unchanged. Reported-by: Marc MERLIN <marc@merlins.org> Tested-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds2012-03-281-1/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
| * Remove all #inclusions of asm/system.hDavid Howells2012-03-281-1/+0
| | | | | | | | | | | | | | | | | | Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-03-271-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Name string overrun fix in gianfar driver from Joe Perches. 2) VHOST bug fixes from Michael S. Tsirkin and Nadav Har'El 3) Fix dependencies on xt_LOG netfilter module, from Pablo Neira Ayuso. 4) Fix RCU locking in xt_CT, also from Pablo Neira Ayuso. 5) Add a parameter to skb_add_rx_frag() so we can fix the truesize adjustments in the drivers that use it. The individual drivers aren't fixed by this commit, but will be dealt with using follow-on commits. From Eric Dumazet. 6) Add some device IDs to qmi_wwan driver, from Andrew Bird. 7) Fix a potential rcu_read_lock() imbalancein rt6_fill_node(). From Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: fix a potential rcu_read_lock() imbalance in rt6_fill_node() net: add a truesize parameter to skb_add_rx_frag() gianfar: Fix possible overrun and simplify interrupt name field creation USB: qmi_wwan: Add ZTE (Vodafone) K3570-Z and K3571-Z net interfaces USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces USB: qmi_wwan: Add ZTE (Vodafone) K3565-Z and K4505-Z net interfaces qlcnic: Bug fix for LRO netfilter: nf_conntrack: permanently attach timeout policy to conntrack netfilter: xt_CT: fix assignation of the generic protocol tracker netfilter: xt_CT: missing rcu_read_lock section in timeout assignment netfilter: cttimeout: fix dependency with l4protocol conntrack module netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6 vhost: fix release path lockdep checks vhost: don't forget to schedule() tools/virtio: stub out strong barriers tools/virtio: add linux/hrtimer.h stub tools/virtio: add linux/module.h stub
| * | net: add a truesize parameter to skb_add_rx_frag()Eric Dumazet2012-03-251-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | skb_add_rx_frag() API is misleading. Network skbs built with this helper can use uncharged kernel memory and eventually stress/crash machine in OOM. Add a 'truesize' parameter and then fix drivers in followup patches. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'bug-for-3.4' of ↵Linus Torvalds2012-03-241-0/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/bug.h> cleanup from Paul Gortmaker: "The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: *** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/*.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414" Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul and linux-next. * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: kernel.h: doesn't explicitly use bug.h, so don't include it. bug: consolidate BUILD_BUG_ON with other bug code BUG: headers with BUG/BUG_ON etc. need linux/bug.h bug.h: add include of it to various implicit C users lib: fix implicit users of kernel.h for TAINT_WARN spinlock: macroize assert_spin_locked to avoid bug.h dependency x86: relocate get/set debugreg fcns to include/asm/debugreg.
| * BUG: headers with BUG/BUG_ON etc. need linux/bug.hPaul Gortmaker2012-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* | net: update the usage of CHECKSUM_UNNECESSARYYi Zou2012-03-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by Ben, this adds the clarification on the usage of CHECKSUM_UNNECESSARY on the outgoing patch. Also add the usage description of NETIF_F_FCOE_CRC and CHECKSUM_UNNECESSARY for the kernel FCoE protocol driver. This is a follow-up to the following: http://patchwork.ozlabs.org/patch/147315/ Signed-off-by: Yi Zou <yi.zou@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: www.Open-FCoE.org <devel@open-fcoe.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Use bool in skbuff.h helper functions.David S. Miller2012-03-091-3/+3
| | | | | | | | | | | | | | In particular do this for skb_is_nonlinear(), skb_is_gso(), and skb_is_gso_v6(). Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-02-261-0/+10
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/sfc/rx.c Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change the rx_buf->is_page boolean into a set of u16 flags, and another to adjust how ->ip_summed is initialized. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipsec: be careful of non existing mac headersEric Dumazet2012-02-231-0/+10
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Add framework to allow sending packets with customized CRC.Ben Greear2012-02-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | This is useful for testing RX handling of frames with bad CRCs. Requires driver support to actually put the packet on the wire properly. Signed-off-by: Ben Greear <greearb@candelatech.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | skb: Add skb_peek_next helperPavel Emelyanov2012-02-211-0/+18
| | | | | | | | | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | datagram: Add offset argument to __skb_recv_datagramPavel Emelyanov2012-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | This one is only considered for MSG_PEEK flag and the value pointed by it specifies where to start peeking bytes from. If the offset happens to point into the middle of the returned skb, the offset within this skb is put back to this very argument. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | skbuff: Move rxhash and vlan_tci to consolidate holes in sk_buffAlexander Duyck2012-02-101-4/+5
|/ | | | | | | | | | This change helps to reduce the overall size of the sk_buff by moving rxhash and vlan_tci so that the u16 values and u8 bitfields can be better combined to create only one hole instead of multiple. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* net: pack skb_shared_info more efficientlyIan Campbell2012-01-051-3/+3
| | | | | | | | | | | | nr_frags can be 8 bits since 256 is plenty of fragments. This allows it to be packed with tx_flags. Also by moving ip6_frag_id and dataref (both 4 bytes) next to each other we can avoid a hole between ip6_frag_id and frag_list on 64 bit systems. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: only use a single page of slop in MAX_SKB_FRAGSIan Campbell2011-12-231-4/+8
| | | | | | | | In order to accommodate a 64K buffer we need 64K/PAGE_SIZE plus one more page in order to allow for a buffer which does not start on a page boundary. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: take care of misalignmentsEric Dumazet2011-12-041-2/+9
| | | | | | | | | | | | | | | | | | | | | We discovered that TCP stack could retransmit misaligned skbs if a malicious peer acknowledged sub MSS frame. This currently can happen only if output interface is non SG enabled : If SG is enabled, tcp builds headless skbs (all payload is included in fragments), so the tcp trimming process only removes parts of skb fragments, header stay aligned. Some arches cant handle misalignments, so force a head reallocation and shrink headroom to MAX_TCP_HEADER. Dont care about misaligments on x86 and PPC (or other arches setting NET_IP_ALIGN to 0) This patch introduces __pskb_copy() which can specify the headroom of new head, and pskb_copy() becomes a wrapper on top of __pskb_copy() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove netdev_alloc_page and use __GFP_COLDEric Dumazet2011-11-221-32/+0
| | | | | | | | | | | | | | | | Given we dont use anymore the struct net_device *dev argument, and this interface brings litle benefit, remove netdev_{alloc|free}_page(), to debloat include/linux/skbuff.h a bit. (Some drivers used a mix of these interfaces and alloc_pages()) When allocating a page given to device for DMA transfer (device to memory), it makes sense to use a cold one (__GFP_COLD) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵John W. Linville2011-11-171-2/+17
|\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: include/net/bluetooth/bluetooth.h
| * net: add wireless TX status socket optionJohannes Berg2011-11-091-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 802.1X EAPOL handshake hostapd does requires knowing whether the frame was ack'ed by the peer. Currently, we fudge this pretty badly by not even transmitting the frame as a normal data frame but injecting it with radiotap and getting the status out of radiotap monitor as well. This is rather complex, confuses users (mon.wlan0 presence) and doesn't work with all hardware. To get rid of that hack, introduce a real wifi TX status option for data frame transmissions. This works similar to the existing TX timestamping in that it reflects the SKB back to the socket's error queue with a SCM_WIFI_STATUS cmsg that has an int indicating ACK status (0/1). Since it is possible that at some point we will want to have TX timestamping and wifi status in a single errqueue SKB (there's little point in not doing that), redefine SO_EE_ORIGIN_TIMESTAMPING to SO_EE_ORIGIN_TXSTATUS which can collect more than just the timestamp; keep the old constant as an alias of course. Currently the internal APIs don't make that possible, but it wouldn't be hard to split them up in a way that makes it possible. Thanks to Neil Horman for helping me figure out the functions that add the control messages. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | net: remove NETIF_F_NO_CSUM feature bitMichał Mirosław2011-11-161-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Only distinct use is checking if NETIF_F_NOCACHE_COPY should be enabled by default. The check heuristics is altered a bit here, so it hits other people than before. The default shouldn't be trusted for performance-critical cases anyway. For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: introduce and use netdev_features_t for device features setsMichał Mirosław2011-11-161-1/+3
| | | | | | | | | | | | | | | | | | | | v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: introduce build_skb()Eric Dumazet2011-11-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the thing we discussed during netdev 2011 conference was the idea to change some network drivers to allocate/populate their skb at RX completion time, right before feeding the skb to network stack. In old days, we allocated skbs when populating the RX ring. This means bringing into cpu cache sk_buff and skb_shared_info cache lines (since we clear/initialize them), then 'queue' skb->data to NIC. By the time NIC fills a frame in skb->data buffer and host can process it, cpu probably threw away the cache lines from its caches, because lot of things happened between the allocation and final use. So the deal would be to allocate only the data buffer for the NIC to populate its RX ring buffer. And use build_skb() at RX completion to attach a data buffer (now filled with an ethernet frame) to a new skb, initialize the skb_shared_info portion, and give the hot skb to network stack. build_skb() is the function to allocate an skb, caller providing the data buffer that should be attached to it. Drivers are expected to call skb_reserve() right after build_skb() to adjust skb->data to the Ethernet frame (usually skipping NET_SKB_PAD and NET_IP_ALIGN, but some drivers might add a hardware provided alignment) Data provided to build_skb() MUST have been allocated by a prior kmalloc() call, with enough room to add SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) bytes at the end of the data without corrupting incoming frame. data = kmalloc(NET_SKB_PAD + NET_IP_ALIGN + 1536 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), GFP_ATOMIC); ... skb = build_skb(data); if (!skb) { recycle_data(data); } else { skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN); ... } Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Eilon Greenstein <eilong@broadcom.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Tom Herbert <therbert@google.com> CC: Jamal Hadi Salim <hadi@mojatatu.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Thomas Graf <tgraf@infradead.org> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | include: linux: skbuf.h: Fix parameter documentationMarcos Paulo de Souza2011-11-011-2/+2
|/ | | | | | | | Fixes parameter name of skb_frag_dmamap function to silence warning on make htmldocs. Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-10-241-1/+6
|\
| * net: hold sock reference while processing tx timestampsRichard Cochran2011-10-241-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pair of functions, * skb_clone_tx_timestamp() * skb_complete_tx_timestamp() were designed to allow timestamping in PHY devices. The first function, called during the MAC driver's hard_xmit method, identifies PTP protocol packets, clones them, and gives them to the PHY device driver. The PHY driver may hold onto the packet and deliver it at a later time using the second function, which adds the packet to the socket's error queue. As pointed out by Johannes, nothing prevents the socket from disappearing while the cloned packet is sitting in the PHY driver awaiting a timestamp. This patch fixes the issue by taking a reference on the socket for each such packet. In addition, the comments regarding the usage of these function are expanded to highlight the rule that PHY drivers must use skb_complete_tx_timestamp() to release the packet, in order to release the socket reference, too. These functions first appeared in v2.6.36. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Cc: <stable@vger.kernel.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add opaque struct around skb frag pageIan Campbell2011-10-211-4/+6
| | | | | | | | | | | | | | | | I've split this bit out of the skb frag destructor patch since it helps enforce the use of the fragment API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: constify skbuff and Qdisc elementsEric Dumazet2011-10-201-8/+9
| | | | | | | | | | | | | | Preliminary patch before tcp constification Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: do not take an additional reference in skb_frag_set_pageIan Campbell2011-10-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I audited all of the callers in the tree and only one of them (pktgen) expects it to do so. Taking this reference is pretty obviously confusing and error prone. In particular I looked at the following commits which switched callers of (__)skb_frag_set_page to the skb paged fragment api: 6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API. 5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API. 0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API. 86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API. 4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API. 18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API. b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API. b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API. 804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Allow skb_recycle_check to be done in stagesAndy Fleming2011-10-191-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | skb_recycle_check resets the skb if it's eligible for recycling. However, there are times when a driver might want to optionally manipulate the skb data with the skb before resetting the skb, but after it has determined eligibility. We do this by splitting the eligibility check from the skb reset, creating two inline functions to accomplish that task. Signed-off-by: Andy Fleming <afleming@freescale.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add skb frag size accessorsEric Dumazet2011-10-191-4/+24
| | | | | | | | | | | | | | | | | | | | | | To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: more accurate skb truesizeEric Dumazet2011-10-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored. Considering that skb_shared_info is larger than sk_buff, its time to take it into account for better memory accounting. This patch introduces SKB_TRUESIZE(X) macro to centralize various assumptions into a single place. At skb alloc phase, we put skb_shared_info struct at the exact end of skb head, to allow a better use of memory (lowering number of reallocations), since kmalloc() gives us power-of-two memory blocks. Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before. Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per socket or global memory limits that were previously not reached. But its a necessary step for a more accurate memory accounting. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <ak@linux.intel.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of github.com:davem330/netDavid S. Miller2011-09-221-0/+1
|\| | | | | | | | | | | | | | | | | | | | | | | Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c
| * net: copy userspace buffers on device forwardingMichael S. Tsirkin2011-09-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_forward_skb loops an skb back into host networking stack which might hang on the memory indefinitely. In particular, this can happen in macvtap in bridged mode. Copy the userspace fragments to avoid blocking the sender in that case. As this patch makes skb_copy_ubufs extern now, I also added some documentation and made it clear the SKBTX_DEV_ZEROCOPY flag automatically instead of doing it in all callers. This can be made into a separate patch if people feel it's worth it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: convert core to skb paged frag APIsIan Campbell2011-08-241-2/+2
| | | | | | | | | | | | | | | | | | Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add APIs for manipulating skb page fragments.Ian Campbell2011-08-221-2/+168
| | | | | | | | | | | | | | | | | | | | | | | | | | The primary aim is to add skb_frag_(ref|unref) in order to remove the use of bare get/put_page on SKB pages fragments and to isolate users from subsequent changes to the skb_frag_t data structure. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add the comment for skb->l4_rxhashChangli Gao2011-08-191-0/+2
| | | | | | | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | rps: Add flag to skb to indicate rxhash is based on L4 tupleTom Herbert2011-08-171-2/+3
|/ | | | | | | | | | | The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atomic: use <linux/atomic.h>Arun Sharma2011-07-261-1/+1
| | | | | | | | | | | | | | This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: introduce __netdev_alloc_skb_ip_alignEric Dumazet2011-07-111-3/+9
| | | | | | | | RX rings should use GFP_KERNEL allocations if possible, add __netdev_alloc_skb_ip_align() helper to ease this. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* skbuff: update struct sk_buff members commentsDaniel Baluta2011-07-101-20/+22
| | | | | | | | | Rearrange struct sk_buff members comments to follow their definition order. Also, add missing comments for ooo_okay and dropcount members. Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* skbuff: skb supports zero-copy buffersShirley Ma2011-07-071-0/+16
| | | | | | | | | | | | | | | | | | This patch adds userspace buffers support in skb shared info. A new struct skb_ubuf_info is needed to maintain the userspace buffers argument and index, a callback is used to notify userspace to release the buffers once lower device has done DMA (Last reference to that skb has gone). If there is any userspace apps to reference these userspace buffers, then these userspaces buffers will be copied into kernel. This way we can prevent userspace apps from holding these userspace buffers too long. Use destructor_arg to point to the userspace buffer info; a new tx flags SKBTX_DEV_ZEROCOPY is added for zero-copy buffer check. Signed-off-by: Shirley Ma <xma@...ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-06-201-0/+5
|\ | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c
| * vlan: Fix the ingress VLAN_FLAG_REORDER_HDR checkJiri Pirko2011-06-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag but rather in vlan_do_receive. Otherwise the vlan header will not be properly put on the packet in the case of vlan header accelleration. As we remove the check from vlan_check_reorder_header rename it vlan_reorder_header to keep the naming clean. Fix up the skb->pkt_type early so we don't look at the packet after adding the vlan tag, which guarantees we don't goof and look at the wrong field. Use a simple if statement instead of a complicated switch statement to decided that we need to increment rx_stats for a multicast packet. Hopefully at somepoint we will just declare the case where VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove the code. Until then this keeps it working correctly. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: correct comment on where to place transmit time stamp hook.Richard Cochran2011-06-191-2/+1
|/ | | | | | | | | | | The comment for the skb_tx_timestamp() function suggests calling it just after a buffer is released to the hardware for transmission. However, for drivers that free the buffer in an ISR, this produces a race between the time stamp code and the ISR. This commit changes the comment to advise placing the call just before handing the buffer over to the hardware. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>