path: root/net/core/skbuff.c
Commit message    Author    Age    Files    Lines
...
| * net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.c    Andrea Shepard    2010-07-22    1    -1/+3
| |
| | Make pskb_expand_head() check ip_summed to make sure csum_start is really csum_start and not csum before adjusting it.
| |
| | This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs. On my configuration, the sunhme driver produces skbs with differing amounts of headroom on receive depending on the packet size. See line 2030 of drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes of headroom but packets larger than that cutoff have only 20 bytes.
| |
| | When these packets reach the VLAN driver, vlan_check_reorder_header() calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes of headroom, uses pskb_expand_head() to make more. Then, pskb_expand_head() needs to adjust a lot of offsets into the skb, including csum_start. Since csum_start shares a union with csum, if the packet has a valid csum value this will corrupt it, which was the effect I observed. The sunhme hardware computes receive checksums, so the skbs would be created by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and then pskb_expand_head() would corrupt the csum field, leading to an "hw csum error" message later on, for example in icmp_rcv() for pings larger than the sunhme RX_COPY_THRESHOLD.
| |
| | On the basis of the comment at the beginning of include/linux/skbuff.h, I believe that the csum_start skb field is only meaningful if ip_summed is CHECKSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it only in that case to avoid corrupting a valid csum value.
| |
| | Please see my more in-depth discussion of tracking down this bug for more details if you like:
| |
| | http://puellavulnerata.livejournal.com/112186.html
| | http://puellavulnerata.livejournal.com/112567.html
| | http://puellavulnerata.livejournal.com/112891.html
| | http://puellavulnerata.livejournal.com/113096.html
| | http://puellavulnerata.livejournal.com/113591.html
| |
| | I am not subscribed to this list, so please CC me on replies.
| |
| | Signed-off-by: Andrea Shepard <andrea@persephoneslair.org>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
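The union aliasing described in this message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct sketch_csum_skb`, the `SKETCH_CHECKSUM_*` constants and `sketch_fixup_after_expand()` are hypothetical stand-ins for illustration, not the kernel's definitions or the actual patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ip_summed states. */
enum {
    SKETCH_CHECKSUM_NONE,
    SKETCH_CHECKSUM_COMPLETE,   /* csum holds a checksum computed by hardware */
    SKETCH_CHECKSUM_PARTIAL     /* csum_start/csum_offset describe work left to do */
};

/* Simplified model: csum overlays csum_start and csum_offset. */
struct sketch_csum_skb {
    union {
        uint32_t csum;
        struct {
            uint16_t csum_start;
            uint16_t csum_offset;
        };
    };
    uint8_t ip_summed;
};

/* After the head grows by nhead bytes, shift csum_start only when the
 * union really holds csum_start; otherwise a valid csum would be damaged. */
static inline void sketch_fixup_after_expand(struct sketch_csum_skb *skb,
                                             uint16_t nhead)
{
    assert(skb);
    if (skb->ip_summed == SKETCH_CHECKSUM_PARTIAL)
        skb->csum_start += nhead;
}
```

Without the `ip_summed` test, the addition would land in the low or high half of `csum` (depending on endianness) for a CHECKSUM_COMPLETE packet, which matches the corruption the message reports.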
* | net: pskb_expand_head() optimization    Eric Dumazet    2010-07-24    1    -1/+1
| |
| | Move frags[] to the end of struct skb_shared_info, and make pskb_expand_head() copy only the used part of it instead of the whole array.
| |
| | This should avoid kmemcheck warnings and speed up pskb_expand_head() as well, avoiding a lot of cache misses.
| |
| | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
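A rough userspace sketch of the idea: once the array is the last member, the number of bytes worth copying follows from the count of used entries. The struct layout, `SKETCH_MAX_FRAGS` and both helper names are assumptions made for this example, not the kernel's skb_shared_info.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_FRAGS 18   /* hypothetical capacity */

struct sketch_frag {
    void *page;
    unsigned int offset;
    unsigned int size;
};

struct sketch_shinfo {
    unsigned short nr_frags;
    unsigned short gso_size;
    /* Must stay the last member so the used part is one contiguous prefix. */
    struct sketch_frag frags[SKETCH_MAX_FRAGS];
};

/* Bytes occupied by the fixed fields plus the frags actually in use. */
static inline size_t sketch_shinfo_used_bytes(const struct sketch_shinfo *s)
{
    assert(s->nr_frags <= SKETCH_MAX_FRAGS);
    return offsetof(struct sketch_shinfo, frags) +
           (size_t)s->nr_frags * sizeof(struct sketch_frag);
}

/* Copy only the used prefix; unused frag slots in dst are left untouched. */
static inline void sketch_shinfo_copy(struct sketch_shinfo *dst,
                                      const struct sketch_shinfo *src)
{
    memcpy(dst, src, sketch_shinfo_used_bytes(src));
}
```

With two of eighteen slots in use, the copy touches a small fraction of the structure, which is where the cache-miss saving would come from.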
* | net/core: EXPORT_SYMBOL cleanups    Eric Dumazet    2010-07-12    1    -1/+0
|/
| CodingStyle cleanups
|
| EXPORT_SYMBOL should immediately follow the symbol declaration.
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rxhash already set in __copy_skb_header    Eric Dumazet    2010-06-13    1    -1/+0
|
| No need to copy rxhash again in __skb_clone()
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix deliver_no_wcard regression on loopback device    John Fastabend    2010-06-13    1    -0/+1
|
| deliver_no_wcard is not being set in skb_copy_header. In the skb_cloned case it is not being cleared and may cause the skb to be dropped when the loopback device pushes it back up the stack.
|
| Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
| Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
| Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sock_queue_err_skb() dont mess with sk_forward_alloc    Eric Dumazet    2010-05-31    1    -2/+28
|
| Correct sk_forward_alloc handling for error_queue would need to use a backlog of frames that the softirq handler could not deliver because the socket is owned by a user thread. Or extend backlog processing to be able to process normal and error packets.
|
| Another possibility is to not use mem charge for the error queue; this is what I implemented in this patch.
|
| Note: this reverts commit 29030374 (net: fix sk_forward_alloc corruptions), since we don't need to lock the socket anymore.
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of /home/davem/src/GIT/linux-2.6/    David S. Miller    2010-05-31    1    -17/+21
|\
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6    Linus Torvalds    2010-05-25    1    -0/+1
| |\
| | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits)
| | |   drivers/net/usb/asix.c: Fix pointer cast.
| | |   be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
| | |   proc_dointvec: write a single value
| | |   hso: add support for new products
| | |   Phonet: fix potential use-after-free in pep_sock_close()
| | |   ath9k: remove VEOL support for ad-hoc
| | |   ath9k: change beacon allocation to prefer the first beacon slot
| | |   sock.h: fix kernel-doc warning
| | |   cls_cgroup: Fix build error when built-in
| | |   macvlan: do proper cleanup in macvlan_common_newlink() V2
| | |   be2net: Bug fix in init code in probe
| | |   net/dccp: expansion of error code size
| | |   ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
| | |   wireless: fix sta_info.h kernel-doc warnings
| | |   wireless: fix mac80211.h kernel-doc warnings
| | |   iwlwifi: testing the wrong variable in iwl_add_bssid_station()
| | |   ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
| | |   ath9k_htc: dereferencing before check in hif_usb_tx_cb()
| | |   rt2x00: Fix rt2800usb TX descriptor writing.
| | |   rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
| | |   ...
| * \ Merge branch 'master' into for-2.6.35    Jens Axboe    2010-05-21    1    -22/+11
| |\ \
| | | | Conflicts:
| | | |   fs/ext3/fsync.c
| | | |
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | pipe: add support for shrinking and growing pipes    Jens Axboe    2010-05-21    1    -17/+21
| | | |
| | | | This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for growing and shrinking the size of a pipe and adjusts pipe.c and splice.c (and relay and network splice) usage to work with these larger (or smaller) pipes.
| | | |
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | | | net: fix sk_forward_alloc corruptions    Eric Dumazet    2010-05-29    1    -0/+4
| | | |
| | | | As David found out, sock_queue_err_skb() should be called with the socket lock held, or we risk sk_forward_alloc corruption, since we use non-atomic operations to update this field.
| | | |
| | | | This patch adds a bh_lock_sock()/bh_unlock_sock() pair to three spots. (BH already disabled)
| | | |
| | | | 1) skb_tstamp_tx()
| | | | 2) Before calling ip_icmp_error(), in __udp4_lib_err()
| | | | 3) Before calling ipv6_icmp_error(), in __udp6_lib_err()
| | | |
| | | | Reported-by: Anton Blanchard <anton@samba.org>
| | | | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | skb: make skb_recycle_check() return a bool value    Changli Gao    2010-05-29    1    -6/+6
| |_|/
|/| |
| | | Signed-off-by: Changli Gao <xiaosuo@gmail.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | gro: Fix bogus gso_size on the first fraglist entry    Herbert Xu    2010-05-20    1    -0/+1
| |/
|/|
| | When GRO produces fraglist entries, and the resulting skb hits an interface that is incapable of TSO but capable of FRAGLIST, we end up producing a bogus packet with gso_size non-zero.
| |
| | This was reported in the field with older versions of KVM that did not set the TSO bits on tuntap.
| |
| | This patch fixes that.
| |
| | Reported-by: Igor Zhang <yugzhang@redhat.com>
| | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add a noref bit on skb dst    Eric Dumazet    2010-05-17    1    -1/+1
| |
| | Use the low order bit of skb->_skb_dst to tell that the dst is not refcounted.
| |
| | Change _skb_dst to _skb_refdst to make sure all uses are caught.
| |
| | skb_dst() returns the dst, regardless of whether the noref bit is set or not, but with a lockdep check to make sure a noref dst is not given if the current user is not rcu protected.
| |
| | New skb_dst_set_noref() helper to set a non-refcounted dst on a skb. (with lockdep check)
| |
| | skb_dst_drop() drops a reference only if the skb dst was refcounted.
| |
| | skb_dst_force() helper is used to force a refcount on dst, when the skb is queued and no longer RCU protected.
| |
| | Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in sock_queue_rcv_skb(), in __nf_queue().
| |
| | Use skb_dst_force() in dev_requeue_skb().
| |
| | Note: dst_use_noref() still dirties dst, we might transform it later to do one dirtying per jiffy.
| |
| | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
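The low-order-bit scheme this message describes is ordinary pointer tagging, and a userspace sketch may make the helper semantics easier to follow. Every name below (`sketch_dst`, `sketch_dskb`, the `sketch_dst_*` helpers, `SKETCH_REFDST_NOREF`) is hypothetical, and the sketch omits the lockdep/RCU checks the message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REFDST_NOREF ((uintptr_t)1)   /* hypothetical flag in bit 0 */

struct sketch_dst  { int refcnt; };          /* stand-in for struct dst_entry */
struct sketch_dskb { uintptr_t refdst; };    /* pointer and flag in one word */

/* Return the dst whether or not the noref bit is set. */
static inline struct sketch_dst *sketch_skb_dst(const struct sketch_dskb *skb)
{
    return (struct sketch_dst *)(skb->refdst & ~SKETCH_REFDST_NOREF);
}

static inline bool sketch_dst_is_noref(const struct sketch_dskb *skb)
{
    return (skb->refdst & SKETCH_REFDST_NOREF) && sketch_skb_dst(skb);
}

/* Attach a dst without taking a reference. */
static inline void sketch_dst_set_noref(struct sketch_dskb *skb,
                                        struct sketch_dst *dst)
{
    /* Bit 0 must be free, i.e. the pointer must be at least 2-byte aligned. */
    assert(((uintptr_t)dst & SKETCH_REFDST_NOREF) == 0);
    skb->refdst = (uintptr_t)dst | SKETCH_REFDST_NOREF;
}

/* Turn a borrowed dst into a counted one, e.g. before queueing the skb. */
static inline void sketch_dst_force(struct sketch_dskb *skb)
{
    if (sketch_dst_is_noref(skb)) {
        sketch_skb_dst(skb)->refcnt++;
        skb->refdst &= ~SKETCH_REFDST_NOREF;
    }
}

/* Release the reference only if one was actually held. */
static inline void sketch_dst_drop(struct sketch_dskb *skb)
{
    if (skb->refdst && !sketch_dst_is_noref(skb))
        sketch_skb_dst(skb)->refcnt--;
    skb->refdst = 0;
}
```

The point of the design is that the common receive path can attach a dst without touching its reference count, and pays for an increment only on the paths that call the force helper.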
* | net: __alloc_skb() speedup    Eric Dumazet    2010-05-05    1    -16/+5
| |
| | With following patch I can reach maximum rate of my pktgen+udpsink simulator :
| | - 'old' machine : dual quad core E5450 @3.00GHz
| | - 64 UDP rx flows (only differ by destination port)
| | - RPS enabled, NIC interrupts serviced on cpu0
| | - rps dispatched on 7 other cores. (~130.000 IPI per second)
| | - SLAB allocator (faster than SLUB in this workload)
| | - tg3 NIC
| | - 1.080.000 pps without a single drop at NIC level.
| |
| | Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch first sk_buff cache line, the second to prefetch the shinfo part.
| |
| | Also using one memset() to initialize all skb_shared_info fields instead of one by one to reduce number of instructions, using long word moves.
| |
| | All skb_shared_info fields before 'dataref' are cleared in __alloc_skb().
| |
| | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Inline skb_pull() in eth_type_trans().    David S. Miller    2010-05-02    1    -1/+1
| |
| | In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot") we uninlined skb_pull.
| |
| | But in some critical paths it makes sense to inline this thing and it helps performance significantly.
| |
| | Create an skb_pull_inline() so that we can do this in a way that serves also as annotation.
| |
| | Based upon a patch by Eric Dumazet.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Remove two unnecessary exports (skbuff).    Rami Rosen    2010-04-20    1    -4/+2
| |
| | There is no need to export skb_under_panic() and skb_over_panic() in skbuff.c, since these functions are used only in skbuff.c; this patch removes these two exports. It also marks these functions as 'static' and removes their extern declarations from include/linux/skbuff.h
| |
| | Signed-off-by: Rami Rosen <ramirose@gmail.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | rps: Receive Packet Steering    Tom Herbert    2010-03-16    1    -0/+2
|/
| This patch implements software receive side packet steering (RPS). RPS distributes the load of received packet processing across multiple CPUs.
|
| Problem statement: Protocol processing done in the NAPI context for received packets is serialized per device queue and becomes a bottleneck under high packet load. This substantially limits pps that can be achieved on a single queue NIC and provides no scaling with multiple cores.
|
| This solution queues packets early on in the receive path on the backlog queues of other CPUs. This allows protocol processing (e.g. IP and TCP) to be performed on packets in parallel. For each device (or each receive queue in a multi-queue device) a mask of CPUs is set to indicate the CPUs that can process packets. A CPU is selected on a per packet basis by hashing contents of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index into the CPU mask. The IPI mechanism is used to raise networking receive softirqs between CPUs. This effectively emulates in software what a multi-queue NIC can provide, but is generic requiring no device support.
|
| Many devices now provide a hash over the 4-tuple on a per packet basis (e.g. the Toeplitz hash). This patch allows drivers to set the HW reported hash in an skb field, and that value in turn is used to index into the RPS maps. Using the HW generated hash can avoid cache misses on the packet when steering it to a remote CPU.
|
| The CPU mask is set on a per device and per queue basis in the sysfs variable /sys/class/net/<device>/queues/rx-<n>/rps_cpus. This is a set of canonical bit maps for receive queues in the device (numbered by <n>). If a device does not support multi-queue, a single variable is used for the device (rx-0).
|
| Generally, we have found this technique increases pps capabilities of a single queue device with good CPU utilization. Optimal settings for the CPU mask seem to depend on architectures and cache hierarchy. Below are some results running 500 instances of netperf TCP_RR test with 1 byte req. and resp. Results show cumulative transaction rate and system CPU utilization.
|
| e1000e on 8 core Intel
|    Without RPS: 108K tps at 33% CPU
|    With RPS:    311K tps at 64% CPU
|
| forcedeth on 16 core AMD
|    Without RPS: 156K tps at 15% CPU
|    With RPS:    404K tps at 49% CPU
|
| bnx2x on 16 core AMD
|    Without RPS  567K tps at 61% CPU (4 HW RX queues)
|    Without RPS  738K tps at 96% CPU (8 HW RX queues)
|    With RPS:    854K tps at 76% CPU (4 HW RX queues)
|
| Caveats:
| - The benefits of this patch are dependent on architecture and cache hierarchy. Tuning the masks to get best performance is probably necessary.
| - This patch adds overhead in the path for processing a single packet. In a lightly loaded server this overhead may eliminate the advantages of increased parallelism, and possibly cause some relative performance degradation. We have found that masks that are cache aware (share same caches with the interrupting CPU) mitigate much of this.
| - The RPS masks can be changed dynamically, however whenever the mask is changed this introduces the possibility of generating out of order packets. It's probably best not to change the masks too frequently.
|
| Signed-off-by: Tom Herbert <therbert@google.com>
|
|  include/linux/netdevice.h |   32 ++++-
|  include/linux/skbuff.h    |    3 +
|  net/core/dev.c            |  335 +++++++++++++++++++++++++++++++++++++--------
|  net/core/net-sysfs.c      |  225 ++++++++++++++++++++++++++++++-
|  net/core/skbuff.c         |    2 +
|  5 files changed, 538 insertions(+), 59 deletions(-)
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
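The per-packet CPU selection described in this message (hash the flow, use the result to index into the set of allowed CPUs) can be sketched as follows. The message does not say how the hash is reduced to an index; the multiply-and-shift scaling used here is an assumption chosen for illustration, and `struct sketch_rps_map` and `sketch_rps_pick_cpu()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue map: the CPUs allowed to process this queue's packets. */
struct sketch_rps_map {
    size_t len;             /* number of entries in cpus[] */
    const uint16_t *cpus;   /* CPU ids taken from the configured mask */
};

/* Scale a 32-bit flow hash into [0, len) and return the CPU in that slot.
 * The same hash always maps to the same CPU, so a flow stays in order
 * as long as the map itself is not changed. */
static inline uint16_t sketch_rps_pick_cpu(const struct sketch_rps_map *map,
                                           uint32_t hash)
{
    assert(map->len > 0);
    size_t idx = (size_t)(((uint64_t)hash * map->len) >> 32);
    return map->cpus[idx];
}
```

Because the index depends on `len`, changing the mask can remap an existing flow to a different CPU, which is one way to read the out-of-order caveat in the message.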
* const: constify remaining pipe_buf_operations    Alexey Dobriyan    2009-12-16    1    -1/+1
|
| Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: rename skb->iif to skb->skb_iif    Eric Dumazet    2009-11-20    1    -1/+1
|
| To help grep games, rename iif to skb_iif
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6    David S. Miller    2009-11-18    1    -1/+2
|\
| | Conflicts:
| |   drivers/net/sfc/sfe4001.c
| |   drivers/net/wireless/libertas/cmd.c
| |   drivers/staging/Kconfig
| |   drivers/staging/Makefile
| |   drivers/staging/rtl8187se/Kconfig
| |   drivers/staging/rtl8192e/Kconfig
| * gro: Fix illegal merging of trailer trash    Herbert Xu    2009-11-17    1    -1/+2
| |
| | When we've merged skb's with page frags, and subsequently receive a trailer skb (< MSS) that is not completely non-linear (this can occur on Intel NICs if the packet size falls below the threshold), GRO ends up producing an illegal GSO skb with a frag_list.
| |
| | This is harmless unless the skb is then forwarded through an interface that requires software GSO, whereupon the GSO code will BUG.
| |
| | This patch detects this case in GRO and avoids merging the trailer skb.
| |
| | Reported-by: Mark Wagner <mwagner@redhat.com>
| | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | skbuff: Do not allow skb recycling with disabled IRQs    Anton Vorontsov    2009-11-11    1    -0/+3
|/
| NAPI drivers try to recycle SKBs in their polling routine, but we generally don't know the context in which the polling will be called, and the skb recycling itself may require IRQs to be enabled.
|
| This patch adds irqs_disabled() test to the skb_recycle_check() routine, so that we'll not let the drivers hit the skb recycling path with IRQs disabled.
|
| As a side effect, this patch actually disables skb recycling for some [broken] drivers. E.g. gianfar driver grabs an irqsave spinlock during TX ring processing, and then tries to recycle an skb, and that caused the following badness:
|
| nf_conntrack version 0.5.0 (1008 buckets, 4032 max)
| ------------[ cut here ]------------
| Badness at kernel/softirq.c:143
| NIP: c003e3c4 LR: c423a528 CTR: c003e344
| ...
| NIP [c003e3c4] local_bh_enable+0x80/0xc4
| LR [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
| Call Trace:
| [c15d1b60] [c003e32c] local_bh_disable+0x1c/0x34 (unreliable)
| [c15d1b70] [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
| [c15d1b80] [c02c6370] nf_conntrack_destroy+0x3c/0x70
|
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove unused skb->do_not_encrypt    Johannes Berg    2009-07-24    1    -3/+0
|
| mac80211 required this due to the master netdev, but now it can put all information into skb->cb and this can go.
|
| Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| Signed-off-by: John W. Linville <linville@tuxdriver.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6    Linus Torvalds    2009-06-18    1    -3/+10
|\
| | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
| |   netxen: fix tx ring accounting
| |   netxen: fix detection of cut-thru firmware mode
| |   forcedeth: fix dma api mismatches
| |   atm: sk_wmem_alloc initial value is one
| |   net: correct off-by-one write allocations reports
| |   via-velocity : fix no link detection on boot
| |   Net / e100: Fix suspend of devices that cannot be power managed
| |   TI DaVinci EMAC : Fix rmmod error
| |   net: group address list and its count
| |   ipv4: Fix fib_trie rebalancing, part 2
| |   pkt_sched: Update drops stats in act_police
| |   sky2: version 1.23
| |   sky2: add GRO support
| |   sky2: skb recycling
| |   sky2: reduce default transmit ring
| |   sky2: receive counter update
| |   sky2: fix shutdown synchronization
| |   sky2: PCI irq issues
| |   sky2: more receive shutdown
| |   sky2: turn off pause during shutdown
| |   ...
| |
| | Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
| * skbuff: don't corrupt mac_header on skb expansion    Stephen Hemminger    2009-06-17    1    -3/+6
| |
| | The skb mac_header field is sometimes NULL (or ~0u) as a sentinel value. The places where skb is expanded add an offset which would change this flag into an invalid pointer (or offset).
| |
| | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
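For the offset-based representation, the guard this message calls for can be sketched as below: shift the MAC header offset only when it holds a real value, so the sentinel survives expansion. `struct sketch_offsets`, `SKETCH_MAC_UNSET` and the helpers are hypothetical names for a userspace illustration, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAC_UNSET (~0u)   /* sentinel: no MAC header recorded */

/* Header positions stored as offsets from the start of the buffer. */
struct sketch_offsets {
    unsigned int transport_header;
    unsigned int network_header;
    unsigned int mac_header;
};

static inline bool sketch_mac_header_was_set(const struct sketch_offsets *o)
{
    return o->mac_header != SKETCH_MAC_UNSET;
}

/* Shift the offsets after off bytes of headroom were added in front. */
static inline void sketch_shift_headers(struct sketch_offsets *o,
                                        unsigned int off)
{
    assert(o);
    o->transport_header += off;
    o->network_header += off;
    /* Adding off to the sentinel would turn it into a bogus offset. */
    if (sketch_mac_header_was_set(o))
        o->mac_header += off;
}
```

An unguarded `~0u + 32` wraps around to 31, which would then look like a valid but wrong MAC header offset.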
| * skbuff: skb_mac_header_was_set is always true on >32 bit    Stephen Hemminger    2009-06-17    1    -0/+4
| |
| | Looking at the crash in log_martians(), one suspect is that the check for mac header being set is not correct. The value of mac_header defaults to 0 on allocation, therefore skb_mac_header_was_set will always be true on platforms using NET_SKBUFF_USES_OFFSET.
| |
| | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
| | Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck    Linus Torvalds    2009-06-16    1    -0/+8
|\ \
| |/
|/|
| | * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
| |   signal: fix __send_signal() false positive kmemcheck warning
| |   fs: fix do_mount_root() false positive kmemcheck warning
| |   fs: introduce __getname_gfp()
| |   trace: annotate bitfields in struct ring_buffer_event
| |   net: annotate struct sock bitfield
| |   c2port: annotate bitfield for kmemcheck
| |   net: annotate inet_timewait_sock bitfields
| |   ieee1394/csr1212: fix false positive kmemcheck report
| |   ieee1394: annotate bitfield
| |   net: annotate bitfields in struct inet_sock
| |   net: use kmemcheck bitfields API for skbuff
| |   kmemcheck: introduce bitfield API
| |   kmemcheck: add opcode self-testing at boot
| |   x86: unify pte_hidden
| |   x86: make _PAGE_HIDDEN conditional
| |   kmemcheck: make kconfig accessible for other architectures
| |   kmemcheck: enable in the x86 Kconfig
| |   kmemcheck: add hooks for the page allocator
| |   kmemcheck: add hooks for page- and sg-dma-mappings
| |   kmemcheck: don't track page tables
| |   ...
| * net: use kmemcheck bitfields API for skbuff    Vegard Nossum    2009-06-15    1    -0/+8
| |
| | Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
* | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6    David S. Miller    2009-06-15    1    -1/+1
|\|
| | Conflicts:
| |   Documentation/feature-removal-schedule.txt
| |   drivers/scsi/fcoe/fcoe.c
| |   net/core/drop_monitor.c
| |   net/core/net-traces.c
| * Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip    Linus Torvalds    2009-06-10    1    -1/+1
| |\
| | | * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
| | |   Revert "x86, bts: reenable ptrace branch trace support"
| | |   tracing: do not translate event helper macros in print format
| | |   ftrace/documentation: fix typo in function grapher name
| | |   tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
| | |   tracing: add protection around module events unload
| | |   tracing: add trace_seq_vprint interface
| | |   tracing: fix the block trace points print size
| | |   tracing/events: convert block trace points to TRACE_EVENT()
| | |   ring-buffer: fix ret in rb_add_time_stamp
| | |   ring-buffer: pass in lockdep class key for reader_lock
| | |   tracing: add annotation to what type of stack trace is recorded
| | |   tracing: fix multiple use of __print_flags and __print_symbolic
| | |   tracing/events: fix output format of user stack
| | |   tracing/events: fix output format of kernel stack
| | |   tracing/trace_stack: fix the number of entries in the header
| | |   ring-buffer: discard timestamps that are at the start of the buffer
| | |   ring-buffer: try to discard unneeded timestamps
| | |   ring-buffer: fix bug in ring_buffer_discard_commit
| | |   ftrace: do not profile functions when disabled
| | |   tracing: make trace pipe recognize latency format flag
| | |   ...
| | * Merge commit 'v2.6.30-rc6' into tracing/core    Ingo Molnar    2009-05-18    1    -0/+2
| | |\
| | | | Merge reason: we were on an -rc4 base, sync up to -rc6
| | | |
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * \ Merge branch 'linus' into tracing/core    Ingo Molnar    2009-05-07    1    -13/+14
| | |\ \
| | | | | Merge reason: tracing/core was on a .30-rc1 base and was missing out on a handful of tracing fixes present in .30-rc5-almost.
| | | | |
| | | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | tracing/events: move trace point headers into include/trace/events    Steven Rostedt    2009-04-14    1    -1/+1
| | | | |
| | | | | Impact: clean up
| | | | |
| | | | | Create a sub directory in include/trace called events to keep the trace point headers in their own separate directory. Only headers that declare trace points should be defined in this directory.
| | | | |
| | | | | Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | | | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | | | Cc: Neil Horman <nhorman@tuxdriver.com>
| | | | | Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
| | | | | Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
| | | | | Cc: Pekka Enberg <penberg@cs.helsinki.fi>
| | | | | Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | mac80211: do not pass PS frames out of mac80211 again    Johannes Berg    2009-06-10    1    -1/+0
| | | | |
| | | | | In order to handle powersave frames properly we had needed to pass these out to the device queues again, and introduce the skb->requeue bit. This, however, also has unnecessary overhead by needing to 'clean up' already tried frames, and this clean-up code is also buggy when software encryption is used.
| | | | |
| | | | | Instead of sending the frames via the master netdev queue again, simply put them into the pending queue. This also fixes a problem where frames for that particular station could be reordered when some were still on the software queues and older ones are re-injected into the software queue after them.
| | | | |
| | | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | | | net/core/skbuff.c: Use frag list abstraction interfaces.    David S. Miller    2009-06-09    1    -124/+106
| | | | |
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Ensure partial checksum offset is inside the skb head    Herbert Xu    2009-06-08    1    -3/+3
| | | | |
| | | | | On Thu, Jun 04, 2009 at 09:06:00PM +1000, Herbert Xu wrote:
| | | | | >
| | | | | > tun: Optimise handling of bogus gso->hdr_len
| | | | | >
| | | | | > As all current versions of virtio_net generate a value for the
| | | | | > header length that's too small, we should optimise this so that
| | | | | > we don't copy it twice. This can be done by ensuring that it is
| | | | | > at least as large as the place where we'll write the checksum.
| | | | | >
| | | | | > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | | |
| | | | | With this applied we can strengthen the partial checksum check:
| | | | |
| | | | | In skb_partial_csum_set we check to see if the checksum offset is within the packet. However, we really should check that it is within the skb head as that's the only bit we can modify without copying.
| | | | |
| | | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | | | Acked-by: Rusty Russell <rusty@rustcorp.com.au>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
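A minimal sketch of the stricter check, assuming a 2-byte checksum field located `off` bytes after `start`: the whole field has to end inside the linear head, not merely inside the packet. The struct, its `headlen` field and `sketch_partial_csum_set()` are hypothetical simplifications, not the kernel's skb_partial_csum_set().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified skb: only the fields this check needs. */
struct sketch_pskb {
    uint32_t headlen;       /* bytes available in the linear head */
    uint16_t csum_start;
    uint16_t csum_offset;
    bool     partial;       /* stands in for ip_summed == CHECKSUM_PARTIAL */
};

/* Record the offsets only if the checksum field can be written in place. */
static inline bool sketch_partial_csum_set(struct sketch_pskb *skb,
                                           uint16_t start, uint16_t off)
{
    assert(skb);
    /* The 2-byte field at start + off must end within the head. */
    if ((uint32_t)start + off + 2u > skb->headlen)
        return false;
    skb->csum_start = start;
    skb->csum_offset = off;
    skb->partial = true;
    return true;
}
```

A request that fails the check leaves the skb untouched, so a caller can drop or linearize the packet instead of writing a checksum into data it does not own.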
* | | | | net: skb->dst accessors    Eric Dumazet    2009-06-03    1    -2/+2
| | | | |
| | | | | Define three accessors to get/set dst attached to a skb
| | | | |
| | | | | struct dst_entry *skb_dst(const struct sk_buff *skb)
| | | | |
| | | | | void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
| | | | |
| | | | | void skb_dst_drop(struct sk_buff *skb)
| | | | | This one should replace occurrences of :
| | | | | dst_release(skb->dst)
| | | | | skb->dst = NULL;
| | | | |
| | | | | Delete skb->dst field
| | | | |
| | | | | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | gro: Store shinfo in local variable in skb_gro_receive    Herbert Xu    2009-05-27    1    -10/+12
| | | | |
| | | | | This patch stores the two shinfo pointers in local variables because they're used over and over again in skb_gro_receive.
| | | | |
| | | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | gro: Nasty optimisations for page frags in skb_gro_receive    Herbert Xu    2009-05-27    1    -10/+15
| | | | |
| | | | | This patch reverses the direction of the frags array copy in skb_gro_receive in order to simplify the loop conditional. It also avoids touching the first element of the original frags array.
| | | | |
| | | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | gro: Localise offset/headlen in skb_gro_offset    Herbert Xu    2009-05-27    1    -13/+10
| | | | |
| | | | | This patch stores the offset/headlen in local variables as they're used repeatedly in skb_gro_offset.
| | | | |
| | | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | gro: Open-code frags copy in skb_gro_receive    Herbert Xu    2009-05-27    1    -3/+6
| | | | |
| | | | | gcc does a poor job at generating code for the memcpy of the frags array in skb_gro_receive, which is the primary purpose of that function when merging frags. In particular, it can't utilise the alignment information of the source and destination. This patch open-codes the copy so we process words instead of bytes.
| | | | |
| | | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6    David S. Miller    2009-05-25    1    -1/+1
|\| | | |
| | | | | Conflicts:
| | | | |   drivers/net/wireless/ath/ath5k/phy.c
| | | | |   drivers/net/wireless/iwlwifi/iwl-agn.c
| | | | |   drivers/net/wireless/iwlwifi/iwl3945-base.c
| * | | | net: fix skb_seq_read returning wrong offset/length for page frag data    Thomas Chenault    2009-05-18    1    -1/+1
| | |_|/
| |/| |
| | | | When called with a consumed value that is less than skb_headlen(skb) bytes into a page frag, skb_seq_read() incorrectly returns an offset/length relative to skb->data. Ensure that data which should come from a page frag does.
| | | |
| | | | Signed-off-by: Thomas Chenault <thomas_chenault@dell.com>
| | | | Tested-by: Shyam Iyer <shyam_iyer@dell.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | skbuff: Copy csum instead of csum_start/csum_offset    Herbert Xu    2009-05-25    1    -2/+1
| | | |
| | | | Hi:
| | | |
| | | | skbuff: Copy csum instead of csum_start/csum_offset
| | | |
| | | | It's easier to copy the u32 csum instead of its two u16 constituents.
| | | |
| | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | |
| | | | Cheers,
| | | |
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | skbuff: Move new code into __copy_skb_header    Herbert Xu    2009-05-25    1    -5/+9
|/ / /
| | | Hi:
| | |
| | | skbuff: Move new __skb_clone code into __copy_skb_header
| | |
| | | It seems that people just keep on adding stuff to __skb_clone instead of __copy_skb_header. This is wrong as it means your brand-new attributes won't always get copied as you intended.
| | |
| | | This patch moves them to the right place, and adds a comment to prevent this from happening again.
| | |
| | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | |
| | | Thanks,
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | / net: update skb_recycle_check() for hardware timestamping changes    Lennert Buytenhek    2009-05-06    1    -0/+2
| |/
|/|
| | Commit ac45f602ee3d1b6f326f68bc0c2591ceebf05ba4 ("net: infrastructure for hardware time stamping") added two skb initialization actions to __alloc_skb(), which need to be added to skb_recycle_check() as well.
| |
| | Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
| | Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Fix oops when splicing skbs from a frag_list.    Jarek Poplawski    2009-04-30    1    -13/+14
|/
| Lennert Buytenhek wrote:
| > Since 4fb669948116d928ae44262ab7743732c574630d ("net: Optimize memory
| > usage when splicing from sockets.") I'm seeing this oops (e.g. in
| > 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver
| > (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather
| > than the frag mode:
|
| My patch incorrectly assumed skb->sk was always valid, but for "frag_listed" skbs we can only use skb->sk of their parent.
|
| Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
| Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org>
| Tested-by: Lennert Buytenhek <buytenh@wantstofly.org>
| Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* gso: Fix support for linear packets    Herbert Xu    2009-03-28    1    -1/+1
|
| When GRO/frag_list support was added to GSO, I made an error which broke the support for segmenting linear GSO packets (GSO packets are normally non-linear in the payload).
|
| These days most of these packets are constructed by the tun driver, which prefers to allocate linear memory if possible. This is fixed in the latest kernel, but for 2.6.29 and earlier it is still the norm.
|
| Therefore this bug causes failures with GSO when used with tun in 2.6.29.
|
| Reported-by: James Huang <jamesclhuang@gmail.com>
| Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs    Neil Horman    2009-03-13    1    -0/+22
|
| Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
|
|  include/linux/skbuff.h |    4 +++-
|  net/core/datagram.c    |    2 +-
|  net/core/skbuff.c      |   22 ++++++++++++++++++++++
|  net/ipv4/arp.c         |    2 +-
|  net/ipv4/udp.c         |    2 +-
|  net/packet/af_packet.c |    2 +-
|  6 files changed, 29 insertions(+), 5 deletions(-)
|
| Signed-off-by: David S. Miller <davem@davemloft.net>