summaryrefslogtreecommitdiffstats
path: root/net/core
Commit message (Collapse)AuthorAgeFilesLines
* net: disable preemption before call smp_processor_id()Changli Gao2010-08-071-0/+2
| | | | | | | | | | | Although netif_rx() isn't expected to be called in process context with preemption enabled, it'd better handle this case. And this is why get_cpu() is used in the non-RPS #ifdef branch. If tree RCU is selected, rcu_read_lock() won't disable preemption, so preempt_disable() should be called explictly. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix napi_gro_frags vs netpoll pathJarek Poplawski2010-08-051-4/+1
| | | | | | | | | | The netpoll_rx_on() check in __napi_gro_receive() skips part of the "common" GRO_NORMAL path, especially "pull:" in dev_gro_receive(), where at least eth header should be copied for entirely paged skbs. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "net: remove zap_completion_queue"David S. Miller2010-08-032-3/+32
| | | | | | | | | | | | | | This reverts commit 15e83ed78864d0625e87a85f09b297c0919a4797. As explained by Johannes Berg, the optimization made here is invalid. Or, at best, incomplete. Not only destructor invocation, but conntract entry releasing must be executed outside of hw IRQ context. So just checking "skb->destructor" is insufficient. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cleanup inclusionChangli Gao2010-08-021-2/+0
| | | | | | | | Commit ab95bfe01f9872459c8678572ccadbf646badad0 replaces bridge and macvlan hooks in __netif_receive_skb(), so dev.c doesn't need to include their headers. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ingress filter message limitStephen Hemminger2010-08-011-4/+4
| | | | | | | | | If user misconfigures ingress and causes a redirection loop, don't overwhelm the log. This is also a error case so make it unlikely. Found by inspection, luckily not in real system. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-07-272-2/+6
|\ | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x_main.c Merge bnx2x bug fixes in by hand... :-/ Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: dev_forward_skb should call nf_resetBen Greear2010-07-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | With conn-track zones and probably with different network namespaces, the netfilter logic needs to be re-calculated on packet receive. If the netfilter logic is not reset, it will not be recalculated properly. This patch adds the nf_reset logic to dev_forward_skb. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Fix skb_copy_expand() handling of ->csum_startDavid S. Miller2010-07-221-1/+2
| | | | | | | | | | | | It should only be adjusted if ip_summed == CHECKSUM_PARTIAL. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.cAndrea Shepard2010-07-221-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make pskb_expand_head() check ip_summed to make sure csum_start is really csum_start and not csum before adjusting it. This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs. On my configuration, the sunhme driver produces skbs with differing amounts of headroom on receive depending on the packet size. See line 2030 of drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes of headroom but packets larger than that cutoff have only 20 bytes. When these packets reach the VLAN driver, vlan_check_reorder_header() calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes of headroom, uses pskb_expand_head() to make more. Then, pskb_expand_head() needs to adjust a lot of offsets into the skb, including csum_start. Since csum_start is a union with csum, if the packet has a valid csum value this will corrupt it, which was the effect I observed. The sunhme hardware computes receive checksums, so the skbs would be created by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and then pskb_expand_head() would corrupt the csum field, leading to an "hw csum error" message later on, for example in icmp_rcv() for pings larger than the sunhme RX_COPY_THRESHOLD. On the basis of the comment at the beginning of include/linux/skbuff.h, I believe that the csum_start skb field is only meaningful if ip_csummed is CSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it only in that case to avoid corrupting a valid csum value. Please see my more in-depth disucssion of tracking down this bug for more details if you like: http://puellavulnerata.livejournal.com/112186.html http://puellavulnerata.livejournal.com/112567.html http://puellavulnerata.livejournal.com/112891.html http://puellavulnerata.livejournal.com/113096.html http://puellavulnerata.livejournal.com/113591.html I am not subscribed to this list, so please CC me on replies. Signed-off-by: Andrea Shepard <andrea@persephoneslair.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | drop_monitor: use genl_register_family_with_ops()Changli Gao2010-07-261-16/+7
| | | | | | | | | | | | | | | | [ Fix unused local variable build warnings. -DaveM ] Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: pskb_expand_head() optimizationEric Dumazet2010-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | Move frags[] at the end of struct skb_shared_info, and make pskb_expand_head() copy only the used part of it instead of whole array. This should avoid kmemcheck warnings and speedup pskb_expand_head() as well, avoiding a lot of cache misses. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sysfs: add attribute to indicate hw address assignment typeStefan Assmann2010-07-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add addr_assign_type to struct net_device and expose it via sysfs. This new attribute has the purpose of giving user-space the ability to distinguish between different assignment types of MAC addresses. For example user-space can treat NICs with randomly generated MAC addresses differently than NICs that have permanent (locally assigned) MAC addresses. For the former udev could write a persistent net rule by matching the device path instead of the MAC address. There's also the case of devices that 'steal' MAC addresses from slave devices. In which it is also be beneficial for user-space to be aware of the fact. This patch also introduces a helper function to assist adoption of drivers that generate MAC addresses randomly. Signed-off-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: core: don't use own hex_to_bin() methodAndy Shevchenko2010-07-231-24/+12
| | | | | | | | | | Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2010-07-202-8/+17
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/vhost/net.c net/bridge/br_device.c Fix merge conflict in drivers/vhost/net.c with guidance from Stephen Rothwell. Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d since net-next-2.6 has fixes that make bridge netpoll work properly thus we don't need it disabled. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: fix problem in reading sock TX queueTom Herbert2010-07-141-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix problem in reading the tx_queue recorded in a socket. In dev_pick_tx, the TX queue is read by doing a check with sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get. The problem is that there is not mutual exclusion across these calls in the socket so it it is possible that the queue in the sock can be invalidated after sk_tx_queue_recorded is called so that sk_tx_queue get returns -1, which sets 65535 in queue_index and thus dev_pick_tx returns 65536 which is a bogus queue and can cause crash in dev_queue_xmit. We fix this by only calling sk_tx_queue_get which does the proper checks. The interface is that sk_tx_queue_get returns the TX queue if the sock argument is non-NULL and TX queue is recorded, else it returns -1. sk_tx_queue_recorded is no longer used so it can be completely removed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/core: neighbour update OopsDoug Kehn2010-07-141-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When configuring DMVPN (GRE + openNHRP) and a GRE remote address is configured a kernel Oops is observed. The obserseved Oops is caused by a NULL header_ops pointer (neigh->dev->header_ops) in neigh_update_hhs() when void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *) = neigh->dev->header_ops->cache_update; is executed. The dev associated with the NULL header_ops is the GRE interface. This patch guards against the possibility that header_ops is NULL. This Oops was first observed in kernel version 2.6.26.8. Signed-off-by: Doug Kehn <rdkehn@yahoo.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: skb_tx_hash() fix relative to skb_orphan_try()Eric Dumazet2010-07-141-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fc6055a5ba31e2 (net: Introduce skb_orphan_try()) added early orphaning of skbs. This unfortunately added a performance regression in skb_tx_hash() in case of stacked devices (bonding, vlans, ...) Since skb->sk is now NULL, we cannot access sk->sk_hash anymore to spread tx packets to multiple NIC queues on multiqueue devices. skb_tx_hash() in this case only uses skb->protocol, same value for all flows. skb_orphan_try() can copy sk->sk_hash into skb->rxhash and skb_tx_hash() can use this saved sk_hash value to compute its internal hash value. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | drop_monitor: Add error code to detect duplicate state changesNeil Horman2010-07-201-2/+8
| | | | | | | | | | | | | | | | | | | | | | Patch to add -EAGAIN error to dropwatch netlink message handling code. -EAGAIN will be returned anytime userspace attempts to transition the state of the drop monitor service to a state that its already in. That allows user space to detect this condition, so it doesn't wait for a success ACK that will never arrive. Tested successfully by me Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | __dst_free(): put EXPORT_SYMBOLS after the fctNicolas Dichtel2010-07-201-1/+1
| | | | | | | | | | Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: this_cpu_xxx conversionsEric Dumazet2010-07-191-3/+2
| | | | | | | | | | | | | | Use modern this_cpu_xxx() api, saving few bytes on x86 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: 64bit stats for netdev_queueEric Dumazet2010-07-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Since struct netdev_queue tx_bytes/tx_packets/tx_dropped are already protected by _xmit_lock, its easy to convert these fields to u64 instead of unsigned long. This completes 64bit stats for devices using them (vlan, macvlan, ...) Strictly, we could avoid the locking in dev_txq_stats_fold() on 64bit arches, but its slow path and we prefer keep it simple. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: support time stamping in phy devices.Richard Cochran2010-07-183-1/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new networking option to allow hardware time stamps from PHY devices. When enabled, likely candidates among incoming and outgoing network packets are offered to the PHY driver for possible time stamping. When accepted by the PHY driver, incoming packets are deferred for later delivery by the driver. The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl and callbacks for transmit and receive time stamping. Drivers may optionally implement these functions. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/core: Remove unnecessary casts of private_dataJoe Perches2010-07-121-2/+2
| | | | | | | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sock_free() optimizationsEric Dumazet2010-07-121-2/+3
| | | | | | | | | | | | | | | | Avoid two extra instructions in sock_free(), to reload skb->truesize and skb->sk Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/core: EXPORT_SYMBOL cleanupsEric Dumazet2010-07-1213-53/+32
| | | | | | | | | | | | | | | | | | CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Document that dev_get_stats() returns the given pointerBen Hutchings2010-07-091-6/+6
| | | | | | | | | | | | | | | | | | Document that dev_get_stats() returns the same stats pointer it was given. Remove const qualification from the returned pointer since the caller may do what it likes with that structure. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Get rid of rtnl_link_stats64 / net_device_stats unionBen Hutchings2010-07-091-5/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit be1f3c2c027cc5ad735df6a45a542ed1db7ec48b "net: Enable 64-bit net device statistics on 32-bit architectures" I redefined struct net_device_stats so that it could be used in a union with struct rtnl_link_stats64, avoiding the need for explicit copying or conversion between the two. However, this is unsafe because there is no locking required and no lock consistently held around calls to dev_get_stats() and use of the statistics structure it returns. In commit 28172739f0a276eb8d6ca917b3974c2edb036da3 "net: fix 64 bit counters on 32 bit arches" Eric Dumazet dealt with that problem by requiring callers of dev_get_stats() to provide storage for the result. This means that the net_device::stats64 field and the padding in struct net_device_stats are now redundant, so remove them. Update the comment on net_device_ops::ndo_get_stats64 to reflect its new usage. Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since that is what all its callers are really using and it is no longer going to be compatible with struct net_device_stats. Eric Dumazet suggested the separate function for the structure conversion. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2010-07-072-11/+48
|\| | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * net: decreasing real_num_tx_queues needs to flush qdiscJohn Fastabend2010-07-021-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reducing real_num_queues needs to flush the qdisc otherwise skbs with queue_mappings greater then real_num_tx_queues can be sent to the underlying driver. The flow for this is, dev_queue_xmit() dev_pick_tx() skb_tx_hash() => hash using real_num_tx_queues skb_set_queue_mapping() ... qdisc_enqueue_root() => enqueue skb on txq from hash ... dev->real_num_tx_queues -= n ... sch_direct_xmit() dev_hard_start_xmit() ndo_start_xmit(skb,dev) => skb queue set with old hash skbs are enqueued on the qdisc with skb->queue_mapping set 0 < queue_mappings < real_num_tx_queues. When the driver decreases real_num_tx_queues skb's may be dequeued from the qdisc with a queue_mapping greater then real_num_tx_queues. This fixes a case in ixgbe where this was occurring with DCB and FCoE. Because the driver is using queue_mapping to map skbs to tx descriptor rings we can potentially map skbs to rings that no longer exist. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethtool: Fix potential user buffer overflow for ETHTOOL_{G, S}RXFHBen Hutchings2010-06-291-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct ethtool_rxnfc was originally defined in 2.6.27 for the ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data fields. It was then extended in 2.6.30 to support various additional commands. These commands should have been defined to use a new structure, but it is too late to change that now. Since user-space may still be using the old structure definition for the ETHTOOL_{G,S}RXFH commands, and since they do not need the additional fields, only copy the originally defined fields to and from user-space. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethtool: Fix potential kernel buffer overflow in ETHTOOL_GRXCLSRLALLBen Hutchings2010-06-291-2/+3
| | | | | | | | | | | | | | | | | | | | | | On a 32-bit machine, info.rule_cnt >= 0x40000000 leads to integer overflow and the buffer may be smaller than needed. Since ETHTOOL_GRXCLSRLALL is unprivileged, this can presumably be used for at least denial of service. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: fix 64 bit counters on 32 bit archesEric Dumazet2010-07-073-11/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a small possibility that a reader gets incorrect values on 32 bit arches. SNMP applications could catch incorrect counters when a 32bit high part is changed by another stats consumer/provider. One way to solve this is to add a rtnl_link_stats64 param to all ndo_get_stats64() methods, and also add such a parameter to dev_get_stats(). Rule is that we are not allowed to use dev->stats64 as a temporary storage for 64bit stats, but a caller provided area (usually on stack) Old drivers (only providing get_stats() method) need no changes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netdevice.h net/core/dev.c: Convert netdev_<level> logging macros to functionsJoe Perches2010-07-041-0/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduces an x86 defconfig text and data ~2k. text is smaller, data is larger. $ size vmlinux* text data bss dec hex filename 7198862 720112 1366288 9285262 8dae8e vmlinux 7205273 716016 1366288 9287577 8db799 vmlinux.device_h Uses %pV and struct va_format Format arguments are verified before printk Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ethtool: Add support for control of RX flow hash indirectionBen Hutchings2010-06-301-0/+80
| | | | | | | | | | | | | | | | | | | | | | | | Many NICs use an indirection table to map an RX flow hash value to one of an arbitrary number of queues (not necessarily a power of 2). It can be useful to remove some queues from this indirection table so that they are only used for flows that are specifically filtered there. It may also be useful to weight the mapping to account for user processes with the same CPU-affinity as the RX interrupts. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ethtool: Change ethtool_op_set_flags to validate flagsBen Hutchings2010-06-301-23/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ethtool_op_set_flags() does not check for unsupported flags, and has no way of doing so. This means it is not suitable for use as a default implementation of ethtool_ops::set_flags. Add a 'supported' parameter specifying the flags that the driver and hardware support, validate the requested flags against this, and change all current callers to pass this parameter. Change some other trivial implementations of ethtool_ops::set_flags to call ethtool_op_set_flags(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/core: use ntohs for skb->protocolSebastian Andrzej Siewior2010-06-301-1/+2
| | | | | | | | | | | | | | | | This is only noticed by people that are not doing everything correct in the first place. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: use this_cpu_ptr()Eric Dumazet2010-06-281-2/+2
| | | | | | | | | | | | | | use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/core/pktgen.c: Use pr_<level>Joe Perches2010-06-251-78/+64
| | | | | | | | | | | | | | | | | | | | | | | | Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt Remove "pktgen: " from formats Convert printks to pr_<level> Added func_enter() for debugging Moved version to end of string at module_init Coalesced long formats Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: optimize Berkeley Packet Filter (BPF) processingHagen Paul Pfeifer2010-06-251-51/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gcc is currenlty not in the ability to optimize the switch statement in sk_run_filter() because of dense case labels. This patch replace the OR'd labels with ordered sequenced case labels. The sk_chk_filter() function is modified to patch/replace the original OPCODES in a ordered but equivalent form. gcc is now in the ability to transform the switch statement in sk_run_filter into a jump table of complexity O(1). Until this patch gcc generates a sequence of conditional branches (O(n) of 567 byte .text segment size (arch x86_64): 7ff: 8b 06 mov (%rsi),%eax 801: 66 83 f8 35 cmp $0x35,%ax 805: 0f 84 d0 02 00 00 je adb <sk_run_filter+0x31d> 80b: 0f 87 07 01 00 00 ja 918 <sk_run_filter+0x15a> 811: 66 83 f8 15 cmp $0x15,%ax 815: 0f 84 c5 02 00 00 je ae0 <sk_run_filter+0x322> 81b: 77 73 ja 890 <sk_run_filter+0xd2> 81d: 66 83 f8 04 cmp $0x4,%ax 821: 0f 84 17 02 00 00 je a3e <sk_run_filter+0x280> 827: 77 29 ja 852 <sk_run_filter+0x94> 829: 66 83 f8 01 cmp $0x1,%ax [...] With the modification the compiler translate the switch statement into the following jump table fragment: 7ff: 66 83 3e 2c cmpw $0x2c,(%rsi) 803: 0f 87 1f 02 00 00 ja a28 <sk_run_filter+0x26a> 809: 0f b7 06 movzwl (%rsi),%eax 80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8) 813: 44 89 e3 mov %r12d,%ebx 816: e9 43 03 00 00 jmpq b5e <sk_run_filter+0x3a0> 81b: 41 89 dc mov %ebx,%r12d 81e: e9 3b 03 00 00 jmpq b5e <sk_run_filter+0x3a0> Furthermore, I reordered the instructions to reduce cache line misses by order the most common instruction to the start. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: fix "netpoll: Allow netpoll_setup/cleanup recursion"Andrew Morton2010-06-241-1/+0
| | | | | | | | | | | | | | | | | | Remove rtnl_unlock() which had no corresponding rtnl_lock(). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: consolidate netif_needs_gso() checksJohn Fastabend2010-06-231-36/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netif_needs_gso() is checked twice in the TX path once, before submitting the skb to the qdisc and once after it is dequeued from the qdisc just before calling ndo_hard_start(). This opens a window for a user to change the gso/tso or tx checksum settings that can cause netif_needs_gso to be true in one check and false in the other. Specifically, changing TX checksum setting may cause the warning in skb_gso_segment() to be triggered if the checksum is calculated earlier. This consolidates the netif_needs_gso() calls so that the stack only checks if gso is needed in dev_hard_start_xmit(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Export cred_to_ucred to modules.David S. Miller2010-06-161-0/+1
| | | | | | | | | | | | | | AF_UNIX references this, and can be built as a module, so... Signed-off-by: David S. Miller <davem@davemloft.net>
* | scm: Capture the full credentials of the scm sender.Eric W. Biederman2010-06-161-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Start capturing not only the userspace pid, uid and gid values of the sending process but also the struct pid and struct cred of the sending process as well. This is in preparation for properly supporting SCM_CREDENTIALS for sockets that have different uid and/or pid namespaces at the different ends. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | af_unix: Allow SO_PEERCRED to work across namespaces.Eric W. Biederman2010-06-161-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use struct pid and struct cred to store the peer credentials on struct sock. This gives enough information to convert the peer credential information to a value relative to whatever namespace the socket is in at the time. This removes nasty surprises when using SO_PEERCRED on socket connetions where the processes on either side are in different pid and user namespaces. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sock: Introduce cred_to_ucredEric W. Biederman2010-06-161-0/+14
| | | | | | | | | | | | | | | | | | | | To keep the coming code clear and to allow both the sock code and the scm code to share the logic introduce a fuction to translate from struct cred to struct ucred. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bridge: use rx_handler_data pointer to store net_bridge_port pointerJiri Pirko2010-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | Register net_bridge_port pointer as rx_handler data pointer. As br_port is removed from struct net_device, another netdev priv_flag is added to indicate the device serves as a bridge port. Also rcuized pointers are now correctly dereferenced in br_fdb.c and in netfilter parts. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add rx_handler data pointerJiri Pirko2010-06-151-1/+5
| | | | | | | | | | | | | | Add possibility to register rx_handler data pointer along with a rx_handler. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netpoll: Allow netpoll_setup/cleanup recursionHerbert Xu2010-06-151-79/+97
| | | | | | | | | | | | | | | | | | | | | | This patch adds the functions __netpoll_setup/__netpoll_cleanup which is designed to be called recursively through ndo_netpoll_seutp. They must be called with RTNL held, and the caller must initialise np->dev and ensure that it has a valid reference count. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netpoll: Add ndo_netpoll_setupHerbert Xu2010-06-151-0/+10
| | | | | | | | | | | | | | | | This patch adds ndo_netpoll_setup as the initialisation primitive to complement ndo_netpoll_cleanup. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netpoll: Add locking for netpoll_setup/cleanupHerbert Xu2010-06-151-75/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | As it stands, netpoll_setup and netpoll_cleanup have no locking protection whatsoever. So chaos ensures if two entities try to perform them on the same device. This patch adds RTNL to the equation. The code has been rearranged so that bits that do not need RTNL protection are now moved to the top of netpoll_setup. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>