summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* drivers/net: delete orphaned MCA ibmlana driver contentPaul Gortmaker2013-01-142-1353/+0
| | | | | | | | | | | | In commit a5e371f61ad33c07b28e7c9b60c78d71fdd34e2a ("drivers/net: delete all code/drivers depending on CONFIG_MCA") most of the MCA drivers went, including the Kconfig/Makefile hooks for ibmlana, but it seems that I missed the "git rm" on these actual driver files, and with the namespace overlap with machine check architecture, it got missed by various git grep type checking done at that time. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ifb: dont hard code inet_net useEric Dumazet2013-01-141-1/+1
| | | | | | | | | ifb should lookup devices in the appropriate namespace. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: remove flags argument from phy_{attach, connect, connect_direct}Florian Fainelli2013-01-1434-64/+56
| | | | | | | | | | | | | | | | The flags argument of the phy_{attach,connect,connect_direct} functions is then used to assign a struct phy_device dev_flags with its value. All callers but the tg3 driver pass the flag 0, which results in the underlying PHY drivers in drivers/net/phy/ not being able to actually use any of the flags they would set in dev_flags. This patch gets rid of the flags argument, and passes phydev->dev_flags to the internal PHY library call phy_attach_direct() such that drivers which actually modify a phy device dev_flags get the value preserved for use by the underlying phy driver. Acked-by: Kosta Zertsekel <konszert@marvell.com> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: namespace aware act_mirredBenjamin LaHaise2013-01-1422-72/+94
| | | | | | | | | | | | | | | | | Eric Dumazet pointed out that act_mirred needs to find the current net_ns, and struct net pointer is not provided in the call chain. His original patch made use of current->nsproxy->net_ns to find the network namespace, but this fails to work correctly for userspace code that makes use of netlink sockets in different network namespaces. Instead, pass the "struct net *" down along the call chain to where it is needed. This version removes the ifb changes as Eric has submitted that patch separately, but is otherwise identical to the previous version. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6 netevent: Remove old_neigh from netevent_redirect.YOSHIFUJI Hideaki / 吉藤英明2013-01-143-28/+13
| | | | | | | | | | The only user is cxgb3 driver. old_neigh is used to check device change, but it must not happen on redirect. In this sense, we can remove old_neigh argument. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/net: Clean up orphaned probes in Space.cPaul Gortmaker2013-01-141-31/+0
| | | | | | | | | | | | | | The removal of the 8390 EISA drivers actually comprises the complete content of the EISA probe block, so we can now remove that block, and its hook into the unified probe. Note that the deleted comment mentions PCI probes, but they long since moved elsewhere, so no PCI probes are touched here. We get rid of the orphaned EISA probe prototypes, and a couple of left over MCA probe prototypes at the same time. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: 64bit version of ipv6_prefix_equal().YOSHIFUJI Hideaki / 吉藤英明2013-01-141-0/+26
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Remove __ipv6_prefix_equal().YOSHIFUJI Hideaki / 吉藤英明2013-01-141-10/+5
| | | | | | | | ipv6_prefix_equal() just casts its arguments and it is the only user of __ipv6_prefix_equal(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: 64bit version of ipv6_addr_set().YOSHIFUJI Hideaki / 吉藤英明2013-01-141-4/+22
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: 64bit version of ipv6_addr_v4mapped().YOSHIFUJI Hideaki / 吉藤英明2013-01-141-2/+7
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: 64bit version of ipv6_addr_loopback().YOSHIFUJI Hideaki / 吉藤英明2013-01-141-0/+6
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: 64bit version of ipv6_addr_diff().YOSHIFUJI Hideaki / 吉藤英明2013-01-141-1/+28
| | | | | | | | | | | | Introduce __ipv6_addr_diff64() to to find the first different bit between two addresses on 64bit architectures. 32bit version is still available as __ipv6_addr_diff32(), and __ipv6_addr_diff() automatically selects appropriate version. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Move comment to right place.YOSHIFUJI Hideaki / 吉藤英明2013-01-132-5/+4
| | | | | | | | IN6ADDR_* and in6addr_* are not exported to userspace, and are defined in include/linux/in6.h. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller2013-01-1313-70/+88
|\ | | | | | | | | | | | | | | | | | | | | | | Included changes: - use per_cpu_add when possible - prevent the TT component to add multicast address as "mesh clients" - some debug output improvements - proper lockdeps class initializations - new style fixes (space before/after brackets) - other minor fixes and refactoring Signed-off-by: David S. Miller <davem@davemloft.net>
| * batman-adv: unbloat batadv_priv if debug is not enabledMarek Lindner2013-01-123-1/+6
| | | | | | | | | | Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
| * batman-adv: remove unused variable from orig_node structMarek Lindner2013-01-121-1/+0
| | | | | | | | | | Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
| * batman-adv: fix typo in debug messageAntonio Quartulli2013-01-121-1/+1
| | | | | | | | | | | | | | in bat_iv_ogm.c a debug message should print "tq" instead of "td" Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: use the const qualifier in hash functionsAntonio Quartulli2013-01-122-2/+2
| | | | | | | | | | | | | | | | The data argument in each hash function should carry the "const" qualifier as it is never modified. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: don't compile the BLA switch if not requestedAntonio Quartulli2013-01-122-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | When the Bridge Loop Avoidance component is not compiled-in, its boolean switch should be not compiled as well. This patch surrounds the switch with a proper ifdef. This behaviour was introduced by 9fd6b0615b5499b270d39a92b8790e206cf75833 ("batman-adv: add bridge loop avoidance compile option") Signed-off-by: Antonio Quartulli <ordex@autistici.org> Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: remove useless NULL checkAntonio Quartulli2013-01-121-4/+2
| | | | | | | | | | | | | | | | debugfs_remove_recursive() checks whether its argument is not null on its own, therefore it is possible to remove the external check. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: remove useless blank lines before and after bracketsAntonio Quartulli2013-01-128-29/+0
| | | | | | | | | | Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: Initialize lockdep class keys for hashesAntonio Quartulli2013-01-123-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | Different hashes have the same class key because they get initialised with the same one. For this reason lockdep can create false warning when they are used recursively. Re-initialise the key for each hash after the invocation to hash_new() to avoid this problem. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Tested-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: remove useless assignment in tt_local_add()Antonio Quartulli2013-01-121-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | The flag field of the tt_local_entry->common structure in tt_local_add() is first assigned NO_FLAGS and then TT_CLIENT_NEW so nullifying the first operation. For this reason it is safe to remove the first assignment. This was introuduced by ("batman-adv: keep local table consistency for further TT_RESPONSE") Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: unify and properly print hex valuesAntonio Quartulli2013-01-123-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Values are printed in hexadecimal format in several points in the code, but they are not printed using the same format string. This patches unifies the format used for such numbers so that they look the same everywhere. Given the fact that all the variables printed as hexadecimal are 16 bit long, this is the chosen printing format: %#.4x Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: print the CRC together with the translation tablesAntonio Quartulli2013-01-121-6/+9
| | | | | | | | | | | | | | | | | | To simplify debugging operations, it is better to print the related CRC together with the translation table (local CRC for the local table and global CRC for each entry in the global table) Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: improve local translation table outputAntonio Quartulli2013-01-121-4/+23
| | | | | | | | | | | | | | | | This patch adds a nice header to the local translation table and the last_seen time for each local entry Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: reduce local TT entry timeout to 10 minutesAntonio Quartulli2013-01-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The current timeout is set to one hour. However a client connected to the mesh network will always generate traffic. In the worst case it will send ARP requests every 4 or 5 minutes. On the other hand having a long timeout means storing dead entries for one hour and it leads to very big trans-tables containing useless clients. This patch reduces the timeout to 10 minutes Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * batman-adv: Do not add multicast MAC addresses to translation tableLinus Lüssing2013-01-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The current translation table mechanism is not suitable for multicast addresses and we are currently flooding such frames anyway. Therefore this patch prevents multicast MAC addresses being added to the translation table. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
| * batman-adv: use per_cpu_add helperShan Wei2013-01-121-3/+1
| | | | | | | | | | | | | | | | | | | | | | this_cpu_add is an atomic operation. and be more faster than per_cpu_ptr operation. Signed-off-by: Shan Wei <davidshan@tencent.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
* | ipv6: Store Router Alert option in IP6CB directly.YOSHIFUJI Hideaki / 吉藤英明2013-01-134-5/+8
| | | | | | | | | | | | | | | | Router Alert option is very small and we can store the value itself in the skb. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6 xfrm: Use ipv6_addr_hash() in xfrm6_tunnel_spi_hash_byaddr().YOSHIFUJI Hideaki / 吉藤英明2013-01-131-1/+1
| | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6 route: Use ipv6_addr_hash() in rt6_info_hash_nhsfn().YOSHIFUJI Hideaki / 吉藤英明2013-01-131-9/+2
| | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().YOSHIFUJI Hideaki / 吉藤英明2013-01-133-46/+28
| | | | | | | | | | | | | | | | Move generalized version of ipv6_is_mld() to header, and use it from ip6_mc_input(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Use ipv6_get_dsfield() instead of ipv6_tclass().YOSHIFUJI Hideaki / 吉藤英明2013-01-133-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | Commit 7a3198a8 ("ipv6: helper function to get tclass") introduced ipv6_tclass(), but similar function is already available as ipv6_get_dsfield(). We might be able to call ipv6_tclass() from ipv6_get_dsfield(), but it is confusing to have two versions. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel).YOSHIFUJI Hideaki / 吉藤英明2013-01-133-7/+13
| | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.YOSHIFUJI Hideaki / 吉藤英明2013-01-135-13/+16
|/ | | | | | | | | | This is not only for readability but also for optimization. What we do here is to build the 32bit word at the beginning of the ipv6 header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and we do not need to read the tclass portion of the target buffer. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* networking/cs89x0.txt: delete stale information about hand patchingPaul Gortmaker2013-01-111-79/+0
| | | | | | | | | | | | | Output of a git grep happened to make me look into this file, and I found instructions about how to hand patch (without using patch) the driver into the kernel tree. Since the driver has been a part of the mainline kernel for years, we can dump this whole section. Fortunately it doesn't even cause a renumbering of the sections to do so. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: splice: fix __splice_segment()Eric Dumazet2013-01-111-13/+15
| | | | | | | | | | | | | | commit 9ca1b22d6d2 (net: splice: avoid high order page splitting) forgot that skb->head could need a copy into several page frags. This could be the case for loopback traffic mostly. Also remove now useless skb argument from linear_to_page() and __splice_segment() prototypes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
* gianfar: use more portable i/o accessorsKim Phillips2013-01-111-4/+4
| | | | | | | | | in/out_be32 accessors are Power arch centric whereas ioread/writebe32 are available in other arches. Also, unlike in/out_be32, ioread/writebe32 expect non-volatile address arguments. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: fib: fix a comment.Rami Rosen2013-01-111-1/+1
| | | | | | | | In fib_frontend.c, there is a confusing comment; NETLINK_CB(skb).portid does not refer to a pid of sending process, but rather to a netlink portid. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Export __netdev_pick_tx so that it can be used in modulesAlexander Duyck2013-01-111-0/+1
| | | | | | | | | | | When testing with FCoE enabled we discovered that I had not exported __netdev_pick_tx. As a result ixgbe doesn't build with the RFC patches applied because ixgbe_select_queue was calling the function. This change corrects that build issue by correctly exporting __netdev_pick_tx so it can be used by modules. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: missing break statement in tg3_get_5720_nvram_info()Dan Carpenter2013-01-101-0/+2
| | | | | | | | There is a missing break statement so FLASH_5762_EEPROM_HD gets treated like FLASH_5762_EEPROM_LD. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add support for XPS without sysfs being definedAlexander Duyck2013-01-104-22/+21
| | | | | | | | | | This patch makes it so that we can support transmit packet steering without sysfs needing to be enabled. The reason for making this change is to make it so that a driver can make use of the XPS even while the sysfs portion of the interface is not present. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rewrite netif_set_xps_queues to address several issuesAlexander Duyck2013-01-101-66/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | This change is meant to address several issues I found within the netif_set_xps_queues function. If the allocation of one of the maps to be assigned to new_dev_maps failed we could end up with the device map in an inconsistent state since we had already worked through a number of CPUs and removed or added the queue. To address that I split the process into several steps. The first of which is just the allocation of updated maps for CPUs that will need larger maps to store the queue. By doing this we can fail gracefully without actually altering the contents of the current device map. The second issue I found was the fact that we were always allocating a new device map even if we were not adding any queues. I have updated the code so that we only allocate a new device map if we are adding queues, otherwise if we are not adding any queues to CPUs we just skip to the removal process. The last change I made was to reuse the code from remove_xps_queue to remove the queue from the CPU. By making this change we can be consistent in how we go about adding and removing the queues from the CPUs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rewrite netif_reset_xps_queue to allow for better code reuseAlexander Duyck2013-01-101-23/+33
| | | | | | | | | | | | | | | | | | | This patch does a minor refactor on netif_reset_xps_queue to address a few items I noticed. First is the fact that we are doing removal of queues in both netif_reset_xps_queue and netif_set_xps_queue. Since there is no need to have the code in two places I am pushing it out into a separate function and will come back in another patch and reuse the code in netif_set_xps_queue. The second item this change addresses is the fact that the Tx queues were not getting their numa_node value cleared as a part of the XPS queue reset. This patch resolves that by resetting the numa_node value if the dev_maps value is set. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add functions netif_reset_xps_queue and netif_set_xps_queueAlexander Duyck2013-01-103-143/+173
| | | | | | | | | | | | | | | | This patch adds two functions, netif_reset_xps_queue and netif_set_xps_queue. The main idea behind these two functions is to provide a mechanism through which drivers can update their defaults in regards to XPS. Currently no such mechanism exists and as a result we cannot use XPS for things such as ATR which would require a basic configuration to start in which the Tx queues are mapped to CPUs via a 1:1 mapping. With this change I am making it possible for drivers such as ixgbe to be able to use the XPS feature by controlling the default configuration. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Split core bits of netdev_pick_tx into __netdev_pick_txAlexander Duyck2013-01-102-25/+33
| | | | | | | | | | This change splits the core bits of netdev_pick_tx into a separate function. The main idea behind this is to make this code accessible to select queue functions when they decide to process the standard path instead of their own custom path in their select queue routine. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* softirq: reduce latenciesEric Dumazet2013-01-101-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles. This patch changes the fallback to ksoftirqd condition to : - A time limit of 2 ms. - need_resched() being set on current task When one of this condition is met, we wakeup ksoftirqd for further softirq processing if we still have pending softirqs. Using need_resched() as the only condition can trigger RCU stalls, as we can keep BH disabled for too long. I ran several benchmarks and got no significant difference in throughput, but a very significant reduction of latencies (one order of magnitude) : In following bench, 200 antagonist "netperf -t TCP_RR" are started in background, using all available cpus. Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC IRQ (hard+soft) Before patch : # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=550110.424 MIN_LATENCY=146858 MAX_LATENCY=997109 P50_LATENCY=305000 P90_LATENCY=550000 P99_LATENCY=710000 MEAN_LATENCY=376989.12 STDDEV_LATENCY=184046.92 After patch : # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=40545.492 MIN_LATENCY=9834 MAX_LATENCY=78366 P50_LATENCY=33583 P90_LATENCY=59000 P99_LATENCY=69000 MEAN_LATENCY=38364.67 STDDEV_LATENCY=12865.26 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Miller <davem@davemloft.net> Cc: Tom Herbert <therbert@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: more precise pkt_len computationEric Dumazet2013-01-101-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One long standing problem with TSO/GSO/GRO packets is that skb->len doesn't represent a precise amount of bytes on wire. Headers are only accounted for the first segment. For TCP, thats typically 66 bytes per 1448 bytes segment missing, an error of 4.5 % for normal MSS value. As consequences : 1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits. 2) Device stats are slightly under estimated as well. Fix this by taking account of headers in qdisc_skb_cb(skb)->pkt_len computation. Packet schedulers should use qdisc pkt_len instead of skb->len for their bandwidth limitations, and TSO enabled devices drivers could use pkt_len if their statistics are not hardware assisted, and if they don't scratch skb->cb[] first word. Both egress and ingress paths work, thanks to commit fda55eca5a (net: introduce skb_transport_header_was_set()) : If GRO built a GSO packet, it also set the transport header for us. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Paolo Valente <paolo.valente@unimore.it> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* doc: Clarify behavior when sysctl tcp_ecn = 1Vijay Subramanian2013-01-101-2/+3
| | | | | | | | | | | | | | | Recent commit (commit 7e3a2dc52953 doc: make the description of how tcp_ecn works more explicit and clear ) clarified the behavior of tcp_ecn sysctl variable but description is inconsistent. When requested by incoming conections, ECN is enabled with not just tcp_ecn = 2 but also with tcp_ecn = 1. This patch makes it clear that with tcp_ecn = 1, ECN is enabled when requested by incoming connections. Also fix spelling of 'incoming'. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>