summaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-10-061-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking changes from David Miller: "The most important bit in here is the fix for input route caching from Eric Dumazet, it's a shame we couldn't fully analyze this in time for 3.6 as it's a 3.6 regression introduced by the routing cache removal. Anyways, will send quickly to -stable after you pull this in. Other changes of note: 1) Fix lockdep splats in team and bonding, from Eric Dumazet. 2) IPV6 adds link local route even when there is no link local address, from Nicolas Dichtel. 3) Fix ixgbe PTP implementation, from Jacob Keller. 4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya. 5) MAC length computed improperly in VLAN demux, from Antonio Quartulli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt Remove noisy printks from llcp_sock_connect tipc: prevent dropped connections due to rcvbuf overflow silence some noisy printks in irda team: set qdisc_tx_busylock to avoid LOCKDEP splat bonding: set qdisc_tx_busylock to avoid LOCKDEP splat sctp: check src addr when processing SACK to update transport state sctp: fix a typo in prototype of __sctp_rcv_lookup() ipv4: add a fib_type to fib_info can: mpc5xxx_can: fix section type conflict can: peak_pcmcia: fix error return code can: peak_pci: fix error return code cxgb4: Fix build error due to missing linux/vmalloc.h include. bnx2x: fix ring size for 10G functions cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params() ixgbe: add support for X540-AT1 ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit ixgbe: fix PTP ethtool timestamping function ixgbe: (PTP) Fix PPS interrupt code ixgbe: Fix PTP X540 SDP alignment code for PPS signal ...
| * ipv4: add a fib_type to fib_infoEric Dumazet2012-10-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.) introduced a regression for forwarding. This was hard to reproduce but the symptom was that packets were delivered to local host instead of being forwarded. David suggested to add fib_type to fib_info so that we dont inadvertently share same fib_info for different purposes. With help from Julian Anastasov who provided very helpful hints, reproduced here : <quote> Can it be a problem related to fib_info reuse from different routes. For example, when local IP address is created for subnet we have: broadcast 192.168.0.255 dev DEV proto kernel scope link src 192.168.0.1 192.168.0.0/24 dev DEV proto kernel scope link src 192.168.0.1 local 192.168.0.1 dev DEV proto kernel scope host src 192.168.0.1 The "dev DEV proto kernel scope link src 192.168.0.1" is a reused fib_info structure where we put cached routes. The result can be same fib_info for 192.168.0.255 and 192.168.0.0/24. RTN_BROADCAST is cached only for input routes. Incoming broadcast to 192.168.0.255 can be cached and can cause problems for traffic forwarded to 192.168.0.0/24. So, this patch should solve the problem because it separates the broadcast from unicast traffic. And the ip_route_input_slow caching will work for local and broadcast input routes (above routes 1 and 3) just because they differ in scope and use different fib_info. </quote> Many thanks to Chris Clayton for his patience and help. Reported-by: Chris Clayton <chris2553@googlemail.com> Bisected-by: Chris Clayton <chris2553@googlemail.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sections: fix section conflicts in netAndi Kleen2012-10-062-2/+2
|/ | | | | | | Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2012-10-0263-4121/+2021
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
| * igmp: export symbol ip_mc_leave_groupstephen hemminger2012-10-011-0/+1
| | | | | | | | | | | | | | Needed for VXLAN. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * gre: fix sparse warningstephen hemminger2012-10-011-2/+2
| | | | | | | | | | | | | | Use be16 consistently when looking at flags. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: gre: add GRO capabilityEric Dumazet2012-10-011-2/+11
| | | | | | | | | | | | | | | | | | | | | | Add GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and checking GRO is building large packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: gro: add checksuming helpersEric Dumazet2012-10-011-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | skb with CHECKSUM_NONE cant currently be handled by GRO, and we notice this deep in GRO stack in tcp[46]_gro_receive() But there are cases where GRO can be a benefit, even with a lack of checksums. This preliminary work is needed to add GRO support to tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-09-286-24/+36
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: Remove unused parameter from tcp_v4_save_optionsChristoph Paasch2012-09-271-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | struct sock *sk is not used inside tcp_v4_save_options. Thus it can be removed. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tunnel: drop packet if ECN present with not-ECTstephen hemminger2012-09-272-33/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux tunnels were written before RFC6040 and therefore never implemented the corner case of ECN getting set in the outer header and the inner header not being ready for it. Section 4.2. Default Tunnel Egress Behaviour. o If the inner ECN field is Not-ECT, the decapsulator MUST NOT propagate any other ECN codepoint onwards. This is because the inner Not-ECT marking is set by transports that rely on dropped packets as an indication of congestion and would not understand or respond to any other ECN codepoint [RFC4774]. Specifically: * If the inner ECN field is Not-ECT and the outer ECN field is CE, the decapsulator MUST drop the packet. * If the inner ECN field is Not-ECT and the outer ECN field is Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the outgoing packet with the ECN field cleared to Not-ECT. This patch moves the ECN decap logic out of the individual tunnels into a common place. It also adds logging to allow detecting broken systems that set ECN bits incorrectly when tunneling (or an intermediate router might be changing the header). Overloads rx_frame_error to keep track of ECN related error. Thanks to Chris Wright who caught this while reviewing the new VXLAN tunnel. This code was tested by injecting faulty logic in other end GRE to send incorrectly encapsulated packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xfrm: remove extranous rcu_read_lockstephen hemminger2012-09-272-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | The handlers for xfrm_tunnel are always invoked with rcu read lock already. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | gre: remove unnecessary rcu_read_lock/unlockstephen hemminger2012-09-271-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | The gre function pointers for receive and error handling are always called (from gre.c) with rcu_read_lock already held. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | gre: fix handling of key 0stephen hemminger2012-09-271-10/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GRE driver incorrectly uses zero as a flag value. Zero is a perfectly valid value for key, and the tunnel should match packets with no key only with tunnels created without key, and vice versa. This is a slightly visible change since previously it might be possible to construct a working tunnel that sent key 0 and received only because of the key wildcard of zero. I.e the sender sent key of zero, but tunnel was defined without key. Note: using gre key 0 requires iproute2 utilities v3.2 or later. The original utility code was broken as well. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipconfig: fix trivial build errorAndy Shevchenko2012-09-251-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 5e953778a2aab04929a5e7b69f53dc26e39b079e ("ipconfig: add nameserver IPs to kernel-parameter ip=") introduces ic_nameservers_predef() that defined only for BOOTP. However it is used by ip_auto_config_setup() as well. This patch moves it outside of #ifdef BOOTP. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Christoph Fritz <chf.fritz@googlemail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: raw: revert unrelated changeEric Dumazet2012-09-251-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5640f7685831 ("net: use a per task frag allocator") accidentally contained an unrelated change to net/ipv4/raw.c, later committed (without the pr_err() debugging bits) in net tree as commit ab43ed8b749 (ipv4: raw: fix icmp_filter()) This patch reverts this glitch, noticed by Stephen Rothwell. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: use a per task frag allocatorEric Dumazet2012-09-244-114/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use a per socket order-0 page cache for tcp_sendmsg() operations. This page is used to build fragments for skbs. Its done to increase probability of coalescing small write() into single segments in skbs still in write queue (not yet sent) But it wastes a lot of memory for applications handling many mostly idle sockets, since each socket holds one page in sk->sk_sndmsg_page Its also quite inefficient to build TSO 64KB packets, because we need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit page allocator more than wanted. This patch adds a per task frag allocator and uses bigger pages, if available. An automatic fallback is done in case of memory pressure. (up to 32768 bytes per frag, thats order-3 pages on x86) This increases TCP stream performance by 20% on loopback device, but also benefits on other network devices, since 8x less frags are mapped on transmit and unmapped on tx completion. Alexander Duyck mentioned a probable performance win on systems with IOMMU enabled. Its possible some SG enabled hardware cant cope with bigger fragments, but their ndo_start_xmit() should already handle this, splitting a fragment in sub fragments, since some arches have PAGE_SIZE=65536 Successfully tested on various ethernet devices. (ixgbe, igb, bnx2x, tg3, mellanox mlx4) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller2012-09-244-229/+10
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== This patchset contains updates for your net-next tree, they are: * Mostly fixes for the recently pushed IPv6 NAT support: - Fix crash while removing nf_nat modules from Patrick McHardy. - Fix unbalanced rcu_read_unlock from Ulrich Weber. - Merge NETMAP and REDIRECT into one single xt_target module, from Jan Engelhardt. - Fix Kconfig for IPv6 NAT, which allows inconsistent configurations, from myself. * Updates for ipset, all of the from Jozsef Kadlecsik: - Add the new "nomatch" option to obtain reverse set matching. - Support for /0 CIDR in hash:net,iface set type. - One non-critical fix for a rare crash due to pass really wrong configuration parameters. - Coding style cleanups. - Sparse fixes. - Add set revision supported via modinfo.i * One extension for the xt_time match, to support matching during the transition between two days with one single rule, from Florian Westphal. * Fix maximum packet length supported by nfnetlink_queue and add NFQA_CAP_LEN attribute, from myself. You can notice that this batch contains a couple of fixes that may go to 3.6-rc but I don't consider them critical to push them: * The ipset fix for the /0 cidr case, which is triggered with one inconsistent command line invocation of ipset. * The nfnetlink_queue maximum packet length supported since it requires the new NFQA_CAP_LEN attribute to provide a full workaround for the described problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | netfilter: combine ipt_REDIRECT and ip6t_REDIRECTJan Engelhardt2012-09-213-121/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Combine more modules since the actual code is so small anyway that the kmod metadata and the module in its loaded state totally outweighs the combined actual code size. IP_NF_TARGET_REDIRECT becomes a compat option; IP6_NF_TARGET_REDIRECT is completely eliminated since it has not see a release yet. Signed-off-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | netfilter: combine ipt_NETMAP and ip6t_NETMAPJan Engelhardt2012-09-213-108/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Combine more modules since the actual code is so small anyway that the kmod metadata and the module in its loaded state totally outweighs the combined actual code size. IP_NF_TARGET_NETMAP becomes a compat option; IP6_NF_TARGET_NETMAP is completely eliminated since it has not see a release yet. Signed-off-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | tcp: TCP Fast Open Server - record retransmits after 3WHSNeal Cardwell2012-09-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When recording the number of SYNACK retransmits for servers using TCP Fast Open, fix the code to ensure that we copy over the retransmit count from the request_sock after we receive the ACK that completes the 3-way handshake. The story here is similar to that of SYNACK RTT measurements. Previously we were always doing this in tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we receive the SYN. So for TFO we must copy the final SYNACK retransmit count in tcp_rcv_state_process(). Note that copying over the SYNACK retransmit count will give us the correct count since, as is mentioned in a comment in tcp_retransmit_timer(), before we receive an ACK for our SYN-ACK a TFO passive connection does not retransmit anything else (e.g., data or FIN segments). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: TCP Fast Open Server - call tcp_validate_incoming() for all packetsNeal Cardwell2012-09-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A TCP Fast Open (TFO) passive connection must call both tcp_check_req() and tcp_validate_incoming() for all incoming ACKs that are attempting to complete the 3WHS. This is needed to parallel all the action that happens for a non-TFO connection, where for an ACK that is attempting to complete the 3WHS we call both tcp_check_req() and tcp_validate_incoming(). For example, upon receiving the ACK that completes the 3WHS, we need to call tcp_fast_parse_options() and update ts_recent based on the incoming timestamp value in the ACK. One symptom of the problem with the previous code was that for passive TFO connections using TCP timestamps, the outgoing TS ecr values ignored the incoming TS val value on the ACK that completed the 3WHS. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: TCP Fast Open Server - note timestamps and retransmits for SYNACK RTTNeal Cardwell2012-09-221-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, when using TCP Fast Open a server would return from tcp_check_req() before updating snt_synack based on TCP timestamp echo replies and whether or not we've retransmitted the SYNACK. The result was that (a) for TFO connections using timestamps we used an incorrect baseline SYNACK send time (tcp_time_stamp of SYNACK send instead of rcv_tsecr), and (b) for TFO connections that do not have TCP timestamps but retransmit the SYNACK we took a SYNACK RTT sample when we should not take a sample. This fix merely moves the snt_synack update logic a bit earlier in the function, so that connections using TCP Fast Open will properly do these updates when the ACK for the SYNACK arrives. Moving this snt_synack update logic means that with TCP_DEFER_ACCEPT enabled we do a few instructions of wasted work on each bare ACK, but that seems OK. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHSNeal Cardwell2012-09-222-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When taking SYNACK RTT samples for servers using TCP Fast Open, fix the code to ensure that we only call tcp_valid_rtt_meas() after we receive the ACK that completes the 3-way handshake. Previously we were always taking an RTT sample in tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we receive the SYN. So for TFO we must wait until tcp_rcv_state_process() to take the RTT sample. To fix this, we wait until after TFO calls tcp_v4_syn_recv_sock() before we set the snt_synack timestamp, since tcp_synack_rtt_meas() already ensures that we only take a SYNACK RTT sample if snt_synack is non-zero. To be careful, we only take a snt_synack timestamp when a SYNACK transmit or retransmit succeeds. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: extract code to compute SYNACK RTTNeal Cardwell2012-09-221-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for adding another spot where we compute the SYNACK RTT, extract this code so that it can be shared. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipconfig: add nameserver IPs to kernel-parameter ip=Christoph Fritz2012-09-211-3/+36
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On small systems (e.g. embedded ones) IP addresses are often configured by bootloaders and get assigned to kernel via parameter "ip=". If set to "ip=dhcp", even nameserver entries from DHCP daemons are handled. These entries exported in /proc/net/pnp are commonly linked by /etc/resolv.conf. To configure nameservers for networks without DHCP, this patch adds option <dns0-ip> and <dns1-ip> to kernel-parameter 'ip='. Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com> Tested-by: Jan Weitzel <j.weitzel@phytec.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: Document use of undefined variable.Alan Cox2012-09-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both tcp_timewait_state_process and tcp_check_req use the same basic construct of struct tcp_options received tmp_opt; tmp_opt.saw_tstamp = 0; then call tcp_parse_options However if they are fed a frame containing a TCP_SACK then tbe code behaviour is undefined because opt_rx->sack_ok is undefined data. This ought to be documented if it is intentional. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: Don't add TCP-code in inet_sock_destructChristoph Paasch2012-09-202-2/+7
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: H.K. Jerry Chu <hkchu@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: unify fragment thresh handling codeAmerigo Wang2012-09-192-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | gre: add GSO supportEric Dumazet2012-09-191-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | Add GSO support to GRE tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-09-151-0/+5
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/netfilter/nfnetlink_log.c net/netfilter/xt_LOG.c Rather easy conflict resolution, the 'net' tree had bug fixes to make sure we checked if a socket is a time-wait one or not and elide the logging code if so. Whereas on the 'net-next' side we are calculating the UID and GID from the creds using different interfaces due to the user namespace changes from Eric Biederman. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netlink: Rename pid to portid to avoid confusionEric W. Biederman2012-09-109-53/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netlink: hide struct module parameter in netlink_kernel_createPablo Neira Ayuso2012-09-082-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch defines netlink_kernel_create as a wrapper function of __netlink_kernel_create to hide the struct module *me parameter (which seems to be THIS_MODULE in all existing netlink subsystems). Suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: rt_cache_flush() cleanupEric Dumazet2012-09-071-17/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We dont use jhash anymore since route cache removal, so we can get rid of get_random_bytes() calls for rt_genid changes. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv4/route: arg delay is useless in rt_cache_flush()Nicolas Dichtel2012-09-076-34/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since route cache deletion (89aef8921bfbac22f), delay is no more used. Remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | igmp: avoid drop_monitor false positivesEric Dumazet2012-09-071-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | igmp should call consume_skb() for all correctly processed packets, to avoid false dropwatch/drop_monitor false positives. Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: fix TFO regressionEric Dumazet2012-09-062-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fengguang Wu reported various panics and bisected to commit 8336886f786fdac (tcp: TCP Fast Open Server - support TFO listeners) Fix this by making sure socket is a TCP socket before accessing TFO data structures. [ 233.046014] kfree_debugcheck: out of range ptr ea6000000bb8h. [ 233.047399] ------------[ cut here ]------------ [ 233.048393] kernel BUG at /c/kernel-tests/src/stable/mm/slab.c:3074! [ 233.048393] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 233.048393] Modules linked in: [ 233.048393] CPU 0 [ 233.048393] Pid: 3929, comm: trinity-watchdo Not tainted 3.6.0-rc3+ #4192 Bochs Bochs [ 233.048393] RIP: 0010:[<ffffffff81169653>] [<ffffffff81169653>] kfree_debugcheck+0x27/0x2d [ 233.048393] RSP: 0018:ffff88000facbca8 EFLAGS: 00010092 [ 233.048393] RAX: 0000000000000031 RBX: 0000ea6000000bb8 RCX: 00000000a189a188 [ 233.048393] RDX: 000000000000a189 RSI: ffffffff8108ad32 RDI: ffffffff810d30f9 [ 233.048393] RBP: ffff88000facbcb8 R08: 0000000000000002 R09: ffffffff843846f0 [ 233.048393] R10: ffffffff810ae37c R11: 0000000000000908 R12: 0000000000000202 [ 233.048393] R13: ffffffff823dbd5a R14: ffff88000ec5bea8 R15: ffffffff8363c780 [ 233.048393] FS: 00007faa6899c700(0000) GS:ffff88001f200000(0000) knlGS:0000000000000000 [ 233.048393] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 233.048393] CR2: 00007faa6841019c CR3: 0000000012c82000 CR4: 00000000000006f0 [ 233.048393] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 233.048393] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 233.048393] Process trinity-watchdo (pid: 3929, threadinfo ffff88000faca000, task ffff88000faec600) [ 233.048393] Stack: [ 233.048393] 0000000000000000 0000ea6000000bb8 ffff88000facbce8 ffffffff8116ad81 [ 233.048393] ffff88000ff588a0 ffff88000ff58850 ffff88000ff588a0 0000000000000000 [ 233.048393] ffff88000facbd08 ffffffff823dbd5a ffffffff823dbcb0 ffff88000ff58850 [ 233.048393] Call Trace: [ 233.048393] [<ffffffff8116ad81>] kfree+0x5f/0xca [ 233.048393] [<ffffffff823dbd5a>] inet_sock_destruct+0xaa/0x13c [ 233.048393] [<ffffffff823dbcb0>] ? inet_sk_rebuild_header +0x319/0x319 [ 233.048393] [<ffffffff8231c307>] __sk_free+0x21/0x14b [ 233.048393] [<ffffffff8231c4bd>] sk_free+0x26/0x2a [ 233.048393] [<ffffffff825372db>] sctp_close+0x215/0x224 [ 233.048393] [<ffffffff810d6835>] ? lock_release+0x16f/0x1b9 [ 233.048393] [<ffffffff823daf12>] inet_release+0x7e/0x85 [ 233.048393] [<ffffffff82317d15>] sock_release+0x1f/0x77 [ 233.048393] [<ffffffff82317d94>] sock_close+0x27/0x2b [ 233.048393] [<ffffffff81173bbe>] __fput+0x101/0x20a [ 233.048393] [<ffffffff81173cd5>] ____fput+0xe/0x10 [ 233.048393] [<ffffffff810a3794>] task_work_run+0x5d/0x75 [ 233.048393] [<ffffffff8108da70>] do_exit+0x290/0x7f5 [ 233.048393] [<ffffffff82707415>] ? retint_swapgs+0x13/0x1b [ 233.048393] [<ffffffff8108e23f>] do_group_exit+0x7b/0xba [ 233.048393] [<ffffffff8108e295>] sys_exit_group+0x17/0x17 [ 233.048393] [<ffffffff8270de10>] tracesys+0xdd/0xe2 [ 233.048393] Code: 59 01 5d c3 55 48 89 e5 53 41 50 0f 1f 44 00 00 48 89 fb e8 d4 b0 f0 ff 84 c0 75 11 48 89 de 48 c7 c7 fc fa f7 82 e8 0d 0f 57 01 <0f> 0b 5f 5b 5d c3 55 48 89 e5 0f 1f 44 00 00 48 63 87 d8 00 00 [ 233.048393] RIP [<ffffffff81169653>] kfree_debugcheck+0x27/0x2d [ 233.048393] RSP <ffff88000facbca8> Reported-by: Fengguang Wu <wfg@linux.intel.com> Tested-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "H.K. Jerry Chu" <hkchu@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: add generic netlink support for tcp_metricsJulian Anastasov2012-09-051-13/+341
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for genl "tcp_metrics". No locking is changed, only that now we can unlink and delete entries after grace period. We implement get/del for single entry and dump to support show/flush filtering in user space. Del without address attribute causes flush for all addresses, sadly under genl_mutex. v2: - remove rcu_assign_pointer as suggested by Eric Dumazet, it is not needed because there are no other writes under lock - move the flushing code in tcp_metrics_flush_all v3: - remove synchronize_rcu on flush as suggested by Eric Dumazet Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller2012-09-0330-3358/+545
| |\ \ \
| | * \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso2012-09-0321-129/+760
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
| | * | | | netfilter: properly annotate ipv4_netfilter_{init,fini}()Jan Beulich2012-09-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Despite being just a few bytes of code, they should still have proper annotations. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | netfilter: nf_nat: support IPv6 in TFTP NAT helperPablo Neira Ayuso2012-08-303-56/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | netfilter: nf_nat: support IPv6 in IRC NAT helperPablo Neira Ayuso2012-08-303-105/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | netfilter: nf_nat: support IPv6 in SIP NAT helperPatrick McHardy2012-08-303-586/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add IPv6 support to the SIP NAT helper. There are no functional differences to IPv4 NAT, just different formats for addresses. Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | netfilter: nf_nat: support IPv6 in amanda NAT helperPatrick McHardy2012-08-303-91/+0
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | netfilter: nf_nat: support IPv6 in FTP NAT helperPatrick McHardy2012-08-303-143/+0
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | netfilter: ip6tables: add MASQUERADE targetPatrick McHardy2012-08-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | netfilter: add protocol independent NAT corePatrick McHardy2012-08-3028-2382/+518
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the IPv4 NAT implementation to a protocol independent core and address family specific modules. Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | netfilter: nf_nat: add protoff argument to packet mangling functionsPatrick McHardy2012-08-308-74/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For mangling IPv6 packets the protocol header offset needs to be known by the NAT packet mangling functions. Add a so far unused protoff argument and convert the conntrack and NAT helpers to use it in preparation of IPv6 NAT. Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | ipv4: fix path MTU discovery with connection trackingPatrick McHardy2012-08-262-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and (in case of forwarded packets) refragments them at POST_ROUTING independent of the IP_DF flag. Refragmentation uses the dst_mtu() of the local route without caring about the original fragment sizes, thereby breaking PMTUD. This patch fixes this by keeping track of the largest received fragment with IP_DF set and generates an ICMP fragmentation required error during refragmentation if that size exceeds the MTU. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net>