linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	icmp: Fix regression in nexthop resolution during replies.	David S. Miller	2011-07-22	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \|	icmp_route_lookup() uses the wrong flow parameters if the reverse session route lookup isn't used. So do not commit to the re-decoded flow until we actually make a final decision to use a real route saved in 'rt2'. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Constrain UFO fragment sizes to multiples of 8 bytes	Bill Sommerfeld	2011-07-21	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Because the ip fragment offset field counts 8-byte chunks, ip fragments other than the last must contain a multiple of 8 bytes of payload. ip_ufo_append_data wasn't respecting this constraint and, depending on the MTU and ip option sizes, could create malformed non-final fragments. Google-Bug-Id: 5009328 Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: make fragment identifications less predictable	Eric Dumazet	2011-07-21	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IPv6 fragment identification generation is way beyond what we use for IPv4 : It uses a single generator. Its not scalable and allows DOS attacks. Now inetpeer is IPv6 aware, we can use it to provide a more secure and scalable frag ident generator (per destination, instead of system wide) This patch : 1) defines a new secure_ipv6_id() helper 2) extends inet_getid() to provide 32bit results 3) extends ipv6_select_ident() with a new dest parameter Reported-by: Fernando Gont <fernando@gont.com.ar> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	lro: do vlan cleanup	Jiri Pirko	2011-07-21	1	-28/+11
\| \| \| \| \| \| \|	- remove useless vlan parameters and pointers Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	lro: kill lro_vlan_hwaccel_receive_frags	Jiri Pirko	2011-07-21	1	-20/+0
\| \| \| \| \|	Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	lro: kill lro_vlan_hwaccel_receive_skb	Jiri Pirko	2011-07-21	1	-15/+0
\| \| \| \| \| \| \|	no longer used Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: save cpu cycles from check_leaf()	Eric Dumazet	2011-07-18	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Compiler is not smart enough to avoid double BSWAP instructions in ntohl(inet_make_mask(plen)). Lets cache this value in struct leaf_info, (fill a hole on 64bit arches) With route cache disabled, this saves ~2% of cpu in udpflood bench on x86_64 machine. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Add ->neigh_lookup() operation to dst_ops	David S. Miller	2011-07-18	1	-7/+19
\| \| \| \| \| \| \| \|	In the future dst entries will be neigh-less. In that environment we need to have an easy transition point for current users of dst->neighbour outside of the packet output fast path. Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Abstract dst->neighbour accesses behind helpers.	David S. Miller	2011-07-17	3	-14/+15
\| \| \| \| \| \|	dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
*	neigh: Pass neighbour entry to output ops.	David S. Miller	2011-07-17	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	This will get us closer to being able to do "neigh stuff" completely independent of the underlying dst_entry for protocols (ipv4/ipv6) that wish to do so. We will also be able to make dst entries neigh-less. Signed-off-by: David S. Miller <davem@davemloft.net>
*	neigh: Kill ndisc_ops->queue_xmit	David S. Miller	2011-07-16	1	-5/+1
\| \| \| \| \| \|	It is always dev_queue_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	neigh: Kill hh_cache->hh_output	David S. Miller	2011-07-16	1	-3/+3
\| \| \| \| \| \| \| \|	It's just taking on one of two possible values, either neigh_ops->output or dev_queue_xmit(). And this is purely depending upon whether nud_state has NUD_CONNECTED set or not. Signed-off-by: David S. Miller <davem@davemloft.net>
*	neigh: Kill neigh_ops->hh_output	David S. Miller	2011-07-16	1	-4/+0
\| \| \| \| \| \|	It's always dev_queue_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Create and use new helper, neigh_output().	David S. Miller	2011-07-16	1	-7/+3
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Use calculated 'neigh' instead of re-evaluating dst->neighbour	David S. Miller	2011-07-16	1	-1/+1
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2011-07-14	2	-16/+4
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c
\| *	net: refine {udp\|tcp\|sctp}_mem limits	Eric Dumazet	2011-07-07	2	-16/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current tcp/udp/sctp global memory limits are not taking into account hugepages allocations, and allow 50% of ram to be used by buffers of a single protocol [ not counting space used by sockets / inodes ...] Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram per protocol, and a minimum of 128 pages. Heavy duty machines sysadmins probably need to tweak limits anyway. References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032 Reported-by: starlight <starlight@binnacle.cx> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Embed hh_cache inside of struct neighbour.	David S. Miller	2011-07-14	2	-8/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that there is a one-to-one correspondance between neighbour and hh_cache entries, we no longer need: 1) dynamic allocation 2) attachment to dst->hh 3) refcounting Initialization of the hh_cache entry is indicated by hh_len being non-zero, and such initialization is always done with the neighbour's lock held as a writer. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: Inline neigh binding.	David Miller	2011-07-13	2	-27/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get rid of all of the useless and costly indirection by doing the neigh hash table lookup directly inside of the neighbour binding. Rename from arp_bind_neighbour to rt_bind_neighbour. Use new helpers {__,}ipv4_neigh_lookup() In rt_bind_neighbour() get rid of useless tests which are never true in the context this function is called, namely dev is never NULL and the dst->neighbour is always NULL. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	inetpeer: kill inet_putpeer race	Eric Dumazet	2011-07-11	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently can free inetpeer entries too early : [ 782.636674] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f130f44c) [ 782.636677] 1f7b13c100000000000000000000000002000000000000000000000000000000 [ 782.636686] i i i i u u u u i i i i u u u u i i i i u u u u u u u u u u u u [ 782.636694] ^ [ 782.636696] [ 782.636698] Pid: 4638, comm: ssh Not tainted 3.0.0-rc5+ #270 Hewlett-Packard HP Compaq 6005 Pro SFF PC/3047h [ 782.636702] EIP: 0060:[<c13fefbb>] EFLAGS: 00010286 CPU: 0 [ 782.636707] EIP is at inet_getpeer+0x25b/0x5a0 [ 782.636709] EAX: 00000002 EBX: 00010080 ECX: f130f3c0 EDX: f0209d30 [ 782.636711] ESI: 0000bc87 EDI: 0000ea60 EBP: f0209ddc ESP: c173134c [ 782.636712] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 782.636714] CR0: 8005003b CR2: f0beca80 CR3: 30246000 CR4: 000006d0 [ 782.636716] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 782.636717] DR6: ffff4ff0 DR7: 00000400 [ 782.636718] [<c13fbf76>] rt_set_nexthop.clone.45+0x56/0x220 [ 782.636722] [<c13fc449>] __ip_route_output_key+0x309/0x860 [ 782.636724] [<c141dc54>] tcp_v4_connect+0x124/0x450 [ 782.636728] [<c142ce43>] inet_stream_connect+0xa3/0x270 [ 782.636731] [<c13a8da1>] sys_connect+0xa1/0xb0 [ 782.636733] [<c13a99dd>] sys_socketcall+0x25d/0x2a0 [ 782.636736] [<c149deb8>] sysenter_do_call+0x12/0x28 [ 782.636738] [<ffffffff>] 0xffffffff Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: Use universal hash for ARP.	David S. Miller	2011-07-11	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	We need to make sure the multiplier is odd. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'master' of ↵	David S. Miller	2011-07-05	6	-61/+46
\|\\| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
\| *	net: bind() fix error return on wrong address family	Marcus Meissner	2011-07-04	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hi, Reinhard Max also pointed out that the error should EAFNOSUPPORT according to POSIX. The Linux manpages have it as EINVAL, some other OSes (Minix, HPUX, perhaps BSD) use EAFNOSUPPORT. Windows uses WSAEFAULT according to MSDN. Other protocols error values in their af bind() methods in current mainline git as far as a brief look shows: EAFNOSUPPORT: atm, appletalk, l2tp, llc, phonet, rxrpc EINVAL: ax25, bluetooth, decnet, econet, ieee802154, iucv, netlink, netrom, packet, rds, rose, unix, x25, No check?: can/raw, ipv6/raw, irda, l2tp/l2tp_ip Ciao, Marcus Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Reinhard Max <max@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	xfrm4: Don't call icmp_send on local error	Steffen Klassert	2011-07-01	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calling icmp_send() on a local message size error leads to an incorrect update of the path mtu. So use ip_local_error() instead to notify the socket about the error. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: Don't use ufo handling on later transformed packets	Steffen Klassert	2011-07-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We might call ip_ufo_append_data() for packets that will be IPsec transformed later. This function should be used just for real udp packets. So we check for rt->dst.header_len which is only nonzero on IPsec handling and call ip_ufo_append_data() just if rt->dst.header_len is zero. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netfilter: Fix ip_route_me_harder triggering ip_rt_bug	Julian Anastasov	2011-06-29	2	-48/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid creating input routes with ip_route_me_harder. It does not work for locally generated packets. Instead, restrict sockets to provide valid saddr for output route (or unicast saddr for transparent proxy). For other traffic allow saddr to be unicast or local but if callers forget to check saddr type use 0 for the output route. The resulting handling should be: - REJECT TCP: - in INPUT we can provide addr_type = RTN_LOCAL but better allow rejecting traffic delivered with local route (no IP address => use RTN_UNSPEC to allow also RTN_UNICAST). - FORWARD: RTN_UNSPEC => allow RTN_LOCAL/RTN_UNICAST saddr, add fix to ignore RTN_BROADCAST and RTN_MULTICAST - OUTPUT: RTN_UNSPEC - NAT, mangle, ip_queue, nf_ip_reroute: RTN_UNSPEC in LOCAL_OUT - IPVS: - use RTN_LOCAL in LOCAL_OUT and FORWARD after SNAT to restrict saddr to be local Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: Fix IPsec slowpath fragmentation problem	Steffen Klassert	2011-06-27	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ip_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path). On IPsec the effective mtu is lower because we need to add the protocol headers and trailers later when we do the IPsec transformations. So after the IPsec transformations the packet might be too big, which leads to a slowpath fragmentation then. This patch fixes this by building the packets based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr handling to this. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: Fix packet size calculation in __ip_append_data	Steffen Klassert	2011-06-27	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Git commit 59104f06 (ip: take care of last fragment in ip_append_data) added a check to see if we exceed the mtu when we add trailer_len. However, the mtu is already subtracted by the trailer length when the xfrm transfomation bundles are set up. So IPsec packets with mtu size get fragmented, or if the DF bit is set the packets will not be send even though they match the mtu perfectly fine. This patch actually reverts commit 59104f06. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packet	Xufeng Zhang	2011-06-21	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider this scenario: When the size of the first received udp packet is bigger than the receive buffer, MSG_TRUNC bit is set in msg->msg_flags. However, if checksum error happens and this is a blocking socket, it will goto try_again loop to receive the next packet. But if the size of the next udp packet is smaller than receive buffer, MSG_TRUNC flag should not be set, but because MSG_TRUNC bit is not cleared in msg->msg_flags before receive the next packet, MSG_TRUNC is still set, which is wrong. Fix this problem by clearing MSG_TRUNC flag when starting over for a new packet. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: Add ip_defrag() agent IP_DEFRAG_AF_PACKET.	David S. Miller	2011-07-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Elide the ICMP on frag queue timeouts unconditionally for this user. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: Reduce switch/case indent	Joe Perches	2011-07-01	2	-32/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the case labels the same indent as the switch. git diff -w shows no difference. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	netfilter: Reduce switch/case indent	Joe Perches	2011-07-01	2	-119/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the case labels the same indent as the switch. git diff -w shows miscellaneous 80 column wrapping, comment reflowing and a comment for a useless gcc warning for an otherwise unused default: case. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipconfig: Reduce switch/case indent	Joe Perches	2011-07-01	1	-35/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the case labels the same indent as the switch. git diff -w shows miscellaneous 80 column wrapping. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ip: introduce ip_is_fragment helper inline function	Paul Gortmaker	2011-06-21	6	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are enough instances of this: iph->frag_off & htons(IP_MF \| IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	udp: add tracepoints for queueing skb to rcvbuf	Satoru Moriya	2011-06-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a tracepoint to __udp_queue_rcv_skb to get the return value of ip_queue_rcv_skb. It indicates why kernel drops a packet at this point. ip_queue_rcv_skb returns following values in the packet drop case: rcvbuf is full : -ENOMEM sk_filter returns error : -EINVAL, -EACCESS, -ENOMEM, etc. __sk_mem_schedule returns error: -ENOBUF Signed-off-by: Satoru Moriya <satoru.moriya@hds.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Remove redundant linux/version.h includes from net/	Jesper Juhl	2011-06-21	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was suggested by "make versioncheck" that the follwing includes of linux/version.h are redundant: /home/jj/src/linux-2.6/net/caif/caif_dev.c: 14 linux/version.h not needed. /home/jj/src/linux-2.6/net/caif/chnl_net.c: 10 linux/version.h not needed. /home/jj/src/linux-2.6/net/ipv4/gre.c: 19 linux/version.h not needed. /home/jj/src/linux-2.6/net/netfilter/ipset/ip_set_core.c: 20 linux/version.h not needed. /home/jj/src/linux-2.6/net/netfilter/xt_set.c: 16 linux/version.h not needed. and it seems that it is right. Beyond manually inspecting the source files I also did a few build tests with various configs to confirm that including the header in those files is indeed not needed. Here's a patch to remove the pointless includes. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'master' of ↵	David S. Miller	2011-06-20	17	-68/+78
\|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c
\| *	ipv4, ping: Remove duplicate icmp.h include	Jesper Juhl	2011-06-20	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the duplicate inclusion of net/icmp.h from net/ipv4/ping.c Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: fix multicast losses	Eric Dumazet	2011-06-18	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Knut Tidemann found that first packet of a multicast flow was not correctly received, and bisected the regression to commit b23dd4fe42b4 (Make output route lookup return rtable directly.) Special thanks to Knut, who provided a very nice bug report, including sample programs to demonstrate the bug. Reported-and-bisectedby: Knut Tidemann <knut.andre.tidemann@jotron.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	inet_diag: fix inet_diag_bc_audit()	Eric Dumazet	2011-06-17	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A malicious user or buggy application can inject code and trigger an infinite loop in inet_diag_bc_audit() Also make sure each instruction is aligned on 4 bytes boundary, to avoid unaligned accesses. Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: rfs: enable RFS before first data packet is received	Eric Dumazet	2011-06-17	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit : > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Fri, 17 Jun 2011 00:50:46 +0100 > > > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote: > >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock sk, struct sk_buff skb) > >> goto discard; > >> > >> if (nsk != sk) { > >> + sock_rps_save_rxhash(nsk, skb->rxhash); > >> if (tcp_child_process(sk, nsk, skb)) { > >> rsk = nsk; > >> goto reset; > >> > > > > I haven't tried this, but it looks reasonable to me. > > > > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar. > > Indeed ipv6 side needs the same fix. > > Eric please add that part and resubmit. And in fact I might stick > this into net-2.6 instead of net-next-2.6 > OK, here is the net-2.6 based one then, thanks ! [PATCH v2] net: rfs: enable RFS before first data packet is received First packet received on a passive tcp flow is not correctly RFS steered. One sock_rps_record_flow() call is missing in inet_accept() But before that, we also must record rxhash when child socket is setup. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
\| *	Merge branch 'master' of ↵	David S. Miller	2011-06-16	4	-7/+9
\| \|\ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
\| \| *	netfilter: nf_nat: avoid double seq_adjust for loopback	Julian Anastasov	2011-06-16	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid double seq adjustment for loopback traffic because it causes silent repetition of TCP data. One example is passive FTP with DNAT rule and difference in the length of IP addresses. This patch adds check if packet is sent and received via loopback device. As the same conntrack is used both for outgoing and incoming direction, we restrict seq adjustment to happen only in POSTROUTING. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| \| *	netfilter: fix looped (broad\|multi)cast's MAC handling	Nicolas Cavallari	2011-06-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, when broadcast or multicast packet are sent from a local application, they are sent to the interface then looped by the kernel to other local applications, going throught netfilter hooks in the process. These looped packet have their MAC header removed from the skb by the kernel looping code. This confuse various netfilter's netlink queue, netlink log and the legacy ip_queue, because they try to extract a hardware address from these packets, but extracts a part of the IP header instead. This patch prevent NFQUEUE, NFLOG and ip_QUEUE to include a MAC header if there is none in the packet. Signed-off-by: Nicolas Cavallari <cavallar@lri.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| \| *	netfilter: ipt_ecn: fix inversion for IP header ECN match	Patrick McHardy	2011-06-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace allows to specify inversion for IP header ECN matches, the kernel silently accepts it, but doesn't invert the match result. Signed-off-by: Patrick McHardy <kaber@trash.net>
\| \| *	netfilter: ipt_ecn: fix protocol check in ecn_mt_check()	Patrick McHardy	2011-06-16	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check for protocol inversion in ecn_mt_check() and remove the unnecessary runtime check for IPPROTO_TCP in ecn_mt(). Signed-off-by: Patrick McHardy <kaber@trash.net>
\| \| *	netfilter: ip_tables: fix compile with debug	Sebastian Andrzej Siewior	2011-06-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	ipv4: Fix packet size calculation for raw IPsec packets in __ip_append_data	Steffen Klassert	2011-06-09	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We assume that transhdrlen is positive on the first fragment which is wrong for raw packets. So we don't add exthdrlen to the packet size for raw packets. This leads to a reallocation on IPsec because we have not enough headroom on the skb to place the IPsec headers. This patch fixes this by adding exthdrlen to the packet size whenever the send queue of the socket is empty. This issue was introduced with git commit 1470ddf7 (inet: Remove explicit write references to sk/inet in ip_append_data) Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: pmtu_expires fixes	Eric Dumazet	2011-06-09	1	-34/+44
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 2c8cec5c10bc (ipv4: Cache learned PMTU information in inetpeer) added some racy peer->pmtu_expires accesses. As its value can be changed by another cpu/thread, we should be more careful, reading its value once. Add peer_pmtu_expired() and peer_pmtu_cleaned() helpers Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netfilter: use unsigned variables for packet lengths in ip[6]_queue.	Dave Jones	2011-06-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Netlink message lengths can't be negative, so use unsigned variables. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>