linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'master' into for-next	Tejun Heo	2009-07-04	8	-21/+66
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h
\| *	Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification"	Eric W. Biederman	2009-06-30	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 73ce7b01b4496a5fbf9caf63033c874be692333f. After discovering that we don't listen to gratuitious arps in 2.6.30 I tracked the failure down to this commit. The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227. are all in agreement that an arp request with sip == 0 should be used for the probe (to prevent learning) and an arp request with sip == tip should be used for the gratitous announcement that people can learn from. It appears the author of the broken patch got those two cases confused and modified the code to drop all gratuitous arp traffic. Ouch! Cc: stable@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: Fix fib_trie rebalancing, part 3	Jarek Poplawski	2009-06-30	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Alas current delaying of freeing old tnodes by RCU in trie_rebalance is still not enough because we can free a top tnode before updating a t->trie pointer. Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tcp: Do not tack on TSO data to non-TSO packet	Herbert Xu	2009-06-29	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a socket starts out on a non-TSO route, and then switches to a TSO route, then we will tack on data to the tail of the tx queue even if it started out life as non-TSO. This is suboptimal because all of it will then be copied and checksummed unnecessarily. This patch fixes this by ensuring that skb->ip_summed is set to CHECKSUM_PARTIAL before appending extra data beyond the MSS. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tcp: Stop non-TSO packets morphing into TSO	Herbert Xu	2009-06-29	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a socket starts out on a non-TSO route, and then switches to a TSO route, then the tail on the tx queue can morph into a TSO packet, causing mischief because the rest of the stack does not expect a partially linear TSO packet. This patch fixes this by ensuring that skb->ip_summed is set to CHECKSUM_PARTIAL before declaring a packet as TSO. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'master' of ↵	David S. Miller	2009-06-29	1	-6/+11
\| \|\ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
\| \| *	netfilter: tcp conntrack: fix unacknowledged data detection with NAT	Patrick McHardy	2009-06-29	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When NAT helpers change the TCP packet size, the highest seen sequence number needs to be corrected. This is currently only done upwards, when the packet size is reduced the sequence number is unchanged. This causes TCP conntrack to falsely detect unacknowledged data and decrease the timeout. Fix by updating the highest seen sequence number in both directions after packet mangling. Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	inet: Call skb_orphan before tproxy activates	Herbert Xu	2009-06-26	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As transparent proxying looks up the socket early and assigns it to the skb for later processing, we must drop any existing socket ownership prior to that in order to distinguish between the case where tproxy is active and where it is not. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: missing check ACK flag of received segment in FIN-WAIT-2 state	Wei Yongjun	2009-06-25	1	-1/+2
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RFC0793 defined that in FIN-WAIT-2 state if the ACK bit is off drop the segment and return[Page 72]. But this check is missing in function tcp_timewait_state_process(). This cause the segment with FIN flag but no ACK has two diffent action: Case 1: Node A Node B <------------- FIN,ACK (enter FIN-WAIT-1) ACK -------------> (enter FIN-WAIT-2) FIN -------------> discard (move sk to tw list) Case 2: Node A Node B <------------- FIN,ACK (enter FIN-WAIT-1) ACK -------------> (enter FIN-WAIT-2) (move sk to tw list) FIN -------------> <------------- ACK This patch fixed the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	Linus Torvalds	2009-06-24	1	-3/+23
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: bnx2: Fix the behavior of ethtool when ONBOOT=no qla3xxx: Don't sleep while holding lock. qla3xxx: Give the PHY time to come out of reset. ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off net: Move rx skb_orphan call to where needed ipv6: Use correct data types for ICMPv6 type and code net: let KS8842 driver depend on HAS_IOMEM can: let SJA1000 driver depend on HAS_IOMEM netxen: fix firmware init handshake netxen: fix build with without CONFIG_PM netfilter: xt_rateest: fix comparison with self netfilter: xt_quota: fix incomplete initialization netfilter: nf_log: fix direct userspace memory access in proc handler netfilter: fix some sparse endianess warnings netfilter: nf_conntrack: fix conntrack lookup race netfilter: nf_conntrack: fix confirmation race condition netfilter: nf_conntrack: death_by_timeout() fix
\| \| *	ipv4 routing: Ensure that route cache entries are usable and reclaimable ↵	Neil Horman	2009-06-23	1	-3/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with caching is off When route caching is disabled (rt_caching returns false), We still use route cache entries that are created and passed into rt_intern_hash once. These routes need to be made usable for the one call path that holds a reference to them, and they need to be reclaimed when they're finished with their use. To be made usable, they need to be associated with a neighbor table entry (which they currently are not), otherwise iproute_finish2 just discards the packet, since we don't know which L2 peer to send the packet to. To do this binding, we need to follow the path a bit higher up in rt_intern_hash, which calls arp_bind_neighbour, but not assign the route entry to the hash table. Currently, if caching is off, we simply assign the route to the rp pointer and are reutrn success. This patch associates us with a neighbor entry first. Secondly, we need to make sure that any single use routes like this are known to the garbage collector when caching is off. If caching is off, and we try to hash in a route, it will leak when its refcount reaches zero. To avoid this, this patch calls rt_free on the route cache entry passed into rt_intern_hash. This places us on the gc list for the route cache garbage collector, so that when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this suggestion). I've tested this on a local system here, and with these patches in place, I'm able to maintain routed connectivity to remote systems, even if I set /proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to return false. Signed-off-by: Neil Horman <nhorman@redhat.com> Reported-by: Jarek Poplawski <jarkao2@gmail.com> Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	Linus Torvalds	2009-06-22	1	-2/+12
\| \|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (43 commits) via-velocity: Fix velocity driver unmapping incorrect size. mlx4_en: Remove redundant refill code on RX mlx4_en: Removed redundant check on lso header size mlx4_en: Cancel port_up check in transmit function mlx4_en: using stop/start_all_queues mlx4_en: Removed redundant skb->len check mlx4_en: Counting all the dropped packets on the TX side usbnet cdc_subset: fix issues talking to PXA gadgets Net: qla3xxx, remove sleeping in atomic ipv4: fix NULL pointer + success return in route lookup path isdn: clean up documentation index cfg80211: validate station settings cfg80211: allow setting station parameters in mesh cfg80211: allow adding/deleting stations on mesh ath5k: fix beacon_int handling MAINTAINERS: Fix Atheros pattern paths ath9k: restore PS mode, before we put the chip into FULL SLEEP state. ath9k: wait for beacon frame along with CAB acer-wmi: fix rfkill conversion ath5k: avoid PCI FATAL interrupts by restoring RETRY_TIMEOUT disabling ...
\| \| *	ipv4: fix NULL pointer + success return in route lookup path	Neil Horman	2009-06-20	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't drop route if we're not caching I recently got a report of an oops on a route lookup. Maxime was testing what would happen if route caching was turned off (doing so by setting making rt_caching always return 0), and found that it triggered an oops. I looked at it and found that the problem stemmed from the fact that the route lookup routines were returning success from their lookup paths (which is good), but never set the *rp pointer to anything (which is bad). This happens because in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0. This almost emulates slient success. What we should be doing is assigning rp = rt and _not_ dropping the route. This way, during slow path lookups, when we create a new route cache entry, we don't immediately discard it, rather we just don't add it into the cache hash table, but we let this one lookup use it for the purpose of this route request. Maxime has tested and reports it prevents the oops. There is still a subsequent routing issue that I'm looking into further, but I'm confident that, even if its related to this same path, this patch makes sense to take. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	percpu: clean up percpu variable definitions	Tejun Heo	2009-06-24	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>
* \| \|	percpu: cleanup percpu array definitions	Tejun Heo	2009-06-24	1	-1/+1
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the following three different ways to define percpu arrays are in use. 1. DEFINE_PER_CPU(elem_type[array_len], array_name); 2. DEFINE_PER_CPU(elem_type, array_name[array_len]); 3. DEFINE_PER_CPU(elem_type, array_name)[array_len]; Unify to #1 which correctly separates the roles of the two parameters and thus allows more flexibility in the way percpu variables are defined. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm@kvack.org Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David S. Miller <davem@davemloft.net>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	Linus Torvalds	2009-06-18	4	-22/+20
\|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits) netxen: fix tx ring accounting netxen: fix detection of cut-thru firmware mode forcedeth: fix dma api mismatches atm: sk_wmem_alloc initial value is one net: correct off-by-one write allocations reports via-velocity : fix no link detection on boot Net / e100: Fix suspend of devices that cannot be power managed TI DaVinci EMAC : Fix rmmod error net: group address list and its count ipv4: Fix fib_trie rebalancing, part 2 pkt_sched: Update drops stats in act_police sky2: version 1.23 sky2: add GRO support sky2: skb recycling sky2: reduce default transmit ring sky2: receive counter update sky2: fix shutdown synchronization sky2: PCI irq issues sky2: more receive shutdown sky2: turn off pause during shutdown ... Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
\| *	net: correct off-by-one write allocations reports	Eric Dumazet	2009-06-18	3	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: Fix fib_trie rebalancing, part 2	Jarek Poplawski	2009-06-18	1	-14/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My previous patch, which explicitly delays freeing of tnodes by adding them to the list to flush them after the update is finished, isn't strict enough. It treats exceptionally tnodes without parent, assuming they are newly created, so "invisible" for the read side yet. But the top tnode doesn't have parent as well, so we have to exclude all exceptions (at least until a better way is found). Additionally we need to move rcu assignment of this node before flushing, so the return type of the trie_rebalance() function is changed. Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'for-linus2' of ↵	Linus Torvalds	2009-06-16	1	-0/+3
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits) signal: fix __send_signal() false positive kmemcheck warning fs: fix do_mount_root() false positive kmemcheck warning fs: introduce __getname_gfp() trace: annotate bitfields in struct ring_buffer_event net: annotate struct sock bitfield c2port: annotate bitfield for kmemcheck net: annotate inet_timewait_sock bitfields ieee1394/csr1212: fix false positive kmemcheck report ieee1394: annotate bitfield net: annotate bitfields in struct inet_sock net: use kmemcheck bitfields API for skbuff kmemcheck: introduce bitfield API kmemcheck: add opcode self-testing at boot x86: unify pte_hidden x86: make _PAGE_HIDDEN conditional kmemcheck: make kconfig accessible for other architectures kmemcheck: enable in the x86 Kconfig kmemcheck: add hooks for the page allocator kmemcheck: add hooks for page- and sg-dma-mappings kmemcheck: don't track page tables ...
\| *	net: annotate inet_timewait_sock bitfields	Vegard Nossum	2009-06-15	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The use of bitfields here would lead to false positive warnings with kmemcheck. Silence them. (Additionally, one erroneous comment related to the bitfield was also fixed.) Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
* \|	ipv4: Fix fib_trie rebalancing	Jarek Poplawski	2009-06-15	1	-10/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While doing trie_rebalance(): resize(), inflate(), halve() RCU free tnodes before updating their parents. It depends on RCU delaying the real destruction, but if RCU readers start after call_rcu() and before parent update they could access freed memory. It is currently prevented with preempt_disable() on the update side, but it's not safe, except maybe classic RCU, plus it conflicts with memory allocations with GFP_KERNEL flag used from these functions. This patch explicitly delays freeing of tnodes by adding them to the list, which is flushed after the update is finished. Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	PIM-SM: namespace changes	Tom Goff	2009-06-14	1	-12/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IPv4: - make PIM register vifs netns local - set the netns when a PIM register vif is created - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2) by adding the protocol handler when multicast routing is initialized IPv6: - make PIM register vifs netns local - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2) by adding the protocol handler when multicast routing is initialized Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: update ARPD help text	Timo Teräs	2009-06-13	1	-22/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed the statements about ARP cache size as this config option does not affect it. The cache size is controlled by neigh_table gc thresholds. Remove also expiremental and obsolete markings as the API originally intended for arp caching is useful for implementing ARP-like protocols (e.g. NHRP) in user space and has been there for a long enough time. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: use a deferred timer in rt_check_expire	Eric Dumazet	2009-06-13	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For the sake of power saver lovers, use a deferrable timer to fire rt_check_expire() As some big routers cache equilibrium depends on garbage collection done in time, we take into account elapsed time between two rt_check_expire() invocations to adjust the amount of slots we have to check. Based on an initial idea and patch from Tero Kristo Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	netfilter: ip_tables: fix build error	Patrick McHardy	2009-06-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix build error introduced by commit bb70dfa5 (netfilter: xtables: consolidate comefrom debug cast access): net/ipv4/netfilter/ip_tables.c: In function 'ipt_do_table': net/ipv4/netfilter/ip_tables.c:421: error: 'comefrom' undeclared (first use in this function) net/ipv4/netfilter/ip_tables.c:421: error: (Each undeclared identifier is reported only once net/ipv4/netfilter/ip_tables.c:421: error: for each function it appears in.) Signed-off-by: Patrick McHardy <kaber@trash.net>
* \|	Merge branch 'master' of ↵	Patrick McHardy	2009-06-11	30	-223/+287
\|\ \ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
\| * \|	net: No more expensive sock_hold()/sock_put() on each tx	Eric Dumazet	2009-06-11	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	One of the problem with sock memory accounting is it uses a pair of sock_hold()/sock_put() for each transmitted packet. This slows down bidirectional flows because the receive path also needs to take a refcount on socket and might use a different cpu than transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces. We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in top five functions in profiles. We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed. As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement initial offset and atomicaly check if any packets are in flight. skb_set_owner_w() doesnt call sock_hold() anymore sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc reached 0 to perform the final freeing. Drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket. Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt contention point. 5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	netfilter: Fix extra semi-colon in skb_walk_frags() changes.	David S. Miller	2009-06-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Noticed by Jesper Dangaard Brouer Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	netfilter: Use frag list abstraction interfaces.	David S. Miller	2009-06-09	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv4: Use frag list abstraction interfaces.	David S. Miller	2009-06-09	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv4: remove ip_mc_drop_socket() declaration from af_inet.c.	Rami Rosen	2009-06-03	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ip_mc_drop_socket() method is declared in linux/igmp.h, which is included anyhow in af_inet.c. So there is no need for this declaration. This patch removes it from af_inet.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: skb->dst accessors	Eric Dumazet	2009-06-03	22	-92/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define three accessors to get/set dst attached to a skb struct dst_entry skb_dst(const struct sk_buff skb) void skb_dst_set(struct sk_buff skb, struct dst_entry dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: skb->rtable accessor	Eric Dumazet	2009-06-03	15	-50/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb Delete skb->rtable field Setting rtable is not allowed, just set dst instead as rtable is an alias. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	Merge branch 'master' of ↵	David S. Miller	2009-06-03	1	-2/+9
\| \|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/forcedeth.c
\| \| *	tcp: tcp_vegas ssthresh bugfix	Doug Leith	2009-05-25	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes ssthresh accounting issues in tcp_vegas when cwnd decreases Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv4: New multicast-all socket option	Nivedita Singhvi	2009-06-02	3	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After some discussion offline with Christoph Lameter and David Stevens regarding multicast behaviour in Linux, I'm submitting a slightly modified patch from the one Christoph submitted earlier. This patch provides a new socket option IP_MULTICAST_ALL. In this case, default behaviour is _unchanged_ from the current Linux standard. The socket option is set by default to provide original behaviour. Sockets wishing to receive data only from multicast groups they join explicitly will need to clear this socket option. Signed-off-by: Nivedita Singhvi <niv@us.ibm.com> Signed-off-by: Christoph Lameter<cl@linux.com> Acked-by: David Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: ipv4/ip_sockglue.c cleanups	Eric Dumazet	2009-06-02	1	-37/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pure cleanups Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: fix loop in ofo handling code and reduce its complexity	Ilpo Järvinen	2009-05-29	1	-17/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Somewhat luckily, I was looking into these parts with very fine comb because I've made somewhat similar changes on the same area (conflicts that arose weren't that lucky though). The loop was very much overengineered recently in commit 915219441d566 (tcp: Use SKB queue and list helpers instead of doing it by-hand), while it basically just wants to know if there are skbs after 'skb'. Also it got broken because skb1 = skb->next got translated into skb1 = skb1->next (though abstracted) improperly. Note that 'skb1' is pointing to previous sk_buff than skb or NULL if at head. Two things went wrong: - We'll kfree 'skb' on the first iteration instead of the skbuff following 'skb' (it would require required SACK reneging to recover I think). - The list head case where 'skb1' is NULL is checked too early and the loop won't execute whereas it previously did. Conclusion, mostly revert the recent changes which makes the cset very messy looking but using proper accessor in the previous-like version. The effective changes against the original can be viewed with: git-diff 915219441d566f1da0caa0e262be49b666159e17^ \ net/ipv4/tcp_input.c \| sed -n -e '57,70 p' Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: unset IFF_XMIT_DST_RELEASE in ipgre_tunnel_setup()	Eric Dumazet	2009-05-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipgre_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit() to no release it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: unset IFF_XMIT_DST_RELEASE in ipip_tunnel_setup()	Eric Dumazet	2009-05-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipip_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit() to no release it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: Use SKB queue and list helpers instead of doing it by-hand.	David S. Miller	2009-05-28	2	-45/+90
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	netfilter: nf_ct_icmp: keep the ICMP ct entries longer	Jan Kasprzak	2009-06-08	1	-12/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current conntrack code kills the ICMP conntrack entry as soon as the first reply is received. This is incorrect, as we then see only the first ICMP echo reply out of several possible duplicates as ESTABLISHED, while the rest will be INVALID. Also this unnecessarily increases the conntrackd traffic on H-A firewalls. Make all the ICMP conntrack entries (including the replied ones) last for the default of nf_conntrack_icmp{,v6}_timeout seconds. Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Signed-off-by: Patrick McHardy <kaber@trash.net>
* \| \|	netfilter: ipt_MASQUERADE: remove redundant rwlock	Florian Westphal	2009-06-05	1	-11/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lock "protects" an assignment and a comparision of an integer. When the caller of device_cmp() evaluates the result, nat->masq_index may already have been changed (regardless if the lock is there or not). So, the lock either has to be held during nf_ct_iterate_cleanup(), or can be removed. This does the latter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
* \| \|	netfilter: x_tables: added hook number into match extension parameter structure.	Evgeniy Polyakov	2009-06-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
* \| \|	netfilter: conntrack: simplify event caching system	Pablo Neira Ayuso	2009-06-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch simplifies the conntrack event caching system by removing several events: * IPCT_[]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted since the have no clients. IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter days. * IPCT_REFRESH which is not of any use since we always include the timeout in the messages. After this patch, the existing events are: * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify addition and deletion of entries. * IPCT_STATUS, that notes that the status bits have changes, eg. IPS_SEEN_REPLY and IPS_ASSURED. * IPCT_PROTOINFO, that reports that internal protocol information has changed, eg. the TCP, DCCP and SCTP protocol state. * IPCT_HELPER, that a helper has been assigned or unassigned to this entry. * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this covers the case when a mark is set to zero. * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence adjustment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* \| \|	Merge branch 'master' of git://dev.medozas.de/linux	Patrick McHardy	2009-06-02	3	-138/+143
\|\ \ \ \| \|/ / \|/\| \|
\| * \|	netfilter: xtables: consolidate comefrom debug cast access	Jan Engelhardt	2009-05-08	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
\| * \|	netfilter: xtables: remove another level of indent	Jan Engelhardt	2009-05-08	2	-43/+42
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
\| * \|	netfilter: xtables: remove some goto	Jan Engelhardt	2009-05-08	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Combining two ifs, and goto is easily gone. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
\| * \|	netfilter: xtables: reduce indent level by one	Jan Engelhardt	2009-05-08	2	-121/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cosmetic only. Transformation applied: -if (foo) { long block; } else { short block; } +if (!foo) { short block; continue; } long block; Signed-off-by: Jan Engelhardt <jengelh@medozas.de>