linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	af_netlink: Add needed scm_destroy after scm_send.	Eric W. Biederman	2010-06-16	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \|	scm_send occasionally allocates state in the scm_cookie, so I have modified netlink_sendmsg to guarantee that when scm_send succeeds scm_destroy will be called to free that state. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	af_unix: Allow SO_PEERCRED to work across namespaces.	Eric W. Biederman	2010-06-16	3	-16/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use struct pid and struct cred to store the peer credentials on struct sock. This gives enough information to convert the peer credential information to a value relative to whatever namespace the socket is in at the time. This removes nasty surprises when using SO_PEERCRED on socket connections where the processes on either side are in different pid and user namespaces. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sock: Introduce cred_to_ucred	Eric W. Biederman	2010-06-16	2	-0/+19
\| \| \| \| \| \| \| \| \| \|	To keep the coming code clear and to allow both the sock code and the scm code to share the logic, introduce a function to translate from struct cred to struct ucred. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	user_ns: Introduce user_ns_map_uid and user_ns_map_gid.	Eric W. Biederman	2010-06-16	2	-0/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define what happens when we view a uid from one user_namespace in another user_namespace. - If the user namespaces are the same no mapping is necessary. - For most cases of difference use overflowuid and overflowgid, the uid and gid currently used for 16bit apis when we have a 32bit uid that does not fit in 16bits. Effectively the situation is the same, we want to return a uid or gid that is not assigned to any user. - For the case when we happen to be mapping the uid or gid of the creator of the target user namespace use uid 0 and gid 0, as confusing that user with root is not a problem. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
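The three mapping rules in this commit message can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: `struct ns_model`, `struct cred_model` and `map_uid()` are hypothetical stand-ins rather than the kernel's own types or functions, only the direct creator of the target namespace is considered, and 65534 is assumed as the overflow uid (the kernel's usual default).

```c
#include <stdint.h>

/* Assumed overflow id; 65534 is the kernel's usual default for overflowuid. */
#define OVERFLOW_UID 65534u

/* Hypothetical model of a user namespace: who created it, and from where. */
struct ns_model {
	const struct ns_model *creator_ns; /* namespace the creator lives in */
	uint32_t creator_uid;              /* creator's uid inside creator_ns */
};

/* Hypothetical model of a credential: a uid plus the namespace it belongs to. */
struct cred_model {
	const struct ns_model *ns;
	uint32_t uid;
};

/* Return how cred's uid appears when viewed from namespace "to". */
uint32_t map_uid(const struct ns_model *to, const struct cred_model *cred)
{
	if (cred->ns == to)
		return cred->uid;    /* same namespace: no mapping needed */
	if (cred->ns == to->creator_ns && cred->uid == to->creator_uid)
		return 0;            /* creator of the target namespace is shown as root */
	return OVERFLOW_UID;         /* no useful relationship: unassigned id */
}
```

A gid variant would follow the same shape with an overflow gid in place of `OVERFLOW_UID`.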
*	scm: Reorder scm_cookie.	Eric W. Biederman	2010-06-16	1	-1/+1
\| \| \| \| \| \| \| \|	Reorder the fields in scm_cookie so they pack better on 64bit. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: Bumped up version number	Anirban Chakraborty	2010-06-16	1	-2/+2
\| \| \| \| \| \| \|	Changed the driver version number to 5.0.4 Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qlcnic: Fix a bug in setting up NIC partitioning mode	Anirban Chakraborty	2010-06-16	3	-52/+37
\| \| \| \| \| \| \| \| \|	The driver was not detecting the presence of NIC partitioning capability of the firmware properly. Now, it checks the eswitch set bit in the FW capabilities register and accordingly sets the driver mode as NPAR capable or not. Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	syncookies: check decoded options against sysctl settings	Florian Westphal	2010-06-16	3	-9/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Discard the ACK if we find options that do not match current sysctl settings. Previously it was possible to create a connection with sack, wscale, etc. enabled even if the feature was disabled via sysctl. Also remove an unneeded call to tcp_sack_reset() in cookie_check_timestamp: Both call sites (cookie_v4_check, cookie_v6_check) zero "struct tcp_options_received", hand it to tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack) and then call cookie_check_timestamp(). Even if num_sacks/dsacks were changed, the structure is allocated on the stack and after cookie_check_timestamp returns only a few selected members are copied to the inet_request_sock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	inetpeer: restore small inet_peer structures	Eric Dumazet	2010-06-16	4	-11/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Addition of rcu_head to struct inet_peer added 16 bytes on 64bit arches. That's a bit unfortunate, since the old size was exactly 64 bytes. This can be solved using a union between this rcu_head and four fields that are normally used only when a refcount is taken on inet_peer. rcu_head is used only when refcnt=-1, right before structure freeing. Add an inet_peer_refcheck() function to check this assertion for a while. We can bring back the SLAB_HWCACHE_ALIGN qualifier in kmem cache creation. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	gadget/rndis: dev_get_stats() now returns rtnl_link_stats64.	David S. Miller	2010-06-15	1	-1/+1
\| \| \| \| \| \|	Based upon a report by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
*	inetpeer: do not use zero refcnt for freed entries	Eric Dumazet	2010-06-15	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Followup of commit aa1039e73cc2 (inetpeer: RCU conversion). Unused inet_peer entries have a null refcnt. Using atomic_inc_not_zero() in rcu lookups is not going to work for them, and the slow path is taken. Fix this by using a -1 marker instead of 0 for deleted entries. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
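The idea of reserving -1 for deleted entries, so that an unused entry with refcnt 0 can still be picked up by a lockless reader, can be sketched in user space with C11 atomics. The function names below are hypothetical and this is not the inetpeer code itself; it only illustrates "increment unless the value is -1" and the matching "0 to -1" transition.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REF_DELETED (-1) /* marker for an entry that is being freed */

/* Take a reference unless the entry is marked deleted. An unused entry
 * (refcnt == 0) can still be taken, which a test against 0 would forbid. */
bool ref_get_unless_deleted(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != REF_DELETED) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Mark an unused entry as deleted; fails if someone holds a reference. */
bool ref_mark_deleted(atomic_int *refcnt)
{
	int expected = 0;

	return atomic_compare_exchange_strong(refcnt, &expected, REF_DELETED);
}
```

With this scheme a reader that loses the race against deletion simply fails to take the reference and can fall back to the locked path.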
*	netpoll: Use correct primitives for RCU dereferencing	Herbert Xu	2010-06-15	1	-2/+2
\| \| \| \| \| \| \| \| \|	Now that RCU debugging checks for matching rcu_dereference calls and rcu_read_lock, we need to use the correct primitives or face nasty warnings. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: Add const to dummy br_netpoll_send_skb	Herbert Xu	2010-06-15	1	-1/+1
\| \| \| \| \| \| \| \|	The version of br_netpoll_send_skb used when netpoll is off is missing a const thus causing a warning. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: NET_SKB_PAD should depend on L1_CACHE_BYTES	Eric Dumazet	2010-06-15	3	-9/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In old kernels, NET_SKB_PAD was defined to 16. Then commit d6301d3dd1c2 (net: Increase default NET_SKB_PAD to 32), and commit 18e8c134f4e9 (net: Increase NET_SKB_PAD to 64 bytes) increased it to 64. While first patch was governed by network stack needs, second was more driven by performance issues on current hardware. Real intent was to align data on a cache line boundary. So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic. Remove microblaze and powerpc own NET_SKB_PAD definitions. Thanks to Alexander Duyck and David Miller for their comments. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
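The effect of max(32, L1_CACHE_BYTES) for different cache line sizes can be checked with a small stand-alone helper. `net_skb_pad_for()` is a hypothetical name used only for illustration; in the kernel the value is a compile-time constant, not a function.

```c
/* Hypothetical helper: headroom reserved in front of packet data for a
 * given L1 cache line size, following max(32, L1_CACHE_BYTES). */
unsigned int net_skb_pad_for(unsigned int l1_cache_bytes)
{
	return l1_cache_bytes > 32u ? l1_cache_bytes : 32u;
}
```

An architecture with 16-byte cache lines keeps the 32-byte minimum, while one with 128-byte lines gets a full cache line of padding.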
*	ipfrag : frag_kfree_skb() cleanup	Eric Dumazet	2010-06-15	2	-11/+5
\| \| \| \| \| \| \| \| \|	Third param (work) is unused, remove it. Remove __inline__ and inline qualifiers. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ip_frag: Remove some atomic ops	Eric Dumazet	2010-06-15	2	-4/+2
\| \| \| \| \| \| \|	Instead of doing one atomic operation per frag, we can factorize them. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: syncookies: do not skip ->iif initialization	Florian Westphal	2010-06-15	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When syncookies are in effect, req->iif is left uninitialized. In case of e.g. link-local addresses the route lookup then fails and no syn-ack is sent. Rearrange things so ->iif is also initialized in the syncookie case. want_cookie can only be true when the isn was zero, thus move the want_cookie check into the "!isn" branch. Cc: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Fix error in comment on net_device_ops::ndo_get_stats	Ben Hutchings	2010-06-15	1	-1/+1
\| \| \| \| \| \| \| \|	ndo_get_stats still returns struct net_device_stats *; there is no struct net_device_stats64. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer	Sonic Zhang	2010-06-15	2	-43/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SKBs hold onto resources that can't be held indefinitely, such as TCP socket references and netfilter conntrack state. So if a packet is left in the TX ring for a long time, there might be a TCP socket that cannot be closed and freed up. The current blackfin EMAC driver always reclaims and frees used tx skbs in future transfers. The problem is that a future transfer may not come soon. This patch starts a timer after a transfer to reclaim and free the skb. There is nearly no performance drop with this patch. TX interrupt is not enabled because of a strange behavior of the Blackfin EMAC. If EMAC TX transfer control is turned on, endless TX interrupts are triggered no matter if TX DMA is enabled or not. Since DMA walks down the ring automatically, TX transfer control can't be turned off in the middle. The only way is to disable TX interrupt completely. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	inetpeer: RCU conversion	Eric Dumazet	2010-06-15	2	-69/+96
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	inetpeer currently uses an AVL tree protected by an rwlock. It's possible to make most lookups use RCU: 1) Add a struct rcu_head to struct inet_peer. 2) Add a lookup_rcu_bh() helper to perform lockless and opportunistic lookup. This is a normal function, not a macro like lookup(). 3) Add a limit to the number of links followed by lookup_rcu_bh(). This is needed in case we fall into a loop. 4) Add an smp_wmb() in link_to_pool() right before node insert. 5) Make unlink_from_pool() use atomic_cmpxchg() to make sure it can take the last reference to an inet_peer, since lockless readers could increase refcount, even while we hold peers.lock. 6) Delay struct inet_peer freeing until after an RCU grace period so that lookup_rcu_bh() cannot crash. 7) inet_getpeer() first attempts a lockless lookup. Note this lookup can fail even if the target is in the AVL tree, because a concurrent writer can leave the tree in an inconsistent state. If this attempt fails, the lock is taken and a regular lookup is performed again. 8) Convert peers.lock from an rwlock to a spinlock. 9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 -> 128 bytes). In a future patch, it is probably possible to revert this part, if the rcu field is put in a union to share space with rid, ip_id_count, tcp_ts & tcp_ts_stamp, these fields being manipulated only with refcnt > 0. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cnic: Fix cnic_cm_abort() error handling.	Michael Chan	2010-06-15	1	-11/+18
\| \| \| \| \| \| \| \| \| \| \|	Fix the code that handles the error case when cnic_cm_abort() cannot proceed normally. We cannot just set the csk->state and we must go through cnic_ready_to_close() to handle all the conditions. We also add error return code in cnic_cm_abort(). Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Eddie Wai <waie@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cnic: Refactor and fix cnic_ready_to_close().	Michael Chan	2010-06-15	1	-16/+10
\| \| \| \| \| \| \| \| \| \| \|	Combine RESET_RECEIVED and RESET_COMP logic and fix race condition between these 2 events and cnic_cm_close(). In particular, we need to (test_and_clear_bit(SK_F_OFFLD_COMPLETE, &csk->flags)) before we update csk->state. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Eddie Wai <waie@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cnic: Refactor code in cnic_cm_process_kcqe().	Michael Chan	2010-06-15	1	-6/+9
\| \| \| \| \| \| \| \| \|	Move chip-specific code to the respective chip's ->close_conn() functions for better code organization. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Eddie Wai <waie@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cnic: Return error code in cnic_cm_close() if unsuccessful.	Michael Chan	2010-06-15	1	-0/+2
\| \| \| \| \| \| \| \| \|	So that bnx2i can handle the error condition immediately and not have to wait for timeout. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Eddie Wai <waie@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ixgbe: update set_rx_mode to fix issues w/ macvlan	Alexander Duyck	2010-06-15	3	-26/+79
\| \| \| \| \| \| \| \| \| \| \| \| \|	This change corrects issues where macvlan was not correctly triggering promiscuous mode on ixgbe due to the filters not being correctly set. It also corrects the fact that VF rar filters were being overwritten when the PF was reset. CC: Shirley Ma <xma@us.ibm.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	David S. Miller	2010-06-15	36	-205/+608
\|\ \| \| \| \| \| \|
\| *	Merge branch 'master' of /repos/git/net-next-2.6	Patrick McHardy	2010-06-15	531	-8682/+14835
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: include/net/netfilter/xt_rateest.h net/bridge/br_netfilter.c net/netfilter/nf_conntrack_core.c Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: xtables: idletimer target implementation	Luciano Coelho	2010-06-15	5	-0/+373
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements an idletimer Xtables target that can be used to identify when interfaces have been idle for a certain period of time. Timers are identified by labels and are created when a rule is set with a new label. The rules also take a timeout value (in seconds) as an option. If more than one rule uses the same timer label, the timer will be restarted whenever any of the rules get a hit. One entry for each timer is created in sysfs. This attribute contains the time remaining for the timer to expire. The attributes are located under the xt_idletimer class: /sys/class/xt_idletimer/timers/<label> When the timer expires, the target module sends a sysfs notification to userspace, which can then decide what to do (e.g. disconnect to save power). Cc: Timo Teras <timo.teras@iki.fi> Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: CLUSTERIP: RCU conversion	Eric Dumazet	2010-06-15	1	-19/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- clusterip_lock becomes a spinlock - lockless lookups - kfree() deferred after RCU grace period - rcu_barrier_bh() inserted in clusterip_tg_exit() v2) - As Patrick pointed out, we use atomic_inc_not_zero() in clusterip_config_find_get(). - list_add_rcu() and list_del_rcu() variants are used. - atomic_dec_and_lock() used in clusterip_config_entry_put() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: defrag: kill unused work parameter of frag_kfree_skb()	Shan Wei	2010-06-14	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The parameter (work) is unused, remove it. Reported by Eric Dumazet. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: defrag: remove one redundant atomic ops	Shan Wei	2010-06-14	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of doing one atomic operation per frag, we can factorize them. Reported by Eric Dumazet. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: kill redundant check code in which setting ip_summed value	Shan Wei	2010-06-14	2	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the returned csum value is 0, we have already set ip_summed to CHECKSUM_UNNECESSARY in __skb_checksum_complete_head(). So this patch removes the check and returns to the caller directly. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: nfnetlink_log: RCU conversion, part 2	Eric Dumazet	2010-06-14	2	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- must use atomic_inc_not_zero() in instance_lookup_get() - must use hlist_add_head_rcu() instead of hlist_add_head() - must use hlist_del_rcu() instead of hlist_del() - Introduce NFULNL_COPY_DISABLED to stop lockless reader from using an instance, before we do final instance_put() on it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: nfnetlink_log: RCU conversion	Eric Dumazet	2010-06-09	1	-22/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- instances_lock becomes a spinlock - lockless lookups While nfnetlink_log is probably not performance critical, using fewer rwlocks in our code is always welcome... Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: nfnetlink_queue: some optimizations	Eric Dumazet	2010-06-09	1	-19/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Use an atomic_t for id_sequence to avoid a spin_lock/spin_unlock pair - Group highly modified struct nfqnl_instance fields together Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: ip6_queue: rwlock to spinlock conversion	Eric Dumazet	2010-06-09	1	-32/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Converts queue_lock rwlock to a spinlock. (readlocked part can be changed by reads of integer values) One atomic operation instead of four per ipq_enqueue_packet() call. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: ip_queue: rwlock to spinlock conversion	Eric Dumazet	2010-06-09	1	-32/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Converts queue_lock rwlock to a spinlock. (readlocked part can be changed by reads of integer values) One atomic operation instead of four per ipq_enqueue_packet() call. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: xt_sctp: use WORD_ROUND macro to calculate length of multiple of 4 bytes	Shan Wei	2010-06-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use WORD_ROUND to round an int up to the next multiple of 4. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
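Rounding up to the next multiple of 4 is usually done by adding 3 and clearing the two low bits. The helper below is a stand-alone sketch of that arithmetic; `word_round()` is a hypothetical function name, and the macro body in the kernel headers is not quoted in the commit message.

```c
/* Round len up to the next multiple of 4: add 3, then clear the low two bits. */
unsigned int word_round(unsigned int len)
{
	return (len + 3u) & ~3u;
}
```

Values that are already a multiple of 4 are returned unchanged, which is what makes the expression safe to apply to every length.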
\| * \|	netfilter: nf_conntrack: per_cpu untracking	Eric Dumazet	2010-06-09	2	-13/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NOTRACK makes all cpus share a cache line on nf_conntrack_untracked twice per packet, slowing down performance. This patch converts it to a per_cpu variable. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: nf_conntrack: IPS_UNTRACKED bit	Eric Dumazet	2010-06-08	15	-29/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NOTRACK makes all cpus share a cache line on nf_conntrack_untracked twice per packet. This is bad for performance. The __read_mostly annotation is also a bad choice. This patch introduces the IPS_UNTRACKED bit so that we can later use a per_cpu untrack structure more easily. A new helper, nf_ct_untracked_get(), returns a pointer to nf_conntrack_untracked. Another one, nf_ct_untracked_status_or(), is used by nf_nat_init() to add IPS_NAT_DONE_MASK bits to untracked status. The nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: xt_rateest: Better struct xt_rateest layout	Eric Dumazet	2010-06-08	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently dirty two cache lines in struct xt_rateest, this hurts SMP performance. This patch moves lock/bstats/rstats at beginning of structure so that they share a single cache line. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: vmalloc_node cleanup	Eric Dumazet	2010-06-04	3	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using vmalloc_node(size, numa_node_id()) for temporary storage is not needed. vmalloc(size) is more respectful of user NUMA policy. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: xt_statistic: remove nth_lock spinlock	Eric Dumazet	2010-06-01	1	-10/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use atomic_cmpxchg() to avoid dirtying a shared location. xt_statistic_priv smp aligned to avoid sharing same cache line with other stuff. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
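Replacing a spinlock-protected counter with a compare-and-exchange loop might look like the following user-space sketch with C11 atomics. `nth_hit()` and its parameters are hypothetical names; how xt_statistic consumes the result is not shown in the commit message and is not modelled here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Advance a shared counter that wraps to 0 after reaching "every".
 * Returns true when this call wrapped the counter, i.e. on every
 * (every + 1)-th call. No lock is taken. */
bool nth_hit(atomic_uint *count, unsigned int every)
{
	unsigned int oval = atomic_load(count);
	unsigned int nval;

	do {
		nval = (oval == every) ? 0 : oval + 1;
		/* On failure, oval is refreshed and nval is recomputed. */
	} while (!atomic_compare_exchange_weak(count, &oval, nval));

	return nval == 0;
}
```

Each caller retries only when another thread changed the counter in between, so exactly one caller observes each wrap.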
\| * \|	netfilter: br_netfilter: use skb_set_noref()	Eric Dumazet	2010-06-01	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid dirtying bridge_parent_rtable refcount, using new dst noref infrastructure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* \| \|	tcp: unify tcp flag macros	Changli Gao	2010-06-15	6	-71/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unify tcp flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH, TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCPCB_FLAG_* are replaced with the corresponding TCPHDR_*. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/tcp.h \| 24 ++++++------- net/ipv4/tcp.c \| 8 ++-- net/ipv4/tcp_input.c \| 2 - net/ipv4/tcp_output.c \| 59 ++++++++++++++++----------------- net/netfilter/nf_conntrack_proto_tcp.c \| 32 ++++++----------- net/netfilter/xt_TCPMSS.c \| 4 -- 6 files changed, 58 insertions(+), 71 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
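For reference, the eight flags occupy one byte of the TCP header, with FIN as the lowest bit and CWR as the highest. The definitions below follow that wire layout; the helper `is_syn_ack()` is a hypothetical example of how a single set of macros can be shared by callers, not a function from the patch.

```c
#include <stdint.h>

/* One bit per flag, in TCP header order (FIN is the least significant bit). */
#define TCPHDR_FIN 0x01
#define TCPHDR_SYN 0x02
#define TCPHDR_RST 0x04
#define TCPHDR_PSH 0x08
#define TCPHDR_ACK 0x10
#define TCPHDR_URG 0x20
#define TCPHDR_ECE 0x40
#define TCPHDR_CWR 0x80

/* Hypothetical helper: true if both SYN and ACK are set in the flag byte. */
int is_syn_ack(uint8_t flags)
{
	const uint8_t mask = TCPHDR_SYN | TCPHDR_ACK;

	return (flags & mask) == mask;
}
```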
* \| \|	bridge: use rx_handler_data pointer to store net_bridge_port pointer	Jiri Pirko	2010-06-15	20	-50/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Register net_bridge_port pointer as rx_handler data pointer. As br_port is removed from struct net_device, another netdev priv_flag is added to indicate the device serves as a bridge port. Also rcuized pointers are now correctly dereferenced in br_fdb.c and in netfilter parts. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	macvlan: use rx_handler_data pointer to store macvlan_port pointer V2	Jiri Pirko	2010-06-15	3	-14/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Register macvlan_port pointer as rx_handler data pointer. As macvlan_port is removed from struct net_device, another netdev priv_flag is added to indicate the device serves as a macvlan port. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: add rx_handler data pointer	Jiri Pirko	2010-06-15	4	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add possibility to register rx_handler data pointer along with a rx_handler. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	bridge: Fix netpoll support	Herbert Xu	2010-06-15	4	-84/+120
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are multiple problems with the newly added netpoll support: 1) Use-after-free on each netpoll packet. 2) Invoking unsafe code on netpoll/IRQ path. 3) Breaks when netpoll is enabled on the underlying device. This patch fixes all of these problems. In particular, we now allocate proper netpoll structures for each underlying device. We only allow netpoll to be enabled on the bridge when all the devices underneath it support netpoll. Once it is enabled, we do not allow non-netpoll devices to join the bridge (until netpoll is disabled again). This allows us to do away with the npinfo juggling that caused problem number 1. Incidentally this patch fixes number 2 by bypassing unsafe code such as multicast snooping and netfilter. Reported-by: Qianfeng Zhang <frzhang@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	netpoll: Add netpoll_tx_running	Herbert Xu	2010-06-15	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the helper netpoll_tx_running for use within ndo_start_xmit. It returns non-zero if ndo_start_xmit is being invoked by netpoll, and zero otherwise. This is currently implemented by simply looking at the hardirq count. This is because for all non-netpoll uses of ndo_start_xmit, IRQs must be enabled while netpoll always disables IRQs before calling ndo_start_xmit. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>