summaryrefslogtreecommitdiffstats
path: root/net/ipv4/udplite.c
Commit message (Collapse)AuthorAgeFilesLines
* udp: RCU handling for Unicast packets.Eric Dumazet2008-10-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Goals are : 1) Optimizing handling of incoming Unicast UDP frames, so that no memory writes should happen in the fast path. Note: Multicasts and broadcasts still will need to take a lock, because doing a full lockless lookup in this case is difficult. 2) No expensive operations in the socket bind/unhash phases : - No expensive synchronize_rcu() calls. - No added rcu_head in socket structure, increasing memory needs, but more important, forcing us to use call_rcu() calls, that have the bad property of making sockets structure cold. (rcu grace period between socket freeing and its potential reuse make this socket being cold in CPU cache). David did a previous patch using call_rcu() and noticed a 20% impact on TCP connection rates. Quoting Cristopher Lameter : "Right. That results in cacheline cooldown. You'd want to recycle the object as they are cache hot on a per cpu basis. That is screwed up by the delayed regular rcu processing. We have seen multiple regressions due to cacheline cooldown. The only choice in cacheline hot sensitive areas is to deal with the complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU." - Because udp sockets are allocated from dedicated kmem_cache, use of SLAB_DESTROY_BY_RCU can help here. Theory of operation : --------------------- As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()), special attention must be taken by readers and writers. Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed, reused, inserted in a different chain or in worst case in the same chain while readers could do lookups in the same time. In order to avoid loops, a reader must check each socket found in a chain really belongs to the chain the reader was traversing. If it finds a mismatch, lookup must start again at the begining. This *restart* loop is the reason we had to use rdlock for the multicast case, because we dont want to send same message several times to the same socket. We use RCU only for fast path. Thus, /proc/net/udp still takes spinlocks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: introduce struct udp_table and multiple spinlocksEric Dumazet2008-10-291-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UDP sockets are hashed in a 128 slots hash table. This hash table is protected by *one* rwlock. This rwlock is readlocked each time an incoming UDP message is handled. This rwlock is writelocked each time a socket must be inserted in hash table (bind time), or deleted from this table (close time) This is not scalable on SMP machines : 1) Even in read mode, lock() and unlock() are atomic operations and must dirty a contended cache line, shared by all cpus. 2) A writer might be starved if many readers are 'in flight'. This can happen on a machine with some NIC receiving many UDP messages. User process can be delayed a long time at socket creation/dismantle time. This patch prepares RCU migration, by introducing 'struct udp_table and struct udp_hslot', and using one spinlock per chain, to reduce contention on central rwlock. Introducing one spinlock per chain reduces latencies, for port randomization on heavily loaded UDP servers. This also speedup bindings to specific ports. udp_lib_unhash() was uninlined, becoming to big. Some cleanups were done to ease review of following patch (RCUification of UDP Unicast lookups) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mib: put udplite statistics on struct netPavel Emelyanov2008-07-181-1/+0
| | | | | Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove CVS keywordsAdrian Bunk2008-06-111-2/+0
| | | | | | | | This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Remove owner from udp_seq_afinfo.Denis V. Lunev2008-03-281-1/+3
| | | | | | | Move it to udp_seq_afinfo->seq_fops as should be. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Place file operations directly into udp_seq_afinfo.Denis V. Lunev2008-03-281-2/+0
| | | | | | | No need to have separate never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Move seq_ops from udp_iter_state to udp_seq_afinfo.Denis V. Lunev2008-03-281-1/+3
| | | | | | | No need to create seq_operations for each instance of 'netstat'. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SOCK]: Drop inuse pcounter from struct proto (v2).Pavel Emelyanov2008-03-281-3/+0
| | | | | | | | An uppercut - do not use the pcounter on struct proto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Enable TCP/UDP/ICMP inside namespace.Denis V. Lunev2008-03-241-0/+1
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][UDP-Lite]: Register /proc/net/udplite(6) in a namespace.Pavel Emelyanov2008-03-241-1/+16
| | | | | | | | UDP-Lite sockets are displayed in another files, rather than UDP ones, so make the present in namespaces as well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP-Lite]: Clean up proc creation a bit.Pavel Emelyanov2008-03-241-3/+11
| | | | | | | | | Just introduce a helper to remove ifdefs from inside the udplite4_register function. This will help to make the next patch nicer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Make full use of proto.h.udp_hash innovation.Pavel Emelyanov2008-03-221-13/+2
| | | | | | | | | | | | | | After this we have only udp_lib_get_port to get the port and two stubs for ipv4 and ipv6. No difference in udp and udplite except for initialized h.udp_hash member. I tried to find a graceful way to drop the only difference between udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison routine), but adding one more callback on the struct proto didn't appear such :( Maybe later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][IPV6] udp6 - make proc per namespaceDaniel Lezcano2008-03-211-1/+1
| | | | | | | | The proc init/exit functions take a new network namespace parameter in order to register/unregister /proc/net/udp6 for a namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Revert udplite and code split.David S. Miller2008-03-061-0/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit db1ed684f6c430c4cdad67d058688b8a1b5e607c ("[IPV6] UDP: Rename IPv6 UDP files."), commit 8be8af8fa4405652e6c0797db5465a4be8afb998 ("[IPV4] UDP: Move IPv4-specific bits to other file.") and commit e898d4db2749c6052072e9bc4448e396cbdeb06a ("[UDP]: Allow users to configure UDP-Lite."). First, udplite is of such small cost, and it is a core protocol just like TCP and normal UDP are. We spent enormous amounts of effort to make udplite share as much code with core UDP as possible. All of that work is less valuable if we're just going to slap a config option on udplite support. It is also causing build failures, as reported on linux-next, showing that the changeset was not tested very well. In fact, this is the second build failure resulting from the udplite change. Finally, the config options provided was a bool, instead of a modular option. Meaning the udplite code does not even get build tested by allmodconfig builds, and furthermore the user is not presented with a reasonable modular build option which is particularly needed by distribution vendors. Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] UDP: Move IPv4-specific bits to other file.YOSHIFUJI Hideaki2008-03-041-121/+0
| | | | | | | Move IPv4-specific UDP bits from net/ipv4/udp.c into (new) net/ipv4/udp_ipv4.c. Rename net/ipv4/udplite.c to net/ipv4/udplite_ipv4.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void.YOSHIFUJI Hideaki2008-01-281-1/+1
| | | | | | | | Fix following sparse warnings: | net/ipv4/udp.c:421:2: warning: returning void-valued expression | net/ipv4/udplite.c:38:2: warning: returning void-valued expression Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [IPV4]: Use the {DEFINE|REF}_PROTO_INUSE infrastructureEric Dumazet2007-11-071-0/+3
| | | | | | | | | | | Trivial patch to make "tcp,udp,udplite,raw" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Randomize port selection.Stephen Hemminger2007-10-101-2/+1
| | | | | | | | This patch causes UDP port allocation to be randomized like TCP. The earlier code would always choose same port (ie first empty list). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Revert 2-pass hashing changes.David S. Miller2007-06-071-4/+3
| | | | | | | | | | | | | | | | | This reverts changesets: 6aaf47fa48d3c44280810b1b470261d340e4ed87 b7b5f487ab39bc10ed0694af35651a03d9cb97ff de34ed91c4ffa4727964a832c46e624dd1495cf5 fc038410b4b1643766f8033f4940bcdb1dace633 There are still some correctness issues recently discovered which do not have a known fix that doesn't involve doing a full hash table scan on port bind. So revert for now. Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Fix AF-specific references in AF-agnostic code.David S. Miller2007-05-101-3/+4
| | | | | | | | | | | __udp_lib_port_inuse() cannot make direct references to inet_sk(sk)->rcv_saddr as that is ipv4 specific state and this code is used by ipv6 too. Use an operations vector to solve this, and this also paves the way for ipv6 support for non-wild saddr hashing in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Clean up UDP-Lite receive checksumHerbert Xu2007-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Possible cleanups.Adrian Bunk2006-12-021-5/+5
| | | | | | | | | | | | | | | | | | This patch contains the following possible cleanups: - make the following needlessly global functions statis: - ipv4/tcp.c: __tcp_alloc_md5sig_pool() - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup() - ipv4/udplite.c: udplite_rcv() - ipv4/udplite.c: udplite_err() - make the following needlessly global structs static: - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops - net/ipv{4,6}/udplite.c: remove inline's from static functions (gcc should know best when to inline them) Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Supporting UDP-Lite (RFC 3828) in LinuxGerrit Renker2006-12-021-0/+119
This is a revision of the previously submitted patch, which alters the way files are organized and compiled in the following manner: * UDP and UDP-Lite now use separate object files * source file dependencies resolved via header files net/ipv{4,6}/udp_impl.h * order of inclusion files in udp.c/udplite.c adapted accordingly [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828) This patch adds support for UDP-Lite to the IPv4 stack, provided as an extension to the existing UDPv4 code: * generic routines are all located in net/ipv4/udp.c * UDP-Lite specific routines are in net/ipv4/udplite.c * MIB/statistics support in /proc/net/snmp and /proc/net/udplite * shared API with extensions for partial checksum coverage [NET/IPv6]: Extension for UDP-Lite over IPv6 It extends the existing UDPv6 code base with support for UDP-Lite in the same manner as per UDPv4. In particular, * UDPv6 generic and shared code is in net/ipv6/udp.c * UDP-Litev6 specific extensions are in net/ipv6/udplite.c * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6 * support for IPV6_ADDRFORM * aligned the coding style of protocol initialisation with af_inet6.c * made the error handling in udpv6_queue_rcv_skb consistent; to return `-1' on error on all error cases * consolidation of shared code [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support The UDP-Lite patch further provides * API documentation for UDP-Lite * basic xfrm support * basic netfilter support for IPv4 and IPv6 (LOG target) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>