summaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAgeFilesLines
* [IPV4] UDP: Move IPv4-specific bits to other file.YOSHIFUJI Hideaki2008-03-044-1088/+1136
| | | | | | | Move IPv4-specific UDP bits from net/ipv4/udp.c into (new) net/ipv4/udp_ipv4.c. Rename net/ipv4/udplite.c to net/ipv4/udplite_ipv4.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [UDP]: Allow users to configure UDP-Lite.YOSHIFUJI Hideaki2008-03-045-13/+36
| | | | | | | | | | | | | | | | Let's give users an option for disabling UDP-Lite (~4K). old: | text data bss dec hex filename | 286498 12432 6072 305002 4a76a net/ipv4/built-in.o | 193830 8192 3204 205226 321aa net/ipv6/ipv6.o new (without UDP-Lite): | text data bss dec hex filename | 284086 12136 5432 301654 49a56 net/ipv4/built-in.o | 191835 7832 3076 202743 317f7 net/ipv6/ipv6.o Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [TCP]: Add IPv6 support to TCP SYN cookiesGlenn Griffin2008-03-044-4/+7
| | | | | | | | | Updated to incorporate Eric's suggestion of using a per cpu buffer rather than allocating on the stack. Just a two line change, but will resend in it's entirety. Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [TCP]: lower stack usage in cookie_hash() functionEric Dumazet2008-03-041-1/+3
| | | | | | | | 400 bytes allocated on stack might be a litle bit too much. Using a per_cpu var is more friendly. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [ARP]: Introduce the arp_hdr_len helper.Pavel Emelyanov2008-03-033-14/+5
| | | | | | | | | | | | | | There are some place, that calculate the ARP header length. These calculations are correct, but a) some operate with "magic" constants, b) enlarge the code length (sometimes at the cost of coding style), c) are not informative from the first glance. The proposal is to introduce a helper, that includes all the good sides of these calculations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Merge exit paths in tcp_v4_conn_request.Denis V. Lunev2008-03-031-10/+6
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: skb->dst can't be NULL in ip_options_echo.Denis V. Lunev2008-03-031-4/+1
| | | | | | | | | ip_options_echo is called on the packet input path after the initial routing. The dst entry on the packet is cleared only in the several very specific places and immidiately assigned back (may be new). Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICMP]: Section conflict between icmp_sk_init/icmp_sk_exit.Denis V. Lunev2008-02-291-1/+3
| | | | | | | | Functions from __exit section should not be called from ones in __init section. Fix this conflict. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.Denis V. Lunev2008-02-293-7/+11
| | | | | | | | | | It looks like dst parameter is used in this API due to historical reasons. Actually, it is really used in the direct call to tcp_v4_send_synack only. So, create a wrapper for tcp_v4_send_synack and remove dst from rtx_syn_ack. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate.Pavel Emelyanov2008-02-291-1/+1
| | | | | | | | | | Some netfilter code and rxrpc one use seq_open() to open a proc file, but seq_release_private to release one. This is harmless, but ambiguous. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Make icmp_sk per namespace.Denis V. Lunev2008-02-291-17/+32
| | | | | | | | | | All preparations are done. Now just add a hook to perform an initialization on namespace startup and replace icmp_sk macro with proper inline call. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: icmp(v6)_sk should not pin a namespace.Denis V. Lunev2008-02-291-8/+4
| | | | | | | | | | | | | So, change icmp(v6)_sk creation/disposal to the scheme used in the netlink for rtnl, i.e. create a socket in the context of the init_net and assign the namespace without getting a referrence later. Also use sk_release_kernel instead of sock_release to properly destroy such sockets. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICMP]: Allocate data for __icmp(v6)_sk dynamically.Denis V. Lunev2008-02-291-5/+10
| | | | | | | | | | | | | Own __icmp(v6)_sk should be present in each namespace. So, it should be allocated dynamically. Though, alloc_percpu does not fit the case as it implies additional dereferrence for no bonus. Allocate data for pointers just like __percpu_alloc_mask does and place pointers to struct sock into this array. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock.Denis V. Lunev2008-02-291-9/+10
| | | | | | | | | | | | | | | | | | | | We have to get socket lock inside icmp(v6)_xmit_lock/unlock. The socket is get from global variable now. When this code became namespaces, one should pass a namespace and get socket from it. Though, above is useless. Socket is available in the caller, just pass it inside. This saves a bit of code now and saves more later. add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168) function old new delta icmp_rcv 718 719 +1 icmpv6_rcv 2343 2303 -40 icmp_send 1566 1518 -48 icmp_reply 549 468 -81 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICMP]: Store sock rather than socket for ICMP flow control.Denis V. Lunev2008-02-291-14/+13
| | | | | | | | | Basically, there is no difference, what to store: socket or sock. Though, sock looks better as there will be 1 less dereferrence on the fast path. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICMP]: Optimize icmp_socket usage.Denis V. Lunev2008-02-291-12/+17
| | | | | | | | | | | | | | Use this macro only once in a function to save a bit of space. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-98 (-98) function old new delta icmp_reply 562 561 -1 icmp_push_reply 305 258 -47 icmp_init 273 223 -50 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICMP]: Add return code to icmp_init.Denis V. Lunev2008-02-292-5/+24
| | | | | | | | | | | | icmp_init could fail and this is normal for namespace other than initial. So, the panic should be triggered only on init_net initialization path. Additionally create rollback path for icmp_init as a separate function. It will also be used later during namespace destruction. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Remove struct net_proto_family* from _init calls.Denis V. Lunev2008-02-293-4/+4
| | | | | | | | | struct net_proto_family* is not used in icmp[v6]_init, ndisc_init, igmp_init and tcp_v4_init. Remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Process inet_select_addr inside a namespace.Denis V. Lunev2008-02-281-1/+2
| | | | | | | The context is available from a network device passed in. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Enable IPv4 address manipulations inside namespace.Denis V. Lunev2008-02-281-9/+0
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Enable all routing manipulation via netlink inside namespace.Denis V. Lunev2008-02-281-8/+8
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Process devinet ioctl in the correct namespace.Denis V. Lunev2008-02-283-7/+8
| | | | | | | | Add namespace parameter to devinet_ioctl and locate device inside it for state changes. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Register /proc/net/rt_cache for each namespace.Denis V. Lunev2008-02-281-3/+21
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Process /proc/net/rt_cache inside a namespace.Denis V. Lunev2008-02-281-3/+7
| | | | | | | Show routing cache for a particular namespace only. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: rt_cache_get_next should take rt_genid into account.Denis V. Lunev2008-02-281-5/+13
| | | | | | | | | In the other case /proc/net/rt_cache will look inconsistent in respect to genid. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Process ip_rt_redirect in the correct namespace.Denis V. Lunev2008-02-281-2/+5
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Enable inetdev_event notifier.Denis V. Lunev2008-02-281-3/+0
| | | | | | | | After all these preparations it is time to enable main IPv4 device initialization routine inside namespace. It is safe do this now. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Disable multicaststing configuration inside non-initial namespace.Denis V. Lunev2008-02-281-0/+39
| | | | | | | | Do not calls hooks from device notifiers and disallow configuration from ioctl/netlink layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Consolidate masq_inet_event and masq_device_event.Denis V. Lunev2008-02-281-12/+2
| | | | | | | | They do exactly the same job. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Use proc_create() to setup ->proc_fops firstWang Chen2008-02-281-3/+2
| | | | | | | | Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to main tree. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPCOMP]: Disable BH on output when using shared tfmHerbert Xu2008-02-281-1/+4
| | | | | | | | | | | | | Because we use shared tfm objects in order to conserve memory, (each tfm requires 128K of vmalloc memory), BH needs to be turned off on output as that can occur in process context. Previously this was done implicitly by the xfrm output code. That was lost when it became lockless. So we need to add the BH disabling to IPComp directly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Don't create tunnels with '%' in name.Pavel Emelyanov2008-02-262-10/+18
| | | | | | | | | | | | | | | | | | Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a pre-defined name for a device from the userspace. Since these drivers call the register_netdevice() (rtnl_lock, is held), which does _not_ generate the device's name, this name may contain a '%' character. Not sure how bad is this to have a device with a '%' in its name, but all the other places either use the register_netdev(), which call the dev_alloc_name(), or explicitly call the dev_alloc_name() before registering, i.e. do not allow for such names. This had to be prior to the commit 34cc7b, but I forgot to number the patches and this one got lost, sorry. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Reset scope when changing addressBjorn Mork2008-02-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | This bug did bite at least one user, who did have to resort to rebooting the system after an "ifconfig eth0 127.0.0.1" typo. Deleting the address and adding a new is a less intrusive workaround. But I still beleive this is a bug that should be fixed. Some way or another. Another possibility would be to remove the scope mangling based on address. This will always be incomplete (are 127/8 the only address space with host scope requirements?) We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured with a loopback address (127/8). The scope is never reset, and will remain set to RT_SCOPE_HOST after changing the address. This patch resets the scope if the address is changed again, to restore normal functionality. Signed-off-by: Bjorn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.Pavel Emelyanov2008-02-232-20/+4
| | | | | | | | | | | | | | Use the added dev_alloc_name() call to create tunnel device name, rather than iterate in a hand-made loop with an artificial limit. Thanks Patrick for noticing this. [ The way this works is, when the device is actually registered, the generic code noticed the '%' in the name and invokes dev_alloc_name() to fully resolve the name. -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Fix incorrect use of skb_make_writableJoonwoo Park2008-02-191-1/+1
| | | | | | | | | http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling ↵Patrick McHardy2008-02-191-5/+7
| | | | | | | | | | | | | | | | | | packet data As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>, inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue triggers a SKB_LINEAR_ASSERT in skb_put(). Going back through the git history, it seems this bug is present since at least 2.6.12-rc2, probably even since the removal of skb_linearize() for netfilter. Linearize non-linear skbs through skb_copy_expand() when enlarging them. Tested by Thomas, fixes bugzilla #9933. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4/fib_hash.c: fix NULL dereferenceAdrian Bunk2008-02-191-5/+5
| | | | | | | | | | | | | | | | Unless I miss a guaranteed relation between between "f" and "new_fa->fa_info" this patch is required for fixing a NULL dereference introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 ("[IPV4] FIB_HASH: Reduce memory needs and speedup lookups") and spotted by the Coverity checker. Eric Dumazet says: Hum, you are right, kmem_cache_free() doesnt allow a NULL object, like kfree() does. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix tcp_v4_send_synack() commentKris Katterjohn2008-02-171-1/+1
| | | | | Signed-off-by: Kris Katterjohn <katterjohn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: fix alignment of IP-Config outputUwe Kleine-Koenig2008-02-171-1/+1
| | | | | | | | Make the indented lines aligned in the output (not in the code). Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "[NDISC]: Fix race in generic address resolution"David S. Miller2008-02-171-0/+3
| | | | | | | | | | | | This reverts commit 69cc64d8d92bf852f933e90c888dfff083bd4fc9. It causes recursive locking in IPV6 because unlike other neighbour layer clients, it even needs neighbour cache entries to send neighbour soliciation messages :-( We'll have to find another way to fix this race. Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Unexport inet_listen_wlockAdrian Bunk2008-02-131-2/+0
| | | | | | | This patch removes the no longer used EXPORT_SYMBOL(inet_listen_wlock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Unexport __inet_hash_connectAdrian Bunk2008-02-131-1/+0
| | | | | | | This patch removes the unused EXPORT_SYMBOL_GPL(__inet_hash_connect). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPSEC]: Fix bogus usage of u64 on input sequence numberHerbert Xu2008-02-122-3/+4
| | | | | | | | | Al Viro spotted a bogus use of u64 on the input sequence number which is big-endian. This patch fixes it by giving the input sequence number its own member in the xfrm_skb_cb structure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NDISC]: Fix race in generic address resolutionDavid S. Miller2008-02-121-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | Frank Blaschka provided the bug report and the initial suggested fix for this bug. He also validated this version of this fix. The problem is that the access to neigh->arp_queue is inconsistent, we grab references when dropping the lock lock to call neigh->ops->solicit() but this does not prevent other threads of control from trying to send out that packet at the same time causing corruptions because both code paths believe they have exclusive access to the skb. The best option seems to be to hold the write lock on neigh->lock during the ->solicit() call. I looked at all of the ndisc_ops implementations and this seems workable. The only case that needs special care is the IPV4 ARP implementation of arp_solicit(). It wants to take neigh->lock as a reader to protect the header entry in neigh->ha during the emission of the soliciation. We can simply remove the read lock calls to take care of that since holding the lock as a writer at the caller providers a superset of the protection afforded by the existing read locking. The rest of the ->solicit() implementations don't care whether the neigh is locked or not. Signed-off-by: David S. Miller <davem@davemloft.net>
* fib_trie: /proc/net/route performance improvementStephen Hemminger2008-02-121-11/+82
| | | | | | | | | Use key/offset caching to change /proc/net/route (use by iputils route) from O(n^2) to O(n). This improves performance from 30sec with 160,000 routes to 1sec. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fib_trie: handle empty treeStephen Hemminger2008-02-121-4/+2
| | | | | | | | This fixes possible problems when trie_firstleaf() returns NULL to trie_leafindex(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Remove IP_TOS setting privilege checks.David S. Miller2008-02-121-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various RFCs have all sorts of things to say about the CS field of the DSCP value. In particular they try to make the distinction between values that should be used by "user applications" and things like routing daemons. This seems to have influenced the CAP_NET_ADMIN check which exists for IP_TOS socket option settings, but in fact it has an off-by-one error so it wasn't allowing CS5 which is meant for "user applications" as well. Further adding to the inconsistency and brokenness here, IPV6 does not validate the DSCP values specified for the IPV6_TCLASS socket option. The real actual uses of these TOS values are system specific in the final analysis, and these RFC recommendations are just that, "a recommendation". In fact the standards very purposefully use "SHOULD" and "SHOULD NOT" when describing how these values can be used. In the final analysis the only clean way to provide consistency here is to remove the CAP_NET_ADMIN check. The alternatives just don't work out: 1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing setups. 2) If we just fix the off-by-one error in the class comparison in IPV4, certain DSCP values can be used in IPV6 but not IPV4 by default. So people will just ask for a sysctl asking to override that. I checked several other freely available kernel trees and they do not make any privilege checks in this area like we do. For the BSD stacks, this goes back all the way to Stevens Volume 2 and beyond. Signed-off-by: David S. Miller <davem@davemloft.net>
* [IGMP]: Optimize kfree_skb in igmp_rcv.Denis V. Lunev2008-02-091-7/+6
| | | | | | | Merge error paths inside igmp_rcv. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: route: fix crash ip_route_inputPatrick McHardy2008-02-071-1/+1
| | | | | | | | | | | | | ip_route_me_harder() may call ip_route_input() with skbs that don't have skb->dev set for skbs rerouted in LOCAL_OUT and TCP resets generated by the REJECT target, resulting in a crash when dereferencing skb->dev->nd_net. Since ip_route_input() has an input device argument, it seems correct to use that one anyway. Bug introduced in b5921910a1 (Routing cache virtualization). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_conntrack: fix ct_extend ->move operationPatrick McHardy2008-02-071-3/+3
| | | | | | | | | | | | | | | | | | | The ->move operation has two bugs: - It is called with the same extension as source and destination, so it doesn't update the new extension. - The address of the old extension is calculated incorrectly, instead of (void *)ct->ext + ct->ext->offset[i] it uses ct->ext + ct->ext->offset[i]. Fixes a crash on x86_64 reported by Chuck Ebbert <cebbert@redhat.com> and Thomas Woerner <twoerner@redhat.com>. Tested-by: Thomas Woerner <twoerner@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>