path: root/net/ipv6
Commit message    Author    Age    Files    Lines
* tcp/dccp: get rid of central timewait timer    Eric Dumazet    2015-04-13    2    -3/+3
| Using a timer wheel for timewait sockets was nice ~15 years ago when memory was expensive and machines had a single processor. This does not scale, the code is ugly and it is a source of huge latencies (typically 30 ms have been seen, with CPUs spinning on the death_lock spinlock).
|
| We can afford to use an extra 64 bytes per timewait sock and spread the timewait load to all CPUs to get better behavior.
|
| Tested:
|
| In the following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1 on the target (lpaa24).
|
| Before patch:
|
| lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
| 419594
|
| lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
| 437171
|
| While the test is running, we can observe 25 or even 33 ms latencies.
|
| lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
| ...
| 1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
| rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2
|
| lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
| ...
| 1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
| rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2
|
| After patch:
|
| About 90% increase of throughput:
|
| lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
| 810442
|
| lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
| 800992
|
| And latencies are kept to minimal values during this load, even if network utilization is 90% higher:
|
| lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
| ...
| 1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
| rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms
|
| Signed-off-by: Eric Dumazet <edumazet@google.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next    David S. Miller    2015-04-09    1    -1/+3
|\
| | Pablo Neira Ayuso says:
| |
| | ====================
| | Netfilter updates for net-next
| |
| | The following patchset contains Netfilter updates for your net-next tree. They are:
| |
| | * nf_tables set timeout infrastructure from Patrick McHardy.
| |
| | 1) Add set timeout support.
| | 2) Add support for set element timeouts using the new set extension infrastructure.
| | 3) Add garbage collection helper functions to get rid of stale elements. Elements are accumulated in a batch that is asynchronously released via RCU when the batch is full.
| | 4) Add garbage collection synchronization helpers. This introduces a new element busy bit to address concurrent access from the netlink API and the garbage collector.
| | 5) Add timeout support for the nft_hash set implementation. The garbage collector periodically checks for stale elements from the workqueue.
| |
| | * iptables/nftables cgroup fixes:
| |
| | 6) Ignore non-full-socket objects from the input path, otherwise cgroup match may crash, from Daniel Borkmann.
| | 7) Fix cgroup in nf_tables.
| | 8) Save some cycles from xt_socket by skipping packet header parsing when skb->sk is already set because of early demux. Also from Daniel.
| |
| | * br_netfilter updates from Florian Westphal.
| |
| | 9) Save frag_max_size and restore it from the forward path too.
| | 10) Use a per-cpu area to restore the original source MAC address when traffic is DNAT'ed.
| | 11) Add helper functions to access physical devices.
| | 12) Use these new physdev helper functions from xt_physdev.
| | 13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter state information.
| | 14) Annotate original layer 2 protocol number in nf_bridge info, instead of using kludgy flags.
| | 15) Also annotate the pkttype mangling when the packet travels back and forth from the IP to the bridge layer, instead of using a flag.
| |
| | * More nf_tables set enhancements from Patrick:
| |
| | 16) Fix possible usage of set variant that doesn't support timeouts.
| | 17) Avoid spurious "set is full" errors from Netlink API when there are pending stale elements scheduled to be released.
| | 18) Restrict loop checks to set maps.
| | 19) Add support for dynamic set updates from the packet path.
| | 20) Add support to store optional user data (eg. comments) per set element.
| |
| | BTW, I have also pulled net-next into nf-next to anticipate the conflict resolution between your okfn() signature changes and Florian's br_netfilter updates.
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next    Pablo Neira Ayuso    2015-04-08    31    -197/+196
| |\
| | | Resolve conflicts between 5888b93 ("Merge branch 'nf-hook-compress'") and Florian Westphal's br_netfilter work.
| | |
| | | Conflicts:
| | |     net/bridge/br_netfilter.c
| | |
| | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: bridge: add helpers for fetching physin/outdev    Florian Westphal    2015-04-08    1    -1/+3
| | | Right now we store this in the nf_bridge_info struct, accessible via skb->nf_bridge. This patch prepares for the removal of this pointer from skb:
| | |
| | | Instead of using skb->nf_bridge->x, we use helpers to obtain the in/out device (or ifindexes). Follow-up patches to netfilter will then allow nf_bridge_info to be obtained by a call into the br_netfilter core, rather than keeping a pointer to it in sk_buff.
| | |
| | | Signed-off-by: Florian Westphal <fw@strlen.de>
| | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | netfilter: Fix switch statement warnings with recent gcc.    David Miller    2015-04-08    1    -0/+2
| | | More recent GCC warns about two kinds of switch statement uses:
| | |
| | | 1) Switching on an enumeration, but not having an explicit case statement for all members of the enumeration. To show the compiler this is intentional, we simply add a default case with nothing more than a break statement.
| | |
| | | 2) Switching on a boolean value. I think this warning is dumb but nevertheless you get it wholesale with -Wswitch.
| | |
| | | This patch cures all such warnings in netfilter.
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
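A minimal sketch of the two patterns described above, using a hypothetical `enum demo_verdict` rather than any real netfilter type. The enum case gets an empty `default: break;` as the message says; the bool case is written as if/else, which is one common way to avoid the warning (the message does not say which form the patch used for booleans).

```c
#include <stdbool.h>

/* Hypothetical enum standing in for a netfilter-style verdict type. */
enum demo_verdict { DEMO_ACCEPT, DEMO_DROP, DEMO_QUEUE };

/* Returns 1 if the verdict lets the packet continue, 0 otherwise. */
static int demo_verdict_passes(enum demo_verdict v)
{
	int passes = 0;

	switch (v) {
	case DEMO_ACCEPT:
		passes = 1;
		break;
	default:
		/* DEMO_DROP and DEMO_QUEUE are intentionally not listed;
		 * the empty default tells the compiler this is deliberate. */
		break;
	}
	return passes;
}

/* A two-way choice on a bool, written as if/else instead of a switch. */
static const char *demo_bool_name(bool on)
{
	if (on)
		return "on";
	return "off";
}
```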
* | | ipv6: call iptunnel_xmit with NULL sock pointer if no tunnel sock is available    Hannes Frederic Sowa    2015-04-08    1    -1/+1
| | | Fixes: 79b16aadea32cce ("udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().")
| | | Reported-by: David S. Miller <davem@davemloft.net>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: remove extra newlines    Sheng Yong    2015-04-07    1    -3/+0
| |/
|/|
| | Signed-off-by: Sheng Yong <shengyong1@huawei.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().    David Miller    2015-04-07    4    -9/+21
| | That way we can make sure the output paths of ipv4/ipv6 operate on the UDP socket rather than whatever random thing happens to be in skb->sk.
| |
| | Based upon a patch by Jiri Pirko.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| | Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
* | netfilter: Pass socket pointer down through okfn().    David Miller    2015-04-07    10    -42/+51
| | On the output paths in particular, we sometimes have to deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame.
| |
| | And second, is potentially the socket used to control a tunneling socket, such as one that encapsulates using UDP.
| |
| | We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting.
| |
| | The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net    David S. Miller    2015-04-06    3    -3/+6
|\ \
| | | Conflicts:
| | |     drivers/net/ethernet/mellanox/mlx4/cmd.c
| | |     net/core/fib_rules.c
| | |     net/ipv4/fib_frontend.c
| | |
| | | The fib_rules.c and fib_frontend.c conflicts were locking adjustments in 'net' overlapping addition and removal of code in 'net-next'.
| | |
| | | The mlx4 conflict was a bug fix in 'net' happening in the same place a constant was being replaced with a more suitable macro.
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: protect skb->sk accesses from recursive dereference inside the stack    hannes@stressinduktion.org    2015-04-06    1    -1/+2
| | | We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. the tunnel encapsulation process.
| | |
| | | ipv6 does not conform with this in three places:
| | |
| | | 1) ip6_fragment: we do consult ipv6_npinfo for frag_size
| | |
| | | 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket
| | |
| | | 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU
| | |
| | | Furthermore: in sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket on top of an IPv6-backed vxlan device.
| | |
| | | Reuse xmit_recursion as we are currently only interested in protecting tunnel devices.
| | |
| | | Cc: Jiri Pirko <jiri@resnulli.us>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
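A userspace sketch of the rule "only consult skb->sk at recursion level 0", assuming a thread-local counter as a stand-in for the kernel's per-CPU xmit_recursion. All `demo_*` types and functions are hypothetical; this models the idea, not the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct sock and struct sk_buff. */
struct demo_sock { int mc_loop; };
struct demo_skb { struct demo_sock *sk; };

/* Thread-local stand-in for the per-CPU xmit_recursion counter. */
static _Thread_local int demo_xmit_recursion;

/* Trust skb->sk only when the packet is sent directly by the local
 * socket, not when it re-enters the stack through a tunnel device. */
static struct demo_sock *demo_output_sk(const struct demo_skb *skb)
{
	return demo_xmit_recursion == 0 ? skb->sk : NULL;
}

/* Simulates a tunnel device re-entering the output path. */
static struct demo_sock *demo_tunnel_xmit(const struct demo_skb *skb)
{
	struct demo_sock *seen;

	demo_xmit_recursion++;
	seen = demo_output_sk(skb);
	demo_xmit_recursion--;
	return seen;
}
```

Callers that receive NULL fall back to device or route defaults instead of the user socket's settings.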
| * | ip6mr: call del_timer_sync() in ip6mr_free_table()    WANG Cong    2015-04-02    1    -1/+1
| | | We need to wait for in-flight timers, since we are going to free the mrtable right after it.
| | |
| | | Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: move fib_rules_unregister() under rtnl lock    WANG Cong    2015-04-02    2    -1/+3
| | | We have to hold rtnl lock for fib_rules_unregister(), otherwise the following race could happen:
| | |
| | | fib_rules_unregister():          fib_nl_delrule():
| | | ...                              ...
| | | ...                              ops = lookup_rules_ops();
| | | list_del_rcu(&ops->list);
| | |                                  list_for_each_entry(ops->rules) {
| | | fib_rules_cleanup_ops(ops);        ...
| | |   list_del_rcu();                  list_del_rcu();
| | |                                  }
| | |
| | | Note, net->rules_mod_lock is actually not needed at all: either upper layer netns code or rtnl lock guarantees we are safe.
| | |
| | | Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
| | | Cc: Thomas Graf <tgraf@suug.ch>
| | | Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: Pass nf_hook_state through nft_set_pktinfo*().    David S. Miller    2015-04-04    3    -3/+3
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: Pass nf_hook_state through ip6t_do_table().    David S. Miller    2015-04-04    6    -20/+16
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: Pass nf_hook_state through nf_nat_ipv6_{in,out,fn,local_fn}().    David S. Miller    2015-04-04    3    -36/+27
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: Make nf_hookfn use nf_hook_state.    David S. Miller    2015-04-04    11    -87/+52
| | | Pass the nf_hook_state all the way down into the hook functions themselves.
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: Use nf_hook_state in nf_queue_entry.    David S. Miller    2015-04-04    1    -2/+2
| | | That way we don't have to reinstantiate another nf_hook_state on the stack of the nf_reinject() path.
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net    David S. Miller    2015-04-02    3    -3/+21
|\| |
| | | Conflicts:
| | |     drivers/net/usb/asix_common.c
| | |     drivers/net/usb/sr9800.c
| | |     drivers/net/usb/usbnet.c
| | |     include/linux/usb/usbnet.h
| | |     net/ipv4/tcp_ipv4.c
| | |     net/ipv6/tcp_ipv6.c
| | |
| | | The TCP conflicts were overlapping changes. In 'net' we added a READ_ONCE() to the socket cached RX route read, whilst in 'net-next' Eric Dumazet touched the surrounding code dealing with how mini sockets are handled.
| | |
| | | With USB, it's a case of the same bug fix first going into net-next and then I cherry picked it back into net.
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: tcp6: fix double call of tcp_v6_fill_cb()    Alexey Kodanev    2015-03-29    1    -0/+11
| | | tcp_v6_fill_cb() will be called twice if the socket's state changes from TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data corruption because in the second tcp_v6_fill_cb() call it's not copying IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and unpredictable results. Performance loss of up to 1200% has been observed in the LTP/vxlan03 test.
| | |
| | | This can be fixed by copying inet6_skb_parm to the beginning of 'cb' only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be called again.
| | |
| | | Fixes: 2dc49d1680b53 ("tcp6: don't move IP6CB before xfrm6_policy_check()")
| | | Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
| | | Acked-by: Eric Dumazet <edumazet@google.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipmr,ip6mr: call ip6mr_free_table() on failure path    WANG Cong    2015-03-29    1    -1/+1
| | | Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Don't reduce hop limit for an interface    D.S. Ljungmark    2015-03-25    1    -1/+8
| | | A local route may have a lower hop_limit set than global routes do.
| | |
| | | RFC 3756, Section 4.2.7, "Parameter Spoofing"
| | |
| | | > 1. The attacker includes a Current Hop Limit of one or another small
| | | > number which the attacker knows will cause legitimate packets to
| | | > be dropped before they reach their destination.
| | |
| | | > As an example, one possible approach to mitigate this threat is to
| | | > ignore very small hop limits. The nodes could implement a
| | | > configurable minimum hop limit, and ignore attempts to set it below
| | | > said limit.
| | |
| | | Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
| | | Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
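The message gives only the title and the RFC 3756 excerpt, so the comparison below is an assumption about the policy ("never lower the interface hop limit from a Router Advertisement"), not the patch itself. The function name is hypothetical. A Cur Hop Limit of 0 in an RA means "unspecified" per RFC 4861, so it is ignored here.

```c
#include <stdint.h>

/* Returns the interface hop limit to keep after processing an RA. */
static uint8_t demo_ra_hop_limit(uint8_t current, uint8_t advertised)
{
	if (advertised == 0)
		return current;     /* router left it unspecified */
	if (advertised > current)
		return advertised;  /* raising the limit is accepted */
	return current;         /* never reduce: ignore small values */
}
```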
| * | tcp: prevent fetching dst twice in early demux code    Michal Kubeček    2015-03-23    1    -1/+1
| | | On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()
| | |
| | |         struct dst_entry *dst = sk->sk_rx_dst;
| | |
| | |         if (dst)
| | |                 dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
| | |
| | | to code reading sk->sk_rx_dst twice, once for the test and once for the argument of ip6_dst_check() (dst_check() is inline). This allows ip6_dst_check() to be called with a NULL first argument, causing a crash.
| | |
| | | Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6 TCP early demux code.
| | |
| | | Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")
| | | Fixes: c7109986db3c ("ipv6: Early TCP socket demux")
| | | Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
| | | Acked-by: Eric Dumazet <edumazet@google.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
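READ_ONCE() is a kernel macro built on a volatile access; in portable userspace C the closest analogue is a C11 relaxed atomic load, used here as an assumption. The sketch shows the shape of the fix: load the shared pointer once into a local, then test and use only that local. All `demo_*` names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_dst { unsigned int cookie; };

/* Hypothetical socket whose cached route another CPU may clear. */
struct demo_sock {
	_Atomic(struct demo_dst *) rx_dst;
	unsigned int rx_dst_cookie;
};

/* Stand-in for dst_check(): must never be given NULL. */
static struct demo_dst *demo_dst_check(struct demo_dst *dst, unsigned int cookie)
{
	return dst->cookie == cookie ? dst : NULL;
}

static struct demo_dst *demo_early_demux(struct demo_sock *sk)
{
	/* Single load (READ_ONCE analogue): the NULL test and the call
	 * both use this snapshot, so a concurrent clear cannot slip a
	 * NULL into demo_dst_check(). */
	struct demo_dst *dst = atomic_load_explicit(&sk->rx_dst,
						    memory_order_relaxed);

	if (dst)
		dst = demo_dst_check(dst, sk->rx_dst_cookie);
	return dst;
}
```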
* | | ipmr,ip6mr: implement ndo_get_iflink    Nicolas Dichtel    2015-04-02    1    -1/+6
| | | Don't use dev->iflink anymore.
| | |
| | | Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipip,gre,vti,sit: implement ndo_get_iflink    Nicolas Dichtel    2015-04-02    1    -2/+1
| | | Don't use dev->iflink anymore.
| | |
| | | CC: Steffen Klassert <steffen.klassert@secunet.com>
| | | Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ip6tnl,gre6,vti6: implement ndo_get_iflink    Nicolas Dichtel    2015-04-02    3    -10/+11
| | | Don't use dev->iflink anymore.
| | |
| | | CC: Steffen Klassert <steffen.klassert@secunet.com>
| | | Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | dev: introduce dev_get_iflink()    Nicolas Dichtel    2015-04-02    2    -3/+3
| |/
|/|
| | The goal of this patch is to prepare for the removal of the iflink field. It introduces a new ndo function, which will be implemented by virtual interfaces.
| |
| | There is no functional change in this patch. All readers of the iflink field now call dev_get_iflink().
| |
| | Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
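A sketch of the accessor pattern described above: readers call one helper, which asks the driver through an optional ndo hook and otherwise falls back to a default. The `demo_*` types are hypothetical, and the fallback to the device's own ifindex is an assumption; the in-tree dev_get_iflink() may treat some devices differently.

```c
#include <stddef.h>

struct demo_net_device;

/* Hypothetical subset of net_device_ops holding only the new hook. */
struct demo_netdev_ops {
	int (*ndo_get_iflink)(const struct demo_net_device *dev);
};

struct demo_net_device {
	int ifindex;
	int link_ifindex;                  /* underlying device, for tunnels */
	const struct demo_netdev_ops *netdev_ops;
};

/* A virtual device reports the device it is stacked on. */
static int demo_tunnel_get_iflink(const struct demo_net_device *dev)
{
	return dev->link_ifindex;
}

/* Readers use this instead of a stored iflink field. */
static int demo_dev_get_iflink(const struct demo_net_device *dev)
{
	if (dev->netdev_ops && dev->netdev_ops->ndo_get_iflink)
		return dev->netdev_ops->ndo_get_iflink(dev);
	return dev->ifindex;               /* assumed default */
}
```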
* | netlink: implement nla_get_in_addr and nla_get_in6_addr    Jiri Benc    2015-03-31    7    -24/+15
| | Those are counterparts to nla_put_in_addr and nla_put_in6_addr.
| |
| | Signed-off-by: Jiri Benc <jbenc@redhat.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: implement nla_put_in_addr and nla_put_in6_addr    Jiri Benc    2015-03-31    11    -41/+32
| | IP addresses are often stored in netlink attributes. Add generic functions to do that.
| |
| | For nla_put_in_addr, it would be nicer to pass struct in_addr, but this is not used universally throughout the kernel; in way too many places __be32 is used to store an IPv4 address.
| |
| | Signed-off-by: Jiri Benc <jbenc@redhat.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
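A userspace sketch of what such helpers write: a netlink attribute is a 4-byte header (16-bit nla_len covering header plus unpadded payload, then 16-bit nla_type) followed by the payload, padded to a 4-byte boundary. The layout mirrors struct nlattr from <linux/netlink.h>; the `demo_*` functions are hypothetical and are not the kernel's nla_put_in_addr()/nla_put_in6_addr().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) (((len) + DEMO_NLA_ALIGNTO - 1) & ~(DEMO_NLA_ALIGNTO - 1))
#define DEMO_NLA_HDRLEN ((int)DEMO_NLA_ALIGN(sizeof(struct demo_nlattr)))

/* Appends one attribute; returns bytes consumed (with padding) or -1. */
static int demo_nla_put(unsigned char *buf, size_t room, uint16_t type,
			const void *data, uint16_t len)
{
	struct demo_nlattr hdr = { (uint16_t)(DEMO_NLA_HDRLEN + len), type };
	size_t total = DEMO_NLA_ALIGN((size_t)DEMO_NLA_HDRLEN + len);

	if (total > room)
		return -1;
	memset(buf, 0, total);                  /* zero the padding */
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + DEMO_NLA_HDRLEN, data, len);
	return (int)total;
}

/* IPv4: a network-order 32-bit value, like the kernel's __be32. */
static int demo_nla_put_in_addr(unsigned char *buf, size_t room,
				uint16_t type, uint32_t addr_be)
{
	return demo_nla_put(buf, room, type, &addr_be, sizeof(addr_be));
}

/* IPv6: 16 raw bytes. */
static int demo_nla_put_in6_addr(unsigned char *buf, size_t room,
				 uint16_t type, const unsigned char addr[16])
{
	return demo_nla_put(buf, room, type, addr, 16);
}
```

An IPv4 attribute therefore occupies 8 bytes and an IPv6 one 20 bytes.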
* | xfrm: simplify xfrm_address_t use    Jiri Benc    2015-03-31    2    -5/+3
| | In many places, the a6 field is typecast to struct in6_addr. As the fields are in a union anyway, just add the in6_addr type to the union and get rid of the typecasting.
| |
| | Modifying the uapi header is okay; the union still has the same size.
| |
| | Signed-off-by: Jiri Benc <jbenc@redhat.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
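A sketch of why the change is size-neutral: adding a struct in6_addr member to a union that already holds a 16-byte array leaves it at 16 bytes, and callers can then use the typed member instead of casting. `demo_xfrm_address_t` is a hypothetical mirror, not the real uapi definition (which uses __be32).

```c
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical mirror of the idea behind the commit. */
typedef union {
	uint32_t        a4;
	uint32_t        a6[4];
	struct in6_addr in6;
} demo_xfrm_address_t;

_Static_assert(sizeof(demo_xfrm_address_t) == 16, "union size unchanged");

static int demo_is_loopback6(const demo_xfrm_address_t *addr)
{
	/* Before: IN6_IS_ADDR_LOOPBACK((const struct in6_addr *)addr->a6) */
	return IN6_IS_ADDR_LOOPBACK(&addr->in6);
}
```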
* | ipv6: coding style: comparison for inequality with NULL    Ian Morris    2015-03-31    17    -43/+43
| | The ipv6 code uses a mixture of coding styles. In some instances a check for a NULL pointer is done as x != NULL and sometimes as x. x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form.
| |
| | No changes detected by objdiff.
| |
| | Signed-off-by: Ian Morris <ipm@chirality.org.uk>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: coding style: comparison for equality with NULL    Ian Morris    2015-03-31    25    -174/+173
| | The ipv6 code uses a mixture of coding styles. In some instances a check for a NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form.
| |
| | No changes detected by objdiff.
| |
| | Signed-off-by: Ian Morris <ipm@chirality.org.uk>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next    David S. Miller    2015-03-29    2    -2/+7
|\ \
| | | Pablo Neira Ayuso says:
| | |
| | | ====================
| | | Netfilter updates for net-next
| | |
| | | The following patchset contains Netfilter updates for your net-next tree. Basically, these are nf_tables updates from Patrick McHardy that add the set extension infrastructure and finish the transaction support for sets. More specifically, they are:
| | |
| | | 1) Move netns to basechain and use recently added possible_net_t, from Patrick McHardy.
| | | 2) Use LOGLEVEL_<FOO> from nf_log infrastructure, from Joe Perches.
| | | 3) Restore nf_log_trace that was accidentally removed during conflict resolution.
| | | 4) nft_queue does not depend on NETFILTER_XTABLES, starting from here all patches from Patrick McHardy.
| | | 5) Use raw_smp_processor_id() in nft_meta.
| | |
| | | Then, several patches to prepare ground for the new set extension infrastructure:
| | |
| | | 6) Pass object length to the hash callback in rhashtable as needed by the new set extension infrastructure.
| | | 7) Cleanup patch to restore struct nft_hash as wrapper for struct rhashtable.
| | | 8) Another small source code readability cleanup for nft_hash.
| | | 9) Convert nft_hash to rhashtable callbacks.
| | |
| | | And finally...
| | |
| | | 10) Add the new set extension infrastructure.
| | | 11) Convert the nft_hash and nft_rbtree sets to use it.
| | | 12) Batch set element release to avoid several RCU grace periods in a row and add new function nft_set_elem_destroy() to consolidate set element release.
| | | 13) Return the set extension data area from nft_lookup.
| | | 14) Refactor existing transaction code to add some helper functions and document it.
| | | 15) Complete the set transaction support, using a similar approach to what we already use, to activate/deactivate elements in an atomic fashion.
| | | ====================
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: Use LOGLEVEL_<FOO> defines    Joe Perches    2015-03-25    2    -2/+7
| | | Use the #defines where appropriate.
| | |
| | | Miscellanea:
| | |
| | | Add explicit #include <linux/kernel.h> where it was not previously used so that these #defines are a bit more explicitly defined instead of indirectly included via:
| | |
| | |     module.h->moduleparam.h->kernel.h
| | |
| | | Signed-off-by: Joe Perches <joe@perches.com>
| | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | fib6: install fib6 ops in the last step    WANG Cong    2015-03-29    1    -6/+3
| | | We should not commit the new ops until we finish all the setup, otherwise we have to NULL it on failure.
| | |
| | | Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: hash net ptr into fragmentation bucket selection    Hannes Frederic Sowa    2015-03-25    3    -11/+13
|/ /
| | As namespaces are sometimes used with overlapping IP address ranges, we should also use the namespace as input to the hash to select the IP fragmentation counter bucket.
| |
| | Cc: Eric Dumazet <edumazet@google.com>
| | Cc: Flavio Leitner <fbl@redhat.com>
| | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
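A sketch of folding the namespace pointer into the bucket hash so that identical addresses in different namespaces no longer share a key. The mixer, the table size `DEMO_FRAG_BUCKETS`, and the choice of destination plus source address as the other inputs are all assumptions for illustration; the kernel uses its own hash and table.

```c
#include <stdint.h>

#define DEMO_FRAG_BUCKETS 2048u    /* hypothetical, power of two */

/* Hypothetical 32-bit mixing step (not the kernel's jhash). */
static uint32_t demo_mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return h ^ (h >> 16);
}

/* The namespace pointer is hashed in alongside the addresses. */
static uint32_t demo_frag_bucket(const void *net, const uint32_t daddr[4],
				 const uint32_t saddr[4])
{
	uint64_t n = (uint64_t)(uintptr_t)net;
	uint32_t h = demo_mix(0, (uint32_t)n);

	h = demo_mix(h, (uint32_t)(n >> 32));
	for (int i = 0; i < 4; i++) {
		h = demo_mix(h, daddr[i]);
		h = demo_mix(h, saddr[i]);
	}
	return h & (DEMO_FRAG_BUCKETS - 1);
}
```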
* | tcp: fix ipv4 mapped request socks    Eric Dumazet    2015-03-25    2    -2/+0
| | ss should display ipv4 mapped request sockets like this:
| |
| |     tcp SYN-RECV 0 0 ::ffff:192.168.0.1:8080 ::ffff:192.0.2.1:35261
| |
| | and not like this:
| |
| |     tcp SYN-RECV 0 0 192.168.0.1:8080 192.0.2.1:35261
| |
| | We should init ireq->ireq_family based on the listener sk_family, not the actual protocol carried by the SYN packet.
| |
| | This means we can set ireq_family in inet_reqsk_alloc().
| |
| | Fixes: 3f66b083a5b7 ("inet: introduce ireq_family")
| | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
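A userspace sketch of the display rule: the formatting follows the listener's family, so an AF_INET6 listener shows an IPv4 peer as an IPv4-mapped address (::ffff:a.b.c.d). `demo_format_reqsk_addr` is a hypothetical helper; inet_ntop() and inet_pton() are the standard POSIX calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Family comes from the listener, not from the SYN's protocol. */
static const char *demo_format_reqsk_addr(int listener_family,
					  const struct in_addr *v4,
					  char *buf, socklen_t len)
{
	if (listener_family == AF_INET6) {
		struct in6_addr mapped;

		/* Build ::ffff:a.b.c.d (RFC 4291 IPv4-mapped form). */
		memset(&mapped, 0, sizeof(mapped));
		mapped.s6_addr[10] = 0xff;
		mapped.s6_addr[11] = 0xff;
		memcpy(&mapped.s6_addr[12], v4, 4);
		return inet_ntop(AF_INET6, &mapped, buf, len);
	}
	return inet_ntop(AF_INET, v4, buf, len);
}
```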
* | tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()    Eric Dumazet    2015-03-24    1    -8/+2
| | With request socks convergence, we no longer need different lookup methods. A request socket can use the generic lookup function.
| |
| | Add const qualifier to 2nd tcp_v[46]_md5_lookup() parameter.
| |
| | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: md5: remove request sock argument of calc_md5_hash()    Eric Dumazet    2015-03-24    1    -9/+6
| | Since request and established sockets now have the same base, there is no need to pass two pointers to tcp_v4_md5_hash_skb() or tcp_v6_md5_hash_skb().
| |
| | Also add a const qualifier to their struct tcp_md5sig_key argument.
| |
| | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: md5: input path is run under rcu protected sections    Eric Dumazet    2015-03-24    1    -19/+6
| | It is guaranteed that both tcp_v4_rcv() and tcp_v6_rcv() run from RCU read-locked sections: ip_local_deliver_finish() and ip6_input_finish() both use rcu_read_lock().
| |
| | Also align tcp_v6_inbound_md5_hash() on tcp_v4_inbound_md5_hash() by returning a boolean.
| |
| | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: fix sparse warnings in privacy stable addresses generation    Hannes Frederic Sowa    2015-03-24    1    -5/+5
| | Those warnings reported by the sparse endianness check (via kbuild test robot) are harmless; nevertheless, fix them up and make the code a little bit easier to read.
| |
| | Reported-by: kbuild test robot <fengguang.wu@intel.com>
| | Fixes: 622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf")
| | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net    David S. Miller    2015-03-23    1    -3/+3
|\|
| | Conflicts:
| |     net/netfilter/nf_tables_core.c
| |
| | The nf_tables_core.c conflict was resolved using a conflict resolution from Stephen Rothwell as a guide.
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf    David S. Miller    2015-03-22    1    -3/+3
| |\
| | | Pablo Neira Ayuso says:
| | |
| | | ====================
| | | Netfilter fixes for net
| | |
| | | The following patchset contains Netfilter fixes for your net tree. They are:
| | |
| | | 1) Fix missing initialization of tuple structure in nfnetlink_cthelper to avoid mismatches when looking up to attach userspace helpers to flows, from Ian Wilson.
| | | 2) Fix potential crash in nft_hash when we hit -EAGAIN in nft_hash_walk(), from Herbert Xu.
| | | 3) We don't need to indicate the hook information to update the basechain default policy in nf_tables.
| | | 4) Restore tracing over nfnetlink_log due to recent rework to accommodate logging infrastructure into nf_tables.
| | | 5) Fix wrong IP6T_INV_PROTO check in xt_TPROXY.
| | | 6) Set IP6T_F_PROTO flag in nft_compat so we can use SYNPROXY6 and REJECT6 from xt over nftables.
| | | ====================
| | |
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | * netfilter: restore rule tracing via nfnetlink_log    Pablo Neira Ayuso    2015-03-19    1    -3/+3
| | | Since fab4085 ("netfilter: log: nf_log_packet() as real unified interface"), the loginfo structure that is passed to nf_log_packet() is used to explicitly indicate the logger type you want to use.
| | |
| | | This is a problem for people tracing rules through nfnetlink_log since packets are always routed to the NF_LOG_TYPE logger after the aforementioned patch.
| | |
| | | We can fix this by removing the trace loginfo structures, but that still changes the log level from 4 to 5 for tracing messages and there may be someone relying on this out there. So let's just introduce a new nf_log_trace() function that restores the former behaviour.
| | |
| | | Reported-by: Markus Kötter <koetter@rrzn.uni-hannover.de>
| | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | ipv6: introduce idgen_delay and idgen_retries knobs    Hannes Frederic Sowa    2015-03-23    3    -7/+22
| | | This is specified by RFC 7217.
| | |
| | | Cc: Erik Kline <ek@google.com>
| | | Cc: Fernando Gont <fgont@si6networks.com>
| | | Cc: Lorenzo Colitti <lorenzo@google.com>
| | | Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: do retries on stable privacy addresses    Hannes Frederic Sowa    2015-03-23    1    -3/+54
| | | If a DAD conflict is detected, we want to retry privacy stable address generation up to idgen_retries (= 3) times with a delay of idgen_delay (= 1 second). Add the logic to addrconf_dad_failure.
| | |
| | | By design, we don't clean up DAD-failed permanent addresses.
| | |
| | | Cc: Erik Kline <ek@google.com>
| | | Cc: Fernando Gont <fgont@si6networks.com>
| | | Cc: Lorenzo Colitti <lorenzo@google.com>
| | | Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
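A sketch of the retry rule as described: each DAD conflict bumps the DAD_Counter input of the RFC 7217 function and asks the caller to regenerate, until idgen_retries attempts are used up. The struct and function names are hypothetical, and waiting idgen_delay between attempts is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical per-address retry state. */
struct demo_stable_addr {
	int dad_counter;   /* DAD_Counter input of RFC 7217 */
	bool failed;       /* true once retries are exhausted */
};

/* Called on a DAD conflict. Returns true if the caller should
 * regenerate the address (after idgen_delay) and run DAD again. */
static bool demo_dad_failure(struct demo_stable_addr *a, int idgen_retries)
{
	if (a->dad_counter >= idgen_retries) {
		a->failed = true;  /* give up; address stays DAD-failed */
		return false;
	}
	a->dad_counter++;
	return true;
}
```

With idgen_retries = 3, a permanently conflicting address is regenerated three times before it is left DAD-failed.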
* | | ipv6: collapse state_lock and lock    Hannes Frederic Sowa    2015-03-23    1    -16/+15
| | | Cc: Erik Kline <ek@google.com>
| | | Cc: Fernando Gont <fgont@si6networks.com>
| | | Cc: Lorenzo Colitti <lorenzo@google.com>
| | | Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: introduce IFA_F_STABLE_PRIVACY flag    Hannes Frederic Sowa    2015-03-23    1    -6/+8
| | | We need to mark appropriate addresses so we can do retries in case their DAD failed.
| | |
| | | Cc: Erik Kline <ek@google.com>
| | | Cc: Fernando Gont <fgont@si6networks.com>
| | | Cc: Lorenzo Colitti <lorenzo@google.com>
| | | Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: generation of stable privacy addresses for link-local and autoconf    Hannes Frederic Sowa    2015-03-23    1    -4/+126
| | | This patch implements the stable privacy address generation for link-local and autoconf addresses as specified in RFC 7217.
| | |
| | |     RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)
| | |
| | | is the RID (random identifier). As the hash function F we chose one round of sha1. Prefix will be either the link-local prefix or the router advertised one. As Net_Iface we use the MAC address of the device. DAD_Counter and secret_key are implemented as specified.
| | |
| | | We don't use Network_ID, as it couples the code too closely to other subsystems. It is specified as optional in the RFC.
| | |
| | | As Net_Iface we only use the MAC address: we simply have no stable identifier in the kernel we could possibly use. Because this code might run very early, we cannot depend on names, as they might be changed by user space early on during the boot process.
| | |
| | | A new address generation mode is introduced, IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to none or eui64 address configuration mode although the stable_secret is already set.
| | |
| | | We refuse writes to ipv6/conf/all/stable_secret and only allow ipv6/conf/default/stable_secret and the interface-specific file to be written to. The default stable_secret is used as the parameter for the namespace; the interface-specific one can overwrite the secret, e.g. when switching a network configuration from one system to another while inheriting the secret.
| | |
| | | Cc: Erik Kline <ek@google.com>
| | | Cc: Fernando Gont <fgont@si6networks.com>
| | | Cc: Lorenzo Colitti <lorenzo@google.com>
| | | Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
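A sketch of how the RID inputs combine into an address: the /64 prefix is kept and the low 64 bits come from F(Prefix, Net_Iface, DAD_Counter, secret_key), with Network_ID omitted as in the commit. The commit uses one round of SHA-1 for F; this sketch swaps in 64-bit FNV-1a, which is NOT cryptographic, purely to show the data flow. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a step; a stand-in for SHA-1, not a secure PRF. */
static uint64_t demo_fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* out = prefix (8 bytes) || RID-derived interface ID (8 bytes). */
static void demo_rfc7217_addr(const unsigned char prefix[8],
			      const unsigned char mac[6],
			      uint8_t dad_counter,
			      const unsigned char secret[16],
			      unsigned char out[16])
{
	uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */

	h = demo_fnv1a(h, prefix, 8);          /* Prefix */
	h = demo_fnv1a(h, mac, 6);             /* Net_Iface (MAC) */
	h = demo_fnv1a(h, &dad_counter, 1);    /* DAD_Counter */
	h = demo_fnv1a(h, secret, 16);         /* secret_key */

	memcpy(out, prefix, 8);
	for (int i = 0; i < 8; i++)            /* big-endian interface ID */
		out[8 + i] = (unsigned char)(h >> (56 - 8 * i));
}
```

Bumping DAD_Counter after a conflict yields a different interface ID for the same prefix, MAC, and secret, which is what makes the retries in the neighbouring commit possible.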
* | | ipv6: introduce secret_stable to ipv6_devconf    Hannes Frederic Sowa    2015-03-23    1    -0/+68
| | | This patch implements the procfs logic for the stable_address knob: the secret is formatted as an IPv6 address and will be stored per interface and per namespace. We track an initialized flag and return EIO errors until the secret is set.
| | |
| | | We don't inherit the secret to newly created namespaces.
| | |
| | | Cc: Erik Kline <ek@google.com>
| | | Cc: Fernando Gont <fgont@si6networks.com>
| | | Cc: Lorenzo Colitti <lorenzo@google.com>
| | | Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
| | | Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
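A userspace model of the knob's behaviour as described: the secret is written as an IPv6 address, and reads fail with EIO until it has been set. The struct and functions are hypothetical; returning EINVAL for unparsable input is an assumption, while inet_pton() and inet_ntop() are the standard POSIX calls.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical per-interface stable_secret state. */
struct demo_stable_secret {
	bool initialized;
	struct in6_addr secret;
};

static int demo_secret_write(struct demo_stable_secret *s, const char *text)
{
	struct in6_addr tmp;

	if (inet_pton(AF_INET6, text, &tmp) != 1)
		return -EINVAL;        /* assumed error for bad input */
	s->secret = tmp;
	s->initialized = true;
	return 0;
}

static int demo_secret_read(const struct demo_stable_secret *s,
			    char *buf, socklen_t len)
{
	if (!s->initialized)
		return -EIO;           /* not set yet */
	return inet_ntop(AF_INET6, &s->secret, buf, len) ? 0 : -errno;
}
```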