summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* ipv6: fix handling of blackhole and prohibit routesNicolas Dichtel2012-09-051-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | When adding a blackhole or a prohibit route, they were handling like classic routes. Moreover, it was only possible to add this kind of routes by specifying an interface. Bug already reported here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498 Before the patch: $ ip route add blackhole 2001::1/128 RTNETLINK answers: No such device $ ip route add blackhole 2001::1/128 dev eth0 $ ip -6 route | grep 2001 2001::1 dev eth0 metric 1024 After: $ ip route add blackhole 2001::1/128 $ ip -6 route | grep 2001 blackhole 2001::1 dev lo metric 1024 error -22 v2: wrong patch v3: add a field fc_type in struct fib6_config to store RTN_* type Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: qdisc busylock needs lockdep annotationsEric Dumazet2012-09-052-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems we need to provide ability for stacked devices to use specific lock_class_key for sch->busylock We could instead default l2tpeth tx_queue_len to 0 (no qdisc), but a user might use a qdisc anyway. (So same fixes are probably needed on non LLTX stacked drivers) Noticed while stressing L2TPV3 setup : ====================================================== [ INFO: possible circular locking dependency detected ] 3.6.0-rc3+ #788 Not tainted ------------------------------------------------------- netperf/4660 is trying to acquire lock: (l2tpsock){+.-...}, at: [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core] but task is already holding lock: (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&sch->busylock)->rlock){+.-...}: [<ffffffff810a5df0>] lock_acquire+0x90/0x200 [<ffffffff817499fc>] _raw_spin_lock_irqsave+0x4c/0x60 [<ffffffff81074872>] __wake_up+0x32/0x70 [<ffffffff8136d39e>] tty_wakeup+0x3e/0x80 [<ffffffff81378fb3>] pty_write+0x73/0x80 [<ffffffff8136cb4c>] tty_put_char+0x3c/0x40 [<ffffffff813722b2>] process_echoes+0x142/0x330 [<ffffffff813742ab>] n_tty_receive_buf+0x8fb/0x1230 [<ffffffff813777b2>] flush_to_ldisc+0x142/0x1c0 [<ffffffff81062818>] process_one_work+0x198/0x760 [<ffffffff81063236>] worker_thread+0x186/0x4b0 [<ffffffff810694d3>] kthread+0x93/0xa0 [<ffffffff81753e24>] kernel_thread_helper+0x4/0x10 -> #0 (l2tpsock){+.-...}: [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10 [<ffffffff810a5df0>] lock_acquire+0x90/0x200 [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50 [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth] [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70 [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290 [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00 [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890 [<ffffffff815db019>] ip_output+0x59/0xf0 [<ffffffff815da36d>] ip_local_out+0x2d/0xa0 [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680 [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60 [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30 [<ffffffff815f5300>] tcp_push_one+0x30/0x40 [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040 [<ffffffff81614495>] inet_sendmsg+0x125/0x230 [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0 [<ffffffff81579ece>] sys_sendto+0xfe/0x130 [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&sch->busylock)->rlock); lock(l2tpsock); lock(&(&sch->busylock)->rlock); lock(l2tpsock); *** DEADLOCK *** 5 locks held by netperf/4660: #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e581c>] tcp_sendmsg+0x2c/0x1040 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff815da3e0>] ip_queue_xmit+0x0/0x680 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9ac5>] ip_finish_output+0x135/0x890 #3: (rcu_read_lock_bh){.+....}, at: [<ffffffff81595820>] dev_queue_xmit+0x0/0xe00 #4: (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00 stack backtrace: Pid: 4660, comm: netperf Not tainted 3.6.0-rc3+ #788 Call Trace: [<ffffffff8173dbf8>] print_circular_bug+0x1fb/0x20c [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10 [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0 [<ffffffff810a3f44>] ? __lock_acquire+0x2e4/0x1b10 [<ffffffff810a5df0>] lock_acquire+0x90/0x200 [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50 [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth] [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70 [<ffffffff81594e0e>] ? dev_hard_start_xmit+0x5e/0xa70 [<ffffffff81595961>] ? dev_queue_xmit+0x141/0xe00 [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290 [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00 [<ffffffff81595820>] ? dev_hard_start_xmit+0xa70/0xa70 [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890 [<ffffffff815d9ac5>] ? ip_finish_output+0x135/0x890 [<ffffffff815db019>] ip_output+0x59/0xf0 [<ffffffff815da36d>] ip_local_out+0x2d/0xa0 [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680 [<ffffffff815da3e0>] ? ip_local_out+0xa0/0xa0 [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60 [<ffffffff815fa25e>] ? tcp_md5_do_lookup+0x18e/0x1a0 [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30 [<ffffffff815f5300>] tcp_push_one+0x30/0x40 [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040 [<ffffffff81614495>] inet_sendmsg+0x125/0x230 [<ffffffff81614370>] ? inet_create+0x6b0/0x6b0 [<ffffffff8157e6e2>] ? sock_update_classid+0xc2/0x3b0 [<ffffffff8157e750>] ? sock_update_classid+0x130/0x3b0 [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0 [<ffffffff81162579>] ? fget_light+0x3f9/0x4f0 [<ffffffff81579ece>] sys_sendto+0xfe/0x130 [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8174a0b0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810757e3>] ? finish_task_switch+0x83/0xf0 [<ffffffff810757a6>] ? finish_task_switch+0x46/0xf0 [<ffffffff81752cb7>] ? sysret_check+0x1b/0x56 [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: ipv6: using csum_ipv6_magic requires net/ip6_checksum.hStephen Rothwell2012-09-051-0/+1
| | | | | | | | | | Fixes this build error: net/ipv6/netfilter/nf_nat_l3proto_ipv6.c: In function 'nf_nat_ipv6_csum_recalc': net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:144:4: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add unknown state to sysfs NIC duplex exportNikolay Aleksandrov2012-09-051-3/+15
| | | | | | | | | | Currently when the NIC duplex state is DUPLEX_UNKNOWN it is exported as full through sysfs, this patch adds support for DUPLEX_UNKNOWN. It is handled the same way as in ethtool. Signed-off-by: Nikolay Aleksandrov <naleksan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add generic netlink support for tcp_metricsJulian Anastasov2012-09-051-13/+341
| | | | | | | | | | | | | | | | | | | | Add support for genl "tcp_metrics". No locking is changed, only that now we can unlink and delete entries after grace period. We implement get/del for single entry and dump to support show/flush filtering in user space. Del without address attribute causes flush for all addresses, sadly under genl_mutex. v2: - remove rcu_assign_pointer as suggested by Eric Dumazet, it is not needed because there are no other writes under lock - move the flushing code in tcp_metrics_flush_all v3: - remove synchronize_rcu on flush as suggested by Eric Dumazet Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd ↵Masatake YAMATO2012-09-041-5/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | entries lsof reports some of socket descriptors as "can't identify protocol" like: [yamato@localhost]/tmp% sudo lsof | grep dbus | grep iden dbus-daem 652 dbus 6u sock ... 17812 can't identify protocol dbus-daem 652 dbus 34u sock ... 24689 can't identify protocol dbus-daem 652 dbus 42u sock ... 24739 can't identify protocol dbus-daem 652 dbus 48u sock ... 22329 can't identify protocol ... lsof cannot resolve the protocol used in a socket because procfs doesn't provide the map between inode number on sockfs and protocol type of the socket. For improving the situation this patch adds an extended attribute named 'system.sockprotoname' in which the protocol name for /proc/PID/fd/SOCKET is stored. So lsof can know the protocol for a given /proc/PID/fd/SOCKET with getxattr system call. A few weeks ago I submitted a patch for the same purpose. The patch was introduced /proc/net/sockfs which enumerates inodes and protocols of all sockets alive on a system. However, it was rejected because (1) a global lock was needed, and (2) the layout of struct socket was changed with the patch. This patch doesn't use any global lock; and doesn't change the layout of any structs. In this patch, a protocol name is stored to dentry->d_name of sockfs when new socket is associated with a file descriptor. Before this patch dentry->d_name was not used; it was just filled with empty string. lsof may use an extended attribute named 'system.sockprotoname' to retrieve the value of dentry->d_name. It is nice if we can see the protocol name with ls -l /proc/PID/fd. However, "socket:[#INODE]", the name format returned from sockfs_dname() was already defined. To keep the compatibility between kernel and user land, the extended attribute is used to prepare the value of dentry->d_name. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2012-09-0410-171/+317
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
| * openvswitch: Increase maximum number of datapath ports.Pravin B Shelar2012-09-037-51/+113
| | | | | | | | | | | | | | Use hash table to store ports of datapath. Allow 64K ports per switch. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
| * openvswitch: Add support for network namespaces.Pravin B Shelar2012-08-227-123/+207
| | | | | | | | | | | | | | | | | | | | Following patch adds support for network namespace to openvswitch. Since it must release devices when namespaces are destroyed, a side effect of this patch is that the module no longer keeps a refcount but instead cleans up any state when it is unloaded. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
* | net: Add INET dependency on aes crypto for the sake of TCP fastopen.David S. Miller2012-09-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen Rothwell says: ==================== After merging the final tree, today's linux-next build (powerpc ppc44x_defconfig) failed like this: net/built-in.o: In function `tcp_fastopen_ctx_free': tcp_fastopen.c:(.text+0x5cc5c): undefined reference to `crypto_destroy_tfm' net/built-in.o: In function `tcp_fastopen_reset_cipher': (.text+0x5cccc): undefined reference to `crypto_alloc_base' net/built-in.o: In function `tcp_fastopen_reset_cipher': (.text+0x5cd6c): undefined reference to `crypto_destroy_tfm' Presumably caused by commit 104671636897 ("tcp: TCP Fast Open Server - header & support functions") from the net-next tree. I assume that some dependency on the CRYPTO infrastructure is missing. I have reverted commit 1bed966cc3bd ("Merge branch 'tcp_fastopen_server'") for today. ==================== Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: use list_move_tail instead of list_del/list_add_tailWei Yongjun2012-09-041-6/+4
| | | | | | | | | | | | | | | | | | | | Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller2012-09-0366-1680/+3393
|\ \
| * \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso2012-09-03120-1313/+2941
| |\ \ | | | | | | | | | | | | | | | | This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
| * | | netfilter: properly annotate ipv4_netfilter_{init,fini}()Jan Beulich2012-09-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Despite being just a few bytes of code, they should still have proper annotations. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_queue()Michael Wang2012-09-033-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 'list_for_each_continue_rcu' has already been replaced by 'list_for_each_entry_continue_rcu', pass 'list_head' to nf_queue() as a parameter can not benefit us any more. This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of nf_queue() and __nf_queue() to save code. Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_iterate()Michael Wang2012-09-033-18/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 'list_for_each_continue_rcu' has already been replaced by 'list_for_each_entry_continue_rcu', pass 'list_head' to nf_iterate() as a parameter can not benefit us any more. This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of nf_iterate() to save code. Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: remove xt_NOTRACKCong Wang2012-09-033-67/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was scheduled to be removed for a long time. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: netfilter@vger.kernel.org Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nf_conntrack: add nf_ct_timeout_lookupPablo Neira Ayuso2012-09-031-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the new nf_ct_timeout_lookup function to encapsulate the timeout policy attachment that is called in the nf_conntrack_in path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: xt_CT: refactorize xt_ct_tg_checkPablo Neira Ayuso2012-09-031-136/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds xt_ct_set_helper and xt_ct_set_timeout to reduce the size of xt_ct_tg_check. This aims to improve code mantainability by splitting xt_ct_tg_check in smaller chunks. Suggested by Eric Dumazet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: xt_socket: fix compilation warnings with gcc 4.7Pablo Neira Ayuso2012-09-031-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes compilation warnings in xt_socket with gcc-4.7. In file included from net/netfilter/xt_socket.c:22:0: net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’: include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:265:16: note: ‘sport’ was declared here In file included from net/netfilter/xt_socket.c:22:0: include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:265:9: note: ‘dport’ was declared here In file included from net/netfilter/xt_socket.c:22:0: include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:264:27: note: ‘saddr’ was declared here In file included from net/netfilter/xt_socket.c:22:0: include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:264:19: note: ‘daddr’ was declared here In file included from net/netfilter/xt_socket.c:22:0: net/netfilter/xt_socket.c: In function ‘socket_match.isra.4’: include/net/netfilter/nf_tproxy_core.h:75:2: warning: ‘protocol’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:113:5: note: ‘protocol’ was declared here In file included from include/net/tcp.h:37:0, from net/netfilter/xt_socket.c:17: include/net/inet_hashtables.h:356:45: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:112:16: note: ‘sport’ was declared here In file included from net/netfilter/xt_socket.c:22:0: include/net/netfilter/nf_tproxy_core.h:106:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:112:9: note: ‘dport’ was declared here In file included from include/net/tcp.h:37:0, from net/netfilter/xt_socket.c:17: include/net/inet_hashtables.h:356:15: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:111:16: note: ‘saddr’ was declared here In file included from include/net/tcp.h:37:0, from net/netfilter/xt_socket.c:17: include/net/inet_hashtables.h:356:15: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:111:9: note: ‘daddr’ was declared here In file included from net/netfilter/xt_socket.c:22:0: net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’: include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:268:16: note: ‘sport’ was declared here In file included from net/netfilter/xt_socket.c:22:0: include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:268:9: note: ‘dport’ was declared here In file included from net/netfilter/xt_socket.c:22:0: include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:267:27: note: ‘saddr’ was declared here In file included from net/netfilter/xt_socket.c:22:0: include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/xt_socket.c:267:19: note: ‘daddr’ was declared here Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation ↵Patrick McHardy2012-08-303-0/+175
| | | | | | | | | | | | | | | | | | | | | | | | target Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_nat: support IPv6 in TFTP NAT helperPablo Neira Ayuso2012-08-306-8/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_nat: support IPv6 in IRC NAT helperPablo Neira Ayuso2012-08-306-14/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_nat: support IPv6 in SIP NAT helperPatrick McHardy2012-08-306-111/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add IPv6 support to the SIP NAT helper. There are no functional differences to IPv4 NAT, just different formats for addresses. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_nat: support IPv6 in amanda NAT helperPatrick McHardy2012-08-306-8/+7
| | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_nat: support IPv6 in FTP NAT helperPatrick McHardy2012-08-306-20/+27
| | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: ip6tables: add NETMAP targetPatrick McHardy2012-08-303-0/+105
| | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: ip6tables: add REDIRECT targetPatrick McHardy2012-08-303-0/+110
| | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: ip6tables: add MASQUERADE targetPatrick McHardy2012-08-305-2/+151
| | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: ipv6: add IPv6 NAT supportPatrick McHardy2012-08-309-2/+755
| | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | net: core: add function for incremental IPv6 pseudo header checksum updatesPatrick McHardy2012-08-301-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header checksums for IPv6 NAT. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: ipv6: expand skb head in ip6_route_me_harder after oif changePatrick McHardy2012-08-301-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Expand the skb headroom if the oif changed due to rerouting similar to how IPv4 packets are handled. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: add protocol independent NAT corePatrick McHardy2012-08-3036-1039/+1352
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the IPv4 NAT implementation to a protocol independent core and address family specific modules. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_nat: add protoff argument to packet mangling functionsPatrick McHardy2012-08-3015-195/+295
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For mangling IPv6 packets the protocol header offset needs to be known by the NAT packet mangling functions. Add a so far unused protoff argument and convert the conntrack and NAT helpers to use it in preparation of IPv6 NAT. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_conntrack: restrict NAT helper invocation to IPv4Patrick McHardy2012-08-306-22/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The NAT helpers currently only handle IPv4 packets correctly. Restrict invocation of the helpers to IPv4 in preparation of IPv6 NAT. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages ↵Patrick McHardy2012-08-301-57/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | containing fragments ICMPv6 error messages are tracked by extracting the conntrack tuple of the inner packet and looking up the corresponding conntrack entry. Tuple extraction uses the ->get_l4proto() callback, which in case of fragments returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the first fragment when the entire next header is present, resulting in a failure to find the correct connection tracking entry. This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the fragment offset is zero. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | netfilter: nf_conntrack_ipv6: improve fragmentation handlingPatrick McHardy2012-08-304-15/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IPv6 conntrack fragmentation currently has a couple of shortcomings. Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the defragmented packet is then passed to conntrack, the resulting conntrack information is attached to each original fragment and the fragments then continue their way through the stack. Helper invocation occurs in the POSTROUTING hook, at which point only the original fragments are available. The result of this is that fragmented packets are never passed to helpers. This patch improves the situation in the following way: - If a reassembled packet belongs to a connection that has a helper assigned, the reassembled packet is passed through the stack instead of the original fragments. - During defragmentation, the largest received fragment size is stored. On output, the packet is refragmented if required. If the largest received fragment size exceeds the outgoing MTU, a "packet too big" message is generated, thus behaving as if the original fragments were passed through the stack from an outside point of view. - The ipv6_helper() hook function can't receive fragments anymore for connections using a helper, so it is switched to use ipv6_skip_exthdr() instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the reassembled packets are passed to connection tracking helpers. The result of this is that we can properly track fragmented packets, but still generate ICMPv6 Packet too big messages if we would have before. This patch is also required as a precondition for IPv6 NAT, where NAT helpers might enlarge packets up to a point that they require fragmentation. In that case we can't generate Packet too big messages since the proper MTU can't be calculated in all cases (f.i. when changing textual representation of a variable amount of addresses), so the packet is transparently fragmented iff the original packet or fragments would have fit the outgoing MTU. IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | ipvs: IPv6 MTU checking cleanup and bugfixJesper Dangaard Brouer2012-08-301-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using a common helper function __mtu_check_toobig_v6(). The MTU check for tunnel mode can also use this helper as ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to skb->len. And the 'mtu' variable have been adjusted before calling helper. Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6() were missing a check for skb_is_gso(). This bug e.g. caused issues for KVM IPVS setups, where different Segmentation Offloading techniques are utilized, between guests, via the virtio driver. This resulted in very bad performance, due to the ICMPv6 "too big" messages didn't affect the sender. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | ipv4: fix path MTU discovery with connection trackingPatrick McHardy2012-08-262-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and (in case of forwarded packets) refragments them at POST_ROUTING independent of the IP_DF flag. Refragmentation uses the dst_mtu() of the local route without caring about the original fragment sizes, thereby breaking PMTUD. This patch fixes this by keeping track of the largest received fragment with IP_DF set and generates an ICMP fragmentation required error during refragmentation if that size exceeds the MTU. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net>
* | | | tcp: use PRR to reduce cwin in CWR stateYuchung Cheng2012-09-032-81/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use proportional rate reduction (PRR) algorithm to reduce cwnd in CWR state, in addition to Recovery state. Retire the current rate-halving in CWR. When losses are detected via ACKs in CWR state, the sender enters Recovery state but the cwnd reduction continues and does not restart. Rename and refactor cwnd reduction functions since both CWR and Recovery use the same algorithm: tcp_init_cwnd_reduction() is new and initiates reduction state variables. tcp_cwnd_reduction() is previously tcp_update_cwnd_in_recovery(). tcp_ends_cwnd_reduction() is previously tcp_complete_cwr(). The rate halving functions and logic such as tcp_cwnd_down(), tcp_min_cwnd(), and the cwnd moderation inside tcp_enter_cwr() are removed. The unused parameter, flag, in tcp_cwnd_reduction() is also removed. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: move tcp_update_cwnd_in_recoveryYuchung Cheng2012-09-031-32/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To prepare replacing rate halving with PRR algorithm in CWR state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: move tcp_enter_cwr()Yuchung Cheng2012-09-031-23/+23
| |/ / |/| | | | | | | | | | | | | | | | | | | | To prepare replacing rate halving with PRR algorithm in CWR state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | 6lowpan: handle NETDEV_UNREGISTER eventAlan Ott2012-09-011-7/+37
| | | | | | | | | | | | | | | | | | | | | | | | Before, it was impossible to remove a wpan device which had lowpan attached to it. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@tempietto.lan>
* | | 6lowpan: Make a copy of skb's delivered to 6lowpanAlan Ott2012-09-011-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since lowpan_process_data() modifies the skb (by calling skb_pull()), we need our own copy so that it doesn't affect the data received by other protcols (in this case, af_ieee802154). Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@tempietto.lan>
* | | tcp: TCP Fast Open Server - main code pathJerry Chu2012-08-312-27/+309
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the main processing path to complete the TFO server patches. A TFO request (i.e., SYN+data packet with a TFO cookie option) first gets processed in tcp_v4_conn_request(). If it passes the various TFO checks by tcp_fastopen_check(), a child socket will be created right away to be accepted by applications, rather than waiting for the 3WHS to finish. In additon to the use of TFO cookie, a simple max_qlen based scheme is put in place to fend off spoofed TFO attack. When a valid ACK comes back to tcp_rcv_state_process(), it will cause the state of the child socket to switch from either TCP_SYN_RECV to TCP_ESTABLISHED, or TCP_FIN_WAIT1 to TCP_FIN_WAIT2. At this time retransmission will resume for any unack'ed (data, FIN,...) segments. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: TCP Fast Open Server - support TFO listenersJerry Chu2012-08-3111-35/+326
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch builds on top of the previous patch to add the support for TFO listeners. This includes - 1. allocating, properly initializing, and managing the per listener fastopen_queue structure when TFO is enabled 2. changes to the inet_csk_accept code to support TFO. E.g., the request_sock can no longer be freed upon accept(), not until 3WHS finishes 3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg() if it's a TFO socket 4. properly closing a TFO listener, and a TFO socket before 3WHS finishes 5. supporting TCP_FASTOPEN socket option 6. modifying tcp_check_req() to use to check a TFO socket as well as request_sock 7. supporting TCP's TFO cookie option 8. adding a new SYN-ACK retransmit handler to use the timer directly off the TFO socket rather than the listener socket. Note that TFO server side will not retransmit anything other than SYN-ACK until the 3WHS is completed. The patch also contains an important function "reqsk_fastopen_remove()" to manage the somewhat complex relation between a listener, its request_sock, and the corresponding child socket. See the comment above the function for the detail. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: TCP Fast Open Server - header & support functionsJerry Chu2012-08-314-3/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds all the necessary data structure and support functions to implement TFO server side. It also documents a number of flags for the sysctl_tcp_fastopen knob, and adds a few Linux extension MIBs. In addition, it includes the following: 1. a new TCP_FASTOPEN socket option an application must call to supply a max backlog allowed in order to enable TFO on its listener. 2. A number of key data structures: "fastopen_rsk" in tcp_sock - for a big socket to access its request_sock for retransmission and ack processing purpose. It is non-NULL iff 3WHS not completed. "fastopenq" in request_sock_queue - points to a per Fast Open listener data structure "fastopen_queue" to keep track of qlen (# of outstanding Fast Open requests) and max_qlen, among other things. "listener" in tcp_request_sock - to point to the original listener for book-keeping purpose, i.e., to maintain qlen against max_qlen as part of defense against IP spoofing attack. 3. various data structure and functions, many in tcp_fastopen.c, to support server side Fast Open cookie operations, including /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: remove some deadcodeSorin Dumitru2012-08-311-22/+7
| | | | | | | | | | | | | | | | | | | | | | | | __ipv6_regen_rndid no longer returns anything other than 0 so there's no point in verifying what it returns Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: fix documentation of skb_needs_linearize().Rami Rosen2012-08-311-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_needs_linearize() does not check highmem DMA as it does not call illegal_highdma() anymore, so there is no need to mention highmem DMA here. (Indeed, ~NETIF_F_SG flag, which is checked in skb_needs_linearize(), can be set when illegal_highdma() returns true, and we are assured that illegal_highdma() is invoked prior to skb_needs_linearize() as skb_needs_linearize() is a static method called only once. But ~NETIF_F_SG can be set not only there in this same invocation path. It can also be set when can_checksum_protocol() returns false). see commit 02932ce9e2c136e6fab2571c8e0dd69ae8ec9853, Convert skb_need_linearize() to use precomputed features. Signed-off-by: Rami Rosen <rosenr@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Minor logic clean-up in ipv4_mtuAlexander Duyck2012-08-311-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ipv4_mtu there is some logic where we are testing for a non-zero value and a timer expiration, then setting the value to zero, and then testing if the value is zero we set it to a value based on the dst. Instead of bothering with the extra steps it is easier to just cleanup the logic so that we set it to the dst based value if it is zero or if the timer has expired. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>