summaryrefslogtreecommitdiffstats
path: root/drivers/net/geneve.c
Commit message (Collapse)AuthorAgeFilesLines
* geneve: do not use RT_TOS for IPv6 flowlabelMatthias May2022-08-251-2/+1
| | | | | | | | | | | | | | | | | | | | | | commit ca2bb69514a8bc7f83914122f0d596371352416c upstream. According to Guillaume Nault RT_TOS should never be used for IPv6. Quote: RT_TOS() is an old macro used to interprete IPv4 TOS as described in the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4 code, although, given the current state of the code, most of the existing calls have no consequence. But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS" field to be interpreted the RFC 1349 way. There's no historical compatibility to worry about. Fixes: 3a56f86f1be6 ("geneve: handle ipv6 priority like ipv4 tos") Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skbPhillip Potter2021-05-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d13f048dd40e8577260cd43faea8ec9b77520197 ] Modify the header size check in geneve6_xmit_skb and geneve_xmit_skb to use pskb_inet_may_pull rather than pskb_network_may_pull. This fixes two kernel selftest failures introduced by the commit introducing the checks: IPv4 over geneve6: PMTU exceptions IPv4 over geneve6: PMTU exceptions - nexthop objects It does this by correctly accounting for the fact that IPv4 packets may transit over geneve IPv6 tunnels (and vice versa), and still fixes the uninit-value bug fixed by the original commit. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 6628ddfec758 ("net: geneve: check skb is large enough for IPv4/IPv6 header") Suggested-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: geneve: check skb is large enough for IPv4/IPv6 headerPhillip Potter2021-04-281-0/+6
| | | | | | | | | | | | | | | | | [ Upstream commit 6628ddfec7580882f11fdc5c194a8ea781fdadfa ] Check within geneve_xmit_skb/geneve6_xmit_skb that sk_buff structure is large enough to include IPv4 or IPv6 header, and reject if not. The geneve_xmit_skb portion and overall idea was contributed by Eric Dumazet. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=abe95dc3e3e9667fc23b8d81f29ecad95c6f106f Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+2e406a9ac75bb71d4b7a@syzkaller.appspotmail.com Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Revert "geneve: pull IP header before ECN decapsulation"Jakub Kicinski2020-12-111-16/+4
| | | | | | | | | | | | | | | | | | | commit c02bd115b1d25931159f89c7d9bf47a30f5d4b41 upstream. This reverts commit 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation"). Eric says: "network header should have been pulled already before hitting geneve_rx()". Let's revert the syzbot fix since it's causing more harm than good, and revisit. Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation") Link: https://bugzilla.kernel.org/show_bug.cgi?id=210569 Link: https://lore.kernel.org/netdev/CANn89iJVWfb=2i7oU1=D55rOyQnBbbikf+Mc6XHMkY7YX-yGEw@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* geneve: pull IP header before ECN decapsulationEric Dumazet2020-12-081-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4179b00c04d18ea7013f68d578d80f3c9d13150a ] IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume IP header is already pulled. geneve does not ensure this yet. Fixing this generically in IP_ECN_decapsulate() and IP6_ECN_decapsulate() is not possible, since callers pass a pointer that might be freed by pskb_may_pull() syzbot reported : BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline] BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260 CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline] INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline] __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:449 [inline] ip_rcv_finish net/ipv4/ip_input.c:428 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539 __netif_receive_skb_one_core net/core/dev.c:5315 [inline] __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429 process_backlog+0x523/0xc10 net/core/dev.c:6319 napi_poll+0x420/0x1010 net/core/dev.c:6763 net_rx_action+0x35c/0xd40 net/core/dev.c:6833 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77 do_softirq kernel/softirq.c:343 [inline] __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline] __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173 packet_snd net/packet/af_packet.c:2992 [inline] packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] __sys_sendto+0x9dc/0xc80 net/socket.c:1992 __do_sys_sendto net/socket.c:2004 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:2000 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ip_tunnels: Set tunnel option flag when tunnel metadata is presentYi-Hung Wei2020-11-241-2/+1
| | | | | | | | | | | | | | | | | | [ Upstream commit 9c2e14b48119b39446031d29d994044ae958d8fc ] Currently, we may set the tunnel option flag when the size of metadata is zero. For example, we set TUNNEL_GENEVE_OPT in the receive function no matter the geneve option is present or not. As this may result in issues on the tunnel flags consumers, this patch fixes the issue. Related discussion: * https://lore.kernel.org/netdev/1604448694-19351-1-git-send-email-yihung.wei@gmail.com/T/#u Fixes: 256c87c17c53 ("net: check tunnel option type in tunnel flags") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Link: https://lore.kernel.org/r/1605053800-74072-1-git-send-email-yihung.wei@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* geneve: add transport ports in route lookup for geneveMark Gray2020-09-261-10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 34beb21594519ce64a55a498c2fe7d567bc1ca20 ] This patch adds transport ports information for route lookup so that IPsec can select Geneve tunnel traffic to do encryption. This is needed for OVS/OVN IPsec with encrypted Geneve tunnels. This can be tested by configuring a host-host VPN using an IKE daemon and specifying port numbers. For example, for an Openswan-type configuration, the following parameters should be configured on both hosts and IPsec set up as-per normal: $ cat /etc/ipsec.conf conn in ... left=$IP1 right=$IP2 ... leftprotoport=udp/6081 rightprotoport=udp ... conn out ... left=$IP1 right=$IP2 ... leftprotoport=udp rightprotoport=udp/6081 ... The tunnel can then be setup using "ip" on both hosts (but changing the relevant IP addresses): $ ip link add tun type geneve id 1000 remote $IP2 $ ip addr add 192.168.0.1/24 dev tun $ ip link set tun up This can then be tested by pinging from $IP1: $ ping 192.168.0.2 Without this patch the traffic is unencrypted on the wire. Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com> Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* geneve: change from tx_error to tx_dropped on missing metadataJiri Benc2020-06-251-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9d149045b3c0e44c049cdbce8a64e19415290017 ] If the geneve interface is in collect_md (external) mode, it can't send any packets submitted directly to its net interface, as such packets won't have metadata attached. This is expected. However, the kernel itself sends some packets to the interface, most notably, IPv6 DAD, IPv6 multicast listener reports, etc. This is not wrong, as tunnel metadata can be specified in routing table (although technically, that has never worked for IPv6, but hopefully will be fixed eventually) and then the interface must correctly participate in IPv6 housekeeping. The problem is that any such attempt increases the tx_error counter. Just bringing up a geneve interface with IPv6 enabled is enough to see a number of tx_errors. That causes confusion among users, prompting them to find a network error where there is none. Change the counter used to tx_dropped. That better conveys the meaning (there's nothing wrong going on, just some packets are getting dropped) and hopefully will make admins panic less. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca2020-04-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream. ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.19: - Drop change in lwt_bpf.c - Delete now-unused "ret" in mlx5e_route_lookup_ipv6() - Initialise "out_dev" in mlx5e_create_encap_header_ipv6() to avoid introducing a spurious "may be used uninitialised" warning - Adjust filenames, context, indentation] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* geneve: move debug check after netdev unregisterFlorian Westphal2020-04-021-2/+6
| | | | | | | | | | | | | | [ Upstream commit 0fda7600c2e174fe27e9cf02e78e345226e441fa ] The debug check must be done after unregister_netdevice_many() call -- the list_del() for this is done inside .ndo_stop. Fixes: 2843a25348f8 ("geneve: speedup geneve tunnels dismantle") Reported-and-tested-by: <syzbot+68a8ed58e3d17c700de5@syzkaller.appspotmail.com> Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* geneve: correctly handle ipv6.disable module parameterJiri Benc2019-03-101-3/+8
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cf1c9ccba7308e48a68fa77f476287d9d614e4c7 ] When IPv6 is compiled but disabled at runtime, geneve_sock_add returns -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole operation of bringing up the tunnel. Ignore failure of IPv6 socket creation for metadata based tunnels caused by IPv6 not being available. This is the same fix as what commit d074bf960044 ("vxlan: correctly handle ipv6.disable module parameter") is doing for vxlan. Note there's also commit c0a47e44c098 ("geneve: should not call rt6_lookup() when ipv6 was disabled") which fixes a similar issue but for regular tunnels, while this patch is needed for metadata based tunnels. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* geneve: should not call rt6_lookup() when ipv6 was disabledHangbin Liu2019-02-271-3/+7
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c0a47e44c0980b3b23ee31fa7936d70ea5dce491 ] When we add a new GENEVE device with IPv6 remote, checking only for IS_ENABLED(CONFIG_IPV6) is not enough as we may disable IPv6 in the kernel command line (ipv6.disable=1), and calling rt6_lookup() would cause a NULL pointer dereference. v2: - don't mix declarations and code (reported by Stefano Brivio, Eric Dumazet) - there's no need to use in6_dev_get() as we only need to check that idev exists (reported by David Ahern). This is under RTNL, so we can simply use __in6_dev_get() instead (Stefano, Eric). Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: c40e89fd358e9 ("geneve: configure MTU based on a lower device") Cc: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* geneve, vxlan: Don't set exceptions if skb->len < mtuStefano Brivio2018-10-171-4/+3
| | | | | | | | | | | We shouldn't abuse exceptions: if the destination MTU is already higher than what we're transmitting, no exception should be created. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve, vxlan: Don't check skb_dst() twiceStefano Brivio2018-10-171-11/+4
| | | | | | | | | | Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids that we try updating PMTU for a non-existent destination, but didn't clean up cases where the check was already explicit. Drop those redundant checks. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-07-031-1/+1
|\ | | | | | | | | | | | | | | | | | | Simple overlapping changes in stmmac driver. Adjust skb_gro_flush_final_remcsum function signature to make GRO list changes in net-next, as per Stephen Rothwell's example merge resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: fix use-after-free in GRO with ESPSabrina Dubroca2018-07-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: check tunnel option type in tunnel flagsPieter Jansen van Vuuren2018-06-291-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Check the tunnel option type stored in tunnel flags when creating options for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel options on interfaces that are not associated with them. Make sure all users of the infrastructure set correct flags, for the BPF helper we have to set all bits to keep backward compatibility. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Convert GRO SKB handling to list_head.David Miller2018-06-261-5/+6
|/ | | | | | | | | | | | | | | Manage pending per-NAPI GRO packets via list_head. Return an SKB pointer from the GRO receive handlers. When GRO receive handlers return non-NULL, it means that this SKB needs to be completed at this time and removed from the NAPI queue. Several operations are greatly simplified by this transformation, especially timing out the oldest SKB in the list when gro_count exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue in reverse order. Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: configure MTU based on a lower deviceAlexey Kodanev2018-04-201-3/+53
| | | | | | | | | | | | | | | | | | | | | Currently, on a new link creation or when 'remote' address parameter is updated, an MTU is not changed and always equals 1500. When a lower device has a larger MTU, it might not be efficient, e.g. for UDP, and requires the manual MTU adjustments to match the MTU of the lower device. This patch tries to automate this process, finds a lower device using the 'remote' address parameter, then uses its MTU to tune GENEVE's MTU: * on a new link creation * when 'remote' parameter is changed Also with this patch, the MTU from a user, on a new link creation, is passed to geneve_change_mtu() where it is verified, and MTU adjustments with a lower device is skipped in that case. Prior that change, it was possible to set the invalid MTU values on a new link creation. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: check MTU for a minimum in geneve_change_mtu()Alexey Kodanev2018-04-201-3/+2
| | | | | | | | | geneve_change_mtu() will be used not only as ndo_change_mtu() callback, but also to verify a user specified MTU on a new link creation in the next patch. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: cleanup hard coded value for Ethernet header lengthAlexey Kodanev2018-04-201-4/+5
| | | | | | | | | Use ETH_HLEN instead and introduce two new macros: GENEVE_IPV4_HLEN and GENEVE_IPV6_HLEN that include Ethernet header length, corresponded IP header length and GENEVE_BASE_HLEN. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: remove white-space before '#if IS_ENABLED(CONFIG_IPV6)'Alexey Kodanev2018-04-201-1/+1
| | | | | Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-01-291-2/+2
|\ | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: don't call update_pmtu unconditionallyNicolas Dichtel2018-01-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to: "BUG: unable to handle kernel NULL pointer dereference at (null)" Let's add a helper to check if update_pmtu is available before calling it. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") CC: Roman Kapl <code@rkapl.cz> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-01-091-0/+14
|\|
| * geneve: update skb dst pmtu on tx pathXin Long2018-01-021-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") has fixed a performance issue caused by the change of lower dev's mtu for vxlan. The same thing needs to be done for geneve as well. Note that geneve cannot adjust it's mtu according to lower dev's mtu when creating it. The performance is very low later when netperfing over it without fixing the mtu manually. This patch could also avoid this issue. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | geneve: speedup geneve tunnels dismantleHaishuang Yan2017-12-191-8/+16
|/ | | | | | | | Since we now hold RTNL lock in geneve_exit_net, it's better batch them to speedup geneve tunnel dismantle. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6Hangbin Liu2017-11-241-1/+15
| | | | | | | | | | | | Stefano pointed that configure or show UDP_ZERO_CSUM6_RX/TX info doesn't make sense if we haven't enabled CONFIG_IPV6. Fix it by adding if IS_ENABLED(CONFIG_IPV6) check. Fixes: abe492b4f50c ("geneve: UDP checksum configuration via netlink") Fixes: fd7eafd02121 ("geneve: fix fill_info when link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: fix fill_info when link downHangbin Liu2017-11-151-14/+10
| | | | | | | | | | | | | | geneve->sock4/6 were added with geneve_open and released with geneve_stop. So when geneve link down, we will not able to show remote address and checksum info after commit 11387fe4a98 ("geneve: fix fill_info when using collect_metadata"). Fix this by avoid passing *_REMOTE{,6} for COLLECT_METADATA since they are mutually exclusive, and always show UDP_ZERO_CSUM6_RX info. Fixes: 11387fe4a98 ("geneve: fix fill_info when using collect_metadata") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: exit_net cleanup check addedVasily Averin2017-11-141-0/+1
| | | | | | | | Be sure that sock_list initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-10-221-6/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were quite a few overlapping sets of changes here. Daniel's bug fix for off-by-ones in the new BPF branch instructions, along with the added allowances for "data_end > ptr + x" forms collided with the metadata additions. Along with those three changes came veritifer test cases, which in their final form I tried to group together properly. If I had just trimmed GIT's conflict tags as-is, this would have split up the meta tests unnecessarily. In the socketmap code, a set of preemption disabling changes overlapped with the rename of bpf_compute_data_end() to bpf_compute_data_pointers(). Changes were made to the mv88e6060.c driver set addr method which got removed in net-next. The hyperv transport socket layer had a locking change in 'net' which overlapped with a change of socket state macro usage in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * geneve: Fix function matching VNI and tunnel ID on big-endianStefano Brivio2017-10-211-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On big-endian machines, functions converting between tunnel ID and VNI use the three LSBs of tunnel ID storage to map VNI. The comparison function eq_tun_id_and_vni(), on the other hand, attempted to map the VNI from the three MSBs. Fix it by using the same check implemented on LE, which maps VNI from the three LSBs of tunnel ID. Fixes: 2e0b26e10352 ("geneve: Optimize geneve device lookup.") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | geneve: Get rid of is_all_zero(), streamline is_tnl_info_zero()Stefano Brivio2017-10-221-16/+3
|/ | | | | | | | | No need to re-invent memchr_inv() with !is_all_zero(). While at it, replace conditional and return clauses with a single return clause in is_tnl_info_zero(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: use netlink_ext_ack for error reporting in rtnl operationsGirish Moodalbail2017-08-111-36/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | Add extack error messages for failure paths while creating/modifying geneve devices. Once extack support is added to iproute2, more meaningful and helpful error messages will be displayed making it easy for users to discern what went wrong. Before: ======= $ ip link add gen1 address 0:1:2:3:4:5:6 type geneve id 200 \ remote 192.168.13.2 RTNETLINK answers: Invalid argument After: ====== $ ip link add gen1 address 0:1:2:3:4:5:6 type geneve id 200 \ remote 192.168.13.2 Error: Provided link layer address is not Ethernet Also, netdev_dbg() calls used to log errors associated with Netlink request have been removed. Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-08-101-1/+1
|\ | | | | | | | | | | | | | | | | Mainline had UFO fixes, but UFO is removed in net-next so we take the HEAD hunks. Minor context conflict in bcmsysport statistics bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
| * geneve: maximum value of VNI cannot be usedGirish Moodalbail2017-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range of values for it would be from 0 to 16777215 (2^24 -1). However, one cannot create a geneve device with VNI set to 16777215. This patch fixes this issue. Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | geneve/vxlan: offload ports on register/unregister eventsSabrina Dubroca2017-07-241-1/+6
| | | | | | | | | | | | | | | | | | This improves consistency of handling when moving a netdev to another netns. Most drivers currently do a full reset when the device goes up, so that will flush the offload state anyway. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | geneve/vxlan: add support for NETDEV_UDP_TUNNEL_DROP_INFOSabrina Dubroca2017-07-241-6/+13
| | | | | | | | | | Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | geneve: add rtnl changelink supportGirish Moodalbail2017-07-241-42/+176
|/ | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds changelink rtnl operation support for geneve devices and the code changes involve: - added geneve_quiesce() which quiesces the geneve device data path for both TX and RX. This lets us perform the changelink operation atomically w.r.t data path. Also added geneve_unquiesce() to reverse the operation of geneve_quiesce(). - refactor geneve_newlink into geneve_nl2info to be used by both geneve_newlink and geneve_changelink - geneve_nl2info takes a changelink boolean argument to isolate changelink checks. - Allow changing only a few attributes (ttl, tos, and remote tunnel endpoint IP address (within the same address family)): - return -EOPNOTSUPP for attributes that cannot be changed for now. Incremental patches can make the non-supported one available in the future if needed. Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: fix hlist corruptionJiri Benc2017-07-031-16/+32
| | | | | | | | | | It's not a good idea to add the same hlist_node to two different hash lists. This leads to various hard to debug memory corruptions. Fixes: 8ed66f0e8235 ("geneve: implement support for IPv6-based tunnels") Cc: John W. Linville <linville@tuxdriver.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add netlink_ext_ack argument to rtnl_link_ops.validateMatthias Schiffer2017-06-261-1/+2
| | | | | | | | Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer2017-06-261-1/+2
| | | | | | | | Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* networking: make skb_push & __skb_push return void pointersJohannes Berg2017-06-161-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) @@ expression SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - fn(SKB, LEN)[0] + *(u8 *)fn(SKB, LEN) Note that the last part there converts from push(...)[0] to the more idiomatic *(u8 *)push(...). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-06-151-1/+1
|\ | | | | | | | | | | | | The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Fix inconsistent teardown and release of private netdev state.David S. Miller2017-06-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
* | geneve: add missing rx stats accountingGirish Moodalbail2017-06-091-12/+24
|/ | | | | | | | There are few places on the receive path where packet drops and packet errors were not accounted for. This patch fixes that issue. Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: fix needed_headroom and max_mtu for collect_metadataEric Garver2017-06-041-1/+1
| | | | | | | | | | | | | | | | Since commit 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") when using COLLECT_METADATA geneve devices are created with too small of a needed_headroom and too large of a max_mtu. This is because ip_tunnel_info_af() is not valid with the device level info when using COLLECT_METADATA and we mistakenly fall into the IPv4 case. For COLLECT_METADATA, always use the worst case of ipv6 since both sockets are created. Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: fix fill_info when using collect_metadataEric Garver2017-05-251-3/+5
| | | | | | | | | | | | | | Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is because it uses ip_tunnel_info_af() with the device level info, which is not valid for COLLECT_METADATA. Fix by checking for the presence of the actual sockets. Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: fix incorrect setting of UDP checksum flagGirish Moodalbail2017-04-301-1/+1
| | | | | | | | | | | | | | | | | | | Creating a geneve link with 'udpcsum' set results in a creation of link for which UDP checksum will NOT be computed on outbound packets, as can be seen below. 11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0 geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum Similarly, creating a link with 'noudpcsum' set results in a creation of link for which UDP checksum will be computed on outbound packets. Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: lock RCU on TX pathJakub Kicinski2017-03-011-0/+2
| | | | | | | | | There is no guarantees that callers of the TX path will hold the RCU lock. Grab it explicitly. Fixes: fceb9c3e3825 ("geneve: avoid using stale geneve socket.") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>