| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit b5141915b5aec3b29a63db869229e3741ebce258 ]
The commit cited below removed the RCU read-side critical section from
rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked
without RCU protection.
This results in the following warning [1] in the VXLAN driver, which
relied on the callback being invoked from an RCU read-side critical
section.
Fix this by calling rcu_read_lock() in the VXLAN driver, as already done
in the bridge driver.
[1]
WARNING: suspicious RCU usage
5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
-----------------------------
drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by bridge/166:
#0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090
stack backtrace:
CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x100/0x184
lockdep_rcu_suspicious+0x153/0x15d
vxlan_fdb_dump+0x51e/0x6d0
rtnl_fdb_dump+0x4dc/0xad0
netlink_dump+0x540/0x1090
__netlink_dump_start+0x695/0x950
rtnetlink_rcv_msg+0x802/0xbd0
netlink_rcv_skb+0x17a/0x480
rtnetlink_rcv+0x22/0x30
netlink_unicast+0x5ae/0x890
netlink_sendmsg+0x98a/0xf40
__sys_sendto+0x279/0x3b0
__x64_sys_sendto+0xe6/0x1a0
do_syscall_64+0x54/0xa0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe14fa2ade0
Code: Bad RIP value.
RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit a0dced17ad9dc08b1b25e0065b54c97a318e6e8b ]
This reverts commit 71130f29979c7c7956b040673e6b9d5643003176.
In commit 71130f29979c ("vxlan: fix tos value before xmit") we want to
make sure the tos value are filtered by RT_TOS() based on RFC1349.
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| PRECEDENCE | TOS | MBZ |
+-----+-----+-----+-----+-----+-----+-----+-----+
But RFC1349 has been obsoleted by RFC2474. The new DSCP field defined like
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| DS FIELD, DSCP | ECN FIELD |
+-----+-----+-----+-----+-----+-----+-----+-----+
So with
IPTOS_TOS_MASK 0x1E
RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)
the first 3 bits DSCP info will get lost.
To take all the DSCP info in xmit, we should revert the patch and just push
all tos bits to ip_tunnel_ecn_encap(), which will handling ECN field later.
Fixes: 71130f29979c ("vxlan: fix tos value before xmit")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 8066e6b449e050675df48e7c4b16c29f00507ff0 ]
When proxy mode is enabled the vxlan device might reply to Neighbor
Solicitation (NS) messages on behalf of remote hosts.
In case the NS message includes the "Source link-layer address" option
[1], the vxlan device will use the specified address as the link-layer
destination address in its reply.
To avoid an infinite loop, break out of the options parsing loop when
encountering an option with length zero and disregard the NS message.
This is consistent with the IPv6 ndisc code and RFC 4886 which states
that "Nodes MUST silently discard an ND packet that contains an option
with length zero" [2].
[1] https://tools.ietf.org/html/rfc4861#section-4.3
[2] https://tools.ietf.org/html/rfc4861#section-4.6
Fixes: 4b29dba9c085 ("vxlan: fix nonfunctional neigh_reduce()")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream.
ipv6_stub uses the ip6_dst_lookup function to allow other modules to
perform IPv6 lookups. However, this function skips the XFRM layer
entirely.
All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
which calls xfrm_lookup_route(). This patch fixes this inconsistent
behavior by switching the stub to ip6_dst_lookup_flow, which also calls
xfrm_lookup_route().
This requires some changes in all the callers, as these two functions
take different arguments and have different return types.
Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.4:
- Drop changes in lwt_bpf.c, mlx5, and rxe
- Adjust filename, context, indentation]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit cf5da330bbdd0c06b05c525a3d1d58ccd82c87a6 upstream.
The function vxlan_dev_create() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.
Note that the function vxlan_dev_create() is moved after the rtnl stuff so
that vxlan_dellink() can be called in this function.
Fixes: dcc38c033b32 ("openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN")
CC: Thomas Graf <tgraf@suug.ch>
CC: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit bbec7802c6948c8626b71a4fe31283cb4691c358 upstream.
vxlan driver has bypass for local vxlan traffic, but that
depends on information about all VNIs on local system in
vxlan driver. This is not available in case of LWT.
Therefore following patch disable encap bypass for LWT
vxlan traffic.
Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
Reported-by: Jakub Libosvar <jlibosva@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 384d91c267e621e0926062cfb3f20cb72dc16928 ]
gro_cells_init() returns error if memory allocation is failed.
But the vxlan module doesn't check the return value of gro_cells_init().
Fixes: 58ce31cca1ff ("vxlan: GRO support at tunnel layer")`
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 71130f29979c7c7956b040673e6b9d5643003176 ]
Before ip_tunnel_ecn_encap() and udp_tunnel_xmit_skb() we should filter
tos value by RT_TOS() instead of using config tos directly.
vxlan_get_route() would filter the tos to fl4.flowi4_tos but we didn't
return it back, as geneve_get_v4_rt() did. So we have to use RT_TOS()
directly in function ip_tunnel_ecn_encap().
Fixes: 206aaafcd279 ("VXLAN: Use IP Tunnels tunnel ENC encap API")
Fixes: 1400615d64cf ("vxlan: allow setting ipv6 traffic class")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit eadf52cf1852196a1363044dcda22fa5d7f296f7 ]
This patch is to improve the tun_info options_len by dropping
the skb when TUNNEL_VXLAN_OPT is set but options_len is less
than vxlan_metadata. This can void a potential out-of-bounds
access on ip_tun_info.
Fixes: ee122c79d422 ("vxlan: Flow based tunneling")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit cc4807bb609230d8959fd732b0bf3bd4c2de8eac ]
Commit ad6c9986bcb62 ("vxlan: Fix GRO cells race condition between
receive and link delete") fixed a race condition for the typical case a vxlan
device is dismantled from the current netns. But if a netns is dismantled,
vxlan_destroy_tunnels() is called to schedule a unregister_netdevice_queue()
of all the vxlan tunnels that are related to this netns.
In vxlan_destroy_tunnels(), gro_cells_destroy() is called and finished before
unregister_netdevice_queue(). This means that the gro_cells_destroy() call is
done too soon, for the same reasons explained in above commit.
So we need to fully respect the RCU rules, and thus must remove the
gro_cells_destroy() call or risk use after-free.
Fixes: 58ce31cca1ff ("vxlan: GRO support at tunnel layer")
Signed-off-by: Suanming.Mou <mousuanming@huawei.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ad6c9986bcb627c7c22b8f9e9a934becc27df87c ]
If we receive a packet while deleting a VXLAN device, there's a chance
vxlan_rcv() is called at the same time as vxlan_dellink(). This is fine,
except that vxlan_dellink() should never ever touch stuff that's still in
use, such as the GRO cells list.
Otherwise, vxlan_rcv() crashes while queueing packets via
gro_cells_receive().
Move the gro_cells_destroy() to vxlan_uninit(), which runs after the RCU
grace period is elapsed and nothing needs the gro_cells anymore.
This is now done in the same way as commit 8e816df87997 ("geneve: Use GRO
cells infrastructure.") originally implemented for GENEVE.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 58ce31cca1ff ("vxlan: GRO support at tunnel layer")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 59cbf56fcd98ba2a715b6e97c4e43f773f956393 ]
Same reasons than the ones explained in commit 4179cb5a4c92
("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
netif_rx() or gro_cells_receive() must be called under a strict contract.
At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.
A similar protocol is used for gro_cells infrastructure, as
gro_cells_destroy() will be called only after a full rcu
grace period is observed after IFF_UP has been cleared.
Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP
Virtual drivers do not have this guarantee, and must
therefore make the check themselves.
Otherwise we risk use-after-free and/or crashes.
Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4179cb5a4c924cd233eaadd081882425bc98f44e ]
netif_rx() must be called under a strict contract.
At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.
Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP
Virtual drivers do not have this guarantee, and must
therefore make the check themselves.
Otherwise we risk use-after-free and/or crashes.
Note this patch also fixes a small issue that came
with commit ce6502a8f957 ("vxlan: fix a use after free
in vxlan_encap_bypass"), since the dev->stats.rx_dropped
change was done on the wrong device.
Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Fixes: ce6502a8f957 ("vxlan: fix a use after free in vxlan_encap_bypass")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Machata <petrm@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e0090a9e979de5202c7d16c635dea2f005221073 ]
This patch fixes vxlan_snoop to not move permanent fdb entries
on learn events. This is consistent with the bridge fdb
handling of permanent entries.
Fixes: 26a41ae60438 ("vxlan: only migrate dynamic FDB entries")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d6acfeb17d030bb3907e77c048b0e7783ad8e5a9 ]
vxlan dev currently ignores lowerdev's gso_max_size, which adversely
affects TSO performance of liquidio if it's the lowerdev. Egress TCP
packets' skb->len often exceed liquidio's advertised gso_max_size. This
may happen on other NIC drivers.
Fix it by assigning lowerdev's gso_max_size to that of vxlan dev. Might as
well do likewise for gso_max_segs.
Single flow TSO throughput of liquidio as lowerdev (using iperf3):
Before the patch: 139 Mbps
After the patch : 8.68 Gbps
Percent increase: 6,144 %
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit efb5f68f32995c146944a9d4257c3cf8eae2c4a1 ]
Mac aging is applicable only for dynamically learnt remote mac
entries. Check for user configured static remote mac entries
and skip aging.
Signed-off-by: Balakrishnan Raman <ramanb@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit a53cb29b0af346af44e4abf13d7e59f807fba690 ]
Adding a vxlan interface to a socket isn't symmetrical, while adding
is done in vxlan_open() the deletion is done in vxlan_dellink().
This can cause a use-after-free error when we close the vxlan
interface before deleting it.
We add vxlan_vs_del_dev() to match vxlan_vs_add_dev() and call
it from vxlan_stop() to match the call from vxlan_open().
Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Acked-by: Jiri Benc <jbenc@redhat.com>
Tested-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4e37d6911f36545b286d15073f6f2222f840e81c ]
The incorrect check caused an off-by-one error: the maximum VID 0xffffff
was unusable.
Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit fcd91dd449867c6bfe56a81cabba76b829fd05cd ]
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.
This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally. This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.
Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 7e059158d57b79159eaf1f504825d19866ef2c42 ]
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.
Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 72564b59ffc438ea103b0727a921aaddce766728 ]
Allow the MTU of vxlan devices without an underlying device to be set
to larger values (up to a maximum based on IP packet limits and vxlan
overhead).
Previously, their MTUs could not be set to higher than the
conventional ethernet value of 1500. This is a very arbitrary value
in the context of vxlan, and prevented vxlan devices from being able
to take advantage of jumbo frames etc.
The default MTU remains 1500, for compatibility.
Signed-off-by: David Wragg <david@weave.works>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ce577668a426c6a9e2470a09dcd07fbd6e45272a ]
When create a new vxlan link, example:
ip link add vtap mtu 1440 type vxlan vni 1 dev eth0
The argument "mtu" has no effect, because it is not set to conf->mtu. The
default value is used in vxlan_dev_configure function.
This problem was introduced by commit 0dfbdf4102b9 (vxlan: Factor out device
configuration).
Fixes: 0dfbdf4102b9 (vxlan: Factor out device configuration)
Signed-off-by: Chen Haiquan <oc@yunify.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e5aed006be918af163eb397e45aa5ea6cefd5e01 ]
In case we find a socket with encapsulation enabled we should call
the encap_recv function even if just a udp header without payload is
available. The callbacks are responsible for correctly verifying and
dropping the packets.
Also, in case the header validation fails for geneve and vxlan we
shouldn't put the skb back into the socket queue, no one will pick
them up there. Instead we can simply discard them in the respective
encap_recv functions.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4024fcf70556311521e7b6cf79fa50e16f31013a ]
When signalling to metadata consumers that the metadata_dst entry
carries additional GBP extension data for vxlan (TUNNEL_VXLAN_OPT),
the dst's vxlan_metadata information is populated, but options_len
is left to zero. F.e. in ovs, ovs_flow_key_extract() checks for
options_len before extracting the data through ip_tunnel_info_opts_get().
Geneve uses ip_tunnel_info_opts_set() helper in receive path, which
sets options_len internally, vxlan however uses ip_tunnel_info_opts(),
so when filling vxlan_metadata, we do need to update options_len.
Fixes: 4c22279848c5 ("ip-tunnel: Use API to access tunnel metadata options.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 35e2d1152b22eae99c961affbe85374bef05a775 ]
When configuring checksums on UDP tunnels, the flags are different
for IPv4 vs. IPv6 (and reversed). However, when lightweight tunnels
are enabled the flags used are always the IPv4 versions, which are
ignored in the IPv6 code paths. This uses the correct IPv6 flags, so
checksums can be controlled appropriately.
Fixes: a725e514 ("vxlan: metadata based tunneling for IPv6")
Fixes: abe492b4 ("geneve: UDP checksum configuration via netlink")
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 07b9b37c227cb8d88d478b4a9c5634fee514ede1 ]
When a vxlan interface is created, the driver checks that there is not
another vxlan interface with the same properties. To do this, it checks
the existing vxlan udp socket. Since commit 1c51a9159dde, the creation of
the vxlan socket is done only when the interface is set up, thus it breaks
that test.
Example:
$ ip l a vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
$ ip l a vxlan11 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
$ ip -br l | grep vxlan
vxlan10 DOWN f2:55:1c:6a:fb:00 <BROADCAST,MULTICAST>
vxlan11 DOWN 7a:cb:b9:38:59:0d <BROADCAST,MULTICAST>
Instead of checking sockets, let's loop over the vxlan iface list.
Fixes: 1c51a9159dde ("vxlan: fix race caused by dropping rtnl_unlock")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When looking for outer IP header, use the actual socket address family, not
the address family of the default destination which is not set for metadata
based interfaces (and doesn't have to match the address family of the
received packet even if it was set).
Fix also the misleading comment.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
Fill the metadata correctly even when tunneling over IPv6. Also, check that
the provided metadata is of an address family that is supported by the
tunnel.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
Will be used also for ndo_fill_metadata_dst.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h
The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.
The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices. Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
drivers/net/usb/asix_common.c
net/ipv4/inet_connection_sock.c
net/switchdev/switchdev.c
In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.
The other two conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Before lightweight tunnels existed, it really didn't make sense to
create a tunnel that was not fully specified, such as without a
destination IP address - the resulting packets would go nowhere.
However, with lightweight tunnels, the opposite is true - it doesn't
make sense to require this information when it will be provided later
on by the route. This loosens the requirements for this information.
An alternative would be to allow the relaxed version only when
COLLECT_METADATA is enabled. However, since there are several
variations on this theme (such as NBMA tunnels in GRE), just dropping
the restrictions seems the most consistent across tunnels and with
the existing configuration.
CC: John Linville <linville@tuxdriver.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For metadata based vxlan interface, open both IPv4 and IPv6 socket. This is
much more user friendly: it's not necessary to create two vxlan interfaces
and pay attention to using the right one in routing rules.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
| |
Make vxlan_sock_add both alloc the socket and attach it to vxlan_dev. Let
vxlan_sock_release accept vxlan_dev as its parameter instead of vxlan_sock.
This makes vxlan_sock_add and vxlan_sock release complementary. It reduces
code duplication in the next patch.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
When IPv6 address is set without IPv6 configured, the vxlan socket is mostly
treated as an IPv4 one but various lookus in fdb etc. still take the
AF_INET6 into account. This creates incosistencies with weird consequences.
Just reject IPv6 addresses in such case.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vxlan_setup is called when allocating the net_device, i.e. way before
vxlan_newlink (or vxlan_dev_configure) is called. This means
vxlan->default_dst is actually unset in vxlan_setup and the condition that
sets needed_headroom always takes the else branch.
Set the needed_headrom at the point when we have the information about
the address family available.
Fixes: e4c7ed415387c ("vxlan: add ipv6 support")
Fixes: 2853af6a2ea1a ("vxlan: use dev->needed_headroom instead of dev->hard_header_len")
CC: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
drivers/net/vxlan.c: In function ‘vxlan_udp_encap_recv’:
drivers/net/vxlan.c:1226: warning: ‘info’ may be used uninitialized in this function
While this warning is a false positive, it can be killed easily by
getting rid of the pointer intermediary and referring directly to the
ip_tunnel_info structure.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently tun-info options pointer is used in few cases to
pass options around. But tunnel options can be accessed using
ip_tunnel_info_opts() API without using the pointer. Following
patch removes the redundant pointer and consistently make use
of API.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default (subject to the sysctl settings), IPv6 sockets listen also for
IPv4 traffic. Vxlan is not prepared for that and expects IPv6 header in
packets received through an IPv6 socket.
In addition, it's currently not possible to have both IPv4 and IPv6 vxlan
tunnel on the same port (unless bindv6only sysctl is enabled), as it's not
possible to create and bind both IPv4 and IPv6 vxlan interfaces and there's
no way to specify both IPv4 and IPv6 remote/group IP addresses.
Set IPV6_V6ONLY on vxlan sockets to fix both of these issues. This is not
done globally in udp_tunnel, as l2tp and tipc seems to work okay when
receiving IPv4 packets on IPv6 socket and people may rely on this behavior.
The other tunnels (geneve and fou) do not support IPv6.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's currently nothing preventing directing packets with IPv6
encapsulation data to IPv4 tunnels (and vice versa). If this happens,
IPv6 addresses are incorrectly interpreted as IPv4 ones.
Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
in ip_tunnel_info. Reject packets at appropriate places if they are supposed
to be encapsulated into an incompatible protocol.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mode field holds a single bit of information only (whether the
ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags.
This allows more mode flags to be added.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Before 56ef9c909b40[1] it used to ignore all errors from igmp_join().
That commit enhanced that and made it error out whatever error happened
with igmp_join(), but that's not good because when using multicast
groups vxlan will try to join it multiple times if the socket is reused
and then the 2nd and further attempts will fail with EADDRINUSE.
As we don't track to which groups the socket is already subscribed, it's
okay to just ignore that error.
Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Reported-by: John Nielsen <lists@jnielsen.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Introduce function udp_tun_rx_dst() to initialize tunnel dst on
receive path.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.
Testing:
Ran 200 netperf TCP_STREAM instance
- With fix (GRO enabled on VXLAN interface)
Verify GRO is happening.
9084 MBps tput
3.44% CPU utilization
- Without fix (GRO disabled on VXLAN interface)
Verified no GRO is happening.
9084 MBps tput
5.54% CPU utilization
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The remote checksum offload GRO did not consider the case that frag0
might be in use. This patch fixes that by accessing headers using the
skb_gro functions and not saving offsets relative to skb->head.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Support metadata based (formerly flow based) tunneling also for IPv6.
This complements commit ee122c79d422 ("vxlan: Flow based tunneling").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The 'flags' variable is already defined in the outer scope.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| | |
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|