summaryrefslogtreecommitdiffstats
path: root/net/8021q/vlan.c
Commit message (Collapse)AuthorAgeFilesLines
* net: core: dev: Add extack argument to dev_change_flags()Petr Machata2018-12-061-1/+3
| | | | | | | | | | | | | | | | | | | | In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_change_flags(). Therefore extend dev_change_flags() with and extra extack argument and update all users. Most of the calls end up just encoding NULL, but several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available. Since the function declaration line is changed anyway, name the other function arguments to placate checkpatch. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: 8021q: move vlan offload registrations into vlan_coreJiri Pirko2018-11-161-96/+0
| | | | | | | | | | | | | | | | Currently, the vlan packet offloads are registered only upon 8021q module load. However, even without this module loaded, the offloads could be utilized, for example by openvswitch datapath. As reported by Michael, that causes 2x to 5x performance improvement, depending on a testcase. So move the vlan offload registrations into vlan_core and make this available even without 8021q module loaded. Reported-by: Michael Shteinbok <michaelsh86@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Michael Shteinbok <michaelsh86@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: vlan: add support for tunnel offloadDavide Caratti2018-11-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | GSO tunneled packets are always segmented in software before they are transmitted by a VLAN, even when the lower device can offload tunnel encapsulation and VLAN together (i.e., some bits in NETIF_F_GSO_ENCAP_ALL mask are set in the lower device 'vlan_features'). If we let VLANs have the same tunnel offload capabilities as their lower device, throughput can improve significantly when CPU is limited on the transmitter side. - set NETIF_F_GSO_ENCAP_ALL bits in the VLAN 'hw_features', to ensure that 'features' will have those bits zeroed only when the lower device has no hardware support for tunnel encapsulation. - for the same reason, copy GSO-related bits of 'hw_enc_features' from lower device to VLAN, and ensure to update that value when the lower device changes its features. - set NETIF_F_HW_CSUM bit in the VLAN 'hw_enc_features' if 'real_dev' is able to compute checksums at least for a kind of packets, like done with commit 8403debeead8 ("vlan: Keep NETIF_F_HW_CSUM similar to other software devices"). This avoids software segmentation due to mismatching checksum capabilities between VLAN's 'features' and 'hw_enc_features'. Reported-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-07-031-1/+1
|\ | | | | | | | | | | | | | | | | | | Simple overlapping changes in stmmac driver. Adjust skb_gro_flush_final_remcsum function signature to make GRO list changes in net-next, as per Stephen Rothwell's example merge resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: fix use-after-free in GRO with ESPSabrina Dubroca2018-07-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Convert GRO SKB handling to list_head.David Miller2018-06-261-6/+7
|/ | | | | | | | | | | | | | | Manage pending per-NAPI GRO packets via list_head. Return an SKB pointer from the GRO receive handlers. When GRO receive handlers return non-NULL, it means that this SKB needs to be completed at this time and removed from the NAPI queue. Several operations are greatly simplified by this transformation, especially timing out the oldest SKB in the list when gro_count exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue in reverse order. Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: Add extack messages for link createDavid Ahern2018-05-171-3/+8
| | | | | | | | Add informative messages for error paths related to adding a VLAN to a device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Call add/kill vid ndo on vlan filter feature togglingGal Pressman2018-03-301-0/+21
| | | | | | | | | | | | | | | | | | | NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit flip in dev->features in order to keep the driver in a consistent state. These features notify the driver of each added/removed vlan, but toggling of vlan-filter does not notify the driver accordingly for each of the existing vlans. This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT behavior (which notifies the driver about UDP ports in the same manner that vids are reported). Each toggling of the features propagates to the 8021q module, which iterates over the vlans and call add/kill ndo accordingly. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Drop pernet_operations::asyncKirill Tkhai2018-03-271-1/+0
| | | | | | | | Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Convert /proc creating and destroying pernet_operationsKirill Tkhai2018-02-271-0/+1
| | | | | | | | | | | | | | | | These pernet_operations just create and destroy /proc entries, and they can safely marked as async: pppoe_net_ops vlan_net_ops canbcm_pernet_ops kcm_net_ops pfkey_net_ops pppol2tp_net_ops phonet_net_ops Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* 8021q: fix a memory leak for VLAN 0 deviceCong Wang2018-01-101-6/+1
| | | | | | | | | | | | | | | | | A vlan device with vid 0 is allow to creat by not able to be fully cleaned up by unregister_vlan_dev() which checks for vlan_id!=0. Also, VLAN 0 is probably not a valid number and it is kinda "reserved" for HW accelerating devices, but it is probably too late to reject it from creation even if makes sense. Instead, just remove the check in unregister_vlan_dev(). Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-11-121-3/+3
|\
| * vlan: fix a use-after-free in vlan_device_event()Cong Wang2017-11-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After refcnt reaches zero, vlan_vid_del() could free dev->vlan_info via RCU: RCU_INIT_POINTER(dev->vlan_info, NULL); call_rcu(&vlan_info->rcu, vlan_info_rcu_free); However, the pointer 'grp' still points to that memory since it is set before vlan_vid_del(): vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) goto out; grp = &vlan_info->grp; Depends on when that RCU callback is scheduled, we could trigger a use-after-free in vlan_group_for_each_dev() right following this vlan_vid_del(). Fix it by moving vlan_vid_del() before setting grp. This is also symmetric to the vlan_vid_add() we call in vlan_device_event(). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct") Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | add support of IFF_XMIT_DST_RELEASE bit in vlanVadim Fedorenko2017-11-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some time ago Eric Dumazet suggested a "hack the IFF_XMIT_DST_RELEASE flag on the vlan netdev". But the last comment was "does not support properly bonding/team.(If the real_dev->privflags IFF_XMIT_DST_RELEASE bit changes, we want to update all the vlans at the same time )" I've extended that patch to support changes of IFF_XMIT_DST_RELEASE in bonding/team. Both bonding and team call netdev_change_features() after recalculation of features including priv_flags IFF_XMIT_DST_RELEASE bit. So the only thing needed to support is to recheck this bit in vlan_transfer_features(). Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Add extack to upper device linkingDavid Ahern2017-10-041-3/+3
|/ | | | | | | Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: 8021q: Fix one possible panic caused by BUG_ON in free_netdevGao Feng2017-06-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The register_vlan_device would invoke free_netdev directly, when register_vlan_dev failed. It would trigger the BUG_ON in free_netdev if the dev was already registered. In this case, the netdev would be freed in netdev_run_todo later. So add one condition check now. Only when dev is not registered, then free it directly. The following is the part coredump when netdev_upper_dev_link failed in register_vlan_dev. I removed the lines which are too long. [ 411.237457] ------------[ cut here ]------------ [ 411.237458] kernel BUG at net/core/dev.c:7998! [ 411.237484] invalid opcode: 0000 [#1] SMP [ 411.237705] [last unloaded: 8021q] [ 411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G E 4.12.0-rc5+ #6 [ 411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000 [ 411.237782] RIP: 0010:free_netdev+0x116/0x120 [ 411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297 [ 411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878 [ 411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000 [ 411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801 [ 411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000 [ 411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000 [ 411.239518] FS: 00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000 [ 411.239949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0 [ 411.240936] Call Trace: [ 411.241462] vlan_ioctl_handler+0x3f1/0x400 [8021q] [ 411.241910] sock_ioctl+0x18b/0x2c0 [ 411.242394] do_vfs_ioctl+0xa1/0x5d0 [ 411.242853] ? sock_alloc_file+0xa6/0x130 [ 411.243465] SyS_ioctl+0x79/0x90 [ 411.243900] entry_SYSCALL_64_fastpath+0x1e/0xa9 [ 411.244425] RIP: 0033:0x7fb69089a357 [ 411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357 [ 411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003 [ 411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999 [ 411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004 [ 411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001 [ 411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0 Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2016-12-241-1/+1
| | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* netns: make struct pernet_operations::id unsigned intAlexey Dobriyan2016-11-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-10-301-1/+1
|\ | | | | | | | | | | | | | | | | Mostly simple overlapping changes. For example, David Ahern's adjacency list revamp in 'net-next' conflicted with an adjacency list traversal bug fix in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: add recursion limit to GROSabrina Dubroca2016-10-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, GRO can do unlimited recursion through the gro_receive handlers. This was fixed for tunneling protocols by limiting tunnel GRO to one level with encap_mark, but both VLAN and TEB still have this problem. Thus, the kernel is vulnerable to a stack overflow, if we receive a packet composed entirely of VLAN headers. This patch adds a recursion counter to the GRO layer to prevent stack overflow. When a gro_receive function hits the recursion limit, GRO is aborted for this skb and it is processed normally. This recursion counter is put in the GRO CB, but could be turned into a percpu counter if we run out of space in the CB. Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report. Fixes: CVE-2016-7039 Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.") Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: vlan: Use sizeof instead of literal numberGao Feng2016-10-181-2/+2
| | | | | | | | | | | | | | Use sizeof variable instead of literal number to enhance the readability. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vlan: Remove unnecessary comparison of unsigned against 0Tobias Klauser2016-10-181-2/+1
|/ | | | | | | | | | | | args.u.name_type is of type unsigned int and is always >= 0. This fixes the following GCC warning: net/8021q/vlan.c: In function ‘vlan_ioctl_handler’: net/8021q/vlan.c:574:14: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove type_check from dev_get_nest_level()Sabrina Dubroca2016-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea for type_check in dev_get_nest_level() was to count the number of nested devices of the same type (currently, only macvlan or vlan devices). This prevented the false positive lockdep warning on configurations such as: eth0 <--- macvlan0 <--- vlan0 <--- macvlan1 However, this doesn't prevent a warning on a configuration such as: eth0 <--- macvlan0 <--- vlan0 eth1 <--- vlan1 <--- macvlan1 In this case, all the locks end up with a nesting subclass of 1, so lockdep thinks that there is still a deadlock: - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then take (vlan_netdev_xmit_lock_key, 1) - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then take (macvlan_netdev_addr_lock_key, 1) By removing the linktype check in dev_get_nest_level() and always incrementing the nesting depth, lockdep considers this configuration valid. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: Propagate MAC address to VLANsMike Manning2016-05-311-0/+5
| | | | | | | | | | | | | | | The MAC address of the physical interface is only copied to the VLAN when it is first created, resulting in an inconsistency after MAC address changes of only newly created VLANs having an up-to-date MAC. The VLANs should continue inheriting the MAC address of the physical interface until the VLAN MAC address is explicitly set to any value. This allows IPv6 EUI64 addresses for the VLAN to reflect any changes to the MAC of the physical interface and thus for DAD to behave as expected. Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: propagate gso_max_segsEric Dumazet2016-03-171-0/+1
| | | | | | | | vlan drivers lack proper propagation of gso_max_segs from lower device. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: turn on unicast filtering on vlan deviceZhang Shengju2016-02-211-1/+0
| | | | | | | | | | | | | Currently vlan device inherits unicast filtering flag from underlying device. If underlying device doesn't support unicast filter, this will put vlan device into promiscuous mode when it's stacked. Tun on IFF_UNICAST_FLT on the vlan device in any case so that it does not go into promiscuous mode needlessly. If underlying device does not support unicast filtering, that device will enter promiscuous mode. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: Add GRO support for non hardware accelerated vlanToshiaki Makita2015-06-011-0/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently packets with non-hardware-accelerated vlan cannot be handled by GRO. This causes low performance for 802.1ad and stacked vlan, as their vlan tags are currently not stripped by hardware. This patch adds GRO support for non-hardware-accelerated vlan and improves receive performance of them. Test Environment: vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599) Result: - Before $ netperf -t TCP_STREAM -H 192.168.20.2 -l 60 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 60.00 5233.17 Rx side CPU usage: %usr %sys %irq %soft %idle 0.27 58.03 0.00 41.70 0.00 - After $ netperf -t TCP_STREAM -H 192.168.20.2 -l 60 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 60.00 7586.85 Rx side CPU usage: %usr %sys %irq %soft %idle 0.50 25.83 0.00 59.53 14.14 [ Register VLAN offloads with priority 10 -DaveM ] Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: Correctly propagate promisc|allmulti flags in notifier.Vlad Yasevich2015-05-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently vlan notifier handler will try to update all vlans for a device when that device comes up. A problem occurs, however, when the vlan device was set to promiscuous, but not by the user (ex: a bridge). In that case, dev->gflags are not updated. What results is that the lower device ends up with an extra promiscuity count. Here are the backtraces that prove this: [62852.052179] [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0 [62852.052186] [<ffffffff8160bcbb>] ? _raw_spin_unlock_bh+0x1b/0x40 [62852.052188] [<ffffffff814fe4be>] ? dev_set_rx_mode+0x2e/0x40 [62852.052190] [<ffffffff814fe694>] dev_set_promiscuity+0x24/0x50 [62852.052194] [<ffffffffa0324795>] vlan_dev_open+0xd5/0x1f0 [8021q] [62852.052196] [<ffffffff814fe58f>] __dev_open+0xbf/0x140 [62852.052198] [<ffffffff814fe88d>] __dev_change_flags+0x9d/0x170 [62852.052200] [<ffffffff814fe989>] dev_change_flags+0x29/0x60 The above comes from the setting the vlan device to IFF_UP state. [62852.053569] [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0 [62852.053571] [<ffffffffa032459b>] ? vlan_dev_set_rx_mode+0x2b/0x30 [8021q] [62852.053573] [<ffffffff814fe8d5>] __dev_change_flags+0xe5/0x170 [62852.053645] [<ffffffff814fe989>] dev_change_flags+0x29/0x60 [62852.053647] [<ffffffffa032334a>] vlan_device_event+0x18a/0x690 [8021q] [62852.053649] [<ffffffff8161036c>] notifier_call_chain+0x4c/0x70 [62852.053651] [<ffffffff8109d456>] raw_notifier_call_chain+0x16/0x20 [62852.053653] [<ffffffff814f744d>] call_netdevice_notifiers+0x2d/0x60 [62852.053654] [<ffffffff814fe1a3>] __dev_notify_flags+0x33/0xa0 [62852.053656] [<ffffffff814fe9b2>] dev_change_flags+0x52/0x60 [62852.053657] [<ffffffff8150cd57>] do_setlink+0x397/0xa40 And this one comes from the notification code. What we end up with is a vlan with promiscuity count of 1 and and a physical device with a promiscuity count of 2. They should both have a count 1. To resolve this issue, vlan code can use dev_get_flags() api which correctly masks promiscuity and allmulti flags. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix high overhead of vlan sub-device teardown.David S. Miller2015-03-181-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When a networking device is taken down that has a non-trivial number of VLAN devices configured under it, we eat a full synchronize_net() for every such VLAN device. This is because of the call chain: NETDEV_DOWN notifier --> vlan_device_event() --> dev_change_flags() --> __dev_change_flags() --> __dev_close() --> __dev_close_many() --> dev_deactivate_many() --> synchronize_net() This is kind of rediculous because we already have infrastructure for batching doing operation X to a list of net devices so that we only incur one sync. So make use of that by exporting dev_close_many() and adjusting it's interfaace so that the caller can fully manage the batch list. Use this in vlan_device_event() and all the overhead goes away. Reported-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: fail early when creating netdev named configWANG Cong2014-07-291-8/+13
| | | | | | | | | | | Similarly, vlan will create /proc/net/vlan/<dev>, so when we create dev with name "config", it will confict with /proc/net/vlan/config. Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: set name_assign_type in alloc_netdev()Tom Gundersen2014-07-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: Fix lockdep warning with stacked vlan devices.Vlad Yasevich2014-05-161-0/+1
| | | | | | | | | | | | | This reverts commit dc8eaaa006350d24030502a4521542e74b5cb39f. vlan: Fix lockdep warning when vlan dev handle notification Instead we use the new new API to find the lock subclass of our vlan device. This way we can support configurations where vlans are interspersed with other devices: bond -> vlan -> macvlan -> vlan Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: Set hard_header_len according to available accelerationVlad Yasevich2014-03-271-1/+3
| | | | | | | | | | | | | Currently, if the card supports CTAG acceleration we do not account for the vlan header even if we are configuring an 8021AD vlan. This may not be best since we'll do software tagging for 8021AD which will cause data copy on skb head expansion Configure the length based on available hw offload capabilities and vlan protocol. CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* 8021q: Use ether_addr_copyJoe Perches2014-01-211-1/+1
| | | | | | | | Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: unlink the upper neighbour before unregisteringVeaceslav Falico2013-09-261-2/+2
| | | | | | | | | | | | | | On netdev unregister we're removing also all of its sysfs-associated stuff, including the sysfs symlinks that are controlled by netdev neighbour code. Also, it's a subtle race condition - cause we can still access it after unregistering. Move the unlinking right before the unregistering to fix both. CC: Patrick McHardy <kaber@trash.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: link the upper neighbour only after registeringVeaceslav Falico2013-09-261-7/+7
| | | | | | | | | | | Otherwise users might access it without being fully registered, as per sysfs - it only inits in register_netdevice(), so is unusable till it is called. CC: Patrick McHardy <kaber@trash.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: cleanup the usage of vlan_dev_priv(dev)Wang Sheng-Hui2013-08-031-5/+7
| | | | | | | | | | | | | | This patch cleanup 2 points for the usage of vlan_dev_priv(dev): * In vlan_dev.c/vlan_dev_hard_header, we should use the var *vlan directly after grabing the pointer at the beginning with *vlan = vlan_dev_priv(dev); when we need to access the fields of *vlan. * In vlan.c/register_vlan_device, add the var *vlan pointer struct vlan_dev_priv *vlan; to cleanup the code to access the fields of vlan_dev_priv(new_dev). Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert resend IGMP to notifier eventJiri Pirko2013-07-231-0/+1
| | | | | | | | | | | | Until now, bond_resend_igmp_join_requests() looks for vlans attached to bonding device, bridge where bonding act as port manually. It does not care of other scenarios, like stacked bonds or team device above. Make this more generic and use netdev notifier to propagate the event to upper devices and to actually call ip_mc_rejoin_groups(). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: pass info struct via netdevice notifierJiri Pirko2013-05-281-1/+1
| | | | | | | | | | | | | | So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
* net: vlan: prepare for 802.1ad supportPatrick McHardy2013-04-191-57/+30
| | | | | | | | Make the encapsulation protocol value a property of VLAN devices and change the device lookup functions to take the protocol value into account. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: vlan: prepare for 802.1ad VLAN filtering offloadPatrick McHardy2013-04-191-5/+5
| | | | | | | | | Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*Patrick McHardy2013-04-191-3/+3
| | | | | | | | | | Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* 8021q: fix a potential use-after-freeCong Wang2013-03-241-7/+7
| | | | | | | | | | | | | | vlan_vid_del() could possibly free ->vlan_info after a RCU grace period, however, we may still refer to the freed memory area by 'grp' pointer. Found by code inspection. This patch moves vlan_vid_del() as behind as possible. Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/8021q: Implement Multiple VLAN Registration Protocol (MVRP)David Ward2013-02-101-5/+22
| | | | | | | | | | Initial implementation of the Multiple VLAN Registration Protocol (MVRP) from IEEE 802.1Q-2011, based on the existing implementation of the GARP VLAN Registration Protocol (GVRP). Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: disallow drivers with buggy VLAN accel to register_netdevice()Michał Mirosław2013-01-291-7/+0
| | | | | | | | | | | | | | | | | | | Instead of jumping aroung bugs that are easily fixed just don't let them in: affected drivers should be either fixed or have NETIF_F_HW_VLAN_FILTER removed from advertised features. Quick grep in drivers/net shows two drivers that have NETIF_F_HW_VLAN_FILTER but not ndo_vlan_rx_add/kill_vid(), but those are false-positives (features are commented out). OTOH two drivers have ndo_vlan_rx_add/kill_vid() implemented but don't advertise NETIF_F_HW_VLAN_FILTER. Those are: +ethernet/cisco/enic/enic_main.c +ethernet/qlogic/qlcnic/qlcnic_main.c Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: add link to upper deviceJiri Pirko2013-01-041-1/+9
| | | | | Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* 8021q: fix vlan device to inherit the unicast filtering capability flagYi Zou2012-11-301-0/+1
| | | | | | | | | | | | | | | This bug is observed on running FCoE over a VLAN device associated w/ a real device that has IFF_UNICAST_FLT set since FCoE would add unicast address such as FLOGI MAC to the VLAN interface that FCoE is on. Since currently, VLAN device is not inheriting the IFF_UNICAST_FLT flag from the parent real device even though the real device is capable of doing unicast filtering. This forces the VLAN device and its real device go to promiscuous mode unnecessarily even the added address is actually being added to the available unicast filter table in real device. Signed-off-by: Yi Zou <yi.zou@intel.com> Cc: devel@open-fcoe.org Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Allow the userns root to control vlans.Eric W. Biederman2012-11-181-6/+6
| | | | | | | | | | | | | | | | | | | Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Allow the vlan ioctls: SET_VLAN_INGRESS_PRIORITY_CMD SET_VLAN_EGRESS_PRIORITY_CMD SET_VLAN_FLAG_CMD SET_VLAN_NAME_TYPE_CMD ADD_VLAN_CMD DEL_VLAN_CMD Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: use IS_ENABLED()Amerigo Wang2012-11-011-1/+1
| | | | | | | | | | | | #if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: allow to change type when no vlan device is hooked on netdevJiri Pirko2012-10-181-1/+3
| | | | | | | | | | | vlan_info might be present but still no vlan devices might be there. That is in case of vlan0 automatically added. So in that case, allow to change netdev type. Reported-by: Jon Stanley <jstanley@rmrf.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>