path: root/net/bridge
Commit message [Author, Age, Files, Lines]
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net [David S. Miller, 2021-03-25, files: 1, lines: -0/+2]
|\
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: bridge: don't notify switchdev for local FDB addresses [Vladimir Oltean, 2021-03-23, files: 1, lines: -0/+2]
    As explained in this discussion:
    https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/

    the switchdev notifiers for FDB entries managed to have a zero-day bug.
    The bridge would not say that this entry is local:

        ip link add br0 type bridge
        ip link set swp0 master br0
        bridge fdb add dev swp0 00:01:02:03:04:05 master local

    and the switchdev driver would be more than happy to offload it as a
    normal static FDB entry. This is despite the fact that 'local' and
    non-'local' entries have completely opposite directions: a local entry
    is locally terminated and not forwarded, whereas a static entry is
    forwarded and not locally terminated. So, for example, DSA would install
    this entry on swp0 instead of installing it on the CPU port as it should.

    There is an even sadder part, which is that the 'local' flag is implicit
    if 'static' is not specified, meaning that this command produces the
    same result as adding a 'local' entry:

        bridge fdb add dev swp0 00:01:02:03:04:05 master

    I've updated the man pages for 'bridge', and after reading them now, it
    should be pretty clear to any user that the commands above were broken
    and should never have resulted in the 00:01:02:03:04:05 address being
    forwarded (this behavior is coherent with non-switchdev interfaces):
    https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/

    If you're a user reading this and this is what you want, just use:

        bridge fdb add dev swp0 00:01:02:03:04:05 master static

    Because switchdev should have given drivers the means from day one to
    classify FDB entries as local/non-local, but didn't, it means that all
    drivers are currently broken. So we can just as well omit the switchdev
    notifications for local FDB entries, which is exactly what this patch
    does to close the bug in stable trees. For further development work
    where drivers might want to trap the local FDB entries to the host, we
    can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and
    selectively make drivers act upon that bit, while all the others ignore
    those entries if the 'is_local' bit is set.

    Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
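The effect of the fix above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `FdbEntry` and `switchdev_fdb_event` are hypothetical names invented for the illustration, and the real change lives in the bridge's switchdev notification path in C.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FdbEntry:
    """Hypothetical stand-in for a bridge FDB entry."""
    mac: str
    port: str
    is_local: bool  # True for 'local' entries, which terminate on the host


def switchdev_fdb_event(entry: FdbEntry) -> Optional[dict]:
    """Return the event a driver would receive, or None if none is sent."""
    if entry.is_local:
        # Local entries are not forwarded, so drivers must not see them
        # as ordinary static entries; the fix simply skips the notification.
        return None
    return {"mac": entry.mac, "port": entry.port}


static_entry = FdbEntry("00:01:02:03:04:05", "swp0", is_local=False)
local_entry = FdbEntry("00:01:02:03:04:05", "swp0", is_local=True)
```

In this model, only the entry added with 'static' produces an event; the 'local' one is silently skipped, which matches the behavior the commit message describes for stable trees.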
* | netfilter: flowtable: bridge vlan hardware offload and switchdev [Felix Fietkau, 2021-03-24, files: 2, lines: -0/+3]
    The switch might have already added the VLAN tag through PVID hardware
    offload. Keep this extra VLAN in the flowtable but skip it on egress.

    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: resolve forwarding path for VLAN tag actions in bridge devices [Felix Fietkau, 2021-03-24, files: 3, lines: -1/+95]
    Depending on the VLAN settings of the bridge and the port, the bridge
    can either add or remove a tag. When VLAN filtering is enabled, the FDB
    lookup also needs to know the VLAN tag/proto for the destination
    address. To provide this, keep track of the stack of VLAN tags for the
    path in the lookup context.

    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: resolve forwarding path for bridge devices [Pablo Neira Ayuso, 2021-03-24, files: 1, lines: -0/+27]
    Add .ndo_fill_forward_path for bridge devices.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: Fix missing return assignment from br_vlan_replay_one call [Colin Ian King, 2021-03-24, files: 1, lines: -1/+1]
    The call to br_vlan_replay_one returns an error value, but it is not
    assigned to err, so the following check on err is always false because
    err was initialized to zero. Fix this by assigning err.

    Addresses-Coverity: ("'Constant' variable guards dead code")
    Fixes: 22f67cdfae6a ("net: bridge: add helper to replay VLANs installed on port")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
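The bug pattern described above is easy to reproduce outside the kernel. The following is a minimal sketch, assuming a hypothetical `replay_one` that fails for VID 0; it is not the kernel function, only an illustration of why the unassigned return value makes the error check dead code.

```python
import errno


def replay_one(vid: int) -> int:
    """Hypothetical per-VLAN replay: fails for VID 0, succeeds otherwise."""
    return -errno.EINVAL if vid == 0 else 0


def replay_buggy(vids):
    err = 0
    for vid in vids:
        replay_one(vid)        # return value dropped, err stays 0
        if err:                # never true: this is the dead code
            break
    return err


def replay_fixed(vids):
    err = 0
    for vid in vids:
        err = replay_one(vid)  # the one-line fix: assign the result
        if err:
            break
    return err
```

With the input `[1, 0, 2]`, the buggy variant reports success even though one replay failed, while the fixed variant stops and returns the error.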
* | bridge: mrp: Disable roles before deleting the MRP instance [Horatiu Vultur, 2021-03-24, files: 1, lines: -0/+7]
    When an MRP instance was created, the driver was notified that the
    instance was created and then, in a different callback, about the role
    of the instance. But when the instance was deleted, the driver was
    notified only that the MRP instance was deleted and not that the role
    was disabled. This patch makes sure that the driver is notified that the
    role has changed to disabled before the MRP instance is deleted, so that
    deletion produces callbacks symmetric to those of creation. This
    simplifies the logic in the drivers.

    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: add helper to replay VLANs installed on port [Vladimir Oltean, 2021-03-23, files: 1, lines: -0/+73]
    Currently this simple setup with DSA:

        ip link add br0 type bridge vlan_filtering 1
        ip link add bond0 type bond
        ip link set bond0 master br0
        ip link set swp0 master bond0

    will not work because the bridge has created the PVID in br_add_if ->
    nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
    but that was too early, since swp0 was not yet a lower of bond0, so it
    had no reason to act upon that notification.

    We need a helper in the bridge to replay the switchdev VLAN objects that
    were notified since the bridge port creation, because some of them may
    have been missed.

    As opposed to the br_mdb_replay function, the vg->vlan_list write side
    protection is offered by the rtnl_mutex which is sleepable, so we don't
    need to queue up the objects in atomic context, we can replay them right
    away.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: add helper to replay port and local fdb entries [Vladimir Oltean, 2021-03-23, files: 1, lines: -0/+50]
    When a switchdev port starts offloading a LAG that is already in a
    bridge and has an FDB entry pointing to it:

        ip link set bond0 master br0
        bridge fdb add dev bond0 00:01:02:03:04:05 master static
        ip link set swp0 master bond0

    the switchdev driver will have no idea that this FDB entry is there,
    because it missed the switchdev event emitted at its creation.

    Ido Schimmel pointed this out during a discussion about challenges with
    switchdev offloading of stacked interfaces between the physical port and
    the bridge, and recommended to just catch that condition and deny the
    CHANGEUPPER event:
    https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/

    But in fact, we might need to deal with the hard thing anyway, which is
    to replay all FDB addresses relevant to this port, because it isn't just
    static FDB entries, but also local addresses (ones that are not
    forwarded but terminated by the bridge). There, we can't just say 'oh
    yeah, there was an upper already so I'm not joining that'.

    So, similar to the logic for replaying MDB entries, add a function that
    must be called by individual switchdev drivers and replays local FDB
    entries as well as ones pointing towards a bridge port. This time, we
    use the atomic switchdev notifier block, since that's what FDB entries
    expect for some reason.

    Reported-by: Ido Schimmel <idosch@idosch.org>
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: add helper to replay port and host-joined mdb entries [Vladimir Oltean, 2021-03-23, files: 1, lines: -17/+131]
    I have a system with DSA ports, and udhcpcd is configured to bring
    interfaces up as soon as they are created.

    I create a bridge as follows:

        ip link add br0 type bridge

    As soon as I create the bridge and udhcpcd brings it up, I also have
    avahi which automatically starts sending IPv6 packets to advertise some
    local services, and because of that, the br0 bridge joins the following
    IPv6 groups due to the code path detailed below:

        33:33:ff:6d:c1:9c vid 0
        33:33:00:00:00:6a vid 0
        33:33:00:00:00:fb vid 0

        br_dev_xmit
        -> br_multicast_rcv
           -> br_ip6_multicast_add_group
              -> __br_multicast_add_group
                 -> br_multicast_host_join
                    -> br_mdb_notify

    This is all fine, but inside br_mdb_notify we have
    br_mdb_switchdev_host hooked up, and switchdev will attempt to offload
    the host joined groups to an empty list of ports. Of course nobody
    offloads them.

    Then when we add a port to br0:

        ip link set swp0 master br0

    the bridge doesn't replay the host-joined MDB entries from br_add_if,
    and eventually the host joined addresses expire, and a switchdev
    notification for deleting it is emitted, but surprise, the original
    addition was already completely missed.

    The strategy to address this problem is to replay the MDB entries (both
    the port ones and the host joined ones) when the new port joins the
    bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
    be populated and only then attached to a bridge that you offload).
    However there are 2 possibilities: the addresses can be 'pushed' by the
    bridge into the port, or the port can 'pull' them from the bridge.

    Considering that in the general case, the new port can be really late to
    the party, and there may have been many other switchdev ports that
    already received the initial notification, we would like to avoid
    delivering duplicate events to them, since they might misbehave. And
    currently, the bridge calls the entire switchdev notifier chain, whereas
    for replaying it should just call the notifier block of the new guy.
    But the bridge doesn't know what is the new guy's notifier block, it
    just knows where the switchdev notifier chain is. So for simplification,
    we make this a driver-initiated pull for now, and the notifier block is
    passed as an argument.

    To emulate the calling context for mdb objects (deferred and put on the
    blocking notifier chain), we must iterate under RCU protection through
    the bridge's mdb entries, queue them, and only call them once we're out
    of the RCU read-side critical section.

    There was some opportunity for reuse between br_mdb_switchdev_host_port,
    br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
    mdb object is created, so a helper was created.

    Suggested-by: Ido Schimmel <idosch@idosch.org>
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
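The "queue under protection, call afterwards" pattern and the driver-initiated pull can be modeled in a few lines. This is a rough sketch under stated assumptions: the `Bridge` class and `mdb_replay` method are hypothetical, and a plain `threading.Lock` stands in for the RCU read-side critical section, which it only loosely resembles.

```python
import threading
from typing import Callable, List, Tuple

MdbEntry = Tuple[str, int]  # (group MAC address, VLAN ID)


class Bridge:
    """Hypothetical user-space model of a bridge's MDB."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for RCU read-side protection
        self._mdb: List[MdbEntry] = []

    def host_join(self, group: str, vid: int) -> None:
        with self._lock:
            self._mdb.append((group, vid))

    def mdb_replay(self, notifier: Callable[[MdbEntry], None]) -> None:
        """Pull model: only the caller's notifier sees the replayed entries."""
        with self._lock:
            queued = list(self._mdb)   # queue entries while protected
        for entry in queued:
            notifier(entry)            # deliver after leaving the section


br = Bridge()
br.host_join("33:33:00:00:00:6a", 0)
br.host_join("33:33:00:00:00:fb", 0)

seen: List[MdbEntry] = []
lock_held_during_call: List[bool] = []


def new_port_notifier(entry: MdbEntry) -> None:
    seen.append(entry)
    lock_held_during_call.append(br._lock.locked())


br.mdb_replay(new_port_notifier)  # the late-joining port pulls its entries
```

Because the notifier is passed in by the new port, ports that already received the original notifications are not called again, which is the duplicate-event concern the message raises.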
* | net: bridge: add helper to retrieve the current ageing time [Vladimir Oltean, 2021-03-23, files: 1, lines: -0/+13]
    The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:

        sysfs/ioctl/netlink
        -> br_set_ageing_time
           -> __set_ageing_time

    therefore not at bridge port creation time, so:
    (a) switchdev drivers have to hardcode the initial value for the address
        ageing time, because they didn't get any notification
    (b) that hardcoded value can be out of sync, if the user changes the
        ageing time before enslaving the port to the bridge

    We need a helper in the bridge, such that switchdev drivers can query
    the current value of the bridge ageing time when they start offloading
    it.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: add helper for retrieving the current bridge port STP state [Vladimir Oltean, 2021-03-23, files: 1, lines: -0/+14]
    It may happen that we have the following topology with DSA or any other
    switchdev driver with LAG offload:

        ip link add br0 type bridge stp_state 1
        ip link add bond0 type bond
        ip link set bond0 master br0
        ip link set swp0 master bond0
        ip link set swp1 master bond0

    STP decides that it should put bond0 into the BLOCKING state, and that's
    that. The ports that are actively listening for the switchdev port
    attributes emitted for the bond0 bridge port (because they are
    offloading it) and have the honor of seeing that switchdev port
    attribute can react to it, so we can program swp0 and swp1 into the
    BLOCKING state.

    But if then we do:

        ip link set swp2 master bond0

    then as far as the bridge is concerned, nothing has changed: it still
    has one bridge port. But this new bridge port will not see any STP state
    change notification and will remain FORWARDING, which is how the
    standalone code leaves it.

    We need a function in the bridge driver which retrieves the current STP
    state, such that drivers can synchronize to it when they may have missed
    switchdev events.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
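The synchronization problem described above can be sketched as a toy model. All class and method names here are hypothetical; the point is only that a port joining late must query the current state rather than wait for a notification that was emitted before it existed.

```python
class BridgePort:
    """Hypothetical model of a bridge port (e.g. bond0) with an STP state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stp_state = "forwarding"
        self.listeners = []

    def set_stp_state(self, state: str) -> None:
        self.stp_state = state
        for port in self.listeners:    # only current lowers are notified
            port.hw_state = state


class SwitchPort:
    """Hypothetical switchdev port; standalone ports start FORWARDING."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hw_state = "forwarding"

    def join_without_sync(self, upper: BridgePort) -> None:
        upper.listeners.append(self)   # misses every earlier notification

    def join_with_sync(self, upper: BridgePort) -> None:
        upper.listeners.append(self)
        self.hw_state = upper.stp_state  # query the current state on join


bond0 = BridgePort("bond0")
swp0, swp2_stale, swp2_synced = SwitchPort("swp0"), SwitchPort("swp2"), SwitchPort("swp2")
swp0.join_with_sync(bond0)
bond0.set_stp_state("blocking")        # STP blocks bond0 after swp0 joined
swp2_stale.join_without_sync(bond0)    # late joiner, no query
swp2_synced.join_with_sync(bond0)      # late joiner that queries the state
```

In this model the late joiner that does not query stays in the forwarding state while the rest of the LAG is blocking, which is the inconsistency the helper is meant to let drivers avoid.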
* | net: bridge: when suppression is enabled exclude RARP packets [Nikolay Aleksandrov, 2021-03-22, files: 1, lines: -1/+3]
    Recently we had an interop issue where RARP packets got suppressed with
    bridge neigh suppression enabled, but the check in the code was meant to
    suppress GARP. Exclude RARP packets from it, which allows some VMware
    setups to work. To quote the report:

    "Those RARP packets usually get generated by vMware to notify physical
    switches when vMotion occurs. vMware may use random sip/tip or just use
    sip=tip=0. So the RARP packet sometimes get properly flooded by the vtep
    and other times get dropped by the logic"

    Reported-by: Amer Abdalamer <amer@nvidia.com>
    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: declare br_vlan_tunnel_lookup argument tunnel_id as __be64 [Vladimir Oltean, 2021-03-22, files: 1, lines: -1/+1]
    The only caller of br_vlan_tunnel_lookup, br_handle_ingress_vlan_tunnel,
    extracts the tunnel_id from
    struct ip_tunnel_info::struct ip_tunnel_key::tun_id, which is a __be64
    value.

    The exact endianness does not seem to matter, because the tunnel id is
    just used as a lookup key for the VLAN group's tunnel hash table, and
    the value is not interpreted directly per se. Moreover,
    rhashtable_lookup_fast treats the key argument as a const void *.

    Therefore, there is no functional change associated with this patch,
    just one to silence "make W=1" builds.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: mcast: factor out common allow/block EHT handling [Nikolay Aleksandrov, 2021-03-16, files: 1, lines: -71/+27]
    We handle EHT state change for ALLOW messages in INCLUDE mode and for
    BLOCK messages in EXCLUDE mode similarly - create the new set entries
    with the proper filter mode. We also handle EHT state change for ALLOW
    messages in EXCLUDE mode and for BLOCK messages in INCLUDE mode in a
    similar way - delete the common entries (current set and new set).

    Factor out all the common code as follows:
     - ALLOW/INCLUDE, BLOCK/EXCLUDE: call __eht_create_set_entries()
     - ALLOW/EXCLUDE, BLOCK/INCLUDE: call __eht_del_common_set_entries()

    The set entries creation can be reused in __eht_inc_exc() as well.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
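The pairing of message types and filter modes listed above amounts to a two-way dispatch. A minimal sketch, assuming plain strings for the message type and filter mode; `eht_action` is a hypothetical name, and the returned strings merely echo the helper names from the commit message rather than calling any real code.

```python
def eht_action(msg: str, filter_mode: str) -> str:
    """Pick the shared helper for an (ALLOW|BLOCK, INCLUDE|EXCLUDE) pair."""
    if msg not in ("ALLOW", "BLOCK") or filter_mode not in ("INCLUDE", "EXCLUDE"):
        raise ValueError(f"unexpected combination: {msg}/{filter_mode}")
    creates = {("ALLOW", "INCLUDE"), ("BLOCK", "EXCLUDE")}
    if (msg, filter_mode) in creates:
        return "__eht_create_set_entries"      # create the new set entries
    return "__eht_del_common_set_entries"      # delete the common entries
```

Reading the table this way makes the symmetry explicit: the two "create" cases and the two "delete" cases differ only in the filter mode passed along.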
* | net: bridge: mcast: remove unreachable EHT code [Nikolay Aleksandrov, 2021-03-16, files: 1, lines: -42/+15]
    In the initial EHT versions there were common functions which handled
    allow/block messages for both INCLUDE and EXCLUDE modes, but later they
    were separated. It seems I've left some common code which cannot be
    reached because the filter mode is checked before calling the respective
    functions, i.e. the host filter is always in EXCLUDE mode when using
    __eht_allow_excl() and __eht_block_excl() thus we can drop the host_excl
    checks inside and simplify the code a bit.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: Fix fall-through warnings for Clang [Gustavo A. R. Silva, 2021-03-10, files: 1, lines: -0/+1]
|/
    In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
    by explicitly adding a break statement instead of letting the code fall
    through to the next case.

    Link: https://github.com/KSPP/linux/issues/115
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: mrp: Update br_mrp to use new return values of br_mrp_switchdev [Horatiu Vultur, 2021-02-16, files: 1, lines: -16/+27]
    Check the return values of the br_mrp_switchdev function. In case of:
     - BR_MRP_NONE, return the error to userspace,
     - BR_MRP_SW, continue with SW implementation,
     - BR_MRP_HW, continue without SW implementation.

    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: mrp: Extend br_mrp_switchdev to detect better the errors [Horatiu Vultur, 2021-02-16, files: 2, lines: -77/+118]
    This patch extends the br_mrp_switchdev functions to give a better
    understanding of what caused the issue and whether the SW needs to be
    used as a backup.

    There are the following cases:
     - when the code is compiled without CONFIG_NET_SWITCHDEV. In this case
       return success so the SW can continue with the protocol. Depending
       on the function, it returns 0 or BR_MRP_SW.
     - when the code is compiled with CONFIG_NET_SWITCHDEV and the driver
       doesn't implement any MRP callbacks. In this case the HW can't run
       MRP, so it just returns -EOPNOTSUPP and the SW stops configuring the
       node.
     - when the code is compiled with CONFIG_NET_SWITCHDEV and the driver
       fully supports the MRP functionality. In this case the SW doesn't
       need to do anything. The functions will return 0 or BR_MRP_HW.
     - when the code is compiled with CONFIG_NET_SWITCHDEV and the HW can't
       run the protocol completely but it can help the SW to run it. For
       example, the HW can't completely support the MRM role (it can't
       detect when it stops receiving MRP Test frames) but it can redirect
       these frames to the CPU. In this case it is possible to have a SW
       fallback. The SW will initially try to call the driver with
       sw_backup set to false, meaning that the HW should implement the
       role completely. If the driver returns -EOPNOTSUPP, the SW will try
       again with sw_backup set to true, meaning that the SW will detect
       when it stops receiving the frames but it needs HW support to
       redirect the frames to the CPU. In case the driver returns 0, the SW
       will continue to configure the node accordingly.

    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
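The two-step probing described in the last case can be sketched as follows. This is an illustrative user-space model only: `HwSupport` mirrors the BR_MRP_NONE/BR_MRP_SW/BR_MRP_HW values named in this log, while `set_role` and the three sample drivers are hypothetical, and the handling of errors other than -EOPNOTSUPP is an assumption.

```python
import errno
from enum import Enum
from typing import Callable


class HwSupport(Enum):
    NONE = 0  # neither HW nor SW backup can run the role
    SW = 1    # SW runs the role, HW only assists (e.g. redirects frames)
    HW = 2    # HW implements the role completely


def set_role(driver: Callable[[bool], int]) -> HwSupport:
    """Try full HW offload first, then retry asking only for SW backup."""
    err = driver(False)                 # sw_backup=False: full offload
    if err == 0:
        return HwSupport.HW
    if err != -errno.EOPNOTSUPP:        # assumption: other errors are fatal
        return HwSupport.NONE
    err = driver(True)                  # sw_backup=True: HW only helps
    return HwSupport.SW if err == 0 else HwSupport.NONE


def full_offload(sw_backup: bool) -> int:
    return 0


def redirect_only(sw_backup: bool) -> int:
    return 0 if sw_backup else -errno.EOPNOTSUPP


def no_mrp(sw_backup: bool) -> int:
    return -errno.EOPNOTSUPP
```

The three sample drivers correspond to hardware that runs the role completely, hardware that can only redirect frames to the CPU, and hardware with no MRP callbacks at all.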
* bridge: mrp: Add 'enum br_mrp_hw_support' [Horatiu Vultur, 2021-02-16, files: 1, lines: -0/+14]
    Add the enum br_mrp_hw_support that is used by the br_mrp_switchdev
    functions to allow the SW to detect the cases where HW can't implement
    the functionality or when SW is used as a backup.

    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: fix br_vlan_filter_toggle stub when CONFIG_BRIDGE_VLAN_FILTERING=n [Vladimir Oltean, 2021-02-15, files: 1, lines: -1/+2]
    The prototype of br_vlan_filter_toggle was updated to include a netlink
    extack, but the stub definition wasn't, which results in a build error
    when CONFIG_BRIDGE_VLAN_FILTERING=n.

    Fixes: 9e781401cbfc ("net: bridge: propagate extack through store_bridge_parm")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: propagate extack through switchdev_port_attr_set [Vladimir Oltean, 2021-02-14, files: 7, lines: -17/+21]
    The benefit is the ability to propagate errors from switchdev drivers
    for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and
    SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: propagate extack through store_bridge_parm [Vladimir Oltean, 2021-02-14, files: 4, lines: -46/+142]
    The bridge sysfs interface stores parameters for the STP, VLAN,
    multicast etc subsystems using a predefined function prototype.
    Sometimes the underlying function being called supports a netlink
    extended ack message, and we ignore it.

    Let's expand the store_bridge_parm function prototype to include the
    extack, and just print it to console, but at least propagate it where
    applicable. Where not applicable, create a shim function in the
    br_sysfs_br.c file that discards the extra function argument.

    This patch allows us to propagate the extack argument to
    br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle,
    and from there, further up in br_changelink from br_netlink.c.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: remove __br_vlan_filter_toggle [Vladimir Oltean, 2021-02-14, files: 3, lines: -10/+4]
    This function is identical to br_vlan_filter_toggle.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes [Vladimir Oltean, 2021-02-12, files: 1, lines: -3/+3]
    This switchdev attribute offers a counterproductive API for a driver
    writer, because although br_switchdev_set_port_flag gets passed a
    "flags" and a "mask", those are passed piecemeal to the driver, so while
    the PRE_BRIDGE_FLAGS listener knows what changed because it has the
    "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
    value. But certain drivers can offload only certain combinations of
    settings, like for example they cannot change unicast flooding
    independently of multicast flooding - they must be both on or both off.
    The way the information is passed to switchdev makes drivers not
    expressive enough, and unable to reject this request ahead of time, in
    the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
    the deferred BRIDGE_FLAGS attribute, where the rejection is currently
    ignored.

    This patch also changes drivers to make use of the "mask" field for edge
    detection when possible.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
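The value of having both "flags" and "mask" in the early check can be shown with a small model of a driver that only supports unicast and multicast flooding being both on or both off. This is a sketch under assumptions: the bit values below are illustrative and not the kernel's, and `pre_bridge_flags` is a hypothetical function, not a real driver callback.

```python
import errno

# Illustrative bit positions only; these are not the kernel's definitions.
BR_FLOOD = 1 << 0
BR_MCAST_FLOOD = 1 << 1


def pre_bridge_flags(current: int, flags: int, mask: int) -> int:
    """Reject, ahead of time, a change that would split uc/mc flooding."""
    # The mask says which bits the request touches; the rest keep their value.
    new = (current & ~mask) | (flags & mask)
    uc_flood = bool(new & BR_FLOOD)
    mc_flood = bool(new & BR_MCAST_FLOOD)
    if uc_flood != mc_flood:
        return -errno.EINVAL   # this hypothetical HW needs both on or both off
    return 0


both_on = BR_FLOOD | BR_MCAST_FLOOD
```

With only the final value and no mask, the driver could not tell which bits the user actually asked to change; with both, it can validate the resulting combination before anything is committed.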
* net: bridge: don't print in br_switchdev_set_port_flag [Vladimir Oltean, 2021-02-12, files: 4, lines: -14/+21]
    For the netlink interface, propagate errors through extack rather than
    simply printing them to the console. For the sysfs interface, we still
    print to the console, but at least that's one layer higher than in
    switchdev, which also allows us to silently ignore the offloading of
    flags if that is ever needed in the future.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: offload all port flags at once in br_setport [Vladimir Oltean, 2021-02-12, files: 2, lines: -76/+39]
    If for example this command:

        ip link set swp0 type bridge_slave flood off mcast_flood off learning off

    succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
    BR_LEARNING, there would be no attempt to revert the partial state in
    any way. Arguably, if the user changes more than one flag through the
    same netlink command, this one _should_ be all or nothing, which means
    it should be passed through switchdev as all or nothing.

    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
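The all-or-nothing semantics argued for above can be sketched as a single offload call covering every requested flag. The names and bit values are hypothetical and chosen for illustration; the real logic is in br_setport and the switchdev layer.

```python
import errno
from typing import Callable, Dict, Tuple

# Illustrative bit positions only; these are not the kernel's definitions.
BR_FLOOD = 1 << 0
BR_MCAST_FLOOD = 1 << 1
BR_LEARNING = 1 << 2


def set_port_flags(current: int, changes: Dict[int, bool],
                   offload: Callable[[int, int], int]) -> Tuple[int, int]:
    """Apply all requested flag changes together, or none of them."""
    flags = mask = 0
    for bit, enabled in changes.items():
        mask |= bit
        if enabled:
            flags |= bit
    err = offload(flags, mask)          # one call carrying every change
    if err:
        return err, current             # nothing was committed
    return 0, (current & ~mask) | flags


def no_learning_control(flags: int, mask: int) -> int:
    """Hypothetical driver that cannot change BR_LEARNING."""
    return -errno.EOPNOTSUPP if mask & BR_LEARNING else 0


all_on = BR_FLOOD | BR_MCAST_FLOOD | BR_LEARNING
request = {BR_FLOOD: False, BR_MCAST_FLOOD: False, BR_LEARNING: False}
result_err, result_flags = set_port_flags(all_on, request, no_learning_control)
```

Because the driver sees the whole request at once, a rejection leaves the port flags exactly as they were instead of half-applied.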
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net [David S. Miller, 2021-02-10, files: 3, lines: -9/+10]
|\
| * bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state [Horatiu Vultur, 2021-02-08, files: 3, lines: -9/+10]
    The function br_mrp_port_switchdev_set_state was called both with MRP
    port state and STP port state, which is an issue because they don't
    match exactly.

    Therefore, update the function to be used only with STP port state and
    use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE.

    STP was chosen over MRP because the drivers already implement
    SWITCHDEV_ATTR_ID_PORT_STP_STATE and the SW already updates the port
    STP state.

    Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API")
    Fixes: fadd409136f0f2 ("bridge: switchdev: mrp: Implement MRP API for switchdev")
    Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.")
    Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: use switchdev for port flags set through sysfs too [Vladimir Oltean, 2021-02-08, files: 1, lines: -3/+6]
    Looking through patchwork I don't see that there was any consensus to
    use switchdev notifiers only in case of netlink provided port flags but
    not sysfs (as a sort of deprecation, punishment or anything like that),
    so we should probably keep the user interface consistent in terms of
    functionality.

    http://patchwork.ozlabs.org/project/netdev/patch/20170605092043.3523-3-jiri@resnulli.us/
    http://patchwork.ozlabs.org/project/netdev/patch/20170608064428.4785-3-jiri@resnulli.us/

    Fixes: 3922285d96e7 ("net: bridge: Add support for offloading port attributes")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next [Jakub Kicinski, 2021-02-06, files: 1, lines: -3/+2]
|\ \
    Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    1) Remove indirection and use nf_ct_get() instead from nfnetlink_log
       and nfnetlink_queue, from Florian Westphal.

    2) Add weighted random twos choice least-connection scheduling for IPVS,
       from Darby Payne.

    3) Add a __hash placeholder in the flow tuple structure to identify the
       field to be included in the rhashtable key hash calculation.

    4) Add a new nft_parse_register_load() and nft_parse_register_store()
       to consolidate register load and store in the core.

    5) Statify nft_parse_register() since it has no more module clients.

    6) Remove redundant assignment in nft_cmp, from Colin Ian King.

    * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
      netfilter: nftables: remove redundant assignment of variable err
      netfilter: nftables: statify nft_parse_register()
      netfilter: nftables: add nft_parse_register_store() and use it
      netfilter: nftables: add nft_parse_register_load() and use it
      netfilter: flowtable: add hash offset field to tuple
      ipvs: add weighted random twos choice algorithm
      netfilter: ctnetlink: remove get_ct indirection
    ====================

    Link: https://lore.kernel.org/r/20210206015005.23037-1-pablo@netfilter.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | netfilter: nftables: add nft_parse_register_store() and use it [Pablo Neira Ayuso, 2021-01-27, files: 1, lines: -3/+2]
    This new function combines the netlink register attribute parser and
    the store validation function.

    This update requires replacing:

        enum nft_registers dreg:8;

    in many of the expression private areas, otherwise the compiler
    complains with:

        error: cannot take address of bit-field ‘dreg’

    when passing the register field as reference.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | net: bridge: mcast: Use ERR_CAST instead of ERR_PTR(PTR_ERR()) [Xu Wang, 2021-02-06, files: 1, lines: -1/+1]
    Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)).

    net/bridge/br_multicast.c:1246:9-16: WARNING: ERR_CAST can be used with mp

    Generated by: scripts/coccinelle/api/err_cast.cocci

    Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
    Link: https://lore.kernel.org/r/20210204070549.83636-1-vulab@iscas.ac.cn
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: bridge: add warning comments to avoid extending sysfs [Nikolay Aleksandrov, 2021-01-29, files: 2, lines: -0/+8]
    We're moving to netlink-only options, so add comments in the bridge's
    sysfs files to warn against adding any new sysfs entries.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: bridge: mcast: drop hosts limit sysfs support [Nikolay Aleksandrov, 2021-01-29, files: 1, lines: -26/+0]
    We decided to stop adding new sysfs bridge options and continue with
    netlink only, so remove hosts limit sysfs support.

    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net [Jakub Kicinski, 2021-01-28, files: 1, lines: -0/+29]
|\ \ \
| | |/
| |/|
    drivers/net/can/dev.c
      b552766c872f ("can: dev: prevent potential information leak in can_fill_info()")
      3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")
      0a042c6ec991 ("can: dev: move netlink related code into seperate file")

      Code move.

    drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
      57ac4a31c483 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
      214baf22870c ("net/mlx5e: Support HTB offload")

      Adjacent code changes

    net/switchdev/switchdev.c
      20776b465c0c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
      ffb68fc58e96 ("net: switchdev: remove the transaction structure from port object notifiers")
      bae33f2b5afe ("net: switchdev: remove the transaction structure from port attributes")

      Transaction parameter gets dropped otherwise keep the fix.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: mrp: move struct definitions out of uapi  [Rasmus Villemoes, 2021-01-23, 1 file, -0/+29]
| | |
| | |   None of these are actually used in the kernel/userspace interface -
| | |   there's a userspace component of implementing MRP, and userspace will
| | |   need to construct certain frames to put on the wire, but there's no
| | |   reason the kernel should provide the relevant definitions in a UAPI
| | |   header.
| | |
| | |   In fact, some of those definitions were broken until previous commit,
| | |   so only keep the few that are actually referenced in the kernel code,
| | |   and move them to the br_private_mrp.h header.
| | |
| | |   Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
| | |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: bridge: multicast: make tracked EHT hosts limit configurable  [Nikolay Aleksandrov, 2021-01-27, 4 files, -1/+61]
| | |
| | |   Add two new port attributes which make EHT hosts limit configurable
| | |   and export the current number of tracked EHT hosts:
| | |    - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit
| | |    - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts
| | |   Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed.
| | |
| | |   Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've
| | |   increased it to 40 to have space for two more future entries.
| | |
| | |   v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c,
| | |       no functional change
| | |
| | |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| | |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
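The limit semantics described here (a configurable per-port limit, a zero limit rejected, and a counter of tracked hosts) can be sketched in userspace as follows. The default of 512 comes from the per-port limit commit in this log; the struct, the function names and the specific error codes are assumptions made for illustration and are not the bridge's actual implementation.

```c
#include <errno.h>
#include <stdint.h>

/* Default taken from the "add per-port EHT hosts limit" commit message. */
#define SKETCH_EHT_HOSTS_LIMIT_DEFAULT 512

/* Hypothetical per-port state: the limit and the current host count. */
struct sketch_port {
	uint32_t eht_hosts_limit;
	uint32_t eht_hosts_cnt;
};

static void sketch_port_init(struct sketch_port *p)
{
	p->eht_hosts_limit = SKETCH_EHT_HOSTS_LIMIT_DEFAULT;
	p->eht_hosts_cnt = 0;
}

/* A limit of 0 is rejected; -EINVAL is an assumed error code. */
static int sketch_set_hosts_limit(struct sketch_port *p, uint32_t limit)
{
	if (limit == 0)
		return -EINVAL;
	p->eht_hosts_limit = limit;
	return 0;
}

/* Track one more host unless the limit is reached (assumed -ENOSPC). */
static int sketch_track_host(struct sketch_port *p)
{
	if (p->eht_hosts_cnt >= p->eht_hosts_limit)
		return -ENOSPC;
	p->eht_hosts_cnt++;
	return 0;
}
```

In this sketch, lowering the limit below the current count does not evict hosts that are already tracked; it only blocks new ones.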
* | | net: bridge: multicast: add per-port EHT hosts limit  [Nikolay Aleksandrov, 2021-01-27, 4 files, -0/+36]
| | |
| | |   Add a default limit of 512 for number of tracked EHT hosts per-port.
| | |
| | |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| | |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: bridge: multicast: fix br_multicast_eht_set_entry_lookup indentation  [Nikolay Aleksandrov, 2021-01-26, 1 file, -9/+9]
| | |
| | |   Fix the messed up indentation in br_multicast_eht_set_entry_lookup().
| | |
| | |   Fixes: baa74d39ca39 ("net: bridge: multicast: add EHT source set handling functions")
| | |   Reported-by: kernel test robot <lkp@intel.com>
| | |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| | |   Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org
| | |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | bridge: Use PTR_ERR_OR_ZERO instead if(IS_ERR(...)) + PTR_ERR  [Jiapeng Zhong, 2021-01-25, 1 file, -1/+1]
| |/
|/|
| |
| |   coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code.
| |
| |   Fix the following coccicheck warnings:
| |
| |   ./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be
| |   used.
| |
| |   Reported-by: Abaci <abaci@linux.alibaba.com>
| |   Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
| |   Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Link: https://lore.kernel.org/r/1611542381-91178-1-git-send-email-abaci-bugfix@linux.alibaba.com
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
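The pattern coccicheck flags here can be sketched in userspace as below. The `sketch_*` helpers are simplified stand-ins for the kernel's `IS_ERR()`, `PTR_ERR()` and `PTR_ERR_OR_ZERO()`; the group type and the allocator are hypothetical and do not reproduce the code at the reported line.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Returns the encoded errno for an error pointer, 0 for a valid one. */
static inline int sketch_ptr_err_or_zero(const void *ptr)
{
	return sketch_is_err(ptr) ? (int)sketch_ptr_err(ptr) : 0;
}

/* Hypothetical group type and allocator, for illustration only. */
struct sketch_group { int id; };

static struct sketch_group *sketch_new_group(int id)
{
	static struct sketch_group group = { .id = 1 };

	return id == group.id ? &group : sketch_err_ptr(-ENOMEM);
}

/* Before: open-coded IS_ERR() check followed by PTR_ERR(). */
static int sketch_join_old(int id)
{
	struct sketch_group *pg = sketch_new_group(id);

	if (sketch_is_err(pg))
		return (int)sketch_ptr_err(pg);
	return 0;
}

/* After: a single helper call with the same result. */
static int sketch_join_new(int id)
{
	return sketch_ptr_err_or_zero(sketch_new_group(id));
}
```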
* | net: bridge: multicast: mark IGMPv3/MLDv2 fast-leave deletes  [Nikolay Aleksandrov, 2021-01-22, 3 files, -12/+20]
| |
| |   Mark groups which were deleted due to fast leave/EHT.
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: bridge: multicast: handle block pg delete for all cases  [Nikolay Aleksandrov, 2021-01-22, 1 file, -8/+9]
| |
| |   A block report can result in empty source and host sets for both
| |   include and exclude groups so if there are no hosts left we can safely
| |   remove the group. Pull the block group handling so it can cover both
| |   cases and add a check if EHT requires the delete.
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: bridge: multicast: add EHT host filter_mode handling  [Nikolay Aleksandrov, 2021-01-22, 1 file, -4/+38]
| |
| |   We should be able to handle host filter mode changing. For exclude mode
| |   we must create a zero-src entry so the group will be kept even without
| |   any S,G entries (non-zero source sets). That entry doesn't count to the
| |   entry limit and can always be created, its timer is refreshed on new
| |   exclude reports and if we change the host filter mode to include then
| |   it gets removed and we rely only on the non-zero source sets.
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: bridge: multicast: optimize TO_INCLUDE EHT timeouts  [Nikolay Aleksandrov, 2021-01-22, 1 file, -0/+21]
| |
| |   This is an optimization specifically for TO_INCLUDE which sends queries
| |   for the older entries and thus lowers the S,G timers to LMQT. If we have
| |   the following situation for a group in either include or exclude mode:
| |    - host A was interested in srcs X and Y, but is timing out
| |    - host B sends TO_INCLUDE src Z, the bridge lowers X and Y's timeouts
| |      to LMQT
| |    - host B sends BLOCK src Z after LMQT time has passed
| |    => since host B is the last host we can delete the group, but if we
| |       still have host A's EHT entries for X and Y (i.e. if they weren't
| |       lowered to LMQT previously) then we'll have to wait another LMQT
| |       time before deleting the group, with this optimization we can
| |       directly remove it regardless of the group mode as there are no
| |       more interested hosts
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: bridge: multicast: add EHT include and exclude handling  [Nikolay Aleksandrov, 2021-01-22, 3 files, -23/+168]
| |
| |   Add support for IGMPv3/MLDv2 include and exclude EHT handling. Similar
| |   to how the reports are processed we have 2 cases when the group is in
| |   include or exclude mode, these are processed as follows:
| |    - group include
| |     - is_include: create missing entries
| |     - to_include: flush existing entries and create a new set from the
| |       report, obviously if the src set is empty then we delete the group
| |
| |    - group exclude
| |     - is_exclude: create missing entries
| |     - to_exclude: flush existing entries and create a new set from the
| |       report, any empty source set entries are removed
| |
| |   If the group is in a different mode then we just flush all entries
| |   reported by the host and we create a new set with the new mode entries
| |   created from the report. If the report is include type, the source list
| |   is empty and the group has empty sources' set then we remove it. Any
| |   source set entries which are empty are removed as well. If the group is
| |   in exclude mode it can exist without any S,G entries (allowing for all
| |   traffic to pass).
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
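The include/exclude rules listed in this commit message can be modelled with a toy state machine. This is only a sketch under strong simplifying assumptions: sources are represented as bits in a 32-bit mask, one tracked set stands in for the per-host and per-group state, and there are no timers. All names are hypothetical and nothing here is taken from br_multicast_eht.c.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_mode { SKETCH_INCLUDE, SKETCH_EXCLUDE };

/* Toy tracked state: a filter mode plus a bitmask of source indexes. */
struct sketch_state {
	enum sketch_mode mode;
	uint32_t srcs;
};

/*
 * Apply an IS_IN/IS_EX (to_report == false) or TO_IN/TO_EX
 * (to_report == true) record whose mode is report_mode.
 *
 * Returns true when the state ends up in include mode with an empty
 * source set, which is the case the commit message describes as
 * removable. An exclude-mode state may stay alive with no sources.
 */
static bool sketch_inc_exc(struct sketch_state *s,
			   enum sketch_mode report_mode,
			   bool to_report, uint32_t report_srcs)
{
	if (s->mode != report_mode || to_report) {
		/* Mode change or TO_* record: flush and rebuild the set. */
		s->mode = report_mode;
		s->srcs = report_srcs;
	} else {
		/* IS_* record in the same mode: create missing entries. */
		s->srcs |= report_srcs;
	}
	return s->mode == SKETCH_INCLUDE && s->srcs == 0;
}
```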
* | net: bridge: multicast: add EHT allow/block handling  [Nikolay Aleksandrov, 2021-01-22, 4 files, -14/+290]
| |
| |   Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how
| |   the reports are processed we have 2 cases when the group is in include
| |   or exclude mode, these are processed as follows:
| |    - group include
| |     - allow: create missing entries
| |     - block: remove existing matching entries and remove the corresponding
| |       S,G entries if there are no more set host entries, then possibly
| |       delete the whole group if there are no more S,G entries
| |
| |    - group exclude
| |     - allow
| |       - host include: create missing entries
| |       - host exclude: remove existing matching entries and remove the
| |         corresponding S,G entries if there are no more set host entries
| |     - block
| |       - host include: remove existing matching entries and remove the
| |         corresponding S,G entries if there are no more set host entries,
| |         then possibly delete the whole group if there are no more S,G
| |         entries
| |       - host exclude: create missing entries
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
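The allow/block cases above reduce to one decision: does the record create entries or remove them? The sketch below encodes that table directly. It is a toy model with hypothetical names, sources are bits in a 32-bit mask, and S,G and group deletion are left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_mode { SKETCH_INCLUDE, SKETCH_EXCLUDE };

/*
 * Decide whether an ALLOW (allow == true) or BLOCK (allow == false)
 * record creates entries (true) or removes matching ones (false),
 * following the case table in the commit message.
 */
static bool sketch_record_creates(enum sketch_mode group_mode,
				  enum sketch_mode host_mode, bool allow)
{
	if (group_mode == SKETCH_INCLUDE)
		return allow;		/* allow: create, block: remove */
	if (host_mode == SKETCH_INCLUDE)
		return allow;		/* exclude group, include host */
	return !allow;			/* exclude group, exclude host */
}

/* Apply the record to a host's source bitmask and return the new mask. */
static uint32_t sketch_apply(uint32_t host_srcs,
			     enum sketch_mode group_mode,
			     enum sketch_mode host_mode,
			     bool allow, uint32_t report_srcs)
{
	if (sketch_record_creates(group_mode, host_mode, allow))
		return host_srcs | report_srcs;
	return host_srcs & ~report_srcs;
}
```

For a host in exclude mode the polarity flips: a BLOCK record adds sources to the tracked set and an ALLOW record removes them.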
* | net: bridge: multicast: add EHT host delete function  [Nikolay Aleksandrov, 2021-01-22, 1 file, -0/+17]
| |
| |   Now that we can delete set entries, we can use that to remove EHT hosts.
| |   Since the group's host set entries exist only when there are related
| |   source set entries we just have to flush all source set entries joined
| |   by the host set entry and it will be automatically removed.
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: bridge: multicast: add EHT source set handling functions  [Nikolay Aleksandrov, 2021-01-22, 3 files, -0/+325]
| |
| |   Add EHT source set and set-entry create, delete and lookup functions.
| |   These allow to manipulate source sets which contain their own host sets
| |   with entries which joined that S,G. We're limiting the maximum number of
| |   tracked S,G entries per host to PG_SRC_ENT_LIMIT (currently 32) which is
| |   the current maximum of S,G entries for a group. There's a per-set timer
| |   which will be used to destroy the whole set later.
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: bridge: multicast: add EHT host handling functions  [Nikolay Aleksandrov, 2021-01-22, 3 files, -1/+117]
| |
| |   Add functions to create, destroy and lookup an EHT host. These are
| |   per-host entries contained in the eht_host_tree in net_bridge_port_group
| |   which are used to store a list of all sources (S,G) entries joined for
| |   that group by each host, the host's current filter mode and total number
| |   of joined entries. No functional changes yet, these would be used in
| |   later patches.
| |
| |   Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
| |   Signed-off-by: Jakub Kicinski <kuba@kernel.org>