summaryrefslogtreecommitdiffstats
path: root/net/bridge/br_mdb.c
Commit message (Collapse)AuthorAgeFilesLines
* rtnetlink: bridge: mcast: Relax group address validation in common codeIdo Schimmel2023-03-171-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the upcoming VXLAN MDB implementation, the 0.0.0.0 and :: MDB entries will act as catchall entries for unregistered IP multicast traffic in a similar fashion to the 00:00:00:00:00:00 VXLAN FDB entry that is used to transmit BUM traffic. In deployments where inter-subnet multicast forwarding is used, not all the VTEPs in a tenant domain are members in all the broadcast domains. It is therefore advantageous to transmit BULL (broadcast, unknown unicast and link-local multicast) and unregistered IP multicast traffic on different tunnels. If the same tunnel was used, a VTEP only interested in IP multicast traffic would also pull all the BULL traffic and drop it as it is not a member in the originating broadcast domain [1]. Prepare for this change by allowing the 0.0.0.0 group address in the common rtnetlink MDB code and forbid it in the bridge driver. A similar change is not needed for IPv6 because the common code only validates that the group address is not the all-nodes address. [1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: bridge: mcast: Move MDB handlers out of bridge driverIdo Schimmel2023-03-171-289/+12
| | | | | | | | | | | | | | | | | | | | | Currently, the bridge driver registers handlers for MDB netlink messages, making it impossible for other drivers to implement MDB support. As a preparation for VXLAN MDB support, move the MDB handlers out of the bridge driver to the core rtnetlink code. The rtnetlink code will call into individual drivers by invoking their previously added MDB net device operations. Note that while the diffstat is large, the change is mechanical. It moves code out of the bridge driver to rtnetlink code. Also note that a similar change was made in 2012 with commit 77162022ab26 ("net: add generic PF_BRIDGE:RTM_ FDB hooks") that moved FDB handlers out of the bridge driver to the core rtnetlink code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: mcast: Implement MDB net device operationsIdo Schimmel2023-03-171-0/+124
| | | | | | | | | | | | | | | | | | | | | | Implement the previously added MDB net device operations in the bridge driver so that they could be invoked by core rtnetlink code in the next patch. The operations are identical to the existing br_mdb_{dump,add,del} functions. The '_new' suffix will be removed in the next patch. The functions are re-implemented in this patch to make the conversion in the next patch easier to review. Add dummy implementations when 'CONFIG_BRIDGE_IGMP_SNOOPING' is disabled, so that an error will be returned to user space when it is trying to add or delete an MDB entry. This is consistent with existing behavior where the bridge driver does not even register rtnetlink handlers for RTM_{NEW,DEL,GET}MDB messages when this Kconfig option is disabled. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: mcast: Move validation to a policyIdo Schimmel2023-02-101-18/+27
| | | | | | | | | | | | | | | | Future patches are going to move parts of the bridge MDB code to the common rtnetlink code in preparation for VXLAN MDB support. To facilitate code sharing between both drivers, move the validation of the top level attributes in RTM_{NEW,DEL}MDB messages to a policy that will eventually be moved to the rtnetlink code. Use 'NLA_NESTED' for 'MDBA_SET_ENTRY_ATTRS' instead of NLA_POLICY_NESTED() as this attribute is going to be validated using different policies in the underlying drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Remove pointless sequence generation counter assignmentIdo Schimmel2023-02-101-2/+0
| | | | | | | | | | | | | | | | | | The purpose of the sequence generation counter in the netlink callback is to identify if a multipart dump is consistent or not by calling nl_dump_check_consistent() whenever a message is generated. The function is not invoked by the MDB code, rendering the sequence generation counter assignment pointless. Remove it. Note that even if the function was invoked, we still could not accurately determine if the dump is consistent or not, as there is no sequence generation counter for MDB entries, unlike nexthop objects, for example. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Use correct define in MDB dumpIdo Schimmel2023-02-101-1/+1
| | | | | | | | | | 'MDB_PG_FLAGS_PERMANENT' and 'MDB_PERMANENT' happen to have the same value, but the latter is uAPI and cannot change, so use it when dumping an MDB entry. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: bridge: Add br_multicast_del_port_group()Petr Machata2023-02-061-2/+1
| | | | | | | | | | | | | | | Since cleaning up the effects of br_multicast_new_port_group() just consists of delisting and freeing the memory, the function br_mdb_add_group_star_g() inlines the corresponding code. In the following patches, number of per-port and per-port-VLAN MDB entries is going to be maintained, and that counter will have to be updated. Because that logic is going to be hidden in the br_multicast module, introduce a new hook intended to again remove a newly-created group. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: Move extack-setting to br_multicast_new_port_group()Petr Machata2023-02-061-6/+3
| | | | | | | | | | | | | | | Now that br_multicast_new_port_group() takes an extack argument, move setting the extack there. The downside is that the error messages end up being less specific (the function cannot distinguish between (S,G) and (*,G) groups). However, the alternative is to check in the caller whether the callee set the extack, and if it didn't, set it. But that is only done when the callee is not exactly known. (E.g. in case of a notifier invocation.) Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: Add extack to br_multicast_new_port_group()Petr Machata2023-02-061-2/+3
| | | | | | | | | | | | Make it possible to set an extack in br_multicast_new_port_group(). Eventually, this function will check for per-port and per-port-vlan MDB maximums, and will use the extack to communicate the reason for the bounce. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: mcast: Support replacement of MDB port group entriesIdo Schimmel2022-12-121-5/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that user space can specify additional attributes of port group entries such as filter mode and source list, it makes sense to allow user space to atomically modify these attributes by replacing entries instead of forcing user space to delete the entries and add them back. Replace MDB port group entries when the 'NLM_F_REPLACE' flag is specified in the netlink message header. When a (*, G) entry is replaced, update the following attributes: Source list, state, filter mode, protocol and flags. If the entry is temporary and in EXCLUDE mode, reset the group timer to the group membership interval. If the entry is temporary and in INCLUDE mode, reset the source timers of associated sources to the group membership interval. Examples: # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.2 filter_mode include # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.2 permanent filter_mode include proto static 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto static 0.00 dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode include source_list 192.0.2.2/0.00,192.0.2.1/0.00 proto static 0.00 # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra blocked 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra blocked 0.00 dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra 0.00 # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 temp source_list 192.0.2.4,192.0.2.3 filter_mode include proto bgp # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.4 temp filter_mode include proto bgp 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 temp filter_mode include proto bgp 0.00 dev br0 port dummy10 grp 239.1.1.1 temp filter_mode include source_list 192.0.2.4/259.44,192.0.2.3/259.44 proto bgp 0.00 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Allow user space to specify MDB entry routing protocolIdo Schimmel2022-12-121-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the 'MDBE_ATTR_RTPORT' attribute to allow user space to specify the routing protocol of the MDB port group entry. Enforce a minimum value of 'RTPROT_STATIC' to prevent user space from using protocol values that should only be set by the kernel (e.g., 'RTPROT_KERNEL'). Maintain backward compatibility by defaulting to 'RTPROT_STATIC'. The protocol is already visible to user space in RTM_NEWMDB responses and notifications via the 'MDBA_MDB_EATTR_RTPROT' attribute. The routing protocol allows a routing daemon to distinguish between entries configured by it and those configured by the administrator. Once MDB flush is supported, the protocol can be used as a criterion according to which the flush is performed. Examples: # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto kernel Error: integer out of range. # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto static # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent proto zebra # bridge mdb add dev br0 port dummy10 grp 239.1.1.2 permanent source_list 198.51.100.1,198.51.100.2 filter_mode include proto 250 # bridge -d mdb show dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.2 permanent filter_mode include proto 250 dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.1 permanent filter_mode include proto 250 dev br0 port dummy10 grp 239.1.1.2 permanent filter_mode include source_list 198.51.100.2/0.00,198.51.100.1/0.00 proto 250 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude proto static Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Allow user space to add (*, G) with a source list and filter modeIdo Schimmel2022-12-121-0/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new netlink attributes to the RTM_NEWMDB request that allow user space to add (*, G) with a source list and filter mode. The RTM_NEWMDB message can already dump such entries (created by the kernel) so there is no need to add dump support. However, the message contains a different set of attributes depending if it is a request or a response. The naming and structure of the new attributes try to follow the existing ones used in the response. Request: [ struct nlmsghdr ] [ struct br_port_msg ] [ MDBA_SET_ENTRY ] struct br_mdb_entry [ MDBA_SET_ENTRY_ATTRS ] [ MDBE_ATTR_SOURCE ] struct in_addr / struct in6_addr [ MDBE_ATTR_SRC_LIST ] // new [ MDBE_SRC_LIST_ENTRY ] [ MDBE_SRCATTR_ADDRESS ] struct in_addr / struct in6_addr [ ...] [ MDBE_ATTR_GROUP_MODE ] // new u8 Response: [ struct nlmsghdr ] [ struct br_port_msg ] [ MDBA_MDB ] [ MDBA_MDB_ENTRY ] [ MDBA_MDB_ENTRY_INFO ] struct br_mdb_entry [ MDBA_MDB_EATTR_TIMER ] u32 [ MDBA_MDB_EATTR_SOURCE ] struct in_addr / struct in6_addr [ MDBA_MDB_EATTR_RTPROT ] u8 [ MDBA_MDB_EATTR_SRC_LIST ] [ MDBA_MDB_SRCLIST_ENTRY ] [ MDBA_MDB_SRCATTR_ADDRESS ] struct in_addr / struct in6_addr [ MDBA_MDB_SRCATTR_TIMER ] u8 [...] [ MDBA_MDB_EATTR_GROUP_MODE ] u8 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Add support for (*, G) with a source list and filter modeIdo Schimmel2022-12-121-3/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for allowing user space to add (*, G) entries with a source list and associated filter mode, add the necessary plumbing to handle such requests. Extend the MDB configuration structure with a currently empty source array and filter mode that is currently hard coded to EXCLUDE. Add the source entries and the corresponding (S, G) entries before making the new (*, G) port group entry visible to the data path. Handle the creation of each source entry in a similar fashion to how it is created from the data path in response to received Membership Reports: Create the source entry, arm the source timer (if needed), add a corresponding (S, G) forwarding entry and finally mark the source entry as installed (by user space). Add the (S, G) entry by populating an MDB configuration structure and calling br_mdb_add_group_sg() as if a new entry is created by user space, with the sole difference that the 'src_entry' field is set to make sure that the group timer of such entries is never armed. Note that it is not currently possible to add more than 32 source entries to a port group entry. If this proves to be a problem we can either increase 'PG_SRC_ENT_LIMIT' or avoid forcing a limit on entries created by user space. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Avoid arming group timer when (S, G) corresponds to a sourceIdo Schimmel2022-12-121-1/+1
| | | | | | | | | | | | | | | | | | User space will soon be able to install a (*, G) with a source list, prompting the creation of a (S, G) entry for each source. In this case, the group timer of the (S, G) entry should never be set. Solve this by adding a new field to the MDB configuration structure that denotes whether the (S, G) corresponds to a source or not. The field will be set in a subsequent patch where br_mdb_add_group_sg() is called in order to create a (S, G) entry for each user provided source. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Add a centralized error pathIdo Schimmel2022-12-121-4/+6
| | | | | | | | | | | | | | | | | Subsequent patches will add memory allocations in br_mdb_config_init() as the MDB configuration structure will include a linked list of source entries. This memory will need to be freed regardless if br_mdb_add() succeeded or failed. As a preparation for this change, add a centralized error path where the memory will be freed. Note that br_mdb_del() already has one error path and therefore does not require any changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Place netlink policy before validation functionsIdo Schimmel2022-12-121-6/+6
| | | | | | | | | | | | | Subsequent patches are going to add additional validation functions and netlink policies. Some of these functions will need to perform parsing using nla_parse_nested() and the new policies. In order to keep all the policies next to each other, move the current policy to before the validation functions. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Split (*, G) and (S, G) addition into different functionsIdo Schimmel2022-12-121-49/+96
| | | | | | | | | | | | | | | | | | | | | | When the bridge is using IGMP version 3 or MLD version 2, it handles the addition of (*, G) and (S, G) entries differently. When a new (S, G) port group entry is added, all the (*, G) EXCLUDE ports need to be added to the port group of the new entry. Similarly, when a new (*, G) EXCLUDE port group entry is added, the port needs to be added to the port group of all the matching (S, G) entries. Subsequent patches will create more differences between both entry types. Namely, filter mode and source list can only be specified for (*, G) entries. Given the current and future differences between both entry types, handle the addition of each entry type in a different function, thereby avoiding the creation of one complex function. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Do not derive entry type from its filter modeIdo Schimmel2022-12-121-6/+3
| | | | | | | | | | | | | | | | | | Currently, the filter mode (i.e., INCLUDE / EXCLUDE) of MDB entries cannot be set from user space. Instead, it is set by the kernel according to the entry type: (*, G) entries are treated as EXCLUDE and (S, G) entries are treated as INCLUDE. This allows the kernel to derive the entry type from its filter mode. Subsequent patches will allow user space to set the filter mode of (*, G) entries, making the current assumption incorrect. As a preparation, remove the current assumption and instead determine the entry type from its key, which is a more direct way. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Remove redundant function argumentsIdo Schimmel2022-12-071-4/+5
| | | | | | | | | Drop the first three arguments and instead extract them from the MDB configuration structure. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Move checks out of critical sectionIdo Schimmel2022-12-071-18/+18
| | | | | | | | | | | | | The checks only require information parsed from the RTM_NEWMDB netlink message and do not rely on any state stored in the bridge driver. Therefore, there is no need to perform the checks in the critical section under the multicast lock. Move the checks out of the critical section. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Remove br_mdb_parse()Ido Schimmel2022-12-071-88/+5
| | | | | | | | | | | | The parsing of the netlink messages and the validity checks are now performed in br_mdb_config_init() so we can remove br_mdb_parse(). This finally allows us to stop passing netlink attributes deep in the MDB control path and only use the MDB configuration structure. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Use MDB group key from configuration structureIdo Schimmel2022-12-071-8/+7
| | | | | | | | | | | | | The MDB group key (i.e., {source, destination, protocol, VID}) is currently determined under the multicast lock from the netlink attributes. Instead, use the group key from the MDB configuration structure that was prepared before acquiring the lock. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Propagate MDB configuration structure furtherIdo Schimmel2022-12-071-13/+11
| | | | | | | | | | | | As an intermediate step towards only using the new MDB configuration structure, pass it further in the control path instead of passing individual attributes. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Use MDB configuration structure where possibleIdo Schimmel2022-12-071-19/+15
| | | | | | | | | | | | | | | The MDB configuration structure (i.e., struct br_mdb_config) now includes all the necessary information from the parsed RTM_{NEW,DEL}MDB netlink messages, so use it. This will later allow us to delete the calls to br_mdb_parse() from br_mdb_add() and br_mdb_del(). No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Remove redundant checksIdo Schimmel2022-12-071-54/+9
| | | | | | | | | | | These checks are now redundant as they are performed by br_mdb_config_init() while parsing the RTM_{NEW,DEL}MDB messages. Remove them. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Centralize netlink attribute parsingIdo Schimmel2022-12-071-0/+120
| | | | | | | | | | | | | | | | | | | | | Netlink attributes are currently passed deep in the MDB creation call chain, making it difficult to add new attributes. In addition, some validity checks are performed under the multicast lock although they can be performed before it is ever acquired. As a first step towards solving these issues, parse the RTM_{NEW,DEL}MDB messages into a configuration structure, relieving other functions from the need to handle raw netlink attributes. Subsequent patches will convert the MDB code to use this configuration structure. This is consistent with how other rtnetlink objects are handled, such as routes and nexthops. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bridge: mcast: Simplify MDB entry creationIdo Schimmel2022-10-191-8/+3
| | | | | | | | | | | | Before creating a new MDB entry, br_multicast_new_group() will call br_mdb_ip_get() to see if one exists and return it if so. Therefore, simply call br_multicast_new_group() and omit the call to br_mdb_ip_get(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: allow add/remove permanent mdb entries on disabled portsCasper Andersson2022-06-151-6/+9
| | | | | | | | | | Adding mdb entries on disabled ports allows you to do setup before accepting any traffic, avoiding any time where the port is not in the multicast group. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: add support for host l2 mdb entriesJoachim Wiberg2022-04-121-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | This patch expands on the earlier work on layer-2 mdb entries by adding support for host entries. Due to the fact that host joined entries do not have any flag field, we infer the permanent flag when reporting the entries to userspace, which otherwise would be listed as 'temp'. Before patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent Error: bridge: Flags are not allowed for host groups. ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee Error: bridge: Only permanent L2 entries allowed. After patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent ~# bridge mdb show dev br0 port br0 grp 01:00:00:c0:ff:ee permanent vid 1 Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: bridge: mdb: move all switchdev logic to br_switchdev.cVladimir Oltean2021-10-281-244/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following functions: br_mdb_complete br_switchdev_mdb_populate br_mdb_replay_one br_mdb_queue_one br_mdb_replay br_mdb_switchdev_host_port br_mdb_switchdev_host br_switchdev_mdb_notify are only accessible from code paths where CONFIG_NET_SWITCHDEV is enabled. So move them to br_switchdev.c, in order for that code to be compiled out if that config option is disabled. Note that br_switchdev.c gets build regardless of whether CONFIG_BRIDGE_IGMP_SNOOPING is enabled or not, whereas br_mdb.c only got built when CONFIG_BRIDGE_IGMP_SNOOPING was enabled. So to preserve correct compilation with CONFIG_BRIDGE_IGMP_SNOOPING being disabled, we must now place an #ifdef around these functions in br_switchdev.c. The offending bridge data structures that need this are br->multicast_lock and br->mdb_list, these are also compiled out of struct net_bridge when CONFIG_BRIDGE_IGMP_SNOOPING is turned off. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: bridge: split out the switchdev portion of br_mdb_notifyVladimir Oltean2021-10-281-27/+35
| | | | | | | | | | Similar to fdb_notify() and br_switchdev_fdb_notify(), split the switchdev specific logic from br_mdb_notify() into a different function. This will be moved later in br_switchdev.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: make use of helper netif_is_bridge_master()Kyungrok Chung2021-10-161-2/+2
| | | | | | | | | Make use of netdev helper functions to improve code readability. Replace 'dev->priv_flags & IFF_EBRIDGE' with netif_is_bridge_master(dev). Signed-off-by: Kyungrok Chung <acadx0@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: vlan: account for router port lists when notifyingNikolay Aleksandrov2021-08-171-0/+30
| | | | | | | | | | When sending a global vlan notification we should account for the number of router ports when allocating the skb, otherwise we might end up losing notifications. Fixes: dc002875c22b ("net: bridge: vlan: use br_rports_fill_info() to export mcast router ports") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: vlan: use br_rports_fill_info() to export mcast router portsNikolay Aleksandrov2021-08-111-13/+4
| | | | | | | | | | | | | | | | | | | Embed the standard multicast router port export by br_rports_fill_info() into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS. In order to have the same format for the global bridge mcast context and the per-vlan mcast context we need a double-nesting: - BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS - MDBA_ROUTER Currently we don't compare router lists, if any router port exists in the bridge mcast contexts we consider their option sets as different and export them separately. In addition we export the router port vlan id when dumping similar to the router port notification format. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: mcast: use the proper multicast context when dumping router portsNikolay Aleksandrov2021-08-111-18/+31
| | | | | | | | | When we are dumping the router ports of a vlan mcast context we need to use the bridge/vlan and port/vlan's multicast contexts to check if IPv4/IPv6 router port is present and later to dump the vlan id. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: move the switchdev object replay helpers to "push" modeVladimir Oltean2021-07-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some bridge helpers that replay switchdev events (FDB/MDB/VLAN additions and deletions) that can be lost by the switchdev drivers in a variety of circumstances: - an IP multicast group was host-joined on the bridge itself before any switchdev port joined the bridge, leading to the host MDB entries missing in the hardware database. - during the bridge creation process, the MAC address of the bridge was added to the FDB as an entry pointing towards the bridge device itself, but with no switchdev ports being part of the bridge yet, this local FDB entry would remain unknown to the switchdev hardware database. - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface, before any switchdev port joined that LAG, leading to the hardware database missing those entries. - a switchdev port left a LAG that is a bridge port, while the LAG remained part of the bridge, and all FDB/MDB/VLAN entries remained installed in the hardware database of the switchdev port. Also, since commit 0d2cfbd41c4a ("net: bridge: ignore switchdev events for LAG ports which didn't request replay"), DSA introduced a method, based on a const void *ctx, to ensure that two switchdev ports under the same LAG that is a bridge port do not see the same MDB/VLAN entry being replayed twice by the bridge, once for every bridge port that joins the LAG. With so many ordering corner cases being possible, it seems unreasonable to expect a switchdev driver writer to get it right from the first try. Therefore, now that DSA has experimented with the bridge replay helpers for a little bit, we can move the code to the bridge driver where it is more readily available to all switchdev drivers. To convert the switchdev object replay helpers from "pull mode" (where the driver asks for them) to a "push mode" (where the bridge offers them automatically), the biggest problem is that the bridge needs to be aware when a switchdev port joins and leaves, even when the switchdev is only indirectly a bridge port (for example when the bridge port is a LAG upper of the switchdev). Luckily, we already have a hook for that, in the form of the newly introduced switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() calls. These offer a natural place for hooking the object addition and deletion replays. Extend the above 2 functions with: - pointers to the switchdev atomic notifier (for FDB replays) and the blocking notifier (for MDB and VLAN replays). - the "const void *ctx" argument required for drivers to be able to disambiguate between which port is targeted, when multiple ports are lowers of the same LAG that is a bridge port. Most of the drivers pass NULL to this argument, except the ones that support LAG offload and have the proper context check already in place in the switchdev blocking notifier handler. Also unexport the replay helpers, since nobody except the bridge calls them directly now. Note that: (a) we abuse the terminology slightly, because FDB entries are not "switchdev objects", but we count them as objects nonetheless. With no direct way to prove it, I think they are not modeled as switchdev objects because those can only be installed by the bridge to the hardware (as opposed to FDB entries which can be propagated in the other direction too). This is merely an abuse of terms, FDB entries are replayed too, despite not being objects. (b) the bridge does not attempt to sync port attributes to newly joined ports, just the countable stuff (the objects). The reason for this is simple: no universal and symmetric way to sync and unsync them is known. For example, VLAN filtering: what to do on unsync, disable or leave it enabled? Similarly, STP state, ageing timer, etc etc. What a switchdev port does when it becomes standalone again is not really up to the bridge's competence, and the driver should deal with it. On the other hand, replaying deletions of switchdev objects can be seen a matter of cleanup and therefore be treated by the bridge, hence this patch. We make the replay helpers opt-in for drivers, because they might not bring immediate benefits for them: - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(), so br_vlan_replay() should not do anything for the new drivers on which we call it. The existing drivers where there was even a slight possibility for there to exist a VLAN on a bridge port before they join it are already guarded against this: mlxsw and prestera deny joining LAG interfaces that are members of a bridge. - br_fdb_replay() should now notify of local FDB entries, but I patched all drivers except DSA to ignore these new entries in commit 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications"). Driver authors can lift this restriction as they wish, and when they do, they can also opt into the FDB replay functionality. - br_mdb_replay() should fix a real issue which is described in commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"). However most drivers do not offload the SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw offload this switchdev object, and I don't completely understand the way in which they offload this switchdev object anyway. So I'll leave it up to these drivers' respective maintainers to opt into br_mdb_replay(). So most of the drivers pass NULL notifier blocks for the replay helpers, except: - dpaa2-switch which was already acked/regression-tested with the helpers enabled (and there isn't much of a downside in having them) - ocelot which already had replay logic in "pull" mode - DSA which already had replay logic in "pull" mode An important observation is that the drivers which don't currently request bridge event replays don't even have the switchdev_bridge_port_{offload,unoffload} calls placed in proper places right now. This was done to avoid unnecessary rework for drivers which might never even add support for this. For driver writers who wish to add replay support, this can be used as a tentative placement guide: https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/ Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: guard the switchdev replay helpers against a NULL notifier blockVladimir Oltean2021-07-221-0/+3
| | | | | | | | | | | | | | | There is a desire to make the object and FDB replay helpers optional when moving them inside the bridge driver. For example a certain driver might not offload host MDBs and there is no case where the replay helpers would be of immediate use to it. So it would be nice if we could allow drivers to pass NULL pointers for the atomic and blocking notifier blocks, and the replay helpers to do nothing in that case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: multicast: add context support for host-joined groupsNikolay Aleksandrov2021-07-211-1/+1
| | | | | | | | | Adding bridge multicast context support for host-joined groups is easy because we only need the proper timer value. We pass the already chosen context and use its timer value. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: multicast: add mdb context supportNikolay Aleksandrov2021-07-211-3/+40
| | | | | | | | | | | Choose the proper bridge multicast context when user-spaces is adding mdb entries. Currently we require the vlan to be configured on at least one device (port or bridge) in order to add an mdb entry if vlan mcast snooping is enabled (vlan snooping implies vlan filtering). Note that we always allow deleting an entry, regardless of the vlan state. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: multicast: include router port vlan id in notificationsNikolay Aleksandrov2021-07-201-7/+22
| | | | | | | | Use the port multicast context to check if the router port is a vlan and in case it is include its vlan id in the notification. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: multicast: use multicast contexts instead of bridge or portNikolay Aleksandrov2021-07-201-1/+1
| | | | | | | | | Pass multicast context pointers to multicast functions instead of bridge/port. This would make it easier later to switch these contexts to their per-vlan versions. The patch is basically search and replace, no functional changes. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: multicast: factor out bridge multicast contextNikolay Aleksandrov2021-07-201-11/+12
| | | | | | | | | Factor out the bridge's global multicast context into a separate structure which will later be used for per-vlan global context. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: multicast: factor out port multicast contextNikolay Aleksandrov2021-07-201-5/+5
| | | | | | | | | | Factor out the port's multicast context into a separate structure which will later be shared for per-port,vlan context. No functional changes intended. We need the structure even if bridge multicast is not defined to pass down as pointer to forwarding functions. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: allow the switchdev replay functions to be called for deletionVladimir Oltean2021-06-281-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a switchdev port leaves a LAG that is a bridge port, the switchdev objects and port attributes offloaded to that port are not removed: ip link add br0 type bridge ip link add bond0 type bond mode 802.3ad ip link set swp0 master bond0 ip link set bond0 master br0 bridge vlan add dev bond0 vid 100 ip link set swp0 nomaster VLAN 100 will remain installed on swp0 despite it going into standalone mode, because as far as the bridge is concerned, nothing ever happened to its bridge port. Let's extend the bridge vlan, fdb and mdb replay functions to take a 'bool adding' argument, and make DSA and ocelot call the replay functions with 'adding' as false from the switchdev unsync path, for the switch port that leaves the bridge. Note that this patch in itself does not salvage anything, because in the current pull mode of operation, DSA still needs to call the replay helpers with adding=false. This will be done in another patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: constify variables in the replay helpersVladimir Oltean2021-06-281-4/+4
| | | | | | | | | Some of the arguments and local variables for the newly added switchdev replay helpers can be const, so let's make them so. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: ignore switchdev events for LAG ports which didn't request replayVladimir Oltean2021-06-281-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a slight inconvenience in the switchdev replay helpers added recently, and this is when: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 bridge vlan add dev bond0 vid 100 ip link set swp0 master bond0 ip link set swp1 master bond0 Since the underlying driver (currently only DSA) asks for a replay of VLANs when swp0 and swp1 join the LAG because it is bridged, what will happen is that DSA will try to react twice on the VLAN event for swp0. This is not really a huge problem right now, because most drivers accept duplicates since the bridge itself does, but it will become a problem when we add support for replaying switchdev object deletions. Let's fix this by adding a blank void *ctx in the replay helpers, which will be passed on by the bridge in the switchdev notifications. If the context is NULL, everything is the same as before. But if the context is populated with a valid pointer, the underlying switchdev driver (currently DSA) can use the pointer to 'see through' the bridge port (which in the example above is bond0) and 'know' that the event is only for a particular physical port offloading that bridge port, and not for all of them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: mcast: add ip4+ip6 mcast router timers to mdb netlinkLinus Lüssing2021-05-131-1/+7
| | | | | | | | | Now that we have split the multicast router state into two, one for IPv4 and one for IPv6, also add individual timers to the mdb netlink router port dump. Leaving the old timer attribute for backwards compatibility. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: mcast: split multicast router state for IPv4 and IPv6Linus Lüssing2021-05-131-0/+10
| | | | | | | | | | | | | | | | | | A multicast router for IPv4 does not imply that the same host also is a multicast router for IPv6 and vice versa. To reduce multicast traffic when a host is only a multicast router for one of these two protocol families, keep router state for IPv4 and IPv6 separately. Similar to how querier state is kept separately. For backwards compatibility for netlink and switchdev notifications these two will still only notify if a port switched from either no IPv4/IPv6 multicast router to any IPv4/IPv6 multicast router or the other way round. However a full netlink MDB router dump will now also include a multicast router timeout for both IPv4 and IPv6. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: mcast: prepare mdb netlink for mcast router splitLinus Lüssing2021-05-131-5/+34
| | | | | | | | | | | | In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add some inline functions for the protocol specific parts in the mdb router netlink code. Also the we need iterate over the port instead of router list to be able put one router port entry with both the IPv4 and IPv6 multicast router info later. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: mcast: rename multicast router lists and timersLinus Lüssing2021-05-131-3/+3
| | | | | | | | | In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants, rename the affected variable to the IPv4 version first to avoid some renames in later commits. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>