summaryrefslogtreecommitdiffstats
path: root/net/dsa
Commit message (Collapse)AuthorAgeFilesLines
* net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TXVladimir Oltean2023-04-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocelot_xmit_get_vlan_info() calls __skb_vlan_pop() as the most appropriate helper I could find which strips away a VLAN header. That's all I need it to do, but __skb_vlan_pop() has more logic, which will become incompatible with the future revert of commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). Namely, it performs a sanity check on skb_mac_header(), which will stop being set after the above revert, so it will return an error instead of removing the VLAN tag. ocelot_xmit_get_vlan_info() gets called in 2 circumstances: (1) the port is under a VLAN-aware bridge and the bridge sends VLAN-tagged packets (2) the port is under a VLAN-aware bridge and somebody else (an 8021q upper) sends VLAN-tagged packets (using a VID that isn't in the bridge vlan tables) In case (1), there is actually no bug to defend against, because br_dev_xmit() calls skb_reset_mac_header() and things continue to work. However, in case (2), illustrated using the commands below, it can be seen that our intervention is needed, since __skb_vlan_pop() complains: $ ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up $ ip link set $eth master br0 && ip link set $eth up $ ip link add link $eth name $eth.100 type vlan id 100 && ip link set $eth.100 up $ ip addr add 192.168.100.1/24 dev $eth.100 I could fend off the checks in __skb_vlan_pop() with some skb_mac_header_was_set() calls, but seeing how few callers of __skb_vlan_pop() there are from TX paths, that seems rather unproductive. As an alternative solution, extract the bare minimum logic to strip a VLAN header, and move it to a new helper named vlan_remove_tag(), close to the definition of vlan_insert_tag(). Document it appropriately and make ocelot_xmit_get_vlan_info() call this smaller helper instead. Seeing that it doesn't appear illegal to test skb->protocol in the TX path, I guess it would be a good for vlan_remove_tag() to also absorb the vlan_set_encap_proto() function call. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: update TX path comments to not mention skb_mac_header()Vladimir Oltean2023-04-232-3/+3
| | | | | | | | | | | | | | | | Once commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") will be reverted, it will no longer be true that skb->data points at skb_mac_header(skb) - since the skb->mac_header will not be set - so stop saying that, and just say that it points to the MAC header. I've reviewed vlan_insert_tag() and it does not *actually* depend on skb_mac_header(), so reword that to avoid the confusion. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: tag_sja1105: replace skb_mac_header() with vlan_eth_hdr()Vladimir Oltean2023-04-231-1/+1
| | | | | | | | | | | This is a cosmetic patch which consolidates the code to use the helper function offered by if_vlan.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: tag_sja1105: don't rely on skb_mac_header() in TX pathsVladimir Oltean2023-04-231-1/+1
| | | | | | | | | | | | | | skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: tag_ksz: do not rely on skb_mac_header() in TX pathsVladimir Oltean2023-04-231-9/+9
| | | | | | | | | | | | | | skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_eth_hdr() to get to the Ethernet header's MAC DA instead, helper which assumes this header is located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: tag_ocelot: do not rely on skb_mac_header() for VLAN xmitVladimir Oltean2023-04-231-1/+1
| | | | | | | | | | | | | | skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: add trace points for VLAN operationsVladimir Oltean2023-04-122-5/+137
| | | | | | | | | These are not as critical as the FDB/MDB trace points (I'm not aware of outstanding VLAN related bugs), but maybe they are useful to somebody, either debugging something or simply trying to learn more. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: add trace points for FDB/MDB operationsVladimir Oltean2023-04-124-12/+423
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | DSA performs non-trivial housekeeping of unicast and multicast addresses on shared (CPU and DSA) ports, and puts a bit of pressure on higher layers, requiring them to behave correctly (remove these addresses exactly as many times as they were added). Otherwise, either addresses linger around forever, or DSA returns -ENOENT complaining that entries that were already deleted must be deleted again. To aid debugging, introduce some trace points specifically for FDB and MDB - that's where some of the bugs still are right now. Some bugs I have seen were also due to race conditions, see: - 630fd4822af2 ("net: dsa: flush switchdev workqueue on bridge join error path") - a2614140dc0f ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN") so it would be good to not disturb the timing too much, hence the choice to use trace points vs regular dev_dbg(). I've had these for some time on my computer in a less polished form, and they've proven useful. What I found most useful was to enable CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline, and run "cat /sys/kernel/debug/tracing/trace". This is to debug more complex environments with network managers started by the init system, things like that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stubVladimir Oltean2023-04-096-13/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a netdev notifier for DSA to reject PTP on DSA master"), due to a desire to convert DSA's attempt to deny TX timestamping on a DSA master to something that doesn't block the kernel-wide API conversion from ndo_eth_ioctl() to ndo_hwtstamp_set(). What was required was a mechanism that did not depend on ndo_eth_ioctl(), and what was provided was a mechanism that did not depend on ndo_eth_ioctl(), while at the same time introducing something that wasn't absolutely necessary - a new netdev notifier. There have been objections from Jakub Kicinski that using notifiers in general when they are not absolutely necessary creates complications to the control flow and difficulties to maintainers who look at the code. So there is a desire to not use notifiers. In addition to that, the notifier chain gets called even if there is no DSA in the system and no one is interested in applying any restriction. Take the model of udp_tunnel_nic_ops and introduce a stub mechanism, through which net/core/dev_ioctl.c can call into DSA even when CONFIG_NET_DSA=m. Compared to the code that existed prior to the notifier conversion, aka what was added in commits: - 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops") - 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers") this is different because we are not overloading any struct net_device_ops of the DSA master anymore, but rather, we are exposing a rather specific functionality which is orthogonal to which API is used to enable it - ndo_eth_ioctl() or ndo_hwtstamp_set(). Also, what is similar is that both approaches use function pointers to get from built-in code to DSA. There is no point in replicating the function pointers towards __dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr). Instead, it is sufficient to introduce a singleton struct dsa_stubs, built into the kernel, which contains a single function pointer to __dsa_master_hwtstamp_validate(). I find this approach preferable to what we had originally, because dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any attempts to add any data encapsulation and hide DSA data structures from the outside world. Link: https://lore.kernel.org/netdev/20230403083019.120b72fd@kernel.org/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: create a netdev notifier for DSA to reject PTP on DSA masterVladimir Oltean2023-04-033-35/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fact that PTP 2-step TX timestamping is broken on DSA switches if the master also timestamps the same packets is documented by commit f685e609a301 ("net: dsa: Deny PTP on master if switch supports it"). We attempt to help the users avoid shooting themselves in the foot by making DSA reject the timestamping ioctls on an interface that is a DSA master, and the switch tree beneath it contains switches which are aware of PTP. The only problem is that there isn't an established way of intercepting ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network stack by creating a struct dsa_netdevice_ops with overlaid function pointers that are manually checked from the relevant call sites. There used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only one left. There is an ongoing effort to migrate driver-visible hardware timestamping control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set() model, but DSA actively prevents that migration, since dsa_master_ioctl() is currently coded to manually call the master's legacy ndo_eth_ioctl(), and so, whenever a network device driver would be converted to the new API, DSA's restrictions would be circumvented, because any device could be used as a DSA master. The established way for unrelated modules to react on a net device event is via netdevice notifiers. So we create a new notifier which gets called whenever there is an attempt to change hardware timestamping settings on a device. Finally, there is another reason why a netdev notifier will be a good idea, besides strictly DSA, and this has to do with PHY timestamping. With ndo_eth_ioctl(), all MAC drivers must manually call phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP, otherwise they must pass this ioctl to the PHY driver via phy_mii_ioctl(). With the new ndo_hwtstamp_set() API, it will be desirable to simply not make any calls into the MAC device driver when timestamping should be performed at the PHY level. But there exist drivers, such as the lan966x switch, which need to install packet traps for PTP regardless of whether they are the layer that provides the hardware timestamps, or the PHY is. That would be impossible to support with the new API. The proposal there, too, is to introduce a netdev notifier which acts as a better cue for switching drivers to add or remove PTP packet traps, than ndo_hwtstamp_set(). The one introduced here "almost" works there as well, except for the fact that packet traps should only be installed if the PHY driver succeeded to enable hardware timestamping, whereas here, we need to deny hardware timestamping on the DSA master before it actually gets enabled. This is why this notifier is called "PRE_", and the notifier that would get used for PHY timestamping and packet traps would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing. In expectation of future netlink UAPI, we also pass a non-NULL extack pointer to the netdev notifier, and we make DSA populate it with an informative reason for the rejection. To avoid making it go to waste, we make the ioctl-based dev_set_hwtstamp() create a fake extack and print the message to the kernel log. Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/ Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: make dsa_port_supports_hwtstamp() construct a fake ifreqVladimir Oltean2023-04-033-6/+8
| | | | | | | | | | | | | | | | | | | | | dsa_master_ioctl() is in the process of getting converted to a different API, where we won't have access to a struct ifreq * anymore, but rather, to a struct kernel_hwtstamp_config. Since ds->ops->port_hwtstamp_get() still uses struct ifreq *, this creates a difficult situation where we have to make up such a dummy pointer. The conversion is a bit messy, because it forces a "good" implementation of ds->ops->port_hwtstamp_get() to return -EFAULT in copy_to_user() because of the NULL ifr->ifr_data pointer. However, it works, and it is only a transient step until ds->ops->port_hwtstamp_get() gets converted to the new API which passes struct kernel_hwtstamp_config and does not call copy_to_user(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: fix db type confusion in host fdb/mdb add/delVladimir Oltean2023-03-301-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have the following code paths: Host FDB (unicast RX filtering): dsa_port_standalone_host_fdb_add() dsa_port_bridge_host_fdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_add() dsa_port_standalone_host_fdb_del() dsa_port_bridge_host_fdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_del() Host MDB (multicast RX filtering): dsa_port_standalone_host_mdb_add() dsa_port_bridge_host_mdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_add() dsa_port_standalone_host_mdb_del() dsa_port_bridge_host_mdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_del() The logic added by commit 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") zeroes out db.bridge.num if the switch doesn't support ds->fdb_isolation (the majority doesn't). This is done for a reason explained in commit c26933639b54 ("net: dsa: request drivers to perform FDB isolation"). Taking a single code path as example - dsa_port_host_fdb_add() - the others are similar - the problem is that this function handles: - DSA_DB_PORT databases, when called from dsa_port_standalone_host_fdb_add() - DSA_DB_BRIDGE databases, when called from dsa_port_bridge_host_fdb_add() So, if dsa_port_host_fdb_add() were to make any change on the "bridge.num" attribute of the database, this would only be correct for a DSA_DB_BRIDGE, and a type confusion for a DSA_DB_PORT bridge. However, this bug is without consequences, for 2 reasons: - dsa_port_standalone_host_fdb_add() is only called from code which is (in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that function only returns true if ds->fdb_isolation is set. So, the code only executed for DSA_DB_BRIDGE databases. - Even if the code was not dead for DSA_DB_PORT, we have the following memory layout: struct dsa_bridge { struct net_device *dev; unsigned int num; bool tx_fwd_offload; refcount_t refcount; }; struct dsa_db { enum dsa_db_type type; union { const struct dsa_port *dp; // DSA_DB_PORT struct dsa_lag lag; struct dsa_bridge bridge; // DSA_DB_BRIDGE }; }; So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of type DSA_DB_PORT would access memory which is unused, because we only use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address with dsa_db :: dev for DSA_DB_BRIDGE, thanks to the union definition. It is correct to fix up dsa_db :: bridge :: num only from code paths that come from the bridge / switchdev, so move these there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329133819.697642-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: sync unicast and multicast addresses for VLAN filters tooVladimir Oltean2023-03-301-5/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If certain conditions are met, DSA can install all necessary MAC addresses on the CPU ports as FDB entries and disable flooding towards the CPU (we call this RX filtering). There is one corner case where this does not work. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up ip link set swp0 master br0 && ip link set swp0 up ip link add link swp0 name swp0.100 type vlan id 100 ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100 Traffic through swp0.100 is broken, because the bridge turns on VLAN filtering in the swp0 port (causing RX packets to be classified to the FDB database corresponding to the VID from their 802.1Q header), and although the 8021q module does call dev_uc_add() towards the real device, that API is VLAN-unaware, so it only contains the MAC address, not the VID; and DSA's current implementation of ndo_set_rx_mode() is only for VID 0 (corresponding to FDB entries which are installed in an FDB database which is only hit when the port is VLAN-unaware). It's interesting to understand why the bridge does not turn on IFF_PROMISC for its swp0 bridge port, and it may appear at first glance that this is a regression caused by the logic in commit 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode."). After all, a bridge port needs to have IFF_PROMISC by its very nature - it needs to receive and forward frames with a MAC DA different from the bridge ports' MAC addresses. While that may be true, when the bridge is VLAN-aware *and* it has a single port, there is no real reason to enable promiscuity even if that is an automatic port, with flooding and learning (there is nowhere for packets to go except to the BR_FDB_LOCAL entries), and this is how the corner case appears. Adding a second automatic interface to the bridge would make swp0 promisc as well, and would mask the corner case. Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't pass a VLAN ID), the only way to address that problem is to install host FDB entries for the cartesian product of RX filtering MAC addresses and VLAN RX filters. Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: tag_brcm: legacy: fix daisy-chained switchesÁlvaro Fernández Rojas2023-03-211-2/+8
| | | | | | | | | | | | | | | | | | | When BCM63xx internal switches are connected to switches with a 4-byte Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one based on its PVID (which is likely 0). Right now, the packet is received by the BCM63xx internal switch and the 6-byte tag is properly processed. The next step would to decode the corresponding 4-byte tag. However, the internal switch adds an invalid VLAN tag after the 6-byte tag and the 4-byte tag handling fails. In order to fix this we need to remove the invalid VLAN tag after the 6-byte tag before passing it to the 4-byte tag decoding. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230319095540.239064-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: report rx_bytes unadjusted for ETH_HLENVladimir Oltean2023-03-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We collect the software statistics counters for RX bytes (reported to /proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when skb->len has already been adjusted by the eth_type_trans() -> skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header. This means that when connecting 2 DSA interfaces back to back and sending 1 packet with length 100, the sending interface will report tx_bytes as incrementing by 100, and the receiving interface will report rx_bytes as incrementing by 86. Since accounting for that in scripts is quirky and is something that would be DSA-specific behavior (requiring users to know that they are running on a DSA interface in the first place), the proposal is that we treat it as a bug and fix it. This design bug has always existed in DSA, according to my analysis: commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") also updates skb->dev->stats.rx_bytes += skb->len after the eth_type_trans() call. Technically, prior to Florian's commit a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and every vendor-specific tagging protocol driver open-coded the same bug, until the buggy code was consolidated into something resembling what can be seen now. So each and every driver should have its own Fixes: tag, because of their different histories until the convergence point. I'm not going to do that, for the sake of simplicity, but just blame the oldest appearance of buggy code. There are 2 ways to fix the problem. One is the obvious way, and the other is how I ended up doing it. Obvious would have been to move dev_sw_netstats_rx_add() one line above eth_type_trans(), and below skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that. We count the bytes after removing everything DSA-related from the packet, to emulate what the packet's length was, on the wire, when the user port received it. When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet, so in case the switch driver requests this behavior - commit 412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the details - the obvious variant of the fix wouldn't have worked, because the positioning there would have also counted the not-yet-stripped VLAN header length, something which is absent from the packet as seen on the wire (there it may be untagged, whereas software will see it as PVID-tagged). Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu()Vladimir Oltean2023-03-161-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when dsa_slave_change_mtu() is called on a user port where dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code will stumble upon this check: if (new_master_mtu > mtu_limit) return -ERANGE; because new_master_mtu is adjusted for the tagger overhead but mtu_limit is not. But it would be good if the logic went through, for example if the DSA master really depends on an MTU adjustment to accept DSA-tagged frames. To make the code pass through the check, we need to adjust mtu_limit for the overhead as well, if the minimum restriction was caused by the DSA user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU are always offset by the protocol overhead. Currently no drivers return 1500 .port_max_mtu(), but this is only temporary and a bug in itself - mv88e6xxx should have done that, but since commit b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports") it no longer does. This is a preparation for fixing that. Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: use NL_SET_ERR_MSG_WEAK_MOD() more consistentlyVladimir Oltean2023-02-031-5/+6
| | | | | | | | | | | | | | | | Now that commit 028fb19c6ba7 ("netlink: provide an ability to set default extack message") provides a weak function that doesn't override an existing extack message provided by the driver, it makes sense to use it also for LAG and HSR offloading, not just for bridge offloading. Also consistently put the message string on a separate line, to reduce line length from 92 to 84 characters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230202140354.3158129-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: Use sysfs_emit() to instead of sprintf()Bo Liu2023-02-021-1/+1
| | | | | | | | | | Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20230201081438.3151-1-liubo03@inspur.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* netlink: provide an ability to set default extack messageLeon Romanovsky2023-02-012-6/+2
| | | | | | | | | | | | | | | | | In netdev common pattern, extack pointer is forwarded to the drivers to be filled with error message. However, the caller can easily overwrite the filled message. Instead of adding multiple "if (!extack->_msg)" checks before any NL_SET_ERR_MSG() call, which appears after call to the driver, let's add new macro to common code. [1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: microchip: enable port queues for tc mqprioArun Ramadoss2023-01-231-0/+15
| | | | | | | | | | | | LAN937x family of switches has 8 queues per port where the KSZ switches has 4 queues per port. By default, only one queue per port is enabled. The queues are configurable in 2, 4 or 8. This patch add 8 number of queues for LAN937x and 4 for other switches. In the tag_ksz.c file, prioirty of the packet is queried using the skb buffer and the corresponding value is updated in the tag. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: add plumbing for changing and getting MAC merge layer stateVladimir Oltean2023-01-231-0/+37
| | | | | | | | | | The DSA core is in charge of the ethtool_ops of the net devices associated with switch ports, so in case a hardware driver supports the MAC merge layer, DSA must pass the callbacks through to the driver. Add support for precisely that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tagChristian Eggers2023-01-131-1/+28
| | | | | | | | | | | | | | For PDelay_Resp messages we will likely have a negative value in the correction field. The switch hardware cannot correctly update such values (produces an off by one error in the UDP checksum), so it must be moved to the time stamp field in the tail tag. Format of the correction field is 48 bit ns + 16 bit fractional ns. After updating the correction field, clone is no longer required hence it is freed. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: microchip: ptp: add packet transmission timestampingChristian Eggers2023-01-131-3/+51
| | | | | | | | | | | | | | | | | | | | | | This patch adds the routines for transmission of ptp packets. When the ptp pdelay_req packet to be transmitted, it uses the deferred xmit worker to schedule the packets. During irq_setup, interrupt for Sync, Pdelay_req and Pdelay_rsp are enabled. So interrupt is triggered for all three packets. But for p2p1step, we require only time stamp of Pdelay_req packet. Hence to avoid posting of the completion from ISR routine for Sync and Pdelay_resp packets, ts_en flag is introduced. This controls which packets need to processed for timestamp. After the packet is transmitted, ISR is triggered. The time at which packet transmitted is recorded to separate register. This value is reconstructed to absolute time and posted to the user application through socket error queue. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: microchip: ptp: add packet reception timestampingChristian Eggers2023-01-131-6/+19
| | | | | | | | | | | | | | | | | Rx Timestamping is done through 4 additional bytes in tail tag. Whenever the ptp packet is received, the 4 byte hardware time stamped value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates it as PTP frame. This 4 byte value is extracted from the tail tag and reconstructed to absolute time and assigned to skb hwtstamp. If the packet received in PDelay_Resp, then partial ingress timestamp is subtracted from the correction field. Since user space tools expects to be done in hardware. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabledArun Ramadoss2023-01-131-7/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | When the PTP is enabled in hardware bit 6 of PTP_MSG_CONF1 register, the transmit frame needs additional 4 bytes before the tail tag. It is needed for all the transmission packets irrespective of PTP packets or not. The 4-byte timestamp field is 0 for frames other than Pdelay_Resp. For the one-step Pdelay_Resp, the switch needs the receive timestamp of the Pdelay_Req message so that it can put the turnaround time in the correction field. Since PTP has to be enabled for both Transmission and reception timestamping, driver needs to track of the tx and rx setting of the all the user ports in the switch. Two flags hw_tx_en and hw_rx_en are added in ksz_port to track the timestampping setting of each port. When any one of ports has tx or rx timestampping enabled, bit 6 of PTP_MSG_CONF1 is set and it is indicated to tag_ksz.c through tagger bytes. This flag adds 4 additional bytes to the tail tag. When tx and rx timestamping of all the ports are disabled, then 4 bytes are not added. Tested using hwstamp -i <interface> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2022-12-131-1/+10
|\ | | | | | | | | | | | | | | | | | | Merge in the left-over fixes before the net-next pull-request. net/mptcp/subflow.c d3295fee3c75 ("mptcp: use proper req destructor for IPv6") 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * net: dsa: tag_8021q: avoid leaking ctx on dsa_tag_8021q_register() error pathVladimir Oltean2022-12-121-1/+10
| | | | | | | | | | | | | | | | | | | | | | If dsa_tag_8021q_setup() fails, for example due to the inability of the device to install a VLAN, the tag_8021q context of the switch will leak. Make sure it is freed on the error path. Fixes: 328621f6131f ("net: dsa: tag_8021q: absorb dsa_8021q_setup into dsa_tag_8021q_{,un}register") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221209235242.480344-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: don't call ptp_classify_raw() if switch doesn't provide RX ↵Vladimir Oltean2022-12-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | timestamping ptp_classify_raw() is not exactly cheap, since it invokes a BPF program for every skb in the receive path. For switches which do not provide ds->ops->port_rxtstamp(), running ptp_classify_raw() provides precisely nothing, so check for the presence of the function pointer first, since that is much cheaper. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20221209175840.390707-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-12-083-3/+6
|\| | | | | | | | | | | No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: dsa: sja1105: Check return valueArtem Chernyshev2022-12-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | Return NULL if we got unexpected value from skb_trim_rcsum() in sja1110_rcv_inband_control_extension() Fixes: 4913b8ebf8a9 ("net: dsa: add support for the SJA1110 native tagging protocol") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221201140032.26746-3-artem.chernyshev@red-soft.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: dsa: hellcreek: Check return valueArtem Chernyshev2022-12-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Return NULL if we got unexpected value from skb_trim_rcsum() in hellcreek_rcv() Fixes: 01ef09caad66 ("net: dsa: Add tag handling for Hirschmann Hellcreek switches") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20221201140032.26746-2-artem.chernyshev@red-soft.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: dsa: ksz: Check return valueArtem Chernyshev2022-12-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Return NULL if we got unexpected value from skb_trim_rcsum() in ksz_common_rcv() Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: bafe9ba7d908 ("net: dsa: ksz: Factor out common tag code") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221201140032.26746-1-artem.chernyshev@red-soft.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: kill off dsa_priv.hVladimir Oltean2022-11-228-25/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | The last remnants in dsa_priv.h are a netlink-related definition for which we create a new header, and DSA_MAX_NUM_OFFLOADING_BRIDGES which is only used from dsa.c, so move it there. Some inclusions need to be adjusted now that we no longer have headers included transitively from dsa_priv.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move tag_8021q headers to their proper placeVladimir Oltean2022-11-227-8/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tag_8021q definitions are all over the place. Some are exported to linux/dsa/8021q.h (visible by DSA core, taggers, switch drivers and everyone else), and some are in dsa_priv.h. Move the structures that don't need external visibility into tag_8021q.c, and the ones which don't need the world or switch drivers to see them into tag_8021q.h. We also have the tag_8021q.h inclusion from switch.c, which is basically the entire reason why tag_8021q.c was built into DSA in commit 8b6e638b4be2 ("net: dsa: build tag_8021q.c as part of DSA core"). I still don't know how to better deal with that, so leave it alone. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move definitions from dsa_priv.h to slave.cVladimir Oltean2022-11-222-42/+42
| | | | | | | | | | | | | | | | | | There are some definitions in dsa_priv.h which are only used from slave.c. So move them to slave.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: rename dsa2.c back into dsa.c and create its headerVladimir Oltean2022-11-228-31/+46
| | | | | | | | | | | | | | | | | | | | The previous change moved the code into the larger file (dsa2.c) to minimize the delta. Rename that now to dsa.c, and create dsa.h, where all related definitions from dsa_priv.h go. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: merge dsa.c into dsa2.cVladimir Oltean2022-11-224-238/+221
| | | | | | | | | | | | | | | | | | There is no longer a meaningful distinction between what goes into dsa2.c and what goes into dsa.c. Merge the 2 into a single file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move notifier definitions to switch.hVladimir Oltean2022-11-222-105/+108
| | | | | | | | | | | | | | | | | | Reduce bloat in dsa_priv.h by moving the cross-chip notifier data structures to switch.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move dsa_tree_notify() and dsa_broadcast() to switch.cVladimir Oltean2022-11-224-48/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | There isn't an intuitive place for these 2 cross-chip notifier functions according to the function-to-file classification based on names (dsa_switch_*() goes to switch.c), but I consider these to be part of the cross-chip notifier handling, therefore part of switch.c. Move them there to reduce bloat in dsa2.c (the place where all code with no better place to go goes). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move headers exported by switch.c to switch.hVladimir Oltean2022-11-226-4/+15
| | | | | | | | | | | | | | | | | | Reduce code bloat in dsa_priv.h by moving the prototypes exported by switch.h into their own header file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move tagging protocol code to tag.{c,h}Vladimir Oltean2022-11-2227-541/+581
| | | | | | | | | | | | | | | | | | | | | | | | | | It would be nice if tagging protocol drivers could include just the header they need, since they are (mostly) data path and isolated from most of the other DSA core code does. Create a tag.c and a tag.h file which are meant to support tagging protocol drivers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move headers exported by slave.c to slave.hVladimir Oltean2022-11-228-57/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | Minimize the use of the bloated dsa_priv.h by moving the prototypes exported by slave.c to their own header file. This is just approximate to get the code structure right. There are some interdependencies with static inline code left in dsa_priv.h, so leave slave.h included from there for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move headers exported by master.c to master.hVladimir Oltean2022-11-225-9/+27
| | | | | | | | | | | | | | | | | | Minimize the use of the bloated dsa_priv.h by moving the prototypes exported by master.c to their own header file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move headers exported by port.c to port.hVladimir Oltean2022-11-228-97/+120
| | | | | | | | | | | | | | | | | | Minimize the use of the bloated dsa_priv.h by moving the prototypes exported by port.c to their own header file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move rest of devlink setup/teardown to devlink.cVladimir Oltean2022-11-223-20/+49
| | | | | | | | | | | | | | | | | | | | The code that needed further refactoring into dedicated functions in dsa2.c was left aside. Move it now to devlink.c, and make dsa2.c stop including net/devlink.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: if ds->setup is true, ds->devlink is always non-NULLVladimir Oltean2022-11-221-7/+5
| | | | | | | | | | | | | | | | | | Simplify dsa_switch_teardown() to remove the NULL checking for ds->devlink. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: move bulk of devlink code to devlink.{c,h}Vladimir Oltean2022-11-225-346/+370
| | | | | | | | | | | | | | | | | | | | | | | | | | dsa.c and dsa2.c are bloated with too much off-topic code. Identify all code related to devlink and move it to a new devlink.c file. Steer clear of the dsa_priv.h dumping ground antipattern and create a dedicated devlink.h for it, which will be included only by the C files which need it. Usage of dsa_priv.h will be minimized in later patches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: modularize DSA_TAG_PROTO_NONEVladimir Oltean2022-11-225-22/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason that I can see why the no-op tagging protocol should be registered manually, so make it a module and make all drivers which have any sort of reference to DSA_TAG_PROTO_NONE select it. Note that I don't know if ksz_get_tag_protocol() really needs this, or if it's just the logic which is poorly written. All switches seem to have their own tagging protocol, and DSA_TAG_PROTO_NONE is just a fallback that never gets used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: unexport dsa_dev_to_net_device()Vladimir Oltean2022-11-222-1/+2
| | | | | | | | | | | | | | | | | | dsa.o and dsa2.o are linked into the same dsa_core.o, there is no reason to export this symbol when its only caller is local. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: tag_mtk: assign per-port queuesFelix Fietkau2022-11-181-0/+2
| | | | | | | | | | | | | | | | | | Keeps traffic sent to the switch within link speed limits Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221116080734.44013-6-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>