summaryrefslogtreecommitdiffstats
path: root/net/ethtool/netlink.c
Commit message (Collapse)AuthorAgeFilesLines
* ethtool: do not use rtnl in ethnl_default_dumpit()Eric Dumazet2024-02-081-9/+5
| | | | | | | | | for_each_netdev_dump() can be used with RCU protection, no need for rtnl if we are going to use dev_hold()/dev_put(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240207153514.3640952-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: don't propagate EOPNOTSUPP from dumpsJakub Kicinski2023-11-291-0/+1
| | | | | | | | | | | | | | | | The default dump handler needs to clear ret before returning. Otherwise if the last interface returns an inconsequential error this error will propagate to user space. This may confuse user space (ethtool CLI seems to ignore it, but YNL doesn't). It will also terminate the dump early for mutli-skb dump, because netlink core treats EOPNOTSUPP as a real error. Fixes: 728480f12442 ("ethtool: default handlers for GET requests") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231126225806.2143528-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: netlink: always pass genl_info to .prepare_dataJakub Kicinski2023-08-151-5/+8
| | | | | | | | | | | | | | | | We had a number of bugs in the past because developers forgot to fully test dumps, which pass NULL as info to .prepare_data. .prepare_data implementations would try to access info->extack leading to a null-deref. Now that dumps and notifications can access struct genl_info we can pass it in, and remove the info null checks. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # pause Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: netlink: simplify arguments to ethnl_default_parse()Jakub Kicinski2023-08-151-12/+9
| | | | | | | | Pass struct genl_info directly instead of its members. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* genetlink: use attrs from struct genl_infoJakub Kicinski2023-08-151-1/+2
| | | | | | | | | | | Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: convert some netlink netdev iterators to depend on the xarrayJakub Kicinski2023-07-281-48/+17
| | | | | | | | | Reap the benefits of easier iteration thanks to the xarray. Convert just the genetlink ones, those are easier to test. Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230726185530.2247698-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: create device lookup API with reference trackingJakub Kicinski2023-06-151-5/+5
| | | | | | | | | | | | | | New users of dev_get_by_index() and dev_get_by_name() keep getting added and it would be nice to steer them towards the APIs with reference tracking. Add variants of those calls which allocate the reference tracker and use them in a couple of places. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethtool: don't require empty header nestsJakub Kicinski2023-06-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Ethtool currently requires a header nest (which is used to carry the common family options) in all requests including dumps. $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get lib.ynl.NlError: Netlink error: Invalid argument nl_len = 64 (48) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'request header missing'} $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get \ --json '{"header":{}}'; ) [{'combined-count': 1, 'combined-max': 1, 'header': {'dev-index': 2, 'dev-name': 'enp1s0'}}] Requiring the header nest to always be there may seem nice from the consistency perspective, but it's not serving any practical purpose. We shouldn't burden the user like this. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: netlink: convert commands to common SETJakub Kicinski2023-01-271-14/+28
| | | | | | | Convert all SET commands where new common code is applicable. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: netlink: handle SET intro/outro in the common codeJakub Kicinski2023-01-271-1/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most ethtool SET callbacks follow the same general structure. ethnl_parse_header_dev_get() rtnl_lock() ethnl_ops_begin() ... do stuff ... ethtool_notify() ethnl_ops_complete() rtnl_unlock() ethnl_parse_header_dev_put() This leads to a lot of copy / pasted code an bugs when people mis-handle the error path. Add a generic implementation of this pattern with a .set callback in struct ethnl_request_ops called to "do stuff". Also add an optional .set_validate which is called before ethnl_ops_begin() -- a lot of implementations do basic request capability / sanity checking at that point. Because we want to avoid generating the notification when no change happened - adopt a slightly hairy return values: - 0 means nothing to do (no notification) - 1 means done / continue - negative error codes on error Reuse .hdr_attr from struct ethnl_request_ops, GET and SET use the same attr spaces in all cases. Convert pause as an example (and to avoid unused function warnings). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethtool: add support for MAC Merge layerVladimir Oltean2023-01-231-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2 specifications (the other being Frame Preemption; IEEE 802.1Q-2018 clause 6.7.2), which work together to minimize latency caused by frame interference at TX. The overall goal of TSN is for normal traffic and traffic with a bounded deadline to be able to cohabitate on the same L2 network and not bother each other too much. The standards achieve this (partly) by introducing the concept of preemptible traffic, i.e. Ethernet frames that have a custom value for the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented and reassembled at L2 on a link-local basis. The non-preemptible frames are called express traffic, they are transmitted using a normal SFD, and they can preempt preemptible frames, therefore having lower latency, which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo frames around 9K). Preemption is not recursive, i.e. a P frame cannot preempt another P frame. Preemption also does not depend upon priority, or otherwise said, an E frame with prio 0 will still preempt a P frame with prio 7. In terms of implementation, the standards talk about the presence of an express MAC (eMAC) which handles express traffic, and a preemptible MAC (pMAC) which handles preemptible traffic, and these MACs are multiplexed on the same MII by a MAC merge layer. To support frame preemption, the definition of the SFD was generalized to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an Ethernet frame fragment, or a complete frame. Stations unaware of an SMD value different from the standard SFD will treat P frames as error frames. To prevent that from happening, a negotiation process is defined. On RX, packets are dispatched to the eMAC or pMAC after being filtered by their SMD. On TX, the eMAC/pMAC classification decision is taken by the 802.1Q spec, based on packet priority (each of the 8 user priority values may have an admin-status of preemptible or express). The MAC Merge layer and the Frame Preemption parameters have some degree of independence in terms of how software stacks are supposed to deal with them. The activation of the MM layer is supposed to be controlled by an LLDP daemon (after it has been communicated that the link partner also supports it), after which a (hardware-based or not) verification handshake takes place, before actually enabling the feature. So the process is intended to be relatively plug-and-play. Whereas FP settings are supposed to be coordinated across a network using something approximating NETCONF. The support contained here is exclusively for the 802.3 (MAC Merge) portions and not for the 802.1Q (Frame Preemption) parts. This API is sufficient for an LLDP daemon to do its job. The FP adminStatus variable from 802.1Q is outside the scope of an LLDP daemon. I have taken a few creative licenses and augmented the Linux kernel UAPI compared to the standard managed objects recommended by IEEE 802.3. These are: - ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive Processing state diagram, a MAC Merge layer is always supposed to be able to receive P frames. However, this implies keeping the pMAC powered on, which will consume needless power in applications where FP will never be used. If LLDP is used, the reception of an Additional Ethernet Capabilities TLV from the link partner is sufficient indication that the pMAC should be enabled. So my proposal is that in Linux, we keep the pMAC turned off by default and that user space turns it on when needed. - ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in the boolean netlink attributes offered, so this is also positive here. Other than the meaning being reversed, they correspond to the same thing. - ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP daemon to maximize the verifyTime variable (delay between SMD-V transmissions), to maximize its chances that the LP replies. IEEE says that the verifyTime can range between 1 and 128 ms, but the NXP ENETC stupidly keeps this variable in a 7 bit register, so the maximum supported value is 127 ms. I could have chosen to hardcode this in the LLDP daemon to a lower value, but why not let the kernel expose its supported range directly. - ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called aMACMergeAddFragSize, and expresses the "additional" fragment size (on top of ETH_ZLEN), whereas this expresses the absolute value of the fragment size. - ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed object mandated by the standard, but user space clearly needs to know what is the minimum supported fragment size of our local receiver, since LLDP must advertise a value no lower than that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ethtool: add netlink interface for the PLCA RSPiergiorgio Beruto2023-01-111-0/+29
| | | | | | | | | | | Add support for configuring the PLCA Reconciliation Sublayer on multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g., 10BASE-T1S). This patch adds the appropriate netlink interface to ethtool. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: add netlink based get rss supportSudheer Mogilappagari2022-12-051-0/+7
| | | | | | | | | | | | | | | | Add netlink based support for "ethtool -x <dev> [context x]" command by implementing ETHTOOL_MSG_RSS_GET netlink message. This is equivalent to functionality provided via ETHTOOL_GRSSH in ioctl path. It sends RSS table, hash key and hash function of an interface to user space. This patch implements existing functionality available in ioctl path and enables addition of new RSS context based parameters in future. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Link: https://lore.kernel.org/r/20221202002555.241580-1-sudheer.mogilappagari@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: add interface to interact with Ethernet Power EquipmentOleksij Rempel2022-10-031-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add interface to support Power Sourcing Equipment. At current step it provides generic way to address all variants of PSE devices as defined in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4 PoDL Power Sourcing Equipment (PSE). Currently supported and mandatory objects are: IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl This is minimal interface needed to control PSE on each separate ethernet port but it provides not all mandatory objects specified in IEEE 802.3-2018. Since "PoDL PSE" and "PSE" have similar names, but some different values I decide to not merge them and keep separate naming schema. This should allow as to be as close to IEEE 802.3 spec as possible and avoid name conflicts in the future. This implementation is connected to PHYs instead of MACs because PSE auto classification can potentially interfere with PHY auto negotiation. So, may be some extra PHY related initialization will be needed. With WIP version of ethtools interaction with PSE capable link looks as following: $ ip l ... 5: t1l1@eth0: <BROADCAST,MULTICAST> .. ... $ ethtool --show-pse t1l1 PSE attributs for t1l1: PoDL PSE Admin State: disabled PoDL PSE Power Detection Status: disabled $ ethtool --set-pse t1l1 podl-pse-admin-control enable $ ethtool --show-pse t1l1 PSE attributs for t1l1: PoDL PSE Admin State: enabled PoDL PSE Power Detection Status: delivering power Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: report missing header via ext_ack in the default handlerJakub Kicinski2022-08-301-0/+3
| | | | | | | | | | | | | The actual presence check for the header is in ethnl_parse_header_dev_get() but it's a few layers in, and already has a ton of arguments so let's just pick the low hanging fruit and check for missing header in the default request handler. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* genetlink: start to validate reserved header bytesJakub Kicinski2022-08-291-0/+1
| | | | | | | | | | | | | | | | | We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rename reference+tracking helpersJakub Kicinski2022-06-091-3/+3
| | | | | | | | | | | | | | | Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* netlink: do not allocate a device refcount tracker in ethnl_default_notify()Eric Dumazet2022-01-051-1/+0
| | | | | | | | | | | | | As reported by Johannes, the tracker allocated in ethnl_default_notify() is not really needed, as this function is not expected to change a device reference count. Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20220105170849.2610470-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: Remove redundant ret assignmentsluo penghao2021-12-301-1/+0
| | | | | | | | | | | | | | The assignment here will be overwritten, so it should be deleted The clang_analyzer complains as follows: net/ethtool/netlink.c: Value stored to 'ret' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: always write dev in ethnl_parse_header_dev_getJakub Kicinski2021-12-151-3/+2
| | | | | | | | | | | | Commit 0976b888a150 ("ethtool: fix null-ptr-deref on ref tracker") made the write to req_info.dev conditional, but as Eric points out in a different follow up the structure is often allocated on the stack and not kzalloc()'d so seems safer to always write the dev, in case it's garbage on input. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: fix null-ptr-deref on ref trackerJakub Kicinski2021-12-141-2/+4
| | | | | | | | | dev can be a NULL here, not all requests set require_dev. Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-12-091-1/+2
|\ | | | | | | | | | | No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ethtool: do not perform operations on net devices being unregisteredAntoine Tenart2021-12-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a short period between a net device starts to be unregistered and when it is actually gone. In that time frame ethtool operations could still be performed, which might end up in unwanted or undefined behaviours[1]. Do not allow ethtool operations after a net device starts its unregistration. This patch targets the netlink part as the ioctl one isn't affected: the reference to the net device is taken and the operation is executed within an rtnl lock section and the net device won't be found after unregister. [1] For example adding Tx queues after unregister ends up in NULL pointer exceptions and UaFs, such as: BUG: KASAN: use-after-free in kobject_get+0x14/0x90 Read of size 1 at addr ffff88801961248c by task ethtool/755 CPU: 0 PID: 755 Comm: ethtool Not tainted 5.15.0-rc6+ #778 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/014 Call Trace: dump_stack_lvl+0x57/0x72 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x7f/0x11b kobject_get+0x14/0x90 kobject_add_internal+0x3d1/0x450 kobject_init_and_add+0xba/0xf0 netdev_queue_update_kobjects+0xcf/0x200 netif_set_real_num_tx_queues+0xb4/0x310 veth_set_channels+0x1c3/0x550 ethnl_set_channels+0x524/0x610 Fixes: 041b1c5d4a53 ("ethtool: helper functions for netlink interface") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20211203101318.435618-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | netlink: add net device refcount tracker to struct ethnl_req_infoEric Dumazet2021-12-071-3/+5
|/ | | | | Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: Add ability to control transceiver modules' power modeIdo Schimmel2021-10-061-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and 'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver modules parameters and retrieve their status. The first parameter to control is the power mode of the module. It is only relevant for paged memory modules, as flat memory modules always operate in low power mode. When a paged memory module is in low power mode, its power consumption is reduced to the minimum, the management interface towards the host is available and the data path is deactivated. User space can choose to put modules that are not currently in use in low power mode and transition them to high power mode before putting the associated ports administratively up. This is useful for user space that favors reduced power consumption and lower temperatures over reduced link up times. In QSFP-DD modules the transition from low power mode to high power mode can take a few seconds and this transition is only expected to get longer with future / more complex modules. User space can control the power mode of the module via the power mode policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible values: * high: Module is always in high power mode. * auto: Module is transitioned by the host to high power mode when the first port using it is put administratively up and to low power mode when the last port using it is put administratively down. The operational power mode of the module is available to user space via the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not reported to user space when a module is not plugged-in. The user API is designed to be generic enough so that it could be used for modules with different memory maps (e.g., SFF-8636, CMIS). The only implementation of the device driver API in this series is for a MAC driver (mlxsw) where the module is controlled by the device's firmware, but it is designed to be generic enough so that it could also be used by implementations where the module is controlled by the CPU. CMIS testing ============ # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off The module is not in low power mode, as it is not forced by hardware (LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off). The power mode can be queried from the kernel. In case LowPwrAllowRequestHW was on, the kernel would need to take into account the state of the LowPwrRequestHW signal, which is not visible to user space. $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp11 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp11 up Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp11 down Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On SFF-8636 testing ================ # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7733 mW / -1.12 dBm Transmit avg optical power (Channel 2) : 0.7649 mW / -1.16 dBm Transmit avg optical power (Channel 3) : 0.7790 mW / -1.08 dBm Transmit avg optical power (Channel 4) : 0.7837 mW / -1.06 dBm Rcvr signal avg optical power(Channel 1) : 0.9302 mW / -0.31 dBm Rcvr signal avg optical power(Channel 2) : 0.9079 mW / -0.42 dBm Rcvr signal avg optical power(Channel 3) : 0.8993 mW / -0.46 dBm Rcvr signal avg optical power(Channel 4) : 0.8778 mW / -0.57 dBm The module is not in low power mode, as it is not forced by hardware (Power override is on) or by software (Power set is off). The power mode can be queried from the kernel. In case Power override was off, the kernel would need to take into account the state of the LPMode signal, which is not visible to user space. $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp13 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp13 up Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7934 mW / -1.01 dBm Transmit avg optical power (Channel 2) : 0.7859 mW / -1.05 dBm Transmit avg optical power (Channel 3) : 0.7885 mW / -1.03 dBm Transmit avg optical power (Channel 4) : 0.7985 mW / -0.98 dBm Rcvr signal avg optical power(Channel 1) : 0.9325 mW / -0.30 dBm Rcvr signal avg optical power(Channel 2) : 0.9034 mW / -0.44 dBm Rcvr signal avg optical power(Channel 3) : 0.9086 mW / -0.42 dBm Rcvr signal avg optical power(Channel 4) : 0.8885 mW / -0.51 dBm Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp13 down Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: return error from ethnl_ops_begin if dev is NULLHeiner Kallweit2021-08-061-2/+2
| | | | | | | | | | | | Julian reported that after d43c65b05b84 Coverity complains about a missing check whether dev is NULL in ethnl_ops_complete(). There doesn't seem to be any valid case where dev could be NULL when calling ethnl_ops_begin(), therefore return an error if dev is NULL. Fixes: d43c65b05b84 ("ethtool: runtime-resume netdev parent in ethnl_ops_begin") Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Remove redundant if statementsYajun Deng2021-08-051-4/+2
| | | | | | | | The 'if (dev)' statement already move into dev_{put , hold}, so remove redundant if statements. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: runtime-resume netdev parent in ethnl_ops_beginHeiner Kallweit2021-08-031-6/+25
| | | | | | | | | | | | | | | | | If a network device is runtime-suspended then: - network device may be flagged as detached and all ethtool ops (even if not accessing the device) will fail because netif_device_present() returns false - ethtool ops may fail because device is not accessible (e.g. because being in D3 in case of a PCI device) It may not be desirable that userspace can't use even simple ethtool ops that not access the device if interface or link is down. To be more friendly to userspace let's ensure that device is runtime-resumed when executing the respective ethtool op in kernel. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: move netif_device_present check from ethnl_parse_header_dev_get to ↵Heiner Kallweit2021-08-031-7/+7
| | | | | | | | | | | | | | | ethnl_ops_begin If device is runtime-suspended and not accessible then it may be flagged as not present. If checking whether device is present is done too early then we may bail out before we have the chance to runtime-resume the device. Therefore move this check to ethnl_ops_begin(). This is in preparation of a follow-up patch that tries to runtime-resume the device before executing ethtool ops. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: move implementation of ethnl_ops_begin/complete to netlink.cHeiner Kallweit2021-08-031-0/+14
| | | | | | | | In preparation of subsequent extensions to both functions move the implementations from netlink.h to netlink.c. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: add a new command for getting PHC virtual clocksYangbo Lu2021-07-011-0/+10
| | | | | | | | | Add an interface for getting PHC (PTP Hardware Clock) virtual clocks, which are based on PHC physical clock providing hardware timestamp to network packets. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: add a stricter length checkJakub Kicinski2021-06-161-3/+8
| | | | | | | | | | | | | | There has been a few errors in the ethtool reply size calculations, most of those are hard to trigger during basic testing because of skb size rounding up and netdev names being shorter than max. Add a more precise check. This change will affect the value of payload length displayed in case of -EMSGSIZE but that should be okay, "payload length" isn't a well defined term here. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: fix missing NLM_F_MULTI flag when dumpingFernando Fernandez Mancera2021-05-051-1/+2
| | | | | | | | | | | When dumping the ethtool information from all the interfaces, the netlink reply should contain the NLM_F_MULTI flag. This flag allows userspace tools to identify that multiple messages are expected. Link: https://bugzilla.redhat.com/1953847 Fixes: 365f9ae4ee36 ("ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: add a new command for reading standard statsJakub Kicinski2021-04-161-0/+10
| | | | | | | | | | | | | | | | | Add an interface for reading standard stats, including stats which don't have a corresponding control interface. Start with IEEE 802.3 PHY stats. There seems to be only one stat to expose there. Define API to not require user space changes when new stats or groups are added. Groups are based on bitset, stats have a string set associated. v1: wrap stats in a nest Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: Allow network drivers to dump arbitrary EEPROM dataVladyslav Tarasiuk2021-04-111-0/+11
| | | | | | | | | | | | | | Define get_module_eeprom_by_page() ethtool callback and implement netlink infrastructure. get_module_eeprom_by_page() allows network drivers to dump a part of module's EEPROM specified by page and bank numbers along with offset and length. It is effectively a netlink replacement for get_module_info() and get_module_eeprom() pair, which is needed due to emergence of complex non-linear EEPROM layouts. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: support FEC settings over netlinkJakub Kicinski2021-03-311-0/+19
| | | | | | | | | | | | | | | | | | | Add FEC API to netlink. This is not a 1-to-1 conversion. FEC settings already depend on link modes to tell user which modes are supported. Take this further an use link modes for manual configuration. Old struct ethtool_fecparam is still used to talk to the drivers, so we need to translate back and forth. We can revisit the internal API if number of FEC encodings starts to grow. Enforce only one active FEC bit (by using a bit position rather than another mask). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: correct policy for ETHTOOL_MSG_CHANNELS_SETJohannes Berg2020-10-081-2/+2
| | | | | | | | | | | | | This accidentally got wired up to the *get* policy instead of the *set* policy, causing operations to be rejected. Fix it by wiring up the correct policy instead. Fixes: 5028588b62cb ("ethtool: wire up set policies to ops") Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* ethtool: specify which header flags are supported per commandJakub Kicinski2020-10-061-10/+19
| | | | | | | | | | | | | | | | | | | | | Perform header flags validation through the policy. Only pause command supports ETHTOOL_FLAG_STATS. Create a separate policy to be able to express that in policy dumps to user space. Note that even though the core will validate the header policy, it cannot record multiple layers of attributes and we have to re-parse header sub-attrs. When doing so we could skip attribute validation, or use most permissive policy. Opt for the former. We will no longer return the extack cookie for flags but since we only added first new flag in this release it's not expected that any user space had a chance to make use of it. v2: - remove the re-validation in ethnl_parse_header_dev_get() Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: link up ethnl_header_policy as a nested policyJakub Kicinski2020-10-061-1/+1
| | | | | | | | | | | To get the most out of parsing by the core, and to allow dumping full policies we need to specify which policy applies to nested attrs. For headers it's ethnl_header_policy. $ sed -i 's@\(ETHTOOL_A_.*HEADER\].*=\) { .type = NLA_NESTED },@\1\n\t\tNLA_POLICY_NESTED(ethnl_header_policy),@' net/ethtool/* Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: trim policy tablesJakub Kicinski2020-10-061-4/+3
| | | | | | | | | | | | | | | Since ethtool uses strict attribute validation there's no need to initialize all attributes in policy tables. 0 is NLA_UNSPEC which is going to be rejected. Remove the NLA_REJECTs. Similarly attributes above maxattrs are rejected, so there's no need to always size the policy tables to ETHTOOL_A_..._MAX. v2: - new patch Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: wire up set policies to opsJakub Kicinski2020-10-061-0/+26
| | | | | | | | Similarly to get commands wire up the policies of set commands to get parsing by the core and policy dumps. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: wire up get policies to opsJakub Kicinski2020-10-061-22/+40
| | | | | | | | | | | | | | | | | | | | | | | | Wire up policies for get commands in struct nla_policy of the ethtool family. Make use of genetlink code attr validation and parsing, as well as allow dumping policies to user space. For every ETHTOOL_MSG_*_GET: - add 'ethnl_' prefix to policy name - add extern declaration in net/ethtool/netlink.h - wire up the policy & attr in ethtool_genl_ops[]. - remove .request_policy and .max_attr from ethnl_request_ops. Obviously core only records the first "layer" of parsed attrs so we still need to parse the sub-attrs of the nested header attribute. v2: - merge of patches 1 and 2 from v1 - remove stray empty lines in ops - also remove .max_attr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethtool: mark netlink family as __ro_after_initJakub Kicinski2020-09-281-1/+1
| | | | | | | | | | | | | | | Like all genl families ethtool_genl_family needs to not be a straight up constant, because it's modified/initialized by genl_register_family(). After init, however, it's only passed to genlmsg_put() & co. therefore we can mark it as __ro_after_init. Since genl_family structure contains function pointers mark this as a fix. Fixes: 2b4a8990b7df ("ethtool: introduce ethtool netlink interface") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2020-07-111-14/+13
|\ | | | | | | | | | | | | All conflicts seemed rather trivial, with some guidance from Saeed Mameed on the tc_ct.c one. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()Michal Kubecek2020-07-091-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the genlmsg_put() call in ethnl_default_dumpit() fails, we bail out without checking if we already have some messages in current skb like we do with ethnl_default_dump_one() failure later. Therefore if existing messages almost fill up the buffer so that there is not enough space even for netlink and genetlink header, we lose all prepared messages and return and error. Rather than duplicating the skb->len check, move the genlmsg_put(), genlmsg_cancel() and genlmsg_end() calls into ethnl_default_dump_one(). This is also more logical as all message composition will be in ethnl_default_dump_one() and only iteration logic will be left in ethnl_default_dumpit(). Fixes: 728480f12442 ("ethtool: default handlers for GET requests") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ethtool: add tunnel info interfaceJakub Kicinski2020-07-101-0/+12
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an interface to report offloaded UDP ports via ethtool netlink. Now that core takes care of tracking which UDP tunnel ports the NICs are aware of we can quite easily export this information out to user space. The responsibility of writing the netlink dumps is split between ethtool code and udp_tunnel_nic.c - since udp_tunnel module may not always be loaded, yet we should always report the capabilities of the NIC. $ ethtool --show-tunnels eth0 Tunnel information for eth0: UDP port table 0: Size: 4 Types: vxlan No entries UDP port table 1: Size: 4 Types: geneve, vxlan-gpe Entries (1): port 1230, vxlan-gpe v4: - back to v2, build fix is now directly in udp_tunnel.h v3: - don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET not set. v2: - fix string set count, - reorder enums in the uAPI, - fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset in docs and comments. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethtool: Add generic parts of cable test TDRAndrew Lunn2020-05-261-0/+5
| | | | | | | | | | | | | Add the generic parts of the code used to trigger a cable test and return raw TDR data. Any PHY driver which support this must implement the new driver op. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v2 Update nxp-tja11xx for API change. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2020-05-241-2/+2
|\ | | | | | | | | | | | | The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ethtool: count header size in reply size estimateMichal Kubecek2020-05-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | As ethnl_request_ops::reply_size handlers do not include common header size into calculated/estimated reply size, it needs to be added in ethnl_default_doit() and ethnl_default_notify() before allocating the message. On the other hand, strset_reply_size() should not add common header size. Fixes: 728480f12442 ("ethtool: default handlers for GET requests") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: ethtool: Make helpers publicAndrew Lunn2020-05-101-2/+2
| | | | | | | | | | | | | | | | | | | | Make some helpers for building ethtool netlink messages available outside the compilation unit, so they can be used for building messages which are not simple get/set. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>