summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* | | Merge branch 'Add-ethtool-extended-link-state'David S. Miller2020-06-2915-1563/+2140
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ido Schimmel says: ==================== Add ethtool extended link state Amit says: Currently, device drivers can only indicate to user space if the network link is up or down, without additional information. This patch set provides an infrastructure that allows these drivers to expose more information to user space about the link state. The information can save users' time when trying to understand why a link is not operationally up, for example. The above is achieved by extending the existing ethtool LINKSTATE_GET command with attributes that carry the extended state. For example, no link due to missing cable: $ ethtool ethX ... Link detected: no (No cable) Beside the general extended state, drivers can pass additional information about the link state using the sub-state field. For example: $ ethtool ethX ... Link detected: no (Autoneg, No partner detected) In the future the infrastructure can be extended - for example - to allow PHY drivers to report whether a downshift to a lower speed occurred. Something like: $ ethtool ethX ... Link detected: yes (downshifted) Patch set overview: Patches #1-#3 move mlxsw ethtool code to a separate file Patches #4-#5 add the ethtool infrastructure for extended link state Patches #6-#7 add support of extended link state in the mlxsw driver Patches #8-#10 add test cases Changes since v1: * In documentation, show ETHTOOL_LINK_EXT_STATE_* and ETHTOOL_LINK_EXT_SUBSTATE_* constants instead of user-space strings * Add `_CI_` to cable_issue substates to be consistent with other substates * Keep the commit messages within 75 columns * Use u8 variable for __link_ext_substate * Document the meaning of -ENODATA in get_link_ext_state() callback description * Do not zero data->link_ext_state_provided after getting an error * Use `ret` variable for error value Changes since RFC: * Move documentation patch before ethtool patch * Add nla_total_size() instead of sizeof() directly * Return an error code from linkstate_get_ext_state() * Remove SHORTED_CABLE, add CABLE_TEST_FAILURE instead * Check if the interface is administratively up before setting ext_state * Document all sub-states ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | selftests: forwarding: Add tests for ethtool extended stateAmit Cohen2020-06-291-0/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add tests to check ethtool report about extended state. The tests configure several states and verify that the correct extended state is reported by ethtool. Check extended state with substate (Autoneg) and extended state without substate (No cable). Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | selftests: forwarding: forwarding.config.sample: Add port with no cable ↵Amit Cohen2020-06-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | connected Add NETIF_NO_CABLE port to tests topology. The port can also be declared as an environment variable and tests can be run like that: NETIF_NO_CABLE=eth9 ./test.sh eth{1..8} The NETIF_NO_CABLE port will be used by ethtool_extended_state test. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | selftests: forwarding: ethtool: Move different_speeds_get() to ethtool_libAmit Cohen2020-06-292-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently different_speeds_get() is used only by ethtool.sh tests. The function can be useful for another tests that check ethtool configurations. Move the function to ethtool_lib in order to allow other tests to use it. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mlxsw: spectrum_ethtool: Add link extended stateAmit Cohen2020-06-291-0/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement .get_down_ext_state() as part of ethtool_ops. Query link down reason from PDDR register and convert it to ethtool link_ext_state. In case that more information than common link_ext_state is provided, fill link_ext_substate also with the appropriate value. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mlxsw: reg: Port Diagnostics Database RegisterAmit Cohen2020-06-291-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PDDR register enables to read the Phy debug database. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ethtool: Add link extended stateAmit Cohen2020-06-294-4/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, drivers can only tell whether the link is up/down using LINKSTATE_GET, but no additional information is given. Add attributes to LINKSTATE_GET command in order to allow drivers to expose the user more information in addition to link state to ease the debug process, for example, reason for link down state. Extended state consists of two attributes - link_ext_state and link_ext_substate. The idea is to avoid 'vendor specific' states in order to prevent drivers to use specific link_ext_state that can be in the future common link_ext_state. The substates allows drivers to add more information to the common link_ext_state. For example, vendor can expose 'Autoneg' as link_ext_state and add 'No partner detected during force mode' as link_ext_substate. If a driver cannot pinpoint the extended state with the substate accuracy, it is free to expose only the extended state and omit the substate attribute. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Documentation: networking: ethtool-netlink: Add link extended stateAmit Cohen2020-06-291-4/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add link extended state attributes. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mlxsw: spectrum_ethtool: Move mlxsw_sp_port_type_speed_ops structsAmit Cohen2020-06-293-660/+660
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move mlxsw_sp1_port_type_speed_ops and mlxsw_sp2_port_type_speed_ops with the relevant code from spectrum.c to spectrum_ethtool.c. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mlxsw: Move ethtool_ops to spectrum_ethtool.cAmit Cohen2020-06-294-875/+892
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add spectrum_ethtool.c file for ethtool code. Move ethtool_ops and the relevant code from spectrum.c to spectrum_ethtool.c. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mlxsw: spectrum_dcb: Rename mlxsw_sp_port_headroom_set()Amit Cohen2020-06-291-3/+3
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mlxsw_sp_port_headroom_set() is defined twice - in spectrum.c and in spectrum_dcb.c, with different arguments and different implementation but the name is same. Rename mlxsw_sp_port_headroom_set() to mlxsw_sp_port_headroom_ets_set() in order to allow using the second function in several files, and not only as static function in spectrum.c. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'dpaa2-eth-send-a-scatter-gather-FD-instead-of-realloc-ing'David S. Miller2020-06-294-32/+166
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ioana Ciornei says: ==================== dpaa2-eth: send a scatter-gather FD instead of realloc-ing This patch set changes the behaviour in case the Tx path is confroted with an SKB with insufficient headroom for our hardware necessities (SW annotation area). In the first patch, instead of realloc-ing the SKB we now send a S/G frames descriptor while the second one adds a new software held counter to account for for these types of frames. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | dpaa2-eth: add software counter for Tx frames converted to S/GIoana Ciornei2020-06-294-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the previous commit, in case of insufficient SKB headroom on the Tx path instead of reallocing the SKB we now send a S/G frame descriptor. Export the number of occurences of this case as a per CPU counter (in debugfs) and a total number in the ethtool statistics - "tx converted sg frames'. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | dpaa2-eth: send a scatter-gather FD instead of realloc-ingIoana Ciornei2020-06-294-34/+160
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of realloc-ing the skb on the Tx path when the provided headroom is smaller than the HW requirements, create a Scatter/Gather frame descriptor with only one entry. Remove the '[drv] tx realloc frames' counter exposed previously through ethtool since it is no longer used. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'sfc-prerequisites-for-EF100-driver-part-1'David S. Miller2020-06-2924-1683/+8643
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Edward Cree says: ==================== sfc: prerequisites for EF100 driver, part 1 This continues the work started by Alex Maftei <amaftei@solarflare.com> in the series "sfc: code refactoring", "sfc: more code refactoring", "sfc: even more code refactoring" and "sfc: refactor mcdi filtering code", to prepare for a new driver which will share much of the code to support the new EF100 family of Solarflare/Xilinx NICs. After this series, there will be approximately two more of these 'prerequisites' series, followed by the sfc_ef100 driver itself. v2: fix reverse xmas tree in patch 5. (Left the cases in patches 7, 9 and 14 alone as those are all in pure movement of existing code.) ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: extend common GRO interface to support CHECKSUM_COMPLETEEdward Cree2020-06-293-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EF100 will use CHECKSUM_COMPLETE, but will also make use of efx_rx_packet_gro(), thus needs to be able to pass the checksum value into that function. Drivers for older NICs pass in a csum of 0 to get the old semantics (use the RX flags for CHECKSUM_UNNECESSARY marking). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: commonise ARFS handlingEdward Cree2020-06-294-239/+238
| | | | | | | | | | | | | | | | | | | | | | | | | | | | EF100 will use the same approach to ARFS as EF10. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: commonise drain event handlingEdward Cree2020-06-293-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoids a call from generic MCDI code into ef10.c. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: commonise PCI error handlersEdward Cree2020-06-293-91/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | EF100 will use the same mechanisms for PCI error recovery. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: track which BAR is mappedEdward Cree2020-06-294-10/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EF100 needs to map multiple BARs (sequentially, not concurrently) in order to read the Function Control Window during probe. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: commonise FC advertisingEdward Cree2020-06-294-27/+27
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: commonise other ethtool bitsEdward Cree2020-06-293-93/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A few more ethtool handlers which EF100 will share. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: commonise ethtool NFC and RXFH/RSS functionsEdward Cree2020-06-293-672/+688
| | | | | | | | | | | | | | | | | | | | | | | | | | | | EF100 will share EF10's model of filtering, hashing and spreading. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: commonise ethtool link handling functionsEdward Cree2020-06-293-149/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Link speeds, FEC, and autonegotiation are all things EF100 will share. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: split up nic.hEdward Cree2020-06-295-301/+322
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new nic_common.h contains the inlines for NIC-type function dispatch, declarations for NIC-generic functions in nic.c, and other similar NIC- generic functionality. Retained in nic.h are NIC-specific declarations such as the siena and ef10 nic_data structs and various farch functions. The EF100 driver will thus include nic_common.h but not nic.h. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: refactor EF10 stats handlingEdward Cree2020-06-293-34/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Separate the generation-count handling from the format conversion, to make it easier to re-use both for EF100. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: don't try to create more channels than we can have VIsEdward Cree2020-06-294-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calculate efx->max_vis at probe time, and check against it in efx_allocate_msix_channels() when considering whether to create XDP TX channels. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: extend bitfield macros up to POPULATE_DWORD_13Edward Cree2020-06-291-5/+29
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: determine flag word automatically in efx_has_cap()Edward Cree2020-06-293-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have an _OFST definition for each individual flag bit, callers of efx_has_cap() don't need to specify which flag word it's in; we can just use the flag name directly in MCDI_CAPABILITY_OFST. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sfc: update MCDI protocol headersEdward Cree2020-06-291-36/+6841
|/ / / | | | | | | | | | | | | | | | | | | | | | The script used to generate these now includes _OFST definitions for flags, to identify the containing flag word. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net:qos: police action offloading parameter 'burst' change to the original valuePo Liu2020-06-2911-36/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 'tcfp_burst' with TICK factor, driver side always need to recover it to the original value, this patch moves the generic calculation and recover to the 'burst' original value before offloading to device driver. Signed-off-by: Po Liu <po.liu@nxp.com> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'MPTCP-improve-fallback-to-TCP'David S. Miller2020-06-294-201/+175
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Davide Caratti says: ==================== MPTCP: improve fallback to TCP there are situations where MPTCP sockets should fall-back to regular TCP: this series reworks the fallback code to pursue the following goals: 1) cleanup the non fallback code, removing most of 'if (<fallback>)' in the data path 2) improve performance for non-fallback sockets, avoiding locks in poll() further work will also leverage on this changes to achieve: a) more consistent behavior of gestockopt()/setsockopt() on passive sockets after fallback b) support for "infinite maps" as per RFC8684, section 3.7 the series is made of the following items: - patch 1 lets sendmsg() / recvmsg() / poll() use the main socket also after fallback - patch 2 fixes 'simultaneous connect' scenario after fallback. The problem was present also before the rework, but the fix is much easier to implement after patch 1 - patch 3, 4, 5 are clean-ups for code that is no more needed after the fallback rework - patch 6 fixes a race condition between close() and poll(). The problem was theoretically present before the rework, but it became almost systematic after patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mptcp: close poll() racesPaolo Abeni2020-06-291-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mptcp_poll always return POLLOUT for unblocking connect(), ensure that the socket is a suitable state. The MPTCP_DATA_READY bit is never cleared on accept: ensure we don't leave mptcp_accept() with an empty accept queue and such bit set. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mptcp: __mptcp_tcp_fallback() returns a struct sockPaolo Abeni2020-06-291-12/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently __mptcp_tcp_fallback() always return NULL on incoming connections, because MPTCP does not create the additional socket for the first subflow. Since the previous commit no __mptcp_tcp_fallback() caller needs a struct socket, so let __mptcp_tcp_fallback() return the first subflow sock and cope correctly even with incoming connections. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mptcp: create first subflow at msk creation timePaolo Abeni2020-06-291-33/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cleans the code a bit and makes the behavior more consistent. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mptcp: check for plain TCP sock at accept timePaolo Abeni2020-06-291-62/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cleanup the code a bit and avoid corrupted states on weird syscall sequence (accept(), connect()). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mptcp: fallback in case of simultaneous connectDavide Caratti2020-06-292-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when a MPTCP client tries to connect to itself, tcp_finish_connect() is never reached. Because of this, depending on the socket current state, multiple faulty behaviours can be observed: 1) a WARN_ON() in subflow_data_ready() is hit WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230 [...] CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187 [...] RIP: 0010:subflow_data_ready+0x18b/0x230 [...] Call Trace: tcp_data_queue+0xd2f/0x4250 tcp_rcv_state_process+0xb1c/0x49d3 tcp_v4_do_rcv+0x2bc/0x790 __release_sock+0x153/0x2d0 release_sock+0x4f/0x170 mptcp_shutdown+0x167/0x4e0 __sys_shutdown+0xe6/0x180 __x64_sys_shutdown+0x50/0x70 do_syscall_64+0x9a/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 2) client is stuck forever in mptcp_sendmsg() because the socket is not TCP_ESTABLISHED crash> bt 4847 PID: 4847 TASK: ffff88814b2fb100 CPU: 1 COMMAND: "gh35" #0 [ffff8881376ff680] __schedule at ffffffff97248da4 #1 [ffff8881376ff778] schedule at ffffffff9724a34f #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0 #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859 #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52 #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26 #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c RIP: 00007f126f6956ed RSP: 00007ffc2a320278 RFLAGS: 00000217 RAX: ffffffffffffffda RBX: 0000000020000044 RCX: 00007f126f6956ed RDX: 0000000000000004 RSI: 00000000004007b8 RDI: 0000000000000003 RBP: 00007ffc2a3202a0 R8: 0000000000400720 R9: 0000000000400720 R10: 0000000000400720 R11: 0000000000000217 R12: 00000000004004b0 R13: 00007ffc2a320380 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b 3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake didn't complete. $ tcpdump -tnnr bad.pcap IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0 force a fallback to TCP in these cases, and adjust the main socket state to avoid hanging in mptcp_sendmsg(). Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35 Reported-by: Christoph Paasch <cpaasch@apple.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: mptcp: improve fallback to TCPDavide Caratti2020-06-294-89/+98
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep using MPTCP sockets and a use "dummy mapping" in case of fallback to regular TCP. When fallback is triggered, skip addition of the MPTCP option on send. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11 Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22 Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: phy: marvell10g: support XFI rate matching modeBaruch Siach2020-06-291-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the hardware MACTYPE hardware configuration pins are set to "XFI with Rate Matching" the PHY interface operate at fixed 10Gbps speed. The MAC buffer packets in both directions to match various wire speeds. Read the MAC Type field in the Port Control register, and set the MAC interface speed accordingly. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge tag 'mlx5-tls-2020-06-26' of ↵David S. Miller2020-06-2943-493/+2128
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-tls-2020-06-26 1) Improve hardware layouts and structure for kTLS support 2) Generalize ICOSQ (Internal Channel Operations Send Queue) Due to the asynchronous nature of adding new kTLS flows and handling HW asynchronous kTLS resync requests, the XSK ICOSQ was extended to support generic async operations, such as kTLS add flow and resync, in addition to the existing XSK usages. 3) kTLS hardware flow steering and classification: The driver already has the means to classify TCP ipv4/6 flows to send them to the corresponding RSS HW engine, as reflected in patches 3 through 5, the series will add a steering layer that will hook to the driver's TCP classifiers and will match on well known kTLS connection, in case of a match traffic will be redirected to the kTLS decryption engine, otherwise traffic will continue flowing normally to the TCP RSS engine. 3) kTLS add flow RX HW offload support New offload contexts post their static/progress params WQEs (Work Queue Element) to communicate the newly added kTLS contexts over the per-channel async ICOSQ. The Channel/RQ is selected according to the socket's rxq index. A new TLS-RX workqueue is used to allow asynchronous addition of steering rules, out of the NAPI context. It will be also used in a downstream patch in the resync procedure. Feature is OFF by default. Can be turned on by: $ ethtool -K <if> tls-hw-rx-offload on 4) Added mlx5 kTLS sw stats and new counters are documented in Documentation/networking/tls-offload.rst rx_tls_ctx - number of TLS RX HW offload contexts added to device for decryption. rx_tls_ooo - number of RX packets which were part of a TLS stream but did not arrive in the expected order and triggered the resync procedure. rx_tls_del - number of TLS RX HW offload contexts deleted from device (connection has finished). rx_tls_err - number of RX packets which were part of a TLS stream but were not decrypted due to unexpected error in the state machine. 5) Asynchronous RX resync a. The NIC driver indicates that it would like to resync on some TLS record within the received packet (P), but the driver does not know (yet) which of the TLS records within the packet. At this stage, the NIC driver will query the device to find the exact TCP sequence for resync (tcpsn), however, the driver does not wait for the device to provide the response. b. Eventually, the device responds, and the driver provides the tcpsn within the resync packet to KTLS. Now, KTLS can check the tcpsn against any processed TLS records within packet P, and also against any record that is processed in the future within packet P. The asynchronous resync path simplifies the device driver, as it can save bits on the packet completion (32-bit TCP sequence), and pass this information on an asynchronous command instead. Performance: CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off NIC: ConnectX-6 Dx 100GbE dual port Goodput (app-layer throughput) comparison: +---------------+-------+-------+---------+ | # connections | 1 | 4 | 8 | +---------------+-------+-------+---------+ | SW (Gbps) | 7.26 | 24.70 | 50.30 | +---------------+-------+-------+---------+ | HW (Gbps) | 18.50 | 64.30 | 92.90 | +---------------+-------+-------+---------+ | Speedup | 2.55x | 2.56x | 1.85x * | +---------------+-------+-------+---------+ * After linerate is reached, diff is observed in CPU util ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/mlx5e: kTLS, Improve rx handler function callTariq Toukan2020-06-277-21/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch mlx5e tls rx handler was called unconditionally on all rx frames and the decision whether a frame is a valid tls record is done inside that function. A function call can be expensive especially for regular rx packet rate. To avoid this, check the tls validity before jumping into the tls rx handler. While at it, split between kTLS device offload rx handler and FPGA tls rx handler using a similar method. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
| * | | net/mlx5e: kTLS, Cleanup redundant capability checkTariq Toukan2020-06-271-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All callers of mlx5e_ktls_build_netdev() check capability before the call. Remove the repeated check in the function. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | net/mlx5e: Increase Async ICO SQ sizeTariq Toukan2020-06-271-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resync communication with HW for kTLS RX is done via the async ICOSQs. kTLS RX resync requests might come in bursts. To improve the success chances for such bursts, use a larger ICOSQ. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | net/mlx5e: kTLS, Add kTLS RX statsTariq Toukan2020-06-274-3/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add global and per-channel ethtool SW stats for the device offload. Document the new counters in tls-offload.rst. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | net/mlx5e: kTLS, Add kTLS RX resync supportTariq Toukan2020-06-278-7/+381
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the RX resync procedure, using the TLS async resync API. The HW offload of TLS decryption in RX side might get out-of-sync due to out-of-order reception of packets. This requires SW intervention to update the HW context and get it back in-sync. Performance: CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off NIC: ConnectX-6 Dx 100GbE dual port Goodput (app-layer throughput) comparison: +---------------+-------+-------+---------+ | # connections | 1 | 4 | 8 | +---------------+-------+-------+---------+ | SW (Gbps) | 7.26 | 24.70 | 50.30 | +---------------+-------+-------+---------+ | HW (Gbps) | 18.50 | 64.30 | 92.90 | +---------------+-------+-------+---------+ | Speedup | 2.55x | 2.56x | 1.85x * | +---------------+-------+-------+---------+ * After linerate is reached, diff is observed in CPU util. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | net/tls: Add asynchronous resyncBoris Pismenny2020-06-272-1/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for asynchronous resynchronization in tls_device. Async resync follows two distinct stages: 1. The NIC driver indicates that it would like to resync on some TLS record within the received packet (P), but the driver does not know (yet) which of the TLS records within the packet. At this stage, the NIC driver will query the device to find the exact TCP sequence for resync (tcpsn), however, the driver does not wait for the device to provide the response. 2. Eventually, the device responds, and the driver provides the tcpsn within the resync packet to KTLS. Now, KTLS can check the tcpsn against any processed TLS records within packet P, and also against any record that is processed in the future within packet P. The asynchronous resync path simplifies the device driver, as it can save bits on the packet completion (32-bit TCP sequence), and pass this information on an asynchronous command instead. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | Revert "net/tls: Add force_resync for driver resync"Boris Pismenny2020-06-272-17/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit b3ae2459f89773adcbf16fef4b68deaaa3be1929. Revert the force resync API. Not in use. To be replaced by a better async resync API downstream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | net/mlx5e: kTLS, Add kTLS RX HW offload supportTariq Toukan2020-06-2719-32/+529
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement driver support for the kTLS RX HW offload feature. Resync support is added in a downstream patch. New offload contexts post their static/progress params WQEs over the per-channel async ICOSQ, protected under a spin-lock. The Channel/RQ is selected according to the socket's rxq index. Feature is OFF by default. Can be turned on by: $ ethtool -K <if> tls-hw-rx-offload on A new TLS-RX workqueue is used to allow asynchronous addition of steering rules, out of the NAPI context. It will be also used in a downstream patch in the resync procedure. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | net/mlx5e: kTLS, Use kernel API to extract private offload contextTariq Toukan2020-06-271-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the implementation of the private kTLS TX HW offload context getter and setter, so it uses the kernel API functions, instead of a local shadow structure. A single BUILD_BUG_ON check is sufficient, remove the duplicate. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | net/mlx5e: kTLS, Improve TLS feature modularityTariq Toukan2020-06-2713-288/+376
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Better separate the code into c/h files, so that kTLS internals are exposed to the corresponding non-accel flow as follows: - Necessary datapath functions are exposed via ktls_txrx.h. - Necessary caps and configuration functions are exposed via ktls.h, which became very small. In addition, kTLS internal code sharing is done via ktls_utils.h, which is not exposed to any non-accel file. Add explicit WQE structures for the TLS static and progress params, breaking the union of the static with UMR, and the progress with PSV. Generalize the API as a preparation for TLS RX offload support. Move kTLS TX-specific code to the proper file. Remove the inline tag for function in C files, let the compiler decide. Use kzalloc/kfree for the priv_tx context. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>