summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Revert "net: dccp: initialize (addr,port) listening hashtable"David S. Miller2018-12-161-3/+0
| | | | | | | | | | This reverts commit ec49d83f245453515a9b6e88324e27bbcb69fbae. Cause build failures when DCCP is modular. ERROR: "inet_hashinfo2_init" [net/dccp/dccp.ko] undefined! Signed-off-by: David S. Miller <davem@davemloft.net>
* neighbor: Add protocol attributeDavid Ahern2018-12-163-1/+26
| | | | | | | | Similar to routes and rules, add protocol attribute to neighbor entries for easier tracking of how each was created. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* selftests: net: reuseport_addr_any: add DCCPPeter Oskolkov2018-12-161-2/+47
| | | | | | | This patch adds coverage of DCCP to reuseport_addr_any selftest. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dccp: initialize (addr,port) listening hashtablePeter Oskolkov2018-12-161-0/+3
| | | | | | | | | | | | | | | | | | | | | Commit d9fbc7f6431f "net: tcp: prefer listeners bound to an address" removes port-only listener lookups. This caused segfaults in DCCP lookups because DCCP did not initialize the (addr,port) hashtable. This patch adds said initialization. The only non-trivial issue here is the size of the new hashtable. It seemed reasonable to make it match the size of the port-only hashtable (= INET_LHTABLE_SIZE) that was used previously. Other parameters to inet_hashinfo2_init() match those used in TCP. Tested: syzcaller issues fixed; the second patch in the patchset tests that DCCP lookups work correctly. Fixes: d9fbc7f6431f "net: tcp: prefer listeners bound to an address" Reported-by: syzcaller <syzkaller@googlegroups.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: Add protocol field decompressionSam Protsenko2018-12-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | When Protocol Field Compression (PFC) is enabled, the "Protocol" field in PPP packet will be received without leading 0x00. See section 6.5 in RFC 1661 for details. So let's decompress protocol field if needed, the same way it's done in drivers/net/ppp/pptp.c. In case when "nopcomp" pppd option is not enabled, PFC (pcomp) can be negotiated during LCP handshake, and L2TP driver in kernel will receive PPP packets with compressed Protocol field, which in turn leads to next error: Protocol Rejected (unsupported protocol 0x2145) because instead of Protocol=0x0021 in PPP packet there will be Protocol=0x21. This patch unwraps it back to 0x0021, which fixes the issue. Sending the compressed Protocol field will be implemented in subsequent patch, this one is self-sufficient. Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'mlx5e-updates-2018-12-14' of ↵David S. Miller2018-12-1519-174/+1048
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-12-14 (VF Lag) From Aviv Heller, Subsequent patches introduce VF LAG, which provdies load-balancing and high-availability capabilities for VFs associated with different physical ports of the same Connect-X card. This series consists of the following: - mlx5 devcom, driver infrastructure that facilitates operations that involve both core devices (physical functions) of the same card, to synchronize and communicate between two driver instances of the same card. - Infrastructure for TC rule duplication. - Changes to LAG logic to enable its use when SR-IOV is enabled - PFs in switchdev mode is the only mode currently supported. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5: Handle LAG FW commands failure gracefullyAviv Heller2018-12-141-21/+70
| | | | | | | | | | | | | | | | | | | | When create_lag or destroy_lag FW commands fail, display an appropriate error message, and try to recover, if possible. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Make RoCE and SR-IOV LAG modes explicitAviv Heller2018-12-148-26/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | With the introduction of SR-IOV LAG, checking whether LAG is active is no longer good enough, since RoCE and SR-IOV LAG each entails different behavior by both the core and infiniband drivers. This patch introduces facilities to discern LAG type, in addition to mlx5_lag_is_active(). These are implemented in such a way as to allow more complex mode combinations in the future. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Rename mlx5_lag_is_bonded() to __mlx5_lag_is_active()Aviv Heller2018-12-141-9/+9
| | | | | | | | | | | | | | | | | | The new name better represents the function's aim, and sets a precedent for a '__' prefix for internal, non-locked versions of LAG APIs. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Allow co-enablement of uplink LAG and SRIOVRabie Loulou2018-12-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Enable setting uplink LAG if sriov is enabled on both ports in switchdev mode. Once the sriov mode is changed from switchdev for any of the ports, the LAG instance is disabled. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Allow/disallow LAG according to pre-req onlyRabie Loulou2018-12-145-61/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the lag forbid/allow functions, change the lag prereq check to run in the do-bond logic, so every change in the prereq state will cause LAG to be disabled/enabled accordingly after the next do-bond run. Add lag update function, so every component which changes the prereq state and want the LAG to re-calc the conditions can call the update function. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Adjustments for the activate LAG logic to run under sriovRabie Loulou2018-12-141-12/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When HW lag is set/unset, roce must not be enabled on the port, as such we wrap such changes with roce enable/disable either directly or through re-creation of IB device. Currently, lag and sriov are mutually exclusive, so by definition this code doesn't run under sriov. Towards changing this exclusion, we need to make sure that roce will not be enabled on the eswitch manager port under sriov since this is requirement of the switchdev mode. We are going strict here and avoiding this all together under sriov. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAGAviv Heller2018-12-141-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under uplink LAG, packets that match a flow might arrive on both uplink ports and transmitted through both as part of supporting aggregation and high-availability. When the netdevs representing the uplinks are set into LAG (bonding, teaming), duplicate the TC flow offloading into each of the per-uplink e-switches. Duplication is not required if the source is the bond device, since in this case it is assumed that the bond and the uplink netdevs share the same TC block, and thus duplication will occur naturally by the stack. Note that under encapsulation scheme, both flows will use the same neighbour and hence both will contribute to the last-used feedback towards the stack. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Offload TC e-switch rules with egress LAG deviceRabie Loulou2018-12-141-0/+9
| | | | | | | | | | | | | | | | | | | | | | When parsing TC FDB actions, if the egress device is a bond/team net-device which enslaved the uplink representor of the e-switch, use the uplink representor as the destination in the HW rule. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: In case of LAG, one switch parent id is used for all representorsRabie Loulou2018-12-142-13/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the uplink representors are put into lag, set all the representors (VFs and uplinks) of the same NIC to return the same switchdev id. Currently, the route lookup code on the encapsulation offload path assumes that same switchdev id for the source and dest devices means that the dest is also mlx5 HW netdev. This doesn't hold anymore when we align the switchdev Id of the uplinks to be same, which in turn causes the bond/team to return that id to the caller. As such, enhance the relevant check to take into account the uplink lag case. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rulesShahar Klein2018-12-142-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assign a counter dev attribute according to device capability and use it for management of counters related to offloaded eswitch TC flows. With upcoming support for uplink LAG, we have two HW rules per one logical SW (TC) rule. Although the HW supports attaching one counter to multiple rules, we are allocating counter per HW rule. We need this separation for two reasons: 1. "flow eswitch" counter affinity HW require the counter to be allocated on the device where the eswitch rule is set. 2. for some use-cases (multi-path routing) each HW flow relates to different neighbour, hence our neigh update logic must have a per-rule HW accountant in order to provide the proper feedback to the kernel. Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Infrastructure for duplicated offloading of TC flowsRoi Dayan2018-12-143-8/+176
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under uplink LAG or multipath schemes, traffic that matches one flow might arrive on both uplink ports and transmitted through both as part of supporting aggregation and high-availability. To cope with the fact that the SW model might use logical SW port (e.g uplink team or bond) but we have two HW ports with e-switch on each, there are cases where in order to offload a SW TC rule we need to duplicate it to two HW flows. Since each HW rule has its own counter we also aggregate the counter of both rules when a flow stats query is executed from user-space. Introduce the changes for the different elements (add/delete/stats), currently nothing is duplicated. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: E-Switch, Add peer miss rulesRoi Dayan2018-12-143-1/+225
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the sriov offloads mode, packets that are not matched by any other rule are sent towards the e-switch vport manager for further processing. Under upcoming patches (e.g for uplink LAG), packets sent from VF vports belonging to esw0 (e-switch related to PF0) might end up in esw1 (e-switch related to PF1) due to muxing logic applied by the FW. In such a case we still want the missed packet to be sent to the "base" esw manager vport in order to present the control plane a consistent view of the source (VF reresentor) port. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Introduce inter-device communication mechanismAviv Heller2018-12-145-4/+313
| | | | | | | | | | | | | | | | | | | | | | This introduces devcom, a generic mechanism for performing operations on both physical functions of the same Connect-X card. The first user of this API is merged eswitch, which will be introduced in subsequent patches. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * Merge branch 'mlx5-next' of ↵Saeed Mahameed2018-12-146-51/+74
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev conflicts. Highlights: 1) Lag refactroing and flow counter affinity bits. 2) mlx5 core cleanups By Roi Dayan (2) and others * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Fold the modify lag code into function net/mlx5: Add lag affinity info to log net/mlx5: Split the activate lag function into two routines net/mlx5: E-Switch, Introduce flow counter affinity IB/mlx5: Unify e-switch representors load approach between uplink and VFs net/mlx5: Use lowercase 'X' for hex values net/mlx5: Remove duplicated include from eswitch.c Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Fold the modify lag code into functionShahar Klein2018-12-141-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Handle the code of modifying the lag affinity within a separate function. Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Add lag affinity info to logRoi Dayan2018-12-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Report the initial LAG port affinity upon LAG creation. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Split the activate lag function into two routinesRoi Dayan2018-12-141-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split the activate lag function in order to be symmetric with the deactivate lag function. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: E-Switch, Introduce flow counter affinityShahar Klein2018-12-141-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This dictates the device affinity for eswitch flow counters, set by the FW according to the HW device capabilities. Under "source eswitch" affinity, the counter should be allocated on the device related to the source vport in the match. This covers both non merged e-switch mode as well as old FW that does not advertise this cap. Under "flow eswitch" affinity, the counter should be allocated on the device where the eswitch rule is set. Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * IB/mlx5: Unify e-switch representors load approach between uplink and VFsMark Bloch2018-12-143-22/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When in switchdev mode and the add function is called by the core level driver, make sure we only register the callbacks, but don't create the mlx5 IB device or initialize anything. With this change all the IB devices in switchdev mode are created only once the load callback is invoked by the e-switch core sub-module. This follows the design paradigm under which the all the Eth representors must be loaded before any of IB reprs is loaded. Signed-off-by: Mark Bloch <markb@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Use lowercase 'X' for hex valuesSaeed Mahameed2018-12-141-8/+8
| | | | | | | | | | | | | | | | | | | | | Apparently gcc is cool with upper case '0X' but it is not commonly used. Replace '0X' with lowercase '0x' in mlx5_ifc.h file. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Remove duplicated include from eswitch.cYueHaibing2018-12-121-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | | Merge branch 'net-mitigate-retpoline-overhead'David S. Miller2018-12-159-22/+136
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Paolo Abeni says: ==================== net: mitigate retpoline overhead The spectre v2 counter-measures, aka retpolines, are a source of measurable overhead[1]. We can partially address that when the function pointer refers to a builtin symbol resorting to a list of tests vs well-known builtin function and direct calls. Experimental results show that replacing a single indirect call via retpoline with several branches and a direct call gives performance gains even when multiple branches are added - 5 or more, as reported in [2]. This may lead to some uglification around the indirect calls. In netconf 2018 Eric Dumazet described a technique to hide the most relevant part of the needed boilerplate with some macro help. This series is a [re-]implementation of such idea, exposing the introduced helpers in a new header file. They are later leveraged to avoid the indirect call overhead in the GRO path, when possible. Overall this gives > 10% performance improvement for UDP GRO benchmark and smaller but measurable for TCP syn flood. The added infra can be used in follow-up patches to cope with retpoline overhead in other points of the networking stack (e.g. at the qdisc layer) and possibly even in other subsystems. v2 -> v3: - fix build error with CONFIG_IPV6=m v1 -> v2: - list explicitly the builtin function names in INDIRECT_CALL_*(), as suggested by Ed Cree - expand the recipients list rfc -> v1: - use branch prediction hints, as suggested by Eric [1] http://vger.kernel.org/netconf2018_files/PaoloAbeni_netconf2018.pdf [2] https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | udp: use indirect call wrappers for GRO socket lookupPaolo Abeni2018-12-151-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids another indirect call for UDP GRO. Again, the test for the IPv6 variant is performed first. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: use indirect call wrappers at GRO transport layerPaolo Abeni2018-12-157-15/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids an indirect call in the receive path for TCP and UDP packets. TCP takes precedence on UDP, so that we have a single additional conditional in the common case. When IPV6 is build as module, all gro symbols except UDPv6 are builtin, while the latter belong to the ipv6 module, so we need some special care. v1 -> v2: - adapted to INDIRECT_CALL_ changes v2 -> v3: - fix build issue with CONFIG_IPV6=m Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: use indirect call wrappers at GRO network layerPaolo Abeni2018-12-153-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids an indirect calls for L3 GRO receive path, both for ipv4 and ipv6, if the latter is not compiled as a module. Note that when IPv6 is compiled as builtin, it will be checked first, so we have a single additional compare for the more common path. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | indirect call wrappers: helpers to speed-up indirect calls of builtinPaolo Abeni2018-12-151-0/+51
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This header define a bunch of helpers that allow avoiding the retpoline overhead when calling builtin functions via function pointers. It boils down to explicitly comparing the function pointers to known builtin functions and eventually invoke directly the latter. The macros defined here implement the boilerplate for the above schema and will be used by the next patches. rfc -> v1: - use branch prediction hint, as suggested by Eric v1 -> v2: - list explicitly the builtin function names in INDIRECT_CALL_*(), as suggested by Ed Cree Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: socionext: remove mmio reads on TxIlias Apalodimas2018-12-151-43/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the driver issues 2 mmio reads to figure out the number of transmitted packets and clean them. We can get rid of the expensive reads since BIT 31 of the Tx descriptor can be used for that. We can also remove the budget counting of Tx completions since all of the descriptors are not deliberately processed. Performance numbers using pktgen are: size pre-patch(pps) post-patch(pps) 64 362483 427916 128 358315 411686 256 352725 389683 512 215675 216464 1024 113812 114442 Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: socionext: correctly recover txq after being fullIlias Apalodimas2018-12-151-11/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Running pktgen with packets sizes > 512b ends up in the interface Txq getting stuck. "netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!" appears on dmesg but the interface never recovers. It requires an ifconfig down/up to make the interface usable again. The reason that triggers this, is a race condition between .ndo_start_xmit and the napi completion. The available budget is calculated first and indicates the queue is full. Due to a costly netif_err() the queue is not stopped in time while the napi completion runs, clears the irq and frees up descriptors, thus the queue never wakes up again. Fix this by moving the print after stopping the queue, make the print ratelimited, add barriers and check for cleaned descriptors.. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | dt-bindings: net: ravb: Add support for r8a774c0 SoCFabrizio Castro2018-12-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document RZ/G2E (R8A774C0) SoC bindings. Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | neighbor: Improve neighbour struct layoutDavid Ahern2018-12-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Move arp_queue_len_bytes ahead of arp_queue to remove two 4-byte holes. Ensure ha element is always 8-byte aligned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: simplify the qdisc_leaf codeTonghao Zhang2018-12-151-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Except for returning, the var leaf is not used in the qdisc_leaf(). For simplicity, remove it. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: Fix handling of LLA with VRF and sockets bound to VRFDavid Ahern2018-12-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent commit allows sockets bound to a VRF to receive ipv6 link local packets. However, it only works for UDP and worse TCP connection attempts to the LLA with the only listener bound to the VRF just hang where as before the client gets a reset and connection refused. Fix by adjusting ir_iif for LL addresses and packets received through a device enslaved to a VRF. Fixes: 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF") Reported-by: Donald Sharp <sharpd@cumulusnetworks.com> Cc: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | r8169: improve spurious interrupt detectionHeiner Kallweit2018-12-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Improve detection of spurious interrupts by checking against the interrupt mask as currently set in the chip. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | cxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()Yangtao Li2018-12-153-117/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | We already have the DEFINE_SHOW_ATTRIBUTE. There is no need to define such a macro, so remove DEFINE_SIMPLE_DEBUGFS_FILE. Also use the DEFINE_SHOW_ATTRIBUTE macro to simplify some code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipconfig: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li2018-12-151-12/+1
| | | | | | | | | | | | | | | | | | | | | Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'David S. Miller2018-12-1511-5/+1366
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Salil Mehta says: ==================== net: hns3: Add more commands to Debugfs in HNS3 driver This patch-set adds few more debugfs commands to HNS3 Ethernet Driver. Support has been added to query info related to below items: 1. Packet buffer descriptor ("echo bd info [queue no] [bd index] > cmd") 2. Manager table("echo dump mng tbl > cmd") 3. Dfx status register("echo dump reg ssu [prt id] > cmd") 4. Dcb status register("echo dump reg dcb [port id] > cmd") 5. Queue map ("echo queue map [queue no] > cmd") 6. Tm map ("echo tm map [queue no] > cmd") NOTE: Above commands are *read-only* and are only intended to query the information from the SoC(and dump inside the kernel, for now) and in no way tries to perform write operations for the purpose of configuration etc. Change Log: V1-->V2: 1. Addressed the GCC-8.2 compiler issue reported by David S. Miller. Link: https://lkml.org/lkml/2018/12/14/1298 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hns3: Add "tm map" status information query functionliuzhongzhu2018-12-154-0/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch prints dcb register status information by module. debugfs command: root@(none)# echo dump tm map 100 > cmd queue_id | qset_id | pri_id | tc_id 0100 | 0065 | 08 | 00 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hns3: Add "queue map" information query functionliuzhongzhu2018-12-157-2/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch prints queue map information. debugfs command: echo dump queue map > cmd Sample Command: root@(none)# echo queue map > cmd local queue id | global queue id | vector id 0 32 769 1 33 770 2 34 771 3 35 772 4 36 773 5 37 774 6 38 775 7 39 776 8 40 777 9 41 778 10 42 779 11 43 780 12 44 781 13 45 782 14 46 783 15 47 784 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hns3: Add "dcb register" status information query functionliuzhongzhu2018-12-154-0/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch prints dcb register status information by module. debugfs command: root@(none)# echo dump reg dcb > cmd roce_qset_mask: 0x0 nic_qs_mask: 0x0 qs_shaping_pass: 0x0 qs_bp_sts: 0x0 pri_mask: 0x0 pri_cshaping_pass: 0x0 pri_pshaping_pass: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hns3: Add "status register" information query functionliuzhongzhu2018-12-154-0/+885
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch prints status register information by module. debugfs command: echo dump reg [mode name] > cmd Sample Command: root@(none)# echo dump reg bios common > cmd BP_CPU_STATE: 0x0 DFX_MSIX_INFO_NIC_0: 0xc000 DFX_MSIX_INFO_NIC_1: 0xf DFX_MSIX_INFO_NIC_2: 0x2 DFX_MSIX_INFO_NIC_3: 0x2 DFX_MSIX_INFO_ROC_0: 0xc000 DFX_MSIX_INFO_ROC_1: 0x0 DFX_MSIX_INFO_ROC_2: 0x0 DFX_MSIX_INFO_ROC_3: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hns3: Add "manager table" information query functionliuzhongzhu2018-12-154-2/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch prints manager table information. debugfs command: echo dump mng tbl > cmd Sample Command: root@(none)# echo dump mng tbl > cmd entry|mac_addr |mask|ether|mask|vlan|mask|i_map|i_dir|e_type 00 |01:00:5e:00:00:01|0 |00000|0 |0000|0 |00 |00 |0 01 |c2:f1:c5:82:68:17|0 |00000|0 |0000|0 |00 |00 |0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: hns3: Add "bd info" query functionliuzhongzhu2018-12-151-1/+78
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch prints Sending and receiving package descriptor information. debugfs command: echo dump bd info 1 > cmd Sample Command: root@(none)# echo bd info 1 > cmd hns3 0000:7d:00.0: TX Queue Num: 0, BD Index: 0 hns3 0000:7d:00.0: (TX) addr: 0x0 hns3 0000:7d:00.0: (TX)vlan_tag: 0 hns3 0000:7d:00.0: (TX)send_size: 0 hns3 0000:7d:00.0: (TX)vlan_tso: 0 hns3 0000:7d:00.0: (TX)l2_len: 0 hns3 0000:7d:00.0: (TX)l3_len: 0 hns3 0000:7d:00.0: (TX)l4_len: 0 hns3 0000:7d:00.0: (TX)vlan_tag: 0 hns3 0000:7d:00.0: (TX)tv: 0 hns3 0000:7d:00.0: (TX)vlan_msec: 0 hns3 0000:7d:00.0: (TX)ol2_len: 0 hns3 0000:7d:00.0: (TX)ol3_len: 0 hns3 0000:7d:00.0: (TX)ol4_len: 0 hns3 0000:7d:00.0: (TX)paylen: 0 hns3 0000:7d:00.0: (TX)vld_ra_ri: 0 hns3 0000:7d:00.0: (TX)mss: 0 hns3 0000:7d:00.0: RX Queue Num: 0, BD Index: 120 hns3 0000:7d:00.0: (RX)addr: 0xffee7000 hns3 0000:7d:00.0: (RX)pkt_len: 0 hns3 0000:7d:00.0: (RX)size: 0 hns3 0000:7d:00.0: (RX)rss_hash: 0 hns3 0000:7d:00.0: (RX)fd_id: 0 hns3 0000:7d:00.0: (RX)vlan_tag: 0 hns3 0000:7d:00.0: (RX)o_dm_vlan_id_fb: 0 hns3 0000:7d:00.0: (RX)ot_vlan_tag: 0 hns3 0000:7d:00.0: (RX)bd_base_info: 0 Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'net-prefer-listeners-bound-to-an-address'David S. Miller2018-12-148-217/+325
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Peter Oskolkov says: ==================== net: prefer listeners bound to an address A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patchset eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In a future patchset I plan to explore whether it is possible to remove port-only hashtables completely: additional refactoring will be required, as some non-lookup code uses the hashtables. ==================== Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | selftests: net: test that listening sockets match on address properlyPeter Oskolkov2018-12-144-2/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a selftest that verifies that a socket listening on a specific address is chosen in preference over sockets that listen on any address. The test covers UDP/UDP6/TCP/TCP6. It is based on, and similar to, reuseport_dualstack.c selftest. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>