summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* selftests: net: cmsg_so_mark: test ICMP and RAW socketsJakub Kicinski2022-02-101-8/+16
| | | | | | | | | | | | | | | | | | | Use new capabilities of cmsg_sender to test ICMP and RAW sockets, previously only UDP was tested. Before SO_MARK support was added to ICMPv6: # ./cmsg_so_mark.sh Case ICMP rejection returned 0, expected 1 FAIL - 1/12 cases failed After: # ./cmsg_so_mark.sh OK Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* selftests: net: cmsg_sender: support icmp and raw socketsJakub Kicinski2022-02-101-9/+55
| | | | | | | Support sending fake ICMP(v6) messages and UDP via RAW sockets. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* selftests: net: make cmsg_so_mark ready for more optionsJakub Kicinski2022-02-102-26/+117
| | | | | | | | Parametrize the code so that it can support UDP and ICMP sockets in the future, and more cmsg types. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* selftests: net: rename cmsg_so_markJakub Kicinski2022-02-104-6/+6
| | | | | | | Rename the file in prep for generalization. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ping6: support setting socket options via cmsgJakub Kicinski2022-02-101-4/+9
| | | | | | | | | | Minor reordering of the code and a call to sock_cmsg_send() gives us support for setting the common socket options via cmsg (the usual ones - SO_MARK, SO_TIMESTAMPING_OLD, SCM_TXTIME). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ping6: support packet timestampingJakub Kicinski2022-02-101-0/+1
| | | | | | | | | | Nothing prevents the user from requesting timestamping on ping6 sockets, yet timestamps are not going to be reported. Plumb the flags through. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ping6: remove a pr_debug() statementJakub Kicinski2022-02-101-2/+0
| | | | | | | | | We have ftrace and BPF today, there's no need for printing arguments at the start of a function. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'ieee802154-for-davem-2022-02-10' of ↵David S. Miller2022-02-105-122/+92
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2022-02-10 An update from ieee802154 for your *net-next* tree. There is more ongoing in ieee802154 than usual. This will be the first pull request for this cycle, but I expect one more. Depending on review and rework times. Pavel Skripkin ported the atusb driver over to the new USB api to avoid unint problems as well as making use of the modern api without kmalloc() needs in he driver. Miquel Raynal landed some changes to ensure proper frame checksum checking with hwsim, documenting our use of wake and stop_queue and eliding a magic value by using the proper define. David Girault documented the address struct used in ieee802154. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ieee802154: Provide a kdoc to the address structureDavid Girault2022-02-011-0/+10
| | | | | | | | | | | | | | | | | | | | | | Give this structure a header to better explain its content. Signed-off-by: David Girault <david.girault@qorvo.com> [miquel.raynal@bootlin.com: Isolate this change from a bigger commit and reword the comment] Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220201180956.93581-1-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
| * net: mac802154: Explain the use of ieee802154_wake/stop_queue()Miquel Raynal2022-01-281-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not straightforward to the newcomer that a single skb can currently be sent at a time and that the internal process is to stop the queue when processing a frame before re-enabling it. Make this clear by documenting the ieee802154_wake/stop_queue() helpers. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220125122540.855604-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
| * net: ieee802154: Use the IEEE802154_MAX_PAGE define when relevantMiquel Raynal2022-01-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | This define already exist but is hardcoded in nl-phy.c. Use the definition when relevant. While at it, also convert the type from uint32_t to u32. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220125122540.855604-3-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
| * net: ieee802154: hwsim: Ensure frame checksum are validMiquel Raynal2022-01-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | There is no point in accepting frames with a wrong or missing checksum, at least not outside of a promiscuous setting. Set the right flag by default in the hwsim driver to ensure checksums are not ignored. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220125122540.855604-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
| * ieee802154: atusb: move to new USB APIPavel Skripkin2022-01-101-119/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Old USB API is prone to uninit value bugs if error handling is not correct. Let's move atusb to use new USB API to 1) Make code more simple, since new API does not require memory to be allocates via kmalloc() 2) Defend driver from usb-related uninit value bugs. 3) Make code more modern and simple This patch removes atusb usb wrappers as Greg suggested [0], this will make code more obvious and easier to understand over time, and replaces old API calls with new ones. Also this patch adds and updates usb related error handling to prevent possible uninit value bugs in future Link: https://lore.kernel.org/all/YdL0GPxy4TdGDzOO@kroah.com/ [0] Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
* | Merge branch '100GbE' of ↵David S. Miller2022-02-1041-689/+4150
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-02-09 This series contains updates to ice driver only. Brett adds support for QinQ. This begins with code refactoring and re-organization of VLAN configuration functions to allow for introduction of VSI VLAN ops to enable setting and calling of respective operations based on device support of single or double VLANs. Implementations are added for outer VLAN support. To support QinQ, the device must be set to double VLAN mode (DVM). In order for this to occur, the DDP package and NVM must also support DVM. Functions to determine compatibility and properly configure the device are added as well as setting the proper bits to advertise and utilize the proper offloads. Support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 is also included to allow for VF to negotiate and utilize this functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ice: Add ability for PF admin to enable VF VLAN pruningBrett Creeley2022-02-094-2/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VFs by default are able to see all tagged traffic regardless of trust and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers expect VFs to receive all VLAN tagged traffic with a matching destination MAC. Add an ethtool private flag 'vf-vlan-pruning' and set the default to off so VFs will receive all VLAN traffic directed towards them. When the flag is turned on, VF will only be able to receive untagged traffic or traffic with VLAN tags it has created interfaces for. Also, the flag cannot be changed while any VFs are allocated. This was done to simplify the implementation. So, if this flag is needed, then the PF admin must enable it. If the user tries to enable the flag while VFs are active, then print an unsupported message with the vf-vlan-pruning flag included. In case multiple flags were specified, this makes it clear to the user which flag failed. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Add support for 802.1ad port VLANs VFBrett Creeley2022-02-091-7/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is only support for 802.1Q port VLANs on SR-IOV VFs. Add support to also allow 802.1ad port VLANs when double VLAN mode is enabled. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Advertise 802.1ad VLAN filtering and offloads for PF netdevBrett Creeley2022-02-092-49/+238
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order for the driver to support 802.1ad VLAN filtering and offloads, it needs to advertise those VLAN features and also support modifying those VLAN features, so make the necessary changes to ice_set_netdev_features(). By default, enable CTAG insertion/stripping and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM). Also, in DVM, enable STAG filtering by default. This is done by setting the feature bits in netdev->features. Also, in DVM, support toggling of STAG insertion/stripping, but don't enable them by default. This is done by setting the feature bits in netdev->hw_features. Since 802.1ad VLAN filtering and offloads are only supported in DVM, make sure they are not enabled by default and that they cannot be enabled during runtime, when the device is in SVM. Add an implementation for the ndo_fix_features() callback. This is needed since the hardware cannot support multiple VLAN ethertypes for VLAN insertion/stripping simultaneously and all supported VLAN filtering must either be enabled or disabled together. Disable inner VLAN stripping by default when DVM is enabled. If a VSI supports stripping the inner VLAN in DVM, then it will have to configure that during runtime. For example if a VF is configured in a port VLAN while DVM is enabled it will be allowed to offload inner VLANs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Support configuring the device to Double VLAN ModeBrett Creeley2022-02-0917-59/+996
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support configuring the device in Double VLAN Mode (DVM), the DDP and FW have to support DVM. If both support DVM, the PF that downloads the package needs to update the default recipes, set the VLAN mode, and update boost TCAM entries. To support updating the default recipes in DVM, add support for updating an existing switch recipe's lkup_idx and mask. This is done by first calling the get recipe AQ (0x0292) with the desired recipe ID. Then, if that is successful update one of the lookup indices (lkup_idx) and its associated mask if the mask is valid otherwise the already existing mask will be used. The VLAN mode of the device has to be configured while the global configuration lock is held while downloading the DDP, specifically after the DDP has been downloaded. If supported, the device will default to DVM. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2Brett Creeley2022-02-096-43/+1226
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the VF driver to be able to request VIRTCHNL_VF_OFFLOAD_VLAN_V2, negotiate its VLAN capabilities via VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS, add/delete VLAN filters, and enable/disable VLAN offloads. VFs supporting VIRTCHNL_OFFLOAD_VLAN_V2 will be able to use the following virtchnl opcodes: VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS VIRTCHNL_OP_ADD_VLAN_V2 VIRTCHNL_OP_DEL_VLAN_V2 VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2 VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 Legacy VF drivers may expect the initial VLAN stripping settings to be configured by the PF, so the PF initializes VLAN stripping based on the VIRTCHNL_OP_GET_VF_RESOURCES opcode. However, with VLAN support via VIRTCHNL_VF_OFFLOAD_VLAN_V2, this function is only expected to be used for VFs that only support VIRTCHNL_VF_OFFLOAD_VLAN, which will only be supported when a port VLAN is configured. Update the function based on the new expectations. Also, change the message when the PF can't enable/disable VLAN stripping to a dev_dbg() as this isn't fatal. When a VF isn't in a port VLAN and it only supports VIRTCHNL_VF_OFFLOAD_VLAN when Double VLAN Mode (DVM) is enabled, then the PF needs to reject the VIRTCHNL_VF_OFFLOAD_VLAN capability and configure the VF in software only VLAN mode. To do this add the new function ice_vf_vsi_cfg_legacy_vlan_mode(), which updates the VF's inner and outer ice_vsi_vlan_ops functions and sets up software only VLAN mode. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Add hot path support for 802.1Q and 802.1ad VLAN offloadsBrett Creeley2022-02-099-22/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the driver only supports 802.1Q VLAN insertion and stripping. However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q and 802.1ad VLAN insertion and stripping will be supported. Unfortunately the VSI context parameters only allow for one VLAN ethertype at a time for VLAN offloads so only one or the other VLAN ethertype offload can be supported at once. To support this, multiple changes are needed. Rx path changes: [1] In DVM, the Rx queue context l2tagsel field needs to be cleared so the outermost tag shows up in the l2tag2_2nd field of the Rx flex descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain 1 to support SVM configurations. [2] Modify the ice_test_staterr() function to take a __le16 instead of the ice_32b_rx_flex_desc union pointer so this function can be used for both rx_desc->wb.status_error0 and rx_desc->wb.status_error1. [3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that checks if there is a VLAN tag in l2tag1 or l2tag2_2nd. [4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX is enabled in netdev->features. If it is, then this is the VLAN ethertype that needs to be added to the stripping VLAN tag. Since ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad. Tx path changes: [1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was added to the list of tx_flags to handle this case. [2] When the stack requests the VLAN tag to be offloaded on Tx, the driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1 respectively. To determine which location to use, set a bit in the Tx ring flags field during ring allocation that can be used to determine which field to use in the Tx descriptor. In DVM, always use l2tag2, and in SVM, always use l2tag1. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Add outer_vlan_ops and VSI specific VLAN ops implementationsBrett Creeley2022-02-0916-86/+813
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN ops are only available when the device is in Double VLAN Mode (DVM). Depending on the VSI type, the requirements for what operations to use/allow differ. By default all VSI's have unsupported inner and outer VSI VLAN ops. This implementation was chosen to prevent unexpected crashes due to null pointer dereferences. Instead, if a VSI calls an unsupported op, it will just return -EOPNOTSUPP. Add implementations to support modifying outer VLAN fields for VSI context. This includes the ability to modify VLAN stripping, insertion, and the port VLAN based on the outer VLAN handling fields of the VSI context. These functions should only ever be used if DVM is enabled because that means the firmware supports the outer VLAN fields in the VSI context. If the device is in DVM, then always use the outer_vlan_ops, else use the vlan_ops since the device is in Single VLAN Mode (SVM). Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to ice_vsi_vlan_setup() as the latter function is specific to the PF and all other VSI types that need an untagged VLAN 0 filter already do this in their specific flows. Without this change, Flow Director is failing to initialize because it does not implement any VSI VLAN ops. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Adjust naming for inner VLAN operationsBrett Creeley2022-02-096-142/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current operations act on inner VLAN fields. To support double VLAN, outer VLAN operations and functions will be implemented. Add the "inner" naming to existing VLAN operations to distinguish them from the upcoming outer values and functions. Some spacing adjustments are made to align values. Note that the inner is not talking about a tunneled VLAN, but the second VLAN in the packet. For SVM the driver uses inner or single VLAN filtering and offloads and in Double VLAN Mode the driver uses the inner filtering and offloads for SR-IOV VFs in port VLANs in order to support offloading the guest VLAN while a port VLAN is configured. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Use the proto argument for VLAN opsBrett Creeley2022-02-0911-26/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the proto argument is unused. This is because the driver only supports 802.1Q VLAN filtering. This policy is enforced via netdev features that the driver sets up when configuring the netdev, so the proto argument won't ever be anything other than 802.1Q. However, this will allow for future iterations of the driver to seemlessly support 802.1ad filtering. Begin using the proto argument and extend the related structures to support its use. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Refactor vf->port_vlan_info to use ice_vlanBrett Creeley2022-02-092-35/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current vf->port_vlan_info variable is a packed u16 that contains the port VLAN ID and QoS/prio value. This is fine, but changes are incoming that allow for an 802.1ad port VLAN. Add flexibility by changing the vf->port_vlan_info member to be an ice_vlan structure. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Introduce ice_vlan structBrett Creeley2022-02-0910-59/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new struct for VLAN related information. Currently this holds VLAN ID and priority values, but will be expanded to hold TPID value. This reduces the changes necessary if any other values are added in future. Remove the action argument from these calls as it's always ICE_FWD_VSI. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Add new VSI VLAN opsBrett Creeley2022-02-0914-334/+450
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and offloads require more flexibility when configuring VLANs. The VSI VLAN interface will allow flexibility for configuring VLANs for all VSI types. Add new files to separate the VSI VLAN ops and move functions to make the code more organized. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Add helper function for adding VLAN 0Brett Creeley2022-02-094-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are multiple places where VLAN 0 is being added. Create a function to be called in order to minimize changes as the implementation is expanded to support double VLAN and avoid duplicated code. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | ice: Refactor spoofcheck configuration functionsBrett Creeley2022-02-092-50/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add functions to configure Tx VLAN antispoof based on iproute configuration and/or VLAN mode and VF driver support. This is needed later so the driver can control when it can be configured. Also, add functions that can be used to enable and disable MAC and VLAN spoofcheck. Move spoofchk configuration during VSI setup into the SR-IOV initialization path and into the post VSI rebuild flow for VF VSIs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski2022-02-0933-524/+449
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Conntrack sets on CHECKSUM_UNNECESSARY for UDP packet with no checksum, from Kevin Mitchell. 2) skb->priority support for nfqueue, from Nicolas Dichtel. 3) Remove conntrack extension register API, from Florian Westphal. 4) Move nat destroy hook to nf_nat_hook instead, to remove nf_ct_ext_destroy(), also from Florian. 5) Wrap pptp conntrack NAT hooks into single structure, from Florian Westphal. 6) Support for tcp option set to noop for nf_tables, also from Florian. 7) Do not run x_tables comment match from packet path in nf_tables, from Florian Westphal. 8) Replace spinlock by cmpxchg() loop to update missed ct event, from Florian Westphal. 9) Wrap cttimeout hooks into single structure, from Florian. 10) Add fast nft_cmp expression for up to 16-bytes. 11) Use cb->ctx to store context in ctnetlink dump, instead of using cb->args[], from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: ctnetlink: use dump structure instead of raw args nfqueue: enable to set skb->priority netfilter: nft_cmp: optimize comparison for 16-bytes netfilter: cttimeout: use option structure netfilter: ecache: don't use nf_conn spinlock netfilter: nft_compat: suppress comment match netfilter: exthdr: add support for tcp option removal netfilter: conntrack: pptp: use single option structure netfilter: conntrack: remove extension register api netfilter: conntrack: handle ->destroy hook via nat_ops instead netfilter: conntrack: move extension sizes into core netfilter: conntrack: make all extensions 8-byte alignned netfilter: nfqueue: enable to get skb->priority netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY ==================== Link: https://lore.kernel.org/r/20220209133616.165104-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | netfilter: ctnetlink: use dump structure instead of raw argsFlorian Westphal2022-02-091-12/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netlink_dump structure has a union of 'long args[6]' and a context buffer as scratch space. Convert ctnetlink to use a structure, its easier to read than the raw 'args' usage which comes with no type checks and no readable names. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | nfqueue: enable to set skb->priorityNicolas Dichtel2022-02-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow up of the previous patch that enables to get skb->priority. It's now posssible to set it also. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nft_cmp: optimize comparison for 16-bytesPablo Neira Ayuso2022-02-093-2/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow up to 16-byte comparisons with a new cmp fast version. Use two 64-bit words and calculate the mask representing the bits to be compared. Make sure the comparison is 64-bit aligned and avoid out-of-bound memory access on registers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: cttimeout: use option structureFlorian Westphal2022-02-093-24/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of two exported functions, export a single option structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: ecache: don't use nf_conn spinlockFlorian Westphal2022-02-092-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For updating eache missed value we can use cmpxchg. This also avoids need to disable BH. kernel robot reported build failure on v1 because not all arches support cmpxchg for u16, so extend this to u32. This doesn't increase struct size, existing padding is used. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nft_compat: suppress comment matchFlorian Westphal2022-02-041-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to have the datapath call the always-true comment match stub. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: exthdr: add support for tcp option removalFlorian Westphal2022-02-041-1/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows to replace a tcp option with nop padding to selectively disable a particular tcp option. Optstrip mode is chosen when userspace passes the exthdr expression with neither a source nor a destination register attribute. This is identical to xtables TCPOPTSTRIP extension. The only difference is that TCPOPTSTRIP allows to pass in a bitmap of options to remove rather than a single number. Unlike TCPOPTSTRIP this expression can be used multiple times in the same rule to get the same effect. We could add a new nested attribute later on in case there is a use case for single-expression-multi-remove. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: conntrack: pptp: use single option structureFlorian Westphal2022-02-043-77/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of exposing the four hooks individually use a sinle hook ops structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: conntrack: remove extension register apiFlorian Westphal2022-02-0419-292/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These no longer register/unregister a meaningful structure so remove it. Cc: Paul Blakey <paulb@nvidia.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: conntrack: handle ->destroy hook via nat_ops insteadFlorian Westphal2022-02-045-36/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nat module already exposes a few functions to the conntrack core. Move the nat extension destroy hook to it. After this, no conntrack extension needs a destroy hook. 'struct nf_ct_ext_type' and the register/unregister api can be removed in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: conntrack: move extension sizes into coreFlorian Westphal2022-02-0413-58/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to specify this in the registration modules, we already collect all sizes for build-time checks on the maximum combined size. After this change, all extensions except nat have no meaningful content in their nf_ct_ext_type struct definition. Next patch handles nat, this will then allow to remove the dynamic register api completely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: conntrack: make all extensions 8-byte alignnedFlorian Westphal2022-02-0412-15/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All extensions except one need 8 byte alignment, so just make that the default. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nfqueue: enable to get skb->priorityNicolas Dichtel2022-02-042-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This info could be useful to improve traffic analysis. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARYKevin Mitchell2022-02-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The udp_error function verifies the checksum of incoming UDP packets if one is set. This has the desirable side effect of setting skb->ip_summed to CHECKSUM_COMPLETE, signalling that this verification need not be repeated further up the stack. Conversely, when the UDP checksum is empty, which is perfectly legal (at least inside IPv4), udp_error previously left no trace that the checksum had been deemed acceptable. This was a problem in particular for nf_reject_ipv4, which verifies the checksum in nf_send_unreach() before sending ICMP_DEST_UNREACH. It makes no accommodation for zero UDP checksums unless they are already marked as CHECKSUM_UNNECESSARY. This commit ensures packets with empty UDP checksum are marked as CHECKSUM_UNNECESSARY, which is explicitly recommended in skbuff.h. Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.Sebastian Andrzej Siewior2022-02-092-25/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations") removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired. On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves is also acquired with enabled bottom halves followed by disabling bottom halves. This is the reverse locking order. It has been observed with inet_listen_hashbucket::lock: local_bh_disable() + spin_lock(&ilb->lock): inet_listen() inet_csk_listen_start() sk->sk_prot->hash() := inet_hash() local_bh_disable() __inet_hash() spin_lock(&ilb->lock); acquire(&ilb->lock); Reverse order: spin_lock(&ilb2->lock) + local_bh_disable(): tcp_seq_next() listening_get_next() spin_lock(&ilb2->lock); acquire(&ilb2->lock); tcp4_seq_show() get_tcp4_sock() sock_i_ino() read_lock_bh(&sk->sk_callback_lock); acquire(softirq_ctrl) // <---- whoops acquire(&sk->sk_callback_lock) Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock. Split inet_unhash() and acquire the listen_hashbucket lock without disabling bottom halves; the inet_ehash lock with disabled bottom halves. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2022-02-09201-2215/+4049
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * \ \ \ Merge branch 'Split bpf_sk_lookup remote_port field'Alexei Starovoitov2022-02-095-5/+14
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jakub Sitnicki says: ==================== Following the recent split-up of the bpf_sock dst_port field, apply the same to technique to the bpf_sk_lookup remote_port field to make uAPI more user friendly. v1 -> v2: - Remove remote_port range check and cast to be16 in TEST_RUN for sk_lookup (kernel test robot) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookupJakub Sitnicki2022-02-092-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the context access tests for sk_lookup prog to cover the surprising case of a 4-byte load from the remote_port field, where the expected value is actually shifted by 16 bits. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220209184333.654927-3-jakub@cloudflare.com
| | * | | | bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wideJakub Sitnicki2022-02-093-4/+6
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order. First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide"). Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits. Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
| * | | | libbpf: Fix compilation warning due to mismatched printf formatAndrii Nakryiko2022-02-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ppc64le architecture __s64 is long int and requires %ld. Cast to ssize_t and use %zd to avoid architecture-specific specifiers. Fixes: 4172843ed4a3 ("libbpf: Fix signedness bug in btf_dump_array_data()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220209063909.1268319-1-andrii@kernel.org
| * | | | Merge branch 'libbpf: Add syscall-specific variant of BPF_KPROBE'Andrii Nakryiko2022-02-083-0/+64
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hengqi Chen says: ==================== Add new macro BPF_KPROBE_SYSCALL, which provides easy access to syscall input arguments. See [0] and [1] for background. [0]: https://github.com/libbpf/libbpf-bootstrap/issues/57 [1]: https://github.com/libbpf/libbpf/issues/425 v2->v3: - Use PT_REGS_SYSCALL_REGS - Move selftest to progs/bpf_syscall_macro.c v1->v2: - Use PT_REGS_PARM2_CORE_SYSCALL instead ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>