summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-08-3114-20/+117
|\ | | | | | | | | | | | | | | | | | | | | | | include/linux/netdevice.h net/socket.c d0efb16294d1 ("net: don't unconditionally copy_from_user a struct ifreq for socket ioctls") 876f0bf9d0d5 ("net: socket: simplify dev_ifconf handling") 29c4964822aa ("net: socket: rework compat_ifreq_ioctl()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * atlantic: Fix driver resume flow.Sudarsana Reddy Kalluru2021-08-291-0/+3
| | | | | | | | | | | | | | | | | | | | Driver crashes when restoring from the Hibernate. In the resume flow, driver need to clean up the older nic/vec objects and re-initialize them. Fixes: 8aaa112a57c1d ("net: atlantic: refactoring pm logic") Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch '100GbE' of ↵David S. Miller2021-08-282-16/+63
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-08-27 This series contains updates to ice driver only. Jake corrects the iterator used for looping Tx timestamp and removes dead code related to pin configuration. He also adds locking around flushing of the Tx tracker and restarts the periodic clock following time changes. Brett corrects the locking around updating netdev dev_addr. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ice: Only lock to update netdev dev_addrBrett Creeley2021-08-271-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync list") introduced calls to netif_addr_lock_bh() and netif_addr_unlock_bh() in the driver's ndo_set_mac() callback. This is fine since the driver is updated the netdev's dev_addr, but since this is a spinlock, the driver cannot sleep when the lock is held. Unfortunately the functions to add/delete MAC filters depend on a mutex. This was causing a trace with the lock debug kernel config options enabled when changing the mac address via iproute. [ 203.273059] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 203.273065] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6698, name: ip [ 203.273068] Preemption disabled at: [ 203.273068] [<ffffffffc04aaeab>] ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273097] CPU: 31 PID: 6698 Comm: ip Tainted: G S W I 5.14.0-rc4 #2 [ 203.273100] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 203.273102] Call Trace: [ 203.273107] dump_stack_lvl+0x33/0x42 [ 203.273113] ? ice_set_mac_address+0x8b/0x1c0 [ice] [ 203.273124] ___might_sleep.cold.150+0xda/0xea [ 203.273131] mutex_lock+0x1c/0x40 [ 203.273136] ice_remove_mac+0xe3/0x180 [ice] [ 203.273155] ? ice_fltr_add_mac_list+0x20/0x20 [ice] [ 203.273175] ice_fltr_prepare_mac+0x43/0xa0 [ice] [ 203.273194] ice_set_mac_address+0xab/0x1c0 [ice] [ 203.273206] dev_set_mac_address+0xb8/0x120 [ 203.273210] dev_set_mac_address_user+0x2c/0x50 [ 203.273212] do_setlink+0x1dd/0x10e0 [ 203.273217] ? __nla_validate_parse+0x12d/0x1a0 [ 203.273221] __rtnl_newlink+0x530/0x910 [ 203.273224] ? __kmalloc_node_track_caller+0x17f/0x380 [ 203.273230] ? preempt_count_add+0x68/0xa0 [ 203.273236] ? _raw_spin_lock_irqsave+0x1f/0x30 [ 203.273241] ? kmem_cache_alloc_trace+0x4d/0x440 [ 203.273244] rtnl_newlink+0x43/0x60 [ 203.273245] rtnetlink_rcv_msg+0x13a/0x380 [ 203.273248] ? rtnl_calcit.isra.40+0x130/0x130 [ 203.273250] netlink_rcv_skb+0x4e/0x100 [ 203.273256] netlink_unicast+0x1a2/0x280 [ 203.273258] netlink_sendmsg+0x242/0x490 [ 203.273260] sock_sendmsg+0x58/0x60 [ 203.273263] ____sys_sendmsg+0x1ef/0x260 [ 203.273265] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273268] ? ____sys_recvmsg+0xe6/0x170 [ 203.273270] ___sys_sendmsg+0x7c/0xc0 [ 203.273272] ? copy_msghdr_from_user+0x5c/0x90 [ 203.273274] ? ___sys_recvmsg+0x89/0xc0 [ 203.273276] ? __netlink_sendskb+0x50/0x50 [ 203.273278] ? mod_objcg_state+0xee/0x310 [ 203.273282] ? __dentry_kill+0x114/0x170 [ 203.273286] ? get_max_files+0x10/0x10 [ 203.273288] __sys_sendmsg+0x57/0xa0 [ 203.273290] do_syscall_64+0x37/0x80 [ 203.273295] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 203.273296] RIP: 0033:0x7f8edf96e278 [ 203.273298] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 63 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 [ 203.273300] RSP: 002b:00007ffcb8bdac08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 203.273303] RAX: ffffffffffffffda RBX: 000000006115e0ae RCX: 00007f8edf96e278 [ 203.273304] RDX: 0000000000000000 RSI: 00007ffcb8bdac70 RDI: 0000000000000003 [ 203.273305] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffcb8bda5b0 [ 203.273306] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 203.273306] R13: 0000555e10092020 R14: 0000000000000000 R15: 0000000000000005 Fix this by only locking when changing the netdev->dev_addr. Also, make sure to restore the old netdev->dev_addr on any failures. Fixes: 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync list") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * ice: restart periodic outputs around time changesJacob Keller2021-08-271-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we enabled auxiliary input/output support for the E810 device, we forgot to add logic to restart the output when we change time. This is important as the periodic output will be incorrect after a time change otherwise. This unfortunately includes the adjust time function, even though it uses an atomic hardware interface. The atomic adjustment can still cause the pin output to stall permanently, so we need to stop and restart it. Introduce wrapper functions to temporarily disable and then re-enable the clock outputs. Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha D Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * ice: add lock around Tx timestamp tracker flushJacob Keller2021-08-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver didn't take the lock while flushing the Tx tracker, which could cause a race where one thread is trying to read timestamps out while another thread is trying to read the tracker to check the timestamps. Avoid this by ensuring that flushing is locked against read accesses. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * ice: remove dead code for allocating pin_configJacob Keller2021-08-271-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have code in the ice driver which allocates the pin_config structure if n_pins is > 0, but we never set n_pins to be greater than zero. There's no reason to keep this code until we actually have pin_config support. Remove this. We can re-add it properly when we implement support for pin_config for E810-T devices. Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * ice: fix Tx queue iteration for Tx timestamp enablementJacob Keller2021-08-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver accidentally copied the ice_for_each_rxq iterator when implementing enablement of the ptp_tx bit for the Tx rings. We still load the Tx rings and set the ptp_tx field, but we iterate over the count of the num_rxq. If the number of Tx and Rx queues differ, this could either cause a buffer overrun when accessing the tx_rings list if num_txq is greater than num_rxq, or it could cause us to fail to enable Tx timestamps for some rings. This was not noticed originally as we generally have the same number of Tx and Rx queues. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | net: phy: marvell10g: fix broken PHY interrupts for anyone after us in the ↵Vladimir Oltean2021-08-271-0/+8
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | driver probe list Enabling interrupts via device tree for the internal PHYs on the mv88e6390 DSA switch does not work. The driver insists to use poll mode. Stage one debugging shows that the fwnode_mdiobus_phy_device_register function calls fwnode_irq_get properly, and phy->irq is set to a valid interrupt line initially. But it is then cleared. Stage two debugging shows that it is cleared here: phy_probe: /* Disable the interrupt if the PHY doesn't support it * but the interrupt is still a valid one */ if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev)) phydev->irq = PHY_POLL; Okay, so does the "Marvell 88E6390 Family" PHY driver not have the .config_intr and .handle_interrupt function pointers? Yes it does. Stage three debugging shows that the PHY device does not attempt a probe against the "Marvell 88E6390 Family" driver, but against the "mv88x3310" driver. Okay, so why does the "mv88x3310" driver match on a mv88x6390 internal PHY? The PHY IDs (MARVELL_PHY_ID_88E6390_FAMILY vs MARVELL_PHY_ID_88X3310) are way different. Stage four debugging has us looking through: phy_device_register -> device_add -> bus_probe_device -> device_initial_probe -> __device_attach -> bus_for_each_drv -> driver_match_device -> drv->bus->match -> phy_bus_match Okay, so as we said, the MII_PHYSID1 of mv88e6390 does not match the mv88x3310 driver's PHY mask & ID, so why would phy_bus_match return... Ahh, phy_bus_match calls a shortcircuit method, phydrv->match_phy_device, and does not even bother to compare the PHY ID if that is implemented. So of course, we go inside the marvell10g.c driver and sure enough, it implements .match_phy_device and does not bother to check the PHY ID. What's interesting though is that at the end of the device_add() from phy_device_register(), the driver for the internal PHYs _is_ the proper "Marvell 88E6390 Family". This is because "mv88x3310" ends up failing to probe after all, and __device_attach_driver(), to quote: /* * Ignore errors returned by ->probe so that the next driver can try * its luck. */ The next (and only other) driver that matches is the 6390 driver. For this one, phy_probe doesn't fail, and everything expects to work as normal, EXCEPT phydev->irq has already been cleared by the previous unsuccessful probe of a driver which did not implement PHY interrupts, and therefore cleared that IRQ. Okay, so it is not just Marvell 6390 that has PHY interrupts broken. Stuff like Atheros, Aquantia, Broadcom, Qualcomm work because they are lexicographically before Marvell, and stuff like NXP, Realtek, Vitesse are broken. This goes to show how fragile it is to reset phydev->irq = PHY_POLL from the actual beginning of phy_probe itself. That seems like an actual bug of its own too, since phy_probe has side effects which are not restored on probe failure, but the line of thought probably was, the same driver will attempt probe again, so it doesn't matter. Well, looks like it does. Maybe it would make more sense to move the phydev->irq clearing after the actual device_add() in phy_device_register() completes, and the bound driver is the actual final one. (also, a bit frightening that drivers are permitted to bypass the MDIO bus matching in such a trivial way and perform PHY reads and writes from the .match_phy_device method, on devices that do not even belong to them. In the general case it might not be guaranteed that the MDIO accesses one driver needs to make to figure out whether to match on a device is safe for all other PHY devices) Fixes: a5de4be0aaaa ("net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Marek Behún <kabel@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210827132541.28953-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Merge tag 'mlx5-fixes-2021-08-26' of ↵David S. Miller2021-08-278-4/+34
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-08-26 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net/mlx5: DR, fix a potential use-after-free bugWentao_Liang2021-08-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In line 849 (#1), "mlx5dr_htbl_put(cur_htbl);" drops the reference to cur_htbl and may cause cur_htbl to be freed. However, cur_htbl is subsequently used in the next line, which may result in an use-after-free bug. Fix this by calling mlx5dr_err() before the cur_htbl is put. Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * net/mlx5e: Use correct eswitch for stack devices with lagDmytro Linkin2021-08-261-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If link aggregation is used within stack devices driver rejects encap rules if PF of the VF tunnel device is down. This happens because route resolved for other PF and its eswitch instance is used to determine correct vport. To fix that use devcom feature to retrieve other eswitch instance if failed to find vport for the 1st eswitch and LAG is active. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * net/mlx5: E-Switch, Set vhca id valid flag when creating indir fwd groupMaor Dickman2021-08-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When indirect forward group is created, flow is added with vhca id but without setting vhca id valid flag which violates the PRM. Fix by setting the missing flag, vhca id valid. Fixes: 34ca65352ddf ("net/mlx5: E-Switch, Indirect table infrastructure") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * net/mlx5e: Fix possible use-after-free deleting fdb ruleRoi Dayan2021-08-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After neigh-update-add failure we are still with a slow path rule but the driver always assume the rule is an fdb rule. Fix neigh-update-del by checking slow path tc flag on the flow. Also fix neigh-update-add for when neigh-update-del fails the same. Fixes: 5dbe906ff1d5 ("net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * net/mlx5: Remove all auxiliary devices at the unregister eventLeon Romanovsky2021-08-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The call to mlx5_unregister_device() means that mlx5_core driver is removed. In such scenario, we need to disregard all other flags like attach/detach and forcibly remove all auxiliary devices. Fixes: a5ae8fc9058e ("net/mlx5e: Don't create devices during unload flow") Tested-and-Reported-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * net/mlx5: Lag, fix multipath lag activationDima Chumak2021-08-263-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling FIB_EVENT_ENTRY_REPLACE event for a new multipath route, lag activation can be missed if a stale (struct lag_mp)->mfi pointer exists, which was associated with an older multipath route that had been removed. Normally, when a route is removed, it triggers mlx5_lag_fib_event(), which handles FIB_EVENT_ENTRY_DEL and clears mfi pointer. But, if mlx5_lag_check_prereq() condition isn't met, for example when eswitch is in legacy mode, the fib event is skipped and mfi pointer becomes stale. Fix by resetting mfi pointer to NULL in mlx5_deactivate_lag(). Fixes: 8a66e4585979 ("net/mlx5: Change ownership model for lag") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net: don't unconditionally copy_from_user a struct ifreq for socket ioctlsPeter Collingbourne2021-08-272-1/+9
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A common implementation of isatty(3) involves calling a ioctl passing a dummy struct argument and checking whether the syscall failed -- bionic and glibc use TCGETS (passing a struct termios), and musl uses TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will copy sizeof(struct ifreq) bytes of data from the argument and return -EFAULT if that fails. The result is that the isatty implementations may return a non-POSIX-compliant value in errno in the case where part of the dummy struct argument is inaccessible, as both struct termios and struct winsize are smaller than struct ifreq (at least on arm64). Although there is usually enough stack space following the argument on the stack that this did not present a practical problem up to now, with MTE stack instrumentation it's more likely for the copy to fail, as the memory following the struct may have a different tag. Fix the problem by adding an early check for whether the ioctl is a valid socket ioctl, and return -ENOTTY if it isn't. Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers") Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d Signed-off-by: Peter Collingbourne <pcc@google.com> Cc: <stable@vger.kernel.org> # 4.19 Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Add depends on OF_NET for LiteX's LiteETHSlark Xiao2021-08-311-0/+1
| | | | | | | | | | | | | | | | | | | | Current settings may produce a build error when CONFIG_OF_NET is disabled. The CONFIG_OF_NET controls a headfile <linux/of.h> and some functions in <linux/of_net.h>. Signed-off-by: Slark Xiao <slark_xiao@163.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | ipv6: seg6: remove duplicated includeLv Ruyi2021-08-311-1/+0
| | | | | | | | | | | | | | | | Remove all but the first include of net/lwtunnel.h from 'seg6_local.c. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: hns3: remove unnecessary spacesHao Chen2021-08-312-2/+2
| | | | | | | | | | | | | | | | This patch removes some unnecessary spaces for cleanup. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: hns3: add some required spacesHao Chen2021-08-315-25/+25
| | | | | | | | | | | | | | | | Add some required spaces to improve readability. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: hns3: clean up a type mismatch warningGuojia Liao2021-08-311-1/+8
| | | | | | | | | | | | | | | | | | | | | | abs() returns signed long, which could not convert the type as unsigned, and it may cause a mismatch type warning from static tools. To fix it, this patch uses an variable to save the abs()'s result and does a explicit conversion. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: hns3: refine function hns3_set_default_feature()Jian Shen2021-08-311-46/+16
| | | | | | | | | | | | | | | | | | | | | | | | Currently, the driver sets default feature for netdev->features, netdev->hw_features, netdev->vlan_features and netdev->hw_enc_features separately. It's fussy, because most of the feature bits are same. So refine it by copy value from netdev->features. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: remove duplicated 'net/lwtunnel.h' includeLv Ruyi2021-08-311-1/+0
| | | | | | | | | | | | | | | | Remove all but the first include of net/lwtunnel.h from seg6_iptunnel.c. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: w5100: check return value after calling platform_get_resource()Yang Yingliang2021-08-311-0/+2
| | | | | | | | | | | | | | | | It will cause null-ptr-deref if platform_get_resource() returns NULL, we need check the return value. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()Cai Huoqing2021-08-314-36/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Use the devm_platform_ioremap_resource_byname() helper instead of calling platform_get_resource_byname() and devm_ioremap_resource() separately Use the devm_platform_ioremap_resource() helper instead of calling platform_get_resource() and devm_ioremap_resource() separately Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: mdio: mscc-miim: Make use of the helper function ↵Cai Huoqing2021-08-311-8/+4
| | | | | | | | | | | | | | | | | | | | | | devm_platform_ioremap_resource() Use the devm_platform_ioremap_resource() helper instead of calling platform_get_resource() and devm_ioremap_resource() separately Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()Cai Huoqing2021-08-311-4/+1
| | | | | | | | | | | | | | | | | | Use the devm_platform_ioremap_resource() helper instead of calling platform_get_resource() and devm_ioremap_resource() separately Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | fou: remove sparse errorsEric Dumazet2021-08-312-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to add __rcu qualifier to avoid these errors: net/ipv4/fou.c:250:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:250:18: expected struct net_offload const **offloads net/ipv4/fou.c:250:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:251:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:251:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:251:15: struct net_offload const * net/ipv4/fou.c:272:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:272:18: expected struct net_offload const **offloads net/ipv4/fou.c:272:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:273:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:273:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:273:15: struct net_offload const * net/ipv4/fou.c:442:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:442:18: expected struct net_offload const **offloads net/ipv4/fou.c:442:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:443:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:443:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:443:15: struct net_offload const * net/ipv4/fou.c:489:18: warning: incorrect type in assignment (different address spaces) net/ipv4/fou.c:489:18: expected struct net_offload const **offloads net/ipv4/fou.c:489:18: got struct net_offload const [noderef] __rcu ** net/ipv4/fou.c:490:15: error: incompatible types in comparison expression (different address spaces): net/ipv4/fou.c:490:15: struct net_offload const [noderef] __rcu * net/ipv4/fou.c:490:15: struct net_offload const * net/ipv4/udp_offload.c:170:26: warning: incorrect type in assignment (different address spaces) net/ipv4/udp_offload.c:170:26: expected struct net_offload const **offloads net/ipv4/udp_offload.c:170:26: got struct net_offload const [noderef] __rcu ** net/ipv4/udp_offload.c:171:23: error: incompatible types in comparison expression (different address spaces): net/ipv4/udp_offload.c:171:23: struct net_offload const [noderef] __rcu * net/ipv4/udp_offload.c:171:23: struct net_offload const * Fixes: efc98d08e1ec ("fou: eliminate IPv4,v6 specific GRO functions") Fixes: 8bce6d7d0d1e ("udp: Generalize skb_udp_segment") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: fix endianness issue in inet_rtm_getroute_build_skb()Eric Dumazet2021-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UDP length field should be in network order. This removes the following sparse error: net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types) net/ipv4/route.c:3173:27: expected restricted __be16 [usertype] len net/ipv4/route.c:3173:27: got unsigned long Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'octeon-npc-fixes'David S. Miller2021-08-311-10/+12
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subbaraya Sundeep says: ==================== octeontx2-af: Miscellaneous fixes in npc This patchset consists of consolidated fixes in rvu_npc file. Two of the patches prevent infinite loop which can happen in corner cases. One patch is for fixing static code analyzer reported issues. And the last patch is a minor change in npc protocol checker hardware block to report ipv4 checksum errors as a known error codes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-af: Set proper errorcode for IPv4 checksum errorsSunil Goutham2021-08-311-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With current config, for packets with IPv4 checksum errors, errorcode is being set to UNKNOWN. Hence added a separate errorcodes for outer and inner IPv4 checksum and changed NPC configuration accordingly. Also turn on L2 multicast address check in NPC protocol check block. Fixes: 6b3321bacc5a ("octeontx2-af: Enable packet length and csum validation") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-af: Fix static code analyzer reported issuesSubbaraya Sundeep2021-08-311-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the static code analyzer reported issues in rvu_npc.c. The reported errors are different sizes of operands in bitops and returning uninitialized values. Fixes: 651cd2652339 ("octeontx2-af: MCAM entry installation support") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfgSubbaraya Sundeep2021-08-311-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In npc_update_vf_flow_entry function the loop cursor 'index' is being changed inside the loop causing the loop to spin forever. This in turn hogs the kworker thread forever and no other mbox message is processed by AF driver after that. Fix this by using another variable in the loop. Fixes: 55307fcb9258 ("octeontx2-af: Add mbox messages to install and delete MCAM rules") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-af: Fix loop in free and unmap counterSubbaraya Sundeep2021-08-311-1/+2
|/ / | | | | | | | | | | | | | | | | | | | | | | When the given counter does not belong to the entry then code ends up in infinite loop because the loop cursor, entry is not getting updated further. This patch fixes that by updating entry for every iteration. Fixes: a958dd59f9ce ("octeontx2-af: Map or unmap NPC MCAM entry and counter") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | af_unix: fix potential NULL deref in unix_dgram_connect()Eric Dumazet2021-08-311-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot was able to trigger NULL deref in unix_dgram_connect() [1] This happens in if (unix_peer(sk)) sk->sk_state = other->sk_state = TCP_ESTABLISHED; // crash because @other is NULL Because locks have been dropped, unix_peer() might be non NULL, while @other is NULL (AF_UNSPEC case) We need to move code around, so that we no longer access unix_peer() and sk_state while locks have been released. [1] general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 0 PID: 10341 Comm: syz-executor239 Not tainted 5.14.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004787d0 CR3: 0000000029c0a000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __sys_connect_file+0x155/0x1a0 net/socket.c:1890 __sys_connect+0x161/0x190 net/socket.c:1907 __do_sys_connect net/socket.c:1917 [inline] __se_sys_connect net/socket.c:1914 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1914 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x446a89 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3eb0052208 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00000000004cc4d8 RCX: 0000000000446a89 RDX: 000000000000006e RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00000000004cc4d0 R08: 00007f3eb0052700 R09: 0000000000000000 R10: 00007f3eb0052700 R11: 0000000000000246 R12: 00000000004cc4dc R13: 00007ffd791e79cf R14: 00007f3eb0052300 R15: 0000000000022000 Modules linked in: ---[ end trace 4eb809357514968c ]--- RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd791fe960 CR3: 0000000029c0a000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Alexei Starovoitov <ast@kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | dpaa2-eth: Replace strlcpy with strscpyJason Wang2021-08-311-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The strlcpy should not be used because it doesn't limit the source length. As linus says, it's a completely useless function if you can't implicitly trust the source string - but that is almost always why people think they should use it! All in all the BSD function will lead some potential bugs. But the strscpy doesn't require reading memory from the src string beyond the specified "count" bytes, and since the return value is easier to error-check than strlcpy()'s. In addition, the implementation is robust to the string changing out from underneath it, unlike the current strlcpy() implementation. Thus, We prefer using strscpy instead of strlcpy. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | octeontx2-af: Use NDC TX for transmit packet dataGeetha sowjanya2021-08-312-0/+4
| | | | | | | | | | | | | | | | | | For better performance set hardware to use NDC TX for reading packet data specified NIX_SEND_SG_S. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bridge: use mld2r_ngrec instead of icmpv6_dataunMichelleJin2021-08-311-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | br_ip6_multicast_mld2_report function uses icmp6h to parse mld2_report packet. mld2r_ngrec defines mld2r_hdr.icmp6_dataun.un_data16[1] in include/net/mld.h. So, it is more compact to use mld2r rather than icmp6h. By doing printk test, it is confirmed that icmp6h->icmp6_dataun.un_data16[1] and mld2r->mld2r_ngrec are indeed equivalent. Also, sizeof(*mld2r) and sizeof(*icmp6h) are equivalent, too. Signed-off-by: MichelleJin <shjy180909@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: qualcomm: fix QCA7000 checksum handlingStefan Wahren2021-08-312-2/+2
| | | | | | | | | | | | | | | | | | | | | | Based on tests the QCA7000 doesn't support checksum offloading. So assume ip_summed is CHECKSUM_NONE and let the kernel take care of the checksum handling. This fixes data transfer issues in noisy environments. Reported-by: Michael Heimpold <michael.heimpold@in-tech.com> Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: pasemi: Remove usage of the deprecated "pci-dma-compat.h" APIChristophe JAILLET2021-08-301-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In [1], Christoph Hellwig has proposed to remove the wrappers in include/linux/pci-dma-compat.h. Some reasons why this API should be removed have been given by Julia Lawall in [2]. A coccinelle script has been used to perform the needed transformation Only relevant parts are given below. An 'unlikely()' has been removed when calling 'dma_mapping_error()' because this function, which is inlined, already has such an annotation. @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) [1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/ [2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/bc6cd281eae024b26fd9c7ef6678d2d1dc9d74fd.1630150008.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failedXiyu Yang2021-08-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reference counting issue happens in one exception handling path of cbq_change_class(). When failing to get tcf_block, the function forgets to decrease the refcount of "rtab" increased by qdisc_put_rtab(), causing a refcount leak. Fix this issue by jumping to "failure" label when get tcf_block failed. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2021-08-30126-4027/+6813
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== bpf-next 2021-08-31 We've added 116 non-merge commits during the last 17 day(s) which contain a total of 126 files changed, 6813 insertions(+), 4027 deletions(-). The main changes are: 1) Add opaque bpf_cookie to perf link which the program can read out again, to be used in libbpf-based USDT library, from Andrii Nakryiko. 2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu. 3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang. 4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch to another congestion control algorithm during init, from Martin KaFai Lau. 5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima. 6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta. 7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT} progs, from Xu Liu and Stanislav Fomichev. 8) Support for __weak typed ksyms in libbpf, from Hao Luo. 9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky. 10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov. 11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann. 12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian, Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others. 13) Another big batch to revamp XDP samples in order to give them consistent look and feel, from Kumar Kartikeya Dwivedi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits) MAINTAINERS: Remove self from powerpc BPF JIT selftests/bpf: Fix potential unreleased lock samples: bpf: Fix uninitialized variable in xdp_redirect_cpu selftests/bpf: Reduce more flakyness in sockmap_listen bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS bpf: selftests: Add dctcp fallback test bpf: selftests: Add connect_to_fd_opts to network_helpers bpf: selftests: Add sk_state to bpf_tcp_helpers.h bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt selftests: xsk: Preface options with opt selftests: xsk: Make enums lower case selftests: xsk: Generate packets from specification selftests: xsk: Generate packet directly in umem selftests: xsk: Simplify cleanup of ifobjects selftests: xsk: Decrease sending speed selftests: xsk: Validate tx stats on tx thread selftests: xsk: Simplify packet validation in xsk tests selftests: xsk: Rename worker_* functions that are not thread entry points selftests: xsk: Disassociate umem size with packets sent selftests: xsk: Remove end-of-test packet ... ==================== Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | MAINTAINERS: Remove self from powerpc BPF JITSandipan Das2021-08-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | Stepping down as I haven't had a chance to look into the powerpc BPF JIT compilers for a while. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210827111905.396145-1-sandipan@linux.ibm.com
| * | selftests/bpf: Fix potential unreleased lockChengfeng Ye2021-08-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | This lock is not released if the program return at the patched branch. Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210827074140.118671-1-cyeaa@connect.ust.hk
| * | samples: bpf: Fix uninitialized variable in xdp_redirect_cpuKumar Kartikeya Dwivedi2021-08-261-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While at it, also improve help output when CPU number is greater than possible. Fixes: e531a220cc59 ("samples: bpf: Convert xdp_redirect_cpu to XDP samples helper") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210826120910.454081-1-memxor@gmail.com
| * | selftests/bpf: Reduce more flakyness in sockmap_listenYucong Sun2021-08-261-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds similar retry logic to more places where read() is used, to reduce flakyness in slow CI environment. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210825184745.2680830-1-fallentree@fb.com
| * | bpf: Fix bpf-next builds without CONFIG_BPF_EVENTSDaniel Xu2021-08-253-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit fixes linker errors along the lines of: s390-linux-ld: task_iter.c:(.init.text+0xa4): undefined reference to `btf_task_struct_ids'` Fix by defining btf_task_struct_ids unconditionally in kernel/bpf/btf.c since there exists code that unconditionally uses btf_task_struct_ids. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/05d94748d9f4b3eecedc4fddd6875418a396e23c.1629942444.git.dxu@dxuuu.xyz
| * | Merge branch 'bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt'Alexei Starovoitov2021-08-2511-32/+230
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Martin KaFai says: ==================== This set allows the bpf-tcp-cc to call bpf_setsockopt. One use case is to allow a bpf-tcp-cc switching to another cc during init(). For example, when the tcp flow is not ecn ready, the bpf_dctcp can switch to another cc by calling setsockopt(TCP_CONGESTION). bpf_getsockopt() is also added to have a symmetrical API, so less usage surprise. v2: - Not allow switching to kernel's tcp_cdg because it is the only kernel tcp-cc that stores a pointer to icsk_ca_priv. Please see the commit log in patch 1 for details. Test is added in patch 4 to check switching to tcp_cdg. - Refactor the logic finding the offset of a func ptr in the "struct tcp_congestion_ops" to prog_ops_moff() in patch 1. - bpf_setsockopt() has been disabled in release() since v1 (please see commit log in patch 1 for reason). bpf_getsockopt() is also disabled together in release() in v2 to avoid usage surprise because both of them are usually expected to be available together. bpf-tcp-cc can already use PTR_TO_BTF_ID to read from tcp_sock. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | bpf: selftests: Add dctcp fallback testMartin KaFai Lau2021-08-253-23/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the bpf_dctcp test to fallback to cubic by using setsockopt(TCP_CONGESTION) when the tcp flow is not ecn ready. It also checks setsockopt() is not available to release(). The settimeo() from the network_helpers.h is used, so the local one is removed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210824173026.3979130-1-kafai@fb.com