summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ip6_tunnel: Fix the type of functionsHongbin Wang2022-08-131-11/+8
| | | | | | | | Functions ip6_tnl_change, ip6_tnl_update and ip6_tnl0_update do always return 0, change the type of functions to void. Signed-off-by: Hongbin Wang <wh_bin@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: mv88e6060: prevent crash on an unused portSergei Antonov2022-08-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the port isn't a CPU port nor a user port, 'cpu_dp' is a null pointer and a crash happened on dereferencing it in mv88e6060_setup_port(): [ 9.575872] Unable to handle kernel NULL pointer dereference at virtual address 00000014 ... [ 9.942216] mv88e6060_setup from dsa_register_switch+0x814/0xe84 [ 9.948616] dsa_register_switch from mdio_probe+0x2c/0x54 [ 9.954433] mdio_probe from really_probe.part.0+0x98/0x2a0 [ 9.960375] really_probe.part.0 from driver_probe_device+0x30/0x10c [ 9.967029] driver_probe_device from __device_attach_driver+0xb8/0x13c [ 9.973946] __device_attach_driver from bus_for_each_drv+0x90/0xe0 [ 9.980509] bus_for_each_drv from __device_attach+0x110/0x184 [ 9.986632] __device_attach from bus_probe_device+0x8c/0x94 [ 9.992577] bus_probe_device from deferred_probe_work_func+0x78/0xa8 [ 9.999311] deferred_probe_work_func from process_one_work+0x290/0x73c [ 10.006292] process_one_work from worker_thread+0x30/0x4b8 [ 10.012155] worker_thread from kthread+0xd4/0x10c [ 10.017238] kthread from ret_from_fork+0x14/0x3c Fixes: 0abfd494deef ("net: dsa: use dedicated CPU port") CC: Vivien Didelot <vivien.didelot@savoirfairelinux.com> CC: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220811070939.1717146-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* fec: Fix timer capture timing in `fec_ptp_enable_pps()`Csókás Bence2022-08-121-5/+1
| | | | | | | | | | Code reimplements functionality already in `fec_ptp_read()`, but misses check for FEC_QUIRK_BUG_CAPTURE. Replace with function call. Fixes: 28b5f058cf1d ("net: fec: ptp: fix convergence issue to support LinuxPTP stack") Signed-off-by: Csókás Bence <csokas.bence@prolan.hu> Link: https://lore.kernel.org/r/20220811101348.13755-1-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: lan966x: fix checking for return value of platform_get_irq_byname()Li Qiong2022-08-121-4/+4
| | | | | | | | | The platform_get_irq_byname() returns non-zero IRQ number or negative error number. "if (irq)" always true, chang it to "if (irq > 0)" Signed-off-by: Li Qiong <liqiong@nfschina.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cxgb3: Fix comment typoJason Wang2022-08-121-1/+1
| | | | | | | The double `the' is duplicated in the comment, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bnx2x: Fix comment typoJason Wang2022-08-121-1/+1
| | | | | | | The double `the' is duplicated in the comment, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipa: Fix comment typoJason Wang2022-08-121-1/+1
| | | | | | | | The double `is' is duplicated in the comment, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* virtio_net: fix endian-ness for RSSMichael S. Tsirkin2022-08-121-2/+2
| | | | | | | | | | Using native endian-ness for device supplied fields is wrong on BE platforms. Sparse warns about this. Fixes: 91f41f01d219 ("drivers/net/virtio_net: Added RSS hash report.") Cc: "Andrew Melnychenko" <andrew@daynix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sunrpc: fix potential memory leaks in rpc_sysfs_xprt_state_change()Xin Xiong2022-08-121-2/+4
| | | | | | | | | | | | | | | | | The issue happens on some error handling paths. When the function fails to grab the object `xprt`, it simply returns 0, forgetting to decrease the reference count of another object `xps`, which is increased by rpc_sysfs_xprt_kobj_get_xprt_switch(), causing refcount leaks. Also, the function forgets to check whether `xps` is valid before using it, which may result in NULL-dereferencing issues. Fix it by adding proper error handling code when either `xprt` or `xps` is NULL. Fixes: 5b7eb78486cd ("SUNRPC: take a xprt offline using sysfs") Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* skfp/h: fix repeated words in commentsJilin Yuan2022-08-121-1/+1
| | | | | | | Delete the redundant word 'the'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* rds: add missing barrier to release_refillMikulas Patocka2022-08-121-0/+1
| | | | | | | | | | | | The functions clear_bit and set_bit do not imply a memory barrier, thus it may be possible that the waitqueue_active function (which does not take any locks) is moved before clear_bit and it could miss a wakeup event. Fix this bug by adding a memory barrier after clear_bit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'net-6.0-rc1' of ↵Linus Torvalds2022-08-11116-646/+1865
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf, can and netfilter. A little larger than usual but it's all fixes, no late features. It's large partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning" * tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits) net: atm: bring back zatm uAPI dpaa2-eth: trace the allocated address instead of page struct net: add missing kdoc for struct genl_multicast_group::flags nfp: fix use-after-free in area_cache_get() MAINTAINERS: use my korg address for mt7601u mlxsw: minimal: Fix deadlock in ports creation bonding: fix reference count leak in balance-alb mode net: usb: qmi_wwan: Add support for Cinterion MV32 bpf: Shut up kern_sys_bpf warning. net/tls: Use RCU API to access tls_ctx->netdev tls: rx: device: don't try to copy too much on detach tls: rx: device: bound the frag walk net_sched: cls_route: remove from list when handle is 0 selftests: forwarding: Fix failing tests with old libnet net: refactor bpf_sk_reuseport_detach() net: fix refcount bug in sk_psock_get (2) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator ...
| * net: atm: bring back zatm uAPIJakub Kicinski2022-08-111-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | Jiri reports that linux-atm does not build without this header. Bring it back. It's completely dead code but we can't break the build for user space :( Reported-by: Jiri Slaby <jirislaby@kernel.org> Fixes: 052e1f01bfae ("net: atm: remove support for ZeitNet ZN122x ATM devices") Link: https://lore.kernel.org/all/8576aef3-37e4-8bae-bab5-08f82a78efd3@kernel.org/ Link: https://lore.kernel.org/r/20220810164547.484378-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * dpaa2-eth: trace the allocated address instead of page structChen Lin2022-08-111-2/+2
| | | | | | | | | | | | | | | | | | | | We should trace the allocated address instead of page struct. Fixes: 27c874867c4e ("dpaa2-eth: Use a single page per Rx buffer") Signed-off-by: Chen Lin <chen.lin5@zte.com.cn> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20220811151651.3327-1-chen45464546@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: add missing kdoc for struct genl_multicast_group::flagsJakub Kicinski2022-08-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | Multicast group flags were added in commit 4d54cc32112d ("mptcp: avoid lock_fast usage in accept path"), but it missed adding the kdoc. Mention which flags go into that field, and do the same for op structs. Link: https://lore.kernel.org/r/20220809232012.403730-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * nfp: fix use-after-free in area_cache_get()Jialiang Wang2022-08-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | area_cache_get() is used to distribute cache->area and set cache->id, and if cache->id is not 0 and cache->area->kref refcount is 0, it will release the cache->area by nfp_cpp_area_release(). area_cache_get() set cache->id before cpp->op->area_init() and nfp_cpp_area_acquire(). But if area_init() or nfp_cpp_area_acquire() fails, the cache->id is is already set but the refcount is not increased as expected. At this time, calling the nfp_cpp_area_release() will cause use-after-free. To avoid the use-after-free, set cache->id after area_init() and nfp_cpp_area_acquire() complete successfully. Note: This vulnerability is triggerable by providing emulated device equipped with specified configuration. BUG: KASAN: use-after-free in nfp6000_area_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760) Write of size 4 at addr ffff888005b7f4a0 by task swapper/0/1 Call Trace: <TASK> nfp6000_area_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760) area_cache_get.constprop.8 (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:884) Allocated by task 1: nfp_cpp_area_alloc_with_name (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:303) nfp_cpp_area_cache_add (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:802) nfp6000_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:1230) nfp_cpp_from_operations (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:1215) nfp_pci_probe (drivers/net/ethernet/netronome/nfp/nfp_main.c:744) Freed by task 1: kfree (mm/slub.c:4562) area_cache_get.constprop.8 (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:873) nfp_cpp_read (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:924 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:973) nfp_cpp_readl (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c:48) Signed-off-by: Jialiang Wang <wangjialiang0806@163.com> Reviewed-by: Yinjun Zhang <yinjun.zhang@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220810073057.4032-1-wangjialiang0806@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * MAINTAINERS: use my korg address for mt7601uJakub Kicinski2022-08-111-1/+1
| | | | | | | | | | | | | | Change my address for mt7601u to the main one. Link: https://lore.kernel.org/r/20220809233843.408004-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * mlxsw: minimal: Fix deadlock in ports creationVadim Pasternak2022-08-111-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Drop devl_lock() / devl_unlock() from ports creation and removal flows since the devlink instance lock is now taken by mlxsw_core. Fixes: 72a4c8c94efa ("mlxsw: convert driver to use unlocked devlink API during init/fini") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/f4afce5ab0318617f3866b85274be52542d59b32.1660211614.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * bonding: fix reference count leak in balance-alb modeJay Vosburgh2022-08-111-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge") introduced a reference count leak by not releasing the reference acquired by ip_dev_find(). Remedy this by insuring the reference is released. Fixes: d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/26758.1660194413@famine Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: usb: qmi_wwan: Add support for Cinterion MV32Slark Xiao2022-08-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 2 models for MV32 serials. MV32-W-A is designed based on Qualcomm SDX62 chip, and MV32-W-B is designed based on Qualcomm SDX65 chip. So we use 2 different PID to separate it. Test evidence as below: T: Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1e2d ProdID=00f3 Rev=05.04 S: Manufacturer=Cinterion S: Product=Cinterion PID 0x00F3 USB Mobile Broadband S: SerialNumber=d7b4be8d C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option T: Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 10 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1e2d ProdID=00f4 Rev=05.04 S: Manufacturer=Cinterion S: Product=Cinterion PID 0x00F4 USB Mobile Broadband S: SerialNumber=d095087d C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option Signed-off-by: Slark Xiao <slark_xiao@163.com> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20220810014521.9383-1-slark_xiao@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2022-08-111-0/+8
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull in bpf to silence a false positive warning. * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Shut up kern_sys_bpf warning. Link: https://lore.kernel.org/r/CAADnVQK589CZN1Q9w8huJqkEyEed+ZMTWqcpA1Rm2CjN3a4XoQ@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * bpf: Shut up kern_sys_bpf warning.Alexei Starovoitov2022-08-101-0/+8
| |/ | | | | | | | | | | | | | | | | Shut up this warning: kernel/bpf/syscall.c:5089:5: warning: no previous prototype for function 'kern_sys_bpf' [-Wmissing-prototypes] int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * net/tls: Use RCU API to access tls_ctx->netdevMaxim Mikityanskiy2022-08-106-15/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload") Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * tls: rx: device: don't try to copy too much on detachJakub Kicinski2022-08-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Another device offload bug, we use the length of the output skb as an indication of how much data to copy. But that skb is sized to offset + record length, and we start from offset. So we end up double-counting the offset which leads to skb_copy_bits() returning -EFAULT. Reported-by: Tariq Toukan <tariqt@nvidia.com> Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Ran Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * tls: rx: device: bound the frag walkJakub Kicinski2022-08-101-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't do skb_walk_frags() on the input skbs, because the input skbs is really just a pointer to the tcp read queue. We need to bound the "is decrypted" check by the amount of data in the message. Note that the walk in tls_device_reencrypt() is after a CoW so the skb there is safe to walk. Actually in the current implementation it can't have frags at all, but whatever, maybe one day it will. Reported-by: Tariq Toukan <tariqt@nvidia.com> Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Ran Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net_sched: cls_route: remove from list when handle is 0Thadeu Lima de Souza Cascardo2022-08-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a route filter is replaced and the old filter has a 0 handle, the old one won't be removed from the hashtable, while it will still be freed. The test was there since before commit 1109c00547fc ("net: sched: RCU cls_route"), when a new filter was not allocated when there was an old one. The old filter was reused and the reinserting would only be necessary if an old filter was replaced. That was still wrong for the same case where the old handle was 0. Remove the old filter from the list independently from its handle value. This fixes CVE-2022-2588, also reported as ZDI-CAN-17440. Reported-by: Zhenpeng Lin <zplin@u.northwestern.edu> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Kamal Mostafa <kamal@canonical.com> Cc: <stable@vger.kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * selftests: forwarding: Fix failing tests with old libnetIdo Schimmel2022-08-103-24/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The custom multipath hash tests use mausezahn in order to test how changes in various packet fields affect the packet distribution across the available nexthops. The tool uses the libnet library for various low-level packet construction and injection. The library started using the "SO_BINDTODEVICE" socket option for IPv6 sockets in version 1.1.6 and for IPv4 sockets in version 1.2. When the option is not set, packets are not routed according to the table associated with the VRF master device and tests fail. Fix this by prefixing the command with "ip vrf exec", which will cause the route lookup to occur in the VRF routing table. This makes the tests pass regardless of the libnet library version. Fixes: 511e8db54036 ("selftests: forwarding: Add test for custom multipath hash") Fixes: 185b0c190bb6 ("selftests: forwarding: Add test for custom multipath hash with IPv4 GRE") Fixes: b7715acba4d3 ("selftests: forwarding: Add test for custom multipath hash with IPv6 GRE") Reported-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Link: https://lore.kernel.org/r/20220809113320.751413-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2022-08-1019-35/+424
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== bpf 2022-08-10 We've added 23 non-merge commits during the last 7 day(s) which contain a total of 19 files changed, 424 insertions(+), 35 deletions(-). The main changes are: 1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao. 2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia. 3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu. 4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev. 5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney. 6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi. 7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa. 8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai. 9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address offset buffer, from Aijun Sun. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator bpf: Check the validity of max_rdwr_access for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator bpf: Acquire map uref in .init_seq_private for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for hash map iterator bpf: Acquire map uref in .init_seq_private for array map iterator bpf: Disallow bpf programs call prog_run command. bpf, arm64: Fix bpf trampoline instruction endianness selftests/bpf: Add test for prealloc_lru_pop bug bpf: Don't reinit map value in prealloc_lru_pop bpf: Allow calling bpf_prog_test kfuncs in tracing programs bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf bpf: Use proper target btf when exporting attach_btf_obj_id mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled bpf: Cleanup ftrace hash in bpf_trampoline_put BPF: Fix potential bad pointer dereference in bpf_sys_bpf() ... ==================== Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * Merge branch 'fixes for bpf map iterator'Alexei Starovoitov2022-08-108-7/+191
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hou Tao says: ==================== From: Hou Tao <houtao1@huawei.com> Hi, The patchset constitues three fixes for bpf map iterator: (1) patch 1~4: fix user-after-free during reading map iterator fd It is possible when both the corresponding link fd and map fd are closed bfore reading the iterator fd. I had squashed these four patches into one, but it was not friendly for stable backport, so I break these fixes into four single patches in the end. Patch 7 is its testing patch. (2) patch 5: fix invalidity check for values in sk local storage map Patch 8 adds two tests for it. (3) patch 6: reject sleepable program for non-resched map iterator Patch 9 add a test for it. Please check the individual patches for more details. And comments are always welcome. Regards, Tao Changes since v2: * patch 1~6: update commit messages (from Yonghong & Martin) * patch 7: add more detailed comments (from Yonghong) * patch 8: use NULL directly instead of (void *)0 v1: https://lore.kernel.org/bpf/20220806074019.2756957-1-houtao@huaweicloud.com ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * selftests/bpf: Ensure sleepable program is rejected by hash map iterHou Tao2022-08-102-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a test to ensure sleepable program is rejected by hash map iterator. Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-10-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * selftests/bpf: Add write tests for sk local storage map iteratorHou Tao2022-08-102-4/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add test to validate the overwrite of sock local storage map value in map iterator and another one to ensure out-of-bound value writing is rejected. Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-9-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * selftests/bpf: Add tests for reading a dangling map iter fdHou Tao2022-08-101-0/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After closing both related link fd and map fd, reading the map iterator fd to ensure it is OK to do so. Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-8-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Only allow sleepable program for resched-able iteratorHou Tao2022-08-101-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a sleepable program is attached to a hash map iterator, might_fault() will report "BUG: sleeping function called from invalid context..." if CONFIG_DEBUG_ATOMIC_SLEEP is enabled. The reason is that rcu_read_lock() is held in bpf_hash_map_seq_next() and won't be released until all elements are traversed or bpf_hash_map_seq_stop() is called. Fixing it by reusing BPF_ITER_RESCHED to indicate that only non-sleepable program is allowed for iterator without BPF_ITER_RESCHED. We can revise bpf_iter_link_attach() later if there are other conditions which may cause rcu_read_lock() or spin_lock() issues. Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-7-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Check the validity of max_rdwr_access for sock local storage map iteratorHou Tao2022-08-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value of sock local storage map is writable in map iterator, so check max_rdwr_access instead of max_rdonly_access. Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Acquire map uref in .init_seq_private for sock{map,hash} iteratorHou Tao2022-08-101-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sock_map_iter_attach_target() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in sock_map_iter_detach_target() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). Fixing it by acquiring an extra map uref in .init_seq_private and releasing it in .fini_seq_private. Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Acquire map uref in .init_seq_private for sock local storage map iteratorHou Tao2022-08-101-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). So acquiring an extra map uref in bpf_iter_init_sk_storage_map() and releasing it in bpf_iter_fini_sk_storage_map(). Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Acquire map uref in .init_seq_private for hash map iteratorHou Tao2022-08-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). So acquiring an extra map uref in bpf_iter_init_hash_map() and releasing it in bpf_iter_fini_hash_map(). Fixes: d6c4503cc296 ("bpf: Implement bpf iterator for hash maps") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Acquire map uref in .init_seq_private for array map iteratorHou Tao2022-08-101-0/+6
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). Alternative fix is acquiring an extra bpf_link reference just like a pinned map iterator does, but it introduces unnecessary dependency on bpf_link instead of bpf_map. So choose another fix: acquiring an extra map uref in .init_seq_private for array map iterator. Fixes: d3cc2ab546ad ("bpf: Implement bpf iterator for array maps") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * bpf: Disallow bpf programs call prog_run command.Alexei Starovoitov2022-08-102-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN command from within the program. To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf() kernel function that can only be used by the kernel light skeleton directly. Reported-by: YiFei Zhu <zhuyifei@google.com> Fixes: b1d18a7574d0 ("bpf: Extend sys_bpf commands for bpf_syscall programs.") Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * bpf, arm64: Fix bpf trampoline instruction endiannessXu Kuohai2022-08-101-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sparse tool complains as follows: arch/arm64/net/bpf_jit_comp.c:1684:16: warning: incorrect type in assignment (different base types) arch/arm64/net/bpf_jit_comp.c:1684:16: expected unsigned int [usertype] *branch arch/arm64/net/bpf_jit_comp.c:1684:16: got restricted __le32 [usertype] * arch/arm64/net/bpf_jit_comp.c:1700:52: error: subtraction of different types can't work (different base types) arch/arm64/net/bpf_jit_comp.c:1734:29: warning: incorrect type in assignment (different base types) arch/arm64/net/bpf_jit_comp.c:1734:29: expected unsigned int [usertype] * arch/arm64/net/bpf_jit_comp.c:1734:29: got restricted __le32 [usertype] * arch/arm64/net/bpf_jit_comp.c:1918:52: error: subtraction of different types can't work (different base types) This is because the variable branch in function invoke_bpf_prog and the variable branches in function prepare_trampoline are defined as type u32 *, which conflicts with ctx->image's type __le32 *, so sparse complains when assignment or arithmetic operation are performed on these two variables and ctx->image. Since arm64 instructions are always little-endian, change the type of these two variables to __le32 * and call cpu_to_le32() to convert instruction to little-endian before writing it to memory. This is also in line with emit() which internally does cpu_to_le32(), too. Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/bpf/20220808040735.1232002-1-xukuohai@huawei.com
| | * Merge branch 'Don't reinit map value in prealloc_lru_pop'Alexei Starovoitov2022-08-094-5/+72
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kumar Kartikeya Dwivedi says: ==================== Fix for a bug in prealloc_lru_pop spotted while reading the code, then a test + example that checks whether it is fixed. Changelog: ---------- v2 -> v3: v2: https://lore.kernel.org/bpf/20220809140615.21231-1-memxor@gmail.com * Switch test to use kptr instead of kptr_ref to stabilize test runs * Fix missing lru_bug__destroy (Yonghong) * Collect Acks v1 -> v2: v1: https://lore.kernel.org/bpf/20220806014603.1771-1-memxor@gmail.com * Expand commit log to include summary of the discussion with Yonghong * Make lru_bug selftest serial to not mess up refcount for map_kptr test ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * selftests/bpf: Add test for prealloc_lru_pop bugKumar Kartikeya Dwivedi2022-08-092-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a regression test to check against invalid check_and_init_map_value call inside prealloc_lru_pop. The kptr should not be reset to NULL once we set it after deleting the map element. Hence, we trigger a program that updates the element causing its reuse, and checks whether the unref kptr is reset or not. If it is, prealloc_lru_pop does an incorrect check_and_init_map_value call and the test fails. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220809213033.24147-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Don't reinit map value in prealloc_lru_popKumar Kartikeya Dwivedi2022-08-091-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LRU map that is preallocated may have its elements reused while another program holds a pointer to it from bpf_map_lookup_elem. Hence, only check_and_free_fields is appropriate when the element is being deleted, as it ensures proper synchronization against concurrent access of the map value. After that, we cannot call check_and_init_map_value again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while they can be concurrently accessed from a BPF program. This is safe to do as when the map entry is deleted, concurrent access is protected against by check_and_free_fields, i.e. an existing timer would be freed, and any existing kptr will be released by it. The program can create further timers and kptrs after check_and_free_fields, but they will eventually be released once the preallocated items are freed on map destruction, even if the item is never reused again. Hence, the deleted item sitting in the free list can still have resources attached to it, and they would never leak. With spin_lock, we never touch the field at all on delete or update, as we may end up modifying the state of the lock. Since the verifier ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock call, the program will eventually release the lock so that on reuse the new user of the value can take the lock. Essentially, for the preallocated case, we must assume that the map value may always be in use by the program, even when it is sitting in the freelist, and handle things accordingly, i.e. use proper synchronization inside check_and_free_fields, and never reinitialize the special fields when it is reused on update. Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.") Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220809213033.24147-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | | * bpf: Allow calling bpf_prog_test kfuncs in tracing programsKumar Kartikeya Dwivedi2022-08-091-0/+1
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | In addition to TC hook, enable these in tracing programs so that they can be used in selftests. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220809213033.24147-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * bpf, arm64: Allocate program buffer using kvcalloc instead of kcallocAijun Sun2022-08-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not necessary to allocate contiguous physical memory for BPF program buffer using kcalloc. When the BPF program is large more than memory page size, kcalloc allocates multiple memory pages from buddy system. If the device can not provide sufficient memory, for example in low-end android devices [0], memory allocation for BPF program is likely to fail. Test cases in lib/test_bpf.c all pass on ARM64 QEMU. [0] AndroidTestSuit: page allocation failure: order:4, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=foreground,mems_allowed=0 Call trace: dump_stack+0xa4/0x114 warn_alloc+0xf8/0x14c __alloc_pages_slowpath+0xac8/0xb14 __alloc_pages_nodemask+0x194/0x3d0 kmalloc_order_trace+0x44/0x1e8 __kmalloc+0x29c/0x66c bpf_int_jit_compile+0x17c/0x568 bpf_prog_select_runtime+0x4c/0x1b0 bpf_prepare_filter+0x5fc/0x6bc bpf_prog_create_from_user+0x118/0x1c0 seccomp_set_mode_filter+0x1c4/0x7cc __do_sys_prctl+0x380/0x1424 __arm64_sys_prctl+0x20/0x2c el0_svc_common+0xc8/0x22c el0_svc_handler+0x1c/0x28 el0_svc+0x8/0x100 Signed-off-by: Aijun Sun <aijun.sun@unisoc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220804025442.22524-1-aijun.sun@unisoc.com
| | * selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpfStanislav Fomichev2022-08-081-0/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Apparently, no existing selftest covers it. Add a new one where we load cgroup/bind4 program and attach fentry to it. Calling bpf_obj_get_info_by_fd on the fentry program should return non-zero btf_id/btf_obj_id instead of crashing the kernel. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220804201140.1340684-2-sdf@google.com
| | * bpf: Use proper target btf when exporting attach_btf_obj_idStanislav Fomichev2022-08-081-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When attaching to program, the program itself might not be attached to anything (and, hence, might not have attach_btf), so we can't unconditionally use 'prog->aux->dst_prog->aux->attach_btf'. Instead, use bpf_prog_get_target_btf to pick proper target BTF: * when attached to dst_prog, use dst_prog->aux->btf * when attached to kernel btf, use prog->aux->attach_btf Fixes: b79c9fc9551b ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220804201140.1340684-1-sdf@google.com
| | * mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabledJiri Olsa2022-08-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The btf_sock_ids array needs struct mptcp_sock BTF ID for the bpf_skc_to_mptcp_sock helper. When CONFIG_MPTCP is disabled, the 'struct mptcp_sock' is not defined and resolve_btfids will complain with: [...] BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol mptcp_sock [...] Add an empty definition for struct mptcp_sock when CONFIG_MPTCP is disabled. Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220802163324.1873044-1-jolsa@kernel.org
| | * bpf: Cleanup ftrace hash in bpf_trampoline_putJiri Olsa2022-08-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to release possible hash from trampoline fops object before removing it, otherwise we leak it. Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20220802135651.1794015-1-jolsa@kernel.org
| | * BPF: Fix potential bad pointer dereference in bpf_sys_bpf()Jinghao Jia2022-08-041-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bpf_sys_bpf() helper function allows an eBPF program to load another eBPF program from within the kernel. In this case the argument union bpf_attr pointer (as well as the insns and license pointers inside) is a kernel address instead of a userspace address (which is the case of a usual bpf() syscall). To make the memory copying process in the syscall work in both cases, bpfptr_t was introduced to wrap around the pointer and distinguish its origin. Specifically, when copying memory contents from a bpfptr_t, a copy_from_user() is performed in case of a userspace address and a memcpy() is performed for a kernel address. This can lead to problems because the in-kernel pointer is never checked for validity. The problem happens when an eBPF syscall program tries to call bpf_sys_bpf() to load a program but provides a bad insns pointer -- say 0xdeadbeef -- in the bpf_attr union. The helper calls __sys_bpf() which would then call bpf_prog_load() to load the program. bpf_prog_load() is responsible for copying the eBPF instructions to the newly allocated memory for the program; it creates a kernel bpfptr_t for insns and invokes copy_from_bpfptr(). Internally, all bpfptr_t operations are backed by the corresponding sockptr_t operations, which performs direct memcpy() on kernel pointers for copy_from/strncpy_from operations. Therefore, the code is always happy to dereference the bad pointer to trigger a un-handle-able page fault and in turn an oops. However, this is not supposed to happen because at that point the eBPF program is already verified and should not cause a memory error. Sample KASAN trace: [ 25.685056][ T228] ================================================================== [ 25.685680][ T228] BUG: KASAN: user-memory-access in copy_from_bpfptr+0x21/0x30 [ 25.686210][ T228] Read of size 80 at addr 00000000deadbeef by task poc/228 [ 25.686732][ T228] [ 25.686893][ T228] CPU: 3 PID: 228 Comm: poc Not tainted 5.19.0-rc7 #7 [ 25.687375][ T228] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014 [ 25.687991][ T228] Call Trace: [ 25.688223][ T228] <TASK> [ 25.688429][ T228] dump_stack_lvl+0x73/0x9e [ 25.688747][ T228] print_report+0xea/0x200 [ 25.689061][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.689401][ T228] ? _printk+0x54/0x6e [ 25.689693][ T228] ? _raw_spin_lock_irqsave+0x70/0xd0 [ 25.690071][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.690412][ T228] kasan_report+0xb5/0xe0 [ 25.690716][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.691059][ T228] kasan_check_range+0x2bd/0x2e0 [ 25.691405][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.691734][ T228] memcpy+0x25/0x60 [ 25.692000][ T228] copy_from_bpfptr+0x21/0x30 [ 25.692328][ T228] bpf_prog_load+0x604/0x9e0 [ 25.692653][ T228] ? cap_capable+0xb4/0xe0 [ 25.692956][ T228] ? security_capable+0x4f/0x70 [ 25.693324][ T228] __sys_bpf+0x3af/0x580 [ 25.693635][ T228] bpf_sys_bpf+0x45/0x240 [ 25.693937][ T228] bpf_prog_f0ec79a5a3caca46_bpf_func1+0xa2/0xbd [ 25.694394][ T228] bpf_prog_run_pin_on_cpu+0x2f/0xb0 [ 25.694756][ T228] bpf_prog_test_run_syscall+0x146/0x1c0 [ 25.695144][ T228] bpf_prog_test_run+0x172/0x190 [ 25.695487][ T228] __sys_bpf+0x2c5/0x580 [ 25.695776][ T228] __x64_sys_bpf+0x3a/0x50 [ 25.696084][ T228] do_syscall_64+0x60/0x90 [ 25.696393][ T228] ? fpregs_assert_state_consistent+0x50/0x60 [ 25.696815][ T228] ? exit_to_user_mode_prepare+0x36/0xa0 [ 25.697202][ T228] ? syscall_exit_to_user_mode+0x20/0x40 [ 25.697586][ T228] ? do_syscall_64+0x6e/0x90 [ 25.697899][ T228] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 25.698312][ T228] RIP: 0033:0x7f6d543fb759 [ 25.698624][ T228] Code: 08 5b 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 a6 0e 00 f7 d8 64 89 01 48 [ 25.699946][ T228] RSP: 002b:00007ffc3df78468 EFLAGS: 00000287 ORIG_RAX: 0000000000000141 [ 25.700526][ T228] RAX: ffffffffffffffda RBX: 00007ffc3df78628 RCX: 00007f6d543fb759 [ 25.701071][ T228] RDX: 0000000000000090 RSI: 00007ffc3df78478 RDI: 000000000000000a [ 25.701636][ T228] RBP: 00007ffc3df78510 R08: 0000000000000000 R09: 0000000000300000 [ 25.702191][ T228] R10: 0000000000000005 R11: 0000000000000287 R12: 0000000000000000 [ 25.702736][ T228] R13: 00007ffc3df78638 R14: 000055a1584aca68 R15: 00007f6d5456a000 [ 25.703282][ T228] </TASK> [ 25.703490][ T228] ================================================================== [ 25.704050][ T228] Disabling lock debugging due to kernel taint Update copy_from_bpfptr() and strncpy_from_bpfptr() so that: - for a kernel pointer, it uses the safe copy_from_kernel_nofault() and strncpy_from_kernel_nofault() functions. - for a userspace pointer, it performs copy_from_user() and strncpy_from_user(). Fixes: af2ac3e13e45 ("bpf: Prepare bpf syscall to be used from kernel and user space.") Link: https://lore.kernel.org/bpf/20220727132905.45166-1-jinghao@linux.ibm.com/ Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220729201713.88688-1-jinghao@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>