summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* netlink: remove the flex array from struct nlmsghdrJakub Kicinski2022-11-181-2/+0
| | | | | | | | | | | | | | I've added a flex array to struct nlmsghdr in commit 738136a0e375 ("netlink: split up copies in the ack construction") to allow accessing the data easily. It leads to warnings with clang, if user space wraps this structure into another struct and the flex array is not at the end of the container. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/all/20221114023927.GA685@u2004-local/ Link: https://lore.kernel.org/r/20221118033903.1651026-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* mrp: introduce active flags to prevent UAF when applicant uninitSchspa Shi2022-11-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The caller of del_timer_sync must prevent restarting of the timer, If we have no this synchronization, there is a small probability that the cancellation will not be successful. And syzbot report the fellowing crash: ================================================================== BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:929 [inline] BUG: KASAN: use-after-free in enqueue_timer+0x18/0xa4 kernel/time/timer.c:605 Write at addr f9ff000024df6058 by task syz-fuzzer/2256 Pointer tag: [f9], memory tag: [fe] CPU: 1 PID: 2256 Comm: syz-fuzzer Not tainted 6.1.0-rc5-syzkaller-00008- ge01d50cbd6ee #0 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace.part.0+0xe0/0xf0 arch/arm64/kernel/stacktrace.c:156 dump_backtrace arch/arm64/kernel/stacktrace.c:162 [inline] show_stack+0x18/0x40 arch/arm64/kernel/stacktrace.c:163 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x68/0x84 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x1a8/0x4a0 mm/kasan/report.c:395 kasan_report+0x94/0xb4 mm/kasan/report.c:495 __do_kernel_fault+0x164/0x1e0 arch/arm64/mm/fault.c:320 do_bad_area arch/arm64/mm/fault.c:473 [inline] do_tag_check_fault+0x78/0x8c arch/arm64/mm/fault.c:749 do_mem_abort+0x44/0x94 arch/arm64/mm/fault.c:825 el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:367 el1h_64_sync_handler+0xd8/0xe4 arch/arm64/kernel/entry-common.c:427 el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:576 hlist_add_head include/linux/list.h:929 [inline] enqueue_timer+0x18/0xa4 kernel/time/timer.c:605 mod_timer+0x14/0x20 kernel/time/timer.c:1161 mrp_periodic_timer_arm net/802/mrp.c:614 [inline] mrp_periodic_timer+0xa0/0xc0 net/802/mrp.c:627 call_timer_fn.constprop.0+0x24/0x80 kernel/time/timer.c:1474 expire_timers+0x98/0xc4 kernel/time/timer.c:1519 To fix it, we can introduce a new active flags to make sure the timer will not restart. Reported-by: syzbot+6fd64001c20aa99e34a4@syzkaller.appspotmail.com Signed-off-by: Schspa Shi <schspa@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'wireless-next-2022-11-18' of ↵David S. Miller2022-11-183-55/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.2 Second set of patches for v6.2. Only driver patches this time, nothing really special. Unused platform data support was removed from wl1251 and rtw89 got WoWLAN support. Major changes: ath11k * support configuring channel dwell time during scan rtw89 * new dynamic header firmware format support * Wake-over-WLAN support rtl8xxxu * enable IEEE80211_HW_SUPPORT_FAST_XMIT ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * wifi: wl1251: drop support for platform dataDmitry Torokhov2022-11-161-44/+0
| | | | | | | | | | | | | | | | | | Remove support for configuring the device via platform data because there are no users of wl1251_platform_data left in the mainline kernel. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221109224250.2885119-2-dmitry.torokhov@gmail.com
| * wifi: cfg80211: Avoid clashing function prototypesGustavo A. R. Silva2022-11-161-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When built with Control Flow Integrity, function prototypes between caller and function declaration must match. These mismatches are visible at compile time with the new -Wcast-function-type-strict in Clang[1]. Fix a total of 73 warnings like these: drivers/net/wireless/intersil/orinoco/wext.c:1379:27: warning: cast from 'int (*)(struct net_device *, struct iw_request_info *, struct iw_param *, char *)' to 'iw_handler' (aka 'int (*)(struct net_device *, struct iw_request_info *, union iwreq_data *, char *)') converts to incompatible function type [-Wcast-function-type-strict] IW_HANDLER(SIOCGIWPOWER, (iw_handler)orinoco_ioctl_getpower), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../net/wireless/wext-compat.c:1607:33: warning: cast from 'int (*)(struct net_device *, struct iw_request_info *, struct iw_point *, char *)' to 'iw_handler' (aka 'int (*)(struct net_device *, struct iw_request_info *, union iwreq_data *, char *)') converts to incompatible function type [-Wcast-function-type-strict] [IW_IOCTL_IDX(SIOCSIWGENIE)] = (iw_handler) cfg80211_wext_siwgenie, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/net/wireless/intersil/orinoco/wext.c:1390:27: error: incompatible function pointer types initializing 'const iw_handler' (aka 'int (*const)(struct net_device *, struct iw_request_info *, union iwreq_data *, char *)') with an expression of type 'int (struct net_device *, struct iw_request_info *, struct iw_param *, char *)' [-Wincompatible-function-pointer-types] IW_HANDLER(SIOCGIWRETRY, cfg80211_wext_giwretry), ^~~~~~~~~~~~~~~~~~~~~~ The cfg80211 Wireless Extension handler callbacks (iw_handler) use a union for the data argument. Actually use the union and perform explicit member selection in the function body instead of having a function prototype mismatch. There are no resulting binary differences before/after changes. These changes were made partly manually and partly with the help of Coccinelle. Link: https://github.com/KSPP/linux/issues/234 Link: https://reviews.llvm.org/D134831 [1] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/a68822bf8dd587988131bb6a295280cb4293f05d.1667934775.git.gustavoars@kernel.org
| * bcma: Use the proper gpio includeLinus Walleij2022-11-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The <linux/bcma/bcma_driver_chipcommon.h> is including the legacy header <linux/gpio.h> to obtain struct gpio_chip. Instead, include <linux/gpio/driver.h> where this struct is defined. It turns out that the brcm80211 brcmsmac depends on this to bring in the symbol gpio_is_valid(). The driver looks up the BCMA parent GPIO driver and checks that this succeeds, but then it goes on to use the deprecated GPIO call gpio_is_valid() to check the consistency of the .base member of the BCMA GPIO struct. The whole check can be dropped because the bcma_gpio is initialized in the declarations: struct gpio_chip *bcma_gpio = &cc_drv->gpio; And this can never be NULL. Cc: Jonas Gorski <jonas.gorski@gmail.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221028092332.238728-1-linus.walleij@linaro.org
* | sctp: add dif and sdif check in asoc and ep lookupXin Long2022-11-183-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch at first adds a pernet global l3mdev_accept to decide if it accepts the packets from a l3mdev when a SCTP socket doesn't bind to any interface. It's set to 1 to avoid any possible incompatible issue, and in next patch, a sysctl will be introduced to allow to change it. Then similar to inet/udp_sk_bound_dev_eq(), sctp_sk_bound_dev_eq() is added to check either dif or sdif is equal to sk_bound_dev_if, and to check sid is 0 or l3mdev_accept is 1 if sk_bound_dev_if is not set. This function is used to match a association or a endpoint, namely called by sctp_addrs_lookup_transport() and sctp_endpoint_is_match(). All functions that needs updating are: sctp_rcv(): asoc: __sctp_rcv_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_lookup_harder() __sctp_rcv_init_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_walk_lookup() __sctp_rcv_asconf_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() ep: __sctp_rcv_lookup_endpoint() -> sctp_endpoint_is_match() sctp_connect(): sctp_endpoint_is_peeled_off() __sctp_lookup_association() sctp_has_association() sctp_lookup_association() __sctp_lookup_association() -> sctp_addrs_lookup_transport() sctp_diag_dump_one(): sctp_transport_lookup_process() -> sctp_addrs_lookup_transport() Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: add skb_sdif in struct sctp_afXin Long2022-11-181-0/+1
| | | | | | | | | | | | | | | | | | Add skb_sdif function in struct sctp_af to get the enslaved device for both ipv4 and ipv6 when adding SCTP VRF support in sctp_rcv in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.hXin Long2022-11-172-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | Move these two macros from net/sctp/sctp.h to linux/sctp.h, so that it will be enough to include only linux/sctp.h in nft_exthdr.c and xt_sctp.c. It should not include "net/sctp/sctp.h" if a module does not have a dependence on SCTP module. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Link: https://lore.kernel.org/r/ef6468a687f36da06f575c2131cd4612f6b7be88.1668526821.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | sctp: change to include linux/sctp.h in net/sctp/checksum.hXin Long2022-11-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently "net/sctp/checksum.h" including "net/sctp/sctp.h" is included in quite some places in netfilter and openswitch and net/sched. It's not necessary to include "net/sctp/sctp.h" if a module does not have dependence on SCTP, "linux/sctp.h" is the right one to include. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Link: https://lore.kernel.org/r/ca7ea96d62a26732f0491153c3979dc1c0d8d34a.1668526793.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | devlink: Allow to set up parent in devl_rate_leaf_create()Michal Wilczynski2022-11-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the driver is able to create leaf nodes for the devlink-rate, but is unable to set parent for them. This wasn't as issue before the possibility to export hierarchy from the driver. After adding the export feature, in order for the driver to supply correct hierarchy, it's necessary for it to be able to supply a parent name to devl_rate_leaf_create(). Introduce a new parameter 'parent_name' in devl_rate_leaf_create(). Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | devlink: Enable creation of the devlink-rate nodes from the driverMichal Wilczynski2022-11-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel 100G card internal firmware hierarchy for Hierarchicial QoS is very rigid and can't be easily removed. This requires an ability to export default hierarchy to allow user to modify it. Currently the driver is only able to create the 'leaf' nodes, which usually represent the vport. This is not enough for HQoS implemented in Intel hardware. Introduce new function devl_rate_node_create() that allows for creation of the devlink-rate nodes from the driver. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | devlink: Introduce new attribute 'tx_weight' to devlink-rateMichal Wilczynski2022-11-172-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To fully utilize offload capabilities of Intel 100G card QoS capabilities new attribute 'tx_weight' needs to be introduced. This attribute allows for usage of Weighted Fair Queuing arbitration scheme among siblings. This arbitration scheme can be used simultaneously with the strict priority. Introduce new attribute in devlink-rate that will allow for configuration of Weighted Fair Queueing. New attribute is optional. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | devlink: Introduce new attribute 'tx_priority' to devlink-rateMichal Wilczynski2022-11-172-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To fully utilize offload capabilities of Intel 100G card QoS capabilities new attribute 'tx_priority' needs to be introduced. This attribute allows for usage of strict priority arbiter among siblings. This arbitration scheme attempts to schedule nodes based on their priority as long as the nodes remain within their bandwidth limit. Introduce new attribute in devlink-rate that will allow for configuration of strict priority. New attribute is optional. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: dsa: stop exposing tag proto module helpers to the worldVladimir Oltean2022-11-171-70/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DSA tagging protocol driver macros are in the public include/net/dsa.h probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are (MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions). But there is no reason to expose these helpers to <net/dsa.h>. That header is shared between switch drivers (drivers/net/dsa/), tagging protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c), and the rest of the world (DSA master drivers, network stack, etc). Too much exposure. On the other hand, net/dsa/dsa_priv.h is included only by the DSA core and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a bit too much exposure - I've contemplated creating a new header which is only included by tagging protocol drivers, but completely separating a new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for example dsa_slave_to_port() is used both from the fast path and from the control path. So for now, move these definitions to dsa_priv.h which at least hides them from the world. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | ethtool: doc: clarify what drivers can implement in their get_drvinfo()Vincent Mailhol2022-11-172-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many of the drivers which implement ethtool_ops::get_drvinfo() will prints the .driver, .version or .bus_info of struct ethtool_drvinfo. To have a glance of current state, do: $ git grep -W "get_drvinfo(struct" Printing in those three fields is useless because: - since [1], the driver version should be the kernel version (at least for upstream drivers). Arguably, out of tree drivers might still want to set a custom version, but out of tree is not our focus. - since [2], the core is able to provide default values for .driver and .bus_info. In summary, drivers may provide .fw_version and .erom_version, the rest is expected to be done by the core. In struct ethtool_ops doc from linux/ethtool: rephrase field get_drvinfo() doc to discourage developers from implementing this callback. In struct ethtool_drvinfo doc from uapi/linux/ethtool.h: remove the paragraph mentioning what drivers should do. Rationale: no need to repeat what is already written in struct ethtool_ops doc. But add a note that .fw_version and .erom_version are driver defined. Also update the dummy driver and simply remove the callback in order not to confuse the newcomers: most of the drivers will not need this callback function any more. [1] commit 6a7e25c7fb48 ("net/core: Replace driver version to be kernel version") Link: https://git.kernel.org/torvalds/linux/c/6a7e25c7fb48 [2] commit edaf5df22cb8 ("ethtool: ethtool_get_drvinfo: populate drvinfo fields even if callback exits") Link: https://git.kernel.org/netdev/net-next/c/edaf5df22cb8 Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/r/20221116171828.4093-1-mailhol.vincent@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-11-1714-31/+72
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | include/linux/bpf.h 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record") f71b2f64177a ("bpf: Refactor map->off_arr handling") https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * \ Merge tag 'net-6.1-rc6' of ↵Linus Torvalds2022-11-176-28/+50
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Current release - regressions: - tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init() Previous releases - regressions: - bridge: fix memory leaks when changing VLAN protocol - dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims - dsa: don't leak tagger-owned storage on switch driver unbind - eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6 is removed - eth: stmmac: ensure tx function is not running in stmmac_xdp_release() - eth: hns3: fix return value check bug of rx copybreak Previous releases - always broken: - kcm: close race conditions on sk_receive_queue - bpf: fix alignment problem in bpf_prog_test_run_skb() - bpf: fix writing offset in case of fault in strncpy_from_kernel_nofault - eth: macvlan: use built-in RCU list checking - eth: marvell: add sleep time after enabling the loopback bit - eth: octeon_ep: fix potential memory leak in octep_device_setup() Misc: - tcp: configurable source port perturb table size - bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)" * tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits) net: use struct_group to copy ip/ipv6 header addresses net: usb: smsc95xx: fix external PHY reset net: usb: qmi_wwan: add Telit 0x103a composition netdevsim: Fix memory leak of nsim_dev->fa_cookie tcp: configurable source port perturb table size l2tp: Serialize access to sk_user_data with sk_callback_lock net: thunderbolt: Fix error handling in tbnet_init() net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start() net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init() net: dsa: don't leak tagger-owned storage on switch driver unbind net/x25: Fix skb leak in x25_lapb_receive_frame() net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open() bridge: switchdev: Fix memory leaks when changing VLAN protocol net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process net: hns3: fix return value check bug of rx copybreak net: hns3: fix incorrect hw rss hash type of rx packet net: phy: marvell: add sleep time after enabling the loopback bit net: ena: Fix error handling in ena_init() kcm: close race conditions on sk_receive_queue net: ionic: Fix error handling in ionic_init_module() ...
| | * | net: use struct_group to copy ip/ipv6 header addressesHangbin Liu2022-11-174-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel test robot reported warnings when build bonding module with make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/: from ../drivers/net/bonding/bond_main.c:35: In function ‘fortify_memcpy_chk’, inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2, inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3: ../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 413 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function ‘fortify_memcpy_chk’, inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2, inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3: ../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 413 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is because we try to copy the whole ip/ip6 address to the flow_key, while we only point the to ip/ip6 saddr. Note that since these are UAPI headers, __struct_group() is used to avoid the compiler warnings. Reported-by: kernel test robot <lkp@intel.com> Fixes: c3f8324188fa ("net: Add full IPv6 addresses to flow_keys") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| | * | l2tp: Serialize access to sk_user_data with sk_callback_lockJakub Sitnicki2022-11-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk->sk_user_data has multiple users, which are not compatible with each other. Writers must synchronize by grabbing the sk->sk_callback_lock. l2tp currently fails to grab the lock when modifying the underlying tunnel socket fields. Fix it by adding appropriate locking. We err on the side of safety and grab the sk_callback_lock also inside the sk_destruct callback overridden by l2tp, even though there should be no refs allowing access to the sock at the time when sk_destruct gets called. v4: - serialize write to sk_user_data in l2tp sk_destruct v3: - switch from sock lock to sk_callback_lock - document write-protection for sk_user_data v2: - update Fixes to point to origin of the bug - use real names in Reported/Tested-by tags Cc: Tom Parkin <tparkin@katalix.com> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reported-by: Haowei Yan <g1042620637@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2022-11-111-21/+39
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrii Nakryiko says: ==================== bpf 2022-11-11 We've added 11 non-merge commits during the last 8 day(s) which contain a total of 11 files changed, 83 insertions(+), 74 deletions(-). The main changes are: 1) Fix strncpy_from_kernel_nofault() to prevent out-of-bounds writes, from Alban Crequy. 2) Fix for bpf_prog_test_run_skb() to prevent wrong alignment, from Baisong Zhong. 3) Switch BPF_DISPATCHER to static_call() instead of ftrace infra, with a small build fix on top, from Peter Zijlstra and Nathan Chancellor. 4) Fix memory leak in BPF verifier in some error cases, from Wang Yufen. 5) 32-bit compilation error fixes for BPF selftests, from Pu Lehui and Yang Jihong. 6) Ensure even distribution of per-CPU free list elements, from Xu Kuohai. 7) Fix copy_map_value() to track special zeroed out areas properly, from Xu Kuohai. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix offset calculation error in __copy_map_value and zero_map_value bpf: Initialize same number of free nodes for each pcpu_freelist selftests: bpf: Add a test when bpf_probe_read_kernel_str() returns EFAULT maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() selftests/bpf: Fix test_progs compilation failure in 32-bit arch selftests/bpf: Fix casting error when cross-compiling test_verifier for 32-bit platforms bpf: Fix memory leaks in __check_func_call bpf: Add explicit cast to 'void *' for __BPF_DISPATCHER_UPDATE() bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop") bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb() ==================== Link: https://lore.kernel.org/r/20221111231624.938829-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | | * | bpf: Fix offset calculation error in __copy_map_value and zero_map_valueXu Kuohai2022-11-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function __copy_map_value and zero_map_value miscalculated copy offset, resulting in possible copy of unwanted data to user or kernel. Fix it. Fixes: cc48755808c6 ("bpf: Add zero_map_value to zero map value with special fields") Fixes: 4d7d7f69f4b1 ("bpf: Adapt copy_map_value for multiple offset case") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20221111125620.754855-1-xukuohai@huaweicloud.com
| | | * | bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)Peter Zijlstra2022-11-041-1/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dispatcher function is currently abusing the ftrace __fentry__ call location for its own purposes -- this obviously gives trouble when the dispatcher and ftrace are both in use. A previous solution tried using __attribute__((patchable_function_entry())) which works, except it is GCC-8+ only, breaking the build on the earlier still supported compilers. Instead use static_call() -- which has its own annotations and does not conflict with ftrace -- to rewrite the dispatch function. By using: return static_call()(ctx, insni, bpf_func) you get a perfect forwarding tail call as function body (iow a single jmp instruction). By having the default static_call() target be bpf_dispatcher_nop_func() it retains the default behaviour (an indirect call to the argument function). Only once a dispatcher program is attached is the target rewritten to directly call the JIT'ed image. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/Y1/oBlK0yFk5c/Im@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/bpf/20221103120647.796772565@infradead.org
| | | * | bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")Peter Zijlstra2022-11-041-20/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because __attribute__((patchable_function_entry)) is only available since GCC-8 this solution fails to build on the minimum required GCC version. Undo these changes so we might try again -- without cluttering up the patches with too many changes. This is an almost complete revert of: dbe69b299884 ("bpf: Fix dispatcher patchable function entry to 5 bytes nop") ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") (notably the arch/x86/Kconfig hunk is kept). Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/439d8dc735bb4858875377df67f1b29a@AcuMS.aculab.com Link: https://lore.kernel.org/bpf/20221103120647.728830733@infradead.org
| * | | | Merge tag 'vfio-v6.1-rc6' of https://github.com/awilliam/linux-vfioLinus Torvalds2022-11-141-0/+1
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull VFIO fixes from Alex Williamson: - Fixes for potential container registration leak for drivers not implementing a close callback, duplicate container de-registrations, and a regression in support for bus reset on last device close from a device set (Anthony DeRossi) * tag 'vfio-v6.1-rc6' of https://github.com/awilliam/linux-vfio: vfio/pci: Check the device set open count on reset vfio: Export the device set open count vfio: Fix container device registration life cycle
| | * | | | vfio: Export the device set open countAnthony DeRossi2022-11-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The open count of a device set is the sum of the open counts of all devices in the set. Drivers can use this value to determine whether shared resources are in use without tracking them manually or accessing the private open_count in vfio_device. Signed-off-by: Anthony DeRossi <ajderossi@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20221110014027.28780-3-ajderossi@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | | | | Merge tag 'efi-fixes-for-v6.1-3' of ↵Linus Torvalds2022-11-131-0/+1
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Force the use of SetVirtualAddressMap() on Ampera Altra arm64 machines, which crash in SetTime() if no virtual remapping is used This is the first time we've added an SMBIOS based quirk on arm64, but fortunately, we can just call a EFI protocol to grab the type #1 SMBIOS record when running in the stub, so we don't need all the machinery we have in the kernel proper to parse SMBIOS data. - Drop a spurious warning on misaligned runtime regions when using 16k or 64k pages on arm64 * tag 'efi-fixes-for-v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: arm64: efi: Fix handling of misaligned runtime regions and drop warning arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines
| | * | | | | arm64: efi: Force the use of SetVirtualAddressMap() on Altra machinesArd Biesheuvel2022-11-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ampere Altra machines are reported to misbehave when the SetTime() EFI runtime service is called after ExitBootServices() but before calling SetVirtualAddressMap(). Given that the latter is horrid, pointless and explicitly documented as optional by the EFI spec, we no longer invoke it at boot if the configured size of the VA space guarantees that the EFI runtime memory regions can remain mapped 1:1 like they are at boot time. On Ampere Altra machines, this results in SetTime() calls issued by the rtc-efi driver triggering synchronous exceptions during boot. We can now recover from those without bringing down the system entirely, due to commit 23715a26c8d81291 ("arm64: efi: Recover from synchronous exceptions occurring in firmware"). However, it would be better to avoid the issue entirely, given that the firmware appears to remain in a funny state after this. So attempt to identify these machines based on the 'family' field in the type #1 SMBIOS record, and call SetVirtualAddressMap() unconditionally in that case. Tested-by: Alexandru Elisei <alexandru.elisei@gmail.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
| * | | | | | Merge tag 'mm-hotfixes-stable-2022-11-11' of ↵Linus Torvalds2022-11-111-0/+7
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. Eight are cc:stable and the remainder address issues which were introduced post-6.0 or which aren't considered serious enough to justify a -stable backport" * tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) docs: kmsan: fix formatting of "Example report" mm/damon/dbgfs: check if rm_contexts input is for a real context maple_tree: don't set a new maximum on the node when not reusing nodes maple_tree: fix depth tracking in maple_state arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging fs: fix leaked psi pressure state nilfs2: fix use-after-free bug of ns_writer on remount x86/traps: avoid KMSAN bugs originating from handle_bug() kmsan: make sure PREEMPT_RT is off Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARN x86/uaccess: instrument copy_from_user_nmi() kmsan: core: kmsan_in_runtime() should return true in NMI context mm: hugetlb_vmemmap: include missing linux/moduleparam.h mm/shmem: use page_mapping() to detect page cache for uffd continue mm/memremap.c: map FS_DAX device memory as decrypted Partly revert "mm/thp: carry over dirty bit when thp splits on pmd" nilfs2: fix deadlock in nilfs_count_free_blocks() mm/mmap: fix memory leak in mmap_region() hugetlbfs: don't delete error page from pagecache maple_tree: reorganize testing to restore module testing ...
| | * | | | | | maple_tree: reorganize testing to restore module testingLiam Howlett2022-11-081-0/+7
| | | |/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Along the development cycle, the testing code support for module/in-kernel compiles was removed. Restore this functionality by moving any internal API tests to the userspace side, as well as threading tests. Fix the lockdep issues and add a way to reduce memory usage so the tests can complete with KASAN + memleak detection. Make the tests work on 32 bit hosts where possible and detect 32 bit hosts in the radix test suite. [akpm@linux-foundation.org: fix module export] [akpm@linux-foundation.org: fix it some more] [liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()] Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | | Merge tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linuxLinus Torvalds2022-11-111-1/+1
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "Nothing major, just a few minor tweaks: - Tweak for the TCP zero-copy io_uring self test (Pavel) - Rather than use our internal cached value of number of CQ events available, use what the user can see (Dylan) - Fix a typo in a comment, added in this release (me) - Don't allow wrapping while adding provided buffers (me) - Fix a double poll race, and add a lockdep assertion for it too (Pavel)" * tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux: io_uring/poll: lockdep annote io_poll_req_insert_locked io_uring/poll: fix double poll req->flags races io_uring: check for rollover of buffer ID when providing buffers io_uring: calculate CQEs from the user visible value io_uring: fix typo in io_uring.h comment selftests/net: don't tests batched TCP io_uring zc
| | * | | | | | io_uring: fix typo in io_uring.h commentJens Axboe2022-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just a basic s/thig/this swap, fixing up a typo introduced by a commit added in the 6.1 release. Fixes: 9cda70f622cd ("io_uring: introduce fixed buffer support for io_uring_cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | Merge tag 'hardening-v6.1-rc5' of ↵Linus Torvalds2022-11-111-1/+1
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening fix from Kees Cook: - Fix !SMP placement of '.data..decrypted' section (Nathan Chancellor) * tag 'hardening-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: vmlinux.lds.h: Fix placement of '.data..decrypted' section
| | * | | | | | | vmlinux.lds.h: Fix placement of '.data..decrypted' sectionNathan Chancellor2022-11-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP") fixed an orphan section warning by adding the '.data..decrypted' section to the linker script under the PERCPU_DECRYPTED_SECTION define but that placement introduced a panic with !SMP, as the percpu sections are not instantiated with that configuration so attempting to access variables defined with DEFINE_PER_CPU_DECRYPTED() will result in a page fault. Move the '.data..decrypted' section to the DATA_MAIN define so that the variables in it are properly instantiated at boot time with CONFIG_SMP=n. Cc: stable@vger.kernel.org Fixes: d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP") Link: https://lore.kernel.org/cbbd3548-880c-d2ca-1b67-5bb93b291d5f@huawei.com/ Debugged-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: Zhao Wenhui <zhaowenhui8@huawei.com> Tested-by: xiafukun <xiafukun@huawei.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221108174934.3384275-1-nathan@kernel.org
| * | | | | | | | Merge tag 'hyperv-fixes-signed-20221110' of ↵Linus Torvalds2022-11-111-0/+9
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix TSC MSR write for root partition (Anirudh Rayabharam) - Fix definition of vector in pci-hyperv driver (Dexuan Cui) - A few other misc patches * tag 'hyperv-fixes-signed-20221110' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: PCI: hv: Fix the definition of vector in hv_compose_msi_msg() MAINTAINERS: remove sthemmin x86/hyperv: fix invalid writes to MSRs during root partition kexec clocksource/drivers/hyperv: add data structure for reference TSC MSR Drivers: hv: fix repeated words in comments x86/hyperv: Remove BUG_ON() for kmap_local_page()
| | * | | | | | | | clocksource/drivers/hyperv: add data structure for reference TSC MSRAnirudh Rayabharam2022-11-031-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a data structure to represent the reference TSC MSR similar to other MSRs. This simplifies the code for updating the MSR. Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20221027095729.1676394-2-anrayabh@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
| * | | | | | | | | Merge tag 'dmaengine-fix-6.1' of ↵Linus Torvalds2022-11-111-0/+1
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "Misc minor driver fixes and a big pile of at_hdmac driver fixes. More work on this driver is done and sitting in next: - Pile of at_hdmac driver rework which fixes many long standing issues for this driver. - couple of stm32 driver fixes for clearing structure and race fix - idxd fixes for RO device state and batch size - ti driver mem leak fix - apple fix for grabbing channels in xlate - resource leak fix in mv xor" * tag 'dmaengine-fix-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (24 commits) dmaengine: at_hdmac: Check return code of dma_async_device_register dmaengine: at_hdmac: Fix impossible condition dmaengine: at_hdmac: Don't allow CPU to reorder channel enable dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware dmaengine: at_hdmac: Fix concurrency over the active list dmaengine: at_hdmac: Free the memset buf without holding the chan lock dmaengine: at_hdmac: Fix concurrency over descriptor dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all() dmaengine: at_hdmac: Protect atchan->status with the channel lock dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all dmaengine: at_hdmac: Fix premature completion of desc in issue_pending dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending dmaengine: at_hdmac: Don't start transactions at tx_submit level dmaengine: at_hdmac: Fix at_lli struct definition dmaengine: stm32-dma: fix potential race between pause and resume dmaengine: ti: k3-udma-glue: fix memory leak when register device fail dmaengine: mv_xor_v2: Fix a resource leak in mv_xor_v2_remove() dmaengine: apple-admac: Fix grabbing of channels in of_xlate dmaengine: idxd: fix RO device state error after been disabled/reset ...
| | * | | | | | | | | dmaengine: idxd: Do not enable user type Work Queue without Shared Virtual ↵Fenghua Yu2022-10-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Addressing When the idxd_user_drv driver is bound to a Work Queue (WQ) device without IOMMU or with IOMMU Passthrough without Shared Virtual Addressing (SVA), the application gains direct access to physical memory via the device by programming physical address to a submitted descriptor. This allows direct userspace read and write access to arbitrary physical memory. This is inconsistent with the security goals of a good kernel API. Unlike vfio_pci driver, the IDXD char device driver does not provide any ways to pin user pages and translate the address from user VA to IOVA or PA without IOMMU SVA. Therefore the application has no way to instruct the device to perform DMA function. This makes the char device not usable for normal application usage. Since user type WQ without SVA cannot be used for normal application usage and presents the security issue, bind idxd_user_drv driver and enable user type WQ only when SVA is enabled (i.e. user PASID is enabled). Fixes: 448c3de8ac83 ("dmaengine: idxd: create user driver for wq 'device'") Cc: stable@vger.kernel.org Suggested-by: Arjan Van De Ven <arjan.van.de.ven@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221014222541.3912195-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * | | | | | | | | | Merge tag 'drm-fixes-2022-11-11' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2022-11-111-1/+1
| |\ \ \ \ \ \ \ \ \ \ | | |_|_|_|_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "Weekly pull request for graphics, mostly amdgpu and i915, with a couple of fixes for vc4 and panfrost, panel quirks and a kconfig change for rcar-du. Nothing seems to be too strange at this stage. amdgpu: - Fix s/r in amdgpu_vram_mgr_new - SMU 13.0.4 update - GPUVM TLB race fix - DCN 3.1.4 fixes - DCN 3.2.x fixes - Vega10 fan fix - BACO fix for Beige Goby board - PSR fix - GPU VM PT locking fixes amdkfd: - CRIU fixes vc4: - HDMI fixes to vc4. panfrost: - Make panfrost's uapi header compile with C++. - Handle 1 gb boundary correctly in panfrost mmu code. panel: - Add rotation quirks for 2 panels. rcar-du: - DSI Kconfig fix i915: - Fix sg_table handling in map_dma_buf - Send PSR update also on invalidate - Do not set cache_dirty for DGFX - Restore userptr probe_range behaviour" * tag 'drm-fixes-2022-11-11' of git://anongit.freedesktop.org/drm/drm: (29 commits) drm/amd/display: only fill dirty rectangles when PSR is enabled drm/amdgpu: disable BACO on special BEIGE_GOBY card drm/amdgpu: Drop eviction lock when allocating PT BO drm/amdgpu: Unlock bo_list_mutex after error handling Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly"" drm/amd/display: Enforce minimum prefetch time for low memclk on DCN32 drm/amd/display: Fix gpio port mapping issue drm/amd/display: Fix reg timeout in enc314_enable_fifo drm/amd/display: Fix FCLK deviation and tool compile issues drm/amd/display: Zeromem mypipe heap struct before using it drm/amd/display: Update SR watermarks for DCN314 drm/amdgpu: workaround for TLB seq race drm/amdkfd: Fix error handling in criu_checkpoint drm/amdkfd: Fix error handling in kfd_criu_restore_events drm/amd/pm: update SMU IP v13.0.4 msg interface header drm: rcar-du: Fix Kconfig dependency between RCAR_DU and RCAR_MIPI_DSI drm/panfrost: Split io-pgtable requests properly drm/amdgpu: Fix the lpfn checking condition in drm buddy drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017) drm: panel-orientation-quirks: Add quirk for Nanote UMPC-01 ...
| | * | | | | | | | | Merge tag 'drm-misc-fixes-2022-11-09' of ↵Dave Airlie2022-11-111-1/+1
| | |\ \ \ \ \ \ \ \ \ | | | |_|_|_|_|/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v6.1-rc5: - HDMI fixes to vc4. - Make panfrost's uapi header compile with C++. - Add rotation quirks for 2 panels. - Fix s/r in amdgpu_vram_mgr_new - Handle 1 gb boundary correctly in panfrost mmu code. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e02de501-4b85-28a0-3f6e-751ca13f5f9d@linux.intel.com
| | | * | | | | | | | drm/panfrost: Remove type name from internal struct againSteven Price2022-11-071-1/+1
| | | | |/ / / / / / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 72655fb942c1 ("drm/panfrost: replace endian-specific types with native ones") accidentally reverted part of the parent commit 7228d9d79248 ("drm/panfrost: Remove type name from internal structs") leading to the situation that the Panfrost UAPI header still doesn't compile correctly in C++. Revert the accidental revert and pass me a brown paper bag. Reported-by: Alyssa Rosenzweig <alyssa@collabora.com> Fixes: 72655fb942c1 ("drm/panfrost: replace endian-specific types with native ones") Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221103114036.1581854-1-steven.price@arm.com
* | | | | | | | | | net: add atomic_long_t to net_device_stats fieldsEric Dumazet2022-11-162-26/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Long standing KCSAN issues are caused by data-race around some dev->stats changes. Most performance critical paths already use per-cpu variables, or per-queue ones. It is reasonable (and more correct) to use atomic operations for the slow paths. This patch adds an union for each field of net_device_stats, so that we can convert paths that are not yet protected by a spinlock or a mutex. netdev_stats_to_stats64() no longer has an #if BITS_PER_LONG==64 Note that the memcpy() we were using on 64bit arches had no provision to avoid load-tearing, while atomic_long_read() is providing the needed protection at no cost. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | udp: Introduce optional per-netns hash table.Kuniyuki Iwashima2022-11-162-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The maximum hash table size is 64K due to the nature of the protocol. [0] It's smaller than TCP, and fewer sockets can cause a performance drop. On an EC2 c5.24xlarge instance (192 GiB memory), after running iperf3 in different netns, creating 32Mi sockets without data transfer in the root netns causes regression for the iperf3's connection. uhash_entries sockets length Gbps 64K 1 1 5.69 1Mi 16 5.27 2Mi 32 4.90 4Mi 64 4.09 8Mi 128 2.96 16Mi 256 2.06 32Mi 512 1.12 The per-netns hash table breaks the lengthy lists into shorter ones. It is useful on a multi-tenant system with thousands of netns. With smaller hash tables, we can look up sockets faster, isolate noisy neighbours, and reduce lock contention. The max size of the per-netns table is 64K as well. This is because the possible hash range by udp_hashfn() always fits in 64K within the same netns and we cannot make full use of the whole buckets larger than 64K. /* 0 < num < 64K -> X < hash < X + 64K */ (num + net_hash_mix(net)) & mask; Also, the min size is 128. We use a bitmap to search for an available port in udp_lib_get_port(). To keep the bitmap on the stack and not fire the CONFIG_FRAME_WARN error at build time, we round up the table size to 128. The sysctl usage is the same with TCP: $ dmesg | cut -d ' ' -f 6- | grep "UDP hash" UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc) # sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = 65536 # can be changed by uhash_entries # sysctl net.ipv4.udp_child_hash_entries net.ipv4.udp_child_hash_entries = 0 # disabled by default # ip netns add test1 # ip netns exec test1 sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = -65536 # share the global table # sysctl -w net.ipv4.udp_child_hash_entries=100 net.ipv4.udp_child_hash_entries = 100 # ip netns add test2 # ip netns exec test2 sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = 128 # own a per-netns table with 2^n buckets We could optimise the hash table lookup/iteration further by removing the netns comparison for the per-netns one in the future. Also, we could optimise the sparse udp_hslot layout by putting it in udp_table. [0]: https://lore.kernel.org/netdev/4ACC2815.7010101@gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | udp: Set NULL to sk->sk_prot->h.udp_table.Kuniyuki Iwashima2022-11-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will soon introduce an optional per-netns hash table for UDP. This means we cannot use the global sk->sk_prot->h.udp_table to fetch a UDP hash table. Instead, set NULL to sk->sk_prot->h.udp_table for UDP and get a proper table from net->ipv4.udp_table. Note that we still need sk->sk_prot->h.udp_table for UDP LITE. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | net: dsa: remove phylink_validate() methodVladimir Oltean2022-11-151-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of now, no DSA driver uses a custom link mode validation procedure anymore. So remove this DSA operation and let phylink determine what is supported based on config->mac_capabilities (if provided by the driver). Leave a comment why we left the code that we did, and that there is more work to do. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski2022-11-155-6/+10
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Fix sparse warning in the new nft_inner expression, reported by Jakub Kicinski. 2) Incorrect vlan header check in nft_inner, from Peng Wu. 3) Two patches to pass reset boolean to expression dump operation, in preparation for allowing to reset stateful expressions in rules. This adds a new NFT_MSG_GETRULE_RESET command. From Phil Sutter. 4) Inconsistent indentation in nft_fib, from Jiapeng Chong. 5) Speed up siphash calculation in conntrack, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: conntrack: use siphash_4u64 netfilter: rpfilter/fib: clean up some inconsistent indenting netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters netfilter: nft_inner: fix return value check in nft_inner_parse_l2l3() netfilter: nft_payload: use __be16 to store gre version ==================== Link: https://lore.kernel.org/r/20221115095922.139954-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | | | | | | | netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESETPhil Sutter2022-11-152-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Analogous to NFT_MSG_GETOBJ_RESET, but for rules: Reset stateful expressions like counters or quotas. The latter two are the only consumers, adjust their 'dump' callbacks to respect the parameter introduced earlier. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | | | | | netfilter: nf_tables: Extend nft_expr_ops::dump callback parametersPhil Sutter2022-11-154-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a 'reset' flag just like with nft_object_ops::dump. This will be useful to reset "anonymous stateful objects", e.g. simple rule counters. No functional change intended. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | | | | | | | Merge tag 'mlx5-updates-2022-11-12' of ↵David S. Miller2022-11-141-0/+6
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-11-12 Misc updates to mlx5 driver 1) Support enhanced CQE compression, on ConnectX6-Dx Reduce irq rate, cpu utilization and latency. 2) Connection tracking: Optimize the pre_ct table lookup for rules installed on chain 0. 3) implement ethtool get_link_ext_stats for PHY down events 4) Expose device vhca_id to debugfs 5) misc cleanups and trivial changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | | net/mlx5e: Support enhanced CQE compressionOfer Levi2022-11-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CQE compression feature improves performance by reducing PCI bandwidth bottleneck on CQEs write. Enhanced CQE compression introduced in ConnectX-6 and it aims to reduce CPU utilization of SW side packets decompression by eliminating the need to rewrite ownership bit, which is likely to cost a cache-miss, is replaced by validity byte handled solely by HW. Another advantage of the enhanced feature is that session packets are available to SW as soon as a single CQE slot is filled, instead of waiting for session to close, this improves packet latency from NIC to host. Performance: Following are tested scenarios and reults comparing basic and enahnced CQE compression. setup: IXIA 100GbE connected directly to port 0 and port 1 of ConnectX-6 Dx 100GbE dual port. Case #1 RX only, single flow goes to single queue: IRQ rate reduced by ~ 30%, CPU utilization improved by 2%. Case #2 IP forwarding from port 1 to port 0 single flow goes to single queue: Avg latency improved from 60us to 21us, frame loss improved from 0.5% to 0.0%. Case #3 IP forwarding from port 1 to port 0 Max Throughput IXIA sends 100%, 8192 UDP flows, goes to 24 queues: Enhanced is equal or slightly better than basic. Testing the basic compression feature with this patch shows there is no perfrormance degradation of the basic compression feature. Signed-off-by: Ofer Levi <oferle@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>