summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'net-5.10-rc6' of ↵Linus Torvalds2020-11-2766-507/+864
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc6, including fixes from the WiFi driver, and CAN subtrees. Current release - regressions: - gro_cells: reduce number of synchronize_net() calls - ch_ktls: release a lock before jumping to an error path Current release - always broken: - tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Previous release - regressions: - net/tls: fix missing received data after fast remote close - vsock/virtio: discard packets only when socket is really closed - sock: set sk_err to ee_errno on dequeue from errq - cxgb4: fix the panic caused by non smac rewrite Previous release - always broken: - tcp: fix corner cases around setting ECN with BPF selection of congestion control - tcp: fix race condition when creating child sockets from syncookies on loopback interface - usbnet: ipheth: fix connectivity with iOS 14 - tun: honor IOCB_NOWAIT flag - net/packet: fix packet receive on L3 devices without visible hard header - devlink: Make sure devlink instance and port are in same net namespace - net: openvswitch: fix TTL decrement action netlink message format - bonding: wait for sysfs kobject destruction before freeing struct slave - net: stmmac: fix upstream patch applied to the wrong context - bnxt_en: fix return value and unwind in probe error paths Misc: - devlink: add extra layer of categorization to the reload stats uAPI before it's released" * tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) sock: set sk_err to ee_errno on dequeue from errq mptcp: fix NULL ptr dereference on bad MPJ net: openvswitch: fix TTL decrement action netlink message format can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0 can: m_can: fix nominal bitiming tseg2 min for version >= 3.1 can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given can: gs_usb: fix endianess problem with candleLight firmware ch_ktls: lock is not freed net/tls: Protect from calling tls_dev_del for TLS RX twice devlink: Make sure devlink instance and port are in same net namespace devlink: Hold rtnl lock while reading netdev attributes ptp: clockmatrix: bug fix for idtcm_strverscmp enetc: Let the hardware auto-advance the taprio base-time of 0 gro_cells: reduce number of synchronize_net() calls net: stmmac: fix incorrect merge of patch upstream ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init Documentation: netdev-FAQ: suggest how to post co-dependent series ibmvnic: enhance resetting status check during module exit ...
| * Merge tag 'linux-can-fixes-for-5.10-20201127' of ↵Jakub Kicinski2020-11-274-65/+83
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-11-27 The first patch is by me and target the gs_usb driver and fixes the endianess problem with candleLight firmware. Another patch by me for the mcp251xfd driver add sanity checking to bail out if no IRQ is configured. The next three patches target the m_can driver. A patch by me removes the hardcoded IRQF_TRIGGER_FALLING from the request_threaded_irq() as this clashes with the trigger level specified in the DT. Further a patch by me fixes the nominal bitiming tseg2 min value for modern m_can cores. Pankaj Sharma's patch add support for cores version 3.3.x. The last patch by Oliver Hartkopp is for af_can and converts a WARN() into a pr_warn(), which is triggered by the syzkaller. It was able to create a situation where the closing of a socket runs simultaneously to the notifier call chain for removing the CAN network device in use. * tag 'linux-can-fixes-for-5.10-20201127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0 can: m_can: fix nominal bitiming tseg2 min for version >= 3.1 can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given can: gs_usb: fix endianess problem with candleLight firmware ==================== Link: https://lore.kernel.org/r/20201127100301.512603-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * can: af_can: can_rx_unregister(): remove WARN() statement from list ↵Oliver Hartkopp2020-11-271-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | operation sanity check To detect potential bugs in CAN protocol implementations (double removal of receiver entries) a WARN() statement has been used if no matching list item was found for removal. The fault injection issued by syzkaller was able to create a situation where the closing of a socket runs simultaneously to the notifier call chain for removing the CAN network device in use. This case is very unlikely in real life but it doesn't break anything. Therefore we just replace the WARN() statement with pr_warn() to preserve the notification for the CAN protocol development. Reported-by: syzbot+381d06e0c8eaacb8706f@syzkaller.appspotmail.com Reported-by: syzbot+d0ddd88c9a7432f041e6@syzkaller.appspotmail.com Reported-by: syzbot+76d62d3b8162883c7d11@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20201126192140.14350-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | * can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0Pankaj Sharma2020-11-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for mcan bit timing and control mode according to bosch mcan IP version 3.3.0. The mcan version read from the Core Release field of CREL register would be 33. Accordingly the properties are to be set for mcan v3.3.0 Signed-off-by: Pankaj Sharma <pankj.sharma@samsung.com> Link: https://lore.kernel.org/r/1606366302-5520-1-git-send-email-pankj.sharma@samsung.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | * can: m_can: fix nominal bitiming tseg2 min for version >= 3.1Marc Kleine-Budde2020-11-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At lest the revision 3.3.0 of the bosch m_can IP core specifies that valid register values for "Nominal Time segment after sample point (NTSEG2)" are from 1 to 127. As the hardware uses a value of one more than the programmed value, mean tseg2_min is 2. This patch fixes the tseg2_min value accordingly. Cc: Dan Murphy <dmurphy@ti.com> Cc: Mario Huettel <mario.huettel@gmx.net> Acked-by: Sriram Dash <sriram.dash@samsung.com> Link: https://lore.kernel.org/r/20201124190751.3972238-1-mkl@pengutronix.de Fixes: b03cfc5bb0e1 ("can: m_can: Enable M_CAN version dependent initialization") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | * can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from ↵Marc Kleine-Budde2020-11-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | request_threaded_irq()'s flags The threaded IRQ handler is used for the tcan4x5x driver only. The IRQ pin of the tcan4x5x controller is active low, so better not use IRQF_TRIGGER_FALLING when requesting the IRQ. As this can result in missing interrupts. Further, if the device tree specified the interrupt as "IRQ_TYPE_LEVEL_LOW", unloading and reloading of the driver results in the following error during ifup: | irq: type mismatch, failed to map hwirq-31 for gpio@20a8000! | tcan4x5x spi1.1: m_can device registered (irq=0, version=32) | tcan4x5x spi1.1 can2: TCAN4X5X successfully initialized. | tcan4x5x spi1.1 can2: failed to request interrupt This patch fixes the problem by removing the IRQF_TRIGGER_FALLING from the request_threaded_irq(). Fixes: f524f829b75a ("can: m_can: Create a m_can platform framework") Cc: Dan Murphy <dmurphy@ti.com> Cc: Sriram Dash <sriram.dash@samsung.com> Cc: Pankaj Sharma <pankj.sharma@samsung.com> Link: https://lore.kernel.org/r/20201127093548.509253-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | * can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was givenMarc Kleine-Budde2020-11-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch add a check to the mcp251xfd_probe() function to bail out and give the user a proper error message if no IRQ is specified. Otherwise the driver will probe just fine but ifup will fail with a meaningless "RTNETLINK answers: Invalid argument" error message. Link: https://lore.kernel.org/r/20201123113522.3820052-1-mkl@pengutronix.de Reported-by: Niels Petter <petter@ka-long.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | * can: gs_usb: fix endianess problem with candleLight firmwareMarc Kleine-Budde2020-11-261-61/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The firmware on the original USB2CAN by Geschwister Schneider Technologie Entwicklungs- und Vertriebs UG exchanges all data between the host and the device in host byte order. This is done with the struct gs_host_config::byte_order member, which is sent first to indicate the desired byte order. The widely used open source firmware candleLight doesn't support this feature and exchanges the data in little endian byte order. This breaks if a device with candleLight firmware is used on big endianess systems. To fix this problem, all u32 (but not the struct gs_host_frame::echo_id, which is a transparent cookie) are converted to __le32. Cc: Maximilian Schneider <max@schneidersoft.net> Cc: Hubert Denkmair <hubert@denkmair.de> Reported-by: Michael Rausch <mr@netadair.de> Link: https://lore.kernel.org/r/b58aace7-61f3-6df7-c6df-69fee2c66906@netadair.de Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices") Link: https://lore.kernel.org/r/20201120103818.3386964-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| * | sock: set sk_err to ee_errno on dequeue from errqWillem de Bruijn2020-11-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When setting sk_err, set it to ee_errno, not ee_origin. Commit f5f99309fa74 ("sock: do not set sk_err in sock_dequeue_err_skb") disabled updating sk_err on errq dequeue, which is correct for most error types (origins): - sk->sk_err = err; Commit 38b257938ac6 ("sock: reset sk_err when the error queue is empty") reenabled the behavior for IMCP origins, which do require it: + if (icmp_next) + sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin; But read from ee_errno. Fixes: 38b257938ac6 ("sock: reset sk_err when the error queue is empty") Reported-by: Ayush Ranjan <ayushranjan@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20201126151220.2819322-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | mptcp: fix NULL ptr dereference on bad MPJPaolo Abeni2020-11-271-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an msk listener receives an MPJ carrying an invalid token, it will zero the request socket msk entry. That should later cause fallback and subflow reset - as per RFC - at subflow_syn_recv_sock() time due to failing hmac validation. Since commit 4cf8b7e48a09 ("subflow: introduce and use mptcp_can_accept_new_subflow()"), we unconditionally dereference - in mptcp_can_accept_new_subflow - the subflow request msk before performing hmac validation. In the above scenario we hit a NULL ptr dereference. Address the issue doing the hmac validation earlier. Fixes: 4cf8b7e48a09 ("subflow: introduce and use mptcp_can_accept_new_subflow()") Tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/03b2cfa3ac80d8fc18272edc6442a9ddf0b1e34e.1606400227.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net: openvswitch: fix TTL decrement action netlink message formatEelco Chaudron2020-11-273-23/+60
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the openvswitch module is not accepting the correctly formated netlink message for the TTL decrement action. For both setting and getting the dec_ttl action, the actions should be nested in the OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi. When the original patch was sent, it was tested with a private OVS userspace implementation. This implementation was unfortunately not upstreamed and reviewed, hence an erroneous version of this patch was sent out. Leaving the patch as-is would cause problems as the kernel module could interpret additional attributes as actions and vice-versa, due to the actions not being encapsulated/nested within the actual attribute, but being concatinated after it. Fixes: 744676e77720 ("openvswitch: add TTL decrement action") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ch_ktls: lock is not freedRohit Maheshwari2020-11-251-1/+3
| | | | | | | | | | | | | | | | | | | | Currently lock gets freed only if timeout expires, but missed a case when HW returns failure and goes for cleanup. Fixes: efca3878a5fb ("ch_ktls: Issue if connection offload fails") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Link: https://lore.kernel.org/r/20201125072626.10861-1-rohitm@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/tls: Protect from calling tls_dev_del for TLS RX twiceMaxim Mikityanskiy2020-11-252-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Merge branch 'devlink-port-attribute-fixes'Jakub Kicinski2020-11-251-1/+6
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Parav Pandit says: ==================== devlink port attribute fixes This patchset contains 2 small fixes for devlink port attributes. Patch summary: Patch-1 synchronize the devlink port attribute reader with net namespace change operation Patch-2 Ensure to return devlink port's netdevice attributes when netdev and devlink instance belong to same net namespace ==================== Link: https://lore.kernel.org/r/20201125091620.6781-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * devlink: Make sure devlink instance and port are in same net namespaceParav Pandit2020-11-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When devlink reload operation is not used, netdev of an Ethernet port may be present in different net namespace than the net namespace of the devlink instance. Ensure that both the devlink instance and devlink port netdev are located in same net namespace. Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * devlink: Hold rtnl lock while reading netdev attributesParav Pandit2020-11-251-0/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A netdevice of a devlink port can be moved to different net namespace than its parent devlink instance. This scenario occurs when devlink reload is not used. When netdevice is undergoing migration to net namespace, its ifindex and name may change. In such use case, devlink port query may read stale netdev attributes. Fix it by reading them under rtnl lock. Fixes: bfcd3a466172 ("Introduce devlink infrastructure") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ptp: clockmatrix: bug fix for idtcm_strverscmpMin Li2020-11-251-33/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Feed kstrtou8 with NULL terminated string. Changes since v1: -Use sscanf to get rid of adhoc string parse. Changes since v2: -Check if sscanf returns 3. Fixes: 7ea5fda2b132 ("ptp: ptp_clockmatrix: update to support 4.8.7 firmware") Signed-off-by: Min Li <min.li.xe@renesas.com> Link: https://lore.kernel.org/r/1606273115-25792-1-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * enetc: Let the hardware auto-advance the taprio base-time of 0Vladimir Oltean2020-11-251-12/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tc-taprio base time indicates the beginning of the tc-taprio schedule, which is cyclic by definition (where the length of the cycle in nanoseconds is called the cycle time). The base time is a 64-bit PTP time in the TAI domain. Logically, the base-time should be a future time. But that imposes some restrictions to user space, which has to retrieve the current PTP time from the NIC first, then calculate a base time that will still be larger than the base time by the time the kernel driver programs this value into the hardware. Actually ensuring that the programmed base time is in the future is still a problem even if the kernel alone deals with this. Luckily, the enetc hardware already advances a base-time that is in the past into a congruent time in the immediate future, according to the same formula that can be found in the software implementation of taprio (in taprio_get_start_time): /* Schedule the start time for the beginning of the next * cycle. */ n = div64_s64(ktime_sub_ns(now, base), cycle); *start = ktime_add_ns(base, (n + 1) * cycle); There's only one problem: the driver doesn't let the hardware do that. It interferes with the base-time passed from user space, by special-casing the situation when the base-time is zero, and replaces that with the current PTP time. This changes the intended effective base-time of the schedule, which will in the end have a different phase offset than if the base-time of 0.000000000 was to be advanced by an integer multiple of the cycle-time. Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201124220259.3027991-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * gro_cells: reduce number of synchronize_net() callsEric Dumazet2020-11-251-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After cited commit, gro_cells_destroy() became damn slow on hosts with a lot of cores. This is because we have one additional synchronize_net() per cpu as stated in the changelog. gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough to not have one synchronize_net() call per netif_napi_del() We can factorize all the synchronize_net() to a single one, right before freeing per-cpu memory. Fixes: 5198d545dba8 ("net: remove napi_hash_del() from driver-facing API") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201124203822.1360107-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: stmmac: fix incorrect merge of patch upstreamAntonio Borneo2020-11-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 757926247836 ("net: stmmac: add flexible PPS to dwmac 4.10a") was intended to modify the struct dwmac410_ops, but it got somehow badly merged and modified the struct dwmac4_ops. Revert the modification in struct dwmac4_ops and re-apply it properly in struct dwmac410_ops. Fixes: 757926247836 ("net: stmmac: add flexible PPS to dwmac 4.10a") Signed-off-by: Antonio Borneo <antonio.borneo@st.com> Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Link: https://lore.kernel.org/r/20201124223729.886992-1-antonio.borneo@st.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_initWang Hai2020-11-251-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmemleak report a memory leak as follows: unreferenced object 0xffff8880059c6a00 (size 64): comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s) hex dump (first 32 bytes): 20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00 ............... 1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00 ................ backtrace: [<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0 [<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170 [<000000006a9ca9d4>] ops_init+0xa8/0x3c0 [<000000002da57bf2>] setup_net+0x2de/0x7e0 [<000000004e52d573>] copy_net_ns+0x27d/0x530 [<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30 [<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0 [<0000000030653721>] ksys_unshare+0x3a4/0x780 [<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40 [<0000000031a10c08>] do_syscall_64+0x33/0x40 [<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 We should free all rules when we catch an error in ip6addrlbl_net_init(). otherwise a memory leak will occur. Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201124071728.8385-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Documentation: netdev-FAQ: suggest how to post co-dependent seriesJakub Kicinski2020-11-241-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | Make an explicit suggestion how to post user space side of kernel patches to avoid reposts when patchwork groups the wrong patches. v2: mention the cases unlike iproute2 explicitly Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge tag 'batadv-net-pullrequest-20201124' of ↵Jakub Kicinski2020-11-241-0/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - set module owner to THIS_MODULE, by Taehee Yoo * tag 'batadv-net-pullrequest-20201124' of git://git.open-mesh.org/linux-merge: batman-adv: set .owner to THIS_MODULE ==================== Link: https://lore.kernel.org/r/20201124134417.17269-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * batman-adv: set .owner to THIS_MODULETaehee Yoo2020-11-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If THIS_MODULE is not set, the module would be removed while debugfs is being used. It eventually makes kernel panic. Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| * | Merge branch 'ibmvnic-null-pointer-dereference'Jakub Kicinski2020-11-242-4/+8
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lijun Pan says: ==================== ibmvnic: null pointer dereference Fix two NULL pointer dereference crash issues. Improve module removal procedure. ==================== Link: https://lore.kernel.org/r/20201123193547.57225-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | ibmvnic: enhance resetting status check during module exitLijun Pan2020-11-242-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on the discussion with Sukadev Bhattiprolu and Dany Madden, we believe that checking adapter->resetting bit is preferred since RESETTING state flag is not as strict as resetting bit. RESETTING state flag is removed since it is verbose now. Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | ibmvnic: fix NULL pointer dereference in ibmvic_reset_crqLijun Pan2020-11-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | crq->msgs could be NULL if the previous reset did not complete after freeing crq->msgs. Check for NULL before dereferencing them. Snippet of call trace: ... ibmvnic 30000003 env3 (unregistering): Releasing sub-CRQ ibmvnic 30000003 env3 (unregistering): Releasing CRQ BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc0000000000c1a30 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvnic(E-) rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag tun raw_diag inet_diag unix_diag bridge af_packet_diag netlink_diag stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ibmvnic] CPU: 20 PID: 8426 Comm: kworker/20:0 Tainted: G E 5.10.0-rc1+ #12 Workqueue: events __ibmvnic_reset [ibmvnic] NIP: c0000000000c1a30 LR: c008000001b00c18 CTR: 0000000000000400 REGS: c00000000d05b7a0 TRAP: 0380 Tainted: G E (5.10.0-rc1+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002480 XER: 20040000 CFAR: c0000000000c19ec IRQMASK: 0 GPR00: 0000000000000400 c00000000d05ba30 c008000001b17c00 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000001e2 GPR08: 000000000001f400 ffffffffffffd950 0000000000000000 c008000001b0b280 GPR12: c0000000000c19c8 c00000001ec72e00 c00000000019a778 c00000002647b440 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000006 0000000000000001 0000000000000003 0000000000000002 GPR24: 0000000000001000 c008000001b0d570 0000000000000005 c00000007ab5d550 GPR28: c00000007ab5c000 c000000032fcf848 c00000007ab5cc00 c000000032fcf800 NIP [c0000000000c1a30] memset+0x68/0x104 LR [c008000001b00c18] ibmvnic_reset_crq+0x70/0x110 [ibmvnic] Call Trace: [c00000000d05ba30] [0000000000000800] 0x800 (unreliable) [c00000000d05bab0] [c008000001b0a930] do_reset.isra.40+0x224/0x634 [ibmvnic] [c00000000d05bb80] [c008000001b08574] __ibmvnic_reset+0x17c/0x3c0 [ibmvnic] [c00000000d05bc50] [c00000000018d9ac] process_one_work+0x2cc/0x800 [c00000000d05bd20] [c00000000018df58] worker_thread+0x78/0x520 [c00000000d05bdb0] [c00000000019a934] kthread+0x1c4/0x1d0 [c00000000d05be20] [c00000000000d5d0] ret_from_kernel_thread+0x5c/0x6c Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | ibmvnic: fix NULL pointer dereference in reset_sub_crq_queuesLijun Pan2020-11-241-0/+3
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | adapter->tx_scrq and adapter->rx_scrq could be NULL if the previous reset did not complete after freeing sub crqs. Check for NULL before dereferencing them. Snippet of call trace: ibmvnic 30000006 env6: Releasing sub-CRQ ibmvnic 30000006 env6: Releasing CRQ ... ibmvnic 30000006 env6: Got Control IP offload Response ibmvnic 30000006 env6: Re-setting tx_scrq[0] BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc008000003dea7cc Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag tun bridge stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmvnic ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod CPU: 80 PID: 1856 Comm: kworker/80:2 Tainted: G W 5.8.0+ #4 Workqueue: events __ibmvnic_reset [ibmvnic] NIP: c008000003dea7cc LR: c008000003dea7bc CTR: 0000000000000000 REGS: c0000007ef7db860 TRAP: 0380 Tainted: G W (5.8.0+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28002422 XER: 0000000d CFAR: c000000000bd9520 IRQMASK: 0 GPR00: c008000003dea7bc c0000007ef7dbaf0 c008000003df7400 c0000007fa26ec00 GPR04: c0000007fcd0d008 c0000007fcd96350 0000000000000027 c0000007fcd0d010 GPR08: 0000000000000023 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000002000 c00000001ec18e00 c0000000001982f8 c0000007bad6e840 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 fffffffffffffef7 GPR24: 0000000000000402 c0000007fa26f3a8 0000000000000003 c00000016f8ec048 GPR28: 0000000000000000 0000000000000000 0000000000000000 c0000007fa26ec00 NIP [c008000003dea7cc] ibmvnic_reset_init+0x15c/0x258 [ibmvnic] LR [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic] Call Trace: [c0000007ef7dbaf0] [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic] (unreliable) [c0000007ef7dbb80] [c008000003de8860] __ibmvnic_reset+0x408/0x970 [ibmvnic] [c0000007ef7dbc50] [c00000000018b7cc] process_one_work+0x2cc/0x800 [c0000007ef7dbd20] [c00000000018bd78] worker_thread+0x78/0x520 [c0000007ef7dbdb0] [c0000000001984c4] kthread+0x1d4/0x1e0 [c0000007ef7dbe20] [c00000000000cea8] ret_from_kernel_thread+0x5c/0x74 Fixes: 57a49436f4e8 ("ibmvnic: Reset sub-crqs during driver reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | Merge branch 'fixes-for-ena-driver'Jakub Kicinski2020-11-242-50/+33
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shay Agroskin says: ==================== Fixes for ENA driver - fix wrong data offset on machines that support rx offset - work-around Intel iommu issue - fix out of bound access when request id is wrong ==================== Link: https://lore.kernel.org/r/20201123190859.21298-1-shayagr@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: ena: fix packet's addresses for rx_offset featureShay Agroskin2020-11-241-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes two lines in which the rx_offset received by the device wasn't taken into account: - prefetch function: In our driver the copied data would reside in rx_info->page + rx_headroom + rx_offset so the prefetch function is changed accordingly. - setting page_offset to zero for descriptors > 1: for every descriptor but the first, the rx_offset is zero. Hence the page_offset value should be set to rx_headroom. The previous implementation changed the value of rx_info after the descriptor was added to the SKB (essentially providing wrong page offset). Fixes: 68f236df93a9 ("net: ena: add support for the rx offset feature") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: ena: set initial DMA width to avoid intel iommu issueShay Agroskin2020-11-241-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ENA driver uses the readless mechanism, which uses DMA, to find out what the DMA mask is supposed to be. If DMA is used without setting the dma_mask first, it causes the Intel IOMMU driver to think that ENA is a 32-bit device and therefore disables IOMMU passthrough permanently. This patch sets the dma_mask to be ENA_MAX_PHYS_ADDR_SIZE_BITS=48 before readless initialization in ena_device_init()->ena_com_mmio_reg_read_request_init(), which is large enough to workaround the intel_iommu issue. DMA mask is set again to the correct value after it's received from the device after readless is initialized. The patch also changes the driver to use dma_set_mask_and_coherent() function instead of the two pci_set_dma_mask() and pci_set_consistent_dma_mask() ones. Both methods achieve the same effect. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Mike Cui <mikecui@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | net: ena: handle bad request id in ena_netdevShay Agroskin2020-11-242-32/+14
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After request id is checked in validate_rx_req_id() its value is still used in the line rx_ring->free_ids[next_to_clean] = rx_ring->ena_bufs[i].req_id; even if it was found to be out-of-bound for the array free_ids. The patch moves the request id to an earlier stage in the napi routine and makes sure its value isn't used if it's found out-of-bounds. Fixes: 30623e1ed116 ("net: ena: avoid memory access violation by validating req_id properly") Signed-off-by: Ido Segev <idose@amazon.com> Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | nfc: s3fwrn5: use signed integer for parsing GPIO numbersKrzysztof Kozlowski2020-11-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GPIOs - as returned by of_get_named_gpio() and used by the gpiolib - are signed integers, where negative number indicates error. The return value of of_get_named_gpio() should not be assigned to an unsigned int because in case of !CONFIG_GPIOLIB such number would be a valid GPIO. Fixes: c04c674fadeb ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20201123162351.209100-1-krzk@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | dpaa2-eth: Fix compile error due to missing devlink supportEzequiel Garcia2020-11-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dpaa2 driver depends on devlink, so it should select NET_DEVLINK in order to fix compile errors, such as: drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function `dpaa2_eth_rx_err': dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report' drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function `dpaa2_eth_dl_info_get': dpaa2-eth-devlink.c:(.text+0x160): undefined reference to `devlink_info_driver_name_put' Fixes: ceeb03ad8e22 ("dpaa2-eth: add basic devlink support") Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20201123163553.1666476-1-ciorneiioana@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | MAINTAINERS: Update page pool entryJesper Dangaard Brouer2020-11-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add some file F: matches that is related to page_pool. Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/160613894639.2826716.14635284017814375894.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECNAlexander Duyck2020-11-242-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a BPF program is used to select between a type of TCP congestion control algorithm that uses either ECN or not there is a case where the synack for the frame was coming up without the ECT0 bit set. A bit of research found that this was due to the final socket being configured to dctcp while the listener socket was staying in cubic. To reproduce it all that is needed is to monitor TCP traffic while running the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed, assuming tcp_dctcp module is loaded or compiled in and the traffic matches the rules in the sample file, is that for all frames with the exception of the synack the ECT0 bit is set. To address that it is necessary to make one additional call to tcp_bpf_ca_needs_ecn using the request socket and then use the output of that to set the ECT0 bit for the tos/tclass of the packet. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/160593039663.2604.1374502006916871573.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | devlink: Fix reload stats structureMoshe Shemesh2020-11-242-16/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix reload stats structure exposed to the user. Change stats structure hierarchy to have the reload action as a parent of the stat entry and then stat entry includes value per limit. This will also help to avoid string concatenation on iproute2 output. Reload stats structure before this fix: "stats": { "reload": { "driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0 } } After this fix: "stats": { "reload": { "driver_reinit": { "unspecified": 2 }, "fw_activate": { "unspecified": 1, "no_reset": 0 } } Fixes: a254c264267e ("devlink: Add reload stats") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1606109785-25197-1-git-send-email-moshe@mellanox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | aquantia: Remove the build_skb pathLincoln Ramsay2020-11-241-74/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When performing IPv6 forwarding, there is an expectation that SKBs will have some headroom. When forwarding a packet from the aquantia driver, this does not always happen, triggering a kernel warning. aq_ring.c has this code (edited slightly for brevity): if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) { skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX); } else { skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE); There is a significant difference between the SKB produced by these 2 code paths. When napi_alloc_skb creates an SKB, there is a certain amount of headroom reserved. However, this is not done in the build_skb codepath. As the hardware buffer that build_skb is built around does not handle the presence of the SKB header, this code path is being removed and the napi_alloc_skb path will always be used. This code path does have to copy the packet header into the SKB, but it adds the packet data as a frag. Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com> Link: https://lore.kernel.org/r/MWHPR1001MB23184F3EAFA413E0D1910EC9E8FC0@MWHPR1001MB2318.namprd10.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/packet: fix packet receive on L3 devices without visible hard headerEyal Birger2020-11-232-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the patchset merged by commit b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which did not have header_ops were given one for the purpose of protocol parsing on af_packet transmit path. That change made af_packet receive path regard these devices as having a visible L3 header and therefore aligned incoming skb->data to point to the skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not reset their mac_header prior to ingress and therefore their incoming packets became malformed. Ideally these devices would reset their mac headers, or af_packet would be able to rely on dev->hard_header_len being 0 for such cases, but it seems this is not the case. Fix by changing af_packet RX ll visibility criteria to include the existence of a '.create()' header operation, which is used when creating a device hard header - via dev_hard_header() - by upper layers, and does not exist in these L3 devices. As this predicate may be useful in other situations, add it as a common dev_has_header() helper in netdevice.h. Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | i40e: Fix removing driver while bare-metal VFs pass trafficSylwester Dziedziuch2020-11-233-18/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent VFs from resetting when PF driver is being unloaded: - introduce new pf state: __I40E_VF_RESETS_DISABLED; - check if pf state has __I40E_VF_RESETS_DISABLED state set, if so, disable any further VFLR event notifications; - when i40e_remove (rmmod i40e) is called, disable any resets on the VFs; Previously if there were bare-metal VFs passing traffic and PF driver was removed, there was a possibility of VFs triggering a Tx timeout right before iavf_remove. This was causing iavf_close to not be called because there is a check in the beginning of iavf_remove that bails out early if adapter->state < IAVF_DOWN_PENDING. This makes it so some resources do not get cleaned up. Fixes: 6a9ddb36eeb8 ("i40e: disable IOV before freeing resources") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20201120180640.3654474-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | vsock/virtio: discard packets only when socket is really closedStefano Garzarella2020-11-231-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting from commit 8692cefc433f ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt"), we discard packets in virtio_transport_recv_pkt() if the socket has been released. When the socket is connected, we schedule a delayed work to wait the RST packet from the other peer, also if SHUTDOWN_MASK is set in sk->sk_shutdown. This is done to complete the virtio-vsock shutdown algorithm, releasing the port assigned to the socket definitively only when the other peer has consumed all the packets. If we discard the RST packet received, the socket will be closed only when the VSOCK_CLOSE_TIMEOUT is reached. Sergio discovered the issue while running ab(1) HTTP benchmark using libkrun [1] and observing a latency increase with that commit. To avoid this issue, we discard packet only if the socket is really closed (SOCK_DONE flag is set). We also set SOCK_DONE in virtio_transport_release() when we don't need to wait any packets from the other peer (we didn't schedule the delayed work). In this case we remove the socket from the vsock lists, releasing the port assigned. [1] https://github.com/containers/libkrun Fixes: 8692cefc433f ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt") Cc: justin.he@arm.com Reported-by: Sergio Lopez <slp@redhat.com> Tested-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20201120104736.73749-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | tcp: fix race condition when creating child sockets from syncookiesRicardo Dias2020-11-237-16/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the TCP stack is in SYN flood mode, the server child socket is created from the SYN cookie received in a TCP packet with the ACK flag set. The child socket is created when the server receives the first TCP packet with a valid SYN cookie from the client. Usually, this packet corresponds to the final step of the TCP 3-way handshake, the ACK packet. But is also possible to receive a valid SYN cookie from the first TCP data packet sent by the client, and thus create a child socket from that SYN cookie. Since a client socket is ready to send data as soon as it receives the SYN+ACK packet from the server, the client can send the ACK packet (sent by the TCP stack code), and the first data packet (sent by the userspace program) almost at the same time, and thus the server will equally receive the two TCP packets with valid SYN cookies almost at the same instant. When such event happens, the TCP stack code has a race condition that occurs between the momement a lookup is done to the established connections hashtable to check for the existence of a connection for the same client, and the moment that the child socket is added to the established connections hashtable. As a consequence, this race condition can lead to a situation where we add two child sockets to the established connections hashtable and deliver two sockets to the userspace program to the same client. This patch fixes the race condition by checking if an existing child socket exists for the same client when we are adding the second child socket to the established connections socket. If an existing child socket exists, we drop the packet and discard the second child socket to the same client. Signed-off-by: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | Merge tag 'wireless-drivers-2020-11-23' of ↵Jakub Kicinski2020-11-2310-57/+164
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.10 First set of fixes for v5.10. One fix for iwlwifi kernel panic, others less notable. rtw88 * fix a bogus test found by clang iwlwifi * fix long memory reads causing soft lockup warnings * fix kernel panic during Channel Switch Announcement (CSA) * other smaller fixes MAINTAINERS * email address updates * tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: iwlwifi: mvm: fix kernel panic in case of assert during CSA iwlwifi: pcie: set LTR to avoid completion timeout iwlwifi: mvm: write queue_sync_state only for sync iwlwifi: mvm: properly cancel a session protection for P2P iwlwifi: mvm: use the HOT_SPOT_CMD to cancel an AUX ROC iwlwifi: sta: set max HE max A-MPDU according to HE capa MAINTAINERS: update maintainers list for Cypress MAINTAINERS: update Yan-Hsuan's email address iwlwifi: pcie: limit memory read spin time rtw88: fix fw_fifo_addr check ==================== Link: https://lore.kernel.org/r/20201123161037.C11D1C43460@smtp.codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | iwlwifi: mvm: fix kernel panic in case of assert during CSASara Sharon2020-11-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During CSA, we briefly nullify the phy context, in __iwl_mvm_unassign_vif_chanctx. In case we have a FW assert right after it, it remains NULL though. We end up running into endless loop due to mac80211 trying repeatedly to move us to ASSOC state, and we keep returning -EINVAL. Later down the road we hit a kernel panic. Detect and avoid this endless loop. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201107104557.d64de2c17bff.Iedd0d2afa20a2aacba5259a5cae31cb3a119a4eb@changeid
| | * | iwlwifi: pcie: set LTR to avoid completion timeoutJohannes Berg2020-11-102-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some platforms, the preset values aren't correct and then we may get a completion timeout in the firmware. Change the LTR configuration to avoid that. The firmware will do some more complex reinit of this later, but for the boot process we use ~250usec. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201107104557.d83d591c05ba.I42885c9fb500bc08b9a4c07c4ff3d436cc7a3c84@changeid
| | * | iwlwifi: mvm: write queue_sync_state only for syncAvraham Stern2020-11-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use mvm->queue_sync_state to wait for synchronous queue sync messages, but if an async one happens inbetween we shouldn't clear mvm->queue_sync_state after sending the async one, that can run concurrently (at least from the CPU POV) with another synchronous queue sync. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 3c514bf831ac ("iwlwifi: mvm: add a loose synchronization of the NSSN across Rx queues") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201107104557.51a3148f2c14.I0772171dbaec87433a11513e9586d98b5d920b5f@changeid
| | * | iwlwifi: mvm: properly cancel a session protection for P2PEmmanuel Grumbach2020-11-102-34/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to feed the configuration id to remove session protection properly. Remember the conf_id when we add the session protection so that we can give it back when we want to remove the session protection. While at it, slightly improve the kernel doc for the conf_id of the notification. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Fixes: fe959c7b2049 ("iwlwifi: mvm: use the new session protection command") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201107104557.3642f730333d.I01a98ecde62096d00d171cf34ad775bf80cb0277@changeid
| | * | iwlwifi: mvm: use the HOT_SPOT_CMD to cancel an AUX ROCEmmanuel Grumbach2020-11-101-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ROC that runs on the AUX ROC (meaning an ROC on the STA vif), was added with the HOT_SPOT_CMD firmware command and must be cancelled with that same command. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Fixes: fe959c7b2049 ("iwlwifi: mvm: use the new session protection command") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201107104557.a317376154da.I44fa3637373ba4bd421cdff2cabc761bffc0735f@changeid
| | * | iwlwifi: sta: set max HE max A-MPDU according to HE capaMordechay Goodstein2020-11-102-5/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, our max tpt is limited to max HT A-MPDU for LB, and max VHT A-MPDU for HB. Configure HE exponent value correctly to achieve HE max A-MPDU, both on LB and HB. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201107104557.4486852ebb56.I9eb0d028e31f183597fb90120e7d4ca87e0dd6cb@changeid
| | * | MAINTAINERS: update maintainers list for CypressChi-Hsien Lin2020-11-061-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Update maintainers' email with Infineon hosted email - Add Chung-hsien Hsu as a maintainer Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com> Signed-off-by: Wright Feng <wright.feng@cypress.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201030085610.145679-1-chi-hsien.lin@cypress.com