summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'master' of ↵David S. Miller2018-12-185-5/+18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-12-18 1) Fix error return code in xfrm_output_one() when no dst_entry is attached to the skb. From Wei Yongjun. 2) The xfrm state hash bucket count reported to userspace is off by one. Fix from Benjamin Poirier. 3) Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry. 4) Fix freeing of xfrm states on acquire. We use a dedicated slab cache for the xfrm states now, so free it properly with kmem_cache_free. From Mathias Krause. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * xfrm_user: fix freeing of xfrm states on acquireMathias Krause2018-11-233-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 565f0fa902b6 ("xfrm: use a dedicated slab cache for struct xfrm_state") moved xfrm state objects to use their own slab cache. However, it missed to adapt xfrm_user to use this new cache when freeing xfrm states. Fix this by introducing and make use of a new helper for freeing xfrm_state objects. Fixes: 565f0fa902b6 ("xfrm: use a dedicated slab cache for struct xfrm_state") Reported-by: Pan Bian <bianpan2016@163.com> Cc: <stable@vger.kernel.org> # v4.18+ Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force clears ↵Steffen Klassert2018-11-221-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the dst_entry. Since commit 222d7dbd258d ("net: prevent dst uses after free") skb_dst_force() might clear the dst_entry attached to the skb. The xfrm code doesn't expect this to happen, so we crash with a NULL pointer dereference in this case. Fix it by checking skb_dst(skb) for NULL after skb_dst_force() and drop the packet in case the dst_entry was cleared. We also move the skb_dst_force() to a codepath that is not used when the transformation was offloaded, because in this case we don't have a dst_entry attached to the skb. The output and forwarding path was already fixed by commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.") Fixes: 222d7dbd258d ("net: prevent dst uses after free") Reported-by: Jean-Philippe Menil <jpmenil@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Fix bucket count reported to userspaceBenjamin Poirier2018-11-061-1/+1
| | | | | | | | | | | | | | | | | | sadhcnt is reported by `ip -s xfrm state count` as "buckets count", not the hash mask. Fixes: 28d8909bc790 ("[XFRM]: Export SAD info.") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * xfrm: Fix error return code in xfrm_output_one()Wei Yongjun2018-10-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | xfrm_output_one() does not return a error code when there is no dst_entry attached to the skb, it is still possible crash with a NULL pointer dereference in xfrm_output_resume(). Fix it by return error code -EHOSTUNREACH. Fixes: 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | Merge branch 'mlxsw-VXLAN-and-firmware-flashing-fixes'David S. Miller2018-12-185-3/+30
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ido Schimmel says: ==================== mlxsw: VXLAN and firmware flashing fixes Patch #1 fixes firmware flashing failures by increasing the time period after which the driver fails the transaction with the firmware. The problem is explained in detail in the commit message. Patch #2 adds a missing trap for decapsulated ARP packets. It is necessary for VXLAN routing to work. Patch #3 fixes a memory leak during driver reload caused by NULLing a pointer before kfree(). Please consider patch #1 for 4.19.y ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mlxsw: spectrum_nve: Fix memory leak upon driver reloadIdo Schimmel2018-12-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pointer was NULLed before freeing the memory, resulting in a memory leak. Trace from kmemleak: unreferenced object 0xffff88820ae36528 (size 512): comm "devlink", pid 5374, jiffies 4295354033 (age 10829.296s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a43f5195>] kmem_cache_alloc_trace+0x1be/0x330 [<00000000312f8140>] mlxsw_sp_nve_init+0xcb/0x1ae0 [<0000000009201d22>] mlxsw_sp_init+0x1382/0x2690 [<000000007227d877>] mlxsw_sp1_init+0x1b5/0x260 [<000000004a16feec>] __mlxsw_core_bus_device_register+0x776/0x1360 [<0000000070ab954c>] mlxsw_devlink_core_bus_device_reload+0x129/0x220 [<00000000432313d5>] devlink_nl_cmd_reload+0x119/0x1e0 [<000000003821a06b>] genl_family_rcv_msg+0x813/0x1150 [<00000000d54d04c0>] genl_rcv_msg+0xd1/0x180 [<0000000040543d12>] netlink_rcv_skb+0x152/0x3c0 [<00000000efc4eae8>] genl_rcv+0x2d/0x40 [<00000000ea645603>] netlink_unicast+0x52f/0x740 [<00000000641fca1a>] netlink_sendmsg+0x9c7/0xf50 [<00000000fed4a4b8>] sock_sendmsg+0xbe/0x120 [<00000000d85795a9>] __sys_sendto+0x397/0x620 [<00000000c5f84622>] __x64_sys_sendto+0xe6/0x1a0 Fixes: 6e6030bd5412 ("mlxsw: spectrum_nve: Implement common NVE core") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mlxsw: spectrum: Add trap for decapsulated ARP packetsIdo Schimmel2018-12-182-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a packet was decapsulated it is classified to the relevant FID based on its VNI and undergoes L2 forwarding. Unlike regular (non-encapsulated) ARP packets, Spectrum does not trap decapsulated ARP packets during L2 forwarding and instead can only trap such packets in the underlay router during decapsulation. Add this missing packet trap, which is required for VXLAN routing when the MAC of the target host is not known. Fixes: b02597d513a9 ("mlxsw: spectrum: Add NVE packet traps") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mlxsw: core: Increase timeout during firmware flash processShalom Toledo2018-12-183-2/+27
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the firmware flash process, some of the EMADs get timed out, which causes the driver to send them again with a limit of 5 retries. There are some situations in which 5 retries is not enough and the EMAD access fails. If the failed EMAD was related to the flashing process, the driver fails the flashing. The reason for these timeouts during firmware flashing is cache misses in the CPU running the firmware. In case the CPU needs to fetch instructions from the flash when a firmware is flashed, it needs to wait for the flashing to complete. Since flashing takes time, it is possible for pending EMADs to timeout. Fix by increasing EMADs' timeout while flashing firmware. Fixes: ce6ef68f433f ("mlxsw: spectrum: Implement the ethtool flash_device callback") Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: dsa: mv88e6xxx: set ethtool regs versionVivien Didelot2018-12-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Currently the ethtool_regs version is set to 0 for all DSA drivers. Use this field to store the chip ID to simplify the pretty dump of any interfaces registered by the "dsa" driver. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'net-SO_TIMESTAMPING-fixes'David S. Miller2018-12-175-9/+28
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Willem de Bruijn says: ==================== net: SO_TIMESTAMPING fixes Fix two omissions: - tx timestamping is missing for AF_INET6/SOCK_RAW/IPPROTO_RAW - SOF_TIMESTAMPING_OPT_ID is missing for IPPROTO_RAW, PF_PACKET, CAN Discovered while expanding the selftest in tools/testing/selftests/networking/timestamping/txtimestamp.c Will send the test patchset to net-next once the fixes make it to that branch. For now, it is available at https://github.com/wdebruij/linux/commits/txtimestamp-test-1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: add missing SOF_TIMESTAMPING_OPT_ID supportWillem de Bruijn2018-12-175-10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SOF_TIMESTAMPING_OPT_ID is supported on TCP, UDP and RAW sockets. But it was missing on RAW with IPPROTO_IP, PF_PACKET and CAN. Add skb_setup_tx_timestamp that configures both tx_flags and tskey for these paths that do not need corking or use bytestream keys. Fixes: 09c2d251b707 ("net-timestamp: add key to disambiguate concurrent datagrams") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: add missing tx timestamping on IPPROTO_RAWWillem de Bruijn2018-12-171-0/+2
|/ / | | | | | | | | | | | | | | | | | | | | | | | | Raw sockets support tx timestamping, but one case is missing. IPPROTO_RAW takes a separate packet construction path. raw_send_hdrinc has an explicit call to sock_tx_timestamp, but rawv6_send_hdrinc does not. Add it. Fixes: 11878b40ed5c ("net-timestamp: SOCK_RAW and PING timestamping") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | MAINTAINERS: change my email addressVivien Didelot2018-12-171-2/+2
| | | | | | | | | | | | | | Make my Gmail address the primary one from now on. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: mvneta: fix operation for 64K PAGE_SIZEMarcin Wojtas2018-12-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes in the mvneta driver reworked allocation and handling of the ingress buffers to use entire pages. Apart from that in SW BM scenario the HW must be informed via PRXDQS about the biggest possible incoming buffer that can be propagated by RX descriptors. The BufferSize field was filled according to the MTU-dependent pkt_size value. Later change to PAGE_SIZE broke RX operation when usin 64K pages, as the field is simply too small. This patch conditionally limits the value passed to the BufferSize of the PRXDQS register, depending on the PAGE_SIZE used. On the occasion remove now unused frag_size field of the mvneta_port structure. Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM") Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'hns-fixes'David S. Miller2018-12-166-178/+413
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Peng Li says: ==================== net: hns: Code improvements & fixes for HNS driver This patchset introduces some code improvements and fixes for the identified problems in the HNS driver. Every patch is independent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Fix ping failed when use net bridge and send multicastYonglong Liu2018-12-161-41/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a net bridge, add eth and vnet to the bridge. The vnet is used by a virtual machine. When ping the virtual machine from the outside host and the virtual machine send multicast at the same time, the ping package will lost. The multicast package send to the eth, eth will send it to the bridge too, and the bridge learn the mac of eth. When outside host ping the virtual mechine, it will match the promisc entry of the eth which is not expected, and the bridge send it to eth not to vnet, cause ping lost. So this patch change promisc tcam entry position to the END of 512 tcam entries, which indicate lower priority. And separate one promisc entry to two: mc & uc, to avoid package match the wrong tcam entry. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Add mac pcs config when enable|disable macYonglong Liu2018-12-162-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some case, when mac enable|disable and adjust link, may cause hard to link(or abnormal) between mac and phy. This patch adds the code for rx PCS to avoid this bug. Disable the rx PCS when driver disable the gmac, and enable the rx PCS when driver enable the mac. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Fix ntuple-filters status error.Yonglong Liu2018-12-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ntuple-filters features is forced on by chip. But it shows "ntuple-filters: off [fixed]" when use ethtool. This patch make it correct with "ntuple-filters: on [fixed]". Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Avoid net reset caused by pause frames stormYonglong Liu2018-12-161-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There will be a large number of MAC pause frames on the net, which caused tx timeout of net device. And then the net device was reset to try to recover it. So that is not useful, and will cause some other problems. So need doubled ndev->watchdog_timeo if device watchdog occurred until watchdog_timeo up to 40s and then try resetting to recover it. When collecting dfx information such as hardware registers when tx timeout. Some registers for count were cleared when read. So need move this task before update net state which also read the count registers. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Free irq when exit from abnormal branchYonglong Liu2018-12-161-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1.In "hns_nic_init_irq", if request irq fail at index i, the function return directly without releasing irq resources that already requested. 2.In "hns_nic_net_up" after "hns_nic_init_irq", if exceptional branch occurs, irqs that already requested are not release. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Clean rx fbd when ae stopped.Yonglong Liu2018-12-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are packets in hardware when changing the speed or duplex, it may cause hardware hang up. This patch adds the code to wait rx fbd clean up when ae stopped. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Fixed bug that netdev was opened twiceYonglong Liu2018-12-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After resetting dsaf to try to repair chip error such as ecc error, the net device will be open if net interface is up. But at this time if there is the users set the net device up with the command ifconfig, the net device will be opened twice consecutively. Function napi_enable was called when open device. And Kernel panic will be occurred if it was called twice consecutively. Such as follow: static inline void napi_enable(struct napi_struct *n) { BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); smp_mb__before_clear_bit(); clear_bit(NAPI_STATE_SCHED, &n->state); } [37255.571996] Kernel panic - not syncing: BUG! [37255.595234] Call trace: [37255.597694] [<ffff80000008ab48>] dump_backtrace+0x0/0x1a0 [37255.603114] [<ffff80000008ad08>] show_stack+0x20/0x28 [37255.608187] [<ffff8000009c4944>] dump_stack+0x98/0xb8 [37255.613258] [<ffff8000009c149c>] panic+0x10c/0x26c [37255.618070] [<ffff80000070f134>] hns_nic_net_up+0x30c/0x4e0 [37255.623664] [<ffff80000070f39c>] hns_nic_net_open+0x94/0x12c [37255.629346] [<ffff80000084be78>] __dev_open+0xf4/0x168 [37255.634504] [<ffff80000084c1ac>] __dev_change_flags+0x98/0x15c [37255.640359] [<ffff80000084c29c>] dev_change_flags+0x2c/0x68 [37255.769580] [<ffff8000008dc400>] devinet_ioctl+0x650/0x704 [37255.775086] [<ffff8000008ddc38>] inet_ioctl+0x98/0xb4 [37255.780159] [<ffff800000827b7c>] sock_do_ioctl+0x44/0x84 [37255.785490] [<ffff800000828e04>] sock_ioctl+0x248/0x30c [37255.790737] [<ffff80000026dc6c>] do_vfs_ioctl+0x480/0x618 [37255.796156] [<ffff80000026de94>] SyS_ioctl+0x90/0xa4 [37255.801139] SMP: stopping secondary CPUs [37255.805079] kbox: catch panic event. [37255.809586] collected_len = 128928, LOG_BUF_LEN_LOCAL = 131072 [37255.816103] flush cache 0xffff80003f000000 size 0x800000 [37255.822192] flush cache 0xffff80003f000000 size 0x800000 [37255.828289] flush cache 0xffff80003f000000 size 0x800000 [37255.834378] kbox: no notify die func register. no need to notify [37255.840413] ---[ end Kernel panic - not syncing: BUG! This patchset fix this bug according to the flag NIC_STATE_DOWN. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Some registers use wrong address according to the datasheet.Yonglong Liu2018-12-162-127/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the hip06 datasheet: 1.Six registers use wrong address: RCB_COM_SF_CFG_INTMASK_RING RCB_COM_SF_CFG_RING_STS RCB_COM_SF_CFG_RING RCB_COM_SF_CFG_INTMASK_BD RCB_COM_SF_CFG_BD_RINT_STS DSAF_INODE_VC1_IN_PKT_NUM_0_REG 2.The offset of DSAF_INODE_VC1_IN_PKT_NUM_0_REG should be 0x103C + 0x80 * all_chn_num 3.The offset to show the value of DSAF_INODE_IN_DATA_STP_DISC_0_REG is wrong, so the value of DSAF_INODE_SW_VLAN_TAG_DISC_0_REG will be overwrite These registers are only used in "ethtool -d", so that did not cause ndev to misfunction. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: All ports can not work when insmod hns ko after rmmod.Yonglong Liu2018-12-162-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two test cases: 1. Remove the 4 modules:hns_enet_drv/hns_dsaf/hnae/hns_mdio, and install them again, must use "ifconfig down/ifconfig up" command pair to bring port to work. This patch calls phy_stop function when init phy to fix this bug. 2. Remove the 2 modules:hns_enet_drv/hns_dsaf, and install them again, all ports can not use anymore, because of the phy devices register failed(phy devices already exists). Phy devices are registered when hns_dsaf installed, this patch removes them when hns_dsaf removed. The two cases are sometimes related, fixing the second case also requires fixing the first case, so fix them together. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns: Incorrect offset address used for some registers.Yonglong Liu2018-12-161-2/+2
|/ / | | | | | | | | | | | | | | | | | | According to the hip06 Datasheet: 1. The offset of INGRESS_SW_VLAN_TAG_DISC should be 0x1A00+4*all_chn_num 2. The offset of INGRESS_IN_DATA_STP_DISC should be 0x1A50+4*all_chn_num Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: clear skb->tstamp in forwarding pathsEric Dumazet2018-12-152-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sergey reported that forwarding was no longer working if fq packet scheduler was used. This is caused by the recent switch to EDT model, since incoming packets might have been timestamped by __net_timestamp() __net_timestamp() uses ktime_get_real(), while fq expects packets using CLOCK_MONOTONIC base. The fix is to clear skb->tstamp in forwarding paths. Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sergey Matyukevich <geomatsi@gmail.com> Tested-by: Sergey Matyukevich <geomatsi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mod_devicetable.h: correct kerneldoc typo, "PHYSID2" -> "MII_PHYSID2"Robert P. J. Day2018-12-151-1/+1
| | | | | | | | | | | | Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: ipv4: do not handle duplicate fragments as overlappingMichal Kubecek2018-12-151-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") IPv4 reassembly code drops the whole queue whenever an overlapping fragment is received. However, the test is written in a way which detects duplicate fragments as overlapping so that in environments with many duplicate packets, fragmented packets may be undeliverable. Add an extra test and for (potentially) duplicate fragment, only drop the new fragment rather than the whole queue. Only starting offset and length are checked, not the contents of the fragments as that would be too expensive. For similar reason, linear list ("run") of a rbtree node is not iterated, we only check if the new fragment is a subset of the interval covered by existing consecutive fragments. v2: instead of an exact check iterating through linear list of an rbtree node, only check if the new fragment is subset of the "run" (suggested by Eric Dumazet) Fixes: 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
* | qmi_wwan: Added support for Telit LN940 seriesJörgen Storvist2018-12-151-0/+1
| | | | | | | | | | | | | | | | | | Added support for the Telit LN940 series cellular modules QMI interface. QMI_QUIRK_SET_DTR quirk requied for Qualcomm MDM9x40 chipset. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* | qmi_wwan: Added support for Fibocom NL668 seriesJörgen Storvist2018-12-151-0/+1
| | | | | | | | | | | | | | | | Added support for Fibocom NL668 series QMI interface. Using QMI_QUIRK_SET_DTR required for Qualcomm MDM9x07 chipsets. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2018-12-1510-39/+109
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== pull-request: bpf 2018-12-15 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix liveness propagation of callee saved registers, from Jakub. 2) fix overflow in bpf_jit_limit knob, from Daniel. 3) bpf_flow_dissector api fix, from Stanislav. 4) bpf_perf_event api fix on powerpc, from Sandipan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bpf: verifier: make sure callees don't prune with caller differencesJakub Kicinski2018-12-132-3/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently for liveness and state pruning the register parentage chains don't include states of the callee. This makes some sense as the callee can't access those registers. However, this means that READs done after the callee returns will not propagate into the states of the callee. Callee will then perform pruning disregarding differences in caller state. Example: 0: (85) call bpf_user_rnd_u32 1: (b7) r8 = 0 2: (55) if r0 != 0x0 goto pc+1 3: (b7) r8 = 1 4: (bf) r1 = r8 5: (85) call pc+4 6: (15) if r8 == 0x1 goto pc+1 7: (05) *(u64 *)(r9 - 8) = r3 8: (b7) r0 = 0 9: (95) exit 10: (15) if r1 == 0x0 goto pc+0 11: (95) exit Here we acquire unknown state with call to get_random() [1]. Then we store this random state in r8 (either 0 or 1) [1 - 3], and make a call on line 5. Callee does nothing but a trivial conditional jump (to create a pruning point). Upon return caller checks the state of r8 and either performs an unsafe read or not. Verifier will first explore the path with r8 == 1, creating a pruning point at [11]. The parentage chain for r8 will include only callers states so once verifier reaches [6] it will mark liveness only on states in the caller, and not [11]. Now when verifier walks the paths with r8 == 0 it will reach [11] and since REG_LIVE_READ on r8 was not propagated there it will prune the walk entirely (stop walking the entire program, not just the callee). Since [6] was never walked with r8 == 0, [7] will be considered dead and replaced with "goto -1" causing hang at runtime. This patch weaves the callee's explored states onto the callers parentage chain. Rough parentage for r8 would have looked like this before: [0] [1] [2] [3] [4] [5] [10] [11] [6] [7] | | ,---|----. | | | sl0: sl0: / sl0: \ sl0: sl0: sl0: fr0: r8 <-- fr0: r8<+--fr0: r8 `fr0: r8 ,fr0: r8<-fr0: r8 \ fr1: r8 <- fr1: r8 / \__________________/ after: [0] [1] [2] [3] [4] [5] [10] [11] [6] [7] | | | | | | sl0: sl0: sl0: sl0: sl0: sl0: fr0: r8 <-- fr0: r8 <- fr0: r8 <- fr0: r8 <-fr0: r8<-fr0: r8 fr1: r8 <- fr1: r8 Now the mark from instruction 6 will travel through callees states. Note that we don't have to connect r0 because its overwritten by callees state on return and r1 - r5 because those are not alive any more once a call is made. v2: - don't connect the callees registers twice (Alexei: suggestion & code) - add more details to the comment (Ed & Alexei) v1: don't unnecessarily link caller saved regs (Jiong) Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64KDaniel Borkmann2018-12-113-10/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael and Sandipan report: Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF JIT allocations. At compile time it defaults to PAGE_SIZE * 40000, and is adjusted again at init time if MODULES_VADDR is defined. For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with the compile-time default at boot-time, which is 0x9c400000 when using 64K page size. This overflows the signed 32-bit bpf_jit_limit value: root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit -1673527296 and can cause various unexpected failures throughout the network stack. In one case `strace dhclient eth0` reported: setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) and similar failures can be seen with tools like tcpdump. This doesn't always reproduce however, and I'm not sure why. The more consistent failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host would time out on systemd/netplan configuring a virtio-net NIC with no noticeable errors in the logs. Given this and also given that in near future some architectures like arm64 will have a custom area for BPF JIT image allocations we should get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For 4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec() so therefore add another overridable bpf_jit_alloc_exec_limit() helper function which returns the possible size of the memory area for deriving the default heuristic in bpf_jit_charge_init(). Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default JIT memory provider, and therefore in case archs implement their custom module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}. Additionally, for archs supporting large page sizes, we should change the sysctl to be handled as long to not run into sysctl restrictions in future. Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations") Reported-by: Sandipan Das <sandipan@linux.ibm.com> Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | selftests/bpf: use proper type when passing prog_typeStanislav Fomichev2018-12-111-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use bpf_prog_type instead of bpf_map_type when passing prog_type. -Wenum-conversion might be unhappy about it: error: implicit conversion from enumeration type 'enum bpf_map_type' to different enumeration type 'enum bpf_prog_type' Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | selftests/bpf: add missing pointer dereference for map stacktrace fixupStanislav Fomichev2018-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | I get a segfault without it, other fixups always do dereference, and without dereference I don't understand how it can ever work. Fixes: 7c85c448e7d74 ("selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: powerpc: fix broken uapi for BPF_PROG_TYPE_PERF_EVENTSandipan Das2018-12-093-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that there are different variants of pt_regs for userspace and kernel, the uapi for the BPF_PROG_TYPE_PERF_EVENT program type must be changed by exporting the user_pt_regs structure instead of the pt_regs structure that is in-kernel only. Fixes: 002af9391bfb ("powerpc: Split user/kernel definitions of struct pt_regs") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | net/flow_dissector: correctly cap nhoff and thoff in case of BPFStanislav Fomichev2018-12-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to make sure that the following condition holds: 0 <= nhoff <= thoff <= skb->len BPF program can set out-of-bounds nhoff and thoff, which is dangerous, see recent commit d0c081b49137 ("flow_dissector: properly cap thoff field")'. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | selftests/bpf: use thoff instead of nhoff in BPF flow dissectorStanislav Fomichev2018-12-072-19/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are returning thoff from the flow dissector, not the nhoff. Pass thoff along with nhoff to the bpf program (initially thoff == nhoff) and expect flow dissector amend/return thoff, not nhoff. This avoids confusion, when by the time bpf flow dissector exits, nhoff == thoff, which doesn't make much sense. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | | tipc: check tsk->group in tipc_wait_for_cond()Cong Wang2018-12-141-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tipc_wait_for_cond() drops socket lock before going to sleep, but tsk->group could be freed right after that release_sock(). So we have to re-check and reload tsk->group after it wakes up. After this patch, tipc_wait_for_cond() returns -ERESTARTSYS when tsk->group is NULL, instead of continuing with the assumption of a non-NULL tsk->group. (It looks like 'dsts' should be re-checked and reloaded too, but it is a different bug.) Similar for tipc_send_group_unicast() and tipc_send_group_anycast(). Reported-by: syzbot+10a9db47c3a0e13eb31c@syzkaller.appspotmail.com Fixes: b7d42635517f ("tipc: introduce flow control for group broadcast messages") Fixes: ee106d7f942d ("tipc: introduce group anycast messaging") Fixes: 27bd9ec027f3 ("tipc: introduce group unicast messaging") Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Allow class-e address assignment via ifconfig ioctlDave Taht2018-12-143-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While most distributions long ago switched to the iproute2 suite of utilities, which allow class-e (240.0.0.0/4) address assignment, distributions relying on busybox, toybox and other forms of ifconfig cannot assign class-e addresses without this kernel patch. While CIDR has been obsolete for 2 decades, and a survey of all the open source code in the world shows the IN_whatever macros are also obsolete... rather than obsolete CIDR from this ioctl entirely, this patch merely enables class-e assignment, sanely. Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ip6mr: Fix potential Spectre v1 vulnerabilityGustavo A. R. Silva2018-12-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vr.mifi is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) Fix this by sanitizing vr.mifi before using it to index mrt->vif_table' Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | w90p910_ether: remove incorrect __init annotationArnd Bergmann2018-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The get_mac_address() function is normally inline, but when it is not, we get a warning that this configuration is broken: WARNING: vmlinux.o(.text+0x4aff00): Section mismatch in reference from the function w90p910_ether_setup() to the function .init.text:get_mac_address() The function w90p910_ether_setup() references the function __init get_mac_address(). This is often because w90p910_ether_setup lacks a __init Remove the __init to make it always do the right thing. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | VSOCK: bind to random port for VMADDR_PORT_ANYLepton Wu2018-12-141-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old code always starts from fixed port for VMADDR_PORT_ANY. Sometimes when VMM crashed, there is still orphaned vsock which is waiting for close timer, then it could cause connection time out for new started VM if they are trying to connect to same port with same guest cid since the new packets could hit that orphaned vsock. We could also fix this by doing more in vhost_vsock_reset_orphans, but any way, it should be better to start from a random local port instead of a fixed one. Signed-off-by: Lepton Wu <ytht.net@gmail.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | r8152: Add support for MAC address pass through on RTL8153-BNDMario Limonciello2018-12-141-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All previous docks and dongles that have supported this feature use the RTL8153-AD chip. RTL8153-BND is a new chip that will be used in upcoming Dell type-C docks. It should be added to the whitelist of devices to activate MAC address pass through. Per confirming with Realtek all devices containing RTL8153-BND should activate MAC pass through and there won't use pass through bit on efuse like in RTL8153-AD. Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | crypto/chelsio/chtls: send/recv window updateAtul Gupta2018-12-142-26/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | recalculated send and receive window using linkspeed. Determine correct value of eck_ok from SYN received and option configured on local system. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | crypto/chelsio/chtls: macro correction in tx pathAtul Gupta2018-12-142-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | corrected macro used in tx path. removed redundant hdrlen and check for !page in chtls_sendmsg Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | crypto/chelsio/chtls: listen fails with multiadaptAtul Gupta2018-12-142-19/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | listen fails when more than one tls capable device is registered. tls_hw_hash is called for each dev which loops again for each cdev_list causing listen failure. Hence call chtls_listen_start/stop for specific device than loop over all devices. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/tls: sleeping function from invalid contextAtul Gupta2018-12-143-36/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HW unhash within mutex for registered tls devices cause sleep when called from tcp_set_state for TCP_CLOSE. Release lock and re-acquire after function call with ref count incr/dec. defined kref and fp release for tls_device to ensure device is not released outside lock. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:748 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7 INFO: lockdep is turned off. CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O Call Trace: <IRQ> dump_stack+0x5e/0x8b ___might_sleep+0x222/0x260 __mutex_lock+0x5c/0xa50 ? vprintk_emit+0x1f3/0x440 ? kmem_cache_free+0x22d/0x2a0 ? tls_hw_unhash+0x2f/0x80 ? printk+0x52/0x6e ? tls_hw_unhash+0x2f/0x80 tls_hw_unhash+0x2f/0x80 tcp_set_state+0x5f/0x180 tcp_done+0x2e/0xe0 tcp_rcv_state_process+0x92c/0xdd3 ? lock_acquire+0xf5/0x1f0 ? tcp_v4_rcv+0xa7c/0xbe0 ? tcp_v4_do_rcv+0x70/0x1e0 Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/tls: Init routines in create_ctxAtul Gupta2018-12-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | create_ctx is called from tls_init and tls_hw_prot hence initialize function pointers in common routine. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>