path: root/drivers/infiniband/hw
Commit message (Author, Age, Files, Lines)
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma (Linus Torvalds, 2018-05-04, 24 files, -101/+228)
|\
Pull rdma fixes from Doug Ledford:

"This is our first pull request of the rc cycle. It's not that it's been overly quiet, we were just waiting on a few things before sending this off. For instance, the 6 patch series from Intel for the hfi1 driver had actually been pulled in on Tuesday for a Wednesday pull request, only to have Jason notice something I missed, so we held off for some testing, and then on Thursday had to respin the series because the very first patch needed a minor fix (unnecessary cast is all).

There is a sizable hns patch series in here, as well as a reasonably largish hfi1 patch series, then all of the lines of uapi updates are just the change to the new official Linux-OpenIB SPDX tag (a bunch of our files had what amounts to a BSD-2-Clause + MIT Warranty statement as their license as a result of the initial code submission years ago, and the SPDX folks decided it was unique enough to warrant a unique tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4 and core/cache/cma updates to round out the bunch.

None of it was overly large by itself, but in the 2 1/2 weeks we've been collecting patches, it has added up :-/.

As best I can tell, it's been through 0day (I got a notice about my last for-next push, but not for my for-rc push, but Jason seems to think that failure messages are prioritized and success messages not so much). It's also been through linux-next. And yes, we did notice in the context portion of the CMA query gid fix patch that there is a dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage and remove it anywhere we can.
Summary:

 - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)

 - SPDX license tag cleanups (new tag Linux-OpenIB)

 - RoCE GID fixes related to default GIDs

 - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch), mlx4, mlx5, and hfi1 (medium batch)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
  RDMA/cma: Do not query GID during QP state transition to RTR
  IB/mlx4: Fix integer overflow when calculating optimal MTT size
  IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
  IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
  IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
  IB/hfi1: Fix loss of BECN with AHG
  IB/hfi1 Use correct type for num_user_context
  IB/hfi1: Fix handling of FECN marked multicast packet
  IB/core: Make ib_mad_client_id atomic
  iw_cxgb4: Atomically flush per QP HW CQEs
  IB/uverbs: Fix kernel crash during MR deregistration flow
  IB/uverbs: Prevent reregistration of DM_MR to regular MR
  RDMA/mlx4: Add missed RSS hash inner header flag
  RDMA/hns: Fix a couple misspellings
  RDMA/hns: Submit bad wr
  RDMA/hns: Update assignment method for owner field of send wqe
  RDMA/hns: Adjust the order of cleanup hem table
  RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
  RDMA/hns: Remove some unnecessary attr_mask judgement
  RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
  ...
| * IB/mlx4: Fix integer overflow when calculating optimal MTT size (Jack Morgenstein, 2018-05-03, 1 file, -1/+1)
When the kernel was compiled using the UBSAN option, we saw the following stack trace:

[ 1184.827917] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx4/mr.c:349:27
[ 1184.828114] signed integer overflow:
[ 1184.828247] -2147483648 - 1 cannot be represented in type 'int'

The problem was caused by calling round_up in procedure mlx4_ib_umem_calc_optimal_mtt_size (on line 349, as noted in the stack trace) with the second parameter (1 << block_shift), which is an int. The second parameter should have been (1ULL << block_shift), which is an unsigned long long.

(1 << block_shift) is treated by the compiler as an int (because 1 is an int). Now, local variable block_shift is initialized to 31. If block_shift is 31, 1 << block_shift is 1 << 31 = 0x80000000 = -2147483648. This is the most negative int value.

Inside the round_up macro, there is a cast applied to ((1 << 31) - 1). However, this cast is applied AFTER ((1 << 31) - 1) is calculated. Since (1 << 31) is treated as an int, we get the negative overflow identified by UBSAN in the process of calculating ((1 << 31) - 1).

The fix is to change (1 << block_shift) to (1ULL << block_shift) on line 349.

Fixes: 9901abf58368 ("IB/mlx4: Use optimal numbers of MTT entries")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
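The overflow is easy to reproduce outside the kernel. Below is a minimal userspace sketch, assuming local copies of round_up() and __round_mask() modeled on the kernel's macros (not included from kernel headers); the helper align_to_block() is hypothetical. With 1ULL, the mask ((1ULL << 31) - 1) is computed in unsigned long long, so nothing overflows before the cast.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copies modeled on the kernel's round_up() helpers.
 * y must be a power of two; the mask (y - 1) is cast to the type of x,
 * but only AFTER (y - 1) has been evaluated in y's own type.
 */
#define __round_mask(x, y) ((__typeof__(x))((y) - 1))
#define round_up(x, y) ((((x) - 1) | __round_mask(x, y)) + 1)

/* Hypothetical helper: round addr up to the next 2^block_shift boundary. */
static uint64_t align_to_block(uint64_t addr, unsigned int block_shift)
{
	assert(block_shift < 64);
	/*
	 * Buggy form: round_up(addr, 1 << block_shift). With block_shift == 31,
	 * (1 << 31) does not fit in an int, and (1 << 31) - 1 is evaluated as
	 * int arithmetic before the cast, which is what UBSAN reported.
	 *
	 * Fixed form: shift and subtraction both happen in unsigned long long.
	 */
	return round_up(addr, 1ULL << block_shift);
}
```

For example, align_to_block(0x80000001, 31) rounds up to 0x100000000, a value that only fits because the whole computation is 64-bit.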
| * IB/hfi1: Fix memory leak in exception path in get_irq_affinity() (Sebastian Sanchez, 2018-05-03, 1 file, -6/+5)
When IRQ affinity is set and the interrupt type is unknown, a CPU mask allocated within the function is never freed. Fix this memory leak by allocating the memory within the scope where it is used.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
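A minimal userspace sketch of the pattern behind this fix, assuming a hypothetical set_affinity() in which a malloc()-backed buffer stands in for the driver's cpumask, and a live_masks counter exists only so the leak is observable. The temporary mask is allocated only in the branch that uses it, so the unknown-type path has nothing to free.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical interrupt types; stand-ins for the driver's own enum. */
enum irq_type { IRQ_SDMA, IRQ_RCVCTXT, IRQ_OTHER };

/* Count of live allocations, so a leak shows up in a test. */
static int live_masks;

static void *alloc_mask(void)
{
	void *m = calloc(1, 64);

	if (m)
		live_masks++;
	return m;
}

static void free_mask(void *m)
{
	if (m)
		live_masks--;
	free(m);
}

/*
 * Fixed pattern: the mask lives only inside the branch that needs it.
 * (The leaking version allocated it before the switch and returned from
 * the default case without freeing it.)
 */
static int set_affinity(enum irq_type type)
{
	switch (type) {
	case IRQ_SDMA:
	case IRQ_RCVCTXT: {
		void *diff = alloc_mask();   /* scoped to this branch */

		if (!diff)
			return -ENOMEM;
		/* ... compute and apply the affinity here ... */
		free_mask(diff);
		return 0;
	}
	default:
		return -EINVAL;   /* unknown type: nothing was allocated */
	}
}
```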
| * IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure (Sebastian Sanchez, 2018-05-03, 3 files, -10/+30)
When allocating device data, if there is an allocation failure, memory that was already allocated, such as the per-CPU counters, is not freed.

Fix the memory leaks in the exception path by creating a common reentrant cleanup function, hfi1_clean_devdata(), to be used both at driver unload time and on device data allocation failure. To accomplish this, free_platform_config() and clean_up_i2c() are made reentrant, which removes any dependency on the order in which they are called. Without that change, this patch would introduce NULL pointer dereferences.

In addition, set dd->int_counter, dd->rcv_limit, dd->send_schedule and dd->tx_opstats to NULL after they are freed in hfi1_clean_devdata(), so that hfi1_clean_devdata() is fully reentrant.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
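The "free, then set to NULL" idea that makes a cleanup routine reentrant can be sketched in userspace as follows. struct devdata, clean_devdata() and alloc_devdata() are hypothetical, trimmed-down stand-ins for the hfi1 structures; the simulate_failure flag exists only to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device data with a few separately allocated members. */
struct devdata {
	unsigned long *int_counter;
	unsigned long *rcv_limit;
	void *platform_config;
};

/*
 * Reentrant cleanup: each pointer is freed and then set to NULL, so the
 * function is safe on a partially built struct and safe to call twice
 * (free(NULL) is a no-op).
 */
static void clean_devdata(struct devdata *dd)
{
	free(dd->int_counter);
	dd->int_counter = NULL;
	free(dd->rcv_limit);
	dd->rcv_limit = NULL;
	free(dd->platform_config);
	dd->platform_config = NULL;
}

/* One error label that reuses the same cleanup used at unload time. */
static int alloc_devdata(struct devdata *dd, int simulate_failure)
{
	*dd = (struct devdata){ 0 };

	dd->int_counter = calloc(4, sizeof(*dd->int_counter));
	if (!dd->int_counter)
		goto bail;
	/* Simulated failure of the second allocation. */
	dd->rcv_limit = simulate_failure ? NULL : calloc(4, sizeof(*dd->rcv_limit));
	if (!dd->rcv_limit)
		goto bail;
	dd->platform_config = malloc(128);
	if (!dd->platform_config)
		goto bail;
	return 0;

bail:
	clean_devdata(dd);   /* frees int_counter, which would otherwise leak */
	return -ENOMEM;
}
```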
| * IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used (Sebastian Sanchez, 2018-05-03, 2 files, -3/+2)
When an invalid num_vls is used as a module parameter, code execution follows an exception path in hfi1_init_dd() where the macro dd_dev_err() expects dd->pcidev->dev not to be NULL. This causes a NULL pointer dereference.

Fix hfi1_init_dd() by initializing dd->pcidev and dd->pcidev->dev earlier in the code. If a dd exists, then dd->pcidev and dd->pcidev->dev always exist.

BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
IP: __dev_printk+0x15/0x90
Workqueue: events work_for_cpu_fn
RIP: 0010:__dev_printk+0x15/0x90
Call Trace:
 dev_err+0x6c/0x90
 ? hfi1_init_pportdata+0x38d/0x3f0 [hfi1]
 hfi1_init_dd+0xdd/0x2530 [hfi1]
 ? pci_conf1_read+0xb2/0xf0
 ? pci_read_config_word.part.9+0x64/0x80
 ? pci_conf1_write+0xb0/0xf0
 ? pcie_capability_clear_and_set_word+0x57/0x80
 init_one+0x141/0x490 [hfi1]
 local_pci_probe+0x3f/0xa0
 work_for_cpu_fn+0x10/0x20
 process_one_work+0x152/0x350
 worker_thread+0x1cf/0x3e0
 kthread+0xf5/0x130
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x10/0x10
 ? do_syscall_64+0x6e/0x1a0
 ? SyS_exit_group+0x10/0x10
 ret_from_fork+0x35/0x40

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
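The ordering issue can be sketched in userspace: a logger that, like dd_dev_err(), dereferences dd->pcidev unconditionally is only safe if the pointer is set before any validation that might log. All types and functions below are hypothetical stand-ins, and the 1..8 range for num_vls is an assumption made for this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct pci_dev and struct hfi1_devdata. */
struct fake_dev { const char *name; };
struct fake_pcidev { struct fake_dev dev; };
struct devdata { struct fake_pcidev *pcidev; int num_vls; };

/* Like dd_dev_err(), this logger dereferences dd->pcidev unconditionally. */
static void dd_log_err(const struct devdata *dd, const char *msg)
{
	fprintf(stderr, "%s: %s\n", dd->pcidev->dev.name, msg);
}

static int init_dd(struct devdata *dd, struct fake_pcidev *pdev, int num_vls)
{
	/*
	 * Set the device pointer first. If this assignment came after the
	 * num_vls check, the error message below would dereference NULL.
	 */
	dd->pcidev = pdev;

	if (num_vls < 1 || num_vls > 8) {   /* assumed valid range */
		dd_log_err(dd, "invalid num_vls");
		return -EINVAL;
	}
	dd->num_vls = num_vls;
	return 0;
}
```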
| * IB/hfi1: Fix loss of BECN with AHG (Mike Marciniszyn, 2018-05-03, 1 file, -10/+40)
AHG may be armed to use the stored header, which by design is limited to edits in the PSN/A 32-bit word (bth2). When the code is trying to send a BECN, the use of the stored header will lose the BECN bit.

Fix by avoiding AHG when getting ready to send a BECN. This is accomplished by always claiming that the packet is not a middle packet, which is an AHG precursor. BECNs are not a normal case, and this should not hurt AHG optimizations.

Cc: <stable@vger.kernel.org> # 4.14.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1 Use correct type for num_user_context (Michael J. Ruhl, 2018-05-03, 1 file, -2/+2)
The module parameter num_user_context is defined as 'int' and defaults to -1. The module_param_named() declaration says that it is uint.

Correct the module_param_named() type information and update the modinfo text to reflect the default value.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Fix handling of FECN marked multicast packet (Mike Marciniszyn, 2018-05-03, 3 files, -10/+21)
The code for handling a marked UD packet unconditionally returns the DLID in the header of the FECN marked packet. This is not correct for multicast packets, where the DLID is in the multicast range.

The subsequent attempt to send the CNP with the multicast LID will cause the chip to halt the ack send context because the source LID doesn't match the chip programming. The halted send context flushes any other pending packets in the PIO ring, so the CNP is never sent.

As part of investigating the fix, it was determined that the 16B work broke the FECN routine badly with inconsistent use of 16-bit and 32-bit types for LIDs and pkeys. Since the port's source LID was correctly 32 bits, the type mismatches need to be dealt with at the same time as fixing the CNP header issue.

Fix these issues by:
- Using the port's LID as the SLID when responding to FECN marked UD packets
- Ensuring the pkey is always 16 bits in this and subordinate routines
- Ensuring LIDs are 32 bits in this and subordinate routines

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 88733e3b8450 ("IB/hfi1: Add 16B UD support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
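A minimal sketch of the corrected CNP addressing, assuming 16-bit (9B) IB LIDs where 0xC000 to 0xFFFE is the multicast range and 0xFFFF is the permissive LID; the 16B extended LID ranges that hfi1 also handles are not modeled. struct ud_hdr, lid_is_mcast() and build_cnp() are hypothetical. The CNP goes back to the original sender, takes the port's own LID as its SLID, and keeps the pkey at 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 9B LID ranges for this sketch. */
#define MCAST_LID_BASE 0xC000u
#define PERMISSIVE_LID 0xFFFFu

/* Hypothetical header fields: LIDs carried as 32 bits, pkey as 16 bits. */
struct ud_hdr {
	uint32_t slid;
	uint32_t dlid;
	uint16_t pkey;
};

static int lid_is_mcast(uint32_t lid)
{
	return lid >= MCAST_LID_BASE && lid < PERMISSIVE_LID;
}

/*
 * Build the CNP header that answers a FECN-marked UD packet.
 * Echoing rcv->dlid as the SLID is wrong when the packet was multicast,
 * so the port's own LID is always used instead.
 */
static struct ud_hdr build_cnp(const struct ud_hdr *rcv, uint32_t port_lid)
{
	struct ud_hdr cnp;

	cnp.dlid = rcv->slid;   /* back to the original sender */
	cnp.slid = port_lid;    /* never rcv->dlid, which may be multicast */
	cnp.pkey = rcv->pkey;   /* pkey stays 16 bits end to end */
	return cnp;
}
```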
| * iw_cxgb4: Atomically flush per QP HW CQEs (Bharat Potnuri, 2018-04-27, 3 files, -4/+13)
When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire the corresponding QP lock before moving the CQEs into that QP's SW queue and accessing the SQ contents to complete a WR. Ignore CQEs if the corresponding QP is already flushed.

Cc: stable@vger.kernel.org
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
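The locking rule can be sketched in userspace with a pthread mutex standing in for the QP lock. struct qp, flush_hw_cqe() and mark_qp_flushed() are hypothetical simplifications: the CQE consumer takes the owning QP's lock, and drops CQEs for a QP that has already been flushed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical QP with its own lock and a "flushed" flag. */
struct qp {
	pthread_mutex_t lock;
	bool flushed;
	int sw_cqes;   /* CQEs moved into this QP's software queue */
};

/*
 * Handle one hardware CQE that belongs to qp. Because the CQ is shared,
 * the owning QP's lock is taken before touching its software queue.
 * Returns 1 if the CQE was consumed, 0 if it was ignored.
 */
static int flush_hw_cqe(struct qp *qp)
{
	int consumed = 0;

	pthread_mutex_lock(&qp->lock);
	if (!qp->flushed) {
		qp->sw_cqes++;
		consumed = 1;
	}
	pthread_mutex_unlock(&qp->lock);
	return consumed;
}

static void mark_qp_flushed(struct qp *qp)
{
	pthread_mutex_lock(&qp->lock);
	qp->flushed = true;
	pthread_mutex_unlock(&qp->lock);
}
```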
| * RDMA/mlx4: Add missed RSS hash inner header flag (Leon Romanovsky, 2018-04-27, 1 file, -1/+2)
Despite being advertised to user space applications, the RSS inner header flag was filtered out by checks at the beginning of the QP creation routine.

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 4d02ebd9bbbd ("IB/mlx4: Fix RSS hash fields restrictions")
Fixes: 07d84f7b6adf ("IB/mlx4: Add support to RSS hash for inner headers")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
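The class of bug fixed here is a supported-mask check that is out of sync with what the driver advertises. A minimal sketch, assuming hypothetical RX_HASH_* bits loosely modeled on the kernel's IB_RX_HASH_* flags (the values are illustrative): every advertised bit must also appear in the mask that QP creation validates against.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical RX hash field bits, loosely modeled on IB_RX_HASH_* flags. */
#define RX_HASH_SRC_IPV4 (1u << 0)
#define RX_HASH_DST_IPV4 (1u << 1)
#define RX_HASH_SRC_PORT (1u << 2)
#define RX_HASH_DST_PORT (1u << 3)
#define RX_HASH_INNER    (1u << 31)

/*
 * Mask used by the create-QP validation. Leaving RX_HASH_INNER out of it
 * would reject a flag the driver advertises, which is the kind of
 * mismatch the commit describes.
 */
#define SUPPORTED_RX_HASH (RX_HASH_SRC_IPV4 | RX_HASH_DST_IPV4 | \
			   RX_HASH_SRC_PORT | RX_HASH_DST_PORT | \
			   RX_HASH_INNER)

static int check_rx_hash_fields(uint32_t requested)
{
	if (requested & ~SUPPORTED_RX_HASH)
		return -EOPNOTSUPP;
	return 0;
}
```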
| * RDMA/hns: Fix a couple misspellings (oulijun, 2018-04-27, 2 files, -2/+2)
This patch fixes two spelling errors.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/hns: Submit bad wr (oulijun, 2018-04-27, 1 file, -1/+2)
When a bad work request is generated, it needs to be reported to the user. This patch fixes that.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
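In the verbs post_send convention, the caller learns which request failed through a bad_wr out-parameter. A minimal sketch of that reporting, assuming a hypothetical, trimmed-down struct send_wr and an assumed per-WQE limit of 4 SGEs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down work request. */
struct send_wr {
	struct send_wr *next;
	int num_sge;
};

#define MAX_SGE 4   /* assumed per-WQE limit for this sketch */

/*
 * Post a chain of WRs. On the first invalid WR, stop and point *bad_wr
 * at it, so the caller knows that this WR and all later ones in the
 * chain were not posted.
 */
static int post_send(struct send_wr *wr, struct send_wr **bad_wr, int *posted)
{
	*posted = 0;
	for (; wr; wr = wr->next) {
		if (wr->num_sge > MAX_SGE) {
			*bad_wr = wr;
			return -EINVAL;
		}
		(*posted)++;   /* a real driver would build the WQE here */
	}
	return 0;
}
```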
| * RDMA/hns: Update assignment method for owner field of send wqe (oulijun, 2018-04-27, 1 file, -1/+2)
When posting a work request, the owner bit of the send WQE needs to be updated. This patch fixes the bug that occurs when posting multiple work requests.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
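A common scheme for owner bits on a power-of-two ring is to derive the bit from how many times the producer index has wrapped. The sketch below assumes that scheme and an inverted polarity; it is not taken from the hns driver. The point it illustrates is that, in a multi-WR post, each WQE must use its own index (head + nreq) rather than the head value captured before the loop.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Owner bit for the WQE at index (head + nreq) in a ring of
 * 2^log2_wqe_cnt entries: it flips every time the index wraps.
 * The inverted polarity here is an assumption for this sketch.
 */
static unsigned int wqe_owner_bit(uint32_t head, uint32_t nreq,
				  unsigned int log2_wqe_cnt)
{
	return ~((head + nreq) >> log2_wqe_cnt) & 0x1;
}
```

With a 4-entry ring and head == 2, the first two WQEs of a post get the same bit and the third, which wraps to slot 0, gets the opposite one; computing the bit from head alone would give all three the same value.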
| * RDMA/hns: Adjust the order of cleanup hem table (oulijun, 2018-04-27, 1 file, -2/+2)
This patch changes the order in which the HEM tables are cleaned up for trrl_table and irrl_table, as well as for mtt_cqe_table and mtt_table.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set (oulijun, 2018-04-27, 1 file, -8/+12)
It is valid to assign the dqpn field of the QP context only when the IB_QP_PATH_DEST_QPN flag of attr_mask is set.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/hns: Remove some unnecessary attr_mask judgement (oulijun, 2018-04-27, 1 file, -7/+4)
This patch deletes some unnecessary attr_mask if-conditions in hip08, in accordance with the IB protocol.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set (oulijun, 2018-04-27, 1 file, -1/+1)
It is valid to assign the mtu field of the QP context only when the IB_QP_PATH_MTU flag of attr_mask is set and the QP type is neither GSI nor UD.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
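Both of the preceding hns fixes follow the modify-QP convention that an attribute is meaningful only if its bit is set in attr_mask. A minimal sketch, assuming local stand-ins ATTR_PATH_MTU and ATTR_DEST_QPN for the kernel's mask bits and a hypothetical two-field QP context:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's IB_QP_PATH_MTU / IB_QP_DEST_QPN bits. */
#define ATTR_PATH_MTU (1u << 8)
#define ATTR_DEST_QPN (1u << 20)

enum qp_type { QPT_RC, QPT_UD, QPT_GSI };

struct qp_attr { uint32_t dest_qp_num; int path_mtu; };

/* Hypothetical subset of a hardware QP context. */
struct qp_ctx { uint32_t dqpn; int mtu; };

/*
 * Copy only the attributes flagged in attr_mask: fields of attr whose bit
 * is clear are not guaranteed to be valid and must not reach the context.
 * The MTU is additionally skipped for GSI and UD QPs.
 */
static void modify_qp_ctx(struct qp_ctx *ctx, const struct qp_attr *attr,
			  uint32_t attr_mask, enum qp_type type)
{
	if (attr_mask & ATTR_DEST_QPN)
		ctx->dqpn = attr->dest_qp_num;

	if ((attr_mask & ATTR_PATH_MTU) && type != QPT_GSI && type != QPT_UD)
		ctx->mtu = attr->path_mtu;
}
```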
| * RDMA/hns: Fix the qp context state diagram (oulijun, 2018-04-27, 1 file, -1/+2)
According to the RoCE protocol, it is valid to transition from the error state to the error state when modifying a QP in hip08. This patch fixes that.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
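QP state checks are often written as a transition table. Below is a simplified, hypothetical table in the general shape of the IB QP state machine (any state may go to RESET or ERR), with the ERR to ERR entry that this fix allows. It is illustrative only and not the hip08 driver's table.

```c
#include <assert.h>
#include <stdbool.h>

enum qp_state {
	QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_SQD, QPS_SQE, QPS_ERR,
	QPS_NUM
};

/* Simplified transition table; allowed[cur][next] == true means legal. */
static const bool allowed[QPS_NUM][QPS_NUM] = {
	[QPS_RESET] = { [QPS_RESET] = true, [QPS_INIT] = true, [QPS_ERR] = true },
	[QPS_INIT]  = { [QPS_RESET] = true, [QPS_INIT] = true, [QPS_RTR] = true,
			[QPS_ERR] = true },
	[QPS_RTR]   = { [QPS_RESET] = true, [QPS_RTS] = true, [QPS_ERR] = true },
	[QPS_RTS]   = { [QPS_RESET] = true, [QPS_RTS] = true, [QPS_SQD] = true,
			[QPS_ERR] = true },
	[QPS_SQD]   = { [QPS_RESET] = true, [QPS_RTS] = true, [QPS_SQD] = true,
			[QPS_ERR] = true },
	[QPS_SQE]   = { [QPS_RESET] = true, [QPS_RTS] = true, [QPS_ERR] = true },
	/* ERR -> ERR is the transition the commit says must be accepted. */
	[QPS_ERR]   = { [QPS_RESET] = true, [QPS_ERR] = true },
};

static bool qp_transition_ok(enum qp_state cur, enum qp_state next)
{
	return (unsigned)cur < (unsigned)QPS_NUM &&
	       (unsigned)next < (unsigned)QPS_NUM &&
	       allowed[cur][next];
}
```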
| * RDMA/hns: Intercept illegal RDMA operation when use inline data (oulijun, 2018-04-27, 1 file, -0/+5)
RDMA read operations do not support inline data. If a user issues an RDMA read with inline data, a hardware error occurs.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
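The interception amounts to validating the opcode and flag combination before a WQE is built. A minimal sketch with hypothetical opcode and flag values (WR_SEND_INLINE is a local stand-in, not the kernel constant). An RDMA read moves data from the remote side into local buffers, so there is no local payload to inline.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical opcode and flag values for this sketch only. */
enum wr_opcode { WR_SEND, WR_RDMA_WRITE, WR_RDMA_READ };
#define WR_SEND_INLINE (1u << 3)

/* Reject inline RDMA reads before anything is handed to the hardware. */
static int check_inline(enum wr_opcode op, unsigned int send_flags)
{
	if ((send_flags & WR_SEND_INLINE) && op == WR_RDMA_READ)
		return -EINVAL;
	return 0;
}
```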
| * RDMA/hns: Bugfix for init hem table (oulijun, 2018-04-27, 1 file, -4/+4)
During HEM table initialization, type should be used instead of table->type, which is only initialized with type at the end.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/nes: fix nes_netdev_start_xmit()'s return type (Luc Van Oostenryck, 2018-04-27, 1 file, -1/+1)
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t', which is a typedef for an enum type, but the implementation in this driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/cxgb4: release hw resources on device removal (Raju Rangoju, 2018-04-27, 3 files, -3/+36)
The c4iw_rdev_close() logic was not releasing all the HW resources (PBL and RQT memory) during the device removal event (driver unload / system reboot). This can cause a panic in gen_pool_destroy().

With this fix, the module remove function waits for all the HW resources to be released during the device removal event.

Fixes: c12a67fe ("iw_cxgb4: free EQ queue memory on last deref")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/mlx5: Properly check return value of mlx5_get_uars_page (Leon Romanovsky, 2018-04-27, 1 file, -3/+1)
Starting from commit 72f36be06138 ("net/mlx5: Fix mlx5_get_uars_page to return error code"), the mlx5_get_uars_page() call returns an error code in case of failure, but this was mistakenly overlooked in the merge commit.

Fixes: e7996a9a77fc ("Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git")
Reported-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
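In the kernel, functions that return a pointer usually report failure with ERR_PTR() rather than NULL; assuming that is the case for mlx5_get_uars_page() after the referenced change, a NULL check alone misses the error. The sketch below copies the error-pointer convention into userspace (modeled on include/linux/err.h) and uses a hypothetical get_uars_page() that fails the same way.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace copies modeled on include/linux/err.h: small negative errno
 * values are encoded as pointers in the top page of the address space.
 */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct uars_page { int index; };

/* Hypothetical allocator that reports failure through ERR_PTR(). */
static struct uars_page *get_uars_page(bool fail)
{
	static struct uars_page page = { 1 };

	return fail ? ERR_PTR(-ENOMEM) : &page;
}

static int alloc_uar(bool fail, struct uars_page **out)
{
	struct uars_page *p = get_uars_page(fail);

	/* A NULL check would not catch this: the error is in the pointer value. */
	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	*out = p;
	return 0;
}
```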
| * IB/mlx5: Fix represent correct netdevice in dual port RoCE (Parav Pandit, 2018-04-27, 1 file, -1/+1)
Commit bcf87f1dbbec ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode") incorrectly mapped the primary device's netdevice to the 2nd port netdevice. When IB representors were not used, the primary port's netdevice was always reported as the 2nd port's netdevice. As a result, CM requests arriving on the 2nd port failed to be processed because of the incorrect netdevice mapping.

This fix corrects it by considering the right mdev.

Cc: <stable@vger.kernel.org> # 4.16
Fixes: bcf87f1dbbec ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Use unlimited rate when static rate is not supported (Danit Goldberg, 2018-04-27, 1 file, -9/+9)
Before this change, if the user passed a nonzero static rate value and the FW doesn't support static rate, the driver would end up configuring a rate of 2.5 Gbps.

Fix this by using rate 0 (unlimited) in cases where the FW doesn't support static rate configuration.

Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/mlx5: Protect from shift operand overflow (Leon Romanovsky, 2018-04-27, 1 file, -0/+4)
Ensure that the user didn't supply values large enough to cause an overflow.

UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:263:23
shift exponent -2147483648 is negative
CPU: 0 PID: 292 Comm: syzkaller612609 Not tainted 4.16.0-rc1+ #131
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ubsan_epilogue+0xe/0x81
 set_rq_size+0x7c2/0xa90
 create_qp_common+0xc18/0x43c0
 mlx5_ib_create_qp+0x379/0x1ca0
 create_qp.isra.5+0xc94/0x2260
 ib_uverbs_create_qp+0x21b/0x2a0
 ib_uverbs_write+0xc2c/0x1010
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 do_syscall_64+0x1aa/0x740
 entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x433569
RSP: 002b:00007ffc6e62f448 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433569
RDX: 0000000000000070 RSI: 00000000200042c0 RDI: 0000000000000003
RBP: 00000000006d5018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040c9f0 R14: 000000000040ca80 R15: 0000000000000006

Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
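Shifting by a negative amount, or by at least the width of the type, is undefined behaviour in C, so a shift count that comes from user space has to be range-checked before it is used. A minimal sketch, assuming a hypothetical rq_wqe_size() helper and an assumed upper bound of 16 that is not the driver's actual limit:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed maximum shift for this sketch, not the driver's real limit. */
#define MAX_WQE_SHIFT 16

/*
 * Compute an RQ WQE size from a user-supplied shift. The value is
 * validated first, so a negative count such as INT32_MIN (the exponent
 * in the UBSAN report above) never reaches the shift.
 */
static int rq_wqe_size(int32_t user_shift, uint32_t *size)
{
	if (user_shift < 0 || user_shift > MAX_WQE_SHIFT)
		return -EINVAL;
	*size = 1u << user_shift;
	return 0;
}
```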
| * RDMA/mlx5: Fix multiple NULL-ptr deref errors in rereg_mr flow (Leon Romanovsky, 2018-04-27, 1 file, -9/+23)
A failure in rereg MR releases the UMEM but leaves the MR to be destroyed by the user. As a result, the following scenario may happen: "create MR -> rereg MR with failure -> call to rereg MR again", which hits "NULL-ptr deref or user memory access" errors.

Ensure that rereg MR is only performed on a non-dead MR.

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.5
Fixes: 395a8e4c32ea ("IB/mlx5: Refactoring register MR code")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
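The "non-dead MR" rule can be sketched in userspace. struct mr, its dead flag, and rereg_mr() are hypothetical simplifications of the scenario in the commit message: once a failed re-registration has released the memory handle, any further re-registration is refused instead of touching the released UMEM, and only destroy remains valid.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical MR; umem stands in for the pinned user memory. */
struct mr {
	void *umem;
	bool dead;   /* set once a failed rereg has released umem */
};

/* simulate_failure lets a test exercise the failure path. */
static int rereg_mr(struct mr *mr, bool simulate_failure)
{
	if (mr->dead)
		return -EINVAL;   /* refuse instead of using a NULL/freed umem */

	free(mr->umem);          /* the old registration is released first */
	mr->umem = NULL;

	mr->umem = simulate_failure ? NULL : malloc(64);
	if (!mr->umem) {
		mr->dead = true;     /* only destroy is valid from now on */
		return -ENOMEM;
	}
	return 0;
}
```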
| * infiniband: mlx5: fix build errors when INFINIBAND_USER_ACCESS=m (Randy Dunlap, 2018-04-17, 1 file, -0/+1)
Fix build errors when INFINIBAND_USER_ACCESS=m and MLX5_INFINIBAND=y. The build error occurs when the mlx5 driver code attempts to use USER_ACCESS interfaces, which are built as a loadable module.

Fixes these build errors:

drivers/infiniband/hw/mlx5/main.o: In function `populate_specs_root':
../drivers/infiniband/hw/mlx5/main.c:4982: undefined reference to `uverbs_default_get_objects'
../drivers/infiniband/hw/mlx5/main.c:4994: undefined reference to `uverbs_alloc_spec_tree'
drivers/infiniband/hw/mlx5/main.o: In function `depopulate_specs_root':
../drivers/infiniband/hw/mlx5/main.c:5001: undefined reference to `uverbs_free_spec_tree'

Build-tested with multiple config combinations.

Fixes: 8c84660bb437 ("IB/mlx5: Initialize the parsing tree root without the help of uverbs")
Cc: stable@vger.kernel.org # reported against 4.16
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/mlx5: remove duplicate header file (Zhu Yanjun, 2018-04-16, 1 file, -1/+0)
The header file fs_helpers.h is included twice, so the duplicate include should be removed.

Fixes: 802c2125689d ("IB/mlx5: Add IPsec support for egress and ingress")
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | net/mlx5: Fix mlx5_get_vector_affinity function (Israel Rukshin, 2018-04-26, 1 file, -1/+1)
|/
Adding the vector offset when calling mlx5_vector2eqn() is wrong. This is because mlx5_vector2eqn() checks whether the EQ index is equal to the vector number, and the internal completion vectors that mlx5 allocates don't get an EQ index.

The second problem here is that using effective_affinity_mask gives the same CPU for different vectors. This leads to unmapped queues when calling it from blk_mq_rdma_map_queues(). This doesn't happen when using the affinity_hint mask.

Fixes: 2572cf57d75a ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
Fixes: 05e0cc84e00c ("net/mlx5: Fix get vector affinity helper function")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma (Linus Torvalds, 2018-04-06, 76 files, -1016/+1956)
|\
Pull rdma updates from Jason Gunthorpe:

"Doug and I are at a conference next week so if another PR is sent I expect it to only be bug fixes. Parav noted yesterday that there are some fringe case behavior changes in his work that he would like to fix, and I see that Intel has a number of rc looking patches for HFI1 they posted yesterday.

Parav is again the biggest contributor by patch count with his ongoing work to enable container support in the RDMA stack, followed by Leon doing syzkaller inspired cleanups, though most of the actual fixing went to RC.

There is one uncomfortable series here fixing the user ABI to actually work as intended in 32 bit mode. There are lots of notes in the commit messages, but the basic summary is we don't think there is an actual 32 bit kernel user of drivers/infiniband for several good reasons.

However we are seeing people want to use a 32 bit user space with 64 bit kernel, which didn't completely work today. So in fixing it we required a 32 bit rxe user to upgrade their userspace. rxe users are still already quite rare and we think a 32 bit one is non-existing.

 - Fix RDMA uapi headers to actually compile in userspace and be more complete

 - Three shared with netdev pull requests from Mellanox:

    * 7 patches, mostly to net with 1 IB related one at the back.
      This series addresses an IRQ performance issue (patch 1), cleanups related to the fix for the IRQ performance problem (patches 2-6), and then extends the fragmented completion queue support that already exists in the net side of the driver to the ib side of the driver (patch 7).

    * Mostly IB, with 5 patches to net that are needed to support the remaining 10 patches to the IB subsystem.

      This series extends the current 'representor' framework when the mlx5 driver is in switchdev mode from being a netdev only construct to being a netdev/IB dev construct. The IB dev is limited to raw Eth queue pairs only, but by having an IB dev of this type attached to the representor for a switchdev port, it enables DPDK to work on the switchdev device.

    * All net related, but needed as infrastructure for the rdma driver

 - Updates for the hns, i40iw, bnxt_re, cxgb3 and cxgb4 drivers

 - SRP performance updates

 - IB uverbs write path cleanup patch series from Leon

 - Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to set the port for ib_srpt to listen on in configfs in order for it to be enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)

 - TSO and Scatter FCS support in mlx4

 - Refactor of modify_qp routine to resolve problems seen while working on new code that is forthcoming

 - More refactoring and updates of RDMA CM for containers support from Parav

 - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory' user API features

 - Infrastructure updates for the new IOCTL interface, based on increased usage

 - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit kernel as was originally intended.
   See the commit messages for extensive details

 - Syzkaller bugs and code cleanups motivated by them"

* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
  IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
  IB/mlx5: Device memory mr registration support
  net/mlx5: Mkey creation command adjustments
  IB/mlx5: Device memory support in mlx5_ib
  net/mlx5: Query device memory capabilities
  IB/uverbs: Add device memory registration ioctl support
  IB/uverbs: Add alloc/free dm uverbs ioctl support
  IB/uverbs: Add device memory capabilities reporting
  IB/uverbs: Expose device memory capabilities to user
  RDMA/qedr: Fix wmb usage in qedr
  IB/rxe: Removed GID add/del dummy routines
  RDMA/qedr: Zero stack memory before copying to user space
  IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
  IB/mlx5: Add information for querying IPsec capabilities
  IB/mlx5: Add IPsec support for egress and ingress
  {net,IB}/mlx5: Add ipsec helper
  IB/mlx5: Add modify_flow_action_esp verb
  IB/mlx5: Add implementation for create and destroy action_xfrm
  IB/uverbs: Introduce ESP steering match filter
  IB/uverbs: Add modify ESP flow_action
  ...
| * IB/mlx5: Device memory mr registration support (Ariel Levkovich, 2018-04-05, 3 files, -0/+84)
Add the mlx5_ib driver implementation for the reg_dm_mr callback, which allows registering device memory (DM) as an MR for local and remote access.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * net/mlx5: Mkey creation command adjustments (Ariel Levkovich, 2018-04-05, 2 files, -14/+15)
This change updates the mlx5 interface for creating an mkey on the device.

The updates in the command mailbox include widening the access mode type field to 5 bits in order to support additional types, such as MLX5_MKC_ACCESS_MODE_MEMIC, which represents the device memory access type and will be used when registering an MR on allocated device memory.

All the places that use the old access mode format are adjusted as well.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/mlx5: Device memory support in mlx5_ib (Ariel Levkovich, 2018-04-05, 4 files, -2/+286)
This patch adds the mlx5_ib driver implementation for the device memory allocation API.

It implements the ib_device callbacks for allocation and deallocation operations, as well as support for a new mmap command that allows mapping allocated device memory to a VMA.

The change also adds reporting of the device memory maximum size and alignment parameters in the device capabilities.

The allocation/deallocation operations use new firmware commands to allocate MEMIC memory on the device.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/qedr: Fix wmb usage in qedr (Michal Kalderon, 2018-04-05, 1 file, -8/+18)
This patch comes as a result of Sinan Kaya's work and the decision that writel() must be a strong enough barrier for DMA.

wmb usages in the qedr driver have either been removed, where they were there only to order DMA accesses, or replaced with smp_wmb and comments, where the barrier was there for SMP reasons.

Fixes: 561e5d48968b ("RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/qedr: Zero stack memory before copying to user space (Jason Gunthorpe, 2018-04-05, 1 file, -3/+3)
| | | | | | | | | | | | | | | | | | | | The fact this struct was not init'd like all the others was missed when the padding reserved field was added. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 71e80a4781af ("RDMA/qedr: Fix uABI structure layouts for 32/64 compat") Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
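A minimal user-space sketch of the bug class: a response struct with an explicit reserved field must be fully zeroed before it is copied out, or stale stack bytes leak. The struct, its fields and the memcpy() stand-in for copy_to_user() are hypothetical; in kernel code the same effect is usually achieved with an `= {}` initializer on the declaration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical response struct: 'reserved' keeps the layout identical on
 * 32- and 64-bit user space, but it still has to be zeroed. */
struct demo_uresp {
	uint32_t id;
	uint32_t reserved;
	uint64_t handle;
};

/* Stand-in for copy_to_user(): copies into a caller-supplied buffer. */
static void demo_copy_out(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

static void demo_fill_and_copy(void *ubuf, uint32_t id, uint64_t handle)
{
	struct demo_uresp uresp;

	memset(&uresp, 0, sizeof(uresp)); /* the fix: no uninitialized bytes */
	uresp.id = id;
	uresp.handle = handle;
	demo_copy_out(ubuf, &uresp, sizeof(uresp));
}
```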
| * IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR · Matan Barak · 2018-04-04 · 2 files · -4/+16
| | When a Raw Ethernet QP is created, several objects are actually created, one of which is a TIR. Currently, a TIR can hash (and spread the traffic) only by IP or port. Add hashing by IPsec SPI to TIR creation, along with the required UAPI bit. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
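To show what adding the SPI to the hash fields buys, here is a toy queue selector. The flag names, the XOR "hash" and the queue count are all assumptions; a real TIR hashes in hardware (typically with a Toeplitz function) over the fields selected in its configuration. Two ESP flows between the same pair of hosts land on the same queue unless the SPI is part of the hash.

```c
#include <stdint.h>

/* Hypothetical hash-field bits for a TIR-like RSS configuration. */
#define HASH_FIELD_SRC_IP    (1u << 0)
#define HASH_FIELD_DST_IP    (1u << 1)
#define HASH_FIELD_IPSEC_SPI (1u << 2)

struct pkt_fields {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint32_t spi;
};

/* Toy XOR "hash": shows only which fields feed queue selection. */
static unsigned select_queue(const struct pkt_fields *p, uint32_t fields,
			     unsigned nqueues)
{
	uint32_t h = 0;

	if (fields & HASH_FIELD_SRC_IP)
		h ^= p->src_ip;
	if (fields & HASH_FIELD_DST_IP)
		h ^= p->dst_ip;
	if (fields & HASH_FIELD_IPSEC_SPI)
		h ^= p->spi;
	return h % nqueues;
}
```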
| * IB/mlx5: Add information for querying IPsec capabilities · Matan Barak · 2018-04-04 · 1 file · -0/+12
| | Users should be able to query for IPsec support. Add a few capability bits to the driver-specific part of alloc_ucontext: MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_REQ_METADATA: the payload's header is returned with metadata representing the IPsec decryption state. MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_RX: ESP_AES_GCM is supported in the ingress path. MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_TX: ESP_AES_GCM is supported in the egress path. MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_SPI_RSS_ONLY: the hardware doesn't support matching on the SPI in flow steering rules, only hashing on it and spreading the traffic accordingly. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
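A minimal sketch of how user space might interpret these capability bits once alloc_ucontext returns. The shortened flag names and their bit values are hypothetical; only their meanings are taken from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the meanings follow the commit's flag list. */
#define ESP_AES_GCM_REQ_METADATA (1u << 0)
#define ESP_AES_GCM_RX           (1u << 1)
#define ESP_AES_GCM_TX           (1u << 2)
#define ESP_AES_GCM_SPI_RSS_ONLY (1u << 3)

/* Is ESP AES-GCM offload available in the requested direction? */
static bool esp_offload_supported(uint32_t caps, bool egress)
{
	return caps & (egress ? ESP_AES_GCM_TX : ESP_AES_GCM_RX);
}

/* The SPI can be a steering match only if ingress offload exists and the
 * hardware is not limited to hashing on the SPI. */
static bool spi_match_supported(uint32_t caps)
{
	return (caps & ESP_AES_GCM_RX) && !(caps & ESP_AES_GCM_SPI_RSS_ONLY);
}
```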
| * IB/mlx5: Add IPsec support for egress and ingress · Aviad Yehezkel · 2018-04-04 · 2 files · -12/+117
| | This commit introduces support for the esp_aes_gcm flow specification for the Innova device. To that end, it adds support for egress steering and some checks that an IPsec rule is actually valid. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/mlx5: Add modify_flow_action_esp verb · Matan Barak · 2018-04-04 · 1 file · -0/+49
| | Add an implementation in the mlx5 driver to modify an action_xfrm object. This merely calls the accel layer. Currently, a user can modify only the ESN parameters. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
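A minimal sketch of the restriction described above: a modify request that touches anything other than the ESN is refused before anything is changed. The attribute-mask bits, the struct and the choice of -EOPNOTSUPP are assumptions, not the mlx5 verb's actual interface; the call into the accel layer is omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical attribute mask for modifying an ESP flow action. */
#define XFRM_ATTR_ESN    (1u << 0)
#define XFRM_ATTR_KEYMAT (1u << 1)
#define XFRM_ATTR_SPI    (1u << 2)

struct demo_xfrm {
	uint32_t esn;
	uint32_t spi;
};

/* Accept ESN changes only; any other requested attribute is rejected
 * without modifying the object. */
static int demo_modify_xfrm(struct demo_xfrm *x, uint32_t attr_mask,
			    uint32_t new_esn)
{
	if (attr_mask & ~XFRM_ATTR_ESN)
		return -EOPNOTSUPP;
	if (attr_mask & XFRM_ATTR_ESN)
		x->esn = new_esn;
	return 0;
}
```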
| * IB/mlx5: Add implementation for create and destroy action_xfrm · Aviad Yehezkel · 2018-04-04 · 2 files · -1/+148
| | Add an implementation in the mlx5 driver to create and destroy an action_xfrm object. This merely calls the accel layer. A user may pass the MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag, which states that the user expects a metadata header to be added to the payload. This header represents information regarding the transformation's state. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
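A minimal sketch of a create/destroy pair that records the metadata flag and rejects unknown flags, so that a flag defined later is never silently ignored. The flag value, the struct and the helper names are hypothetical; the real driver hands the request to the accel layer, which is omitted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag value; only its meaning comes from the commit. */
#define DEMO_XFRM_FLAGS_REQUIRE_METADATA (1u << 0)
#define DEMO_XFRM_FLAGS_SUPPORTED        DEMO_XFRM_FLAGS_REQUIRE_METADATA

struct demo_action_xfrm {
	bool require_metadata; /* prepend a metadata header to the payload */
};

static int demo_create_xfrm(uint32_t flags, struct demo_action_xfrm **out)
{
	struct demo_action_xfrm *x;

	if (flags & ~DEMO_XFRM_FLAGS_SUPPORTED)
		return -EOPNOTSUPP;
	x = calloc(1, sizeof(*x));
	if (!x)
		return -ENOMEM;
	x->require_metadata = flags & DEMO_XFRM_FLAGS_REQUIRE_METADATA;
	*out = x;
	return 0;
}

static void demo_destroy_xfrm(struct demo_action_xfrm *x)
{
	free(x);
}
```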
| * IB/mlx4: Check for egress flow steering · Boris Pismenny · 2018-04-04 · 1 file · -0/+3
| | ConnectX3 doesn't support egress flow steering. Return an EOPNOTSUPP error when such a flow is being created. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
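A minimal sketch of the check: the driver refuses egress rules up front on hardware without egress steering, so the caller gets a clear error. The flag name and the has_egress_steering parameter are hypothetical stand-ins for the real flow attribute and device capability.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flow-attribute flag marking an egress (TX) rule. */
#define DEMO_FLOW_ATTR_FLAGS_EGRESS (1u << 2)

struct demo_flow_attr {
	uint32_t flags;
};

static int demo_create_flow(const struct demo_flow_attr *attr,
			    int has_egress_steering)
{
	if ((attr->flags & DEMO_FLOW_ATTR_FLAGS_EGRESS) && !has_egress_steering)
		return -EOPNOTSUPP;
	/* ... program the rule in hardware here (omitted) ... */
	return 0;
}
```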
| * IB/mlx5: Initialize the parsing tree root without the help of uverbs · Matan Barak · 2018-04-04 · 2 files · -0/+39
| | In order to have a custom parsing tree, a provider driver needs to assign its parsing tree to the ib_device specs_root field. Otherwise, the uverbs client assigns a common default parsing tree to it. In subsequent patches, the mlx5_ib driver gains a custom parsing tree, which contains both the common objects and a new flags field for the UVERBS_FLOW_ACTION_ESP_CREATE command. This patch makes mlx5_ib assign its own tree to specs_root, which will be extended later on. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
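A minimal sketch of the fallback described above, with hypothetical stand-in types: the uverbs side installs the common tree only if the provider has not already set specs_root.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a uverbs parsing tree and an ib_device. */
struct demo_parse_tree {
	const char *name;
};

struct demo_ib_device {
	const struct demo_parse_tree *specs_root;
};

static const struct demo_parse_tree demo_default_tree = { "common" };

/* uverbs side: keep a tree the provider installed, else use the default. */
static void demo_uverbs_init(struct demo_ib_device *dev)
{
	if (!dev->specs_root)
		dev->specs_root = &demo_default_tree;
}
```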
| * RDMA: Use ib_gid_attr during GID modification · Parav Pandit · 2018-04-03 · 6 files · -67/+43
| | Now that ib_gid_attr contains the device, port and index, simplify the provider APIs add_gid() and del_gid() to take these fields from the ib_gid_attr structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
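A minimal sketch of the simplified callback shape, using hypothetical types loosely modeled on ib_gid_attr: because the attribute carries the device, port and index, add_gid() and del_gid() need only the attribute instead of separate arguments. The table layout and bounds check are assumptions for the demo.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PORTS 2
#define DEMO_GIDS  4

struct demo_device {
	uint8_t gid_table[DEMO_PORTS][DEMO_GIDS][16];
};

/* The attribute carries everything the callback needs. */
struct demo_gid_attr {
	struct demo_device *device;
	uint8_t port_num; /* 1-based, as IB port numbers are */
	unsigned index;
	uint8_t gid[16];
};

/* Locate the table slot named by the attribute, or NULL if out of range. */
static uint8_t *demo_slot(const struct demo_gid_attr *attr)
{
	if (attr->port_num < 1 || attr->port_num > DEMO_PORTS ||
	    attr->index >= DEMO_GIDS)
		return NULL;
	return attr->device->gid_table[attr->port_num - 1][attr->index];
}

static int demo_add_gid(const struct demo_gid_attr *attr)
{
	uint8_t *slot = demo_slot(attr);

	if (!slot)
		return -EINVAL;
	memcpy(slot, attr->gid, 16);
	return 0;
}

static int demo_del_gid(const struct demo_gid_attr *attr)
{
	uint8_t *slot = demo_slot(attr);

	if (!slot)
		return -EINVAL;
	memset(slot, 0, 16);
	return 0;
}
```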
| * IB/providers: Avoid null netdev check for RoCE · Parav Pandit · 2018-04-03 · 9 files · -70/+53
| | Now that the IB core GID cache ensures that all RoCE entries have an associated netdev, remove the NULL netdev checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/providers: Avoid zero GID check for RoCE · Parav Pandit · 2018-04-03 · 5 files · -19/+1
| | Now that the IB core GID cache ensures that a zero GID doesn't exist in the GID table, remove the zero GID checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
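The two provider cleanups above rely on validation happening in one place. A minimal sketch with hypothetical types: the core rejects a zero GID or a RoCE entry without a netdev before calling the provider, so the provider callback can drop both checks. The names and the -EINVAL error code are assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_netdev {
	int ifindex;
};

struct demo_gid_entry {
	uint8_t gid[16];
	struct demo_netdev *ndev;
};

static bool gid_is_zero(const uint8_t gid[16])
{
	for (int i = 0; i < 16; i++)
		if (gid[i])
			return false;
	return true;
}

static int provider_calls; /* counts invocations of the provider callback */

/* Provider callback: may assume a non-zero, netdev-backed entry. */
static int demo_provider_add_gid(const struct demo_gid_entry *e)
{
	(void)e;
	provider_calls++;
	return 0;
}

/* Core side: the only place the invariants are checked. */
static int demo_core_add_roce_gid(const struct demo_gid_entry *e)
{
	if (gid_is_zero(e->gid) || !e->ndev)
		return -EINVAL;
	return demo_provider_add_gid(e);
}
```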
| * RDMA/providers: Simplify query_gid callback of RoCE providers · Parav Pandit · 2018-04-03 · 9 files · -68/+1
| | ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the RoCE provider drivers to treat the query_gid() callback as never called for RoCE, and require only non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
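A minimal sketch of the dispatch described above, with hypothetical types: RoCE ports are answered from the core's software cache, and only non-RoCE ports call the provider's query_gid() hook.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CACHE_SIZE 4

/* Hypothetical per-port state: a core-maintained software GID cache plus
 * an optional provider hook used only for non-RoCE ports. */
struct demo_port {
	bool is_roce;
	uint8_t cache[DEMO_CACHE_SIZE][16];
	int (*query_gid)(unsigned index, uint8_t gid[16]);
};

static int hw_queries; /* counts calls that reach the provider */

/* Example provider hook for a non-RoCE port: fakes a hardware read. */
static int demo_hw_query_gid(unsigned index, uint8_t gid[16])
{
	hw_queries++;
	memset(gid, 0, 16);
	gid[15] = (uint8_t)index;
	return 0;
}

/* Core entry point: RoCE lookups never reach the provider. */
static int demo_query_gid(const struct demo_port *port, unsigned index,
			  uint8_t gid[16])
{
	if (index >= DEMO_CACHE_SIZE)
		return -EINVAL;
	if (port->is_roce) {
		memcpy(gid, port->cache[index], 16);
		return 0;
	}
	if (!port->query_gid)
		return -EOPNOTSUPP;
	return port->query_gid(index, gid);
}
```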
| * IB/qedr: Remove GID add/del dummy routines · Parav Pandit · 2018-04-03 · 3 files · -39/+0
| | The qedr driver's add_gid() and del_gid() callbacks perform simple checks that the IB core already does before invoking these callbacks. Therefore, simplify the code by not implementing the add_gid() and del_gid() callbacks at all. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * i40iw: Remove pre-production workaround for resource profile 1 · Shiraz Saleem · 2018-04-03 · 1 file · -2/+0
| | Support for resource profile 1 is currently deprecated due to a pre-production erratum. Remove this workaround as it's no longer needed. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp · Jason Gunthorpe · 2018-04-03 · 1 file · -2/+2
| | This structure is passed down both the ex and the non-ex paths, so it must be 8-byte aligned to go through the ex path without implicit padding. Old user space provides 4 bytes of resp on !ex and 8 bytes on ex, so simply copy the minimum length. New user space consistently provides 8 bytes in both cases. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
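A minimal sketch of the two ideas in this fix, with hypothetical names: an explicit reserved word keeps the response at 8 bytes with no implicit padding, and the copy-out uses the smaller of the caller's buffer length and the struct size. The field names and the memcpy() stand-in for the kernel's copy to user space are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout: explicit reserved word, 8 bytes total. */
struct demo_create_qp_resp {
	uint32_t index;
	uint32_t reserved;
};

_Static_assert(sizeof(struct demo_create_qp_resp) % 8 == 0,
	       "resp must be a multiple of 8 bytes");

/* Copy only as much as the caller's buffer holds: old user space passes
 * 4 bytes on the non-ex path, new user space always passes 8.
 * Returns the number of bytes copied. */
static size_t demo_copy_resp(void *ubuf, size_t outlen,
			     const struct demo_create_qp_resp *resp)
{
	size_t len = outlen < sizeof(*resp) ? outlen : sizeof(*resp);

	memcpy(ubuf, resp, len);
	return len;
}
```

Copying the minimum keeps old 4-byte callers working without writing past their buffer, while newer callers receive the whole struct.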