path: root/include/rdma/ib_verbs.h
Commit message [Author, Age, Files, Lines]
* RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz [Mike Marciniszyn, 2023-12-04, files: 1, lines: -0/+1]
    64k pages introduce the situation in this diagram when the HCA 4k page
    size is being used:

     +-------------------------------------------+ <--- 64k aligned VA
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+
     |                   o                       |
     |                                           |
     |                   o                       |
     |                                           |
     |                   o                       |
     +-------------------------------------------+
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+ <--- Live HCA page
     |OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO| <--- offset
     |                                           | <--- VA
     |                MR data                    |
     +-------------------------------------------+
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+
     |                   o                       |
     |                                           |
     |                   o                       |
     |                                           |
     |                   o                       |
     +-------------------------------------------+
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+

    The VA addresses coming from rdma-core in this diagram can be arbitrary,
    but for 64k pages, the VA may be offset by some number of HCA 4k pages and
    followed by some number of HCA 4k pages.

    The current iterator doesn't account for either the preceding 4k pages or
    the following 4k pages.

    Fix the issue by extending the ib_block_iter to contain the number of DMA
    pages like comment [1] says and by using __sg_advance to start the
    iterator at the first live HCA page.

    The changes are contained in a parallel set of iterator start and next
    functions that are umem aware and specific to umem since there is one user
    of the rdma_for_each_block() without umem.

    These two fixes prevent the extra pages before and after the user MR data.

    Fix the preceding pages by using the __sg_advance field to start at the
    first 4k page containing MR data.

    Fix the following pages by saving the number of pgsz blocks in the
    iterator state and downcounting on each next.

    This fix allows for the elimination of the small page crutch noted in the
    Fixes.

    Fixes: 10c75ccb54e4 ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()")
    Link: https://lore.kernel.org/r/20231129202143.1434-2-shiraz.saleem@intel.com
    Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
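The arithmetic behind the umem iterator fix above can be sketched in user space. This is only an illustration under stated assumptions: `demo_span()` and `struct demo_block_span` are hypothetical names, not kernel API, and both page sizes are assumed to be powers of two. It computes how many bytes of the 64k CPU page precede the first live HCA block, and how many HCA blocks actually contain MR data.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down or up to a power-of-two alignment a. */
static uint64_t demo_align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t demo_align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/* Hypothetical stand-in for the state a block iterator would need. */
struct demo_block_span {
	uint64_t skip_bytes; /* bytes between the CPU-page start and the first live HCA block */
	uint64_t num_blocks; /* HCA blocks that contain MR data; counted down on each "next" */
};

static struct demo_block_span demo_span(uint64_t va, uint64_t len,
					uint64_t cpu_page_size, uint64_t hca_pgsz)
{
	struct demo_block_span s;
	uint64_t first, end;

	/* Both sizes must be non-zero powers of two for the masks to work. */
	assert(cpu_page_size && (cpu_page_size & (cpu_page_size - 1)) == 0);
	assert(hca_pgsz && (hca_pgsz & (hca_pgsz - 1)) == 0);

	first = demo_align_down(va, hca_pgsz);     /* first live HCA page */
	end = demo_align_up(va + len, hca_pgsz);   /* end of the last live HCA page */

	s.skip_bytes = first - demo_align_down(va, cpu_page_size);
	s.num_blocks = (end - first) / hca_pgsz;
	return s;
}
```

With a 64k CPU page and a 4k HCA page, an MR that starts a few 4k pages into the CPU page yields a non-zero `skip_bytes`, and `num_blocks` bounds the iteration so trailing 4k pages are not returned.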
* RDMA/core: Fix a couple of obvious typos in comments [Chuck Lever, 2023-10-04, files: 1, lines: -1/+1]
    Fix typos.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Link: https://lore.kernel.org/r/169643338101.8035.6826446669479247727.stgit@manet.1015granger.net
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* RDMA: Annotate struct rdma_hw_stats with __counted_by [Kees Cook, 2023-10-02, files: 1, lines: -1/+1]
    Prepare for the coming implementation by GCC and Clang of the __counted_by
    attribute. Flexible array members annotated with __counted_by can have
    their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
    array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
    functions).

    As found with Coccinelle[1], add __counted_by for struct rdma_hw_stats.

    [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Leon Romanovsky <leon@kernel.org>
    Cc: linux-rdma@vger.kernel.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20230929180431.3005464-1-keescook@chromium.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
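The annotation described above can be tried outside the kernel with a small sketch. The macro `DEMO_COUNTED_BY` and `struct demo_stats` are hypothetical names chosen for this example; the macro expands to the compiler's `counted_by` attribute when the compiler reports support for it and to nothing otherwise, so the code should build either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only when the compiler advertises it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define DEMO_COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(member)
#endif

/* Hypothetical stats container: the count lives next to the flexible array. */
struct demo_stats {
	int num_counters;
	unsigned long long value[] DEMO_COUNTED_BY(num_counters);
};

static struct demo_stats *demo_stats_alloc(int num_counters)
{
	struct demo_stats *s;

	if (num_counters <= 0)
		return NULL;

	s = calloc(1, sizeof(*s) + (size_t)num_counters * sizeof(s->value[0]));
	if (!s)
		return NULL;

	/* Set the count before touching the array so bounds checks see it. */
	s->num_counters = num_counters;
	return s;
}
```

Setting the count immediately after allocation matters: a bounds checker that trusts the annotation would otherwise see a zero-length array.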
* IB/core: Add support for XDR link speed [Or Har-Toov, 2023-09-26, files: 1, lines: -0/+2]
    Add new IBTA speed XDR, the new rate that was added to the InfiniBand
    spec as part of XDR, supporting a signaling rate of 200Gb.

    In order to report that value to rdma-core, add a new u32 field to the
    query_port response.

    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Reviewed-by: Mark Zhang <markzhang@nvidia.com>
    Link: https://lore.kernel.org/r/9d235fc600a999e8274010f0e18b40fa60540e6c.1695204156.git.leon@kernel.org
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* RDMA/core: Add support to dump SRQ resource in RAW format [wenglianfa, 2023-09-20, files: 1, lines: -0/+1]
    Add support to dump SRQ resource in raw format. It enables drivers to
    return the entire device specific SRQ context without setting each field
    separately.

    Example:
    $ rdma res show srq -r
    dev hns3 149000...

    $ rdma res show srq -j -r
    [{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}]

    Signed-off-by: wenglianfa <wenglianfa@huawei.com>
    Link: https://lore.kernel.org/r/20230918131110.3987498-3-huangjunxian6@hisilicon.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* RDMA/core: Add dedicated SRQ resource tracker function [wenglianfa, 2023-09-20, files: 1, lines: -0/+1]
    Add a dedicated callback function for SRQ resource tracker.

    Signed-off-by: wenglianfa <wenglianfa@huawei.com>
    Link: https://lore.kernel.org/r/20230918131110.3987498-2-huangjunxian6@hisilicon.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* RDMA Remove unused function declarations [Yue Haibing, 2023-08-13, files: 1, lines: -2/+0]
    Commit c2261dd76b54 ("RDMA/device: Add ib_device_set_netdev() as an
    alternative to get_netdev") declared but never implemented
    ib_device_netdev(), remove it.

    Commit 922a8e9fb2e0 ("RDMA: iWARP Connection Manager.") declared but never
    implemented iw_cm_unbind_qp() and iw_cm_get_qp().

    Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
    Link: https://lore.kernel.org/r/20230809142718.42316-1-yuehaibing@huawei.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* RDMA: Add ib_virt_dma_to_page() [Jason Gunthorpe, 2023-04-16, files: 1, lines: -0/+25]
    Make it clearer what is going on by adding a function to go back from the
    "virtual" dma_addr to a kva and another to a struct page. This is used in
    the ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib).

    Call them instead of a naked casting and virt_to_page() when working with
    dma_addr values encoded by the various ib_map functions.

    This also fixes the virt_to_page() casting problem Linus Walleij has been
    chasing.

    Cc: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
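The idea of a "virtual" dma_addr that is really an encoded kernel virtual address can be modelled in user space. The sketch below is not the kernel helper: `demo_dma_addr_t`, `demo_virt_to_dma()` and `demo_dma_to_virt()` are hypothetical names, and the point is only that named conversion helpers replace bare casts scattered through driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit DMA address type (assumption for this sketch). */
typedef uint64_t demo_dma_addr_t;

/* Encode a CPU pointer into the "virtual" DMA address used by software drivers. */
static demo_dma_addr_t demo_virt_to_dma(void *kva)
{
	return (demo_dma_addr_t)(uintptr_t)kva;
}

/* Decode it again; callers use this helper instead of open-coding the cast. */
static void *demo_dma_to_virt(demo_dma_addr_t addr)
{
	/* The value must fit in a pointer, otherwise it was never a CPU address. */
	assert(addr <= UINTPTR_MAX);
	return (void *)(uintptr_t)addr;
}
```

Keeping the round trip in two small functions gives one place to add checks, which is harder when every call site casts on its own.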
* RDMA/mlx: Calling qp event handler in workqueue context [Mark Zhang, 2023-01-15, files: 1, lines: -1/+1]
    Move the call of qp event handler from atomic to workqueue context, so
    that the handler is able to block. This is needed by following patches.

    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
    Link: https://lore.kernel.org/r/0cd17b8331e445f03942f4bb28d447f24ac5669d.1672821186.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* RDMA: Extend RDMA kernel verbs ABI to support flush [Li Zhijian, 2022-12-09, files: 1, lines: -1/+17]
    This commit extends the RDMA kernel verbs ABI to support the flush
    operation defined in IBA A19.4.1. These changes are backward compatible
    with the existing RDMA kernel verbs ABI.

    It makes device/HCA support new FLUSH attributes/capabilities, and it
    also makes memory region support new FLUSH access flags.

    Users can use ibv_reg_mr(3) to register flush access flags. Only the
    access flags also supported by device's capabilities can be registered
    successfully.

    Once registered successfully, it means the MR is flushable. Similarly, a
    flushable MR should also have one or both of GLOBAL_VISIBILITY and
    PERSISTENT attributes/capabilities like device/HCA.

    Link: https://lore.kernel.org/r/20221206130201.30986-3-lizhijian@fujitsu.com
    Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
    Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA: Extend RDMA kernel ABI to support atomic write [Xiao Yang, 2022-12-01, files: 1, lines: -0/+3]
    1) Define new atomic write request/completion in kernel.
    2) Define new atomic write capability in kernel.
    3) Define new atomic write opcode for RC service in packet.

    Link: https://lore.kernel.org/r/1669905432-14-3-git-send-email-yangx.jy@fujitsu.com
    Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA: Add netdevice_tracker to ib_device_set_netdev() [Jason Gunthorpe, 2022-11-28, files: 1, lines: -0/+1]
    This will cause an informative backtrace to print if the user of
    ib_device_set_netdev() isn't careful about tearing down the ibdevice
    before its netdevice parent is destroyed. For example:

      unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
      leaked reference.
       ib_device_set_netdev+0x266/0x730
       siw_newlink+0x4e0/0xfd0
       nldev_newlink+0x35c/0x5c0
       rdma_nl_rcv_msg+0x36d/0x690
       rdma_nl_rcv+0x2ee/0x430
       netlink_unicast+0x543/0x7f0
       netlink_sendmsg+0x918/0xe20
       sock_sendmsg+0xcf/0x120
       ____sys_sendmsg+0x70d/0x8b0
       ___sys_sendmsg+0x11d/0x1b0
       __sys_sendmsg+0xfa/0x1d0
       do_syscall_64+0x35/0xb0
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

    This will help debug the issues syzkaller is seeing.

    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/0-v1-a7c81b3842ce+e5-netdev_tracker_jgg@nvidia.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* RDMA/core: return -EOPNOSUPP for ODP unsupported device [Li Zhijian, 2022-10-19, files: 1, lines: -1/+1]
    ib_reg_mr(3), which is used to register a MR with specific access flags
    for a specific HCA, will set errno when something goes wrong. So, here we
    should return the specific -EOPNOTSUPP when the requested ODP access flag
    is unsupported by the HCA (such as RXE).

    Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
    Link: https://lore.kernel.org/r/20221001020045.8324-1-lizhijian@fujitsu.com
    Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
* Merge tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping [Linus Torvalds, 2022-08-06, files: 1, lines: -0/+11]
|\
| |   Pull dma-mapping updates from Christoph Hellwig:
| |
| |    - convert arm32 to the common dma-direct code (Arnd Bergmann, Robin
| |      Murphy, Christoph Hellwig)
| |    - restructure the PCIe peer to peer mapping support (Logan Gunthorpe)
| |    - allow the IOMMU code to communicate an optional DMA mapping length
| |      and use that in scsi and libata (John Garry)
| |    - split the global swiotlb lock (Tianyu Lan)
| |    - various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
| |      Lukas Bulwahn, Robin Murphy)
| |
| |   * tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits)
| |     swiotlb: fix passing local variable to debugfs_create_ulong()
| |     dma-mapping: reformat comment to suppress htmldoc warning
| |     PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
| |     RDMA/rw: drop pci_p2pdma_[un]map_sg()
| |     RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
| |     nvme-pci: convert to using dma_map_sgtable()
| |     nvme-pci: check DMA ops when indicating support for PCI P2PDMA
| |     iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
| |     iommu: Explicitly skip bus address marked segments in __iommu_map_sg()
| |     dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
| |     dma-direct: support PCI P2PDMA pages in dma-direct map_sg
| |     dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
| |     PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
| |     PCI/P2PDMA: Attempt to set map_type if it has not been set
| |     lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
| |     swiotlb: clean up some coding style and minor issues
| |     dma-mapping: update comment after dmabounce removal
| |     scsi: sd: Add a comment about limiting max_sectors to shost optimal limit
| |     ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors
| |     scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit
| |     ...
| * RDMA/core: introduce ib_dma_pci_p2p_dma_supported() [Logan Gunthorpe, 2022-07-26, files: 1, lines: -0/+11]
| |   Introduce the helper function ib_dma_pci_p2p_dma_supported() to check
| |   if a given ib_device can be used in P2PDMA transfers. This ensures the
| |   ib_device is not using virt_dma and also that the underlying dma_device
| |   supports P2PDMA.
| |
| |   Use the new helper in nvme-rdma to replace the existing check for
| |   ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows
| |   switching away from pci_p2pdma_[un]map_sg().
| |
| |   Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
| |   Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
| |   Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
| |   Reviewed-by: Christoph Hellwig <hch@lst.de>
| |   Signed-off-by: Christoph Hellwig <hch@lst.de>
* | RDMA: Fix comment typo [Xin Gao, 2022-07-22, files: 1, lines: -1/+1]
|/
|   The double `get' is duplicated, remove one.
|
|   Link: https://lore.kernel.org/r/20220722021833.15669-1-gaoxin@cdjrlc.com
|   Signed-off-by: Xin Gao <gaoxin@cdjrlc.com>
|   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Fix typo in comment [Julia Lawall, 2022-05-24, files: 1, lines: -1/+1]
    Spelling mistake (triple letters) in comment. Detected with the help of
    Coccinelle.

    Link: https://lore.kernel.org/r/20220521111145.81697-86-Julia.Lawall@inria.fr
    Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux [Jason Gunthorpe, 2022-04-12, files: 1, lines: -8/+0]
|\
| |   Leon Romanovsky says:
| |
| |   ====================
| |   Mellanox shared branch that includes:
| |
| |   * Removal of FPGA TLS code https://lore.kernel.org/all/cover.1649073691.git.leonro@nvidia.com
| |
| |     Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code
| |     is unmaintained, untested and not in-use by any upstream/distro
| |     oriented customers. In order to reduce code complexity, drop the
| |     kernel code, clean build config options and delete useless kTLS vs.
| |     TLS separation.
| |
| |     [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf
| |
| |   * Removal of FPGA IPsec code https://lore.kernel.org/all/cover.1649232994.git.leonro@nvidia.com
| |
| |     Together with FPGA TLS, the IPsec went to EOL state in the November
| |     of 2019 [2]. Exactly like FPGA TLS, no active customers exist for
| |     this upstream code and all the complexity around that area can be
| |     deleted.
| |
| |     [2] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf
| |
| |   * Fix to undefined behavior from Borislav https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de
| |   ====================
| |
| |   * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
| |     net/mlx5: Remove not-implemented IPsec capabilities
| |     net/mlx5: Remove ipsec_ops function table
| |     net/mlx5: Reduce kconfig complexity while building crypto support
| |     net/mlx5: Move IPsec file to relevant directory
| |     net/mlx5: Remove not-needed IPsec config
| |     net/mlx5: Align flow steering allocation namespace to common style
| |     net/mlx5: Unify device IPsec capabilities check
| |     net/mlx5: Remove useless IPsec device checks
| |     net/mlx5: Remove ipsec vs. ipsec offload file separation
| |     RDMA/core: Delete IPsec flow action logic from the core
| |     RDMA/mlx5: Drop crypto flow steering API
| |     RDMA/mlx5: Delete never supported IPsec flow action
| |     net/mlx5: Remove FPGA ipsec specific statistics
| |     net/mlx5: Remove XFRM no_trailer flag
| |     net/mlx5: Remove not-used IDA field from IPsec struct
| |     net/mlx5: Delete metadata handling logic
| |     net/mlx5_fpga: Drop INNOVA IPsec support
| |     IB/mlx5: Fix undefined behavior due to shift overflowing the constant
| |     net/mlx5: Cleanup kTLS function names and their exposure
| |     net/mlx5: Remove tls vs. ktls separation as it is the same
| |     net/mlx5: Remove indirection in TLS build
| |     net/mlx5: Reliably return TLS device capabilities
| |     net/mlx5_fpga: Drop INNOVA TLS support
| |
| |   Link: https://lore.kernel.org/r/20220409055303.1223644-1-leon@kernel.org
| |   Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/core: Delete IPsec flow action logic from the core [Leon Romanovsky, 2022-04-09, files: 1, lines: -8/+0]
| |   The removal of mlx5 flow steering logic left the kernel without any
| |   RDMA drivers that implement the flow action callbacks supplied by
| |   RDMA/core. Any user access to them resulted in an EOPNOTSUPP error,
| |   which can be achieved by simply removing the ioctl implementation.
| |
| |   Link: https://lore.kernel.org/r/a638e376314a2eb1c66f597c0bbeeab2e5de7faf.1649232994.git.leonro@nvidia.com
| |   Reviewed-by: Raed Salem <raeds@nvidia.com>
| |   Acked-by: Jason Gunthorpe <jgg@nvidia.com>
| |   Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
* | RDMA: Split kernel-only global device caps from uverbs device caps [Jason Gunthorpe, 2022-04-06, files: 1, lines: -51/+37]
| |   Split out flags from ib_device::device_cap_flags that are only used
| |   internally to the kernel into kernel_cap_flags that is not part of the
| |   uapi. This limits the device_cap_flags to being the same bitmap that
| |   will be copied to userspace.
| |
| |   This cleanly splits out the uverbs flags from the kernel flags to avoid
| |   confusion in the flags bitmap.
| |
| |   Add some short comments describing what each of the kernel flags is
| |   connected to. Remove unused kernel flags.
| |
| |   Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
| |   Reviewed-by: Christoph Hellwig <hch@lst.de>
| |   Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* | IB/uverbs: Move part of enum ib_device_cap_flags to uapi [Xiao Yang, 2022-04-04, files: 1, lines: -38/+66]
| |   1) Part of enum ib_device_cap_flags are used by ibv_query_device(3) or
| |      ibv_query_device_ex(3), so we define them in
| |      include/uapi/rdma/ib_user_verbs.h and only expose them to userspace.
| |
| |   2) Reformat enum ib_device_cap_flags by removing the indent before '='.
| |
| |   Link: https://lore.kernel.org/r/20220331032419.313904-2-yangx.jy@fujitsu.com
| |   Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
| |   Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* | IB/uverbs: Move enum ib_raw_packet_caps to uapi [Xiao Yang, 2022-04-04, files: 1, lines: -7/+11]
|/
|   This enum is used by ibv_query_device_ex(3) so it should be defined in
|   include/uapi/rdma/ib_user_verbs.h.
|
|   Link: https://lore.kernel.org/r/20220331032419.313904-1-yangx.jy@fujitsu.com
|   Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
|   Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
|   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Calculate UDP source port based on flow label or lqpn/rqpn [Zhu Yanjun, 2022-01-07, files: 1, lines: -0/+17]
    Calculate and set UDP source port based on the flow label. If flow label
    is not defined in GRH then calculate it based on lqpn/rqpn.

    Link: https://lore.kernel.org/r/20220106180359.2915060-2-yanjun.zhu@linux.dev
    Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
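The two steps in the commit above, deriving a flow label from the QP numbers when none is given and then folding the label into a UDP source port, can be sketched as follows. This is an illustration, not the kernel code: the `demo_` names are hypothetical, and the 20-bit label mask, the 0xC000 port floor and the particular mixing steps are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLOWLABEL_MASK 0x000FFFFFu /* assumed 20-bit flow label */
#define DEMO_UDP_PORT_MIN   0xC000u     /* assumed lower bound of the usable port range */

/* Derive a flow label from local/remote QP numbers (illustrative mixing). */
static uint32_t demo_calc_flow_label(uint32_t lqpn, uint32_t rqpn)
{
	uint64_t v = (uint64_t)lqpn * rqpn;

	v ^= v >> 20;
	v ^= v >> 40;
	return (uint32_t)(v & DEMO_FLOWLABEL_MASK);
}

/* Fold a 20-bit flow label into 14 bits and place it above the port floor. */
static uint16_t demo_flow_label_to_udp_sport(uint32_t fl)
{
	uint32_t fl_low = fl & 0x03FFFu;
	uint32_t fl_high = fl & 0xFC000u;

	assert((fl & ~DEMO_FLOWLABEL_MASK) == 0); /* caller passes a masked label */

	fl_low ^= fl_high >> 14;
	return (uint16_t)(fl_low | DEMO_UDP_PORT_MIN);
}
```

Because the multiplication is symmetric in `lqpn` and `rqpn`, both ends of a connection would derive the same label under this scheme, and the resulting port always lands at or above the assumed floor.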
* RDMA/hns: Use the core code to manage the fixed mmap entries [Chengchang Tang, 2021-10-29, files: 1, lines: -0/+9]
    Add a new implementation for mmap by using the new mmap entry API.

    This makes way for further use of the dynamic mmap allocator in this
    driver.

    Link: https://lore.kernel.org/r/20211028105640.1056-1-liangwenpeng@huawei.com
    Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
    Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
    Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Set sgtable nents when using ib_dma_virt_map_sg() [Logan Gunthorpe, 2021-10-13, files: 1, lines: -1/+6]
    ib_dma_map_sgtable_attrs() should be mapping the sgls and setting nents
    but the ib_uses_virt_dma() path falls back to ib_dma_virt_map_sg() which
    will not set the nents in the sgtable.

    Check the return value (per the map_sg calling convention) and set
    sgt->nents appropriately on success.

    Fixes: 79fbd3e1241c ("RDMA: Use the sg_table directly and remove the opencoded version from umem")
    Link: https://lore.kernel.org/r/20211013165942.89806-1-logang@deltatee.com
    Reported-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Tested-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/mlx5: Add modify_op_stat() support [Aharon Landau, 2021-10-12, files: 1, lines: -0/+2]
    Add support for ib callback modify_op_stat() to add or remove an optional
    counter. When adding, a steering flow table is created with a rule that
    catches and counts all the matching packets. When removing, the table and
    flow counter are destroyed.

    Link: https://lore.kernel.org/r/20211008122439.166063-13-markzhang@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/counter: Add optional counter support [Aharon Landau, 2021-10-12, files: 1, lines: -0/+13]
    An optional counter is a driver-specific counter that may be dynamically
    enabled/disabled. This enhancement allows drivers to expose counters
    which are, for example, mutually exclusive and cannot be enabled at the
    same time, counters that might degrade performance, optional debug
    counters, etc.

    Optional counters are marked with IB_STAT_FLAG_OPTIONAL flag. They are
    not exported in sysfs, and must be at the end of all stats, otherwise the
    attr->show() in sysfs would get wrong indexes for hwcounters that are
    behind optional counters.

    Link: https://lore.kernel.org/r/20211008122439.166063-7-markzhang@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/counter: Add an is_disabled field in struct rdma_hw_stats [Aharon Landau, 2021-10-12, files: 1, lines: -0/+3]
    Add a bitmap in the rdma_hw_stats structure, with each bit indicating
    whether the corresponding counter is currently disabled or not. By
    default hwcounters are enabled.

    Link: https://lore.kernel.org/r/20211008122439.166063-6-markzhang@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
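A per-counter "disabled" bitmap of the kind described above can be sketched in plain C. The structure and helper names (`demo_hw_stats`, `demo_counter_set_disabled()`, and so on) are hypothetical and only model the idea: one bit per counter, zero-initialised so that every counter starts out enabled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Hypothetical stats object carrying one "disabled" bit per counter. */
struct demo_hw_stats {
	int num_counters;
	unsigned long *is_disabled;
};

static int demo_stats_init(struct demo_hw_stats *s, int num_counters)
{
	if (num_counters <= 0)
		return -1;
	/* calloc zeroes the bitmap, so all counters start enabled. */
	s->is_disabled = calloc(DEMO_BITS_TO_LONGS((size_t)num_counters),
				sizeof(unsigned long));
	if (!s->is_disabled)
		return -1;
	s->num_counters = num_counters;
	return 0;
}

static void demo_counter_set_disabled(struct demo_hw_stats *s, int idx, bool disabled)
{
	size_t i = (size_t)idx;
	unsigned long mask = 1UL << (i % DEMO_BITS_PER_LONG);

	assert(idx >= 0 && idx < s->num_counters);
	if (disabled)
		s->is_disabled[i / DEMO_BITS_PER_LONG] |= mask;
	else
		s->is_disabled[i / DEMO_BITS_PER_LONG] &= ~mask;
}

static bool demo_counter_is_disabled(const struct demo_hw_stats *s, int idx)
{
	size_t i = (size_t)idx;

	assert(idx >= 0 && idx < s->num_counters);
	return (s->is_disabled[i / DEMO_BITS_PER_LONG] >> (i % DEMO_BITS_PER_LONG)) & 1UL;
}

static void demo_stats_free(struct demo_hw_stats *s)
{
	free(s->is_disabled);
	s->is_disabled = NULL;
	s->num_counters = 0;
}
```

Storing "disabled" rather than "enabled" means a freshly zeroed allocation already encodes the default described in the commit.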
* RDMA/core: Add a helper API rdma_free_hw_stats_struct [Mark Zhang, 2021-10-12, files: 1, lines: -23/+4]
    Add a new API rdma_free_hw_stats_struct to pair with
    rdma_alloc_hw_stats_struct (which is also de-inlined).

    This will be useful when there are more alloc/free works in following
    patches.

    Link: https://lore.kernel.org/r/20211008122439.166063-5-markzhang@nvidia.com
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/counter: Add a descriptor in struct rdma_hw_stats [Aharon Landau, 2021-10-12, files: 1, lines: -6/+15]
    Add a counter statistic descriptor structure in rdma_hw_stats. In
    addition to the counter name, more meta-information will be added. This
    code extension is needed for optional-counter support in the following
    patches.

    Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com
    Signed-off-by: Aharon Landau <aharonl@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* Merge branch 'sg_nents' into rdma.git for-next [Jason Gunthorpe, 2021-08-30, files: 1, lines: -0/+28]
|\
| |   From Maor Gottlieb
| |
| |   ====================
| |   Fix the use of nents and orig_nents in the sg table append helpers. The
| |   nents should be used by the DMA layer to store the number of DMA mapped
| |   sges, the orig_nents is the number of CPU sges.
| |
| |   Since the sg append logic doesn't always create a SGL with exactly
| |   orig_nents entries store a total_nents as well to allow the table to be
| |   properly free'd and reorganize the freeing logic to share across all
| |   the use cases.
| |   ====================
| |
| |   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| |
| |   * 'sg_nents':
| |     RDMA: Use the sg_table directly and remove the opencoded version from umem
| |     lib/scatterlist: Fix wrong update of orig_nents
| |     lib/scatterlist: Provide a dedicated function to support table append
| * RDMA: Use the sg_table directly and remove the opencoded version from umem [Maor Gottlieb, 2021-08-24, files: 1, lines: -0/+28]
| |   This allows using the normal sg_table APIs and makes all the code
| |   cleaner. Remove sgt, nents and nmapd from ib_umem.
| |
| |   Link: https://lore.kernel.org/r/20210824142531.3877007-4-maorg@nvidia.com
| |   Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
| |   Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* | RDMA/core: Reorganize create QP low-level functions [Leon Romanovsky, 2021-08-03, files: 1, lines: -4/+12]
| |   The low-level create QP function grew to be larger than any sensible
| |   inline function should be. The inline attribute is not really needed
| |   for that function and can be implemented as exported symbol.
| |
| |   Link: https://lore.kernel.org/r/2c08709d86f876c3dfb77684357b2a939e570ca4.1628014762.git.leonro@nvidia.com
| |   Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* | RDMA: Globally allocate and release QP memory [Leon Romanovsky, 2021-08-03, files: 1, lines: -5/+25]
|/
|   Convert QP object to follow IB/core general allocation scheme. That
|   change allows us to make sure that restrack properly kref the memory.
|
|   Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
|   Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
|   Tested-by: Gal Pressman <galpress@amazon.com>
|   Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
|   Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
|   Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|   Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
|   Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* IB/core: Shuffle locks in ib_port_data to save memory [Anand Khoje, 2021-06-21, files: 1, lines: -1/+3]
    pahole shows two 4-byte holes in struct ib_port_data after pkey_list_lock
    and netdev_lock respectively.

    Shuffling netdev_lock to be after pkey_list_lock shaves off eight bytes
    from the struct.

    Link: https://lore.kernel.org/r/20210616154509.1047-3-anand.a.khoje@oracle.com
    Suggested-by: Haakon Bugge <haakon.bugge@oracle.com>
    Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
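The effect of member ordering on padding, which is what the pahole observation above is about, can be reproduced with two toy structures. These are not the real `struct ib_port_data` layout: the `demo_` types and member names are hypothetical, with 4-byte fields standing in for the locks and pointers standing in for the neighbouring members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each 4-byte field is followed by a pointer, so an ABI with 8-byte
 * pointer alignment inserts a 4-byte hole after each of them. */
struct demo_with_holes {
	uint32_t lock_a;
	void *list_a;
	uint32_t lock_b;
	void *list_b;
};

/* Same members, with the two 4-byte fields adjacent so they share one slot. */
struct demo_reordered {
	uint32_t lock_a;
	uint32_t lock_b;
	void *list_a;
	void *list_b;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t demo_saved_bytes(void)
{
	assert(sizeof(struct demo_reordered) <= sizeof(struct demo_with_holes));
	return sizeof(struct demo_with_holes) - sizeof(struct demo_reordered);
}
```

On a typical 64-bit ABI the first layout would be expected to carry two 4-byte holes that the second avoids; on ABIs where pointers are 4 bytes the two sizes would likely be equal.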
* RDMA/mlx5: Enable Relaxed Ordering by default for kernel ULPs [Avihai Horon, 2021-06-21, files: 1, lines: -0/+8]
    Relaxed Ordering is a capability that can only benefit users that support
    it. All kernel ULPs should support Relaxed Ordering, as they are designed
    to read data only after observing the CQE and use the DMA API correctly.

    Hence, implicitly enable Relaxed Ordering by default for MR transfers in
    kernel ULPs.

    Link: https://lore.kernel.org/r/b7e820aab7402b8efa63605f4ea465831b3b1e5e.1623236426.git.leonro@nvidia.com
    Signed-off-by: Avihai Horon <avihaih@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA: Remove rdma_set_device_sysfs_group() [Jason Gunthorpe, 2021-06-16, files: 1, lines: -23/+7]
    The driver's device group can be specified as part of the ops structure
    like the device's port group. No need for the complicated API.

    Link: https://lore.kernel.org/r/8964785a34fd3a29ff5b6693493f575b717e594d.1623427137.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA: Change ops->init_port to ops->port_groups [Jason Gunthorpe, 2021-06-16, files: 1, lines: -6/+3]
    init_port was only being used to register sysfs attributes against the
    port kobject. Now that all users are creating static attribute_group's we
    can simply set the attribute_group list in the ops and the core code can
    just handle it directly.

    This makes all the sysfs management quite straightforward and prevents
    any driver from abusing the naked port kobject in future because no
    driver code can access it.

    Link: https://lore.kernel.org/r/114f68f3d921460eafe14cea5a80ca65d81729c3.1623427137.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Create the device hw_counters through the normal groups mechanism [Jason Gunthorpe, 2021-06-16, files: 1, lines: -4/+5]
    Instead of calling device_add_groups() add the group to the existing
    groups array which is managed through device_add().

    This requires setting up the hw_counters before device_add(), so it gets
    split up from the already split port sysfs flow.

    Move all the memory freeing to the release function.

    Link: https://lore.kernel.org/r/666250d937b64f6fdf45da9e2dc0b6e5e4f7abd8.1623427137.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Split port and device counter sysfs attributes [Jason Gunthorpe, 2021-06-16, files: 1, lines: -2/+2]
    This code creates a 'struct hw_stats_attribute' for each sysfs entry that
    contains a naked 'struct attribute' inside.

    It then proceeds to attach this same structure to a 'struct device' kobj
    and a 'struct ib_port' kobj. However, this violates the typing
    requirements. 'struct device' requires the attribute to be a 'struct
    device_attribute' and 'struct ib_port' requires the attribute to be
    'struct port_attribute'.

    This happens to work because the show/store function pointers in all
    three structures happen to be at the same offset and happen to be nearly
    the same signature. This means when container_of() was used to go between
    the wrong two types it still managed to work.

    However clang CFI detection notices that the function pointers have a
    slightly different signature. As with show/store this was only working
    because the device and port struct layouts happened to have the kobj at
    the front.

    Correct this by having two independent sets of data structures for the
    port and device case. The two different attributes correctly include the
    port/device_attribute struct and everything from there up is kept split.
    The show/store function call chains start with device/port unique
    functions that invoke a common show/store function pointer.

    Link: https://lore.kernel.org/r/a8b3864b4e722aed3657512af6aa47dc3c5033be.1623427137.git.leonro@nvidia.com
    Reported-by: Nathan Chancellor <nathan@kernel.org>
    Tested-by: Nathan Chancellor <nathan@kernel.org>
    Cc: Kees Cook <keescook@chromium.org>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
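The typing rule at the heart of this change, that `container_of()` must go back to the same wrapper type the attribute was embedded in, can be illustrated with a user-space model. All `demo_` types and functions are hypothetical; the two wrappers deliberately have different `show` signatures, and each entry point converts only to its own wrapper type.

```c
#include <assert.h>
#include <stddef.h>

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic attribute embedded in both wrapper types. */
struct demo_attribute {
	const char *name;
};

/* Device-side wrapper: show() takes a device id. */
struct demo_device_attribute {
	struct demo_attribute attr;
	int (*show)(int dev_id);
};

/* Port-side wrapper: show() has a different signature. */
struct demo_port_attribute {
	struct demo_attribute attr;
	int (*show)(int port_num, int index);
};

/* Device entry point: converts only to the device wrapper. */
static int demo_device_show(struct demo_attribute *attr, int dev_id)
{
	struct demo_device_attribute *da;

	assert(attr != NULL);
	da = demo_container_of(attr, struct demo_device_attribute, attr);
	return da->show(dev_id);
}

/* Port entry point: converts only to the port wrapper. */
static int demo_port_show(struct demo_attribute *attr, int port_num, int index)
{
	struct demo_port_attribute *pa;

	assert(attr != NULL);
	pa = demo_container_of(attr, struct demo_port_attribute, attr);
	return pa->show(port_num, index);
}

/* Example callbacks with the matching signatures. */
static int demo_dev_counter(int dev_id) { return dev_id * 10; }
static int demo_port_counter(int port_num, int index) { return port_num * 100 + index; }
```

Each callback is only ever called through a pointer of its own type, which is the property a control-flow-integrity checker verifies at the call site.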
* RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer [Jason Gunthorpe, 2021-06-16, files: 1, lines: -1/+2]
    It is much saner to store a pointer to the kobject structure that
    contains the canonical stats pointer than to copy the stats pointers into
    a public structure.

    Future patches will require the sysfs pointer for other purposes.

    Link: https://lore.kernel.org/r/f90551dfd296cde1cb507bbef27cca9891d19871.1623427137.git.leonro@nvidia.com
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA: Split the alloc_hw_stats() ops to port and device variants | Jason Gunthorpe | 2021-06-16 | 1 | -6/+7
  This is being used to implement both the port and device global stats, which is causing some confusion in the drivers. For instance EFA and i40iw both seem to be misusing the device stats.

  Split it into two ops so drivers that don't support one or the other can leave the op NULL'd, making the calling code a little simpler to understand.

  Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com
  Tested-by: Gal Pressman <galpress@amazon.com>
  Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
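  A minimal sketch of the "leave the op NULL'd" pattern follows. The ops table is a simplified stand-in with reduced signatures, and the demo driver, the stats struct and the setup helpers are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hw_stats { int num_counters; };

/* Simplified stand-in for the two split ops; signatures are reduced. */
struct device_ops {
	struct hw_stats *(*alloc_hw_device_stats)(void);
	struct hw_stats *(*alloc_hw_port_stats)(uint32_t port_num);
};

static struct hw_stats demo_port_stats = { 4 };

/* Hypothetical driver callback that only knows about per-port counters. */
static struct hw_stats *demo_alloc_port_stats(uint32_t port_num)
{
	assert(port_num >= 1);
	return &demo_port_stats;
}

/* The driver has no device-wide counters, so that op simply stays NULL. */
static const struct device_ops port_only_ops = {
	.alloc_hw_port_stats = demo_alloc_port_stats,
};

/* Caller side: a NULL op means "not supported", with no other signalling. */
static struct hw_stats *setup_device_stats(const struct device_ops *ops)
{
	if (!ops->alloc_hw_device_stats)
		return NULL;
	return ops->alloc_hw_device_stats();
}

static struct hw_stats *setup_port_stats(const struct device_ops *ops,
					 uint32_t port_num)
{
	if (!ops->alloc_hw_port_stats)
		return NULL;
	return ops->alloc_hw_port_stats(port_num);
}
```

  With a single combined op, the callee has to infer from its arguments which kind of stats is being requested. With two ops, the caller's NULL check carries that information instead.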
* RDMA: Remove unnecessary struct declaration | Wan Jiabing | 2021-05-11 | 1 | -1/+0
  The declaration of struct ib_grh is unnecessary here, because it is defined at line 766.

  Link: https://lore.kernel.org/r/20210510062843.15707-1-wanjiabing@vivo.com
  Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
  Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Remove never used ib_modify_wq function call | Leon Romanovsky | 2021-05-11 | 1 | -2/+0
  The function ib_modify_wq() is not used, so remove it.

  Link: https://lore.kernel.org/r/c5e48d517b9163fe4f9ffd224050b83fdb3571c6.1620552935.git.leonro@nvidia.com
  Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/restrack: Add support to get resource tracking for SRQ | Neta Ostrovsky | 2021-04-22 | 1 | -0/+5
  In order to track SRQ resources, a new restrack object is initialized and added to the resource tracking database.

  Link: https://lore.kernel.org/r/0db71c409f24f2f6b019bf8797a8fed96fe7079c.1618753110.git.leonro@nvidia.com
  Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
  Reviewed-by: Mark Zhang <markzhang@nvidia.com>
  Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev | Mike Marciniszyn | 2021-04-07 | 1 | -0/+2
  The current rdma_netdev handling in ipoib hooks the tx_timeout handler, but prints out a totally useless message that prevents effective debugging especially when multiple transmit queues are being used.

  Add a tx_timeout rdma_netdev hook and implement the callback in the hfi1 to print additional information.

  The existing non-helpful message is avoided when the driver has presented a callback.

  Link: https://lore.kernel.org/r/1617026056-50483-3-git-send-email-dennis.dalessandro@cornelisnetworks.com
  Reviewed-by: Kaike Wan <kaike.wan@intel.com>
  Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
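  The optional-hook behaviour described above can be sketched as follows. This is a userspace approximation: the structs are reduced stand-ins, and the dispatcher, the demo driver callback and the message texts are hypothetical rather than taken from ipoib or hfi1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct net_device { const char *name; };

/* Reduced stand-in for struct rdma_netdev, keeping only the optional hook. */
struct rdma_netdev {
	void (*tx_timeout)(struct net_device *dev, unsigned int txqueue);
};

/* Records what the hypothetical driver callback saw, for inspection. */
static unsigned int last_reported_queue = (unsigned int)-1;

/* Hypothetical driver callback that reports which queue stalled. */
static void demo_driver_tx_timeout(struct net_device *dev,
				   unsigned int txqueue)
{
	last_reported_queue = txqueue;
	printf("%s: tx queue %u timed out\n", dev->name, txqueue);
}

/*
 * Hypothetical dispatcher: prefer the driver hook when one is present and
 * fall back to a generic message otherwise.
 * Returns 1 if the driver hook ran, 0 if the generic message was printed.
 */
static int handle_tx_timeout(struct rdma_netdev *rn, struct net_device *dev,
			     unsigned int txqueue)
{
	assert(rn != NULL && dev != NULL);

	if (rn->tx_timeout) {
		rn->tx_timeout(dev, txqueue);
		return 1;
	}

	printf("%s: transmit timeout\n", dev->name);
	return 0;
}
```

  The queue index is passed through to the hook, which is what lets a driver report per-queue state when several transmit queues are in use.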
* RDMA: Support more than 255 rdma ports | Mark Bloch | 2021-03-26 | 1 | -82/+95
  Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

  This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed.

  With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue.

  When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported.

  The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that rely on verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely.

  Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device, CM/MAD/IPoIB and other ULPs aren't affected by this change and their sysfs/interfaces that are exposed to userspace can remain unchanged.

  While here clean up some alignment issues and remove unneeded sanity checks (mainly in rdmavt).

  Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
  Signed-off-by: Mark Bloch <mbloch@nvidia.com>
  Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
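  The interplay between the 32-bit core view and the 8-bit legacy view can be sketched as below. The helper names and the macro are hypothetical, and the sketch assumes port numbers start at 1, which does not hold for every device type.

```c
#include <assert.h>
#include <stdint.h>

/* Largest port count the 8-bit legacy interface can express. */
#define LEGACY_MAX_PORTS UINT8_MAX

/*
 * Hypothetical helper: port count as seen through an 8-bit interface.
 * The value is clamped rather than truncated, so a device with 1000 ports
 * is reported as having 255, not 1000 % 256.
 */
static uint8_t legacy_port_count(uint32_t phys_port_cnt)
{
	if (phys_port_cnt > LEGACY_MAX_PORTS)
		return LEGACY_MAX_PORTS;
	return (uint8_t)phys_port_cnt;
}

/*
 * Hypothetical helper: core-side validity check using the full 32-bit
 * range. UINT32_MAX is treated as reserved, mirroring the commit message.
 */
static int port_is_valid(uint32_t phys_port_cnt, uint32_t port)
{
	assert(phys_port_cnt != UINT32_MAX);
	return port >= 1 && port <= phys_port_cnt;
}
```

  Clamping matters because a plain cast to an 8-bit type would silently wrap: port count 300 would become 44 instead of 255.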
* RDMA/core: Remove unused req_ncomp_notif device operation | Gal Pressman | 2021-03-11 | 1 | -15/+0
  The request_ncomp_notif device operation and function are unused, remove them.

  Link: https://lore.kernel.org/r/20210311150921.23726-1-galpress@amazon.com
  Signed-off-by: Gal Pressman <galpress@amazon.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Introduce and use API to read port immutable data | Parav Pandit | 2021-02-05 | 1 | -0/+3
  Currently the mlx5 driver caches the port GID table length for 2 ports. It is also cached by the IB core as port immutable data.

  When mlx5 representor ports are present, which are usually more than 2, an invalid access to the port_caps array can happen while validating the GID table length, because the array only covers 2 ports.

  To avoid this, rely on the IB core's port immutable data by exposing an API to read the port immutable fields.

  Remove the mlx5 driver's internal cache, thereby reducing code and data.

  Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.org
  Signed-off-by: Parav Pandit <parav@nvidia.com>
  Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* RDMA/core: Add device method for registering dma-buf based memory region | Jianxin Xiong | 2021-01-20 | 1 | -1/+5
  Dma-buf based memory region requires one extra parameter and is processed quite differently. Adding a separate method allows clean separation from regular memory regions.

  Link: https://lore.kernel.org/r/1608067636-98073-3-git-send-email-jianxin.xiong@intel.com
  Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
  Reviewed-by: Sean Hefty <sean.hefty@intel.com>
  Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
  Acked-by: Christian Koenig <christian.koenig@amd.com>
  Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
  Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>