summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx4
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'rdma-for-linus' of ↵Linus Torvalds2014-04-034-29/+27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "Main batch of InfiniBand/RDMA changes for 3.15: - The biggest change is core API extensions and mlx5 low-level driver support for handling DIF/DIX-style protection information, and the addition of PI support to the iSER initiator. Target support will be arriving shortly through the SCSI target tree. - A nice simplification to the "umem" memory pinning library now that we have chained sg lists. Kudos to Yishai Hadas for realizing our code didn't have to be so crazy. - Another nice simplification to the sg wrappers used by qib, ipath and ehca to handle their mapping of memory to adapter. - The usual batch of fixes to bugs found by static checkers etc. from intrepid people like Dan Carpenter and Yann Droneaud. - A large batch of cxgb4, ocrdma, qib driver updates" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits) RDMA/ocrdma: Unregister inet notifier when unloading ocrdma RDMA/ocrdma: Fix warnings about pointer <-> integer casts RDMA/ocrdma: Code clean-up RDMA/ocrdma: Display FW version RDMA/ocrdma: Query controller information RDMA/ocrdma: Support non-embedded mailbox commands RDMA/ocrdma: Handle CQ overrun error RDMA/ocrdma: Display proper value for max_mw RDMA/ocrdma: Use non-zero tag in SRQ posting RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr() RDMA/ocrdma: Increment abi version count RDMA/ocrdma: Update version string be2net: Add abi version between be2net and ocrdma RDMA/ocrdma: ABI versioning between ocrdma and be2net RDMA/ocrdma: Allow DPP QP creation RDMA/ocrdma: Read ASIC_ID register to select asic_gen RDMA/ocrdma: SQ and RQ doorbell offset clean up RDMA/ocrdma: EQ full catastrophe avoidance RDMA/cxgb4: Disable DSGL use by default RDMA/cxgb4: rx_data() needs to hold the ep mutex ...
| *---. Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', ↵Roland Dreier2014-04-032-6/+7
| |\ \ \ | | | | | | | | | | | | | | | 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next
| | | | * IB/mlx4: Fix a sparse endianness warningBart Van Assche2014-03-171-1/+1
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following warning for the mlx4 driver: $ make M=drivers/infiniband C=2 CF=-D__CHECK_ENDIAN__ drivers/infiniband/hw/mlx4/qp.c:1885:31: warning: restricted __be16 degrades to integer Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * mlx4_core: Make buffer larger to avoid overflow warningDan Carpenter2014-04-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My static checker complains that the sprintf() here can overflow. drivers/infiniband/hw/mlx4/main.c:1836 mlx4_ib_alloc_eqs() error: format string overflow. buf_size: 32 length: 69 This seems like a valid complaint. The "dev->pdev->bus->name" string can be 48 characters long. I just made the buffer 80 characters instead of 69 and I changed the sprintf() to snprintf(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * mlx4_core: Fix some indenting in mlx4_ib_add()Dan Carpenter2014-04-011-2/+3
| | |/ | | | | | | | | | | | | | | | | | | | | | The code was indented too far and also kernel style says we should have curly braces. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * / IB: Refactor umem to use linear SG tableYishai Hadas2014-03-042-23/+20
| |/ | | | | | | | | | | | | | | | | | | | | This patch refactors the IB core umem code and vendor drivers to use a linear (chained) SG table instead of chunk list. With this change the relevant code becomes clearer—no need for nested loops to build and use umem. Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | mlx4: Use actual number of PCI functions (PF + VFs) for alias GUID logicMatan Barak2014-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The code which is dealing with SRIOV alias GUIDs in the mlx4 IB driver has some logic which operated according to the maximal possible active functions (PF + VFs). After the single port VFs code integration this resulted in a flow of false-positive warnings going to the kernel log after the PF driver started the alias GUID work. Fix it by referring to the actual number of functions. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4: Adapt code for N-Port VFMatan Barak2014-03-203-18/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for N-Port VFs, this includes: 1. Adding support in the wrapped FW command In wrapped commands, we need to verify and convert the slave's port into the real physical port. Furthermore, when sending the response back to the slave, a reverse conversion should be made. 2. Adjusting sqpn for QP1 para-virtualization The slave assumes that sqpn is used for QP1 communication. If the slave is assigned to a port != (first port), we need to adjust the sqpn that will direct its QP1 packets into the correct endpoint. 3. Adjusting gid[5] to modify the port for raw ethernet In B0 steering, gid[5] contains the port. It needs to be adjusted into the physical port. 4. Adjusting number of ports in the query / ports caps in the FW commands When a slave queries the hardware, it needs to view only the physical ports it's assigned to. 5. Adjusting the sched_qp according to the port number The QP port is encoded in the sched_qp, thus in modify_qp we need to encode the correct port in sched_qp. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | IB/mlx4_ib: Adapt code to use caps.num_ports instead of a constantMatan Barak2014-03-201-4/+4
| | | | | | | | | | | | | | | | | | | | | | Some code in the mlx4 IB driver stack assumed MLX4_MAX_PORTS ports. Instead, we should only loop until the number of actual ports in i the device, which is stored in dev->caps.num_ports. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: Activate RoCE/SRIOVJack Morgenstein2014-03-121-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To activate RoCE/SRIOV, need to remove the following: 1. In mlx4_ib_add, need to remove the error return preventing initialization of a RoCE port under SRIOV. 2. In update_vport_qp_params (in resource_tracker.c) need to remove the error return when a RoCE RC or UD qp is detected. This error return causes the INIT-to-RTR qp transition to fail in the wrapper function under RoCE/SRIOV. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4_ib: Fix SIDR support of for UD QPs under SRIOV/RoCEShani Michaelli2014-03-121-15/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * Handle CM_SIDR_REQ_ATTR_ID and CM_SIDR_REP_ATTR_ID in multiplex_cm_handler and demux_cm_handler. * Handle Service ID Resolution messages and REQ messages separately, for their formats are different. Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: Implement IP based gids support for RoCE/SRIOVJack Morgenstein2014-03-125-24/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since there is no connection between the MAC/VLAN and the GID when using IP-based addressing, the proxy QP1 (running on the slave) must pass the source-mac, destination-mac, and vlan_id information separately from the GID. Additionally, the Host must pass the remote source-mac and vlan_id back to the slave, This is achieved as follows: Outgoing MADs: 1. Source MAC: obtained from the CQ completion structure (struct ib_wc, smac field). 2. Destination MAC: obtained from the tunnel header 3. vlan_id: obtained from the tunnel header. Incoming MADs 1. The source (i.e., remote) MAC and vlan_id are passed in the tunnel header to the proxy QP1. VST mode support: For outgoing MADs, the vlan_id obtained from the header is discarded, and the vlan_id specified by the Hypervisor is used instead. For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the "invalid" vlan (0xffff) is substituted when forwarding to the slave. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: Add ref counting to port MAC table for RoCEJack Morgenstein2014-03-122-43/+261
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IB side of RoCE requires the MAC table index of the MAC address used by its QPs. To obtain the real MAC index, the IB side registers the MAC (increasing its ref count, and also returning the real MAC index) during the modify-qp sequence. This protects against the ETH side deleting or modifying that MAC table entry while the QP is active. Note that until the modify-qp command returns success, the MAC and VLAN information only has "candidate" status. If the modify-qp succeeds, the "candidate" info is promoted to the operational MAC/VLAN info for the qp. If the modify fails, the candidate MAC/VLAN is unregistered, and the old qp info is preserved. The patch is a bit complex, because there are multiple qp transitions where the primary-path information may be modified: INIT-to-RTR, and SQD-to-SQD. Similarly for the alternate path information. Therefore the code must handle cases where path information has already been entered into the QP context by previous qp transitions. For the MAC address, the success logic is as follows: 1. If there was no previous MAC, simply move the candidate MAC information to the operational information, and reset the candidate MAC info. 2. If there was a previous MAC, unregister it. Then move the MAC information from candidate to operational, and reset the candidate info (as in 1. above). The MAC address failure logic is the same for all cases: - Unregister the candidate MAC, and reset the candidate MAC info. For Vlan registration, the logic is similar. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: In RoCE allow guests to have multiple GIDSJack Morgenstein2014-03-121-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | The GIDs are statically distributed, as follows: PF: gets 16 GIDs VFs: Remaining GIDS are divided evenly between VFs activated by the driver. If the division is not even, lower-numbered VFs get an extra GID. For an IB interface, the number of gids per guest remains as before: one gid per guest. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: Adjust QP1 multiplexing for RoCE/SRIOVJack Morgenstein2014-03-123-20/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This requires the following modifications: 1. Fix build_mlx4_header to properly fill in the ETH fields 2. Adjust mux and demux QP1 flow to support RoCE. This commit still assumes only one GID per slave for RoCE. The commit enabling multiple GIDs is a subsequent commit, and is done separately because of its complexity. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net,IB/mlx: Bump all Mellanox driver versionsAmir Vadai2014-02-251-2/+2
|/ | | | | | | Bump all Mellanox driver versions. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*-. Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', ↵Roland Dreier2014-02-141-50/+133
|\ \ | | | | | | | | | 'ocrdma', 'qib' and 'usnic' into for-next
| | * IB/mlx4: Build the port IBoE GID table properly under bondingMoni Shoua2014-02-131-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When scanning netdevices we need to check a few more conditions and cases to build the IBoE GID table properly. For example, under bonding we must make sure that when a port is down, the bond IP address isn't programmed as a GID, since doing so will cause failure with IB core flows that selects ports by GID. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/mlx4: Do IBoE GID table resets per-portMoni Shoua2014-02-131-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IBoE code used to reset the GID table did it for all Ethernet ports of the device. Since the whole architecture of generating GIDs and responding to events is port-based, this is inefficient and can lead to wrong content in the GID table. Change the reset flow to be per-port. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/mlx4: Do IBoE locking earlier when initializing the GID tableMoni Shoua2014-02-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updating the GID table under IBoE requires read/write from/to shared data structures. These data structures are protected with the device iboe lock. The flows that modify the GID table start from 1. Initializing the GID table 2. NETDEV events 3. INET or INET6 events This patch makes sure that the flow of initializing the GID table is consistent with the other two flows w.r.t on what step the lock is taken. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/mlx4: Move rtnl locking to the right placeMoni Shoua2014-02-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the one hand, the invocation of netdev_master_upper_dev_get() within mlx4_ib_scan_netdevs() must be done with rtnl lock held. On the other hand, it's wrong to call rtnl_lock() from within this function since it's also called by our netdev notifier callback. Therefore move the locking to mlx4_ib_add() so that both cases are covered. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/mlx4: Make sure GID index 0 is always occupiedMoni Shoua2014-02-131-24/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure that for Ethernet ports, the port GID table index 0 is always occupied with a default GID of the relevant IPv6 link-local adderss. This provides better user experience for legacy applications that don't use the RDMA CM and were working on index 0 prior to the IP addressing change. Also, as GIDs are generated from IP addresses of the network devices that are associated with the port, it's basically possible that the GID table will be empty if no IP address was assigned. This doesn't comply with the IB spec section 4.1.1 "GID usage and properties". Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only deviceMatan Barak2014-02-131-1/+6
| |/ | | | | | | | | | | | | | | | | | | | | | | When the device has only Ethernet ports, don't try to allocate range of steerable UD QPs since they aren't needed. This fixes an issue where mlx4 VFs tried to allocate a range of UD steerable QPs, but failed to do so. Fixes: c1c98501121e ("IB/mlx4: Add support for steerable IB UD QPs") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* / IB: Report using RoCE IP based gids in port capsMoni Shoua2014-02-131-1/+1
|/ | | | | | | | | | | | For userspace RoCE UD QPs we need to know the GID format that the kernel uses, e.g when working over older kernels. For that end, add a new port capability IB_PORT_IP_BASED_GIDS and report it when query port is issued. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge branch 'ip-roce' into for-nextRoland Dreier2014-01-226-199/+437
|\ | | | | | | | | Conflicts: drivers/infiniband/hw/mlx4/main.c
| * IB/mlx4: Use IS_ENABLED(CONFIG_IPV6)Roland Dreier2014-01-191-6/+6
| | | | | | | | | | | | ...instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/mlx4: Add dependency INETMatan Barak2014-01-191-1/+1
| | | | | | | | | | | | | | | | Since mlx4_ib supports IP based addressing, a dependency on INET needs to be added, since mlx4_ib registers itself for net device events. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressingMoni Shoua2014-01-184-58/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN. Therefore, we need to extract them from the CQE and place them in struct ib_wc (to be used for cases were they were taken from the gid). Also, when modifying a QP or building address handle, instead of parsing the dgid to get the MAC and VLAN, take them from the address handle attributes. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID tableMoni Shoua2014-01-182-143/+334
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the mlx4 driver set IBoE (RoCE) gids to encode related Ethernet netdevice interface MAC address and possibly VLAN id. Change this scheme such that gids encode interface IP addresses (both IP4 and IPv6). This requires learning the IP addresses which are of use by a netdevice associated with the HCA port, formatting them to gids and adding them to the port gid table. Furthermore, events of add and delete address are caught to maintain the gid table accordingly. Associated IP addresses may belong to a master of an Ethernet netdevice on top of that port so this should be considered when building and maintaining the gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/core: Ethernet L2 attributes in verbs/cm structuresMatan Barak2014-01-141-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch add the support for Ethernet L2 attributes in the verbs/cm/cma structures. When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority in a similar manner that the IB L2 (and the L4 PKEY) attributes are used. Thus, those attributes were added to the following structures: * ib_ah_attr - added dmac * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority) * ib_wc - added smac, vlan_id * ib_sa_path_rec - added smac, dmac, vlan_id * cm_av - added smac and vlan_id For the path record structure, extra care was taken to avoid the new fields when packing it into wire format, so we don't break the IB CM and SA wire protocol. On the active side, the CM fills. its internal structures from the path provided by the ULP. We add there taking the ETH L2 attributes and placing them into the CM Address Handle (struct cm_av). On the passive side, the CM fills its internal structures from the WC associated with the REQ message. We add there taking the ETH L2 attributes from the WC. When the HW driver provides the required ETH L2 attributes in the WC, they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core code checks for the presence of these flags, and in their absence does address resolution from the ib_init_ah_from_wc() helper function. ib_modify_qp_is_ok is also updated to consider the link layer. Some parameters are mandatory for Ethernet link layer, while they are irrelevant for IB. Vendor drivers are modified to support the new function signature. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| |
| \
*-. \ Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', ↵Roland Dreier2014-01-224-22/+323
|\ \ \ | |_|/ |/| | | | | 'ocrdma', 'qib', 'srp' and 'usnic' into for-next
| | * IB/mlx4: Fix error return codeJulia Lawall2014-01-181-2/+6
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the return variable to an error code as done elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> ( if@p1 (\(ret < 0\|ret != 0\)) { ... return ret; } | ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/mlx4: Add support for steerable IB UD QPsMatan Barak2014-01-143-7/+163
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for steerable (NETIF) QP creation. When we create the device, we allocate a range of steerable QPs. Afterward when a QP is created with the NETIF flag, it's allocated from this range. Allocation is managed by bitmap allocator. Internal steering rules for those QPs is automatically generated on their creation. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/mlx4: Add mechanism to support flow steering over IB linksMatan Barak2014-01-141-1/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mlx4 device requires adding IB flow spec to rules that apply over infiniband link layer. This patch adds a mechanism to add such a rule. If higher levels e.g. IP/UDP/TCP flow specs are provided, the device requires us to add an empty wild-carded IB rule. Furthermore, the device requires the QPN to be put in the rule. Add here specific parsing support for IB empty rules and the ability to self-generate missing specs based on existing ones. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IB/mlx4: Enable device-managed steering support for IB ports tooMatan Barak2014-01-142-12/+20
|/ | | | | | | | | | | | Up until now, flow steering wasn't supported when using IB ports. This patch enables support for flow steering if all hardware ports support that, for example the new MLX4_DEV_CAP_FLAG2_DMFS_IPOIB mlx4 device capability. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge tag 'rdma-for-linus' of ↵Linus Torvalds2013-11-182-7/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) IB/core: Re-enable create_flow/destroy_flow uverbs IB/core: extended command: an improved infrastructure for uverbs commands IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: Use a common header for uverbs flow_specs IB/core: Make uverbs flow structure use names like verbs ones IB/core: Rename 'flow' structs to match other uverbs structs IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/ucma: Convert use of typedef ctl_table to struct ctl_table IB/cm: Convert to using idr_alloc_cyclic() IB/mlx5: Fix page shift in create CQ for userspace IB/mlx4: Fix device max capabilities check IB/mlx5: Fix list_del of empty list IB/mlx5: Remove dead code IB/core: Encorce MR access rights rules on kernel consumers IB/mlx4: Fix endless loop in resize CQ RDMA/cma: Remove unused argument and minor dead code RDMA/ucma: Discard events for IDs not yet claimed by user space IB/core: Add Cisco usNIC rdma node and transport types RDMA/nes: Remove self-assignment from nes_query_qp() IB/srp: Report receive errors correctly ...
| *---. Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', ↵Roland Dreier2013-11-172-7/+10
| |\ \ \ | | | | | | | | | | | | | | | 'nes', 'ocrdma', 'qib' and 'srp' into for-next
| | | | * IB/mlx4: Fix device max capabilities checkEli Cohen2013-11-151-1/+6
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | Move the check on max supported CQEs after the final number of entries is evaluated. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * IB/mlx4: Fix endless loop in resize CQEli Cohen2013-11-151-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | When calling get_sw_cqe() we need pass the consumer_index and not the masked value. Failure to do so will cause incorrect result of get_sw_cqe() possibly leading to endless loop. This problem was reported and analyzed by Michael Rice from HP. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/core: Re-enable create_flow/destroy_flow uverbsMatan Barak2013-11-171-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit reverts commit 7afbddfae993 ("IB/core: Temporarily disable create_flow/destroy_flow uverbs"). Since the uverbs extensions functionality was experimental for v3.12, this patch re-enables the support for them and flow-steering for v3.13. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/core: extended command: an improved infrastructure for uverbs commandsYann Droneaud2013-11-171-3/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs") exported ib_create_flow()/ib_destroy_flow() functions using this new infrastructure. According to the commit 400dbc96583f, the purpose of this infrastructure is to support passing around provider (eg. hardware) specific buffers when userspace issue commands to the kernel, so that it would be possible to extend uverbs (eg. core) buffers independently from the provider buffers. But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1]. So the following patch is an attempt to a revised extensible command infrastructure. This improved extensible command infrastructure distinguish between core (eg. legacy)'s command/response buffers from provider (eg. hardware)'s command/response buffers: each extended command implementing function is given a struct ib_udata to hold core (eg. uverbs) input and output buffers, and another struct ib_udata to hold the hw (eg. provider) input and output buffers. Having those buffers identified separately make it easier to increase one buffer to support extension without having to add some code to guess the exact size of each command/response parts: This should make the extended functions more reliable. Additionally, instead of relying on command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on unused bits in command field: on the 32 bits provided by command field, only 6 bits are really needed to encode the identifier of commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands). So this patch makes use of some high order bits in command field to store flags, leaving enough room for more command identifiers than one will ever need (eg. 256). The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make usage of flags itself extensible. Using high order bits of the commands field ensure that newer libibverbs on older kernel will properly fail when trying to call extended commands. On the other hand, older libibverbs on newer kernel will never be able to issue calls to extended commands. The extended command header includes the optional response pointer so that output buffer length and output buffer pointer are located together in the command, allowing proper parameters checking. This should make implementing functions easier and safer. Additionally the extended header ensure 64bits alignment, while making all sizes multiple of 8 bytes, extending the maximum buffer size: legacy extended Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) For the purpose of doing proper buffer size accounting, the headers size are no more taken in account in "in_words". One of the odds of the current extensible infrastructure, reading twice the "legacy" command header, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command: memory is read once and information are not duplicated: it's making clear that's an extended command scheme and not a different command scheme. The proposed scheme will format input (command) and output (response) buffers this way: - command: legacy header + extended header + command data (core + hw): +----------------------------------------+ | flags | 00 00 | command | | in_words | out_words | +----------------------------------------+ | response | | response | | provider_in_words | provider_out_words | | padding | +----------------------------------------+ | | . <uverbs input> . . (in_words * 8) . | | +----------------------------------------+ | | . <provider input> . . (provider_in_words * 8) . | | +----------------------------------------+ - response, if present: +----------------------------------------+ | | . <uverbs output space> . . (out_words * 8) . | | +----------------------------------------+ | | . <provider output space> . . (provider_out_words * 8) . | | +----------------------------------------+ The overall design is to ensure that the extensible infrastructure is itself extensible while begin more reliable with more input and bound checking. Note: The unused field in the extended header would be perfect candidate to hold the command "comp_mask" (eg. bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But "comp_mask" field is likely to be present in the uverb input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header. [1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com [2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com [3]: http://marc.info/?i=525C1149.6000701@mellanox.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
* | net/mlx4_core: Initialize all mailbox buffers to zero before useJack Morgenstein2013-11-071-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To guarantee that all unused fields in all FW commands for both inboxes and outboxes are zeroed out, initialize the mailbox buffer to all zeroes. This is especially important for SRIOV comm-channel virtual commands (such as QUERY_FUNC_CAP), where if new fields are added to support new features, the driver can depend on older kernels passing zeroes in these fields. In addition to zeroing out the mailbox buffer at allocation time, all (now unnecessary) calls to memset by the callers of mlx4_alloc_cmd_mailbox() are removed. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: Structures and init/teardown for VF resource quotasJack Morgenstein2013-11-041-4/+4
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is step #1 for implementing SRIOV resource quotas for VFs. Quotas are implemented per resource type for VFs and the PF, to prevent any entity from simply grabbing all the resources for itself and leaving the other entities unable to obtain such resources. Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC, VLAN, and Counters. The quota system works as follows: Each entity (VF or PF) is given a max number of a given resource (its quota), and a guaranteed minimum number for each resource (starvation prevention). For QPs, CQs, SRQs, MPTs and MTTs: 50% of the available quantity for the resource is divided equally among the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool. The other 50% is the "free pool", allocated on a first-come-first-serve basis. For each VF/PF, resources are first allocated from its "guaranteed-minimum" pool. When that pool is exhausted, the driver attempts to allocate from the resource "free-pool". The quota (i.e., max) for the VFs and the PF is: The free-pool amount (50% of the real max) + the guaranteed minimum For MACs: Guarantee 2 MACs per VF/PF per port. As a result, since we have only 128 MACs per port, reduce the allowable number of VFs from 64 to 63. Any remaining MACs are put into a free pool. For VLANs: For the PF, the per-port quota is 128 and guarantee is 64 (to allow the PF to register at least a VLAN per VF in VST mode). For the VFs, the per-port quota is 64 and the guarantee is 0. We assume that VGT VFs are trusted not to abuse the VLAN resource. For Counters: For all functions (PF and VFs), the quota is 128 and the guarantee is 0. In this patch, we define the needed structures, which are added to the resource-tracker struct. In addition, we do initialization for the resource quota, and adjust the query_device response to use quotas rather than resource maxima. As part of the implementation, we introduce a new field in mlx4_dev: quotas. This field holds the resource quotas used to report maxima to the upper layers (ib_core, via query_device). The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so that they may continue to use these in handling QPs, CQs, SRQs and MPTs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IB/core: Temporarily disable create_flow/destroy_flow uverbsYann Droneaud2013-10-211-0/+2
| | | | | | | | | | | | | | | | | | | | The create_flow/destroy_flow uverbs and the associated extensions to the user-kernel verbs ABI are under review and are too experimental to freeze at this point. So userspace is not exposed to experimental features and an uinstable ABI, temporarily disable this for v3.12 (with a Kconfig option behind staging to reenable it if desired). The feature will be enabled after proper cleanup for v3.13. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com [ Add a Kconfig option to reenable these verbs. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/mlx4: Add receive flow steering supportHadar Hen Zion2013-08-282-0/+247
| | | | | | | | | | | | | | | | Implement ib_create_flow() and ib_destroy_flow(). Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides a 64-bit registration ID, which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is retuned to caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/mlx4: Use default pkey when creating tunnel QPsJack Morgenstein2013-07-311-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located. If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior). This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. Its possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0). At some point after the VF probe, the cloud computing interface at the hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails. This commit causes the hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index > 0) at tunnel QP creation time, so that the tunnel QP creation will succeed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* net: pass info struct via netdevice notifierJiri Pirko2013-05-281-1/+1
| | | | | | | | | | | | | | So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'rdma-for-linus' of ↵Linus Torvalds2013-05-082-0/+27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - XRC transport fixes - Fix DHCP on IPoIB - mlx4 preparations for flow steering - iSER fixes - miscellaneous other fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits) IB/iser: Add support for iser CM REQ additional info IB/iser: Return error to upper layers on EAGAIN registration failures IB/iser: Move informational messages from error to info level IB/iser: Add module version mlx4_core: Expose a few helpers to fill DMFS HW strucutures mlx4_core: Directly expose fields of DMFS HW rule control segment mlx4_core: Change a few DMFS fields names to match firmare spec mlx4: Match DMFS promiscuous field names to firmware spec mlx4_core: Move DMFS HW structs to common header file IB/mlx4: Set link type for RAW PACKET QPs in the QP context IB/mlx4: Disable VLAN stripping for RAW PACKET QPs mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level RDMA/iwcm: Don't touch cmid after dropping reference IB/qib: Correct qib_verbs_register_sysfs() error handling IB/ipath: Correct ipath_verbs_register_sysfs() error handling RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled SRPT: Fix odd use of WARN_ON() IPoIB: Fix ipoib_hard_header() return value RDMA: Rename random32() to prandom_u32() RDMA/cxgb3: Fix uninitialized variable ...
| *-. Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' ↵Roland Dreier2013-05-083-1/+28
| |\ \ | | | | | | | | | | | | into for-next
| | | * IB/mlx4: Set link type for RAW PACKET QPs in the QP contextEli Cohen2013-04-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the link type is Ethernet, setting the link type in the QP context will enable TCP/IP stateless offloads (checksum, LSO, RSS) for RAW PACKET Ethernet QPs. For IB UD QPs this worked OK since the value assumed by the firmware for IB link layer is zero. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>