Commit message | Author | Age | Files | Lines
* IB/hfi1: Add an s_acked_ack_queue pointer | Kaike Wan | 2019-02-05 | 6 | -7/+41
|   The s_ack_queue is managed by two pointers into the ring: r_head_ack_queue and s_tail_ack_queue. r_head_ack_queue is the index of where the next received request is going to be placed, and s_tail_ack_queue is the entry of the request currently being processed. This works perfectly fine for normal Verbs because the requests are processed one at a time and s_tail_ack_queue is not moved until the request that it points to is fully completed. In this fashion, s_tail_ack_queue constantly chases r_head_ack_queue, and the two pointers can easily be used to determine "queue full" and "queue empty" conditions. Detecting these two conditions is important in determining when an old entry can safely be overwritten with a newly received request and the resources associated with the old request can be safely released.
|
|   When pipelined TID RDMA WRITE is introduced into this mix, things look very different. r_head_ack_queue is still the point at which a newly received request will be inserted, and s_tail_ack_queue is still the currently processed request. However, with pipelined TID RDMA WRITE requests, s_tail_ack_queue moves to the next request once all TID RDMA WRITE responses for that request have been sent. The rest of the protocol for a particular request is managed by other pointers specific to TID RDMA (r_tid_tail and r_tid_ack), which point, respectively, to the entry for which the next TID RDMA DATA packets are going to arrive and to the request for which the next TID RDMA ACK packets are to be generated. This means that entries in the ring which are "behind" s_tail_ack_queue (entries which s_tail_ack_queue has gone past) can no longer be assumed to be complete. This is where the problem lies: a newly received request could potentially overwrite a still-active TID RDMA WRITE request.
|
|   The reason why the TID RDMA pointers trail s_tail_ack_queue is that the normal Verbs send engine uses s_tail_ack_queue as the pointer for the next response. Since TID RDMA WRITE responses are processed by the normal Verbs send engine, s_tail_ack_queue had to be moved to the next entry once all TID RDMA WRITE response packets were sent, to get the desired pipelining between requests. Doing otherwise would mean that the normal Verbs send engine would not be able to send the TID RDMA WRITE responses for the next TID RDMA request until the current one is fully completed.
|
|   This patch introduces the s_acked_ack_queue index to point to the next request to complete on the responder side. For requests other than TID RDMA WRITE, s_acked_ack_queue should always be kept in sync with s_tail_ack_queue. For TID RDMA WRITE requests, it may fall behind s_tail_ack_queue.
|
|   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
|   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
|   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
|   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
|   Signed-off-by: Doug Ledford <dledford@redhat.com>
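The following minimal userspace sketch in C models a ring with a head, a tail and an acked index, to make the relationship between them concrete. The struct and function names (ack_ring, ack_ring_insert and so on) are hypothetical and are not the hfi1 definitions. The sketch also assumes the common convention of leaving one slot empty so that "full" and "empty" can be told apart. The key point is that the "queue full" test compares the head against the acked index rather than the tail. As a result, a slot that the tail has already moved past cannot be reused while its request is still active.

```c
#include <assert.h>
#include <stdbool.h>

#define ACK_RING_SIZE 8 /* hypothetical ring size */

/* Hypothetical model of the responder's s_ack_queue indices. */
struct ack_ring {
	unsigned int head;  /* like r_head_ack_queue: where the next request goes */
	unsigned int tail;  /* like s_tail_ack_queue: request whose responses are sent next */
	unsigned int acked; /* like s_acked_ack_queue: next request to fully complete */
};

static unsigned int ring_next(unsigned int i)
{
	return (i + 1) % ACK_RING_SIZE;
}

/* One slot stays empty, so "full" means head would catch up with acked. */
static bool ack_ring_can_insert(const struct ack_ring *r)
{
	return ring_next(r->head) != r->acked;
}

/* Store a newly received request; fails instead of overwriting an active one. */
static bool ack_ring_insert(struct ack_ring *r)
{
	if (!ack_ring_can_insert(r))
		return false;
	r->head = ring_next(r->head);
	return true;
}

/* All responses for the request at tail have been sent. */
static void ack_ring_advance_tail(struct ack_ring *r)
{
	assert(r->tail != r->head);
	r->tail = ring_next(r->tail);
}

/* The request at acked is fully complete; only now may its slot be reused. */
static void ack_ring_complete(struct ack_ring *r)
{
	assert(r->acked != r->tail);
	r->acked = ring_next(r->acked);
}
```

With 8 slots, inserting 7 requests fills the ring. Advancing only the tail does not free a slot, but completing the oldest request does. For non-TID requests, a caller would advance the tail and complete in lockstep, which keeps acked equal to tail.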
* IB/hfi1: Allow for extra entries in QP's s_ack_queue | Kaike Wan | 2019-02-05 | 2 | -1/+12
|   The TID RDMA WRITE protocol differs from normal IB RDMA WRITE in that TID RDMA WRITE requests do require responses, not just ACKs. Therefore, TID RDMA WRITE requests need to be treated as RDMA READ requests from the point of view of the QPs' s_ack_queue. In other words, the QPs need to allow for TID RDMA WRITE requests to be stored in their s_ack_queue. However, because the user does not know anything about the TID RDMA capability and/or protocols, these extra entries in the queue cannot be advertised to the user.
|
|   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
|   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
|   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
|   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
|   Signed-off-by: Doug Ledford <dledford@redhat.com>
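Here is a sketch of the sizing rule, under stated assumptions. The queue is allocated with the user-visible limit plus some hidden, driver-private entries, but only the user-visible limit is reported. The names dev_params, advertised_rd_atomic and ack_queue_entries are hypothetical. The extra "+ 1" slot assumes a ring that keeps one entry free to distinguish full from empty.

```c
#include <assert.h>

/* Hypothetical device parameters; not the rdmavt structures. */
struct dev_params {
	unsigned int max_rdma_atomic;   /* limit advertised to the user */
	unsigned int extra_rdma_atomic; /* hidden entries for driver-private requests */
};

/* What the user is told the QP can hold. */
static unsigned int advertised_rd_atomic(const struct dev_params *p)
{
	return p->max_rdma_atomic;
}

/* What is actually allocated: advertised + hidden + one always-empty slot. */
static unsigned int ack_queue_entries(const struct dev_params *p)
{
	assert(p->max_rdma_atomic > 0);
	return p->max_rdma_atomic + p->extra_rdma_atomic + 1;
}
```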
* IB/hfi1: Build TID RDMA WRITE request | Kaike Wan | 2019-02-05 | 5 | -0/+104
|   This patch adds the functions to build a TID RDMA WRITE request. The work request opcode, packet opcode, and packet formats for the TID RDMA WRITE protocol are also defined in this patch.
|
|   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
|   Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
|   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
|   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
|   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
|   Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge branch 'tid-read' into hfi1-tid | Doug Ledford | 2019-02-05 | 27 | -173/+4669
|\    This is the series for adding TID RDMA read. Kaike put a lot of effort into making this more consumable for review, so special thanks to him. Allocating resources and tracing are separated out, followed by patches which build up the read request. Then we have the patches to receive incoming TID RDMA read requests and handle integration with the RC protocol.
| |
| |   See the cover letter of the original posting for a more detailed overview of TID. https://www.spinics.net/lists/linux-rdma/msg66611.html
| |
| |   * tid-read:
| |     IB/hfi1: Add static trace for TID RDMA READ protocol
| |     IB/hfi1: Enable TID RDMA READ protocol
| |     IB/hfi1: Add interlock between a TID RDMA request and other requests
| |     IB/hfi1: Integrate TID RDMA READ protocol into RC protocol
| |     IB/hfi1: Increment the retry timeout value for TID RDMA READ request
| |     IB/hfi1: Add functions for restarting TID RDMA READ request
| |     IB/hfi1: Add TID RDMA handlers
| |     IB/hfi1: Add functions to receive TID RDMA READ response
| |     IB/hfi1: Add a function to build TID RDMA READ response
| |     IB/hfi1: Add functions to receive TID RDMA READ request
| |     IB/hfi1: Set PbcInsertHcrc for TID RDMA packets
| |     IB/hfi1: Add functions to build TID RDMA READ request
| |     IB/hfi1: Add static trace for flow and TID management functions
| |     IB/hfi1: Add the counter n_tidwait
| |     IB/hfi1: TID RDMA RcvArray programming and TID allocation
| |     IB/hfi1: TID RDMA flow allocation
| |     IB/hfi: Move RC functions into a header file
| |
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add static trace for TID RDMA READ protocol | Kaike Wan | 2019-02-05 | 7 | -9/+684
| |   This patch makes the following changes to the static trace:
| |   1. Adds the decoding of TID RDMA READ packets in the IB header trace;
| |   2. Tracks qpriv->s_flags and iow_flags in the qpsleepwakeup trace;
| |   3. Adds a new event to track RC ACK receiving;
| |   4. Adds trace events for various stages of the TID RDMA READ protocol.
| |   These events provide fine-grained control for monitoring and debugging the hfi1 driver in the field.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Enable TID RDMA READ protocol | Kaike Wan | 2019-02-05 | 3 | -0/+80
| |   This patch enables the TID RDMA READ protocol by internally converting a qualified RDMA READ request into a TID RDMA READ request. A request qualifies when:
| |   (1) The TID RDMA capability is enabled;
| |   (2) The request starts on a 4K page boundary and all receiving buffers start on 4K page boundaries;
| |   (3) The request length is a multiple of 4K and is greater than or equal to 256K, and each receiving buffer length is a multiple of 4K.
| |
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
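The three qualification rules can be expressed as a single predicate. This is an illustration only. The struct rx_buf type and the function name are hypothetical, and the rule "the request must start on a 4K page boundary" is interpreted here as applying to the remote start address, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TID_PAGE_SIZE  4096u          /* 4K alignment unit */
#define TID_MIN_LENGTH (256u * 1024u) /* minimum request length: 256K */

/* Hypothetical description of one local receive buffer. */
struct rx_buf {
	uint64_t addr;
	uint32_t length;
};

static bool is_4k_aligned(uint64_t v)
{
	return (v & (TID_PAGE_SIZE - 1)) == 0;
}

/* True if an RDMA READ could be converted to TID RDMA READ under rules (1)-(3). */
static bool tid_rdma_read_qualifies(bool tid_rdma_enabled, uint64_t remote_addr,
				    uint64_t total_len,
				    const struct rx_buf *bufs, size_t nbufs)
{
	assert(bufs != NULL || nbufs == 0);

	if (!tid_rdma_enabled)                              /* rule (1) */
		return false;
	if (!is_4k_aligned(remote_addr))                    /* rule (2), request start */
		return false;
	if (total_len < TID_MIN_LENGTH || !is_4k_aligned(total_len))
		return false;                               /* rule (3), request length */
	for (size_t i = 0; i < nbufs; i++) {
		/* rule (2) buffer start and rule (3) buffer length */
		if (!is_4k_aligned(bufs[i].addr) || !is_4k_aligned(bufs[i].length))
			return false;
	}
	return true;
}
```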
| * IB/hfi1: Add interlock between a TID RDMA request and other requests | Kaike Wan | 2019-02-05 | 4 | -0/+67
| |   This locking mechanism is designed to prevent various memory corruption scenarios from occurring when requests are pipelined, especially when RDMA READ/WRITE requests are interleaved with TID RDMA READ/WRITE requests:
| |   1. READ-AFTER-READ;
| |   2. READ-AFTER-WRITE;
| |   3. WRITE-AFTER-READ.
| |   When memory corruption is likely, a request will be held back until previous requests have been completed.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Integrate TID RDMA READ protocol into RC protocol | Kaike Wan | 2019-02-05 | 5 | -24/+307
| |   This patch integrates the TID RDMA READ protocol into the IB RC protocol. This protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA READ request into a TID RDMA READ request to avoid data copying on the requester side. The following code is added in this patch:
| |   - Send the TID RDMA READ request;
| |   - Complete the TID RDMA READ send request;
| |   - Send the TID RDMA READ response;
| |   - Complete the TID RDMA READ request.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Increment the retry timeout value for TID RDMA READ request | Kaike Wan | 2019-02-05 | 3 | -9/+20
| |   The RC retry timeout value is based on the estimated time for the response packet to come back. However, for a TID RDMA READ request, header suppression means the driver is normally not notified of each incoming response packet until the last TID RDMA READ response packet arrives. Consequently, the retry timeout value should be extended to cover the transaction time for the entire length of a segment (default 256K) instead of that for a single packet. This patch addresses the issue by introducing new retry timer functions that account for multiple packets, and wrapper functions for backward compatibility.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
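One plausible way to stretch a per-packet timeout over a whole segment is to scale it by the number of packets in the segment. Rounding that count up to a power of two turns the scaling into a shift. The helper names and the power-of-two rounding are assumptions for illustration, not the driver's code. A shift of 0 leaves the original single-packet timeout unchanged, which is the behaviour a backward-compatible wrapper would need.

```c
#include <assert.h>
#include <stdint.h>

/* Packets needed to carry one segment at a given path MTU (rounded up). */
static uint32_t pkts_per_segment(uint32_t segment_len, uint32_t mtu)
{
	assert(mtu > 0);
	return (segment_len + mtu - 1) / mtu;
}

/* Smallest shift such that (1 << shift) >= npkts, i.e. ceil(log2(npkts)). */
static unsigned int timeout_shift(uint32_t npkts)
{
	unsigned int shift = 0;

	assert(npkts <= (UINT32_C(1) << 31)); /* keeps the shift below 32 */
	while ((UINT32_C(1) << shift) < npkts)
		shift++;
	return shift;
}

/* Scale the single-packet timeout so it covers every packet of the segment. */
static uint64_t segment_retry_timeout(uint64_t per_pkt_timeout, uint32_t npkts)
{
	return per_pkt_timeout << timeout_shift(npkts);
}
```

For a 256K segment at a 4K MTU, the segment has 64 packets, giving a shift of 6. A per-packet timeout of 10 units therefore becomes 640.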
| * IB/hfi1: Add functions for restarting TID RDMA READ request | Kaike Wan | 2019-02-05 | 3 | -20/+158
| |   This patch adds functions to retry TID RDMA READ requests. Since a TID RDMA READ request can be retried from any segment boundary, it requires a number of tracking fields in various structures, and those fields should be reset properly. The qp->s_num_rd_atomic field is reset before a retry and therefore should be incremented for each new or retried RDMA READ or atomic request.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add TID RDMA handlers | Kaike Wan | 2019-02-05 | 4 | -22/+167
| |   This commit adds the TID RDMA READ handler pointers to the receiving opcode handlers. It also adds the TID RDMA READ header sizes to the header size table. A function to print the RHF EFLAGS errors is created so that it can be shared by both the IB and TID RDMA receiving functions.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add functions to receive TID RDMA READ response | Kaike Wan | 2019-02-05 | 3 | -0/+529
| |   This patch adds the functions to receive a TID RDMA READ response. The TID resource information in the KDETH packet header directs the hardware to deliver the packet payload to the user buffer automatically. The software handles only the packet header for the last packet of a segment, because all other packet headers are suppressed by default. The TID entries will be freed when all packets for a segment have been received.
| |
| |   This patch also adds the functions to handle KDETH eflag errors, including flow sequence and generation errors, when a TID RDMA READ response packet is received. The flow sequence error can be recovered by software checking of the flow sequence and will disappear when the hardware flow is programmed with a new generation number.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add a function to build TID RDMA READ response | Kaike Wan | 2019-02-05 | 2 | -0/+70
| |   This patch adds the function to build the TID RDMA READ response packet. The previously received TID resource information will be used to build the KDETH packet, which will direct the hardware's delivery of the packet payload.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add functions to receive TID RDMA READ request | Kaike Wan | 2019-02-05 | 3 | -0/+342
| |   This patch adds the functions to receive a TID RDMA READ request. The TID resource information will be stored and tracked. Duplicate requests will also be handled properly.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Set PbcInsertHcrc for TID RDMA packets | Kaike Wan | 2019-02-05 | 1 | -2/+19
| |   All TID RDMA packets are in KDETH packet format, and therefore PbcInsertHcrc must be set properly before sending the packet to hardware. Otherwise, the packets will be dropped by the receiver. By default, HCRC is not inserted for 9B packets without KDETH, and this patch adds it back for TID RDMA packets.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add functions to build TID RDMA READ request | Kaike Wan | 2019-02-05 | 5 | -1/+301
| |   This patch adds the helper functions to build the TID RDMA READ request on the requester side. The key is to allocate TID resources (TID flow and TID entries) and send the resource information to the responder side along with the read request. Since the TID resources are limited, each TID RDMA READ request has to be split into segments, with a default segment size of 256K. A software flow is allocated to track the data transaction for each segment. The work request opcode, packet opcode, and packet formats for the TID RDMA READ protocol are also defined in this patch.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
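Splitting a request into fixed-size segments is simple arithmetic. The sketch below assumes the default 256K segment size and uses a hypothetical tid_segment record in place of the driver's software flow. Only the last segment can be shorter than the segment size.

```c
#include <assert.h>
#include <stdint.h>

#define TID_SEGMENT_SIZE (256u * 1024u) /* default segment size from the text */

/* Hypothetical per-segment record; the driver tracks a software flow instead. */
struct tid_segment {
	uint64_t offset; /* byte offset of the segment within the request */
	uint32_t length; /* bytes covered by the segment */
};

/* Number of segments a request of total_len bytes splits into (rounded up). */
static uint32_t tid_num_segments(uint64_t total_len, uint32_t seg_size)
{
	assert(seg_size > 0);
	return (uint32_t)((total_len + seg_size - 1) / seg_size);
}

/* Describe segment idx; the final segment may be shorter than seg_size. */
static struct tid_segment tid_segment_at(uint64_t total_len, uint32_t seg_size,
					 uint32_t idx)
{
	struct tid_segment s;
	uint64_t remaining;

	assert(idx < tid_num_segments(total_len, seg_size));
	s.offset = (uint64_t)idx * seg_size;
	remaining = total_len - s.offset;
	s.length = (uint32_t)(remaining < seg_size ? remaining : seg_size);
	return s;
}
```

For example, a request of 1 MiB + 64K splits into five segments, and the last one covers the final 64K.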
| * IB/hfi1: Add static trace for flow and TID management functions | Kaike Wan | 2019-02-05 | 3 | -0/+269
| |   This patch adds the static trace for the flow and TID management functions to help with debugging in the field.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add the counter n_tidwait | Kaike Wan | 2019-02-05 | 5 | -0/+17
| |   This patch adds the counter n_tidwait to count the number of times the TID resource allocator has to wait for TID resources.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: TID RDMA RcvArray programming and TID allocation | Kaike Wan | 2019-02-05 | 9 | -18/+1033
| |   TID entries are used by hfi1 hardware to receive data payload from incoming packets directly into a user buffer, thus avoiding data copying by software. This patch implements the functions for TID allocation, freeing, and programming TID RcvArray entries in hardware for kernel clients. TID entries are managed via lists of TID groups, similar to PSM. Furthermore, to track TID resource allocation for each request, software flows are also allocated and freed as needed.
| |
| |   Software flows consume a large amount of memory for tracking TID allocation and freeing. It is therefore generally desirable to allocate them dynamically in the send queue, and only for TID RDMA requests, but to pre-allocate them for the receive queue. The reason is that the send queue could have thousands of entries while the receive queue has only a limited number of entries.
| |
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
| |   Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: TID RDMA flow allocation | Kaike Wan | 2019-02-05 | 8 | -0/+491
| |   The hfi1 hardware flow is a hardware flow-control mechanism for a KDETH data packet that is received on an hfi1 port. It validates the packet by checking both the generation and the sequence. Each QP that uses the TID RDMA mechanism will allocate a hardware flow from its receiving context for any incoming KDETH data packets.
| |
| |   This patch implements:
| |   (1) a function to allocate a hardware flow;
| |   (2) a function to free a hardware flow;
| |   (3) a function to initialize the hardware flow generation for a receiving context;
| |   (4) a wait mechanism if the hardware flow is not available;
| |   (5) a function to remove the QP from the wait queue for the hardware flow when the QP is reset or destroyed.
| |
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
| |   Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi: Move RC functions into a header file | Kaike Wan | 2019-02-05 | 5 | -76/+123
|/    This patch moves some RC helper functions into a header file so that they can be called from both RC and TID RDMA functions. In addition, a common function for rewinding a request is created in rdmavt so that it can be shared between the qib and hfi1 drivers.
|
|   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
|   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
|   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
|   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
|   Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge branch 'opfn' into hfi1-tid | Doug Ledford | 2019-01-31 | 19 | -141/+1114
|\    This series adds the OPFN feature, which is used as the negotiation protocol by TID RDMA. This adds a totally hidden, in-band negotiation transfer that happens on the consumer's queue pair but without the consumer's knowledge. For that reason, things like completions for OPFN transfers must be filtered out of the completion queue and not sent to the consumer. This feature does not impact any consumer APIs, but does impact the driver/driver wire API.
| |
| |   At a high level, OPFN enables exchanging parameters between two hosts using IB compare and swap requests to a special virtual address. The request uses a reserved IB work request opcode (see patch 3).
| |
| |   * opfn:
| |     IB/hfi1: Add static trace for OPFN
| |     IB/hfi1: Integrate OPFN into RC transactions
| |     IB/hfi1, IB/rdmavt: Allow for extending of QP's s_ack_queue
| |     IB/hfi1: OPFN interface
| |     IB/hfi1: Add OPFN helper functions for TID RDMA feature
| |     IB/hfi1: OPFN support discovery
| |
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add static trace for OPFN | Kaike Wan | 2019-01-31 | 5 | -106/+351
| |   This patch adds the static trace to the OPFN code and moves TID-related static trace code into a new header file.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Integrate OPFN into RC transactions | Kaike Wan | 2019-01-31 | 6 | -10/+81
| |   OPFN parameter negotiation allows a pair of connected RC QPs to exchange a set of parameters in succession. This negotiation does not commence until the first ULP request. Because OPFN operations are private to the driver, they do not generate user completions or put the QP into error when they run out of retries. This patch integrates the OPFN protocol into the transactions of an RC QP.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1, IB/rdmavt: Allow for extending of QP's s_ack_queue | Kaike Wan | 2019-01-31 | 3 | -7/+17
| |   The OPFN protocol uses the COMPARE_SWAP request to exchange data between the requester and the responder. Such a request therefore needs to be stored in the QP's s_ack_queue when it is received on the responder side. However, because the user does not know anything about the OPFN protocol, this extra entry in the queue cannot be advertised to the user. This patch adds an extra entry in a QP's s_ack_queue.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: OPFN interface | Kaike Wan | 2019-01-31 | 5 | -1/+334
| |   OPFN allows a pair of connected RC QPs to exchange a set of parameters in succession. The parameter exchange itself is done using the IB compare and swap request with a special virtual address. The request is triggered using a reserved IB work request opcode. This patch implements the OPFN interface to initialize, start, process, and reset the OPFN request.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Add OPFN helper functions for TID RDMA feature | Kaike Wan | 2019-01-31 | 7 | -2/+254
| |   This patch adds the OPFN helper functions to initialize, encode, decode, and reset OPFN parameters for the TID RDMA feature.
| |
| |   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
| |   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
| |   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
| |   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| |   Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: OPFN support discovery | Mitko Haralanov | 2019-01-31 | 6 | -15/+77
|/    OPFN (Omni Path Feature Negotiation) support discovery allows an RC QP to announce that it supports OPFN and also to discover whether OPFN is supported by the peer QP. OPFN parameter negotiation is skipped unless OPFN support is first discovered. OPFN support is announced by claiming what was the reserved bit in dword 1 of the OmniPath modified base transport header in requests and responses.
|
|   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
|   Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
|   Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
|   Signed-off-by: Kaike Wan <kaike.wan@intel.com>
|   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
|   Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/{hfi1, qib, rvt} Cleanup open coded sge usage | Michael J. Ruhl | 2019-01-30 | 3 | -70/+6
|   Several locations for manipulating SGEs use an open coded sequence that is covered by helper functions. Use the appropriate helper functions.
|
|   Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
|   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
|   Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/{hfi1,qib}: Cleanup open coded sge sizing | Michael J. Ruhl | 2019-01-30 | 3 | -30/+6
|   SGE sizing is done in several places using an open coded method. This can cause maintenance issues.
|
|   The open coded method is encapsulated in a helper routine. The helper was introduced with commit:
|   1198fcea8a78 ("IB/hfi1, rdmavt: Move SGE state helper routines into rdmavt")
|
|   Update all call sites that have the open coded path with the helper routine.
|
|   Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
|   Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
|   Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
|   Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/ipoib: Make ipoib_intercept_dev_id_attr() static | Kamal Heib | 2019-01-29 | 1 | -1/+1
|   The function ipoib_intercept_dev_id_attr() is only used in ipoib_main.c.
|
|   Fixes: f6350da41dc7 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace")
|   Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
|   Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/vmw_pvrdma: Support upto 64-bit PFNs | Adit Ranadive | 2019-01-29 | 3 | -7/+21
|   Update the driver to use the new device capability to report 64-bit UAR PFNs.
|
|   Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
|   Signed-off-by: Adit Ranadive <aditr@vmware.com>
|   Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
|   Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge branch 'devx-async' into k.o/for-next | Jason Gunthorpe | 2019-01-29 | 498 | -2777/+4603
|\    Yishai Hadas says:
| |
| |   Enable DEVX asynchronous query commands
| |
| |   This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and will be able to get the response back once it is ready.
| |
| |   To enable the above functionality:
| |   - A DEVX asynchronous command completion FD object was introduced.
| |   - The applicable file operations were implemented so that the user application can use it.
| |   - A query asynchronous method was added to the DEVX object; it calls the firmware asynchronously and manages the response on the given input FD.
| |   - Hot unplug support was added for the FD to work properly upon unbind/disassociate.
| |   - An mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate.
| |
| |   This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
| |
| |   * branch 'devx-async':
| |     IB/mlx5: Implement DEVX hot unplug for async command FD
| |     IB/mlx5: Implement the file ops of DEVX async command FD
| |     IB/mlx5: Introduce async DEVX obj query API
| |     IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD
| |
| |   Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/mlx5: Implement DEVX hot unplug for async command FD | Yishai Hadas | 2019-01-29 | 1 | -2/+20
| |   Implement DEVX hot unplug for the async command FD. This is done by managing a list of the inflight commands and waiting until all launched work is completed as part of devx_hot_unplug_async_cmd_event_file.
| |
| |   Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
| |   Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/mlx5: Implement the file ops of DEVX async command FD | Yishai Hadas | 2019-01-29 | 1 | -2/+55
| |   Implement the file ops of the DEVX async command FD. This enables using the FD to read the events and to manage other options on the FD.
| |
| |   Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
| |   Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/mlx5: Introduce async DEVX obj query API | Yishai Hadas | 2019-01-29 | 3 | -1/+177
| |   Introduce an async DEVX obj query API to get the command response back to user space once it's ready, without blocking when calling the firmware.
| |
| |   The event's data includes a header with some metadata, followed by the firmware command output data. The header includes:
| |   - The input 'wr_id', to let the application recognize the response.
| |
| |   The input FD attribute identifies the FD on which the event data will be made ready. Downstream patches from this series will implement the file ops to let the application read it.
| |
| |   Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
| |   Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
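Here is a hypothetical C layout for the event described above: a header carrying the caller's wr_id, followed directly by the firmware output. The struct and constructor names are illustrative and are not the mlx5 uAPI definitions. Because the header described here carries only wr_id, the sketch assumes the reader learns the total event size from the read() return value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event layout: header (wr_id only), then raw firmware output. */
struct async_cmd_event {
	uint64_t wr_id;     /* caller-chosen tag echoed back with the response */
	uint8_t out_data[]; /* firmware command output, copied verbatim */
};

/* Build one event as a driver might queue it for the async FD to return. */
static struct async_cmd_event *async_event_new(uint64_t wr_id,
					       const void *out, size_t out_len)
{
	struct async_cmd_event *ev;

	assert(out != NULL || out_len == 0);
	ev = malloc(sizeof(*ev) + out_len);
	if (!ev)
		return NULL;
	ev->wr_id = wr_id;
	if (out_len)
		memcpy(ev->out_data, out, out_len);
	return ev;
}
```

An application that issued several queries would match each event it reads back to its request by comparing wr_id.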
| * IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD | Yishai Hadas | 2019-01-29 | 5 | -5/+104
| |   Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation. This object is of type class FD and will be used to read DEVX async command completions.
| |
| |   The core layer should allow the driver to set an object of type FD in a safe mode; this option was added with a matching comment in place.
| |
| |   Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
| |   Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| |   Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * net/mlx5: Add pci AtomicOps request | Michael Guralnik | 2019-01-24 | 1 | -0/+5
| |   Calling pci_enable_atomic_ops_to_root enables AtomicOp requests to the PCI root port. AtomicOp requests will be enabled only if the completer and all intermediate PCI bridges support PCI atomic operations. This, together with appropriate settings in the NVCONFIG, should enable PCI atomic operations on the device.
| |
| |   PCI atomic operations were first introduced in PCI Express Base Specification 2.1. The supported operations are Swap (Unconditional Swap), CAS (Compare and Swap) and FetchAdd (Fetch and Add). Unlike other atomic operation modes, PCI atomic operations give the user the option to do atomic operations on local memory, without involving the verbs API and without compromising the operation's atomicity.
| |
| |   Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
| |   Reviewed-by: Majd Dibbiny <majd@mellanox.com>
| |   Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| |   Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| * net/mlx5: Make mlx5_cmd_exec_cb() a safe API (Jason Gunthorpe, 2019-01-24, 5 files, -48/+91)
| | APIs that have deferred callbacks should have some kind of cleanup function that callers can use to fence the callbacks. Otherwise, things like module unloading can lead to dangling function pointers, or worse.
| |
| | The IB MR code is the only caller of this function and made a really poor attempt at creating this fence. Provide a proper version in the core code, as future patches will add more places that need this fence.
| |
| | Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
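The fencing idea can be sketched in user space: a context counts callbacks that are still in flight, and the cleanup function blocks until that count drops to zero, so nothing the callbacks reference is torn down under them. This is a minimal sketch using POSIX threads; the `async_ctx_*` names are hypothetical and this is not the mlx5 implementation.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical context that fences deferred callbacks. */
struct async_ctx {
	pthread_mutex_t lock;
	pthread_cond_t idle; /* signalled when inflight reaches 0 */
	int inflight;        /* callbacks issued but not yet finished */
};

static void async_ctx_init(struct async_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->idle, NULL);
	ctx->inflight = 0;
}

/* Call before queuing work whose callback will run later. */
static void async_ctx_get(struct async_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->inflight++;
	pthread_mutex_unlock(&ctx->lock);
}

/* Call as the last step of each callback. */
static void async_ctx_put(struct async_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	assert(ctx->inflight > 0);
	if (--ctx->inflight == 0)
		pthread_cond_broadcast(&ctx->idle);
	pthread_mutex_unlock(&ctx->lock);
}

/* The fence: wait until every issued callback has completed,
 * then release the synchronization objects. */
static void async_ctx_cleanup(struct async_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	while (ctx->inflight > 0)
		pthread_cond_wait(&ctx->idle, &ctx->lock);
	pthread_mutex_unlock(&ctx->lock);
	pthread_cond_destroy(&ctx->idle);
	pthread_mutex_destroy(&ctx->lock);
}
```

A caller would run `async_ctx_cleanup()` before, for example, unloading the code that the callbacks point into.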
| * RDMA/mad: Reduce MAD scope to mlx5_ib only (Leon Romanovsky, 2019-01-15, 7 files, -85/+47)
| | The Management Datagram Interface (MAD) is applicable only when the physical port is InfiniBand, which makes the MAD command logic completely unrelated to the eth/core parts of mlx5.
| |
| | Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| | Acked-by: Jason Gunthorpe <jgg@mellanox.com>
| * Linux 5.0-rc2 [v5.0-rc2] (Linus Torvalds, 2019-01-14, 1 file, -1/+1)
| |
| * kernel/sys.c: Clarify that UNAME26 does not generate unique versions anymore (Jonathan Neuschäfer, 2019-01-14, 1 file, -1/+2)
| | UNAME26 is a mechanism to report Linux's version as 2.6.x, for compatibility with old/broken software. Due to the way it is implemented, it would have to be updated after 5.0 to keep the resulting versions unique.
| |
| | Linus Torvalds argued:
| |
| |  "Do we actually need this? I'd rather let it bitrot, and just let it return random versions. It will just start again at 2.4.60, won't it? Anybody who uses UNAME26 for a 5.x kernel might as well think it's still 4.x. The user space is so old that it can't possibly care about differences between 4.x and 5.x, can it? The only thing that matters is that it shows "2.4.<largeenough>", which it will do regardless"
| |
| | Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
| | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
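The loss of uniqueness is easy to see with a small model of the mapping. The sketch below assumes that, for 4.x and later kernels, UNAME26 reports "2.6.(60 + minor)" and discards the major number; that formula is an assumption for illustration, any suffix such as "-rc2" is ignored, and the function names are hypothetical. Under this assumption, 4.0 and 5.0 both report 2.6.60.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the UNAME26 mapping for 4.x and later
 * (assumed): only the minor number survives, offset by 60. */
static void uname26_release(unsigned major, unsigned minor,
			    char *buf, size_t len)
{
	assert(major >= 4); /* the model only covers 4.x and later */
	snprintf(buf, len, "2.6.%u", 60 + minor);
}

/* Returns nonzero if two kernel versions map to the same
 * UNAME26 release string under the model above. */
static int uname26_collides(unsigned maj_a, unsigned min_a,
			    unsigned maj_b, unsigned min_b)
{
	char a[32], b[32];

	uname26_release(maj_a, min_a, a, sizeof a);
	uname26_release(maj_b, min_b, b, sizeof b);
	return strcmp(a, b) == 0;
}
```

Because the major number is dropped, every 5.x release collides with the 4.x release that has the same minor number, which is exactly the non-uniqueness the commit documents.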
| * Merge tag 'armsoc-fixes' of ↵ (Linus Torvalds, 2019-01-14, 26 files, -93/+321)
| |\  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
| | |
| | | Pull ARM SoC fixes from Olof Johansson:
| | |
| | |  "A bigger batch than I anticipated this week, for two reasons:
| | |
| | |    - Some fallout on Davinci from board file -> DTB conversion, that also includes a few longer-standing fixes (i.e. not recent regressions).
| | |
| | |    - drivers/reset material that has been in linux-next for a while, but didn't get sent to us until now for a variety of reasons (maintainer out sick, holidays, etc). There's a functional dependency in there such that one platform (Altera's SoCFPGA) won't boot without one of the patches; instead of reverting the patch that got merged, I looked at this set and decided it was small enough that I'll pick it up anyway. If you disagree I can revisit with a smaller set.
| | |
| | |   That being said, there's also a handful of the usual stuff:
| | |
| | |    - Fix for a crash on Armada 7K/8K when the kernel touches PSCI-reserved memory
| | |    - Fix for PCIe reset on Macchiatobin (Armada 8K development board, what this email is sent from in fact :)
| | |    - Enable a few new-merged modules for Amlogic in arm64 defconfig
| | |    - Error path fixes on Integrator
| | |    - Build fix for Renesas and Qualcomm
| | |    - Initialization fix for Renesas RZ/G2E
| | |
| | |   .. plus a few more fixlets"
| | |
| | | * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
| | |   ARM: integrator: impd1: use struct_size() in devm_kzalloc()
| | |   qcom-scm: Include <linux/err.h> header
| | |   gpio: pl061: handle failed allocations
| | |   ARM: dts: kirkwood: Fix polarity of GPIO fan lines
| | |   arm64: dts: marvell: mcbin: fix PCIe reset signal
| | |   arm64: dts: marvell: armada-ap806: reserve PSCI area
| | |   ARM: dts: da850-lcdk: Correct the sound card name
| | |   ARM: dts: da850-lcdk: Correct the audio codec regulators
| | |   ARM: dts: da850-evm: Correct the sound card name
| | |   ARM: dts: da850-evm: Correct the audio codec regulators
| | |   ARM: davinci: omapl138-hawk: fix label names in GPIO lookup entries
| | |   ARM: davinci: dm644x-evm: fix label names in GPIO lookup entries
| | |   ARM: davinci: dm355-evm: fix label names in GPIO lookup entries
| | |   ARM: davinci: da850-evm: fix label names in GPIO lookup entries
| | |   ARM: davinci: da830-evm: fix label names in GPIO lookup entries
| | |   arm64: defconfig: enable modules for amlogic s400 sound card
| | |   reset: uniphier-glue: Add AHCI reset control support in glue layer
| | |   dt-bindings: reset: uniphier: Add AHCI core reset description
| | |   reset: uniphier-usb3: Rename to reset-uniphier-glue
| | |   dt-bindings: reset: uniphier: Replace the expression of USB3 with generic peripherals
| | |   ...
| | * Merge tag 'reset-for-5.0-rc2' of git://git.pengutronix.de/git/pza/linux into ↵ (Olof Johansson, 2019-01-12, 11 files, -52/+212)
| | |\  fixes
| | | |
| | | | Late reset controller changes for v5.0
| | | |
| | | | This adds missing deassert functionality to the ARC HSDK reset driver, fixes some indentation and grammar issues in the kernel docs, adds a helper to count the number of resets on a device for the non-DT case as well, adds an early reset driver for SoCFPGA and simple reset driver support for Stratix10, and generalizes the uniphier USB3 glue layer reset to also cover AHCI.
| | | |
| | | | * tag 'reset-for-5.0-rc2' of git://git.pengutronix.de/git/pza/linux:
| | | |   reset: uniphier-glue: Add AHCI reset control support in glue layer
| | | |   dt-bindings: reset: uniphier: Add AHCI core reset description
| | | |   reset: uniphier-usb3: Rename to reset-uniphier-glue
| | | |   dt-bindings: reset: uniphier: Replace the expression of USB3 with generic peripherals
| | | |   ARM: socfpga: dts: document "altr,stratix10-rst-mgr" binding
| | | |   reset: socfpga: add an early reset driver for SoCFPGA
| | | |   reset: fix null pointer dereference on dev by dev_name
| | | |   reset: Add reset_control_get_count()
| | | |   reset: Improve reset controller kernel docs
| | | |   ARC: HSDK: improve reset driver
| | | |
| | | | Signed-off-by: Olof Johansson <olof@lixom.net>
| | | * reset: uniphier-glue: Add AHCI reset control support in glue layer (Kunihiko Hayashi, 2019-01-07, 1 file, -0/+12)
| | | | Add a reset line included in the AHCI glue layer to enable the AHCI core implemented in UniPhier SoCs.
| | | |
| | | | Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
| | | | Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
| | | * dt-bindings: reset: uniphier: Add AHCI core reset description (Kunihiko Hayashi, 2019-01-07, 1 file, -0/+3)
| | | | Add compatible strings for the reset control of the AHCI core implemented in UniPhier SoCs. The reset control belongs to the AHCI glue layer.
| | | |
| | | | Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
| | | | Reviewed-by: Rob Herring <robh@kernel.org>
| | | | Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
| | | * reset: uniphier-usb3: Rename to reset-uniphier-glue (Kunihiko Hayashi, 2019-01-07, 3 files, -25/+25)
| | | | This driver controls the reset lines in the USB3 glue layer; however, it can also be applied to other glue layers. This patch renames the driver from "reset-uniphier-usb3" to "reset-uniphier-glue" and, at the same time, changes CONFIG_RESET_UNIPHIER_USB3 to CONFIG_RESET_UNIPHIER_GLUE.
| | | |
| | | | Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
| | | | Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
| | | * dt-bindings: reset: uniphier: Replace the expression of USB3 with generic ↵ (Kunihiko Hayashi, 2019-01-07, 1 file, -11/+11)
| | | | peripherals
| | | |
| | | | Replace the expression "USB3 glue layer" with a glue layer for generic peripherals, so that other devices can use it. The reset control belongs to this glue layer.
| | | |
| | | | Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
| | | | Reviewed-by: Rob Herring <robh@kernel.org>
| | | | Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
| | | * ARM: socfpga: dts: document "altr,stratix10-rst-mgr" binding (Dinh Nguyen, 2019-01-07, 1 file, -1/+2)
| | | | "altr,stratix10-rst-mgr" is used for the Stratix10 reset manager.
| | | |
| | | | Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
| | | | Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
| | | * reset: socfpga: add an early reset driver for SoCFPGA (Dinh Nguyen, 2019-01-07, 5 files, -11/+105)
| | | | Create a separate reset driver that uses the reset operations in reset-simple. The reset driver for the SoCFPGA platform needs to register early in order to be able to bring online timers that are needed early in kernel bootup.
| | | |
| | | | We do not need this early reset driver for Stratix10, because on arm64 Linux does not need the timers that are in reset; it runs just fine with the internal ARMv8 timer. Thus, we use a new binding, "altr,stratix10-rst-mgr", for the Stratix10 platform. The Stratix10 platform will continue to use the reset-simple platform driver, while the 32-bit platforms (Cyclone5/Arria5/Arria10) will use the early reset driver.
| | | |
| | | | Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
| | | | [p.zabel@pengutronix.de: fixed socfpga of_device_id in reset-simple]
| | | | Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>