summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
Commit message (Collapse)AuthorAgeFilesLines
* iw_cxgb4: Atomically flush per QP HW CQEsBharat Potnuri2018-05-091-1/+1
| | | | | | | | | | | | | | | | commit 2df19e19ae90d94fd8724083f161f368a2797537 upstream. When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire corresponding QP lock before moving the CQEs into its corresponding SW queue and accessing the SQ contents for completing a WR. Ignore CQEs if corresponding QP is already flushed. Cc: stable@vger.kernel.org Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* RDMA/cxgb4: release hw resources on device removalRaju Rangoju2018-05-091-0/+4
| | | | | | | | | | | | | | | | | | | | commit 26bff1bd74a4f7417509a83295614e9dab995b2a upstream. The c4iw_rdev_close() logic was not releasing all the hw resources (PBL and RQT memory) during the device removal event (driver unload / system reboot). This can cause panic in gen_pool_destroy(). The module remove function will wait for all the hw resources to be released during the device removal event. Fixes c12a67fe(iw_cxgb4: free EQ queue memory on last deref) Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* iw_cxgb4: reflect the original WR opcode in drain cqesSteve Wise2018-01-171-2/+0
| | | | | | | | | | | | | | | | | | commit 96a236ed286776554fbd227c6d2876fd3b5dc65d upstream. The flush/drain logic was not retaining the original wr opcode in its completion. This can cause problems if the application uses the completion opcode to make decisions. Use bit 10 of the CQE header word to indicate the CQE is a special drain completion, and save the original WR opcode in the cqe header opcode field. Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cxgb4: Convert PDBG to pr_debugJoe Perches2017-04-201-24/+20
| | | | | | | | | | | | | | Use a more typical logging style. Miscellanea: o Obsolete the c4iw_debug module parameter o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* cxgb4: Use more common logging styleJoe Perches2017-04-201-0/+6
| | | | | | | | | | | | | Convert printks to pr_<level> Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* sched/headers: Prepare to remove the <linux/mm_types.h> dependency from ↵Ingo Molnar2017-03-021-1/+1
| | | | | | | | | | | | | | | <linux/sched.h> Update code that relied on sched.h including various MM types for them. This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2017-02-201-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Implement wraparound-safe refcount_t and kref_t types based on generic atomic primitives (Peter Zijlstra) - Improve and fix the ww_mutex code (Nicolai Hähnle) - Add self-tests to the ww_mutex code (Chris Wilson) - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr Bueso) - Micro-optimize the current-task logic all around the core kernel (Davidlohr Bueso) - Tidy up after recent optimizations: remove stale code and APIs, clean up the code (Waiman Long) - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) fork: Fix task_struct alignment locking/spinlock/debug: Remove spinlock lockup detection code lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS lkdtm: Convert to refcount_t testing kref: Implement 'struct kref' using refcount_t refcount_t: Introduce a special purpose refcount type sched/wake_q: Clarify queue reinit comment sched/wait, rcuwait: Fix typo in comment locking/mutex: Fix lockdep_assert_held() fail locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock() locking/rwsem: Reinit wake_q after use locking/rwsem: Remove unnecessary atomic_long_t casts jump_labels: Move header guard #endif down where it belongs locking/atomic, kref: Implement kref_put_lock() locking/ww_mutex: Turn off __must_check for now locking/atomic, kref: Avoid more abuse locking/atomic, kref: Use kref_get_unless_zero() more locking/atomic, kref: Kill kref_sub() locking/atomic, kref: Add kref_read() locking/atomic, kref: Add KREF_INIT() ...
| * locking/atomic, kref: Add kref_read()Peter Zijlstra2017-01-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | iw_cxgb4: free EQ queue memory on last derefSteve Wise2017-01-101-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") introduced a bug where the RDMA QP EQ queue memory (and QIDs) are possibly freed before the underlying connection has been fully shutdown. The result being a possible DMA read issued by HW after the queue memory has been unmapped and freed. This results in possible WR corruption in the worst case, system bus errors if an IOMMU is in use, and SGE "bad WR" errors reported in the very least. The fix is to defer unmap/free of queue memory and QID resources until the QP struct has been fully dereferenced. To do this, the c4iw_ucontext must also be kept around until the last QP that references it is fully freed. In addition, since the last QP deref can happen in an IRQ disabled context, we need a new workqueue thread to do the final unmap/free of the EQ queue memory. Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: refactor sq/rq drain logicSteve Wise2017-01-101-4/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the addition of the IB/Core drain API, iw_cxgb4 supported drain by watching the CQs when the QP was out of RTS and signalling "drain complete" when the last CQE is polled. This, however, doesn't fully support the drain semantics. Namely, the drain logic is supposed to signal "drain complete" only when the application has _processed_ the last CQE, not just removed them from the CQ. Thus a small timing hole exists that can cause touch after free type bugs in applications using the drain API (nvmf, iSER, for example). So iw_cxgb4 needs a better solution. The iWARP Verbs spec mandates that "_at some point_ after the QP is moved to ERROR", the iWARP driver MUST synchronously fail post_send and post_recv calls. iw_cxgb4 was currently not allowing any posts once the QP is in ERROR. This was in part due to the fact that the HW queues for the QP in ERROR state are disabled at this point, so there wasn't much else to do but fail the post operation synchronously. This restriction is what drove the first drain implementation in iw_cxgb4 that has the above mentioned flaw. This patch changes iw_cxgb4 to allow post_send and post_recv WRs after the QP is moved to ERROR state for kernel mode users, thus still adhering to the Verbs spec for user mode users, but allowing flush WRs for kernel users. Since the HW queues are disabled, we just synthesize a CQE for this post, queue it to the SW CQ, and then call the CQ event handler. This enables proper drain operations for the various storage applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: invalidate the mr when posting a read_w_inv wrSteve Wise2016-11-161-1/+1
| | | | | | | | | Also, rearrange things a bit to have a common c4iw_invalidate_mr() function used everywhere that we need to invalidate. Fixes: 49b53a93a64a ("iw_cxgb4: add fast-path for small REG_MR operations") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge tag 'for-linus' of ↵Linus Torvalds2016-10-091-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
| * IB/cxgb4: Move user vendor structuresLeon Romanovsky2016-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves cxgb4 vendor's specific structures to common UAPI folder which will be visible to all consumers. These structures are used by user-space library driver (libcxgb4) and currently manually copied to that library. This move will allow cross-compile against these files and simplify introduction of vendor specific data. Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-09-231-0/+1
|\ \
| * \ Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into ↵Jens Axboe2016-09-131-0/+1
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | for-linus Sagi writes: Here we have: - Kconfig dependencies fix from Arnd - nvme-rdma device removal fixes from Steve - possible bad deref fix from Colin
| | * iw_cxgb4: block module unload until all ep resources are releasedSteve Wise2016-09-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise an endpoint can be still closing down causing a touch after free crash. Also WARN_ON if ulps have failed to destroy various resources during device removal. Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | | libcxgb,iw_cxgb4,cxgbit: add cxgb_compute_wscale()Varun Prakash2016-09-151-9/+0
|/ / | | | | | | | | | | | | | | | | Add cxgb_compute_wscale() in libcxgb_cm.h to remove it's duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | iw_cxgb4: don't block in destroy_qp awaiting the last derefSteve Wise2016-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | Blocking in c4iw_destroy_qp() causes a deadlock when apps destroy a qp or disconnect a cm_id from their cm event handler function. There is no need to block here anyway, so just replace the refcnt atomic with a kref object and free the memory on the last put. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | RDMA/iw_cxgb4: Low resource fixes for Completion queueHariprasad S2016-06-231-0/+1
| | | | | | | | | | | | | | | | | | | | Pre-allocate buffers to deallocate completion queue, so that completion queue is deallocated during RDMA termination when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | RDMA/iw_cxgb4: Low resource fixes for Memory registrationHariprasad S2016-06-231-0/+2
| | | | | | | | | | | | | | | | | | Pre-allocate buffers for deregistering memory region and memory window during RDMA connection close, when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | RDMA/iw_cxgb4: Low resource fixes for connection managerHariprasad S2016-06-231-0/+19
|/ | | | | | | | | | Pre-allocate buffers for sending various control messages to close connection, abort connection, etc so that we gracefully handle connections when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
*-. Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into ↵Doug Ledford2016-05-131-1/+8
|\ \ | | | | | | | | | k.o/for-4.7
| * | RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiationHariprasad S2016-05-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->Stop the ep timer after MPA negotiation so that the arp failures during send_mpa_reply/reject will be handled by process_timeout() after the ep timer expires. ->Added case MPA_REP_SENT in process_timeout(). ->For MPA reject, c4iw_ep_disconnect tries to start an already started timer, which leads to warning message "timer already started". -> In case of mpa reject stop the timer and call send_mpa_reject(). -> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer only for send_mpa_reply(), which is set in c4iw_accept_cr(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Add few history bits for epHariprasad S2016-05-131-1/+7
| |/ | | | | | | | | | | | | | | | | | | | | - add EP_DISC_FAIL history bit - add QP_REFED/DEREFED history bits - Add functions to ref/deref the cm_id and add history bit for the same - add CLOSE_CON_RPL history Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/core: Enhance ib_map_mr_sg()Bart Van Assche2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/core: Add passing an offset into the SG to ib_map_mr_sgChristoph Hellwig2016-05-131-3/+2
|/ | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
*-. Merge branches 'nes', 'cxgb4' and 'iwpm' into k.o/for-4.6Doug Ledford2016-03-161-42/+0
|\ \
| | * iw_cxgb4: remove port mapper related codeSteve Wise2016-03-161-42/+0
| |/ | | | | | | | | | | | | | | Now that most of the port mapper code been moved to iwcm, we can remove it from iw_cxgb4. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| |
| \
*-. \ Merge branches 'mlx4', 'mlx5' and 'ocrdma' into k.o/for-4.6Doug Ledford2016-03-161-1/+2
|\ \ \ | | |/ | |/|
| | * IB/core: Add vendor's specific data to alloc mwMatan Barak2016-03-011-1/+2
| |/ | | | | | | | | | | | | | | | | | | Passing udata to the vendor's driver in order to pass data from the user-space driver to the kernel-space driver. This data will be used in downstream patches. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* / iw_cxgb4: add queue drain functionsSteve Wise2016-02-291-0/+4
|/ | | | | | | | | | | | | Add completion objects, named sq_drained and rq_drained, to the c4iw_qp struct. The queue-specific completion object is signaled when the last CQE is drained from the CQ for that queue. Add c4iw_drain_sq() to block until qp->rq_drained is completed. Add c4iw_drain_rq() to block until qp->sq_drained is completed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB: remove in-kernel support for memory windowsChristoph Hellwig2015-12-231-2/+0
| | | | | | | | | | | | Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB: remove support for phys MRsChristoph Hellwig2015-12-231-11/+0
| | | | | | | | | | | | We have stopped using phys MRs in the kernel a while ago, so let's remove all the cruft used to implement them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-By: Devesh Sharma<devesh.sharma@avagotech.com> [ocrdma] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: Remove old FRWR APISagi Grimberg2015-10-281-18/+0
| | | | | | | | No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: Support the new memory registration APISagi Grimberg2015-10-281-0/+7
| | | | | | | | | | | | | | | | | | | Support the new memory registration API by allocating a private page list array in c4iw_mr and populate it when c4iw_map_mr_sg is invoked. Also, support IB_WR_REG_MR by duplicating build_fastreg just take the needed information from different places: - page_size, iova, length (ib_mr) - page array (c4iw_mr) - key, access flags (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: Support ib_alloc_mr verbSagi Grimberg2015-08-301-1/+3
| | | | | Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/core: Change provider's API of create_cq to be extendibleMatan Barak2015-06-121-4/+4
| | | | | | | | | | | | | | | Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: support for bar2 qid densities exceeding the page sizeHariprasad S2015-06-111-2/+3
| | | | | | | | | | | | | | | Handle this configuration: Queues Per Page * SGE BAR2 Queue Register Area Size > Page Size Use cxgb4_bar2_sge_qregs() to obtain the proper location within the bar2 region for a given qid. Rework the DB and GTS write functions to make use of this bar2 info. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: Remove negative advice dmesg warningsHariprasad S2015-05-051-0/+7
| | | | | | | | | Remove these log messages in favor of per-endpoint counters as well as device-global counters that can be inspected via debugfs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/cxgb4: Don't hang threads forever waiting on WR repliesHariprasad S2015-02-181-15/+14
| | | | | | | | | | | | | In c4iw_wait_for_reply(), if a FW6_MSG WR reply is not received after C4IW_WR_TO seconds, fail the WR operation and mark the device as fatally dead. Further, if the device is marked fatally dead, then fail the WR wait immediately. Also change the timeout to 60 seconds. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-07-221-1/+1
|\ | | | | | | | | | | | | | | | | Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * RDMA/cxgb4: Call iwpm_init() only onceSteve Wise2014-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | We need to only register with the iwpm core once. Currently it is being done for every adapter, which causes a failure for each adapter but the first, making multiple adapters unusable. Fixes: 9eccfe109b27 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | iw_cxgb4: Don't limit TPTE count to 32KBHariprasad Shenai2014-07-211-1/+1
| | | | | | | | | | | | | | | | Use the size advertised by FW Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cxgb4/iw_cxgb4: work request logging featureHariprasad Shenai2014-07-151-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit enhances the iwarp driver to optionally keep a log of rdma work request timining data for kernel mode QPs. If iw_cxgb4 module option c4iw_wr_log is set to non-zero, each work request is tracked and timing data maintained in a rolling log that is 4096 entries deep by default. Module option c4iw_wr_log_size_order allows specifing a log2 size to use instead of the default order of 12 (4096 entries). Both module options are read-only and must be passed in at module load time to set them. IE: modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10 The timing data is viewable via the iw_cxgb4 debugfs file "wr_log". Writing anything to this file will clear all the timing data. Data tracked includes: - The host time when the work request was posted, just before ringing the doorbell. The host time when the completion was polled by the application. This is also the time the log entry is created. The delta of these two times is the amount of time took processing the work request. - The qid of the EQ used to post the work request. - The work request opcode. - The cqe wr_id field. For sq completions requests this is the swsqe index. For recv completions this is the MSN of the ingress SEND. This value can be used to match log entries from this log with firmware flowc event entries. - The sge timestamp value just before ringing the doorbell when posting, the sge timestamp value just after polling the completion, and CQE.timestamp field from the completion itself. With these three timestamps we can track the latency from post to poll, and the amount of time the completion resided in the CQ before being reaped by the application. With debug firmware, the sge timestamp is also logged by firmware in its flowc history so that we can compute the latency from posting the work request until the firmware sees it. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cxgb4/iw_cxgb4: use firmware ord/ird resource limitsHariprasad Shenai2014-07-151-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Advertise a larger max read queue depth for qps, and gather the resource limits from fw and use them to avoid exhaustinq all the resources. Design: cxgb4: Obtain the max_ordird_qp and max_ird_adapter device params from FW at init time and pass them up to the ULDs when they attach. If these parameters are not available, due to older firmware, then hard-code the values based on the known values for older firmware. iw_cxgb4: Fix the c4iw_query_device() to report these correct values based on adapter parameters. ibv_query_device() will always return: max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp) max_res_rd_atom = max_ird_adapter Bump up the per qp max module option to 32, allowing it to be increased by the user up to the device max of max_ordird_qp. 32 seems to be sufficient to maximize throughput for streaming read benchmarks. Fail connection setup if the negotiated IRD exhausts the available adapter ird resources. So the driver will track the amount of ird resource in use and not send an RI_WR/INIT to FW that would reduce the available ird resources below zero. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | iw_cxgb4: Detect Ing. Padding Boundary at run-timeHariprasad Shenai2014-07-151-0/+12
|/ | | | | | | | | Updates iw_cxgb4 to determine the Ingress Padding Boundary from cxgb4_lld_info, and take subsequent actions. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-06-121-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
| * iw_cxgb4: don't truncate the recv window sizeHariprasad Shenai2014-06-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | Fixed a bug that shows up with recv window sizes that exceed the size of the RCV_BUFSIZ field in opt0 (>= 1024K). If the recv window exceeds this, then we specify the max possible in opt0, add add the rest in via a RX_DATA_ACK credits. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | RDMA/cxgb4: Add support for iWARP Port Mapper user space serviceSteve Wise2014-06-101-0/+44
|/ | | | | | | | | | Based on original work by Vipul Pandya. Signed-off-by: Steve Wise <swise@opengridcomputing.com> [ Fix htons -> ntohs to make sparse happy. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Fix endpoint mutex deadlocksSteve Wise2014-04-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cases where the cm calls c4iw_modify_rc_qp() with the endpoint mutex held, they must be called with internal == 1. rx_data() and process_mpa_reply() are not doing this. This causes a deadlock because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some !internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex. The design was intended to only do the disconnect for !internal calls. Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with internal == 1, and then disconnect only after releasing the mutex. Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with internal == 1 and set a new attr flag telling it to send a TERMINATE message. Previously this was implied by !internal. Change process_mpa_reply() to return whether the caller should disconnect after releasing the endpoint mutex. Now rx_data() will do the disconnect in the cases where process_mpa_reply() wants to disconnect after the TERMINATE is sent. Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal, and to send a TERMINATE message if attrs->send_term is 1. Change abort_connection() to not aquire the ep mutex for setting the state, and make all calls to abort_connection() do so with the mutex held. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>