summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2007-10-1168-1550/+1930
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits) mlx4_core: Fix section mismatches IPoIB: Allow setting policy to ignore multicast groups IB/mthca: Mark error paths as unlikely() in post_srq_recv functions IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages. IB/ipath: Remove redundant link state checks IB/ipath: Fix IB_EVENT_PORT_ERR event IB/ipath: Better handling of unexpected GPIO interrupts IB/ipath: Maintain active time on all chips IB/ipath: Fix QHT7040 serial number check IB/ipath: Indicate a couple of chip bugs to userspace IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close IB/ipath: Remove duplicate copy of LMC IB/ipath: Add ability to set the LMC via the sysfs debugging interface IB/ipath: Optimize completion queue entry insertion and polling IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED IB/ipath: Generate flush CQE when QP is in error state IB/ipath: Remove redundant code IB/ipath: Future proof eeprom checksum code (contents reading) IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate ...
| * IPoIB: Allow setting policy to ignore multicast groupsOr Gerlitz2007-10-104-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel IB stack allows (through the RDMA CM) userspace applications to join and use multicast groups from the IPoIB MGID range. This allows multicast traffic to be handled directly from userspace QPs, without going through the kernel stack, which gives better performance for some applications. However, to fully interoperate with IP multicast, such userspace applications need to participate in IGMP reports and queries, or else routers may not forward the multicast traffic to the system where the application is running. The simplest way to do this is to share the kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to join multicast groups that are being handled directly in userspace. However, in such cases, the actual multicast traffic should not also be handled by the IPoIB interface, because that would burn resources handling multicast packets that will just be discarded in the kernel. To handle this, this patch adds lookup on the database used for IB multicast group reference counting when IPoIB is joining multicast groups, and if a multicast group is already handled by user space, then the IPoIB kernel driver ignores the group. This is controlled by a per-interface policy flag. When the flag is set, IPoIB will not join and attach its QP to a multicast group which already has an entry in the database; when the flag is cleared, IPoIB will behave as before this change. For each IPoIB interface, the /sys/class/net/$intf/umcast attribute controls the policy flag. The default value is off/0. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/mthca: Mark error paths as unlikely() in post_srq_recv functionsEli Cohen2007-10-101-4/+4
| | | | | | | | | | Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.Dave Olson2007-10-091-3/+4
| | | | | | | | | | | | | | | | Fixed to be the same as everywhere else. copy and then zero the page * in the array first, and then pass the copy to the VM routines. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Remove redundant link state checksRalph Campbell2007-10-091-6/+0
| | | | | | | | | | | | | | | | | | This patch removes some redundant checks when the SMA changes the link state since the same checks are made in the lower level function that sets the state. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Fix IB_EVENT_PORT_ERR eventRalph Campbell2007-10-095-12/+31
| | | | | | | | | | | | | | | | | | | | The link state event calls were being generated when the SM told the SMA to change link states. This works for IB_EVENT_PORT_ACTIVE but not if the link goes down and stays down. The fix is to generate event calls from the interrupt handler when the HW link state changes. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Better handling of unexpected GPIO interruptsMichael Albaugh2007-10-091-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The General Purpose I/O pins can be configured to cause interrupts. At the end of the interrupt code dealing with all known causes, a message is output if any bits remain un-handled. Since this is a "can't happen" scenario, it should only be triggered by bugs elsewhere. It is harmless, and potentially beneficial, to limit the damage by masking any such unexpected interrupts. This patch adds disabling of interrupts from any pins that should not have been allowed to interrupt, in addition to emitting a message. Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Maintain active time on all chipsMichael Albaugh2007-10-091-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a count of "active hours" maintained in EEPROM, to aid troubleshooting. The definition of "active" is based on traffic exceeding a threshold in any given 5-second polling interval. As originally written, the check was inadvertently bypassed for chips whose counters were 64-bits wide, and only applied to chips with 32-bit wide counters. This patch moves the test for amount of traffic "out" to a more common location, rather than depending on a side-effect of the software emulation of 64-bit counts on chips whose hardware is only 32-bits wide. Signed-off-by: Michael Albaugh <Michael.Albaugh@Qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Fix QHT7040 serial number checkDave Olson2007-10-091-29/+15
| | | | | | | | | | | | | | | | | | | | Remove all the OEM and bringup boards, and complain and fail initialization if one is found. QHT7040 with GPIO rework (128ywwuuuu) is OK, older 112ywwuuuu is no longer supported). The check that had been added was failing both the 112 and 128 series. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Indicate a couple of chip bugs to userspaceArthur Jones2007-10-093-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | A couple of chip bugs in the iba6110 and in the iba6120 are not in more recent chips. This first bug swaps two of the pioavail register locations. In the second bug, the chip can sometimes forget to dma the pio avail register to memory. We indicate the presence of these bugs with runtime flags and we indicate the presence of the flags by bumping the SWMINOR. Signed-off-by: Arthur Jones <arthur.jones@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: iba6110 rev4 no longer needs recv header overrun workaroundArthur Jones2007-10-091-2/+4
| | | | | | | | | | | | | | | | | | | | iba6110 rev3 and earlier had a chip bug where the chip could overrun the recv header queue. rev4 fixed this chip bug so userspace no longer needs to workaround it. Now we only set the workaround flag for older chip versions. Signed-off-by: Arthur Jones <arthur.jones@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_closeArthur Jones2007-10-093-51/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipath_poll() suffered from a couple subtle bugs. Under the right conditions we could leave recv interrupts enabled on an ipath user context on close, thereby taking potentially unwanted interrupts on the next open -- this is fixed by unconditionally turning off recv interrupts on close. Also, we now use counters rather than set/clear bits which allows us to make sure we catch all interrupts at the cost of changing the semantics slightly (it's now give me all events since the last time I called poll() rather than give me all events since I called _this_ poll routine). We also added some memory barriers which may help ensure we get all notifications in a timely manner. Signed-off-by: Arthur Jones <arthur.jones@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Remove duplicate copy of LMCRalph Campbell2007-10-094-26/+29
| | | | | | | | | | | | | | | | The LMC value was being saved by the SMA in two places. This patch cleans it up so only one copy is kept. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Add ability to set the LMC via the sysfs debugging interfaceRalph Campbell2007-10-091-1/+39
| | | | | | | | | | | | | | | | | | This patch adds the ability to set the LMC via a sysfs file as if the SM sent a SubnSet(PortInfo) MAD. It is useful for debugging when no SM is running. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Optimize completion queue entry insertion and pollingRalph Campbell2007-10-092-47/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | The code to add an entry to the completion queue stored the QPN which is needed for the user level verbs view of the completion queue entry but the kernel struct ib_wc contains a pointer to the QP instead of a QPN. When the kernel polled for a completion queue entry, the QPN was lookup up and the QP pointer recovered. This patch stores the CQE differently based on whether the CQ is a kernel CQ or a user CQ thus avoiding the QPN to QP lookup overhead. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHEDRalph Campbell2007-10-093-5/+29
| | | | | | | | | | | | | | | | This patch implements the IB_EVENT_QP_LAST_WQE_REACHED event which is needed by ib_ipoib to destroy the QP when used in connected mode. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Generate flush CQE when QP is in error stateRalph Campbell2007-10-091-2/+20
| | | | | | | | | | | | | | | | | | Follow the IB spec. (C10-96) for post send which states that a flushed completion event should be generated for work requests posted when a QP is in the error state. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Remove redundant codeRalph Campbell2007-10-091-5/+0
| | | | | | | | | | | | | | This patch removes some redundant initialization code. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Future proof eeprom checksum code (contents reading)Dave Olson2007-10-091-2/+8
| | | | | | | | | | | | | | | | | | | | | | In an earlier change, the amount of data read from the flash was mistakenly limited to the size known to the current driver. This causes problems when the length is increased, and written with the new longer version; the checksum would fail because not enough data was read. Always read the full 128 byte length to prevent this. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediateRalph Campbell2007-10-091-10/+11
| | | | | | | | | | | | | | | | This patch fixes a bug in the receive processing for UC RDMA WRITE with immediate which caused the last packet to be dropped. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Correctly describe workaround for TID write chip bugDave Olson2007-10-091-5/+8
| | | | | | | | | | | | | | | | | | This is a comment change, only, correcting the comment to match the implemented workaround, rather than the original workaround, and clarifying why it's needed. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Remove unneeded code for ipathfsRalph Campbell2007-10-091-187/+0
| | | | | | | | | | | | | | | | | | The ipathfs file system is used to export binary data verses ASCII data such as through /sys. This patch removes some unneeded files since the data is available through other /sys files. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Verify host bus bandwidth to chip will not limit performanceDave Olson2007-10-091-0/+86
| | | | | | | | | | | | | | | | | | | | There have been a number of issues where host bandwidth via HT or PCIe to the InfiniPath chip has been limited in some fashion (BIOS, configuration, etc.), resulting in user confusion. This check gives a clear warning that something is wrong and needs to be resolved. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Change UD to queue work requests like RC & UCRalph Campbell2007-10-097-611/+494
| | | | | | | | | | | | | | | | | | | | The code to post UD sends tried to process work requests at the time ib_post_send() is called without using a WQE queue. This was fine as long as HW resources were available for sending a packet. This patch changes UD to be handled more like RC and UC and shares more code. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: Performance optimization for CPU differencesRalph Campbell2007-10-094-35/+53
| | | | | | | | | | | | | | | | | | Different processors have different ordering restrictions for write combining. By taking advantage of this, we can eliminate some write barriers when writing to the send buffers. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipath: iba6110 rev4 GPIO counters supportArthur Jones2007-10-092-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | On iba6110 rev4, support for three more IB counters were added. The LocalLinkIntegrityError counter, the ExcessiveBufferOverrunErrors counter and support for error counting of flow control packets on an invalid VL. These counters trigger GPIO interrupts and the sw keeps track of the counts. Since we also use GPIO interrupts to signal packet reception, we need to turn off the fast interrupts, or we risk losing a GPIO interrupt. Signed-off-by: Arthur Jones <arthur.jones@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ehca: Fix clipping of device limits to INT_MAXRoland Dreier2007-10-091-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Doing min_t(int, foo, INT_MAX) doesn't work correctly, because if foo is bigger than INT_MAX, then when treated as a signed integer, it will become negative and hence such an expression is just an elaborate NOP. Fix such cases in ehca to do min_t(unsigned, foo, INT_MAX) instead. This fixes negative reported values for max_cqe, max_pd and max_ah: Before: max_cqe: -64 max_pd: -1 max_ah: -1 After: max_cqe: 2147483647 max_pd: 2147483647 max_ah: 2147483647 Based on a bug report and fix from Anton Blanchard <anton@samba.org>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()Dotan Barak2007-10-091-8/+10
| | | | | | | | | | | | | | | | Make the way QP is being created in ipoib_cm_create_tx_qp() consistent with ipoib_cm_create_rx_qp(). Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled upRoland Dreier2007-10-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Firmware commands are sent to the HCA by writing multiple words to a command register block. Access to this block of registers is serialized with a mutex. However, on large SGI systems, problems were seen with multiple CPUs issuing FW commands at the same time, because the writes to the register block may be reordered within the system interconnect and reach the HCA in a different order than they were issued (even with the mutex). Fix this by adding an mmiowb() before dropping the mutex. Tested-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retriesSean Hefty2007-10-091-0/+2
| | | | | | | | | | | | | | | | | | | | Automatically queue MRA message to decrease the number of retries sent by the remote side during connection establishment. This also has the effect of increasing the overall connection timeout without using a longer retry time in the case of dropped packets. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/cm: Modify interface to send MRAs in response to duplicate messagesSean Hefty2007-10-091-28/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IB CM provides a message received acknowledged (MRA) message that can be sent to indicate that a REQ or REP message has been received, but will require more time to process than the timeout specified by those messages. In many cases, the application may not know how long it will take to respond to a CM message, but the majority of the time, it will usually respond before a retry has been sent. Rather than sending an MRA in response to all messages just to handle the case where a longer timeout is needed, it is more efficient to queue the MRA for sending in case a duplicate message is received. This avoids sending an MRA when it is not needed, but limits the number of times that a REQ or REP will be resent. It also provides for a simpler implementation than generating the MRA based on a timer event. (That is, trying to send the MRA after receiving the first REQ or REP if a response has not been generated, so that it is received at the remote side before a duplicate REQ or REP has been received) Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/mthca: Increase max number of QPs per multicast group to 56Roland Dreier2007-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | Increase the number of QPs allowed per multicast group from 8 to 56. This allows for one QP per core on 16-core systems, which are now quite common, and allows some space for future growth. This is basically the same patch that Jack Morgenstein <jackm@dev.mellanox.co.il> just supplied for mlx4. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/mlx4: Implement FMRsJack Morgenstein2007-10-093-0/+114
| | | | | | | | | | | | | | | | Implement FMRs for mlx4. This is an adaptation of code from mthca. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * mlx4_core: Write MTTs from CPU instead with of WRITE_MTT FW commandJack Morgenstein2007-10-091-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | Write MTT entries directly to ICM from the driver (eliminating use of WRITE_MTT command). This reduces the number of FW commands needed to register an MR by at least a factor of 2 and speeds up memory registration significantly. This code will also be used to implement FMRs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/uverbs: Make ib_uverbs_release_event_file() staticRoland Dreier2007-10-092-9/+8
| | | | | | | | | | | | | | | | ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it static to that file. Also move the definition before the first use, so a forward declaration is not needed. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/umad: Fix bit ordering and 32-on-64 problems on big endian systemsRoland Dreier2007-10-091-9/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The declaration of struct ib_user_mad_reg_req.method_mask[] exported to userspace was an array of __u32, but the kernel internally treated it as a bitmap made up of longs. This makes a difference for 64-bit big-endian kernels, where numbering the bits in an array of__u32 gives: |31.....0|63....31|95....64|127...96| while numbering the bits in an array of longs gives: |63..............0|127............64| 64-bit userspace can handle this by just treating method_mask[] as an array of longs, but 32-bit userspace is really stuck: the meaning of the bits in method_mask[] depends on whether the kernel is 32-bit or 64-bit, and there's no sane way for userspace to know that. Fix this by updating <rdma/ib_user_mad.h> to make it clear that method_mask[] is an array of longs, and using a compat_ioctl method to convert to an array of 64-bit longs to handle the 32-on-64 problem. This fixes the interface description to match existing behavior (so working binaries continue to work) in almost all situations, and gives consistent semantics in the case of 32-bit userspace that can run on either a 32-bit or 64-bit kernel, so that the same binary can work for both 32-on-32 and 32-on-64 systems. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/umad: Add P_Key index supportRoland Dreier2007-10-091-29/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for setting the P_Key index of sent MADs and getting the P_Key index of received MADs. This requires a change to the layout of the ABI structure struct ib_user_mad_hdr, so to avoid breaking compatibility, we default to the old (unchanged) ABI and add a new ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware of the new ABI to opt into using it. We plan on switching to the new ABI by default in a year or so, and this patch adds a warning that is printed when an application uses the old ABI, to push people towards converting to the new ABI. Signed-off-by: Roland Dreier <rolandd@cisco.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Hal Rosenstock <hal@xsigo.com>
| * IB/ehca: Return srq_attr->max_sge in ehca_query_srq()Joachim Fenkes2007-10-091-0/+1
| | | | | | | | | | | | | | Totally forgot this. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ehca: Adjust 64-bit alignment of create QP response for userspaceHoang-Nam Nguyen2007-10-091-0/+1
| | | | | | | | | | Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ehca: Fix mem leak of firmware ctrlblock in ehca_create_srq()Hoang-Nam Nguyen2007-10-091-0/+2
| | | | | | | | | | Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/mlx4: Display misc device information under /sys/class/infiniband/Jack Morgenstein2007-10-091-0/+45
| | | | | | | | | | | | | | | | | | | | | | display the following device information under /sys/class/infiniband/mlx4_X: board_id, fw_ver, hw_rev, hca_type. This patch makes this information available to userspace utilities such as ibstat and ibv_devinfo. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/core: Fix handling of multicast response failuresRalph Campbell2007-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I was looking at the code for multicast.c and noticed that ib_sa_join_multicast() calls queue_join() which puts the request at the front of the group->pending_list. If this is a second request, it seems like it would interfere with process_join_error() since group->last_join won't point to the member at the head of the pending_list. The sequence would thus be: 1. ib_sa_join_multicast() puts member1 on head of pending_list and starts work thread 2. mcast_work_handler() calls send_join() which sets group->last_join to member1 3. ib_sa_join_multicast() puts member2 on head of pending_list 4. join operation for member1 receives failures response from SA. 5. join_handler() is called with error status 6. process_join_error() fails to process member1 since it doesn't match the first entry in the group->pending_list. The impact is that the failed join request is tossed. The second request is processed, and after it completes, the original request ends up being retried. This change also results in join requests being processed in FIFO order. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ehca: Misc cpuinit section annotations and #ifdef cleanupsSatyam Sharma2007-10-091-18/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Replace {un}register_cpu_notifier with {un}register_hotcpu_notifier thereby losing a couple of #ifdef HOTPLUG_CPU pairs. * Move comp_pool_callback_nb declaration to below that of callback function so that initialization of .notifier_call and .priority can occur at build time itself and not runtime. * Mark the notifier_block (and callback function, and another static function used by it) as __cpuinit{data} for the sake of consistency and remove enclosing #ifdef. (This may increase size for modular build of this module, however, because these are no longer dropped unconditionally now.) Signed-off-by: Satyam Sharma <satyam@infradead.org> Acked-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/iser: Remove unnecessary includesRoland Dreier2007-10-093-5/+0
| | | | | | | | | | | | | | | | <asm/scatterlist.h> is not needed because everyplace it appears, <linux/scatterlist.h> also appears. <asm/io.h> is not needed because nothing seems to be using device IO anyway. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * RDMA/cma: Use neigh_event_send() to start neighbour discoverySteve Wise2007-10-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Calling arp_send() to initiate neighbour discovery (ND) doesn't do the full ND protocol. Namely, it doesn't handle retransmitting the arp request if it is dropped. The function neigh_event_send() does all this. Without doing full ND, RDMA address resolution fails in the presence of dropped ARP broadcast packets. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ehca: Only use MR large pages for hugetlb regionsJoachim Fenkes2007-10-091-10/+15
| | | | | | | | | | | | | | | | ...because, on virtualized hardware like System p, we can't be sure that the physical pages behind them are contiguous otherwise. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/umem: Add hugetlb flag to struct ib_umemJoachim Fenkes2007-10-091-1/+19
| | | | | | | | | | | | | | | | | | During ib_umem_get(), determine whether all pages from the memory region are hugetlb pages and report this in the "hugetlb" member. Low-level drivers can use this information if they need it. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/srp: Add QoS support through service IDSean Hefty2007-10-091-0/+2
| | | | | | | | | | | | | | | | Provide the target service ID when performing a path record query to support optional QoS capability. QoS requires support from the SA. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * RDMA/ucma: Allow user space to set service typeSean Hefty2007-10-091-1/+73
| | | | | | | | | | | | | | | | Export the ability to set the type of service to user space. Model the interface after setsockopt. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * RDMA/cma: Add ability to specify type of serviceSean Hefty2007-10-091-10/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide support to specify a type of service for a communication identifier. A new function call is used when dealing with IPv4 addresses. For IPv6 addresses, the ToS is specified through the traffic class field in the sockaddr_in6 structure. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ The comments Eitan Zahavi and myself have made over the v1 post at <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html> were fully addressed. ] Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>